A universal and structured way to derive dual optimization problem formulations

Kees Roos∗  Marleen Balvert†  Bram L. Gorissen‡  Dick den Hertog†

December 19, 2019
Abstract

The dual problem of a convex optimization problem can be obtained in a relatively simple and structural way by using a well-known result in convex analysis, namely Fenchel's duality theorem. This alternative way of forming a strong dual problem is the subject of this paper. We recall some standard results from convex analysis and then discuss how the dual problem can be written in terms of the conjugates of the objective function and the constraint functions. This is a didactically valuable method to explicitly write down the dual problem. We demonstrate the method by deriving dual problems for several classical problems and also for a practical model for radiotherapy treatment planning, for which deriving the dual problem using other methods is a more tedious task. Additional material is presented in appendices, including useful tables for finding the conjugate functions of many functions.
Key words: convex optimization, duality theory, conjugate functions, support functions, Fenchel duality

AMS subject classifications: 46N10, 65K99, 90C25, 90C46
1 Introduction
Nowadays optimization is one of the most widely taught and applied subjects of mathematics. A mathematical optimization problem consists of an objective function that has to be minimized (or maximized) subject to some constraints. Every maximization problem can be written as a minimization problem by
∗ Department of Electrical Engineering, Mathematics and Computer Science, Technical University Delft, 2628 CD Delft, The Netherlands (+31 15 278 2530, [email protected])
† Department of Econometrics and Operations Research, Tilburg University, 5000 LE Tilburg, The Netherlands ([email protected], [email protected])
‡ Faculty of Exact Sciences, Mathematics, Free University, 1081 HV Amsterdam, The Netherlands ([email protected])
replacing the objective function by its opposite. In this paper we always assume that the given problem is a minimization problem. To every such problem one can assign another problem, called the dual problem of the original problem; the original problem is called the primal problem. In many cases a dual problem exists that has the same optimal value as the primal problem. If this happens then we say that we have strong duality, otherwise weak duality. It is well known that if the primal problem is convex then a strong dual problem exists under some mild conditions. In this paper we consider only convex problems with a strong dual. Strong duality and obtaining explicit formulations of the dual problem are of great importance for at least four reasons:
1. Sometimes the dual problem is easier to solve than the primal problem.

2. If we have feasible solutions of the primal problem and the dual problem, respectively, and their objective values are equal, then we may conclude that both solutions are optimal. In that case the primal solution is a certificate for the optimality of the dual solution, and vice versa.

3. If the two objective values are not equal, then the dual objective value is a lower bound for the optimal value of the primal problem and the primal objective value is an upper bound for the optimal value of the dual problem. This provides information on the 'quality' of the two solutions, i.e., on how much their values may differ from the optimal value.

4. Some optimization problems benefit from being solved with the dual decomposition method. This method requires repeated evaluation of the dual objective value of two (smaller) problems for fixed dual variables. Our explicit formulation of the dual can potentially speed up the evaluation of the dual objective function.
In textbooks, different ways of deriving dual optimization problems are discussed. One option is to write the optimization problem as a conic optimization problem, and then state the dual optimization problem in terms of the dual cone. However, rewriting general convex constraints as conic constraints may be awkward or even impossible. Another option is to derive the dual via Lagrange duality, which often leads to an unnecessarily long and tedious derivation process. We propose a universal and more structured way to derive dual problems, using basic results from convex analysis. The duals derived this way are ordinary Lagrange duals. The value of this paper is therefore not to present new duality theory, but rather to provide a unifying, pedagogically relevant framework for deriving the dual.
Let the primal optimization problem be given. Then by using Fenchel's duality theorem one can derive a dual problem in terms of so-called perspectives of the conjugates of the objective function and constraint functions. The main task is therefore to derive expressions for the conjugates of each of these functions separately (taking the perspective functions is easy). At first this may seem to be a difficult task, and this may be the reason that textbooks do not elaborate further on it. However, the key observation underlying this paper is that it is not necessary to derive explicit expressions for the conjugates of these functions: if one can derive an expression for the conjugate as an "inf over some variables", then this is sufficient to state the dual problem. And, indeed, by using standard composition rules for conjugates, well-known results in convex analysis, we arrive at such expressions with an inf.
Therefore, the aim of this paper is to show that the dual problem can be derived by using the composition rules for conjugates and by using the well-known results for the conjugates of basic univariate and multivariate functions. Thus, the proposed method is similar in nature to the way Fourier and Laplace transforms are taught, where a table with known transforms of functions is combined with a table of identities for transformations of functions. We consider this the most structured way to teach deriving dual optimization problems. Using this structured approach, one might even computerize the derivation of the dual optimization problem. We must emphasize that the ingredients of this approach are not new. About the same dual is presented in [8, p. 256], where it is obtained via Lagrange duality, and in [15, p. 45]. We also mention [24], where three strategies for deriving a dual problem are discussed, and Fenchel's duality theorem is used to construct the dual problem of a specific second order cone problem. As far as we know, no comprehensive survey such as the one presented in this paper exists.
We should also mention that in special cases where one would like to derive a partial Lagrange dual problem, e.g., for relaxation purposes, Fenchel's duality formulation cannot be used. On the other hand, the use of Fenchel's dual has found application in recent developments in the field of Robust Optimization; see, e.g., [6, 13].
The outline of the paper is as follows. In the next section we recall some fundamental notions from convex analysis, as defined in [19]. In Section 3 we present the general form of the primal problem that we want to solve and Fenchel's dual problem. We also discuss under which condition strong duality is guaranteed. To make the paper self-supporting, we present a proof of this strong duality in Appendices C and D. In Section 4 we present a scheme that describes in four steps how to obtain Fenchel's dual problem. In this scheme, the tables of conjugate functions and support functions in Appendix E are of crucial importance. In Appendix A we demonstrate how the conjugates of some functions are computed.
The use of the scheme in Section 4 is illustrated in Section 6, where we show how dual problems for several classical convex optimization problems can be obtained. In these classical problems the functions that appear in the primal problem all have a similar structure. However, problems with a variety of constraint functions can also be handled systematically. In Section 6.7 we demonstrate this by applying our approach to a model for radiotherapy treatment.
2 Preliminaries
We recall in this section some notation and terminology that are common in the literature. The effective domain of a convex function $f : \mathbb{R}^n \to [-\infty, +\infty]$ is defined by
$$\operatorname{dom} f = \{x \mid f(x) < \infty\}.$$
A convex function $f$ is proper if $f(x) > -\infty$ for all $x \in \mathbb{R}^n$ and $f(x) < \infty$ for at least one $x \in \mathbb{R}^n$. The function $f$ is closed if the set $\{x \mid f(x) \le \alpha\}$ is closed for every $\alpha \in \mathbb{R}$.
For any function $f$ its convex conjugate $f^*$ is defined by
$$f^*(y) := \sup_{x \in \operatorname{dom} f} \left\{ y^T x - f(x) \right\}. \tag{1}$$
Figure 1 illustrates this notion graphically for the case where $n = 1$.

[Figure 1: Value of the convex conjugate at $y$. The figure shows the graphs of $yx$ and $f(x)$; the tangent line intersects the vertical axis at $-f^*(y)$.]

Given $y$, the maximal value of $yx - f(x)$ occurs where the derivative of $f(x)$ equals $y$, and the tangent to the graph of $f$ where this happens intersects the $y$-axis in the point $(0, -f^*(y))$.
Since $f^*(y)$ is the pointwise maximum of linear functions (linear in $y$), $f^*$ is convex. Moreover, if $f$ is convex then $f^*$ is closed, $(\operatorname{cl} f)^* = f^*$ and $f^{**} = \operatorname{cl} f$, which implies $f^{**} = f$ if $f$ is closed convex [19, Theorem 12.2].
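Definition (1) and the biconjugate identity $f^{**} = f$ for closed convex $f$ can be checked numerically. The sketch below (our own toy choice, not from the paper) takes $f(x) = x^2/2$, whose conjugate is again $f^*(y) = y^2/2$, and approximates the sup in (1) on a grid:

```python
import numpy as np

# Numerical sanity check of the conjugate definition (1):
# f*(y) = sup_x { y*x - f(x) } for f(x) = x^2/2, which gives f*(y) = y^2/2,
# and of the biconjugate identity f** = f for closed convex f.

xs = np.linspace(-10.0, 10.0, 200001)   # grid standing in for dom f
f = 0.5 * xs**2

def conjugate(vals, grid, y):
    """Grid approximation of sup_x { y*x - f(x) }, given sampled values of f."""
    return np.max(y * grid - vals)

for y in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    assert abs(conjugate(f, xs, y) - 0.5 * y**2) < 1e-6

# Biconjugate: f**(x) = sup_y { x*y - f*(y) } should recover f(x) = x^2/2.
ys = np.linspace(-10.0, 10.0, 200001)
fstar = 0.5 * ys**2
for x in [-1.5, 0.0, 2.0]:
    assert abs(conjugate(fstar, ys, x) - 0.5 * x**2) < 1e-6
```

The grid is fine enough that the discretization error of the sup is far below the tolerances used.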
The indicator function of a set $S \subseteq \mathbb{R}^n$ is denoted as $\delta_S(x)$ and defined by
$$\delta_S(x) = \begin{cases} 0, & x \in S \\ \infty, & x \notin S. \end{cases}$$
Note that $\delta_S(x)$ is convex if and only if $S$ is convex, and proper if $S$ is nonempty. Obviously, we have
$$x \in S \iff \delta_S(x) \le 0.$$
This implies that any convex set-constraint can be replaced with a convex functional constraint and vice versa.
The conjugate function of $\delta_S(x)$ is called the support function of $S$ and is given by
$$\delta_S^*(y) = \sup_{x \in S} y^T x. \tag{2}$$
Our approach depends heavily on a simple relation between the conjugate $f^*$ of a convex function $f$ and the support function of the set $S := \{x \mid f(x) \le 0\}$. Provided that $S$ is nonempty, and $\operatorname{ri} S$ is nonempty if $f$ is nonlinear, we have (cf. Lemma D.1):
$$\delta_S^*(y) = \min_{\lambda \ge 0} \left\{ (\lambda f)^*(y) \right\}. \tag{3}$$
Here the function $\lambda f$ is defined in the natural way: $(\lambda f)(x) = \lambda f(x)$. If $\lambda > 0$ then one has
$$(\lambda f)^*(y) = \sup_x \left\{ y^T x - \lambda f(x) \right\} = \sup_x \lambda \left\{ \left(\tfrac{y}{\lambda}\right)^T x - f(x) \right\} = \lambda f^*\!\left(\tfrac{y}{\lambda}\right), \tag{4}$$
and for $\lambda = 0$ we get∗
$$(0 f)^*(y) = \sup_x \left\{ y^T x \right\} = \begin{cases} 0, & \text{if } y = 0 \\ \infty, & \text{otherwise} \end{cases} = \delta_{\{0\}}(y). \tag{5}$$
The function $(\lambda f)^*$ is called the perspective function of $f^*$ (with $\lambda$ as the new variable). Due to the first equality in (4), the perspective function of $f^*$ is convex (more precisely: jointly convex in $y$ and $\lambda$). Although $(\lambda f)^*$ is closed for each fixed value of $\lambda$, it is not necessarily closed in $\lambda$. It may therefore occur that $(\lambda f)^*(y)$ is well-defined even if $\lambda$ approaches zero whereas $y$ stays away from zero. Closing the perspective function in $\lambda$ can therefore result in the wrong dual. An example of this phenomenon can be found in Section B.9.

∗Here we adopt the calculation rules $0 \cdot \infty = \infty \cdot 0 = 0$, as in [19, p. 24].
The concave conjugate $g_*$ of a function $g$ is defined by
$$g_*(y) := \inf_{x \in \operatorname{dom} g} \left\{ y^T x - g(x) \right\}.$$
Since $g_*(y)$ is the pointwise minimum of linear functions, $g_*$ is concave. Putting $f = -g$, one easily verifies the following relation [19, p. 308]:
$$g_*(y) = -f^*(-y) = -(-g)^*(-y). \tag{6}$$
Hence, all properties of convex conjugates lead to similar properties of concave conjugates.
3 Fenchel's dual problem of a convex optimization problem
In this section we derive Fenchel's dual problem for a convex optimization problem by using conjugate functions. We will use some properties of conjugate functions without proving them at this stage; this will be the subject of the following sections and of the Appendix. So for the moment we take these properties for granted.
Let us first consider the case of unconstrained minimization. So, we have a convex function $f_0$ and want to know its minimal value $z^*$. Due to definition (1) of $f_0^*$ we may write
$$z^* = \inf_{x \in \mathbb{R}^n} f_0(x) = -\sup_x \{-f_0(x)\} = -f_0^*(0). \tag{7}$$
This already reveals the relevance of the notion of a conjugate function for this relatively simple optimization problem: if we know the conjugate function $f_0^*(y)$ of $f_0$, then $-f_0^*(0)$ gives the minimal value. As we will see, this simple property is the keystone of Fenchel's duality theory.
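Identity (7) is easy to test numerically. A minimal sketch with the assumed toy function $f_0(x) = (x-1)^2 + 3$, whose minimal value is $3$:

```python
import numpy as np

# Check identity (7): inf_x f0(x) = -f0*(0), for the toy choice
# f0(x) = (x - 1)^2 + 3 (an assumption for illustration; min value 3 at x = 1).
xs = np.linspace(-10.0, 10.0, 400001)
f0 = (xs - 1.0)**2 + 3.0

f0_star_at_0 = np.max(0.0 * xs - f0)   # f0*(0) = sup_x { 0*x - f0(x) }
z_star = np.min(f0)                    # direct minimization

assert abs(z_star - (-f0_star_at_0)) < 1e-9
assert abs(z_star - 3.0) < 1e-6
```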
Next we consider the case where $S$ is a convex subset of $\mathbb{R}^n$ and we want to find the minimal value of $f_0$ on the set $S$. So the problem we want to solve is
$$\inf_x \left\{ f_0(x) \mid x \in S \right\}.$$
In that case we consider the function $g(x)$ defined by
$$g(x) := f_0(x) + \delta_S(x).$$
Obviously the effective domain of $g(x)$ is the set $S$, and on this set $g$ coincides with $f_0$. So, we have transformed our constrained problem into an unconstrained problem, and therefore the optimal value will be $-g^*(0)$. The question arises how to find the conjugate function of $g$. It turns out that it can be derived easily from the conjugate functions of $f_0$ and $\delta_S$. Using the formula in line 6 of Table 2 we obtain
$$g^*(y) = \min_{y^0, y^1} \left\{ f_0^*(y^0) + \delta_S^*(y^1) \,\middle|\, y^0 + y^1 = y \right\}.$$
We need to compute $g^*(y)$ at $y = 0$. Substitution of $y = 0$ in the above expression yields $y^0 + y^1 = 0$, i.e., $y^1 = -y^0$. So, we can eliminate $y^1$. Thus we get
$$g^*(0) = \min_{y^0} \left\{ f_0^*(y^0) + \delta_S^*(-y^0) \right\}.$$
In general, the problem that we want to solve is the following (primal) convex optimization problem
$$(P) \quad \inf_x \left\{ f_0(x) \mid f_i(x) \le 0,\ i = 1, \ldots, m \right\},$$
where the functions $f_i : \mathbb{R}^n \to [-\infty, \infty]$ are proper convex functions, for $i = 0, \ldots, m$. In that case the set $S$ is the intersection of the sets
$$S_i = \{x \mid f_i(x) \le 0\}, \quad 1 \le i \le m.$$
Obviously the indicator function of $S$ satisfies
$$\delta_S(x) = \delta_{S_1}(x) + \cdots + \delta_{S_m}(x)$$
and hence, by line 6 in Table 3,
$$\delta_S^*(-y^0) = \min_{y^1, \ldots, y^m} \left\{ \delta_{S_1}^*(y^1) + \cdots + \delta_{S_m}^*(y^m) \,\middle|\, y^1 + \cdots + y^m = -y^0 \right\}.$$
By (3) we have
$$\delta_{S_i}^*(y^i) = \min_{u_i \ge 0} \left\{ (u_i f_i)^*(y^i) \right\}, \quad 1 \le i \le m.$$
Substitution gives
$$\delta_S^*(-y^0) = \min_{\{y^i\}_{i=1}^m} \left\{ \sum_{i=1}^m \min_{u_i \ge 0} \left\{ (u_i f_i)^*(y^i) \right\} \,\middle|\, \sum_{i=0}^m y^i = 0 \right\} = \min_{\{y^i\}_{i=1}^m,\ u \succeq 0} \left\{ \sum_{i=1}^m (u_i f_i)^*(y^i) \,\middle|\, \sum_{i=0}^m y^i = 0 \right\}.$$
Substitution into the expression for $g^*(0)$ gives
$$g^*(0) = \min_{(y^i)_{i=0}^m,\ u \succeq 0} \left\{ f_0^*(y^0) + \sum_{i=1}^m (u_i f_i)^*(y^i) \,\middle|\, \sum_{i=0}^m y^i = 0 \right\}.$$
By changing the sign at both sides this implies
$$-g^*(0) = \max_{(y^i)_{i=0}^m,\ u \succeq 0} \left\{ -f_0^*(y^0) - \sum_{i=1}^m (u_i f_i)^*(y^i) \,\middle|\, \sum_{i=0}^m y^i = 0 \right\}.$$
Thus we may conclude that the optimal value of $(P)$ is the same as that of the problem
$$(D) \quad \sup_{\{y^i\}_{i=0}^m,\ u} \left\{ -f_0^*(y^0) - \sum_{i=1}^m (u_i f_i)^*(y^i) \,\middle|\, \sum_{i=0}^m y^i = 0,\ u \succeq 0 \right\}.$$
This is Fenchel's dual problem of $(P)$.†
Note that the objective function in $(D)$ is concave (since the perspective functions under the sum are convex) and the constraints are linear (so convex as well). Hence, when writing $(D)$ as a minimization problem it becomes a convex problem.
It is clear that $(D)$ is a practical formulation of the dual if the convex conjugates $f_i^*$ can easily be computed, which turns out to be true in many cases. This important fact is the main motivation for this paper.
In the sequel, we make the following assumption to satisfy the condition in line 6 of Table 2.

Assumption 3.1 Assume that $\operatorname{ri}\operatorname{dom} f_0 \cap S \ne \emptyset$ or $\bigcap_{i=0}^m \operatorname{ri}\operatorname{dom} f_i \ne \emptyset$.

This is the same as requiring that $(P)$ is Slater regular [21]. In that case the optimal values of $(P)$ and $(D)$ are equal; for a proof we refer to Appendix D. We should add that if $f_0$ is linear then $x \in \operatorname{ri}\operatorname{dom} f_0$ can be replaced by $x \in \operatorname{dom} f_0$ in the Slater condition.

†As in [8], the notation $x \succeq 0$ defines a vector $x \in \mathbb{R}^n$ to be nonnegative; only if $n = 1$ do we use the notation $x \ge 0$.
In the sequel, especially when the objective function has the same form as the constraint functions, we extend the vector $u$ with the entry $u_0 = 1$. Then $(D)$ can be written as
$$(D') \quad \sup_{\{y^i\}_{i=0}^m,\ u} \left\{ -\sum_{i=0}^m (u_i f_i)^*(y^i) \,\middle|\, \sum_{i=0}^m y^i = 0,\ u \succeq 0,\ u_0 = 1 \right\}.$$
In the above expressions for $(D)$ and $(D')$ we used the sup operator to indicate that we are dealing with a maximization problem. Below we use the max operator only if the maximum value is attained. In this paper it is not our goal to establish when this happens; our main focus is to find a dual problem yielding strong duality. Therefore, if we use the sup operator in a maximization problem below, it is not excluded that the maximal value is attained, and similarly for the inf operator.
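The machinery of this section can be sanity-checked numerically on a tiny instance. A sketch under assumed data: take $f_0(x) = x^2$ and the single constraint $f_1(x) = 1 - x \le 0$. Working out $(D)$ by hand, $f_0^*(y) = y^2/4$ and $(u f_1)^*(y) = -u$ on the singleton domain $\{-u\}$, so the constraint $y^0 + y^1 = 0$ forces $y^0 = u$ and the dual collapses to $\sup_{u \ge 0} \{-u^2/4 + u\}$. Strong duality predicts both optima equal $1$:

```python
import numpy as np

# Toy instance of (P): minimize f0(x) = x^2 subject to f1(x) = 1 - x <= 0.
# Hand-derived dual (see lead-in): sup_{u >= 0} { -u^2/4 + u }.
# Both problems are solved on a grid; strong duality predicts equal values.

xs = np.linspace(1.0, 10.0, 100001)      # primal feasible region {x : x >= 1}
primal_opt = np.min(xs**2)               # = 1, attained at x = 1

us = np.linspace(0.0, 10.0, 100001)
dual_opt = np.max(-us**2 / 4.0 + us)     # = 1, attained at u = 2

assert abs(primal_opt - 1.0) < 1e-8
assert abs(dual_opt - 1.0) < 1e-6
assert abs(primal_opt - dual_opt) < 1e-6
```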
4 Recipe for setting up Fenchel's dual problem
As became clear in the previous section, Fenchel's dual gives us a straightforward method for finding the dual problem of $(P)$. The method can be split into four steps, as discussed below. We give some demonstrations of the resulting recipe in Section 6.
Step 1: Formulate the primal problem

The primal problem should be a convex optimization problem. Moreover, all set constraints should be replaced by their indicator functions. Strong duality, i.e., equality of the optimal values of $(P)$ and $(D)$, is guaranteed only if $(P)$ is Slater regular.
Step 2: Derive tractable expressions for the conjugates

In our approach we need the conjugates of the functions that occur in the problem. For that purpose one can use the three tables in Appendix E. Table 1 contains expressions for the conjugates of some basic functions and the domains of these functions, Table 2 some formulas for conjugates of function transformations, and Table 3 some support functions of common sets. Usually a conjugate function (or its perspective function) occurring in $(D)$ or $(D')$ is written as $\inf_y \{h(y, z)\}$, where $h(y, z)$ is jointly convex in $y$ and $z$. A frequently used tool in our approach is the obvious equality
$$\sup_z \left\{ -\inf_y h(y, z) \right\} = \sup_{y, z} \left\{ -h(y, z) \right\}.$$
In other words, when writing down $(D)$ or $(D')$ we can get rid of inf operators by moving the variables concerned under the sup operator. This helps to simplify the formulation of the dual problem. Deriving tractable expressions for the conjugates is further elaborated on and illustrated in Appendix A. The domain of a conjugate function is also important. For each $i$ the vector $y^i$ belongs to the domain of $(u_i f_i)^*$ in $(D)$ or $(D')$. This not only leads to convex constraints in Fenchel's dual problem; in most cases it also enables us to eliminate the variables $y^i$. In some cases we get a conjugate function that consists of 'branches', i.e., one obtains different formulas for $f^*(y)$ on several parts of the domain of $f^*$. This phenomenon is demonstrated in Appendix B.7, where it is also shown how this can be prevented.
Step 3: Derive tractable convex expressions for perspectives of conjugates

The conjugate functions found in Step 2 are substituted into the term $(uf)^*$ in $(D)$ or $(D')$. For $u > 0$ this term is
$$u f^*\!\left(\frac{y}{u}\right), \quad \text{with } \frac{y}{u} \in \operatorname{dom} f^*.$$
One may be troubled by the fact that the quotient $y/u$ is not jointly convex in $y$ and $u$. Indeed, sometimes the resulting formulation of the dual problem may fail to be convex, even though the set $\{(y, u) : u > 0,\ y/u \in \operatorname{dom} f^*\}$ is convex. A simple substitution yields a convex formulation. For example, when using line 8b in Table 2 one has
$$f^*(y) = \inf_z \left\{ h^*(z) - b^T z \,\middle|\, A^T z = y \right\}.$$
This gives rise to the term $(uf)^*(y)$ in the dual problem, i.e.,
$$(uf)^*(y) = \inf_z \left\{ u h^*(z) - u b^T z \,\middle|\, A^T z = \frac{y}{u} \right\} = \inf_z \left\{ u h^*(z) - b^T (uz) \,\middle|\, A^T (uz) = y \right\}, \quad u > 0.$$
Due to the introduction of the variable $u$ we are left with an expression that is not jointly convex in $y$ and $u$, because of the occurrence of the products $u h^*(z)$ and $uz$. However, by the substitution $\tilde{z} = uz$ we get
$$(uf)^*(y) = \inf_{\tilde{z}} \left\{ u h^*\!\left(\frac{\tilde{z}}{u}\right) - b^T \tilde{z} \,\middle|\, A^T \tilde{z} = y \right\} = \inf_{\tilde{z}} \left\{ (u h)^*(\tilde{z}) - b^T \tilde{z} \,\middle|\, A^T \tilde{z} = y \right\}.$$
Obviously, both the objective function and the constraint are now convex. In Section 6 one will find many substitutions of this type, which are necessary to obtain a convex (and hence computationally tractable) formulation of the dual problem.
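The effect of the substitution can be illustrated for $h^*(z) = z^2/2$: the resulting term $(u h)^*(\tilde{z}) = u\, h^*(\tilde{z}/u) = \tilde{z}^2/(2u)$ is the classical quadratic-over-linear function, which is jointly convex for $u > 0$. A randomized midpoint-convexity check (a toy verification, not a proof):

```python
import numpy as np

# The perspective of h*(z) = z^2/2 is (u h)*(z~) = u * h*(z~/u) = z~^2 / (2u),
# which Step 3 claims is jointly convex in (z~, u) for u > 0.

rng = np.random.default_rng(0)

def persp(z, u):
    return z**2 / (2.0 * u)

for _ in range(10000):
    z1, z2 = rng.uniform(-5, 5, size=2)
    u1, u2 = rng.uniform(0.1, 5, size=2)
    mid = persp(0.5 * (z1 + z2), 0.5 * (u1 + u2))
    avg = 0.5 * (persp(z1, u1) + persp(z2, u2))
    assert mid <= avg + 1e-12   # midpoint convexity holds on every sampled pair
```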
Step 4: Formulate the dual problem

The results of Steps 1-3 are used in the dual formulation $(D)$ or $(D')$. Often one can simplify the resulting formulation by eliminating variables.
5 Sensitivity analysis
The existing theory for sensitivity analysis is based on Lagrange multipliers (e.g., [8, par. 5.6.3]). We therefore show that the variables $u$ in $(D)$ are precisely the Lagrange multipliers. The classical approach to sensitivity theory investigates how the objective value of the dual problem depends on the right-hand sides in the primal problem.
We first recall the classical approach to sensitivity theory. The Lagrange dual of $(P)$ is given by
$$(LD) \quad \sup_{\lambda \succeq 0} \inf_x \left\{ f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) \right\}.$$
Let $(x, \lambda)$ be an optimal solution. If the $i$-th constraint $f_i(x) \le 0$ is replaced by $f_i(x) + \varepsilon_i \le 0$, with $\varepsilon_i \in \mathbb{R}$, then the optimal value $z$ becomes
$$z = f_0(x) + \sum_{i=1}^m \lambda_i \left( f_i(x) + \varepsilon_i \right) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^m \lambda_i \varepsilon_i.$$
Due to the above expression the Lagrange multiplier $\lambda_i$ is called the shadow price for the $i$-th constraint. This is because one usually concludes that adding a small number $\varepsilon_i$ to $f_i(x)$ will change the objective value by the amount $\varepsilon_i \lambda_i$. Though this might be true in many cases, it does not hold in general, especially not if either the primal or the dual problem is degenerate. For examples we refer to [16].
We now show that the variables $u_i$ in Fenchel's dual problem are also shadow prices in the above sense. Recall that Fenchel's dual of $(P)$ is given by
$$(D) \quad \sup_{y^i,\ u \succeq 0} \left\{ -f_0^*(y^0) - \sum_{i=1}^m (u_i f_i)^*(y^i) \,\middle|\, \sum_{i=0}^m y^i = 0 \right\}.$$
Now $(u_i(f_i + \varepsilon_i))^*(y^i) = (u_i f_i)^*(y^i) - u_i \varepsilon_i$, by line 1 in Table 2. Hence at optimality the part of the objective value in the above problem that is due to the perturbation in the primal problem is just $\sum_{i=1}^m u_i \varepsilon_i$, where $u$ is optimal for $(D)$. This proves that in sensitivity analysis $u$ takes over the role of the vector $\lambda$ of Lagrange multipliers.
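The shadow-price interpretation can be illustrated on a small LP of our own choosing (an assumed instance, not from the paper): minimize $x_1 + 2x_2$ subject to $x_1 + x_2 \ge 1$, $x \succeq 0$, whose dual is maximize $y$ subject to $y \le 1$, $y \le 2$, $y \ge 0$. Perturbing the right-hand side $1$ to $1 + \varepsilon$ should change the optimum by $\varepsilon$ times the optimal dual variable:

```python
from scipy.optimize import linprog

# Primal: min x1 + 2*x2  s.t.  x1 + x2 >= 1, x >= 0  (rewritten for linprog
# as -x1 - x2 <= -1; linprog's default bounds already give x >= 0).
c = [1.0, 2.0]
A_ub = [[-1.0, -1.0]]

z = linprog(c, A_ub=A_ub, b_ub=[-1.0]).fun

# Dual LP, solved explicitly: max y  <=>  min -y,  s.t.  y <= 1, y <= 2, y >= 0.
y = linprog([-1.0], A_ub=[[1.0], [1.0]], b_ub=c, bounds=[(0, None)]).x[0]

# Perturb the right-hand side and compare the finite difference to y.
eps = 1e-3
z_pert = linprog(c, A_ub=A_ub, b_ub=[-(1.0 + eps)]).fun

assert abs(z - 1.0) < 1e-8
assert abs(y - 1.0) < 1e-8
assert abs((z_pert - z) / eps - y) < 1e-6   # shadow price predicts the change
```

Since an LP optimum is locally linear in the right-hand side, the finite difference here matches the dual variable exactly up to solver tolerance.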
6 Some applications

In this section we demonstrate how Fenchel's dual can be obtained for some well-known convex optimization problems and for a mathematical model for radiotherapy treatment planning.
6.1 Linear optimization

We consider the standard linear optimization problem:
$$\inf_x \left\{ c^T x \,\middle|\, Ax = b,\ x \in \mathbb{R}^n_+ \right\},$$
where $A$ is an $m \times n$ matrix and $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$. For the conjugate $f^*$ of the objective function $f(x) = c^T x$ we have (cf. Table 1, line 1)
$$f^*(y) = 0, \quad \operatorname{dom} f^* = \{c\}.$$
Denoting the support function of the set $\{x \mid Ax = b\}$ as $g_1^*(y)$ we have (cf. Table 3, line 1)
$$g_1^*(y) = \min_z \left\{ b^T z \,\middle|\, A^T z = y \right\}.$$
Denoting the support function of the set constraint $x \in \mathbb{R}^n_+$ as $g_2^*(y)$ we have (cf. Table 3, line 2)
$$g_2^*(y) = 0, \quad \operatorname{dom} g_2^* = \{y \mid y \preceq 0\} = -\mathbb{R}^n_+.$$
It follows that Fenchel's dual problem is given by
$$\sup \left\{ -f^*(y^0) - \sum_{i=1}^2 g_i^*(y^i) \,\middle|\, \sum_{i=0}^2 y^i = 0 \right\}.$$
Substitution of the expressions for $f^*$ and $g_i^*$ ($i = 1, 2$) yields the following formulation of the dual problem:
$$\sup_{z,\ y^i} \left\{ -b^T z \,\middle|\, y^0 + y^1 + y^2 = 0,\ y^0 = c,\ A^T z = y^1,\ y^2 \preceq 0 \right\}.$$
We can eliminate the vectors $y^i$, which gives
$$\sup_z \left\{ -b^T z \,\middle|\, c + A^T z \succeq 0 \right\}.$$
Changing the sign of $z$ leads to the well-known duality theorem for linear optimization, namely
$$\inf_x \left\{ c^T x \,\middle|\, Ax = b,\ x \succeq 0 \right\} = \sup_z \left\{ b^T z \,\middle|\, A^T z \preceq c \right\}.$$
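This duality theorem is easy to verify numerically on a small assumed instance by solving both sides with an off-the-shelf LP solver:

```python
import numpy as np
from scipy.optimize import linprog

# Numerical check of the LP duality theorem of Section 6.1,
#   inf { c^T x | Ax = b, x >= 0 }  =  sup { b^T z | A^T z <= c },
# on a small assumed instance.

A = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([2.0, 1.0])
c = np.array([2.0, 3.0, 4.0])

primal = linprog(c, A_eq=A, b_eq=b)                              # x >= 0 by default
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * 2)  # free z, max b^T z

assert primal.status == 0 and dual.status == 0
assert abs(primal.fun - (-dual.fun)) < 1e-7     # strong duality: equal optima
assert abs(primal.fun - 5.0) < 1e-7
```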
6.2 Conic optimization

Let $\mathcal{K}$ be a convex cone in $\mathbb{R}^n$. We consider the standard linear optimization problem over the cone $\mathcal{K}$:
$$\inf_x \left\{ c^T x \,\middle|\, Ax = b,\ x \in \mathcal{K} \right\},$$
with $A$, $b$ and $c$ as in Section 6.1. This problem differs from the linear optimization problem in Section 6.1 only in the set constraint $x \in \mathcal{K}$, so it suffices to find the support function of this constraint. Denoting this function as $g_2^*$ we have (cf. Table 3, line 3)
$$g_2^*(y) = 0, \quad \operatorname{dom} g_2^* = -\mathcal{K}^*.$$
An overview of commonly used cones and their dual cones is given in Table 4. Just as in the previous section we obtain that Fenchel's dual problem is given by
$$\sup_z \left\{ -b^T z \,\middle|\, A^T z + c \in \mathcal{K}^* \right\}.$$
Replacing $z$ by $-z$ we get the usual form of the duality theorem for conic optimization, namely
$$\inf_x \left\{ c^T x \,\middle|\, Ax = b,\ x \in \mathcal{K} \right\} = \sup_z \left\{ b^T z \,\middle|\, c - A^T z \in \mathcal{K}^* \right\}.$$
This formulation is valid for any cone, including the positive semidefinite cone and the copositive cone. Assumption 3.1 simplifies to the typical Slater condition: there should be an $x$ in the interior of $\mathcal{K}$ that satisfies $Ax = b$.
6.3 Quadratically constrained quadratic optimization

Consider the following quadratically constrained quadratic optimization (QCQO) problem:
$$\inf_x \left\{ \tfrac{1}{2} x^T P_0 x + q_0^T x + r_0 \,\middle|\, \tfrac{1}{2} x^T P_i x + q_i^T x + r_i \le 0,\ i = 1, \ldots, m \right\},$$
where $P_i$ is positive semidefinite for $i = 0, \ldots, m$. Defining $f_i(x) := g_i(x) + h_i(x)$, where $g_i(x) = \tfrac{1}{2} x^T P_i x$ and $h_i(x) = q_i^T x + r_i$, we have (cf. Table 1, lines 1 and 13)
$$g_i^*(y) = \tfrac{1}{2} y^T P_i^\dagger y, \quad \operatorname{dom} g_i^* = \{y \mid y = P_i z,\ z \in \mathbb{R}^n\},$$
$$h_i^*(y) = -r_i, \quad \operatorname{dom} h_i^* = \{q_i\}.$$
Hence, by using the sum rule in Table 2 (line 6), we get
$$\begin{aligned}
f_i^*(y) &= \min_{y^1, y^2} \left\{ g_i^*(y^1) + h_i^*(y^2) \,\middle|\, y^1 + y^2 = y \right\} \\
&= \min_{y^1, y^2} \left\{ \tfrac{1}{2} (y^1)^T P_i^\dagger y^1 - r_i \,\middle|\, y^1 = P_i z,\ y^2 = q_i,\ y^1 + y^2 = y \right\} \\
&= \min_z \left\{ \tfrac{1}{2} z^T P_i P_i^\dagger P_i z - r_i \,\middle|\, P_i z + q_i = y \right\} \\
&= \min_z \left\{ \tfrac{1}{2} z^T P_i z - r_i \,\middle|\, P_i z + q_i = y \right\},
\end{aligned}$$
where we used that the generalized inverse $P_i^\dagger$ of $P_i$ satisfies $P_i P_i^\dagger P_i = P_i$ (see, e.g., [2]). The dual problem $(D')$ thus becomes
$$\begin{aligned}
&\sup_{u \succeq 0,\ y^i} \left\{ -\sum_{i=0}^m (u_i f_i)^*(y^i) \,\middle|\, \sum_{i=0}^m y^i = 0,\ u_0 = 1 \right\} \\
&\quad = \sup_{u \succeq 0,\ y^i} \left\{ -\sum_{i=0}^m u_i \min_{z^i} \left\{ \tfrac{1}{2} (z^i)^T P_i z^i - r_i \,\middle|\, P_i z^i + q_i = \frac{y^i}{u_i} \right\} \,\middle|\, \sum_{i=0}^m y^i = 0,\ u_0 = 1 \right\} \\
&\quad = \sup_{u,\ z^i} \left\{ \sum_{i=0}^m u_i \left( r_i - \tfrac{1}{2} (z^i)^T P_i z^i \right) \,\middle|\, \sum_{i=0}^m u_i \left( P_i z^i + q_i \right) = 0,\ u \succeq 0,\ u_0 = 1 \right\}.
\end{aligned}$$
Defining $\tilde{z}^i = u_i z^i$ this becomes
$$\sup_{u,\ \tilde{z}^i} \left\{ u^T r - \tfrac{1}{2} \sum_{i=0}^m \frac{(\tilde{z}^i)^T P_i \tilde{z}^i}{u_i} \,\middle|\, \sum_{i=0}^m u_i q_i + \sum_{i=0}^m P_i \tilde{z}^i = 0,\ u \succeq 0,\ u_0 = 1 \right\}.$$
It may be noted that, by using standard tricks, the dual problem can be written as a conic quadratic problem.
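The QCQO dual above can be spot-checked on an assumed one-dimensional instance: minimize $\tfrac{1}{2}x^2 + 2x$ subject to $\tfrac{1}{2}x^2 - \tfrac{1}{2} \le 0$ (i.e. $|x| \le 1$), so $P_0 = P_1 = 1$, $q_0 = 2$, $q_1 = 0$, $r_0 = 0$, $r_1 = -\tfrac{1}{2}$, with primal optimum $-3/2$ at $x = -1$. With $u_0 = 1$, the linear constraint of the dual reads $2 + \tilde{z}^0 + \tilde{z}^1 = 0$, so $\tilde{z}^1$ can be eliminated and the dual maximized over $(u_1, \tilde{z}^0)$ on a grid:

```python
import numpy as np

# Primal: min x^2/2 + 2x  s.t.  x^2/2 - 1/2 <= 0, i.e. x in [-1, 1].
xs = np.linspace(-1.0, 1.0, 200001)
primal = np.min(0.5 * xs**2 + 2.0 * xs)          # = -3/2 at x = -1

# Dual objective from the formula above (u0 = 1, z~1 = -2 - z~0):
#   u^T r - (1/2) * sum_i (z~i)^2 * P_i / u_i  =  -u1/2 - (z~0^2 + z~1^2/u1)/2.
u1 = np.linspace(1e-3, 4.0, 2001)[:, None]
zt0 = np.linspace(-3.0, 1.0, 2001)[None, :]
dual_vals = -u1 / 2.0 - 0.5 * (zt0**2 + (2.0 + zt0)**2 / u1)
dual = np.max(dual_vals)

assert abs(primal - (-1.5)) < 1e-8
assert abs(dual - (-1.5)) < 1e-3    # grid approximation of the sup
assert dual <= primal + 1e-9        # weak duality on every grid point
```

The grid maximum sits at $u_1 = 1$, $\tilde{z}^0 = \tilde{z}^1 = -1$, reproducing the primal value up to discretization error.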
6.4 Second order cone optimization

Consider the following primal second order cone optimization (SOCO) problem:
$$\min_x \left\{ \|A_0 x - b_0\|_2 - p_0^T x + q_0 \,\middle|\, \|A_i x - b_i\|_2 - p_i^T x + q_i \le 0,\ i = 1, \ldots, m \right\}.$$
The objective function and the constraint functions are written as $f_i(x) = g_i(x) + h_i(x)$, for $i = 0, \ldots, m$, where $g_i(x) = \|A_i x - b_i\|_2$ and $h_i(x) = -p_i^T x + q_i$. This enables us to find the conjugate of $f_i(x)$ by using the sum rule. The conjugate of $h_i(x)$ is given by
$$h_i^*(y) = -q_i, \quad \operatorname{dom} h_i^* = \{-p_i\}.$$
In order to derive the conjugate of $g_i(x)$, first note that it is a function of a linear transformation of $x$, i.e., $g_i(x) = \sigma_i(A_i x - b_i)$, where $\sigma_i(x) = \|x\|_2$. Due to line 6 in Table 1 we have $\sigma_i^*(y) = 0$, with $\operatorname{dom} \sigma_i^* = \{y \mid \|y\|_2 \le 1\}$. Hence, by the linear substitution rule in Table 2 (line 8b), the conjugate of $g_i(x)$ is given by
$$g_i^*(y) = \inf_{z^i} \left\{ \sigma_i^*(z^i) + b_i^T z^i \,\middle|\, A_i^T z^i = y \right\} = \inf_{z^i} \left\{ b_i^T z^i \,\middle|\, \|z^i\|_2 \le 1,\ A_i^T z^i = y \right\}.$$
Putting the above results together, we obtain:
$$\begin{aligned}
f_i^*(y) &= \inf_{y^1, y^2} \left\{ g_i^*(y^1) + h_i^*(y^2) \,\middle|\, y^1 + y^2 = y \right\} \\
&= \inf_{y^1, y^2} \left\{ \inf_{z^i} \left\{ b_i^T z^i \,\middle|\, \|z^i\|_2 \le 1,\ A_i^T z^i = y^1 \right\} - q_i \,\middle|\, y^2 = -p_i,\ y^1 + y^2 = y \right\} \\
&= \inf_{z^i} \left\{ b_i^T z^i - q_i \,\middle|\, \|z^i\|_2 \le 1,\ A_i^T z^i - p_i = y \right\}.
\end{aligned}$$
When plugging the above expression for $f_i^*(y)$ into $(D')$ we obtain the following dual for the SOCO problem:
$$\sup_{y^i,\ u \succeq 0} \left\{ -\sum_{i=0}^m u_i \inf_{z^i} \left\{ b_i^T z^i - q_i \,\middle|\, \|z^i\|_2 \le 1,\ A_i^T z^i - p_i = \frac{y^i}{u_i} \right\} \,\middle|\, \sum_{i=0}^m y^i = 0,\ u_0 = 1 \right\}.$$
We eliminate the variables $y^i$ and omit the inf operator, which gives
$$\sup_{u,\ z^i} \left\{ \sum_{i=0}^m u_i \left( q_i - b_i^T z^i \right) \,\middle|\, \|z^i\|_2 \le 1\ \forall i,\ \sum_{i=0}^m u_i \left( A_i^T z^i - p_i \right) = 0,\ u \succeq 0,\ u_0 = 1 \right\}.$$
Introducing $\tilde{z}^i = u_i z^i$ we get
$$\sup_{u,\ \tilde{z}^i} \left\{ u^T q - \sum_{i=0}^m b_i^T \tilde{z}^i \,\middle|\, \|\tilde{z}^i\|_2 \le u_i\ \forall i,\ -\sum_{i=0}^m u_i p_i + \sum_{i=0}^m A_i^T \tilde{z}^i = 0,\ u \succeq 0,\ u_0 = 1 \right\}.$$
Finally, defining matrices $A$ and $P$ according to
$$A = \begin{bmatrix} A_0^T & \cdots & A_m^T \end{bmatrix}^T, \qquad P = \begin{bmatrix} p_0 & \cdots & p_m \end{bmatrix}^T,$$
and $b$ and $\tilde{z}$ as the vectors that arise by concatenating the vectors $b_i$ and $\tilde{z}^i$, respectively, the dual problem takes the form
$$\sup_{u,\ \tilde{z}^i} \left\{ u^T q - b^T \tilde{z} \,\middle|\, \|\tilde{z}^i\|_2 \le u_i\ \forall i,\ A^T \tilde{z} = P^T u,\ u \succeq 0,\ u_0 = 1 \right\}.$$
This is the known dual of the SOCO problem as written in, e.g., [3].
6.5 Geometric optimization

Consider the geometric optimization problem (cf. [8, p. 254]):‡
$$\min_x \left\{ \log \sum_{k=1}^{K_0} e^{a_{0k}^T x + b_{0k}} \,\middle|\, \log \sum_{k=1}^{K_i} e^{a_{ik}^T x + b_{ik}} \le 0,\ i = 1, \ldots, m \right\}, \tag{8}$$
where $K_i \ge 1$, $a_{ik} \in \mathbb{R}^n$, $b_{ik} \in \mathbb{R}$ for $i = 0, \ldots, m$. We define $A_i$ as the $n \times K_i$ matrix with columns $a_{i1}, \ldots, a_{iK_i}$ and $b_i$ as the vector with entries $b_{i1}, \ldots, b_{iK_i}$, for $i = 0, \ldots, m$. In order to compute the conjugate functions of the objective function and the constraint functions we define, for $i = 0, \ldots, m$,
$$h_i(x) := \log \sum_{k=1}^{K_i} e^{x_k}, \qquad f_i(x) := h_i(A_i^T x + b_i) = \log \sum_{k=1}^{K_i} e^{a_{ik}^T x + b_{ik}}.$$
Using the linear substitution rule (Table 2, line 8b) and the formula for the conjugate of $h_i(x)$ (Table 1, line 5), the conjugate of $f_i(x)$ can be written as:
$$f_i^*(y) = \inf_z \left\{ h_i^*(z) - b_i^T z \,\middle|\, A_i z = y \right\} = \inf_z \left\{ \sum_{k=1}^{K_i} z_k \log z_k - b_i^T z \,\middle|\, z \succeq 0,\ \mathbf{1}^T z = 1,\ A_i z = y \right\}.$$
To simplify notation, let $p^*$ be the optimal value of (8). Using the dual problem as given by $(D')$ we obtain
$$\begin{aligned}
p^* &= \sup_{y^i,\ u \succeq 0} \left\{ -\sum_{i=0}^m (u_i f_i)^*(y^i) \,\middle|\, \sum_{i=0}^m y^i = 0,\ u_0 = 1 \right\} \\
&= \sup_{y^i,\ u \succeq 0} \left\{ -\sum_{i=0}^m u_i \inf_{z^i \succeq 0} \left\{ \sum_{k=1}^{K_i} z_k^i \log z_k^i - b_i^T z^i \right\} \ \middle|\ \sum_{i=0}^m y^i = 0,\ \mathbf{1}^T z^i = 1,\ A_i z^i = \frac{y^i}{u_i}\ \forall i,\ u_0 = 1 \right\}.
\end{aligned}$$

‡We use $e$ to denote the base of the natural logarithm, according to ISO Standard 80000-2:2009.
This is equivalent to
$$\begin{aligned}
p^* &= \sup_{y^i,\ u \succeq 0,\ z^i \succeq 0} \left\{ \sum_{i=0}^m u_i \left[ b_i^T z^i - \sum_{k=1}^{K_i} z_k^i \log z_k^i \right] \ \middle|\ \sum_{i=0}^m y^i = 0,\ \mathbf{1}^T z^i = 1,\ A_i z^i = \frac{y^i}{u_i}\ \forall i,\ u_0 = 1 \right\} \\
&= \sup_{u \succeq 0,\ z^i \succeq 0} \left\{ \sum_{i=0}^m u_i \left[ b_i^T z^i - \sum_{k=1}^{K_i} z_k^i \log z_k^i \right] \ \middle|\ \sum_{i=0}^m u_i A_i z^i = 0,\ \mathbf{1}^T z^i = 1\ \forall i,\ u_0 = 1 \right\}.
\end{aligned}$$
Defining $\tilde{z}^i = u_i z^i$, $i = 0, \ldots, m$, we obtain a convex formulation of Fenchel's dual problem:
$$\begin{aligned}
p^* &= \sup_{u \succeq 0,\ \tilde{z}^i \succeq 0} \left\{ \sum_{i=0}^m \left[ b_i^T \tilde{z}^i - \sum_{k=1}^{K_i} \tilde{z}_k^i \log \frac{\tilde{z}_k^i}{u_i} \right] \ \middle|\ \sum_{i=0}^m A_i \tilde{z}^i = 0,\ \mathbf{1}^T \tilde{z}^i = u_i\ \forall i,\ u_0 = 1 \right\} \\
&= \sup_{\tilde{z}^i \succeq 0} \left\{ \sum_{i=0}^m \left[ b_i^T \tilde{z}^i - \sum_{k=1}^{K_i} \tilde{z}_k^i \log \frac{\tilde{z}_k^i}{\mathbf{1}^T \tilde{z}^i} \right] \ \middle|\ \sum_{i=0}^m A_i \tilde{z}^i = 0,\ \mathbf{1}^T \tilde{z}^0 = 1 \right\}.
\end{aligned}$$
For $m = 0$ this is the same dual problem as the one given in [8, p. 254]. For earlier versions of this dual problem we refer to [9, 10, 11].
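The only nontrivial ingredient above is the conjugate of the log-sum-exp function $h_i$: on the unit simplex it equals the negative entropy $\sum_k z_k \log z_k$. A numerical spot check of this pair at an assumed interior point of the simplex:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

# Check the conjugate pair used above: for h(x) = log sum_k exp(x_k),
# h*(z) = sum_k z_k log z_k whenever z >= 0 and 1^T z = 1.
# Compute sup_x { z^T x - h(x) } numerically and compare.

z = np.array([0.2, 0.3, 0.5])   # assumed interior point of the simplex

def neg_obj(x):
    return -(z @ x - logsumexp(x))

res = minimize(neg_obj, x0=np.zeros(3))
h_star = -res.fun               # numerical value of h*(z)

assert abs(h_star - np.sum(z * np.log(z))) < 1e-6
```

The supremum is attained along the ridge $x = \log z + c\,\mathbf{1}$; the unconstrained solver lands somewhere on that ridge, which is enough to recover the conjugate value.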
6.6 ℓp-norm optimization

We consider the following $\ell_p$-norm optimization problem [9, 23]:
$$\max_x \left\{ \eta^T x \,\middle|\, \sum_{i \in I_k} \frac{1}{p_i} |a_i^T x - c_i|^{p_i} + b_k^T x - d_k \le 0,\ k = 1, \ldots, m \right\}, \tag{9}$$
where $I_1, \ldots, I_m$ denotes a partition of $\{1, \ldots, r\}$, with $1 \le m \le r$. Moreover, $b_k \in \mathbb{R}^n$ and $d_k \in \mathbb{R}$ for each $k$, and $a_i \in \mathbb{R}^n$, $c_i \in \mathbb{R}$ and $p_i \ge 1$ for $1 \le i \le r$. In order to obtain Fenchel's dual problem we consider the minimization problem
$$\min_x \left\{ -\eta^T x \,\middle|\, \sum_{i \in I_k} \frac{1}{p_i} |a_i^T x - c_i|^{p_i} + b_k^T x - d_k \le 0,\ k = 1, \ldots, m \right\}, \tag{10}$$
whose optimal value is the opposite of the optimal value of (9). We define
$$f_0(x) := -\eta^T x, \qquad f_k(x) := g_k(x) + h_k(x), \quad k = 1, \ldots, m,$$
where
$$g_k(x) := \sum_{i \in I_k} \frac{1}{p_i} |a_i^T x - c_i|^{p_i}, \qquad h_k(x) := b_k^T x - d_k.$$
To derive the dual problem of (10), we need the conjugates of $f_0, \ldots, f_m$. The functions $f_0$ and $h_k$ are linear, so one has (Table 1, line 1)
$$f_0^*(y) = 0, \quad \operatorname{dom} f_0^* = \{-\eta\},$$
$$h_k^*(y) = d_k, \quad \operatorname{dom} h_k^* = \{b_k\}.$$
The function $g_k(x)$ can be written as $g_k(x) = \sum_{i \in I_k} \sigma_i(a_i^T x - c_i)$, where $\sigma_i(x) = |x|^{p_i}/p_i$. Using the sum rule (Table 2, line 6), the linear substitution rule (Table 2, line 8b) and the convex conjugate of $\sigma_i(x)$ (Table 1, line 10), we obtain the following conjugate function of $g_k(x)$:
$$g_k^*(y) = \inf_{y^i} \left\{ \sum_{i \in I_k} \inf_{z_i \in \mathbb{R}} \left\{ \frac{1}{q_i} |z_i|^{q_i} + c_i z_i \,\middle|\, a_i z_i = y^i \right\} \,\middle|\, \sum_{i \in I_k} y^i = y \right\} = \inf_{z_i} \left\{ \sum_{i \in I_k} \left( \frac{1}{q_i} |z_i|^{q_i} + c_i z_i \right) \,\middle|\, \sum_{i \in I_k} a_i z_i = y \right\},$$
392 with qi such that 1/pi + 1/qi = 1. Now the convex conjugate of fk(x) can be written as:
1 2 1 2 393 fk∗(y) = inf gk∗(y ) + hk∗(y ) y + y = y y1,y2 | ( ! ) X 1 qi X 394 = inf zi + cizi + dk aizi + bk = y . zi qi | | i Ik i Ik ∈ ∈
Fenchel's dual of (10) is now given by
\[
\max_{y^k,\; u \ge 0} \left\{ -f_0^*(y^0) - \sum_{k=1}^m (u_k f_k)^*(y^k) \;\middle|\; \sum_{k=0}^m y^k = 0 \right\}.
\]
Using the formulas that we derived above, we get the problem
\[
\max_{y^k,\; u \ge 0} \left\{ -\sum_{k=1}^m u_k \inf_{z_i} \left\{ \sum_{i \in I_k} \left( \frac{1}{q_i} |z_i|^{q_i} + c_i z_i \right) + d_k \;\middle|\; \sum_{i \in I_k} a_i z_i + b_k = \frac{y^k}{u_k} \right\} \;\middle|\; y^0 = -\eta \right\},
\]
subject to the condition $\sum_{k=0}^m y^k = 0$. This can be simplified to
\[
\max_{u \ge 0,\; z_i} \left\{ -\sum_{k=1}^m u_k \left[ \sum_{i \in I_k} \left( \frac{1}{q_i} |z_i|^{q_i} + c_i z_i \right) + d_k \right] \;\middle|\; \sum_{k=1}^m u_k \left( \sum_{i \in I_k} a_i z_i + b_k \right) = \eta \right\}.
\]
To further simplify this formulation we define $\tilde z_i = u_k z_i$ for each $i \in I_k$. Then we get
\[
\max_{u \ge 0,\; \tilde z_i} \left\{ -\sum_{k=1}^m u_k \left[ \sum_{i \in I_k} \left( \frac{1}{q_i} \left| \frac{\tilde z_i}{u_k} \right|^{q_i} + c_i \frac{\tilde z_i}{u_k} \right) + d_k \right] \;\middle|\; \sum_{k=1}^m u_k \left( \sum_{i \in I_k} a_i \frac{\tilde z_i}{u_k} + b_k \right) = \eta \right\}.
\]
By changing the sign of the optimal value we arrive at
\[
\min_{u \ge 0,\; \tilde z_i} \left\{ c^T \tilde z + u^T d + \sum_{k=1}^m \sum_{i \in I_k} \frac{u_k}{q_i} \left| \frac{\tilde z_i}{u_k} \right|^{q_i} \;\middle|\; \sum_{k=1}^m \sum_{i \in I_k} a_i \tilde z_i + \sum_{k=1}^m u_k b_k = \eta \right\},
\]
which is the dual problem of (9) that we are looking for. Finally, defining matrices $A$ and $B$ according to
\[
A = \begin{pmatrix} a_1 & \cdots & a_r \end{pmatrix}, \qquad B = \begin{pmatrix} b_1 & \cdots & b_m \end{pmatrix},
\]
the last problem takes the form
\[
\min_{u \ge 0,\; \tilde z_i} \left\{ u^T d + c^T \tilde z + \sum_{k=1}^m \sum_{i \in I_k} \frac{u_k}{q_i} \left| \frac{\tilde z_i}{u_k} \right|^{q_i} \;\middle|\; A \tilde z + B u = \eta \right\},
\]
which is the same dual as was obtained in [23].
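As a numerical sanity check on this duality relation, the following sketch compares the optimal value of a tiny one-dimensional instance of (9) with that of the dual just derived, for $n = r = m = 1$ and $p_1 = q_1 = 2$; the data are hypothetical choices, not from the paper:

```python
# Tiny instance of (9): max { eta*x : (1/2)*(a*x - c)^2 + b*x - d <= 0 }
# with n = r = m = 1 and p_1 = q_1 = 2.  Data are illustrative choices.
eta, a, c, b, d = 1.0, 1.0, 0.0, 0.0, 2.0

# Primal optimal value, brute-forced on a grid: x is feasible iff x^2/2 <= 2.
primal = max(eta * x
             for x in (i / 1000.0 for i in range(-5000, 5001))
             if 0.5 * (a * x - c) ** 2 + b * x - d <= 0)

# Dual: min { u*d + c*zt + (u/2)*(zt/u)^2 : a*zt + b*u = eta, u >= 0 }.
# The equality constraint forces zt = eta/a; minimize over u > 0 on a grid.
zt = eta / a
dual = min(u * d + c * zt + (u / 2.0) * (zt / u) ** 2
           for u in (i / 1000.0 for i in range(1, 10001)))

print(primal, dual)  # both evaluate to 2.0: strong duality
```

For this instance the feasible set of (9) is $x \le 2$, so both values equal 2, in line with strong duality.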
6.7 Radiotherapy treatment planning

Optimization plays an important role in radiotherapy treatment planning. The resulting optimization problems contain many different nonlinear objective and constraint functions. To the best of our knowledge the corresponding dual problems have not been derived. These dual problems might be useful for reasons already stated in the Introduction. In this section we show that with the framework proposed in this paper, the dual problem can be derived in a structured and straightforward way.

Treatment planning for external beam radiotherapy aims at finding beam intensities such that the tumor is controlled, while limiting the dose to the surrounding organs at risk (OARs). For planning purposes, the structures of interest (e.g. tumor, lung, heart) are virtually divided into a large number of small cubic volumes, so-called voxels. An important input parameter to all treatment planning models is the matrix $D$, whose $(i,j)$-element is the dose per unit intensity (dose rate) from beamlet $j$ to voxel $i$ (each beam is subdivided into beamlets). The dose from beamlet $j$ to voxel $i$ is equal to $D_{ij} x_j$, with $x_j$ the intensity of beamlet $j$. The total dose to voxel $i$ can be written as $d_i = D_i x$, where $D_i$ is the $i$-th row of $D$ and $x$ the vector of beamlet intensities. Treatment plans are evaluated based on the dose to each of the voxels. Treatment planning models employ constraints and objectives such as the minimum, mean or maximum dose over all voxels within a structure, dose-volume metrics such as the minimum dose to the hottest $p\%$ of the voxels, or biological metrics such as the tumor control probability (TCP). Treatment planning models may thus contain different types of constraints and usually do not fit a generic optimization format (as, e.g., linear or second order cone optimization). A readily given dual problem is thus generally not available. Here, we give an example of such a treatment planning model and derive its dual using the method proposed in this paper.
The objective of our model is to maximize the TCP [22]:
\[
TCP(x) = \prod_{i \in \mathcal{T}} \exp\left( -N_0 v_i e^{-\alpha D_i x} \right),
\]
where $\mathcal{T}$ is the set of voxels in the target volume, $N_0$ is the number of clonogenic cells, $v_i$ is the relative volume of voxel $i$ and $\alpha$ describes the radio resistance of cells in the target volume. This function is not concave. Taking the log of TCP gives the concave function LTCP:
\[
LTCP(x) = -\sum_{i \in \mathcal{T}} N_0 v_i e^{-\alpha D_i x},
\]
which yields the same optimal solution for $x$, since $\log(x)$ is increasing [20]. Therefore, LTCP is optimized instead.
Our planning model contains a constraint on the dose to the OARs. For simplicity, we assume there is only one OAR, denoted as $S$. First, we consider the generalized equivalent uniform dose (gEUD), which is the generalized mean of the dose to a structure $S$ [18]:
\[
gEUD(x) = \left( \sum_{i \in \mathcal{S}} v_i (D_i x)^p \right)^{\frac{1}{p}},
\]
where $\mathcal{S}$ denotes the set of voxels in structure $S$, and $p \ge 1$ is a structure-dependent parameter for OARs that indicates if it is a serial or a parallel organ. We limit the normal tissue complication probability (NTCP) to this OAR [1]:
\[
NTCP(x) = 1 - \exp\left( -\left( \frac{gEUD(x)}{\Delta} \right)^p \right),
\]
where $\Delta$ is a structure-dependent parameter that indicates the uniform dose level for which there is a 63% probability of complications. Furthermore, $p > 1$ since we restrict ourselves to a non-parallel organ.
We would like to constrain NTCP from above, however, NTCP is not a convex function. Therefore, following [20], we use instead
\[
-\ln(1 - NTCP(x)) = \frac{1}{\Delta^p} \sum_{i \in \mathcal{S}} v_i (D_i x)^p,
\]
which is convex. As was the case with the transformation from TCP to LTCP, this does not change the optimal solution. Note that we need to apply the same transformation to the upper bound on NTCP.
We can now formulate our treatment plan optimization model:
\[
\min_x \left\{ N_0 \sum_{i \in \mathcal{T}} v_i e^{-\alpha D_i x} \;\middle|\; \frac{1}{\Delta^p} \sum_{i \in \mathcal{S}} v_i (D_i x)^p - \gamma \le 0,\; x \ge 0 \right\},
\]
where $\gamma$ is a predetermined upper bound on $-\ln(1 - NTCP)$. The non-negativity constraint is included since a negative beam intensity is physically impossible.
Before giving the dual of this problem, we derive the conjugates of the objective function and the constraint functions.
Objective function

The objective function $f_0(x)$ is a constant multiplied by the sum of $v_i h_i(x)$, where $h_i(x) = g(-\alpha D_i x)$, $g(u) = e^u$. Using Table 1 (line 2) and the linear substitution rule (Table 2, line 8b), we obtain the convex conjugate of $h_i(x)$:
\[
h_i^*(y) = \inf_{z_i} \left\{ z_i \log z_i - z_i \;\middle|\; -\alpha z_i D_i^T = y,\; z_i \ge 0 \right\}.
\]
Application of the sum rule for conjugates (Table 2, line 6) now enables us to write the conjugate of $f_0(x)$ as follows:
\[
\begin{aligned}
f_0^*(y) &= N_0 \left( \sum_{i \in \mathcal{T}} v_i h_i \right)^* \left( \frac{y}{N_0} \right) &&\text{(product with a scalar)} \\
&= N_0 \inf_{\{y^i\}} \left\{ \sum_{i \in \mathcal{T}} (v_i h_i)^*(y^i) \;\middle|\; \sum_{i \in \mathcal{T}} y^i = \frac{y}{N_0} \right\} &&\text{(sum rule)} \\
&= N_0 \inf_{\{y^i\}} \left\{ \sum_{i \in \mathcal{T}} v_i h_i^* \left( \frac{y^i}{v_i} \right) \;\middle|\; \sum_{i \in \mathcal{T}} y^i = \frac{y}{N_0} \right\} &&\text{(product with a scalar)} \\
&= N_0 \inf_{\{y^i\}} \left\{ \sum_{i \in \mathcal{T}} v_i \inf_{z_i} \left\{ z_i \log z_i - z_i \;\middle|\; -\alpha z_i D_i^T = \frac{y^i}{v_i},\; z_i \ge 0 \right\} \;\middle|\; \sum_{i \in \mathcal{T}} y^i = \frac{y}{N_0} \right\} \\
&= N_0 \inf_{w} \left\{ \sum_{i \in \mathcal{T}} \left( w_i \log \frac{w_i}{v_i} - w_i \right) \;\middle|\; -\alpha \sum_{i \in \mathcal{T}} w_i D_i^T = \frac{y}{N_0},\; w \ge 0 \right\} &&(w_i := v_i z_i) \\
&= \inf_{w} \left\{ N_0 \sum_{i \in \mathcal{T}} \left( w_i \log \frac{w_i}{v_i} - w_i \right) \;\middle|\; -\alpha N_0 D_{\mathcal{T}}^T w = y,\; w \ge 0 \right\},
\end{aligned}
\]
where $D_{\mathcal{T}}$ consists of the rows in $D$ related to the voxels in $\mathcal{T}$.
NTCP constraint

The NTCP constraint function $f_1(x)$ is given by
\[
f_1(x) = \frac{1}{\Delta^p} \sum_{i \in \mathcal{S}} v_i (D_i x)^p - \gamma = \frac{1}{\Delta^p} \sum_{i \in \mathcal{S}} v_i g_i(x) - \gamma,
\]
where $g_i(x) = h_i(D_i x)$, $h_i(u) = u^p$ $(u \ge 0)$, with $p > 1$. The convex conjugates of $h_i(u)$ and $g_i(x)$ are (using Table 1, line 8 and Table 2, lines 4 and 8b)
\[
\begin{aligned}
h_i^*(y) &= \inf_{t_i} \left\{ \frac{p}{q} t_i^q \;\middle|\; t_i \ge 0,\; t_i \ge \frac{y}{p} \right\}, \\
g_i^*(y) &= \inf_{r_i} \inf_{t_i} \left\{ \frac{p}{q} t_i^q \;\middle|\; t_i \ge 0,\; t_i \ge \frac{r_i}{p},\; D_i^T r_i = y \right\} \\
&= \inf_{r_i, t_i} \left\{ \frac{p}{q} t_i^q \;\middle|\; D_i^T r_i = y,\; t_i \ge 0,\; t_i \ge \frac{r_i}{p} \right\}.
\end{aligned}
\]
By using the sum rule for conjugates, the multiply-with-a-constant rule and the add-a-constant rule (Table 2, lines 6, 4 and 1, respectively) we can now compute $f_1^*(y)$:
\[
\begin{aligned}
f_1^*(y) &= \frac{1}{\Delta^p} \inf_{\{y^i\}} \left\{ \sum_{i \in \mathcal{S}} v_i g_i^* \left( \frac{y^i}{v_i} \right) \;\middle|\; \sum_{i \in \mathcal{S}} y^i = \Delta^p y \right\} + \gamma \\
&= \frac{1}{\Delta^p} \inf_{\{y^i\}} \left\{ \sum_{i \in \mathcal{S}} v_i \inf_{r_i, t_i} \left\{ \frac{p}{q} t_i^q \;\middle|\; D_i^T r_i = \frac{y^i}{v_i},\; t_i \ge 0,\; t_i \ge \frac{r_i}{p} \right\} \;\middle|\; \sum_{i \in \mathcal{S}} y^i = \Delta^p y \right\} + \gamma \\
&= \frac{p}{q \Delta^p} \inf_{r_i, t_i} \left\{ \sum_{i \in \mathcal{S}} v_i t_i^q \;\middle|\; \sum_{i \in \mathcal{S}} v_i D_i^T r_i = \Delta^p y,\; t \ge 0,\; p\, t \ge r \right\} + \gamma \\
&= \inf_{r, t} \left\{ \gamma + \frac{p}{q \Delta^p} v^T t^q \;\middle|\; D_{\mathcal{S}}^T V r = \Delta^p y,\; t \ge 0,\; p\, t \ge r \right\},
\end{aligned}
\]
where $t^q$ is the vector with entries $t_i^q$, $i \in \mathcal{S}$, and $V = \operatorname{diag}(v)$.
Non-negativity constraint

We finally have to deal with the non-negativity constraint $x \ge 0$. Denoting the indicator function of $\mathbb{R}^n_+$ as $f_2(x)$, the corresponding support function is (cf. Table 3, line 2)
\[
f_2^*(y) = 0, \quad \operatorname{dom} f_2^* = \{ y \mid y \le 0 \}.
\]
As we know, Fenchel's dual problem is given by
\[
\sup_{\{y^i\}_{i=0}^2,\; u} \left\{ -f_0^*(y^0) - u f_1^* \left( \frac{y^1}{u} \right) - f_2^*(y^2) \;\middle|\; \sum_{k=0}^2 y^k = 0,\; u \ge 0 \right\}.
\]
We can already eliminate $y^2$, yielding the equivalent problem
\[
\sup_{y^0, y^1, u} \left\{ -f_0^*(y^0) - u f_1^* \left( \frac{y^1}{u} \right) \;\middle|\; y^0 + y^1 \ge 0,\; u \ge 0 \right\}.
\]
Thus we obtain the following dual for the treatment planning optimization problem:
\[
\begin{aligned}
\sup_{y^0, y^1, u} \Bigg\{ & -\inf_{w} \left\{ N_0 \sum_{i \in \mathcal{T}} \left( w_i \log \frac{w_i}{v_i} - w_i \right) \;\middle|\; -\alpha N_0 D_{\mathcal{T}}^T w = y^0,\; w \ge 0 \right\} \\
& - u \inf_{r, t} \left\{ \gamma + \frac{p}{q \Delta^p} v^T t^q \;\middle|\; D_{\mathcal{S}}^T V r = \Delta^p \frac{y^1}{u},\; t \ge 0,\; p\, t \ge r \right\} \;\Bigg|\; y^0 + y^1 \ge 0,\; u \ge 0 \Bigg\}.
\end{aligned}
\]
By omitting the inf operators we get
\[
\begin{aligned}
\sup_{y^0, y^1, w, r, t, u} \Bigg\{ & - N_0 \sum_{i \in \mathcal{T}} \left( w_i \log \frac{w_i}{v_i} - w_i \right) - u \left( \gamma + \frac{p}{q \Delta^p} v^T t^q \right) \\
& \;\Bigg|\; -\alpha N_0 D_{\mathcal{T}}^T w = y^0,\; w \ge 0,\; D_{\mathcal{S}}^T V r = \Delta^p \frac{y^1}{u},\; t \ge 0,\; p\, t \ge r,\; y^0 + y^1 \ge 0,\; u \ge 0 \Bigg\}.
\end{aligned}
\]
This is equivalent to
\[
\begin{aligned}
\sup_{w, r, t, u} \Bigg\{ & - N_0 \sum_{i \in \mathcal{T}} \left( w_i \log \frac{w_i}{v_i} - w_i \right) - u \gamma - \frac{u p}{q \Delta^p} v_{\mathcal{S}}^T t^q \\
& \;\Bigg|\; w \ge 0,\; t \ge 0,\; p\, t \ge r,\; u \ge 0,\; -\alpha N_0 D_{\mathcal{T}}^T w + \frac{1}{\Delta^p} D_{\mathcal{S}}^T V (u r) \ge 0 \Bigg\}.
\end{aligned}
\]
By redefining $r$ and $t$ according to $r := ur$, $t := ut$ this becomes
\[
\begin{aligned}
\sup_{w, r, t, u} \Bigg\{ & - N_0 \sum_{i \in \mathcal{T}} \left( w_i \log \frac{w_i}{v_i} - w_i \right) - u \gamma - \frac{u p}{q \Delta^p} v_{\mathcal{S}}^T \left( \frac{t}{u} \right)^q \\
& \;\Bigg|\; w \ge 0,\; t \ge 0,\; p\, t \ge r,\; u \ge 0,\; -\alpha N_0 D_{\mathcal{T}}^T w + \frac{1}{\Delta^p} D_{\mathcal{S}}^T V r \ge 0 \Bigg\}.
\end{aligned}
\]
One easily verifies that this dual problem of the treatment plan optimization problem is indeed a convex optimization problem.
7 Concluding remarks

Mathematical education usually includes the use of tables, e.g., tables of derivatives, primitive functions, Laplace transforms, Fourier transforms, etc. It may have become clear from this paper that the availability of tables as presented in Appendix E simplifies the process of dualizing an optimization problem. Combined with the use of Fenchel's duality theorem it yields a universal and structured way for deriving dual problems for a wide spectrum of convex optimization problems.
Acknowledgement

The authors thank Lieven Vandenberghe (UCLA) for useful suggestions.
A Some examples of conjugate functions

By way of example we demonstrate how some of the results in Table 1 are obtained; this table also gives appropriate references to the existing literature whenever available. The examples in this section are used in Section 6, where we compute the dual problems of some more or less standard optimization problems.
A.1 Linear function

We start by proving the formula in line 1 of Table 1. With $f(x) = a^T x + b$, $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$, we have
\[
f^*(y) = \sup_x \left\{ y^T x - (a^T x + b) \right\} = \sup_x \left\{ (y - a)^T x - b \right\}.
\]
Obviously the sup equals $-b$ if $y = a$, and otherwise infinity (take $x = \lambda (y - a)$, $\lambda > 0$). Hence we obtain
\[
f^*(y) = -b, \quad \operatorname{dom} f^* = \{a\}.
\]
A.2 Log-sum-exp function

We next deal with line 5 in Table 1. Let $f(x) = \log\left( \sum_{i=1}^n e^{x_i} \right)$, $x \in \mathbb{R}^n$. We then have
\[
f^*(y) = \sup_x \left\{ y^T x - \log \left( \sum_{i=1}^n e^{x_i} \right) \right\}.
\]
Define $g(x) = y^T x - f(x)$. If $y_i < 0$ for some $i$, we take $x = -\lambda e_i$, where $\lambda > 0$ and $e_i$ is the $i$-th unit vector. Then $g(x) = -\lambda y_i - \log(n - 1 + e^{-\lambda})$, which goes to infinity if $\lambda$ goes to infinity. If $y \ge 0$ but $\mathbf{1}^T y \ne 1$, take $x = \lambda \mathbf{1}$. Then $g(x) = \lambda \mathbf{1}^T y - \log(n e^{\lambda}) = \lambda(\mathbf{1}^T y - 1) - \log n$, which goes to infinity both if $\mathbf{1}^T y > 1$ (when $\lambda$ goes to infinity) and if $\mathbf{1}^T y < 1$ (when $\lambda$ goes to minus infinity). Hence $y \in \operatorname{dom} f^*$ holds only if $\mathbf{1}^T y = 1$ and $y \ge 0$. We proceed by computing the maximal value of $g(x)$ under these conditions. One has
\[
\frac{\partial g(x)}{\partial x_i} = y_i - \frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}}.
\]
By setting these partial derivatives equal to zero we obtain the condition
\[
y_i = \frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}}, \quad i = 1, \dots, n.
\]
Substitution of these values of $y$ into $g(x)$ yields
\[
f^*(y) = \sum_{i=1}^n y_i \log y_i, \quad y \ge 0, \quad \mathbf{1}^T y = 1.
\]
By interpreting $0 \log 0 = 0$ this expression remains valid if some entries of $y$ vanish. Thus we conclude that
\[
f^*(y) = \sum_{i=1}^n y_i \log y_i, \quad \operatorname{dom} f^* = \left\{ y \mid \mathbf{1}^T y = 1,\; y \ge 0 \right\}.
\]
A.3 Arbitrary norm

In this section we deal with line 6 in Table 1. Let $\|x\|$ be an arbitrary norm on $\mathbb{R}^n$ and $f(x) := \|x\|$. We then have
\[
f^*(y) = \sup_x \left\{ y^T x - \|x\| \right\}.
\]
Recall that the dual norm is defined by
\[
\|y\|_* = \sup_x \left\{ y^T x \mid \|x\| \le 1 \right\}.
\]
Since $\left\| \frac{x}{\|x\|} \right\| = 1$ if $x \ne 0$, we have for any nonzero $x \in \mathbb{R}^n$
\[
\|y\|_* \ge y^T \frac{x}{\|x\|} = \frac{y^T x}{\|x\|},
\]
which gives
\[
y^T x \le \|y\|_* \|x\|. \tag{11}
\]
Since this inequality holds also for $x = 0$, it holds for each $x \in \mathbb{R}^n$. Define $g(x) := y^T x - \|x\|$. Then $g(\lambda x) = \lambda g(x)$ for $\lambda \ge 0$. If $\|y\|_* > 1$ then there exists an $x$ with $\|x\| \le 1$ and $y^T x > 1$. Then $g(x) > 0$. Hence $g(\lambda x)$ goes to infinity if $\lambda$ grows to infinity. We conclude from this that $y \in \operatorname{dom} f^*$ holds only if $\|y\|_* \le 1$. If $\|y\|_* \le 1$ then (11) implies $y^T x \le \|y\|_* \|x\| \le \|x\|$, whence $g(x) \le 0$, for all $x \in \mathbb{R}^n$. Since $g(0) = 0$, we obtain
\[
f^*(y) = 0, \quad \operatorname{dom} f^* = \left\{ y \mid \|y\|_* \le 1 \right\}.
\]
Let us recall that if $\|\cdot\|$ represents the $p$-norm then it is denoted as $\|\cdot\|_p$ and
\[
\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{\frac{1}{p}}, \quad p \ge 1, \quad x \in \mathbb{R}^n.
\]
In that case the dual norm is given by $\|\cdot\|_q$, with $q$ such that $\frac{1}{p} + \frac{1}{q} = 1$.
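As an illustration, take $\|\cdot\| = \|\cdot\|_1$ on $\mathbb{R}^2$, whose dual norm is $\|\cdot\|_\infty$. A brute-force evaluation of $\sup_x \{ y^T x - \|x\|_1 \}$ over a grid shows the dichotomy derived above; the test points are arbitrary choices:

```python
# f(x) = ||x||_1 on R^2; its conjugate is the indicator of the
# dual-norm unit ball { y : ||y||_inf <= 1 }.
grid = [i / 10.0 for i in range(-50, 51)]

def sup_g(y):
    # brute-force sup_x { y^T x - ||x||_1 } over the grid [-5, 5]^2
    return max(y[0] * x0 + y[1] * x1 - (abs(x0) + abs(x1))
               for x0 in grid for x1 in grid)

inside = sup_g([0.5, -0.8])   # ||y||_inf <= 1: the supremum is 0
outside = sup_g([1.5, 0.0])   # ||y||_inf > 1: grows with the grid radius
print(inside, outside)
```

For the first point the maximum is attained at $x = 0$ with value 0; for the second the grid value already exceeds 1 and would grow without bound on a larger grid.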
A.4 Quadratic function

In this section we deal with line 13 in Table 1. Let $f(x) = \frac{1}{2} x^T P x$, $x \in \mathbb{R}^n$, where $P$ is a (symmetric) positive semidefinite $n \times n$ matrix. We then have
\[
f^*(y) = \sup_x \left\{ y^T x - \tfrac{1}{2} x^T P x \right\}.
\]
Since $y^T x - \frac{1}{2} x^T P x$ is concave in $x$, its maximal value occurs if $y = P x$. If $P$ is nonsingular then this implies $x = P^{-1} y$ and hence
\[
f^*(y) = y^T P^{-1} y - \tfrac{1}{2} \left( P^{-1} y \right)^T P P^{-1} y = \tfrac{1}{2} y^T P^{-1} y.
\]
If $P$ is singular, the equation $y = P x$ has a solution if and only if $y$ belongs to the column space $L = \{ P z \mid z \in \mathbb{R}^n \}$. If $y \in L$ then the solution is given by $x = P^{\dagger} y$, where $P^{\dagger}$ is the generalized inverse of $P$, and we obtain
\[
f^*(y) = y^T P^{\dagger} y - \tfrac{1}{2} \left( P^{\dagger} y \right)^T P P^{\dagger} y = \tfrac{1}{2} y^T P^{\dagger} y,
\]
where the last equality is due to the fact that $P^{\dagger} P P^{\dagger} = P^{\dagger}$ [2]. If $y \notin L$ we may write $y = y_1 + y_2$ with $y_1 \in L$ and $0 \ne y_2 \in L^{\perp}$. Take $\lambda > 0$ and $x = \lambda y_2$. Then $x^T y = x^T y_1 + x^T y_2 = x^T y_2 = \lambda y_2^T y_2 = \lambda \|y_2\|^2$. Since $x \in L^{\perp}$, we have $P x = 0$, whence $x^T P x = 0$. Hence $y^T x - \frac{1}{2} x^T P x = \lambda \|y_2\|^2$, which goes to infinity if $\lambda$ grows to infinity. This proves that $y \notin \operatorname{dom} f^*$ if $y \notin L$. Thus we have shown that
\[
f^*(y) = \tfrac{1}{2} y^T P^{\dagger} y, \quad \operatorname{dom} f^* = \left\{ y \mid y = P z,\; z \in \mathbb{R}^n \right\}.
\]
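A small numerical check of the singular case, with the illustrative choice $P = \operatorname{diag}(1, 0)$ (so $P^{\dagger} = \operatorname{diag}(1, 0)$): for $y$ in the column space the supremum equals $\frac{1}{2} y^T P^{\dagger} y$, while outside the column space it grows with the search radius:

```python
def quad_sup(y, radius):
    # brute-force sup_x { y^T x - (1/2) x^T P x } with P = diag(1, 0),
    # so x^T P x = x0^2; the search box is [-radius, radius]^2
    grid = [i * radius / 50.0 for i in range(-50, 51)]
    return max(y[0] * x0 + y[1] * x1 - 0.5 * x0 * x0
               for x0 in grid for x1 in grid)

in_col = quad_sup([2.0, 0.0], 5.0)      # y in col(P): value (1/2) y^T P†y = 2
out_10 = quad_sup([0.0, 1.0], 10.0)     # y not in col(P): unbounded above,
out_100 = quad_sup([0.0, 1.0], 100.0)   # the value grows with the radius
print(in_col, out_10, out_100)
```

This mirrors the $y = y_1 + y_2$ argument above: for $y = (0, 1) \in L^{\perp}$ the objective is linear and unbounded along the null space of $P$.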
A.5 Conjugate of $\frac{1}{p} x^p$, $x \ge 0$, $p > 1$

We prove the formula in line 12 of Table 1. Let $f(x) = \frac{1}{p} x^p$, $x \ge 0$, $p > 1$. One has
\[
f'(x) = x^{p-1}, \qquad f''(x) = (p-1) x^{p-2},
\]
which shows that $f$ is convex, because $x \ge 0$ and $p > 1$. Then
\[
f^*(y) = \sup_{x \ge 0} \left( yx - f(x) \right).
\]
Since $yx - f(x) \le 0$ if $y < 0$, the sup then occurs for $x = 0$ and it equals $0 - f(0) = 0$. Otherwise, if $y \ge 0$, we have
\[
f^*(y) = \sup_{x \ge 0} \left\{ yx - \tfrac{1}{p} x^p \right\}.
\]
The sup value is then attained when $y = f'(x) = x^{p-1}$, which gives $x = y^{\frac{1}{p-1}}$, and
\[
f^*(y) = y^{1 + \frac{1}{p-1}} - \tfrac{1}{p} y^{\frac{p}{p-1}} = \left( 1 - \tfrac{1}{p} \right) y^{\frac{p}{p-1}} = \tfrac{1}{q} y^q,
\]
where $q$ is such that $\frac{1}{p} + \frac{1}{q} = 1$. Summarizing,
\[
f^*(y) = \begin{cases} 0, & y < 0, \\ \tfrac{1}{q} y^q, & y \ge 0. \end{cases}
\]
Since $f^*(y)$ is monotonically increasing, this can be written as
\[
f^*(y) = \inf_z \left\{ \tfrac{1}{q} z^q \;\middle|\; z \ge y,\; z \ge 0 \right\}.
\]
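The closed form can be compared against a brute-force evaluation of the supremum; the values of $p$ and $y$ below are arbitrary test choices:

```python
# f(x) = x^p/p on x >= 0; claimed conjugate f*(y) = y^q/q for y >= 0.
p = 3.0
q = p / (p - 1.0)        # 1/p + 1/q = 1
y = 2.0

# brute-force sup_{x >= 0} { y*x - x^p/p } on a grid over [0, 20]
brute = max(y * x - x ** p / p for x in (i / 1000.0 for i in range(0, 20001)))
closed = y ** q / q
print(brute, closed)
```

The maximizer $x = y^{1/(p-1)} = \sqrt{2}$ lies inside the grid, so both values agree up to the grid resolution.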
A.6 Conjugate of $\frac{1}{p x^p}$, $x > 0$, $p > 0$

We prove the formula in line 11 of Table 1. Let $f(x) = \frac{1}{p x^p}$, $x > 0$, $p > 0$. One has
\[
f'(x) = -\frac{1}{x^{p+1}}, \qquad f''(x) = \frac{p+1}{x^{p+2}},
\]
which shows that $f$ is convex, because $x > 0$ and $p + 1 > 1 > 0$. Also note that $f(x)$ is monotonically decreasing to zero if $x$ goes to $\infty$. We have
\[
f^*(y) = \sup_{x > 0} \left( yx - f(x) \right).
\]
If $y > 0$ then $yx - f(x)$ goes to $\infty$ if $x$ goes to $\infty$. Hence the sup value is $\infty$ if $y > 0$. So, we may assume that $y \le 0$. Since then
\[
f^*(y) = \sup_{x > 0} \left\{ yx - \frac{1}{p x^p} \right\},
\]
the sup value is attained when $y = f'(x) = -\frac{1}{x^{p+1}}$, which gives $x = (-y)^{-\frac{1}{p+1}}$, and
\[
f^*(y) = y (-y)^{-\frac{1}{p+1}} - \tfrac{1}{p} (-y)^{\frac{p}{p+1}} = -\left( 1 + \tfrac{1}{p} \right) (-y)^{\frac{p}{p+1}} = -\frac{p+1}{p} (-y)^{\frac{p}{p+1}}.
\]
Summarizing,
\[
f^*(y) = -\frac{p+1}{p} (-y)^{\frac{p}{p+1}}, \quad \operatorname{dom} f^* = \{ y \mid y \le 0 \} = -\mathbb{R}_+.
\]
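Again the closed form can be checked against a brute-force supremum; $p$ and $y$ below are arbitrary test choices:

```python
# f(x) = 1/(p x^p) on x > 0; claimed conjugate for y <= 0:
#   f*(y) = -((p+1)/p) * (-y)^(p/(p+1))
p = 2.0
y = -1.5

# brute-force sup_{x > 0} { y*x - 1/(p x^p) } on a grid over (0, 50]
brute = max(y * x - 1.0 / (p * x ** p)
            for x in (i / 1000.0 for i in range(1, 50001)))
closed = -((p + 1.0) / p) * (-y) ** (p / (p + 1.0))
print(brute, closed)
```

The maximizer $x = (-y)^{-1/(p+1)} \approx 0.87$ lies well inside the grid, so the two values agree up to the grid resolution.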
B Some identities for conjugate functions

B.1 Conjugate of $f(x) = \max_i f_i(x)$

First note that we may write
\[
f(x) = \max_z \left\{ \sum_{i=1}^m z_i f_i(x) \;\middle|\; \sum_{i=1}^m z_i = 1,\; z \ge 0 \right\} = \max_{z \in S} \left\{ \sum_{i=1}^m z_i f_i(x) \right\},
\]
where $S$ denotes the simplex. By the definition of $f^*$ we have
\[
f^*(y) = \sup_x \left\{ y^T x - \max_{z \in S} \left\{ \sum_{i=1}^m z_i f_i(x) \right\} \right\} = \sup_x \min_{z \in S} \left\{ y^T x - \sum_{i=1}^m z_i f_i(x) \right\}.
\]
Since the argument is convex in $z$ and concave in $x$, and the domain of $z$ is compact, we may interchange the sup and the min operators, yielding
\[
f^*(y) = \min_{z \in S} \sup_x \left\{ y^T x - \sum_{i=1}^m z_i f_i(x) \right\} = \min_{z \in S} \left( \sum_{i=1}^m z_i f_i \right)^*(y).
\]
Using the sum rule for conjugates (Table 2, line 6) we obtain
\[
\begin{aligned}
f^*(y) &= \min_{z \in S} \inf_{y^i} \left\{ \sum_{i=1}^m (z_i f_i)^*(y^i) \;\middle|\; \sum_{i=1}^m y^i = y \right\} \\
&= \min_{z, y^i} \left\{ \sum_{i=1}^m (z_i f_i)^*(y^i) \;\middle|\; \sum_{i=1}^m y^i = y,\; \sum_{i=1}^m z_i = 1,\; z \ge 0 \right\}.
\end{aligned}
\]
B.2 Deriving conjugates via the adjoint I

In this section we prove the formula in line 10 of Table 2. This formula enables us to write $f^*(y)$ as an inf expression in cases where no closed formula is available for $f^*(y)$, whereas such a formula exists for $(f^{\diamond})^*(y)$. The proof below is an alternative for that in [14]. By the definition of $(f^{\diamond})^*(y)$ we have
\[
\inf_z \left\{ z \mid (f^{\diamond})^*(-z) \le -y \right\} = \inf_z \left\{ z \;\middle|\; \sup_x \left\{ -zx - f^{\diamond}(x) \right\} \le -y \right\}.
\]
Since $\operatorname{dom} f^{\diamond} = \mathbb{R}_{++}$ we have $f^{\diamond}(x) = \infty$ if $x \le 0$. Hence
\[
\inf_z \left\{ z \mid (f^{\diamond})^*(-z) \le -y \right\} = \inf_z \left\{ z \;\middle|\; \sup_{x > 0} \left\{ -zx - x f\!\left( \tfrac{1}{x} \right) \right\} \le -y \right\}.
\]
Due to Lagrange duality this implies
\[
\begin{aligned}
\inf_z \left\{ z \mid (f^{\diamond})^*(-z) \le -y \right\} &= \sup_{\lambda \ge 0} \inf_z \left\{ z + \lambda \left( y + \sup_{x > 0} \left\{ -zx - x f\!\left( \tfrac{1}{x} \right) \right\} \right) \right\} \\
&= \sup_{\lambda \ge 0} \inf_z \sup_{x > 0} \left\{ \lambda y - \lambda x f\!\left( \tfrac{1}{x} \right) + z (1 - \lambda x) \right\}. \tag{12}
\end{aligned}
\]
Observe that the argument in the last expression is the Lagrangian of the problem
\[
\sup_{x > 0} \left\{ \lambda y - \lambda x f\!\left( \tfrac{1}{x} \right) \;\middle|\; 1 - \lambda x = 0 \right\}.
\]
Since this problem is Slater regular, its Lagrangian has a saddle point. As a consequence, in (12) we may replace $\inf_z \sup_{x > 0}$ by $\sup_{x > 0} \inf_z$. One has
\[
\sup_{x > 0} \inf_z \left\{ \lambda y - \lambda x f\!\left( \tfrac{1}{x} \right) + z (1 - \lambda x) \right\} = \sup_{x > 0} \left\{ \lambda y - \lambda x f\!\left( \tfrac{1}{x} \right) \;\middle|\; \lambda x = 1 \right\} = \lambda y - f(\lambda).
\]
Substitution into (12) yields
\[
\inf_z \left\{ z \mid (f^{\diamond})^*(-z) \le -y \right\} = \sup_{\lambda \ge 0} \left\{ \lambda y - f(\lambda) \right\} = f^*(y).
\]
The last equality holds because the domain of $f$ is $\mathbb{R}_+$.
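The identity can also be verified numerically for a concrete $f$; here $f(x) = x^2/2$ with $\operatorname{dom} f = \mathbb{R}_+$ is an illustrative choice, for which $f^*(y) = \max(y, 0)^2/2$ and $f^{\diamond}(x) = x f(1/x) = 1/(2x)$:

```python
# Line 10 of Table 2 reads f*(y) = inf { z : (f♦)*(-z) <= -y }, where
# f♦(x) = x f(1/x).  Check for f(x) = x^2/2 with dom f = R_+, so that
# f*(y) = max(y, 0)^2/2 and f♦(x) = 1/(2x) for x > 0.
xs = [i / 1000.0 for i in range(1, 10001)]   # grid on (0, 10]

def adjoint_conj_neg(z):
    # (f♦)*(-z) = sup_{x > 0} { -z*x - 1/(2x) }, brute-forced on the grid
    return max(-z * x - 1.0 / (2.0 * x) for x in xs)

y = 1.2
zs = [i / 100.0 for i in range(0, 201)]      # candidate z in [0, 2]
via_rule = min(z for z in zs if adjoint_conj_neg(z) <= -y)
print(via_rule, y * y / 2.0)
```

Analytically $(f^{\diamond})^*(-z) = -\sqrt{2z}$ for $z \ge 0$, so the condition $-\sqrt{2z} \le -y$ first holds at $z = y^2/2 = 0.72$, matching $f^*(y)$.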
B.3 Deriving conjugates via the adjoint II

In this section we prove formula 11 in Table 2. This formula can be applied if $f$ has an inverse, which is denoted as $h = f^{-1}$. Since $h$ is concave we have, by definition, $h_*(y) = \inf_x \{ yx - h(x) \}$. Thus we may write
\[
\begin{aligned}
(-h_*)^{\diamond}(y) &= -y\, h_*\!\left( \tfrac{1}{y} \right) = -y \inf_x \left\{ \tfrac{x}{y} - h(x) \right\} = y \sup_x \left\{ h(x) - \tfrac{x}{y} \right\} \\
&= y \sup_{x, t} \left\{ t - \tfrac{x}{y} \;\middle|\; t \le h(x) \right\},
\end{aligned}
\]
where we used that $h$ is (strictly) increasing. If $y < 0$ then $t - \tfrac{x}{y}$ goes to infinity if $x$ goes to infinity. Hence, we may assume that $y \ge 0$. Also using that $t \le h(x)$ holds if and only if $f(t) \le x$, we may proceed as follows:
\[
(-h_*)^{\diamond}(y) = \sup_{x, t} \left\{ ty - x \mid t \le h(x) \right\} = \sup_{x, t} \left\{ ty - x \mid f(t) \le x \right\} = \sup_t \left\{ ty - f(t) \right\} = f^*(y).
\]
B.4 Linear substitution rule

In this section we prove the formulas in lines 8a and 8b of Table 2. Let $f$ be defined by $f(x) = h(Ax + b)$, where $A$ is an $m \times n$ matrix, $b \in \mathbb{R}^m$ and $h : \mathbb{R}^m \to \mathbb{R}$ convex. We then have
\[
f^*(y) = \sup_x \left\{ y^T x - h(Ax + b) \right\} = \sup_{x, v} \left\{ y^T x - h(v) \mid Ax + b = v \right\}.
\]
By writing the last problem as a minimization problem and then using Lagrange's duality theorem we get
\[
f^*(y) = \min_z \sup_{x, v} \left\{ y^T x - h(v) + z^T (v - b - Ax) \right\} = \min_z \sup_{x, v} \left\{ z^T v - h(v) - z^T b + x^T (y - A^T z) \right\}.
\]
The contribution of the term containing $x$ is finite if and only if $y - A^T z = 0$. Thus we obtain
\[
f^*(y) = \min_z \left\{ \sup_v \left\{ z^T v - h(v) \right\} - b^T z \;\middle|\; A^T z = y \right\} = \min_z \left\{ h^*(z) - b^T z \mid A^T z = y \right\}.
\]
If $A$ is square and nonsingular then $A^T z = y$ holds if and only if $z = A^{-T} y$. Hence, in that case we have
\[
f^*(y) = h^*(A^{-T} y) - b^T A^{-T} y.
\]
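For the nonsingular case, a quick numerical comparison with scalar data $h(v) = v^2/2$, $A = 2$, $b = 3$ (illustrative choices, with $h^*(z) = z^2/2$):

```python
# f(x) = h(Ax + b) with h(v) = v^2/2, A = 2, b = 3 (scalars), so the
# substitution rule predicts f*(y) = h*(y/A) - b*(y/A) = (y/A)^2/2 - b*(y/A).
A, b = 2.0, 3.0
y = 5.0

# brute-force sup_x { y*x - (A*x + b)^2/2 } on a grid over [-20, 20]
brute = max(y * x - 0.5 * (A * x + b) ** 2
            for x in (i / 1000.0 for i in range(-20000, 20001)))
closed = 0.5 * (y / A) ** 2 - b * (y / A)
print(brute, closed)  # both equal -4.375
```

The maximizer $x = -0.25$ lies on the grid, so the agreement here is exact.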
B.5 Composite function

In this section we prove the formula in line 9 of Table 2. Let $f(x) = g(h(x))$, with $g$ and $h$ convex and $g$ nondecreasing. We may write
\[
f(x) = \inf_z \left\{ g(z) \mid h(x) \le z \right\}.
\]
Hence
\[
f^*(y) = \sup_x \left\{ y^T x - \inf_z \left\{ g(z) \mid h(x) \le z \right\} \right\} = \sup_{x, z} \left\{ y^T x - g(z) \mid h(x) \le z \right\}.
\]
Using the Lagrange dual of the last problem, we get
\[
\begin{aligned}
f^*(y) &= \min_{u \ge 0} \sup_{x, z} \left\{ y^T x - g(z) + u (z - h(x)) \right\} \\
&= \min_{u \ge 0} \left\{ \sup_z \left\{ uz - g(z) \right\} + \sup_x \left\{ y^T x - u h(x) \right\} \right\} = \min_{u \ge 0} \left\{ g^*(u) + (u h)^*(y) \right\}.
\end{aligned}
\]
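A numerical check of this rule for the arbitrary choice $g(z) = e^z$ (convex and nondecreasing) and $h(x) = x^2$, using the standard conjugates $g^*(u) = u \log u - u$ and $(u h)^*(y) = y^2/(4u)$ for $u > 0$:

```python
import math

# f(x) = g(h(x)) with g(z) = e^z and h(x) = x^2.  The composition rule
# predicts f*(y) = min_{u >= 0} { g*(u) + (u h)*(y) }, where
# g*(u) = u*log(u) - u and (u h)*(y) = y^2/(4u) for u > 0.
y = 2.0

# left-hand side: brute-force sup_x { y*x - e^{x^2} } on [-3, 3]
direct = max(y * x - math.exp(x * x)
             for x in (i / 1000.0 for i in range(-3000, 3001)))

# right-hand side: brute-force the minimum over u in (0, 10]
via_rule = min(u * math.log(u) - u + y * y / (4.0 * u)
               for u in (i / 1000.0 for i in range(1, 10001)))
print(direct, via_rule)
```

Both grids contain the respective optimizers ($x \approx 0.65$, $u \approx 1.53$), so the two values coincide up to the grid resolution.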
B.6 Adjoint function