4. Constrained Minimization
Total Page:16
File Type:pdf, Size:1020Kb
4. CONSTRAINED MINIMIZATION n The minimization of a function f0 over a set C ⊂ IR can be much harder than n the minimization of f0 over all of IR , and it raises a host of issues. Some of these concern the ways a numerical method might be able to maneuver around in C or near to it, while others come up in characterizing optimality itself. Feasible directions: A vector w 6= 0 is said to give a feasible direction into C at a point x ∈ C if there’s an ε > 0 such that the line segment x + τw 0 ≤ τ ≤ ε lies in C. Descent methods with constraints: The general class of descent methods described for unconstrained optimization could be adapted to the minimization of f0 over C if the nature of C is such that feasible directions can readily be found. The rough idea is this. Starting from a point x0 ∈ C, a sequence of points is generated by ν ν the scheme that, when at x ∈ C, a descent vector w is chosen for f0 which at the same time gives a feasible direction into C at xν . (If no such wν exists, the method terminates.) Next, some sort of line search is executed to produce ν ν ν ν ν ν ν ν a value τ > 0 such that both x + τ w ∈ C and f0(x + τ w ) < f0(x ). The next point is taken then to be xν+1 := xν + τ ν wν . In particular, one can ν ν ν imagine choosing τ to be a value that minimizes ϕ(τ) = f0(x + τw ) over the ν ν set τ ≥ 0 x + τw ∈ C ; this would be the analogue of exact line search. Pitfalls: Many troubles can plague this scheme, unless the situation is safeguarded by rather special features. Finding a descent vector wν that gives a feasible direction may be no easy matter, and even if one is found, there may be difficulties in using it effectively because of the need to keep within the confines of C. A phenomenon called jamming is possible, where progress is stymied by frequent collisions with the boundary of C and the method “gets stuck in a corner” of C or makes too many little zigzagging steps. Lack of feasible directions at all: Of course, this kind of approach doesn’t make much sense unless one is content to regard, as a “quasi-solution” to the prob- lem, any point x ∈ C at which there is no descent vector for f0 giving a feasible direction into C. That may be acceptable for some sets C such as boxes, but not for sets C in which curvature dominates. For example, if C is a curved surface there may be no point x ∈ C at which there’s a feasible direction into C, because feasible directions refer only to movement along straight line segments embedded in C. Then there would be no point of C from which progress could be made by a modified descent method in minimizing f0. 49 Variational geometry: Whether or not some form of descent method might be made to work, it’s essential to have a solid grip on the geometry of the feasible set C. Classical geometry doesn’t meet the needs, so new concepts have to be brought in. To get rolling, we need a notion of tangency which includes the tangent spaces long associated with “linearization” of curved structures, but at the same time covers vectors giving feasible directions. Tangent vectors: For a closed set C ⊂ IRn and a pointx ¯ ∈ C, a vector w is said to be tangent to C atx ¯ if there is a sequence of vectors xν → x¯ in C along with scalars τ ν & 0 such that the vectors wν = (xν − x¯)/τ ν converge to w. Interpretation: The tangent vectors w to C atx ¯, apart from w = 0 (which corresponds in the definition to xν ≡ x¯, wν ≡ 0), are the vectors pointing in a possibly asymptotic direction from which a sequence of points xν ∈ C can converge tox ¯. The direction of xν as seen fromx ¯, which is the direction of wν , is not necessarily that of w, but gets closer and closer to it as ν → ∞. Relation to feasible directions: Every vector w 6= 0 giving a feasible direction into C atx ¯ is a tangent vector to C atx ¯. Indeed, for such a vector w one can take xν =x ¯ + τ ν w for any sequence of values τ ν & 0 sufficiently small and have for wν = (xν − x¯)/τ ν that wν ≡ w, hence trivially wν → w. Note from this that tangent vectors, in the sense defined here, can well point right into the interior of C, if that is nonempty; they don’t have to lie along the boundary of C. Relation to classical tangent spaces: When C is a “nice two-dimensional surface” in IR3, the tangent vectors w to C at a pointx ¯ form a two-dimensional linear subspace of IR3, which gives the usual tangent space to C atx ¯. When C is a “nice one-dimensional curve,” a one-dimensional linear subspace is obtained instead. Generalization can be made to tangent spaces to “nice curvilinear manifolds” of various dimensions d in IRn, with 0 < d < n. These comments are offered here on a heuristic, motivational level, but they will latter be supplied with rigor once such manifolds are identified with sets defined by equality constraints satisfying a constraint qualification that ensures a robust representation. Tangent cone at a point: In general, the set of all vectors w that are tangent to C at a pointx ¯ ∈ C is called the tangent cone to C atx ¯ and is denoted by TC (¯x). Basic properties: Always, the set TC (¯x) contains the vector 0. Further, for every vector w ∈ TC (¯x) and every scalar λ > 0, the vector λw is again in TC (¯x) (because of the arbitrary size of the scaling factors in the definition). But in 50 many situations one can have w ∈ TC (¯x), yet −w∈ / TC (¯x). In particular, TC (¯x) can well be something other than a linear subspace of IRn. Cones: A subset of IRn is called a cone if it contains the zero vector and contains with each of its vectors all positive multiples of that vector. Geometrically, this means that a cone, unless it consists of 0 alone, is a “bundle of rays.” ν Limits of tangent vectors: The limit of any sequence of vectors w ∈ TC (¯x) is another vector w ∈ TC (¯x). In other words, TC (¯x) is always a closed set. (This can readily be gleaned from the definition of TC (¯x) and the general properties of convergent sequences.) In particular, if the vectors wν give feasible directions to C atx ¯ and ν w → w, then w ∈ TC (¯x). n n Tangents in some extreme cases: If C = IR , then TC (¯x) = IR . Indeed, any time n x¯ lies in the interior of C, one has TC (¯x) = IR . If C is a one-element set {a} and x¯ = a, then TC (¯x) = {0}. More generally, the latter holds wheneverx ¯ is an isolated point of C in the sense that there is no sequence of points xν → x¯ in C apart from the constant sequence xν ≡ x¯. n Tangents to boxes: If X = I1 × · · · × In in IR with Ij a closed interval in IR, then at any pointx ¯ = (¯x1,..., x¯n) ∈ X (the componentx ¯j lying in Ij) one has TX (¯x) = TI1 (¯x1) × · · · × TIn (¯xn), where [0, ∞) ifx ¯j is the left endpoint (only) of Ij, (−∞, 0] ifx ¯j is the right endpoint (only) of Ij, TIj (¯xj) = (−∞, ∞) ifx ¯j lies in the interior of Ij, [0, 0] if Ij is a one-point interval, consisting just ofx ¯j. In other words, the condition w ∈ TX (¯x) for w = (w1, . , wn) amounts to restricting wj to lie in a one of the intervals (−∞, 0], [0, ∞), (−∞, ∞), or [0, 0] (the one-point interval consisting just of 0). The particular interval for an index j depends on the location ofx ¯j relative to the endpoints of Ij. n Tangents to the nonnegative orthant: The set X = IR+ is a box, the product of the intervals Ij = [0, ∞). In this case one has w ∈ TX (¯x) if and only if wj ≥ 0 for all indices j withx ¯j = 0. (For indices j withx ¯j > 0, wj can be any real number.) Feasible directions in a box: When X is a box, not only does every vector w 6= 0 giving a feasible direction in X atx ¯ belong to TX (¯x), but conversely, every vector w 6= 0 in TX (¯x) gives a feasible direction in X atx ¯. This important property generalizes as follows to any set specified by finitely many linear constraints. 51 Tangents to polyhedral sets: For a polyhedral set C, like a box, simple formulas describe all the tangent cones. As a specific case ripe for many applications, suppose ai·x ≤ bi for i ∈ [1, s], x ∈ C ⇐⇒ ai·x = bi for i ∈ [s + 1, m], x ∈ X, with X a box. Letx ¯ ∈ C and denote by I(¯x) the set of i ∈ [1, s] for which actually ai·x = bi. Then ai·w ≤ 0 for i ∈ I(¯x), w ∈ TC (¯x) ⇐⇒ ai·w = 0 for i ∈ [s + 1, m], w ∈ TX (¯x). Argument: Let K denote the set of vectors w described by the final conditions (on the right). The question is whether K = TC (¯x). For a vector w and scalar τ > 0, one hasx ¯ + τw ∈ C if and only ifx ¯ + τw ∈ X and ai·w ≤ [bi − ai·x¯]/τ for i ∈ [1, s], whereas ai·w = 0 for i ∈ [s + 1, m].