Polyhedral Approximation Extended Monotropic Programming Special Cases

Polyhedral Approximations in

Dimitri P. Bertsekas

Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology

Tsinghua University, 2009

Convex Optimization Algorithms: Generalities

Primary methodology for large scale problems. Arise in the context of network optimization and machine learning, among other areas.

Principal methods date to the 60s, but have been used in new ways recently:

Descent methods (e.g., subgradient).
Polyhedral approximation (e.g., cutting plane).
Proximal (regularization) methods, possibly in combination with polyhedral approximation.

Our Focus in this Talk

A unifying framework for polyhedral approximation methods.

Includes classical methods:

Cutting plane/Outer linearization
Simplicial decomposition/Inner linearization

Includes new methods, and new versions/extensions of old methods.

Based on a convex conjugacy framework and outer/inner linearization duality.

Vehicle for Unification

Extended monotropic programming (EMP)

$$\min_{(x_1,\ldots,x_m) \in S} \; \sum_{i=1}^m f_i(x_i)$$

where $f_i : \Re^{n_i} \mapsto (-\infty, \infty]$ is a convex function and $S$ is a subspace.

The dual EMP is

$$\min_{(y_1,\ldots,y_m) \in S^\perp} \; \sum_{i=1}^m h_i(y_i)$$

where $h_i$ is the convex conjugate function of $f_i$.

Algorithmic Ideas:

Outer or inner linearization for some of the $f_i$.
Refinement of linearization using duality.

Features of outer or inner linearization use:

They are combined in the same algorithm.
Their roles are reversed in the dual problem.
They become two (mathematically equivalent dual) faces of the same coin.
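
A small worked instance may help fix the notation. It is a standard special case, written here as an illustration and not taken from the slides; the choice $m = 2$ and the particular subspace $S$ are assumptions made for the example. Let $S$ force the two blocks of variables to agree:

```latex
\begin{align*}
S &= \{(x_1, x_2) \mid x_1 = x_2\}, &
S^\perp &= \{(y_1, y_2) \mid y_1 + y_2 = 0\}, \\
\text{EMP:}\quad & \min_{x \in \Re^n} \; f_1(x) + f_2(x), &
\text{dual EMP:}\quad & \min_{y \in \Re^n} \; h_1(y) + h_2(-y).
\end{align*}
```

Here $S^\perp$ follows from requiring $(y_1 + y_2)'x = 0$ for every $x$. In this instance, replacing $f_2$ by a polyhedral approximation in the primal changes only the term $h_2$ in the dual, which is the sense in which the roles of the two linearizations are reversed.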

Advantage over Classical Polyhedral Approximation Methods

The refinement process is much faster. Reason: At each iteration we add multiple cutting planes/break points (as many as one per function $f_i$). By contrast, a single cutting plane/break point is added in classical methods.

The refinement process may be more convenient. For example, when $f_i$ is a scalar function, adding a cutting plane/break point to the polyhedral approximation of $f_i$ can be very simple. By contrast, adding a cutting plane/break point may require a complicated differentiation/optimization process in classical methods.

References

D. P. Bertsekas, "Extended Monotropic Programming and Duality," JOTA, 2008, Vol. 139, pp. 209-225.

D. P. Bertsekas, "Convex Optimization Theory," 2009, www-based "living chapter" on algorithms.

Related work that applies dual simplicial decomposition in a machine learning context: H. Yu, D. P. Bertsekas, and J. Rousu, "An Efficient Discriminative Training Method for Generative Models," Tech. Report.

$$\min_{x \in \Re^n} \; r(x) + \sum_{i=1}^m f_i(x)$$

$f_i(x)$: Complicated polyhedral function; corresponds to the $i$th data batch.
$r(x)$: Regularization term.

Outline

1 Polyhedral Approximation
    Review of Existing Methodology
    Cutting Plane and Simplicial Decomposition Methods

2 Extended Monotropic Programming
    Duality Theory
    General Approximation Algorithm

3 Special Cases
    Cutting Plane Methods
    Simplicial Decomposition for $\min_{x \in C} f(x)$

Subgradients

Let $f : \Re^n \mapsto (-\infty, \infty]$ be a convex function. A vector $y$ is a subgradient of $f$ at a point $x \in \mathrm{dom}(f)$ if

$$f(z) \ge f(x) + y'(z - x), \qquad \forall\, z \in \Re^n.$$

The set $\partial f(x)$ of all subgradients of $f$ at $x$ is the subdifferential of $f$ at $x$.

A subgradient can be identified with a nonvertical supporting hyperplane to the epigraph of $f$ at $(x, f(x))$.
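
The subgradient inequality can be checked numerically on a small example. The sketch below is an illustration only: the function $f(x) = |x|$, the test grid, and the helper name `is_subgradient` are assumptions chosen for the example, not part of the talk.

```python
def f(x):
    """Example convex function f(x) = |x|, nondifferentiable at 0."""
    return abs(x)


def is_subgradient(y, x, test_points, tol=1e-12):
    """Check f(z) >= f(x) + y * (z - x) at every test point z."""
    return all(f(z) >= f(x) + y * (z - x) - tol for z in test_points)


# Test points z in [-5, 5]; a finite grid can only refute, not prove,
# the inequality, which is enough for this illustration.
grid = [i / 10.0 for i in range(-50, 51)]

# At x = 0 the subdifferential of |x| is the interval [-1, 1].
inside = [is_subgradient(y, 0.0, grid) for y in (-1.0, -0.5, 0.0, 0.5, 1.0)]
outside = [is_subgradient(y, 0.0, grid) for y in (-1.5, 1.2)]

# At x = 2 the function is differentiable, so the only subgradient is 1.
at_two = {y: is_subgradient(y, 2.0, grid) for y in (0.9, 1.0, 1.1)}

print(inside, outside, at_two)
```

At the kink the whole interval of slopes passes the test, while at a point of differentiability only the derivative does.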

Outer Linearization - Epigraph Approximation by Halfspaces

Given a convex function $f : \Re^n \mapsto (-\infty, \infty]$, approximation using subgradients:

$$F(x) = \max\big\{ f(x_0) + y_0'(x - x_0), \ldots, f(x_k) + y_k'(x - x_k) \big\}$$

where $y_i \in \partial f(x_i)$, $i = 0, \ldots, k$.
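
A minimal one-dimensional sketch of this construction follows. The choice $f(x) = x^2$, the linearization points, and the helper name `make_outer_linearization` are assumptions made for illustration.

```python
def f(x):
    """Example convex function f(x) = x^2."""
    return x * x


def grad_f(x):
    """Gradient of f, which is its unique subgradient at every x."""
    return 2.0 * x


def make_outer_linearization(points):
    """Return F(x) = max_i { f(x_i) + y_i * (x - x_i) } with y_i = grad_f(x_i)."""
    cuts = [(f(xi), grad_f(xi), xi) for xi in points]

    def F(x):
        return max(fx + y * (x - xi) for fx, y, xi in cuts)

    return F


points = [-2.0, -0.5, 1.0, 3.0]
F = make_outer_linearization(points)

grid = [i / 10.0 for i in range(-40, 41)]
# Each cut is a supporting line, so F never exceeds f ...
underestimates = all(F(x) <= f(x) + 1e-12 for x in grid)
# ... and F agrees with f at the linearization points.
tight_at_points = all(abs(F(xi) - f(xi)) < 1e-12 for xi in points)

print(underestimates, tight_at_points, F(0.0))
```

Adding a point to `points` adds one halfspace to the approximation of the epigraph, which is the refinement step used by cutting plane methods.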

Convex Hulls

Convex hull of a finite set of points $x_i$

Convex hull of a union of a finite number of rays $R_i$

Inner Linearization - Epigraph Approximation by Convex Hulls

Given a convex function $h : \Re^n \mapsto (-\infty, \infty]$ and a finite set of points $y_0, \ldots, y_k \in \mathrm{dom}(h)$, the epigraph of $h$ is approximated from within by the convex hull of the rays $\{(y_i, w) \mid w \ge h(y_i)\}$, $i = 0, \ldots, k$.
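
In one dimension, the lower boundary of this convex hull is the piecewise linear interpolation of $h$ at the break points, and the approximation is $+\infty$ outside their range. The sketch below illustrates this under assumptions made for the example: $h(y) = y^2/2$, distinct break points, and the helper name `make_inner_linearization`.

```python
import math


def h(y):
    """Example convex function h(y) = y^2 / 2."""
    return 0.5 * y * y


def make_inner_linearization(break_points):
    """Piecewise linear interpolation of h at distinct break points.

    The returned H is +inf outside [min(break_points), max(break_points)].
    """
    ys = sorted(break_points)

    def H(y):
        if y < ys[0] or y > ys[-1]:
            return math.inf
        for left, right in zip(ys, ys[1:]):
            if left <= y <= right:
                t = (y - left) / (right - left)
                return (1.0 - t) * h(left) + t * h(right)
        return h(ys[0])  # reached only when there is a single break point

    return H


break_points = [-2.0, 0.0, 1.0, 3.0]
H = make_inner_linearization(break_points)

grid = [i / 10.0 for i in range(-20, 31)]  # points in [-2, 3]
# By convexity of h, each chord lies on or above the graph ...
overestimates = all(H(y) >= h(y) - 1e-12 for y in grid)
# ... and H agrees with h at the break points.
tight_at_breaks = all(abs(H(yi) - h(yi)) < 1e-12 for yi in break_points)

print(overestimates, tight_at_breaks, H(2.0), H(5.0))
```

Adding a break point enlarges the convex hull, so the approximation can only move down toward $h$.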

Conjugacy

Consider a convex function $f : \Re^n \mapsto (-\infty, \infty]$ and its conjugate

$$h(y) = \sup_{x \in \Re^n} \big\{ x'y - f(x) \big\}.$$

[Figure: the hyperplane with normal $(-y, 1)$ and slope $y$ that supports the epigraph of $f$ crosses the vertical axis at $\inf_{x \in \Re^n} \{ f(x) - x'y \} = -h(y)$.]

Conjugacy theorem: The conjugate of $h$ is $f$ (under very weak conditions).

Subgradient duality: $y \in \partial f(x)$ iff $x \in \partial h(y)$.
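
The definition of the conjugate can be checked numerically on a standard pair: for $f(x) = (c/2)x^2$ the conjugate is $h(y) = y^2/(2c)$. The sketch below approximates the supremum by a maximum over a finite grid; the grid, the value $c = 2$, and the helper name `conjugate_on_grid` are assumptions made for illustration.

```python
c = 2.0


def f(x):
    """Example convex function f(x) = (c/2) x^2."""
    return 0.5 * c * x * x


def h_closed_form(y):
    """Known conjugate of f: h(y) = y^2 / (2c)."""
    return y * y / (2.0 * c)


def conjugate_on_grid(g, s, grid):
    """Approximate sup_t { t * s - g(t) } by a maximum over a finite grid."""
    return max(t * s - g(t) for t in grid)


# Grid on [-10, 10] with step 0.001; the maximizers used below lie on it.
grid = [i / 1000.0 for i in range(-10000, 10001)]

ys = [-3.0, -1.0, 0.0, 1.0, 3.0]
errors = [abs(conjugate_on_grid(f, y, grid) - h_closed_form(y)) for y in ys]

# Conjugacy theorem on this example: conjugating h recovers f (here at x = 1).
biconjugate_at_1 = conjugate_on_grid(h_closed_form, 1.0, grid)

print(max(errors), biconjugate_at_1, f(1.0))
```

The maximizing $x$ for a given $y$ is $x = y/c$, which is the derivative of $h$ at $y$, while $y = cx$ is the derivative of $f$ at $x$; this is the subgradient duality relation in the differentiable case.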

Conjugacy of Outer/Inner Linearization

Given a function $f : \Re^n \mapsto (-\infty, \infty]$ and its conjugate $h$.

The conjugate of an outer linearization of $f$ is an inner linearization of $h$.

[Figure: outer linearization $F$ of $f$ with slopes $y_0, y_1, y_2$ (left); inner linearization $H$ of the conjugate $h$ with break points $y_0, y_1, y_2$ (right).]

Subgradients in outer lin. <==> Break points in inner lin.
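
This correspondence can be verified numerically in one dimension. The sketch below uses the self-conjugate pair $f(x) = x^2/2$, $h(y) = y^2/2$ as an assumed example: it builds an outer linearization $F$ of $f$, approximates the conjugate of $F$ on a grid, and compares it with the inner linearization $H$ of $h$ whose break points are the slopes of $F$. All names are illustrative.

```python
import math


def f(x):
    return 0.5 * x * x


def h(y):
    """Conjugate of f(x) = x^2 / 2."""
    return 0.5 * y * y


cut_points = [-2.0, 0.0, 1.0, 3.0]
slopes = [xi for xi in cut_points]  # y_i = f'(x_i) = x_i for this f


def F(x):
    """Outer linearization of f built from the cuts at cut_points."""
    return max(f(xi) + yi * (x - xi) for xi, yi in zip(cut_points, slopes))


def conjugate_of_F(y, x_grid):
    """Approximate sup_x { x * y - F(x) } by a maximum over a finite grid."""
    return max(x * y - F(x) for x in x_grid)


def H(y):
    """Inner linearization of h with break points equal to the slopes of F."""
    ys = sorted(slopes)
    if y < ys[0] or y > ys[-1]:
        return math.inf
    for left, right in zip(ys, ys[1:]):
        if left <= y <= right:
            t = (y - left) / (right - left)
            return (1.0 - t) * h(left) + t * h(right)
    return h(ys[0])


# The kinks of F (at -1, 0.5 and 2) lie on this grid, so the grid maximum
# equals the true supremum for y between the smallest and largest slope.
x_grid = [i / 100.0 for i in range(-1000, 1001)]
test_ys = [-2.0, -1.0, 0.0, 0.5, 1.0, 2.0, 3.0]
gaps = [abs(conjugate_of_F(y, x_grid) - H(y)) for y in test_ys]

print(max(gaps))
```

For $y$ outside the range of slopes the true conjugate of $F$ is $+\infty$, which a finite grid cannot reproduce; the comparison is therefore restricted to $y$ between the smallest and largest slope.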

Cutting Plane Method for $\min_{x \in C} f(x)$

Given $y_i \in \partial f(x_i)$ for $i = 0, \ldots, k$, form

$$F_k(x) = \max\big\{ f(x_0) + y_0'(x - x_0), \ldots, f(x_k) + y_k'(x - x_k) \big\}$$

and let

$$x_{k+1} \in \arg\min_{x \in C} F_k(x)$$

[Figure: cutting plane iterations, showing the cuts $f(x_0) + y_0'(x - x_0)$ and $f(x_1) + y_1'(x - x_1)$ over the set $C$.]
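
The iteration can be sketched in one dimension, where $C = [a, b]$ and the minimum of the piecewise linear $F_k$ over $C$ is attained at an endpoint or at an intersection of two cuts. This is a toy sketch under those assumptions, not a general implementation; the function name `cutting_plane_1d`, the stopping rule, and the test problem are chosen for illustration.

```python
def cutting_plane_1d(f, subgrad, a, b, x0, iters=30, tol=1e-9):
    """Cutting plane method for min f(x) over the interval [a, b]."""
    cuts = []  # each cut is (intercept, slope): x -> intercept + slope * x
    x = x0
    for _ in range(iters):
        y = subgrad(x)
        cuts.append((f(x) - y * x, y))  # f(x_k) + y_k * (z - x_k) as a line in z

        def F_k(z):
            return max(c0 + s * z for c0, s in cuts)

        # Candidate minimizers of F_k on [a, b]: the endpoints and every
        # pairwise intersection of cuts that falls inside the interval.
        candidates = [a, b]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                (c1, s1), (c2, s2) = cuts[i], cuts[j]
                if s1 != s2:
                    z = (c2 - c1) / (s1 - s2)
                    if a <= z <= b:
                        candidates.append(z)

        x = min(candidates, key=F_k)
        # F_k(x) is a lower bound on the optimal value, so a small gap
        # between f(x) and F_k(x) certifies near-optimality of x.
        if f(x) - F_k(x) <= tol:
            break
    return x


x_star = cutting_plane_1d(lambda x: x * x, lambda x: 2.0 * x, -1.0, 2.0, x0=2.0)
print(x_star)
```

On this example the iterates alternate around the minimizer $x^* = 0$, and each new cut tightens the lower bound $F_k$ near it.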

Simplicial Decomposition Method for $\min_{x \in C} f(x)$ ($f$ is Differentiable)

At the typical iteration we have $x_k$ and $X_k = \{x_0, \tilde{x}_1, \ldots, \tilde{x}_k\}$, where $\tilde{x}_1, \ldots, \tilde{x}_k$ are extreme points of $C$.

Generate

$$\tilde{x}_{k+1} \in \arg\min_{x \in C} \big\{ \nabla f(x_k)'(x - x_k) \big\}$$

Set $X_{k+1} = \{\tilde{x}_{k+1}\} \cup X_k$, and generate $x_{k+1}$ as

$$x_{k+1} \in \arg\min_{x \in \mathrm{conv}(X_{k+1})} f(x)$$

[Figure: level sets of $f$ over $C$, with iterates $x_0, \ldots, x_4 = x^*$, extreme points $\tilde{x}_1, \ldots, \tilde{x}_4$, and gradients $\nabla f(x_0), \ldots, \nabla f(x_3)$.]

γ minimize f(x) = x1 + x2 k

subject to g(x) = x1 0 ≤ x x X = (x1, x2) x1 > 0 Slope = supg ∂!f(x) d#g ∈ ∈ | } ! f = , q = x ∗ ∞ ∗ −∞ Slope = infg ∂!f(x) d#g ∈ S = (x2, x) x > 0 x + αd | of ∂!f(x) ! " 2 q(µ) = min x + µx Fd(α) = f(x + αd) x ∈$ !1/(4µ) if" µ > 0 = x, f(x) # − if µ 0 − # −∞ # ≤ $ # 1

hx(g)

1 Polyhedral Approximation Extended Monotropic Programming Special Cases

Comparison: Cutting Plane - Simplicial Decomposition

Cutting plane aims to use LP with same dimension and smaller number of constraints. Most useful when problem has small dimension and:
- There are many linear constraints, or
- The cost function is nonlinear and linear versions of the problem are much simpler.

Simplicial decomposition aims to use NLP over a simplex of small dimension [i.e., $\mathrm{conv}(X_k)$]. Most useful when problem has large dimension and:
- Cost is nonlinear, and
- Solving linear versions of the (large-dimensional) problem is much simpler (possibly due to decomposition).

The two methods appear very different, with unclear connection, despite the general conjugacy relation between outer and inner linearization. We will see that they are special cases of two methods that are dual (and mathematically equivalent) to each other.

Extended Monotropic Programming (EMP)

$$\min_{(x_1,\ldots,x_m) \in S} \ \sum_{i=1}^m f_i(x_i)$$
where each $f_i : \Re^{n_i} \mapsto (-\infty, \infty]$ is closed proper convex, and $S$ is a subspace.

- Monotropic programming (Rockafellar, Minty), where the $f_i$ are scalar functions.
- Single commodity network flow ($S$: circulation subspace of a graph).
- Block separable problems with linear constraints.
- Fenchel duality framework: Let $m = 2$ and $S = \{(x, x) \mid x \in \Re^n\}$. The EMP then takes the form
$$\min_{x \in \Re^n} \ f_1(x) + f_2(x)$$
- Conic programs (second order, semidefinite - special case of Fenchel).
- Sum of functions (e.g., machine learning): For $S = \{(x, \ldots, x) \mid x \in \Re^n\}$, the EMP takes the form
$$\min_{x \in \Re^n} \ \sum_{i=1}^m f_i(x)$$

Dual EMP

Derivation: Introduce $z_i \in \Re^{n_i}$ and convert EMP to the equivalent form
$$\min_{z_i = x_i,\ i=1,\ldots,m,\ (x_1,\ldots,x_m) \in S} \ \sum_{i=1}^m f_i(z_i)$$

Assign multiplier $y_i \in \Re^{n_i}$ to constraint $z_i = x_i$, and form the Lagrangian
$$L(x, z, y) = \sum_{i=1}^m \big( f_i(z_i) + y_i'(x_i - z_i) \big)$$

where $y = (y_1, \ldots, y_m)$. The dual problem is to maximize the dual function
$$q(y) = \inf_{(x_1,\ldots,x_m) \in S,\ z_i \in \Re^{n_i}} L(x, z, y)$$

Exploiting the separability of $L(x, z, y)$ and changing sign to convert maximization to minimization, we obtain the dual EMP in symmetric form
$$\min_{(y_1,\ldots,y_m) \in S^\perp} \ \sum_{i=1}^m h_i(y_i)$$

where $h_i$ is the convex conjugate function of $f_i$.
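The separability step can be spelled out as follows. This is a sketch of the standard computation, using only the definition $h_i(y_i) = \sup_{z_i} \{ y_i'z_i - f_i(z_i) \}$ of the conjugate and the fact that $S$ is a subspace.

```latex
\begin{align*}
q(y) &= \inf_{x \in S} \sum_{i=1}^m y_i'x_i
        + \sum_{i=1}^m \inf_{z_i \in \Re^{n_i}} \big\{ f_i(z_i) - y_i'z_i \big\} \\
     &= \inf_{x \in S} y'x \;-\; \sum_{i=1}^m h_i(y_i) \\
     &= \begin{cases}
          -\sum_{i=1}^m h_i(y_i) & \text{if } y \in S^\perp \\
          -\infty & \text{otherwise}
        \end{cases}
\end{align*}
```

Maximizing $q$ over $y$ is therefore the same as minimizing $\sum_i h_i(y_i)$ over $S^\perp$.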

Optimality Conditions

There are powerful conditions for $q^* = f^*$ (Bertsekas 2008, generalizing classical monotropic programming results):
- Vector Sum Condition for Strong Duality: Assume that for all feasible $x$, the set
$$S^\perp + \partial_\epsilon (f_1 + \cdots + f_m)(x)$$
is closed for all $\epsilon > 0$. Then $q^* = f^*$.
- Special Case: Assume each $f_i$ is finite, or is polyhedral, or is essentially one-dimensional, or is domain one-dimensional. Then $q^* = f^*$. By considering the dual EMP, "finite" may be replaced by "co-finite" in the above statement.

Optimality conditions, assuming $-\infty < q^* = f^* < \infty$:
- $(x^*, y^*)$ is an optimal primal and dual solution pair if and only if
$$x^* \in S, \quad y^* \in S^\perp, \quad y_i^* \in \partial f_i(x_i^*), \quad i = 1, \ldots, m$$
- Symmetric conditions involving the dual EMP:
$$x^* \in S, \quad y^* \in S^\perp, \quad x_i^* \in \partial h_i(y_i^*), \quad i = 1, \ldots, m$$
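A small numerical check of these conditions, as a sketch on a toy instance that is not from the talk: $m = 2$, $S = \{(x, x)\}$, and the assumed quadratics $f_1(x) = \frac{1}{2}(x - a)^2$, $f_2(x) = \frac{1}{2}(x - b)^2$, whose conjugates are $h_i(y) = \frac{1}{2}y^2 + ay$ and $\frac{1}{2}y^2 + by$. The function name `two_term_emp` is hypothetical.

```python
def two_term_emp(a, b):
    """Toy EMP: f1(x) = 0.5*(x - a)**2, f2(x) = 0.5*(x - b)**2, S = {(x, x)}."""
    # Primal: minimize f1(x) + f2(x) over the single free variable x.
    x_star = 0.5 * (a + b)
    f_star = 0.5 * (x_star - a) ** 2 + 0.5 * (x_star - b) ** 2

    # Both f_i are differentiable, so the subdifferential is just the gradient.
    y_star = (x_star - a, x_star - b)

    # Conjugate of 0.5*(x - shift)**2 is 0.5*y**2 + shift*y.
    def h(y, shift):
        return 0.5 * y * y + shift * y

    dual_value = h(y_star[0], a) + h(y_star[1], b)
    return x_star, y_star, f_star, dual_value


x_star, y_star, f_star, dual_value = two_term_emp(1.0, 3.0)
print(x_star, y_star, f_star, dual_value)
```

Here $y^* = (1, -1)$ sums to zero, so $y^* \in S^\perp$, and the primal value plus the value of the dual EMP (written in minimization form) is zero, which is the statement $q^* = f^*$ for this instance.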

Outer Linearization of a Convex Function: Definition

Let $f : \Re^n \mapsto (-\infty, \infty]$ be closed proper convex, with conjugate $h$. Given a finite set $Y \subset \mathrm{dom}(h)$, the outer linearization of $f$ is
$$\underline{f}_Y(x) = \max_{y \in Y} \big\{ f(x_y) + y'(x - x_y) \big\}$$
where $x_y$ is such that $y \in \partial f(x_y)$.
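An outer linearization can be evaluated as the pointwise maximum of supporting hyperplanes. The sketch below assumes a differentiable $f$, so the slope at each point $x_y$ is the gradient; the helper name `outer_linearization` and the test function $f(x) = \frac{1}{2}\|x\|^2$ are illustrative choices, not part of the talk.

```python
import numpy as np


def outer_linearization(f, grad, points):
    """Return the max of the supporting hyperplanes of f at the given points."""
    cuts = []
    for x_y in points:
        x_y = np.asarray(x_y, dtype=float)
        y = grad(x_y)                        # slope y in the subdifferential at x_y
        cuts.append((y, f(x_y) - y @ x_y))   # intercept of f(x_y) + y'(x - x_y)

    def f_Y(x):
        x = np.asarray(x, dtype=float)
        return max(y @ x + b for y, b in cuts)

    return f_Y


f = lambda x: 0.5 * float(x @ x)   # assumed example function
grad = lambda x: x                 # its gradient
points = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
f_Y = outer_linearization(f, grad, points)
```

By convexity the resulting function never exceeds $f$, and it agrees with $f$ at each of the chosen points.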

Inner Linearization of a Convex Function: Definition

Let $f : \Re^n \mapsto (-\infty, \infty]$ be closed proper convex. Given a finite set $X \subset \mathrm{dom}(f)$, the inner linearization of $f$ is
$$\bar f_X(z) = \begin{cases} \min\limits_{\sum_{x \in X} \alpha_x x = z,\ \sum_{x \in X} \alpha_x = 1,\ \alpha_x \ge 0,\ x \in X} \ \sum_{x \in X} \alpha_x f(x) & \text{if } z \in \mathrm{conv}(X) \\ \infty & \text{otherwise} \end{cases}$$

[Figure: outer linearization $\underline{f}_Y(x)$ of $f$, built from the lines $f(x_y) + y'(x - x_y)$, and inner linearization $\bar f_X(z)$ of $f$.]
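In one dimension the inner linearization has a simple form: because the points $(x, f(x))$ lie on the graph of a convex function, the minimizing convex combination is the linear interpolation between the two neighbors of $z$. The sketch below relies on that one-dimensional fact only; the name `inner_linearization_1d` and the example $f(x) = x^2$ are assumptions for illustration.

```python
import numpy as np


def inner_linearization_1d(f, points):
    """Inner linearization of a convex scalar function f over a finite set of points."""
    xs = np.sort(np.asarray(points, dtype=float))
    fs = np.array([f(x) for x in xs])

    def f_bar(z):
        # Outside conv(X) = [min X, max X] the inner linearization is +infinity.
        if z < xs[0] or z > xs[-1]:
            return np.inf
        # Inside, interpolate linearly between neighboring break points.
        return float(np.interp(z, xs, fs))

    return f_bar


f_bar = inner_linearization_1d(lambda x: x * x, [2.0, -1.0, 0.0])
print(f_bar(1.0), f_bar(-0.5), f_bar(3.0))
```

The inner linearization lies on or above $f$ on $\mathrm{conv}(X)$, in contrast to the outer linearization, which lies on or below $f$.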

Polyhedral Approximation Algorithm

Let $f_i : \Re^{n_i} \mapsto (-\infty, \infty]$ be closed proper convex, with conjugates $h_i$. Consider the EMP
$$\min_{(x_1,\ldots,x_m) \in S} \ \sum_{i=1}^m f_i(x_i)$$
Introduce a fixed partition of the index set:
$$\{1, \ldots, m\} = I \cup \underline{I} \cup \bar I, \qquad \underline{I}: \text{Outer indices}, \quad \bar I: \text{Inner indices}$$

Typical Iteration: We have finite subsets $Y_i \subset \mathrm{dom}(h_i)$ for each $i \in \underline{I}$, and $X_i \subset \mathrm{dom}(f_i)$ for each $i \in \bar I$.

Find primal-dual optimal pair $\hat x = (\hat x_1, \ldots, \hat x_m)$, and $\hat y = (\hat y_1, \ldots, \hat y_m)$ of the approximate EMP
$$\min_{(x_1,\ldots,x_m) \in S} \ \sum_{i \in I} f_i(x_i) + \sum_{i \in \underline{I}} \underline{f}_{i,Y_i}(x_i) + \sum_{i \in \bar I} \bar f_{i,X_i}(x_i)$$

Enlarge $Y_i$ and $X_i$ by differentiation:
- For each $i \in \underline{I}$, add $\tilde y_i$ to $Y_i$ where $\tilde y_i \in \partial f_i(\hat x_i)$.
- For each $i \in \bar I$, add $\tilde x_i$ to $X_i$ where $\tilde x_i \in \partial h_i(\hat y_i)$.

Enlargement Step for ith Component Function

Outer: For each $i \in \underline{I}$, add $\tilde y_i$ to $Y_i$ where $\tilde y_i \in \partial f_i(\hat x_i)$.

[Figure: $f_i(x_i)$ and its outer linearization $\underline{f}_{i,Y_i}(x_i)$; current point $\hat x_i$ with slope $\hat y_i$; new slope $\tilde y_i$.]

Inner: For each $i \in \bar I$, add $\tilde x_i$ to $X_i$ where $\tilde x_i \in \partial h_i(\hat y_i)$.

[Figure: $f_i(x_i)$ and its inner linearization $\bar f_{i,X_i}(x_i)$; slope $\hat y_i$ at $\hat x_i$; new point $\tilde x_i$.]

Mathematically Equivalent Dual Algorithm

Instead of solving the primal approximate EMP
$$\min_{(x_1,\ldots,x_m) \in S} \ \sum_{i \in I} f_i(x_i) + \sum_{i \in \underline{I}} \underline{f}_{i,Y_i}(x_i) + \sum_{i \in \bar I} \bar f_{i,X_i}(x_i)$$
we may solve its dual
$$\min_{(y_1,\ldots,y_m) \in S^\perp} \ \sum_{i \in I} h_i(y_i) + \sum_{i \in \underline{I}} \underline{h}_{i,Y_i}(y_i) + \sum_{i \in \bar I} \bar h_{i,X_i}(y_i)$$
where $\underline{h}_{i,Y_i}$ and $\bar h_{i,X_i}$ are the conjugates of $\underline{f}_{i,Y_i}$ and $\bar f_{i,X_i}$.

Note that $\underline{h}_{i,Y_i}$ is an inner linearization, and $\bar h_{i,X_i}$ is an outer linearization (roles of inner/outer have been reversed).

The choice of primal or dual is a matter of computational convenience, but does not affect the primal-dual sequences produced.
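Why the roles reverse can be seen from a short computation, sketched here with the component index $i$ dropped and assuming the outer linearization has the form $\underline{f}_Y(x) = \max_{y \in Y} \{ f(x_y) + y'(x - x_y) \}$ with $y \in \partial f(x_y)$.

```latex
% Fenchel equality at x_y:  y \in \partial f(x_y)  iff  f(x_y) + h(y) = y'x_y
\begin{align*}
\underline{f}_Y(x) &= \max_{y \in Y} \big\{ y'x - h(y) \big\}, \\
\underline{h}_Y(\lambda) &= \sup_{x} \big\{ \lambda'x - \underline{f}_Y(x) \big\}
  = \begin{cases}
      \min\limits_{\sum_{y \in Y} \alpha_y y = \lambda,\ \sum_{y \in Y} \alpha_y = 1,\ \alpha_y \ge 0}
        \ \sum_{y \in Y} \alpha_y h(y) & \text{if } \lambda \in \mathrm{conv}(Y) \\
      \infty & \text{otherwise}
    \end{cases}
\end{align*}
```

The last expression is the inner linearization of $h$ over the set $Y$: the conjugate of a maximum of finitely many affine functions is the convex hull of their conjugate data.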

Comments on Polyhedral Approximation Algorithm

In some cases we may use an algorithm that solves simultaneously the primal and the dual.
- Example: Monotropic programming, where $x_i$ is one-dimensional.
- Special case: Convex separable network flow, where $x_i$ is the one-dimensional flow of a directed arc of a graph, and $S$ is the circulation subspace of the graph.

In other cases, it may be preferable to focus on solution of either the primal or the dual approximate EMP.

After solving the primal, the refinement of the approximation ($\tilde y_i$ for $i \in \underline{I}$, and $\tilde x_i$ for $i \in \bar I$) may be found later by differentiation and/or some special procedure/optimization.
- This may be easy, e.g., in the cutting plane method, or
- This may be nontrivial, e.g., in the simplicial decomposition method.

Subgradient duality [$y \in \partial f(x)$ iff $x \in \partial h(y)$] may be useful.

Cutting Plane Method for $\min_{x \in C} f(x)$

EMP equivalent: $\min_{x_1 = x_2} f(x_1) + \delta(x_2 \mid C)$, where $\delta(x_2 \mid C)$ is the indicator function of $C$.

Classical cutting plane algorithm: Outer linearize $f$ only, and solve the primal approximate EMP. It has the form
$$\min_{x \in C} \ \underline{f}_Y(x)$$
where $Y$ is the set of subgradients of $f$ obtained so far. If $\hat x$ is the solution, add to $Y$ a subgradient $\tilde y \in \partial f(\hat x)$.

[Figure: $f(x)$ over $C$ with cuts $f(x_0) + \tilde y_0'(x - x_0)$ and $f(\hat x_1) + \tilde y_1'(x - \hat x_1)$; iterates $x_0, \hat x_1, \hat x_2, \hat x_3$ and $x^*$.]
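The cutting plane iteration can be sketched in a few lines for a scalar problem, assuming $C = [lo, hi]$ is an interval. In that case the piecewise linear model attains its minimum at an endpoint or at an intersection of two cuts, so no LP solver is needed. The function name `cutting_plane_1d`, the stopping tolerance, and the test problem $f(x) = x^2$ on $[-1, 2]$ are illustrative assumptions.

```python
def cutting_plane_1d(f, subgrad, lo, hi, max_iter=50, tol=1e-8):
    """Minimize a convex scalar f over [lo, hi] by outer linearization."""
    cuts = []      # each cut is (slope y, intercept f(x_y) - y * x_y)
    x_hat = lo     # arbitrary starting point in C
    for _ in range(max_iter):
        # Enlarge Y: add a subgradient of f at the current point.
        y = subgrad(x_hat)
        cuts.append((y, f(x_hat) - y * x_hat))

        # Outer linearization: max of all cuts collected so far.
        model = lambda x: max(a * x + b for a, b in cuts)

        # Candidate minimizers of the model: endpoints and pairwise cut intersections.
        candidates = [lo, hi]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                (a1, b1), (a2, b2) = cuts[i], cuts[j]
                if a1 != a2:
                    x = (b2 - b1) / (a1 - a2)
                    if lo <= x <= hi:
                        candidates.append(x)
        x_hat = min(candidates, key=model)

        # The model underestimates f, so this gap bounds the suboptimality of x_hat.
        if f(x_hat) - model(x_hat) <= tol:
            break
    return x_hat


x_cp = cutting_plane_1d(lambda x: x * x, lambda x: 2.0 * x, -1.0, 2.0)
```

In higher dimensions the model minimization becomes a linear program when $C$ is polyhedral, which is the setting the comparison slide refers to.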

Simplicial Decomposition Method for $\min_{x \in C} f(x)$

EMP equivalent: $\min_{x_1 = x_2} f(x_1) + \delta(x_2 \mid C)$, where $\delta(x_2 \mid C)$ is the indicator function of $C$.

Generalized Simplicial Decomposition: Inner linearize $C$ only, and solve the primal approximate EMP. It has the form
$$\min_{x \in \bar C_X} f(x)$$
where $\bar C_X$ is an inner approximation to $C$.

Assume that $\hat x$ is the solution of the approximate EMP.
- Dual approximate EMP solutions:
$$\big\{ (\hat y, -\hat y) \mid \hat y \in \partial f(\hat x),\ -\hat y \in (\text{normal cone of } \bar C_X \text{ at } \hat x) \big\}$$
- In the classical case where $f$ is differentiable, $\hat y = \nabla f(\hat x)$.
- Add to $X$ a point $\tilde x$ such that
$$\tilde x \in \arg\min_{x \in C} \hat y'x$$

Illustration of Simplicial Decomposition for $\min_{x \in C} f(x)$

[Figure: level sets of $f$ over $C$; iterates $x_0, \hat x_1, \ldots, \hat x_4 = x^*$; extreme points $\tilde x_1, \ldots, \tilde x_4$; gradients $\nabla f(x_0), \nabla f(\hat x_1), \nabla f(\hat x_2), \nabla f(\hat x_3)$; current inner approximation $\mathrm{conv}(X_k)$.]

Slopes: endpoints of ∂!f(x) fx(d)

Slopes: endα points of ∂!f(x)

yk xk+1 Slope gˆk = − ck α

1 y x k− k+1 2 "k =Slδopk e gˆk = xk+1c yk − 2ck $ k − $

Slope = 0 1 2 "k = δk xk+1 yk is "k-subgradient − 2ck $ − $

Slope = 0 γk is "k-subgradient 1

γk

1 f(x0) + y0! (x x0) Polyhedral Approximation Extended Monotropic Programming− Special Cases f(x0) + y0! (x x0) f(x1) + y (x x1) − 1! − f(x1) + y1! (x x1) x h2(λ) − h1(λ) h2(λ∗) − h1(λ∗) x∗ (−λ∗, 1) (−λ, 1) Slope = λ − Slope = λ∗ f(xy) +y!(x xy) n − Dual Views for minx∈< f(xfy)1+(yx!(x) +xy) f2(x) f(x1) + (x x1)!y1 − f1(x) f2(x) − f(x1) + (xf(x0x)1+)!yy10! (x x0) Differentiableff(xN0)on+dyiff0! e(xrentixa0b)le f Out−er Lineariza−tion of f Slope λ f(x−0) + y˜0! (x x0) (a) (b) dom(f)DRifferRentRiable f Nondifferentia−bfle(xf1)O+utye!r(xLinexa1r)ization of f Slope λ Inner linearize f2 0 f1 (x12) + y! (x x1) 1 1 − − y y1 y2dyomh(yf)) (R0yR, 11) Rxˆf2x¯(xˆ1) + y˜1! (x xˆ1) minimize f(x) = x1 + x2 − − y y1 y2 y h(y) ( y, 1) f(x ) + y˜ (x x ) − f(xy) +0 y!(x 0! xy) 0 subject to g(x) = x1 ≤ 0 Const. f(xfy()x+) yS!l(oxpe: xy)yˆ f¯ (x) h¯ (x) f (x−) 1 − 2,X2 2,X2 −2 − f(x0) −+ y! (x x0) Cofn(sxt.) +fy1((xx) S0lxop)e:f(xf1)(fx+ˆ1(()xx+) fy˜2x((x1x)) yf1 x(ˆx)) x ∈ X = (x1, x2) | x1 > 0} f(x1) + (0x x10! ) y1 0 − 1 1! ! 2 1 −− ! − − ∇ − − ! Difff(exren)ft+i(axyb1l)e(+xf yN1!x(oxn)diffxe1r)entiable f Outer Linearization of f Slope λ Differentiable f Nondifferentiab1le f O1! uter L1in−earization 2of f Slope λ f ∗ = ∞, q∗ = −∞ ¯ − c " dom(f) R Rh2R( y) h2( doym) (f)yˆkR0CoRn1stR. 2 h1(y) conv(2Xk) 0 1 −2 − − − f−(x02) + y0!c("x x0) y y1¯ y2 y h(y) ( y, 1) xˆ − y y1 y2 y h(y) ( y, 1h)2( y) h2( y) yˆk Const. h1(y) conv(Xk) − f(xy−)f+(xyy−!)(x+ −y!x(xy) xy−) − 2 2 − x x x x− −g g f(x1) + y! (x x1) S = (x , x) | x > 0 ∗ k k+1 k+1 k k+1 1 − f(x1)f+(xx(1x∗)C+xoknx(sx1t)k.!+y1x1fx)1k!(+yx11) gSklogpke+:1 yˆ f2(x) f (x) ! " f(Cxo0n)s+t. y! (fx1(x)x0) f1(x) f2(x)−f (x) − 2 −0 − − ∇ − − 2 2 DifferDenifftiearbelnetifSloabNlpeoe:nfdxNiffoenrednSloifftieaprbee:lnetxifabiOleutfkerOxLuitneeraLrifizn(aextairo)inzxaotfiofnSolfopfeSλlope λ q(µ) = min x + µx k+1 k+1f(xy) +k y!(kx xy) x∈# f(x1) + y1! (x x1) ≤ ∇ − ! 
" dom(fdo)mR(0fR) R1 0RR2 1 −SloR2pe: xk+1 Slope: xi, i k xk+1 f(xk) xk 2 ¯ c "2 ≤ ∇ c " −1/(4µ) if µy >y1 0yy2 y¯y hy(yy) (h(yy), 1() xˆy,h12)(xˆ yx˜) h2( y) yˆkfC(xo1n)st+. (xh1(yx)x1)gyk+11 (x)cogn(vx()XConst.k) f(x) Slope f(xk) xk+1 f(xk) xk = h2( y) h12( 2 y) yˆk Const. h1(y) conv(Xk) ! − − − − −− −− −2 − − − 2 − ∇ ∇ x h2(λ) − h1(λ) h2(λ∗)#−−h∞1(λ∗) xi∗f µ(−≤λ0∗f, 1()x )(−+λy, 1()x Slxop)e = λ X x0 x1 xy2 x3 x!0 x1 xD2yiffxe3rxen4 txia4b=lexf∗ Nofn(dx0iff)erefn(txia1)bleff(xO2u) terf(Lxi3n)earization of f Slope λ ∗ Const. f−1(x) Slope: f1(x∇x) xfk2(x)k∇+f1 (xxk)+1∇gk gk+1∇ Slope = λ X x0 xx1 xx2 kx3xkx+01xx1k+x12 gxk3 gxk4+x14 ∗= x f(2x0) f(x1) f(x2) f(x3) minimize f(x) = −x ∗− dom(f) R−0 R∇1 R2 ∗ fC(xon1)st+. (xf (xx)1S)!lyo1pe: yˆ f¯ (x) h¯ ∇( y) h∇( y) f∇(x) ∇ Level sets of f1− 2,X2 2,X2 2 2 subject to g(xf)(=x0x) +− y10/! (2x≤ 0x0) − y y1 y2 −y h(y) ( y, 1) xˆ x¯− X−x0 x1 x2 x3 x0 x1 x2 x3 x4 x4 = x∗ f(x0) f(x1) f(x2) f(x3) 1 2 − Level sets of f Dual view:Diff Outerfe(rxen)tiafbl(e linearizexf) Nondifferenhtia2ble f Outer LiSloneapre:izaxtki−+o1nSloof pfe:Slxoip, eiλ k xk+1 f(xk) xk ∇ ∇ ∇ ∇ Slope: xk+1 Slope: xi, i k xk+1 Nf(xkc)"2xk x ∈f(Xx1=) +{0y, 1! (}x x1) ≤ ∇ 1 h2( y) h¯2( y) yˆk Con≤st. h1(y∇) conv(Xk) Level sets of f dom(f) R0 R1 R2 − max pixi N (a) (b) − − − N − − 2 wixi W 2 y y1 y2 y h(y) ( y, 1) xˆ x¯ i=1 ≤ maxi=1 pcix" i¯ ¯ q(µ) = min −x + µ(x − 1/2) ¯ Const. ff(N1x(0x))+Slyo!p(ex: xyˆ0)f2,X2 (x) h2,X2 ( y) h2( y) f (x) N x∈{0,1} − h2( y) h2( y) xiyˆ=k0 Coro1nst.w "x hW10(y) conv(Xk) 2 x∗ xk xk+1 x−k+1 gki gik≤+1 i=−−1 − − minimize f(x) =! fx(1x+y)x+2 y!(x "xy)f(x−0)X+ yx0!−(x1 x−2x0x)3 x0 ix=11 x−2 x3 x4 x−4 =2x∗ f(x0) f(x1) f(x2) f(x3) X x0 x1 x2 x3 x0 x1 x2 x3 x4 x4 = x∗ f(xx=00) or f1(x1)"f(x2) f(x3) max pixi − −! i ∇ ∇ ∇ ∇ N = min{−µ/2, µ/2 − 1} Subgradients Template∇ f(x1)∇+ y1! (x ∇ x1) ∇ wixi W 1 ! i=1 ≤ i=1 subject to g(x) =fx(x1≤) +0 (x x1) y1f(x1) + y! (x x xxˆ1) x˜ xˆ g g − Const. f (x) Slope: ! 
yˆSuf¯bgra(dxLev1i)eh¯netls∗sTets(ekmoyfpk)+flha1te(k+y1) fk (xk+)1 xi=0 or 1 " ∗ Lev1 el sets− of fSlope:2,Xx2k+1 Slo2−,pXe:2 xi, i 2 k xk+1 2 f(xk) xk (0, f ) = (0, 0) (−1/2, 0) (1/2−, −1) − − − f(x0) + y! (x x0) x ∈ X = (x1, x2) | x1 > 0} f≤x(d) ∇ c "2 ! 0 −1/2 1Differentiable f Nondifferentiable f Outer Linearizationhof(f ySf)lo(h¯xpe()λ+y)y (xyˆ xCo)nst.N Subgrh (y) adientscTonemplatv(X ) e − ! 2 N 2y fx(!d) k y 1 k dom(f) R0 R1 R2 Slopes:f(exndpy)Slo+oipnye:t!s(xx˜okf +∂1!xfySlo()x−)pe: x˜ i − k x−km+1−ax f(xk) x−kpixi − 2 f(x1) + y1! (x x1) f ∗ = ∞, q∗ = −∞ max− pixi ≤ N ∇ − N wixi W y y1 y2 y h(y) ( y, 11) xˆ x¯2 1 2 Slopes: endpw xoinWts of ∂f!(fx(1x))+ (xi=1 x1)≤!y1 i=1 fx(d) q(µ) = min x − x +Xµx(0x x+1 xx2 x−31x)0 x1 x2 x3i xi≤4 x4 =i=x12∗ f(x0) f(x1) f(x2) f(x3) 0 −0 f(x1) + i(=x1 x1) y1c " xx −=x0 orx1 x" g g x1≥ , x2≥ ¯ x =0 ! " ∇ ∇∗i k k+∇1 k+1∇k k+1 !h2( y) h2( y) yˆk Co"nst. i h−1o(ry1) cαonv(Xk) − − − Different−iable f N−ond2ifferentia!ble f Outer LineaSlropizaes:tionendofpfoinStslopofe ∂λ!f(xf)(xy) + y!(x xy) Const−. µ f1i(fxµ)2S≥lo1pe: yLevˆ f¯2e,lXsets(x)oh¯f 2f,X!( yS)ufbg(rxa)dients Tempαlate DifferentSiaubblgerfadNieo2ntdsiffTeeremnyp2tkilaabxtlkee+1f 2Outer Linearization of f Slope λ − = S−= (x , x) | x−>X0 x0 xˆ1 xˆd2omxˆ3(fxˆ)4 −Rx00 x˜R11x˜R2 2x˜3 x˜4 xˆ4 = x∗ f(x0) f(x1) f(x2) f(x3) # −∞ if µ < 1 Slope gˆk = c dom(f) R0 R1 xR2 xk xk+1 xk+1k gkykgkx+k1+1 Slope: xk∇+1 Slop∇e: x˜ i ∇k xk+1∇ f(xk) xk f(x1)α+ (x x1)!y1 ! " ∗ y Syl1opye2 gˆyk h=(y) −( y, 1) xˆ x¯N ck fx(d) ≤ ∇ − y0 y1 y2 y h(y) ( y, 1) fx−(d) 2 Level sets of f max 1 pixi y x q(µ) = min x + µx − 2 N 2 Differentiakblekf+1Nondifferentiable f Outer Linearization of f Slope λ q(µ) = min |x1| + x2 + µx1) c " "k =wδxk W xk+1 yk Slope gˆk = − xx∈0# Slopes: ieindp≤ oini=ts1 of1∂!f(x) ck h2( y) h¯2( 2y≥) yˆk CSloonpses:t. endph1(oyin)ts of ∂!fc(xo)nvi=(1Xk) − 2ck $ − $ 2 dom(f) R0 R1 R2 !! 
Slope:" xk+1"Slope: x i k xxki=+01or"1kf(=xkδ")k xk N xk+1 yk − − − − − 2 ≤ ∇ 2 2¯c ¯ 0−1/(4ifµ|)µ| ≤if 1µ > 0 Const. f1(x) Slope: c "−yˆ f2k,X$2 (x) h−2,X2$( y) hy2(y1 yy)2 fy h(x(y)) ( y, 1) = ¯ X x!0 x1 x2 mxa3xx0 x1 x2 xpi3xix4 xα4 = x∗ f(x0) f2(x1) f(x2) f1(x3) = h2( y)ShlSo2up(be g=yr)a0dieyˆnktsCTone−smt.plathe1α(yN) − conv(Xk) − − #− = δ x y 2 ##−−x∞∞xif x|µ| ≥if 1xµ ≤ 0g g wixi W ∇ ∇ ∇k k ∇ k+1 k ∗ k −k+1 k+−1isS"lko-−pseukb+=g1r0adient − i=1 − ≤2 i=1 − 2ck # − # k x =0yor 1x " y x i k k+1 k iks+"1 -subgSrlaLevodpieeengˆltks=ets o−f f 2 Slope gˆk = − k ! fx(d) ck c " minXiqm(xµi0z)e=x1 ixfn2f(xx)3p=(xu0)−+x1µx%u2 x3 xx∗4cxkxk4 x=k+x1∗ xkf+(1xg0k) gkf+(1x1) f(x2) f(x3)Slope = 0 ¯ u∈#r Subgradients∇Templa∇te 1 ∇ ∇ 2 h2( y) h2( y) yˆk Const. h1(y) conv(Xk) Slope: xk+1 Slope: x i k xk+1 f(xk) xk c " 2 subject to g(x!) = x −Slo1/2"pes:≤ 0endpoints of ∂!¯f(x) 1 isN#k-subgra−dient − − − − ≤ ∇h2( y) h21( y) yˆk Cons1t. h1(y) con2v(Xk) Level sets of f " =−δ − x − y "k2 = δk− x−k+12 yk k k k+1 fx(kd) − m2cakx$ − p$ixi (xλ,∈µ,X1)=Slo{0p,e:1}xk+1 Slope: xi, −i 2ckk $xk+1 −f(x$k) xk N x∗ xk xk+1 xk+1 gk gk+1 wixi W ≤ α ∇ i=1 ≤ i=1 γk N x xk xk+1 xk+1x g=k0 gk+1 " Slope = 0 Slopes: endpSlopoein=ts 0of ∂!f(x∗) i or 1 X x0 x1 x2q(xµ3) x=0 xm1 ixn2 x−31x4+xµ4(=x −x∗1/2f)(x0)yk xfk(+x11) f(x2) f(x3) x∈{0,1} Slope∇gˆk m=ax∇− is ∇"p-isxuibgr∇adient ! is " -subgradNient ck k Slope: xk+1 Slope: xi, i k xk+1 f(xk) xk k w x W Subgradients Template x ! " i i≤ i=1 ≤ ∇ X x0 x1 x2 x3 x0 x1 x2i=x13 x4 x4 = x f(x0) fα(x1) f(x2) f(x3) Level sets o=f fmin{−µ/2, µ/2 − 1} xi=0 or 1 "Slo∗pe: xk+1 Slope: x i k xk+1 f(xk) xk ∇ 1 ∇ ∇ 2≤ ∇1 ∇ " = δ 1 x y Slope = supg ∂!f(x) d$g ! k k k+1 k fx(d) ∈ (0, f ∗) = (0, 0) (−1/2S, 0u)bLevg(1ra/e2dl,is−eetsn1t)sofTfempNlate − 2ck $ 1 − $ X x x x x x x x x x x = x f(x ) f(x ) f(x ) f(x ) −1/2 1 max pixi Slopes: endpoints of ∂ f(x) 0 1 2 3 0 1 2 3 4 x4 ∗ 0 1 2 3 N Slope = 0 ! 
∇ ∇ ∇ ∇ wixi W N i=1 ≤ iX=1 x0 x1fxx(2d)x3 x0 x1 x2 x3 x4 x4 = x∗ f(x0) f(x1) f(x2) f(x3) xi=0 or 1is "k"-subgradient Slope = infg ∂!f(x) d$g max pixi ∇ ∇ Lev∇el sets∈o∇f f q(µ) = min x1 − x2 + µ(x1 + x2 −N1) α x1≥0, xSlo2≥0pes:!endpoints of ∂!f(x) wixi W Subgradients Template Levi=e1l sets≤ of fi=1 ! x =0"or 1 " 1 x + αd N −µ if µ ≥ 1 i = ! α 1 max pixi Subgradients Template N of ∂!f(x) N # −∞ if µ < 1 fx(d) wixi W i=1 ≤ i=1 max pixi x =0 or 1 " yk xk+1 N i − wixi W Slopes: endpoiSnltospoefgˆ∂k!f=(x) i=1 ≤ i=1 F (α) = f(x + αd) 1 2ck 1 fx(d) d! q(µ) = min |x | + x + µx ) xi=0 or 1 " Subgradients Template x2≥0 ! Slo! pes: endpαoints o"f ∂!f(x1) 0 if |µ| ≤ 1 Subgradients Tem2 plate = "k = δk xk+1 yk x, f(x)fx(#d) y x − 2ck $ − $ − k−#k+−1∞ if |µ| ≥ 1 α Slope gˆk = c # $ k fx(d) Slopes: endpoints of ∂!f(x)1 yk % xk+1 1 q(µ) = Silnofpre pgˆ(u=) + µ−u u∈# k 1 c Slopes:2 endpoints of ∂!f(x) "k =!δk xk"k+1 yk α − 2ck $ − $ 1 y x (λ, µ, 1) 2 α k− k+1 Slope = 0 "k = δk xk+1 yk Slope gˆk = c − 2ck $ − $ k y x is "k-subgradient k− k+1 1 Slope gˆk = c 1 Slope = 0 k 2 "k = δk xk+1 yk is "k-subgrad1ient − 2ck $ − $ 1 2 "k = δk xk+1 yk − 2ck $ − $Slope = 0 γk is "k-subgradient 1 1 γk


Convergence - Polyhedral Case

Assume that:

All outer linearized functions $f_i$ are finite polyhedral.
All inner linearized functions $f_i$ are co-finite polyhedral.
The vectors $\tilde y_i$ and $\tilde x_i$ added to the polyhedral approximations are elements of the finite representations of the corresponding $f_i$.

Finite convergence: The algorithm terminates with an optimal primal-dual pair.

Proof sketch: At each iteration there are two possibilities:

Either $(\hat x, \hat y)$ is an optimal primal-dual pair for the original problem,
or the approximation of one of the $f_i$, $i \in I \cup \bar I$, will be refined/improved.

By assumption there can be only a finite number of refinements.
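The finite-termination argument can be illustrated on the simplest special case, a one-dimensional cutting plane method for a polyhedral cost. The following is a minimal sketch, not the general algorithm of the talk: the function `PIECES`, the interval `[LO, HI]`, and all helper names are hypothetical, and exact rational arithmetic is used so that the optimality test is an exact equality.

```python
from fractions import Fraction

# Hypothetical finite representation of a polyhedral f: f(x) = max_j (a_j*x + b_j).
PIECES = [(Fraction(-2), Fraction(0)), (Fraction(-1, 2), Fraction(1)),
          (Fraction(1), Fraction(-2)), (Fraction(3), Fraction(-8))]
LO, HI = Fraction(-5), Fraction(5)  # constraint set X = [LO, HI] (assumed)


def f_and_piece(x):
    """Return f(x) and an active piece (a, b); its slope a is a subgradient at x."""
    return max((a * x + b, (a, b)) for a, b in PIECES)


def minimize_model(cuts):
    """Minimize the outer linearization max_j (a_j*x + b_j) over [LO, HI].

    A 1-D piecewise linear function attains its minimum at an endpoint or at
    a breakpoint, so it suffices to enumerate pairwise intersections of cuts.
    """
    candidates = {LO, HI}
    for i, (a1, b1) in enumerate(cuts):
        for a2, b2 in cuts[i + 1:]:
            if a1 != a2:
                x = (b2 - b1) / (a1 - a2)
                if LO <= x <= HI:
                    candidates.add(x)
    return min((max(a * x + b for a, b in cuts), x) for x in candidates)


def cutting_plane(x0):
    """Return (minimizer, optimal value, number of model solves, cuts used)."""
    _, piece = f_and_piece(x0)
    cuts = [piece]
    for iteration in range(1, len(PIECES) + 1):
        model_value, x_hat = minimize_model(cuts)
        fx, piece = f_and_piece(x_hat)
        if fx == model_value:
            # The model is tight at its own minimizer, so x_hat minimizes f.
            return x_hat, fx, iteration, cuts
        # Otherwise the active piece lies strictly above the model at x_hat,
        # so it is not yet among the cuts: a genuine refinement.
        cuts.append(piece)
    raise AssertionError("unreachable: only len(PIECES) distinct cuts exist")


x_star, f_star, iterations, cuts = cutting_plane(Fraction(-5))
print(x_star, f_star, iterations)
```

Each pass either certifies optimality or adds a piece from the finite representation that the model did not yet contain, which mirrors the two possibilities in the proof sketch; the loop bound `len(PIECES)` is the corresponding finite bound on refinements.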

Convergence - Pure Cases

Asymptotic convergence results also available in other special cases, where nonpolyhedral functions fi are outer/inner linearized. Examples are the classical cutting plane and simplicial decomposition methods, and related algorithms.
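For contrast with the polyhedral case, the sketch below runs the classical cutting plane method on a nonpolyhedral function, where termination is in general only asymptotic. It is an illustration under assumptions: the test function $f(x) = x^2$, the interval $[-1, 2]$, the tolerance, and all names are hypothetical choices, not taken from the talk.

```python
def f(x):
    return x * x  # illustrative nonpolyhedral convex function (assumed)


def tangent_cut(x):
    """Affine minorant of f built at x, stored as (slope, intercept)."""
    return (2.0 * x, -x * x)


def minimize_model(cuts, lo, hi):
    """Minimize max_j (a_j*x + b_j) over [lo, hi] by enumerating breakpoints."""
    candidates = [lo, hi]
    for i, (a1, b1) in enumerate(cuts):
        for a2, b2 in cuts[i + 1:]:
            if a1 != a2:
                x = (b2 - b1) / (a1 - a2)
                if lo <= x <= hi:
                    candidates.append(x)
    return min((max(a * x + b for a, b in cuts), x) for x in candidates)


def kelley(x0, lo=-1.0, hi=2.0, tol=1e-6, max_iter=50):
    """Return the last model minimizer and the history of gaps f(x_hat) - model."""
    cuts = [tangent_cut(x0)]
    gaps = []
    for _ in range(max_iter):
        model_value, x_hat = minimize_model(cuts, lo, hi)
        gaps.append(f(x_hat) - model_value)  # nonnegative: the model minorizes f
        if gaps[-1] <= tol:
            break
        cuts.append(tangent_cut(x_hat))  # refine the outer linearization
    return x_hat, gaps


x_hat, gaps = kelley(2.0)
print(x_hat, len(gaps), gaps[-1])
```

Here the gap between $f$ and its outer linearization shrinks toward zero but no finite set of cuts reproduces $f$ exactly, so the method is stopped by a tolerance rather than by an exact optimality test.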

Convergence, pure outer linearization ($\bar I$: empty). Assume that the sequence $\{\tilde y_i^k\}$ is bounded for every $i \in I$. Then every limit point of $\{\hat x^k\}$ is primal optimal. Proof sketch: For all $k$, $m \le k - 1$, and $x \in S$, we have

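$$\sum_{i \notin I} f_i(\hat x_i^k) + \sum_{i \in I} \Bigl( f_i(\hat x_i^m) + (\hat x_i^k - \hat x_i^m)' \tilde y_i^m \Bigr) \le \sum_{i \notin I} f_i(\hat x_i^k) + \sum_{i \in I} \underline{f}_{i, Y_i^{k-1}}(\hat x_i^k) \le \sum_{i=1}^m f_i(x_i)$$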

Let $\{\hat x^k\}_K \to \bar x$ and take the limit as $m \to \infty$, $k \in K$, $m \in K$, $m < k$.
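The limiting step can be spelled out as follows. This is a sketch under assumptions that are standard in this setting but not restated on the slide: each $f_i$ is closed (lower semicontinuous) and $S$ is closed, so that $\bar x \in S$. As on the slide, $m$ denotes the number of component functions when it appears as the upper limit of a sum and an iteration index elsewhere.

$$\lim_{\substack{k,\, m \to \infty \\ k,\, m \in K,\ m < k}} (\hat x_i^k - \hat x_i^m)' \tilde y_i^m = 0, \qquad i \in I,$$

since $\hat x^k$ and $\hat x^m$ both tend to $\bar x$ along $K$ and $\{\tilde y_i^m\}$ is bounded. Lower semicontinuity of the $f_i$ then gives, for every $x \in S$,

$$\sum_{i=1}^m f_i(\bar x_i) \le \liminf_{\substack{k,\, m \to \infty \\ k,\, m \in K,\ m < k}} \Bigl[ \sum_{i \notin I} f_i(\hat x_i^k) + \sum_{i \in I} f_i(\hat x_i^m) \Bigr] \le \sum_{i=1}^m f_i(x_i),$$

so that $\bar x$ is primal optimal.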

Exchanging the roles of primal and dual, we obtain a convergence result for the pure inner linearization case.

Convergence, pure inner linearization ($I$: empty). Assume that the sequence $\{\tilde x_i^k\}$ is bounded for every $i \in \bar I$. Then every limit point of $\{\hat y^k\}$ is dual optimal.

Concluding Remarks

A unifying framework for polyhedral approximations based on EMP.
Dual and symmetric roles for outer and inner approximations.
There is the option to solve the approximation using a primal method or a mathematically equivalent dual method, whichever is more convenient/efficient.
Several classical methods and some new methods are special cases.
Proximal/bundle-like versions: Convex proximal terms can be easily incorporated for stabilization and for improvement of the rate of convergence. Outer/inner approximations can be carried from one proximal iteration to the next.
Convergence theory so far inspires confidence in the validity of the method.
More work on complexity/rate of convergence theory is needed.