Lecture 9: Revised Simplex and Interior Point Methods (a nice application of what we have learned so far)

Prof. Krishna R. Pattipati, Dept. of Electrical and Computer Engineering, University of Connecticut. Contact: [email protected], (860) 486-2890

ECE 6435, Fall 2008: Advanced Numerical Methods in Scientific Computing. October 22, 2008

Copyright ©2004 by K. Pattipati

Lecture Outline

• What is Linear Programming (LP)?
• Why do we need to solve Linear Programming problems?

  − $L_1$ and $L_\infty$ curve fitting (i.e., parameter estimation using 1- and ∞-norms)
  − Sample LP applications: transportation problems, shortest-path problems, optimal control, diet problem
• Methods for solving LP problems
  − Revised Simplex method
  − Ellipsoid method ... not practical
  − Karmarkar's projective scaling (interior point method)
• Implementation issues of the least-squares subproblem of Karmarkar's method ... more in the Linear Programming and Network Flows course
• Comparison of Simplex and projective methods

References
1. D. Bertsimas and J. N. Tsitsiklis, Introduction to Linear Optimization, Athena Scientific, Belmont, MA, 1997.
2. I. Adler, M. G. C. Resende, G. Veiga, and N. Karmarkar, "An Implementation of Karmarkar's Algorithm for Linear Programming," Mathematical Programming, Vol. 44, 1989, pp. 297-335.
3. I. Adler, N. Karmarkar, M. G. C. Resende, and G. Veiga, "Data Structures and Programming Techniques for the Implementation of Karmarkar's Algorithm," ORSA Journal on Computing, Vol. 1, No. 2, 1989.

What is Linear Programming?

• One of the most celebrated problems since 1951
• Major breakthroughs:
  − Dantzig: Simplex method (1947-1949)
  − Khachian: Ellipsoid method (1979) - polynomial complexity, but not competitive with the Simplex method → not practical
  − Karmarkar: Projective interior point algorithm (1984) - polynomial complexity and competitive (especially for large problems)

• LP Problem Definition
• Given:
  − an $m \times n$ matrix $A \in \mathbb{R}^{m \times n}$, $m \le n$; assume $\mathrm{rank}(A) = m$
  − a column vector $b$ with $m$ components: $b \in \mathbb{R}^m$
  − a row vector $c^T$ with $n$ components: $c \in \mathbb{R}^n$
• $Ax = b$ has infinitely many solutions: $b = \sum_{i=1}^{n} a_i x_i$
• Consider $x = x_r + x_n$, where $x_r \in \mathcal{R}(A^T)$ with $Ax_r = b$, and $x_n \in \mathcal{N}(A) = \{x_n : Ax_n = 0\}$

Standard form of LP

• We impose two restrictions on $x$:
  1) We want nonnegative solutions of $Ax = b$: $x_i \ge 0$, i.e., $x \ge 0$. Vectors $x$ such that $Ax = b$ and $x \ge 0$ are said to be feasible.
  2) Among all feasible $\{x\}$, we want $x^*$ such that $c^T x = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$ is a minimum.
• This leads to the so-called "standard form of LP":

  (SLP): $\min\, c^T x$ s.t. $Ax = b$, $x \ge 0$

  a convex programming problem: if a bounded solution exists, every local minimum is global, so there is a single minimum value.
• Claim: any LP problem can be converted into standard form.
• Inequality constraints:
  a) $a_i^T x \le b_i \;\Rightarrow\; a_i^T x + x_{n+1} = b_i$, $x_{n+1} \ge 0$; $x_{n+1}$ is a slack variable.

  In general: $Ax \le b \;\Rightarrow\; Ax + y = b \;\Rightarrow\; [A \;\; I]\begin{bmatrix} x \\ y \end{bmatrix} = b$, $x \ge 0$, $y \ge 0$.

  This increases the number of variables by $m$, and the augmented matrix $[A \;\; I]$ is $m \times (n+m)$.
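To make the claim concrete, here is a minimal Python/numpy sketch of the slack-variable conversion just described (the example matrices are made up for illustration):

```python
import numpy as np

def to_standard_form(A, b, c):
    """Convert  min c^T x  s.t. A x <= b, x >= 0
    into      min c_s^T z s.t. A_s z = b, z >= 0
    by appending one slack variable per inequality row."""
    m, n = A.shape
    A_s = np.hstack([A, np.eye(m)])         # [A | I], an m x (n+m) matrix
    c_s = np.concatenate([c, np.zeros(m)])  # slack variables carry zero cost
    return A_s, b, c_s

# Hypothetical 2-constraint, 2-variable example
A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([-1.0, -1.0])
A_s, b_s, c_s = to_standard_form(A, b, c)
print(A_s)   # 2 x 4: the original columns followed by the identity
```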

How to Convert Constraints into a SLP? - 1

• Inequality constraints (continued):
  b) $a_i^T x \ge b_i \;\Rightarrow\; a_i^T x - x_{n+1} = b_i$, $x_{n+1} \ge 0$; $x_{n+1}$ ~ surplus variable.

  In general: $Ax \ge b \;\Rightarrow\; [A \;\; -I]\begin{bmatrix} x \\ y \end{bmatrix} = b$, $y \ge 0$.

  c) $x_i \ge d_i$: define $\hat{x}_i = x_i - d_i$ with $\hat{x}_i \ge 0$.
  d) $x_i \le d_i$: define $\hat{x}_i = d_i - x_i$ with $\hat{x}_i \ge 0$.
  e) $d_{i1} \le x_i \le d_{i2} \;\Rightarrow\; 0 \le x_i - d_{i1} \le d_{i2} - d_{i1}$: define $\hat{x}_i = x_i - d_{i1}$ and $\hat{x}_i + y_i = d_{i2} - d_{i1}$, with slack $y_i \ge 0$.
  f) $b_{i1} \le a_i^T x \le b_{i2}$: use two slacks, $a_i^T x - y_{i1} = b_{i1}$ and $a_i^T x + y_{i2} = b_{i2}$; $y_{i1}, y_{i2} \ge 0$.
  g) $|a_i^T x| \le b_i \;\Leftrightarrow\; -b_i \le a_i^T x \le b_i \;\Rightarrow\; a_i^T x + y_{i1} = b_i$ and $-a_i^T x + y_{i2} = b_i$; $y_{i1}, y_{i2} \ge 0$.

How to Convert Constraints into a SLP? - 2

• $x_i$ is a free variable
  − Define $x_i = \bar{x}_i - \hat{x}_i$ with $\bar{x}_i \ge 0$ and $\hat{x}_i \ge 0$.
a) Maximization: change $\max\, c^T x$ to $\min\, -c^T x$.
b) $\min \sum_{i=1}^n |x_i|$ s.t. $Ax \le b$: write $Ax + y = b$ and $x_i = \bar{x}_i - \hat{x}_i$; then solve

  $\min \sum_{i=1}^n (\bar{x}_i + \hat{x}_i)$ s.t. $[A \;\; -A \;\; I]\begin{bmatrix} \bar{x} \\ \hat{x} \\ y \end{bmatrix} = b$

  The optimal solution of this problem solves the original problem.

Example of LP Problems

• $L_1$ curve fitting
  − Recall that given a set of scalars $\{b_1, b_2, \ldots, b_m\}$, the scalar estimate $x$ that minimizes $\sum_{i=1}^m |x - b_i|$ is the median, and that this estimate is insensitive to outliers in the data $\{b_i\}$.

Curve Fitting

• In the vector case, we want $x$ such that $\min_x \sum_{i=1}^m |a_i^T x - b_i| = \min_x \|Ax - b\|_1$
• $L_1$ curve fitting → an LP:
  Write $x = \bar{x} - \hat{x}$ and $a_i^T x - b_i = u_i - v_i$. Then the LP problem is:

  $\min_{x,u,v} \sum_{i=1}^m (u_i + v_i) = \min\, e^T(u + v)$
  s.t. $A(\bar{x} - \hat{x}) - u + v = b$; $\bar{x} \ge 0$, $\hat{x} \ge 0$, $u \ge 0$, $v \ge 0$

• $L_\infty$ curve fitting: want $x$ such that $\min_x \max_{1 \le i \le m} |a_i^T x - b_i| = \min_x \|Ax - b\|_\infty$
• $L_\infty$ curve fitting → an LP:
  Let $\max_{1 \le i \le m} |a_i^T x - b_i| = w$; then the problem is equivalent to:

  $\min_{x,w} w$ s.t. $-w \le a_i^T x - b_i \le w$ for $i = 1, 2, \ldots, m$

  or, in matrix form,

  $\min w$ s.t. $\begin{bmatrix} A & -e \\ -A & -e \end{bmatrix}\begin{bmatrix} x \\ w \end{bmatrix} \le \begin{bmatrix} b \\ -b \end{bmatrix}$

Transportation Problem - 1

• A remark on the $L_\infty$ problem above: since the number of constraints is large ($2m$) while the number of variables is small, typically the dual problem, which has $n+1$ equality constraints and $2m$ variables, is solved instead:

  $\max\; b^T(\lambda_1 - \lambda_2)$ s.t. $A^T(\lambda_1 - \lambda_2) = 0$, $e^T(\lambda_1 + \lambda_2) = 1$, $\lambda_1 \ge 0$, $\lambda_2 \ge 0$

• Transportation or Hitchcock problem (a special LP)
  − $m$ sources of a commodity or product, and $n$ destinations
  − amount of commodity to be shipped from source $i$: $a_i$, $1 \le i \le m$
  − amount of commodity to be received at destination (sink, terminal node) $j$: $b_j$, $1 \le j \le n$
  − shipping cost from source $i$ to destination $j$ per unit commodity: $c_{ij}$ dollars/unit
  − Problem: how much commodity should be shipped from source $i$ to destination $j$ to minimize the total cost of transportation?

Transportation Problem - 2

[Figure: bipartite graph with source nodes $i = 1, \ldots, m$ (supply nodes) on the left, destination nodes $j = 1, \ldots, n$ (sinks) on the right, and arcs labeled $(c_{ij}, x_{ij})$.]

  $\min_x \sum_{i=1}^m \sum_{j=1}^n c_{ij} x_{ij}$
  s.t. $\sum_{j=1}^n x_{ij} = a_i$, $i = 1, 2, \ldots, m$
     $\sum_{i=1}^m x_{ij} = b_j$, $j = 1, 2, \ldots, n$

• Conservation constraint: $\sum_{i=1}^m a_i = \sum_{j=1}^n b_j$
• $mn$ variables and $(m+n)$ constraints
• BIPARTITE GRAPHS: a special LP problem
• If $a_i = b_j = 1$ for all $i, j$: the assignment problem, or weighted bipartite matching problem.

Shortest Path Problem - 1

• Shortest-path problem
  − We formulate it as an LP for conceptual reasons only.

[Figure: directed graph with nodes $s$ (source), $u$, $v$, $t$; edge costs: $s \to u$: 2, $s \to v$: 4, $u \to v$: 1, $u \to t$: 5, $v \to t$: 3. The nodes $s, u, v, t$ are computers; edge lengths are the costs of sending a message between pairs of nodes.]

  − Q: What is the cheapest way to send a message from $s$ to $t$?
• Intuitively, $x_{sv} = x_{ut} = 0$, i.e., no messages are sent from $s$ to $v$ or from $u$ to $t$.
• Shortest path: $s \to u \to v \to t$, i.e., $x_{su} = x_{uv} = x_{vt} = 1$; shortest path length $= 2 + 1 + 3 = 6$.

Shortest Path Problem - 2

• LP problem formulation
  − Let $x_{ij}$ be the fraction of messages sent from $i$ to $j$.

  $\min\; 2x_{su} + 4x_{sv} + x_{uv} + 5x_{ut} + 3x_{vt}$
  s.t. $x_{su} \ge 0,\; x_{sv} \ge 0,\; x_{uv} \ge 0,\; x_{ut} \ge 0,\; x_{vt} \ge 0$
     $x_{su} - x_{uv} - x_{ut} = 0$ (messages are not lost at $u$)
     $x_{sv} + x_{uv} - x_{vt} = 0$ (messages are not lost at $v$)
     $x_{ut} + x_{vt} = 1$

• Add all the constraints $\Rightarrow x_{su} + x_{sv} = 1$, which it must be!!
• Only 3 independent constraints (although 4 nodes)
• In matrix notation:

  $Ax = \begin{bmatrix} 1 & 0 & -1 & -1 & 0 \\ 0 & 1 & 1 & 0 & -1 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} x_{su} \\ x_{sv} \\ x_{uv} \\ x_{ut} \\ x_{vt} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = b$

• $n$ nodes $\Rightarrow n - 1$ independent equations; similar to Kirchhoff's laws

Optimal Control Problem - 1

• Shortest path problem as a standard LP: $\min\; c^T x$ s.t. $Ax = b$, $x \ge 0$
  NOTE:
  − $A$ is called the incidence matrix
  − $b$ is a special vector
  − $A$ is a unimodular matrix, and so are all invertible submatrices of $A$: $\det = +1$ or $-1$
• Optimal Control
  − Consider a linear time-invariant discrete-time system:

  $x_{k+1} = Ax_k + bu_k$; $u_k$ ~ scalar for simplicity, $k = 0, 1, \ldots$
  $x_k = A^k x_0 + \sum_{l=0}^{k-1} A^{k-1-l} b\, u_l$

  − Define the terminal error: $e_N = x_d - x_N = x_d - A^N x_0 - \sum_{l=0}^{N-1} A^{N-1-l} b\, u_l$
• Given $x_0$, $x_d$, and the fact that $u_k$ is constrained by $u_{\min} \le u_k \le u_{\max}$, we can formulate various versions of LP.

Optimal Control Problem - 2

• Versions of LP
a) 1-norm of terminal error:

  $\min \sum_{i=1}^n |e_{N,i}| = \min \sum_{i=1}^n \left| (x_d - A^N x_0)_i - \sum_{l=0}^{N-1} (A^{N-l-1} b)_i\, u_l \right| = \min \sum_{i=1}^n |c_i - d_i^T z|$

  where $c_i = (x_d - A^N x_0)_i$, $d_i$ is the $N$-vector with components $(d_i)_l = (A^{N-l-1} b)_i$, and $z = [u_0, u_1, \ldots, u_{N-1}]^T$,
  s.t. $u_{\min}\mathbf{1} \le z \le u_{\max}\mathbf{1}$

• Convert to standard form via $v_i - u_i = d_i^T z - c_i$, $u_i, v_i \ge 0$:

  $\min \sum_{i=1}^n (v_i + u_i)$ s.t. $u_{\min}\mathbf{1} \le z \le u_{\max}\mathbf{1}$, $v_i - u_i = d_i^T z - c_i$, $1 \le i \le n$

• Optimal solutions:

  $v_i^* = \begin{cases} d_i^T z - c_i & \text{if } d_i^T z - c_i \ge 0 \\ 0 & \text{otherwise} \end{cases}$, $\quad u_i^* = \begin{cases} c_i - d_i^T z & \text{if } d_i^T z - c_i \le 0 \\ 0 & \text{otherwise} \end{cases}$

• Can also include constraints on state variables.

Optimal Control Problem - 3

b) $\infty$-norm of terminal error: $\min \max_{1 \le i \le n} |e_{N,i}| = \min \max_{1 \le i \le n} |c_i - d_i^T z|$

  Define $v = \max_{1 \le i \le n} |c_i - d_i^T z|$; then:

  $\min v$ s.t. $u_{\min}\mathbf{1} \le z \le u_{\max}\mathbf{1}$, $v - c_i + d_i^T z \ge 0$, $v + c_i - d_i^T z \ge 0$

• Proof of equivalence for (a):
  Suppose $v_i^*, u_i^*, z^*$ are optimal solutions. Claim: $v_i^*$ and $u_i^*$ cannot simultaneously be nonzero.
  If they are, and $v_i \ge u_i$, define $\hat{v}_i = v_i - u_i$, $\hat{u}_i = 0$. Then $\hat{v}_i + \hat{u}_i = v_i - u_i < v_i + u_i$ ... a contradiction.
  So only one of the two can be nonzero.
• Proof of equivalence for (b):
  Let $z^*, v^*$ be optimal for the revised problem, but suppose $z^*$ is not optimal for the original problem.
  Suppose $\hat{z}$ is an optimal solution of the original problem.
  Define $\hat{v} = \max_i |d_i^T \hat{z} - c_i|$; this is feasible for the revised problem with $\hat{v} < v^*$ ... a contradiction.
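As a concrete instance of version (b), a sketch with scipy.optimize.linprog for a made-up two-state system (the system matrices, horizon, and bounds are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import linprog

# Made-up discrete-time system and targets
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # double integrator
bvec = np.array([0.0, 1.0])
x0 = np.array([5.0, 0.0]); xd = np.zeros(2)
N, umin, umax = 10, -1.0, 1.0
n = 2

# c_i = (x_d - A^N x0)_i ; (d_i)_l = (A^{N-l-1} b)_i
AN = np.linalg.matrix_power(A, N)
cvec = xd - AN @ x0
D = np.column_stack([np.linalg.matrix_power(A, N - l - 1) @ bvec
                     for l in range(N)])   # n x N, row i is d_i^T

# Variables: [z (N controls), v (scalar)]
# min v  s.t.  -v <= c_i - d_i^T z <= v,  umin <= z <= umax
cost = np.concatenate([np.zeros(N), [1.0]])
ones = np.ones((n, 1))
A_ub = np.vstack([np.hstack([-D, -ones]),    #  (c - D z) - v <= 0
                  np.hstack([ D, -ones])])   # -(c - D z) - v <= 0
b_ub = np.concatenate([-cvec, cvec])
res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
              bounds=[(umin, umax)] * N + [(0, None)])
print(res.x[:N], res.x[-1])   # controls and worst-case terminal error
```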

Diet Problem

• Diet Problem
  − We want to find the most economical diet that meets minimum daily requirements for calories and such nutrients as proteins, calcium, iron, and vitamins.
• We have $n$ different food items:
  − $c_j$ = cost of food item $j$
  − $x_j$ = units of food item $j$ (in grams) included in our economic diet
• There are $m$ minimum nutritional requirements:
  − $b_i$ = minimum daily requirement of the $i$-th nutrient
  − $a_{ij}$ = amount of nutrient $i$ provided by a unit of food item $j$
• The problem is an LP:

  $\min \sum_{j=1}^n c_j x_j$ s.t. $\sum_{j=1}^n a_{ij} x_j \ge b_i$, $i = 1, 2, \ldots, m$; $x_j \ge 0$, $j = 1, 2, \ldots, n$

  i.e., $\min\, c^T x$ s.t. $Ax \ge b$, $x \ge 0$

Classes of Algorithms for LP

• Fundamental property of LP
  − An optimal solution $x^*$ exists at which $(n-m)$ of its components are zero.
  − If we knew the $n-m$ components that are zero, we could immediately compute the optimal solution (i.e., the remaining $m$ nonzero components) from $Ax = b$.
• Since we don't know the zeros a priori, the chief task of every algorithm is to discover where they belong.
• Three classes of algorithms for LP:
  − Simplex
  − Ellipsoid
  − Projective transformation (scaling) algorithms
• Key ideas of Simplex:
  − Phase 1: Find a vector $x$ that has $(n-m)$ zero components, with $Ax = b$ and $x \ge 0$. This is a feasible corner, not necessarily optimal.
  − Phase 2: Allow one of the zero components to become positive and force one of the positive components to become zero.

Geometry of LP - 1

• Simplex Algorithm
  − Q: How to pick "entering" and "leaving" components?
  − A: Reduce the cost $c^T x$ while keeping $Ax = b$, $x \ge 0$ satisfied.
• Another key property: we need to look at only the extreme (corner) points of the feasible set.
  − The minimum occurs at one of the corners (vertices) of the feasible set.

  Example: $\min\; -x_1 - x_2$ s.t. $x_1 + 2x_2 \le 4$, $x_1 \ge 0$, $x_2 \ge 0$

[Figure: the feasible set is the triangle with vertices at the origin, $(4, 0)$, and $(0, 2)$, with corner points P and Q marked; the cost planes $c^T x = \text{const}$ sweep across it, and the minimum is attained at a corner.]

  − In $n$ dimensions, the feasible set lies in $n$ dimensions and so do the cost planes $c^T x = \text{const}$.
• Inequality constraints: $x_1 + 2x_2 \le 4$, and $x_1 \ge 0$, $x_2 \ge 0$
  − $x \ge 0$ defines a positive cone in $\mathbb{R}^n$.
  − $a_i^T x \le 0$ defines a half space on or below the plane $a_i^T x = 0$.
  − Feasible set = positive cone $\cap$ half spaces defined by $a_i^T x \le b_i$ ⇒ a polyhedron (a polygon in 2 dimensions).
  − The feasible set is convex: if $x_1, x_2$ are feasible, then $\lambda x_1 + (1-\lambda)x_2$ is also feasible for all $\lambda \in [0, 1]$; the line segment between feasible points is also in the feasible set.

Geometry of LP - 2

• An LP may not have a solution

  Example: $\min\; -x_1 - x_2$ s.t. $x_1 + 2x_2 \le -4$, $x_1, x_2 \ge 0$ ⇒ the feasible set is empty (infeasible problem).

• An LP may have an unbounded solution

  Example: $\min\; -x_1 - x_2$ s.t. $x_1 + 2x_2 \ge 4$, $x_1, x_2 \ge 0$ ⇒ optimal $(x_1, x_2) \to (\infty, \infty)$ and the optimal cost value $\to -\infty$.

• So, an algorithm must decide whether an optimal solution exists and find the corner where the optimum occurs.

Revised Simplex Algorithm - 1

• Revised Simplex Algorithm
  − Consider SLP: $\min\; z = c^T x$ s.t. $Ax = b$, $x \ge 0$.
  − Assume $\mathrm{rank}(A) = m$. Then we can partition $A = [B \,|\, N]$, where $B$ consists of $m$ linearly independent columns.
  − Assume these are the first $m$ columns, for convenience:

  $[B \,|\, N]\begin{bmatrix} x_B \\ x_N \end{bmatrix} = b$; $x_B \in \mathbb{R}^m$, $x_N \in \mathbb{R}^{n-m}$

  − We know $n-m$ components of $x$ are zero.
  − If $x_N = 0$, $x_B = B^{-1}b$ is said to be the basic solution, and the columns of $B$ form the basis.
  − If, in addition, $x_B \ge 0$, then $x_B$ is called a basic feasible solution.
• In terms of $x_B$ and $x_N$, the cost function is $z = c_B^T x_B + c_N^T x_N$.
• Using $x_B = B^{-1}b - B^{-1}N x_N = B^{-1}b - B^{-1}[a_{m+1} x_{m+1} + \cdots + a_n x_n]$:

  $z = c_B^T B^{-1} b + (c_N^T - c_B^T B^{-1} N)x_N = z_0 + p^T x_N = z_0 + p_1 x_{m+1} + \cdots + p_{n-m} x_n$

Revised Simplex Algorithm - 2

• Compute $p^T$ in two steps:
  1) Solve $B^T \lambda = c_B$.
  2) Compute $p^T = c_N^T - \lambda^T N$, or $p = c_N - N^T \lambda$.
  − $\lambda$ is called the vector of simplex multipliers.
  − $p$ is called the relative cost vector; it forms the basis for exchanging basis variables.
• If $p \ge 0$, then the corner is optimal, since $p^T x_N \ge 0$ for all $x_N \ge 0$: it doesn't pay to increase $x_N$.
• If a component $p_k < 0$, then the cost can be decreased by increasing the corresponding component of $x_N$, that is, $x_k$: $m+1 \le k \le n$.
• The simplex method chooses one entering variable:
  − the one with the most negative $p_k$, or
  − the first negative $p_k$ (avoids cycling).
• Simplex then allows the component $x_k$ to increase from zero.

Revised Simplex Algorithm - 3

• Q: Which component $x_l$ should leave?
• A: It will be the first to reach zero as $x_k$ increases, so that $Ax = b$ is satisfied again at the new point.
• Assume $p_k < 0$, and consider what happens when we increase $x_{Nk}$ from zero.
  − Let $x_B^{old} = x_0 = B^{-1}b$ be the initial feasible solution. Then

  $x_B^{new} = B^{-1}b - B^{-1}a_k\, x_{Nk} = x_0 - y\, x_{Nk}$, where $By = a_k$

  − The $i$-th component of $x_B^{new}$ will be zero when $y_i x_{Nk}$ and the $i$-th component of $B^{-1}b = x_0$ are equal. This happens when

  $x_{Nk} = (i\text{-th component of } B^{-1}b) / (i\text{-th component of } y) = x_{0i}/y_i$

• So, among all $i$ such that $y_i > 0$, the smallest of these ratios determines how large $x_{Nk}$ can become.
  − If the $l$-th ratio is the smallest, then the leaving variable will be $x_l$.
  − At the new corner, $x_{Nk} > 0$ and $x_l = 0$.
  − $x_l$ → nonbasic set, and column $a_l$ joins the nonbasic matrix $N$.
  − $x_k$ → basic set, and column $a_k$ joins the basic matrix $B$.
• Thus, $\dfrac{x_{0l}}{y_l} = \min_{1 \le i \le m}\left\{\dfrac{x_{0i}}{y_i} : y_i > 0\right\}$

Revised Simplex Algorithm Steps

• One iteration of the revised simplex algorithm:
• Step 1: Given is the basis $B$ such that $x_B = B^{-1}b \ge 0$.
• Step 2: Solve $B^T\lambda = c_B$ for the vector of simplex multipliers $\lambda$.
• Step 3: Select a column $a_k$ of $N$ such that $p_k = c_{Nk} - \lambda^T a_k < 0$. We may, for example, select the $a_k$ which gives the most negative value of $p_k$, or the first $k$ with negative $p_k$.
  − If $p^T = c_N^T - \lambda^T N \ge 0$, stop: the current solution is optimal.
• Step 4: Solve $By = a_k$ for $y$.
• Step 5: Find $\theta = x_{0l}/y_l = \min\{x_{0i}/y_i : 1 \le i \le m,\; y_i > 0\}$.
  − Look at $x_{Bi}^{new} = x_{0i} - y_i x_{Nk}$.
  − If none of the $y_i$ are positive, then the set of solutions to $Ax = b$, $x \ge 0$ is unbounded and the cost $z$ can be made an arbitrarily large negative number. Terminate the computation: unbounded solution.
• Step 6: Update the basic solution: $x_i \leftarrow x_{0i} - \theta y_i$, $i \ne l$; set $x_k = \theta$, the value of the new basic variable ($x_l$ goes out).
• Step 7: Update the basis and return to Step 1.

Phase I of LP

• How to get an initial feasible solution ... Phase I of LP
• An LP problem for Phase I (assume $b \ge 0$):

  $\min \sum_{i=1}^m \hat{y}_i$ s.t. $Ax + I_m \hat{y} = b$; $x \ge 0$, $\hat{y} \ge 0$

  $\hat{y}$ ~ artificial variables
• If we can find an optimal solution such that $\sum_{i=1}^m \hat{y}_i = 0$, then we have a basic feasible $x_B$.
• If $\sum_{i=1}^m \hat{y}_i > 0$, then there is no feasible solution to $Ax = b$, $x \ge 0$: an infeasible problem.
• Solve via revised simplex starting with $x = 0$, $\hat{y} = b$, and $B = I_m$.
• Another approach is to combine Phases I and II by solving:

  $\min_{x,y}\; c^T x + M e^T y$ (where $M$ is a large number) s.t. $Ax + y = b$; $x \ge 0$, $y \ge 0$

  − This is called the "big-M" method.

Basis Updates

• How to update the basis:
• NOTE: we need to solve $B^T\lambda = c_B$ and $By = a_k$, where the $B$'s differ by only one column between any two subsequent iterations: column $a_k$ replaces column $a_l$.
• A simple way to solve these equations is to propagate $B^{-1}$ from one iteration to the next.
  − Recall: $B_{new} = B_{old} - (\text{column } a_l) + (\text{column } a_k) = B_{old} + (a_k - a_l)e_l^T$ ⇒ a rank-one update.
  − So, by the Sherman-Morrison formula,

  $B_{new}^{-1} = B_{old}^{-1} - \dfrac{B_{old}^{-1}(a_k - a_l)e_l^T B_{old}^{-1}}{1 + e_l^T B_{old}^{-1}(a_k - a_l)} = \left[I - \dfrac{(y - e_l)e_l^T}{y_l}\right]B_{old}^{-1} = E\, B_{old}^{-1}$

  using $B_{old}^{-1}a_k = y$ and $B_{old}^{-1}a_l = e_l$ ⇒ the product form of the inverse (PFI).
  − $E = I - (y - e_l)e_l^T / y_l$ is called an "elementary matrix": it differs from the identity only in its $l$-th column, which has entries $-y_i/y_l$ for $i \ne l$ and $1/y_l$ in position $l$.
  − For large-scale problems, store each $E_p$ as a vector (its $l$-th column) and update $\lambda^T$ and $y$ sequentially as follows:

  $\lambda^T = c_B^T E_p E_{p-1} \cdots E_1$, $\quad y = E_p \cdots E_2 E_1 a_k$

• What if $y_l$ is small? This creates a numerical problem...
• Modern revised simplex methods use LU or QR decompositions instead.
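A small numpy check of the product-form update, with an explicit elementary matrix $E$ (in practice only its $l$-th column would be stored):

```python
import numpy as np

def pfi_update(Binv_old, a_k, l):
    """Product-form update of B^{-1} when column l of B is replaced by a_k."""
    m = Binv_old.shape[0]
    y = Binv_old @ a_k                     # y = B_old^{-1} a_k
    E = np.eye(m)
    E[:, l] = -y / y[l]                    # l-th column: -y_i / y_l ...
    E[l, l] = 1.0 / y[l]                   # ... and 1/y_l on the diagonal
    return E @ Binv_old                    # B_new^{-1} = E B_old^{-1}

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
a_k = rng.normal(size=4)
Bnew = B.copy(); Bnew[:, 2] = a_k          # replace column l = 2
err = np.linalg.norm(pfi_update(np.linalg.inv(B), a_k, 2) - np.linalg.inv(Bnew))
print(err)                                 # ~1e-15 if y_l is not tiny
```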

Sequential LU and QR

• LU decomposition
  − If $B$ is the current basis: $B = [a_1\; a_2\; \ldots\; a_m] = L_{old}U_{old}$
  − $B_{new} = [a_1\; a_2\; \ldots\; a_{l-1}\; a_{l+1}\; \ldots\; a_m\; a_k]$
  − NOTE: $H = L_{old}^{-1}B_{new} = [u_1\; u_2\; \ldots\; u_{l-1}\; u_{l+1}\; \ldots\; u_m\; L_{old}^{-1}a_k]$ is an upper Hessenberg matrix.
  − Use a sequence of elimination steps on $H$ to get:

  $U_{new} = \hat{M}_{m-1}\cdots\hat{M}_{l+1}\hat{M}_l H \;\Rightarrow\; B_{new} = L_{old}\hat{M}_l^{-1}\cdots\hat{M}_{m-1}^{-1}U_{new}$

  − Store: $L_{new}^{-1} = \hat{M}_{m-1}\cdots\hat{M}_l L_{old}^{-1}$
• QR decomposition
  − $B_{new} = [a_1\; a_2\; \ldots\; a_{l-1}\; a_{l+1}\; \ldots\; a_m\; a_k]$; $Q_{old}^T B_{new} = H$
  − Do Givens rotations on $H$: $J_{m-1}\cdots J_l H = R_{new}$ and $Q_{new} = Q_{old}J_l^T\cdots J_{m-1}^T$
• Theoretically, revised simplex is an exponential algorithm, $O(2^m)$ in the worst case.
• In practice, it takes approximately $2(n-m)$ iterations.
• Each iteration takes approximately $O(m^2 + m(n-m))$ operations.

Sensitivity Analysis

• Duality and sensitivity analysis
  − Recall that the basic feasible solution $x = \begin{bmatrix} x_B \\ x_N \end{bmatrix} = \begin{bmatrix} B^{-1}b \\ 0 \end{bmatrix}$ is the solution of the SLP "$\min c^T x$ s.t. $Ax = b$, $x \ge 0$" if and only if:

  $\lambda^T = c_B^T B^{-1}$ ~ vector of simplex (Lagrange) multipliers, or dual variables
  $p^T = c_N^T - c_B^T B^{-1}N \ge 0$ ~ non-negative relative cost vector

• Note that the optimal cost is given by $z = c^T x = c_B^T B^{-1}b = \lambda^T b$.
• So, $z$ can be gotten by knowing either the optimal $x_B$ or the optimal $\lambda$.
• Q: Is there another way to get $\lambda$?
• A: Yes, by solving an equivalent LP, called the dual LP problem.

Dual LP Problems

• Dual of an SLP (ASYMMETRIC DUAL):

  Primal problem: $\min\, c^T x$ s.t. $Ax = b$, $x \ge 0$
  Dual problem: $\max\, \lambda^T b$ s.t. $A^T\lambda \le c$

| Primal | Dual |
|---|---|
| $n$ variables | $n$ constraints |
| $m$ constraints | $m$ variables |
| non-negativity constraints on $x$ | no constraints on the sign of $\{\lambda_i\}$ |

• Duality for an inequality-constrained LP (InLP) (SYMMETRIC DUAL):

  Primal: $\min\, c^T x$ s.t. $Ax \ge b$, $x \ge 0$
  Dual: $\max\, \lambda^T b$ s.t. $A^T\lambda \le c$, $\lambda \ge 0$

Duality Properties

• Dual of a dual = primal.
• For any primal feasible $x$ and dual feasible $\lambda$:
  − (SLP): $\lambda^T b = \lambda^T Ax \le c^T x$ ... weak duality lemma
  − (InLP): $\lambda^T b \le \lambda^T Ax \le c^T x$
  − Any dual feasible cost ≤ any primal feasible cost.
• A very useful concept in deriving efficient algorithms for large problems (e.g., scheduling) with separable structures.
• Complementary slackness conditions:
  1) $(c^T - \lambda^T A)x = 0 \;\Rightarrow\; p_j = c_j - \lambda^T a_j = 0$ or $x_j = 0$
  2) $\lambda^T(Ax - b) = 0 \;\Rightarrow\; q_i = a_i^T x - b_i = 0$ or $\lambda_i = 0$
  − $\lambda^T a_j$ ~ synthetic cost of variable $j$
• For variables in the optimal basis, relative cost $p_j = 0$ ⇒ synthetic cost = real cost.
• For variables not in the optimal basis, relative cost $p_j \ge 0$ ⇒ synthetic cost ≤ real cost.

Simplex Multipliers & Sensitivity - 1

• Interpretation of simplex multipliers
  − Suppose $b \to b + \Delta b$ without changing the optimal basis.
  − Change in the optimal objective function value: $\Delta z = c_B^T B^{-1}\Delta b = \lambda^T \Delta b$

  $\dfrac{\partial z}{\partial b_i} = \lambda_i$ = marginal price (value) of the $i$-th resource (i.e., the right-hand side $b_i$)

  − The $\{\lambda_i\}$ are also called shadow prices, dual variables, Lagrange multipliers, or equilibrium prices.
• Sensitivity (post-optimality) analysis
  − Q: How much can we change $\{c_i\}$ and $\{b_i\}$ without changing the optimal basis?
  − Consider: $\min_x (c + \alpha d)^T x$ s.t. $Ax = b$, $x \ge 0$
  − $\alpha$ is the parameter to be varied; nominal value $\alpha = 0$.
  − $d = e_j$ ⇒ we find the range for the $j$-th cost coefficient.

Simplex Multipliers & Sensitivity - 2

• Fact: the basis $B$ will remain optimal as long as the nonbasic reduced costs $\{p_k\}$ remain non-negative (recall that the reduced costs for basic variables are zero).
  − Split $c$ and $d$ as $c^T = [c_B^T \,|\, c_N^T]$ and $d^T = [d_B^T \,|\, d_N^T]$.
  − The required condition is:

  $(c_N^T - c_B^T B^{-1}N) + \alpha(d_N^T - d_B^T B^{-1}N) \ge 0 \;\Leftrightarrow\; p^T + \alpha q^T \ge 0$, where $q^T = d_N^T - d_B^T B^{-1}N$

• So the range of $\alpha$ is $[\alpha_{\min}, \alpha_{\max}]$, where

  $\alpha_{\min} = \max\left\{-\dfrac{p_j}{q_j} : q_j > 0 \text{ and } j \text{ is nonbasic}\right\}$ ($-\infty$ if no such $j$)
  $\alpha_{\max} = \min\left\{-\dfrac{p_j}{q_j} : q_j < 0 \text{ and } j \text{ is nonbasic}\right\}$ ($+\infty$ if no such $j$)

• If $\alpha \in (\alpha_{\min}, \alpha_{\max})$, the new optimal cost is:

  $z(\alpha) = (c_B + \alpha d_B)^T B^{-1} b = z(0) + \alpha\, d_B^T x_B$

Simplex Multipliers & Sensitivity - 3

• Consider parametric changes in $b$:

  $\min_x c^T x$ s.t. $Ax = b + \alpha d$, $x \ge 0$; $\alpha = 0$ nominal

• If $B$ is the optimal basis, then we need

  $x^T = (x_B^T\; x_N^T) = [(\bar{b} + \alpha\bar{d})^T,\; 0] \ge 0$, where $\bar{b} = B^{-1}b$ and $\bar{d} = B^{-1}d$

• The range $(\alpha_{\min}, \alpha_{\max})$ is given by:

  $\alpha_{\min} = \max_{1 \le i \le m}\left\{-\dfrac{\bar{b}_i}{\bar{d}_i} : \bar{d}_i > 0\right\}$ ($-\infty$ if no such $i$); $\quad \alpha_{\max} = \min_{1 \le i \le m}\left\{-\dfrac{\bar{b}_i}{\bar{d}_i} : \bar{d}_i < 0\right\}$ ($+\infty$ if no such $i$)

• If $\alpha \in (\alpha_{\min}, \alpha_{\max})$, then

  $z(\alpha) = c_B^T B^{-1}(b + \alpha d) = \lambda^T(b + \alpha d) = z(0) + \alpha\,\lambda^T d$

Karmarkar's Interior Point Method

• Karmarkar's interior point algorithm
• We discuss not the original Karmarkar algorithm, but an equivalent (and more general) formulation based on barrier functions:

  SLP: $\min_x c^T x$ s.t. $Ax = b$, $x \ge 0$; optimal solution $x^*$

  Barrier problem: $\min_x f(x, \mu) = c^T x - \mu\sum_{j=1}^n \ln x_j$, $\mu > 0$, s.t. $Ax = b$; optimal solution $x^*(\mu)$

• Key: $x^*(\mu) \to x^*$ as the barrier parameter $\mu \to 0$.
• There exist many variations of this formulation; we will discuss them later.

Newton's Method for NLP

• Consider the general NLP: $\min_x f(x)$ s.t. $Ax = b$
• Suppose $x$ is feasible; update $x^+ = x + \beta d$, where $d$ is a search direction.
• Pick $\beta$ such that $Ax^+ = b$ (the new point is feasible) and $f(x^+) < f(x)$.
• What does Newton's method do for this problem?
  − Feasibility: $Ax^+ = Ax + \beta Ad = b \;\Rightarrow\; Ad = 0$
  − Newton's method fits a quadratic to $f(x)$ at the current point:

  $f(x + d) \approx f(x) + g^T d + \tfrac{1}{2}d^T H d$, where $g = \nabla f(x)$ and $H = \nabla^2 f(x)$

  − Newton's method solves a quadratic problem to find $d$ (a weighted least-squares problem).

Optimality Conditions

• Consider

  $\min_d\; g^T d + \tfrac{1}{2}d^T H d$ s.t. $Ad = 0 \;\Leftrightarrow\; \min_d \tfrac{1}{2}\|H^{1/2}d + H^{-1/2}g\|_2^2$ s.t. $Ad = 0$; $H^{1/2}$ = symmetric square root

• Define the Lagrangian function:

  $L(d, \lambda) = g^T d + \tfrac{1}{2}d^T H d - \lambda^T A d$; $\lambda$ ~ Lagrange multiplier

• Karush-Kuhn-Tucker necessary conditions of optimality:

  $\partial L/\partial d = 0 \;\Rightarrow\; g + Hd - A^T\lambda = 0$
  $\partial L/\partial \lambda = 0 \;\Rightarrow\; Ad = 0 \;\Rightarrow\; \lambda = (AH^{-1}A^T)^{-1}AH^{-1}g$

• Special NLP = barrier formulation of LP:

  $g = \nabla f(x) = c - \mu D^{-1}e$ and $H = \nabla^2 f(x) = \mu D^{-2}$

  where $D = \mathrm{Diag}(x_j)$, $j = 1, 2, \ldots, n$, and $e = (1\; 1\; 1\; \cdots\; 1)^T$

Optimality Conditions for LP

• The Karush-Kuhn-Tucker conditions for the special NLP are:

  $\mu D^{-2}d + (c - \mu D^{-1}e - A^T\lambda) = 0$, $\quad Ad = 0$

• So,

  $d = -\dfrac{1}{\mu}D^2(c - \mu D^{-1}e - A^T\lambda)$  (1)

• Using $Ad = 0$ in (1), we get

  $\lambda = (AD^2A^T)^{-1}AD^2(c - \mu D^{-1}e)$  (2)

• So, $\lambda$ is the solution of the weighted least-squares (WLS) problem:

  $\min_\lambda \|D[c - \mu D^{-1}e - A^T\lambda]\|_2$

Barrier Function Algorithm

• Barrier function algorithm:

  Choose a strictly feasible solution $x$ and constant $\mu > 0$. Let the tolerance parameter be $\varepsilon$, and let $\sigma$ be a parameter associated with the update of $\mu$.
  For $k = 0, 1, 2, \ldots$ DO
    Let $D = \mathrm{Diag}(x_j)$
    Compute the solution $\lambda$ to $(AD^2A^T)\lambda = AD^2(c - \mu D^{-1}e)$ ... WLS solution
    Let $p = c - A^T\lambda$
    $d = -D^2(p - \mu D^{-1}e)/\mu$
    $x = x + d$
    If $x^T p < \varepsilon$, stop: $x$ is a near-optimal solution ... complementary slackness condition
    else $\mu \leftarrow \mu(1 - \sigma/\sqrt{n})$
    end if
  end DO

Practicalities & Insights - 1

1) Finding a feasible point:
  − Select any $x^0 > 0$ and define $s^0 = (b - Ax^0)/\|b - Ax^0\|_2$ with $\xi^0 = \|b - Ax^0\|_2$, and solve

  $\min_{x,\xi}\; \xi$ s.t. $[A \;\; s^0]\begin{bmatrix} x \\ \xi \end{bmatrix} = b$; $x \ge 0$, $\xi \ge 0$

  − At the solution $\xi = 0$; stop when $\xi$ reaches zero (or starts becoming negative).
  − Suggestion: $x^0 = \|b\|\, e$.
2) Since the method uses Newton directions, expect quadratic convergence near the minimum.
3) Major computational step: the least-squares subproblem

  $(AD^2A^T)\lambda = AD^2(c - \mu D^{-1}e)$

  − Generally $A$ is sparse.
  − We will discuss the computational aspects of the least-squares subproblem later.

Practicalities & Insights - 2

4) The algorithm (theoretically) requires $O(\sqrt{n}\,L)$ iterations, with overall complexity $O(n^3 L)$, where

  $L = \sum_{i=0}^{m}\sum_{j=0}^{n} \lceil \log_2(|a_{ij}| + 1) + 1 \rceil$

  is (roughly) the number of bits needed to represent the problem data.
5) In practice, the method typically takes 20-50 iterations even for very large problems (>20,000 variables). Simplex, on the other hand, takes an increasingly large number of iterations as the problem size $n$ grows.
6) Initialize $\mu = 2^{O(L)}$ and $\sigma = 1/6$. In practice, one needs to experiment with the parameters.
7) Other potential functions:

  $f(x, q) = r\ln(c^T x - q) - \sum_j \ln x_j$

  where $r \ge n + \sqrt{n}$ and $q$ is a lower bound on the optimal cost.

Practicalities & Insights - 3

• Variants of the algorithm
• Problems with the barrier function approach:
  − update of $\mu$
  − selection of the initial $\mu$ and the parameter $\sigma$
• There exist two classes of algorithms that avoid these choices:
  − Affine scaling methods (discussed next)
  − Power series approximation
    . views the affine scaling directions as a set of differential equations
    . not competitive with affine scaling methods
• We do not know if the variants have polynomial complexity. But they work well in practice!!

Affine Scaling Method - 1

• Affine scaling:
• Typically, the affine scaling methods are used on the dual problem:

  Primal: $\min\, c^T x$ s.t. $Ax = b$, $x \ge 0$
  Dual: $\max_\lambda \lambda^T b$ s.t. $A^T\lambda \le c$
  Modified dual: $\max_\lambda \lambda^T b$ s.t. $A^T\lambda + p = c$, $p \ge 0$

• Suppose we have a strictly feasible $\lambda$ and the corresponding reduced cost vector (slack vector) $p > 0$.
• Define the scaled slack $\hat{p} = P^{-1}p$, where $P = \mathrm{Diag}(p_1, p_2, \ldots, p_n)$.
• So the dual problem is: $\max_\lambda \lambda^T b$ s.t. $A^T\lambda + P\hat{p} = c$, $\hat{p} \ge 0$.

Affine Scaling Method - 2

• From the equality constraint: $\hat{p} = P^{-1}(c - A^T\lambda) = P^{-1}c - P^{-1}A^T\lambda$
• Assuming full column rank of $A^T$, i.e., full row rank of $A$ (linearly independent constraints in the primal):

  $AP^{-2}A^T\lambda = AP^{-1}(P^{-1}c - \hat{p}) \;\Rightarrow\; \lambda = (AP^{-2}A^T)^{-1}AP^{-1}(P^{-1}c - \hat{p}) = M(P^{-1}c - \hat{p})$

  where $M = (AP^{-2}A^T)^{-1}AP^{-1}$. Note that $\lambda \in \mathcal{R}(M)$.
• Eliminating $\lambda$ from the dual problem, we have:

  $\max_{\hat{p}}\; b^T M(P^{-1}c - \hat{p}) = f(\hat{p}) \;\Leftrightarrow\; \min_{\hat{p}}\; b^T M\hat{p}$
  s.t. $H(\hat{p} - P^{-1}c) = 0$, $\hat{p} \ge 0$

  where $H = I - P^{-1}A^T M$ is an asymmetric projection matrix: $H = H^2$.

Affine Scaling Method - 3

• In addition, we have $AP^{-1}H = 0$ ⇒ the columns of $H$ lie in $\mathcal{N}(AP^{-1})$.
• Note that $\mathcal{N}(H) = \mathcal{R}(P^{-1}A^T)$, and $\mathcal{R}(P^{-1}A^T) = \mathcal{R}(M^T)$.
• The gradient of $f(\hat{p})$ with respect to the scaled reduced costs $\hat{p}$ is

  $\hat{g}_p = -M^T b \in \mathcal{R}(M^T) = \mathcal{R}(P^{-1}A^T)$

• Result: the gradient with respect to the scaled reduced costs already lies in the range space of $P^{-1}A^T$, making a projection unnecessary.
• In terms of the original (unscaled) reduced costs, the projected gradient is

  $g_p = P\hat{g}_p = -A^T(AP^{-2}A^T)^{-1}b$

Affine Scaling Method - 4

• The corresponding feasible direction with respect to $\lambda$ is:

  $d_\lambda = -M\hat{g}_p = MM^T b = (AP^{-2}A^T)^{-1}b \;\Rightarrow\; g_p = -A^T d_\lambda$

• If $g_p \ge 0$, the dual problem is unbounded ⇒ the primal is infeasible (assuming $b \ne 0$).
• Otherwise, we replace $\lambda$ by $\lambda + \beta d_\lambda$, where

  $\beta = \gamma\,\beta_{\max}$, $\gamma \approx 0.95$, $\quad \beta_{\max} = \min\left\{-\dfrac{p_i}{g_{pi}} : g_{pi} < 0,\; i = 1, 2, \ldots, n\right\}$

• Note that the primal solution $x$ is:

  $x = -P^{-2}g_p = P^{-2}A^T(AP^{-2}A^T)^{-1}b$, since it satisfies $Ax = b$.

Dual Affine Scaling Algorithm

• Dual affine scaling algorithm:

  Start with a strictly feasible $\lambda$, stopping criterion $\varepsilon$, and step factor $\gamma$.
  $z_{old} = \lambda^T b$
  For $k = 0, 1, 2, \ldots$ DO
    $p = c - A^T\lambda$; $P = \mathrm{Diag}(p_1\; p_2\; \ldots\; p_n)$
    Compute the solution $d_\lambda$ of $(AP^{-2}A^T)d_\lambda = b$; $g_p = -A^T d_\lambda$
    If $g_p \ge 0$
      Stop: unbounded dual solution ⇒ primal is infeasible
    else
      $\beta = \gamma\min\{-p_i/g_{pi} : g_{pi} < 0,\; i = 1, 2, \ldots, n\}$
      $\lambda \leftarrow \lambda + \beta d_\lambda$ ($p \leftarrow p + \beta g_p$ for the next step); $z_{new} = \lambda^T b$
      If $\dfrac{|z_{new} - z_{old}|}{\max(1, |z_{old}|)} < \varepsilon$
        stop: found an optimal solution $x = -P^{-2}g_p$
      else
        $z_{old} = z_{new}$
      end if
    end if
  end DO

Initial Feasible Solution

• Finding an initial strictly feasible solution for the dual affine scaling algorithm:
  − Select the initial $\lambda^0 = -\dfrac{\|c\|_2}{\|A^T b\|_2}\, b$.
  − We want $p > 0$; set $p^0 = c - A^T\lambda^0 + \xi^0 e$, with the initial artificial variable $\xi^0$ selected as

  $\xi^0 = -2\min\left\{(c - A^T\lambda^0)_i : i = 1, 2, \ldots, n\right\}$

  − Solve an $(m+1)$-variable LP with the artificial variable $\xi$:

  $\max_{\lambda,\xi}\; \lambda^T b - \rho\,\xi$ s.t. $A^T\lambda - \xi e \le c$

  with the artificial cost $\rho$ selected as $\rho = 10^5\,|\lambda^{0T} b|$.
  − The initial $(\lambda^0, \xi^0)$ are feasible for this problem.
• Notes:
  − If $\xi \le 0$ at some iteration $k$, we have found a feasible $\lambda$.
  − If the algorithm terminates with an optimal $\xi^* > 0$, the dual is infeasible ⇒ the primal is unbounded.

Least Squares Subproblem

• Least-squares subproblem: implementation issues
• Generally $A$ is sparse.
• Major computational step at each iteration:

  $AP^{-2}A^T d_\lambda = b$ ... affine scaling
  $AD^2A^T\lambda = AD^2(c - \mu D^{-1}e) = AD(Dc - \mu e)$ ... barrier function method

• Key: we need to solve a symmetric positive definite system $\Sigma y = \tilde{b}$, where $\Sigma = AD^2A^T$ (or $AP^{-2}A^T$).

Solution Approaches - 1

• Solution approaches:
• Direct methods:
  a) Cholesky factorization: $\Sigma = SS^T$, $S$ lower triangular
  b) $LDL^T$ factorization: $\Sigma = LDL^T$, $L$ unit lower triangular
  c) QR factorization of $P^{-1}A^T$ or $DA^T$

• Methods to speed up the factorization:
• During each iteration only $D$ (or $P^{-1}$) changes, while $A$ remains unaltered.
  − The nonzero structure of $\Sigma$ is static throughout.
  − So, during the first iteration, keep track of the list of numerical operations performed.
• Perform a factorization only if the diagonal scaling matrix has changed significantly.
• Consider $\tilde{\Sigma} = A\tilde{P}^{-2}A^T$: replace $P$ by $\tilde{P}$,

Solution Approaches - 2

  where

  $\tilde{P}_{ii} = \begin{cases} P_{ii}^{new} & \text{if } |P_{ii}^{old} - P_{ii}^{new}| / |P_{ii}^{old}| > \delta \\ P_{ii}^{old} & \text{otherwise} \end{cases}$, $\quad \delta \approx 0.1$

  − Define $\Delta P_{ii} = \tilde{P}_{ii}^{new} - \tilde{P}_{ii}^{old}$; then

  $\tilde{\Sigma}_{new} = \tilde{\Sigma}_{old} + \sum_{i:\,\Delta P_{ii} \ne 0} \Delta\left(P_{ii}^{-2}\right) a_i a_i^T$, $\quad a_i$ = $i$-th column of $A$

  − So, use the rank-one modification methods discussed in Lecture 8.
• Perform pivoting to reduce fill-ins, i.e., having nonzero elements in the factors where there are zero elements in $\Sigma$.
  − Recall that we solve $\Pi\Sigma\Pi^T(\Pi y) = \Pi\tilde{b}$ for a permutation matrix $\Pi$.
  − Unfortunately, finding the optimal permutation matrix to minimize fill-in is NP-complete.
  − However, there exist heuristics:
    . minimum degree
    . minimum local fill-in

Solution Approaches - 3

• Combine with an iterative method if there are a few dense columns in $A$ that would make $\Sigma$ impracticably dense (recall the outer product representation $\Sigma = \sum_i p_i^{-2} a_i a_i^T$).
• Hybrid factorization plus conjugate gradient: this is called a preconditioned conjugate gradient method.
  − Idea: at iteration $k$, split the columns of $A$ into two parts, $A_s$ and the rest, where the columns of $A_s$ are sparse (i.e., have density $< \lambda \approx 0.3$).
  − Form $Z_s = A_s P^{-2}A_s^T$.
  − Find an incomplete Cholesky factor $L$ such that $Z_s = A_s P^{-2}A_s^T \approx LL^T$.
  − Basically, the idea is to step through the Cholesky decomposition, but setting $l_{ij} = 0$ if the corresponding $s_{ij} = 0$ (see the runnable sketch after the pseudocode below).

Incomplete Cholesky Algorithm - 1

• Incomplete Cholesky algorithm (operating on $S = Z_s$, overwriting it):

  For $k = 1, \ldots, m$ DO
    $l_{kk} = \sqrt{s_{kk}}$
    For $i = k+1, \ldots, m$ DO
      If $s_{ik} \ne 0$
        $l_{ik} = s_{ik}/l_{kk}$
      end if
    end DO
    For $j = k+1, \ldots, m$ DO
      For $i = j, \ldots, m$ DO
        If $s_{ij} \ne 0$
          $s_{ij} = s_{ij} - l_{ik}l_{jk}$
        end if
      end DO
    end DO
  end DO

Incomplete Cholesky Algorithm - 2

• Now consider the original problem $\Sigma y = AP^{-2}A^T y = b$. Preconditioning with $L$:

  $L^{-1}\Sigma(L^{-1})^T \cdot L^T y = L^{-1}b \;\Leftrightarrow\; Qu = f$

  where $Q = L^{-1}\Sigma(L^{-1})^T$, $u = L^T y$, $f = L^{-1}b$

• Solve $Qu = f$ via the conjugate gradient algorithm (see Lecture 5).

Conjugate Gradient Algorithm

• Conjugate gradient algorithm:

  $u = f$ ... initial solution
  $c = \|f\|_2$ ... norm of the RHS
  $r = f - Qu$ ... initial residual (negative gradient of $\tfrac{1}{2}u^TQu - u^Tf$)
  $\rho = \|r\|_2^2$ ... square of the norm of the initial residual
  $d = r$ ... initial direction
  $k = 0$
  do while $\sqrt{\rho} > \varepsilon\, c$ and $k < k_{\max}$
    $w = Qd$
    $\alpha = \rho / d^T Q d$ ... step length
    $u = u + \alpha d$ ... new solution
    $r = r - \alpha w$ ... new residual, $r = f - Qu$
    $\beta = \|r\|_2^2 / \rho$ ... parameter to update the direction
    $d = r + \beta d$ ... new direction
    $\rho = \|r\|_2^2$
    $k = k + 1$
  end DO

• Computational load: $O(m^2 + 10m)$ per iteration.
• Need to store only four vectors: $u$, $r$, $d$, and $w$.

Simplex vs. Dual Affine Scaling - 1

• Comparison of simplex and dual affine scaling methods
• Three types of test problems.
• NETLIB test problems:
  − 31 test problems
  − The library and test problems can be accessed via electronic mail: netlib@anl-mcs (ARPANET/CSNET) or research!netlib (UNIX network)
  − number of variables $n$ ranged from 51 to 5533
  − number of constraints $m$ ranged from 27 to 1151
  − number of non-zero elements in $A$ ranged from 102 to 16276
  − Comparisons were run on an IBM 3090

Simplex vs. Dual Affine Scaling - 2

| | Simplex | Affine scaling |
|---|---|---|
| Iterations | (6, 7157) | (19, 55) |
| Ratio of time per iteration | (0.093, 0.356) | 1 |
| Total CPU time range (secs) | (0.01, 217.67) | (0.05, 31.70) |
| Ratio of CPU times (simplex/affine) | (0.2, 10.7) | 1 |

• Multi-commodity network flow problems:
  − Specialized LP algorithms exist that are better than simplex.
  − A program called MNETGN was used to generate random multi-commodity network flow problems.
  − 11 problems were generated
  − number of variables $n$ in the range (2606, 8800)
  − number of constraints $m$ in the range (1406, 4135)
  − non-zero elements in $A$ ranged from 5212 to 22140

Simplex vs. Dual Affine Scaling - 3

| | Simplex (MINOS 4.0) | Specialized simplex (MCNF85) | Affine scaling |
|---|---|---|---|
| Total # of iterations | (940, 21915) | (931, 16624) | (28, 35) |
| Ratio of time per iteration (w.r.t. affine scaling) | (0.010, 0.069) | (0.0018, 0.0404) | 1 |
| Total CPU time (secs) | (12.73, 1885.34) | (7.42, 260.44) | (6.51, 309.50) |
| Ratio of CPU times w.r.t. affine scaling | (1.96, 11.56) | (0.59, 4.15) | 1 |

Simplex vs. Dual Affine Scaling - 4

• Timber harvest scheduling problems:
  − 11 timber harvest scheduling problems generated using a program called FORPLAN
  − number of variables ranged from 744 to 19991
  − number of constraints ranged from 55 to 316
  − number of nonzero elements in $A$ ranged from 6021 to 176346

| | Simplex (MINOS 4.0) | Affine scaling (default pricing) |
|---|---|---|
| Total # of iterations | (534, 11364) | (38, 71) |
| Ratio of time per iteration | (0.0141, 0.2947) | 1 |
| Total CPU time (secs) | (2.74, 123.62) | (0.85, 43.80) |
| Ratio of CPU times | (1.52, 5.12) | 1 |

• A promising approach to large real-world LP problems.

Summary

• Methods for solving LP problems:
  − Revised Simplex method
  − Ellipsoid method ... not practical
  − Karmarkar's projective scaling (interior point method)
• Implementation issues of the least-squares subproblem of Karmarkar's method ... more in the Linear Programming and Network Flows course
• Comparison of Simplex and projective methods
