
PRACTICAL SET PARTITIONING AND COLUMN GENERATION

Andrew Mason [email protected], www.esc.auckland.ac.nz/Mason Linköping 1999, Auckland 2000, Auckland 2001

Set Partitioning Problems Given a set of objects (with index set I), find a minimal cost partition of I into mutually disjoint subsets.

Example: Copying 2 CD’s onto C60 tapes.

ENIGMA MCMXC                      Mins        ENIGMA The CROSS of Changes     Mins
 1  The Voice of Enigma           2.13         1  Second Chapter               2.27
 2                                4.25         2                               7.22
 3  Find Love                     4.82         3                               4.28
 4  Sadeness (Reprise)            2.80         4  I Love You… I'll Kill You    8.85
 5  Callas Went Away              4.48         5  Silent Warrior               6.17
 6  Mea Culpa                     4.87         6  The Dream of the Dolphin     2.78
 7  The Voice & The Snake         1.75         7                               5.37
 8  Knocking on Forbidden Doors   4.45         8                               4.88
 9  Way to Eternity               2.30         9                               2.38
10  Hallelujah                    4.25            Total:                      44.20
11                                3.52
12  Sadeness II                   2.72
13  Mea Culpa II                  6.07
14                                4.83
15  The Rivers of Belief II       7.07
    Total:                       60.30

Set of objects:

Possible Subsets:

Cost of Subsets (assuming minimisation objective):

This particular problem is known as:


Formal Set Partitioning Definition: Given: 1/ I = {1…m}

2/ a collection of subsets P = {P1, P2, …, Pn}, where each Pj ⊆ I

3/ a cost function c(Pj)
then J ⊆ {1, …, n} defines a partition of I if and only if:
1/ ∪_{j∈J} Pj = I (all elements in a subset)
2/ Pj ∩ Pk = ∅ for all j ≠ k in J (no element in two subsets)

We seek a minimum cost partition: min Σ_{j∈J} c(Pj) s.t. J partitions I

Integer Programming (IP) Formulation of Set Partitioning:

· Rows correspond to elements of I
· Columns are elements of P

· aij = 1 if element i is in Pj, aij = 0 otherwise

· cj=cost of Pj

· xj = 1 if Pj is in the partition
· all items must be included in the solution

Variables:
Matrix Coefficients:
Right hand side:
Constraints:
LP Dual Variables:
Note: x integer ⇒ x will be binary to satisfy the constraints
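To make the formulation concrete, here is a minimal Python sketch using the PuLP modelling package; the instance data (subsets P1–P4 and their costs) is invented purely for illustration.

import pulp  # assumes the PuLP package is installed

# Toy instance: items I = {1, 2, 3}; candidate subsets and their costs (invented)
subsets = {"P1": {1}, "P2": {2, 3}, "P3": {1, 2}, "P4": {3}}
cost = {"P1": 3, "P2": 4, "P3": 5, "P4": 2}
items = {1, 2, 3}

prob = pulp.LpProblem("set_partitioning", pulp.LpMinimize)
x = {j: pulp.LpVariable("x_" + j, cat="Binary") for j in subsets}

# Objective: total cost of the chosen subsets
prob += pulp.lpSum(cost[j] * x[j] for j in subsets)

# Partitioning rows: each item appears in exactly one chosen subset
for i in items:
    prob += pulp.lpSum(x[j] for j in subsets if i in subsets[j]) == 1

prob.solve()
print([j for j in subsets if x[j].value() == 1])  # one optimal partition, cost 7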

Set Partitioning Example 1: Airline Planning (Pairings, Tours of Duty) Problem

Partition ……………..….………………..… into …….. (These tours will later be allocated to people.)


[Figure: a three-day (Mon–Wed) flight network, times 0000–1800 each day, over the airports AKL, CHC, WLG, HNL, LAX, MEL, SYD, SIN, LHR, BNE, CNS, DPS, FKF. Example and picture from Air New Zealand.]

Costs include:

Rules for building columns include:

Integer Programming Formulation: min cᵀx s.t. Ax = e, x binary

Set Partitioning Example 2: Political Districting

(Images stolen from http://www.elections.org.nz/elections/general/electorates/index.html)


Partition ……….…census area units……………………. into ……………electorates……

Costs: Shape of electorate, natural unit (eg not split by rivers), deviation from desired population size!

Possible Additional Constraint: Must have 61 electorates

Other Set Partitioning Examples: Vehicle Routing - columns are routes, rows are deliveries to make Bin Packing

Set Packing Problems

Variables: Integer (0/1)
Matrix Coefficients: Binary
Right hand side: Binary
Constraints:
LP Dual Variables:

Set Packing Example: Cutting of Boards

[Figure: a board of 15 numbered cells (1–5, 6–10, 11–15 in three rows), shown with several candidate cutting patterns. IP template: max cᵀx s.t. Ax ≤ e, x binary.]


Set Covering Problems

Variables: Integer (0/1)
Matrix Coefficients: Binary
Right hand side: Binary
Constraints:
LP Dual Variables:

Set Covering Example: Mail Deliveries

· Must walk along each street in town to deliver mail for that street.
· Each person starts at 5am, must be finished by 7am.
· If two people walk a street, only 1 does the deliveries (hence ‘covering’)

Columns: Cost:

Set Covering vs Set Partitioning If changing any 1 in a column into a 0 gives a valid, no more expensive column, then the set covering and set partitioning solutions are the same.

Set Partitioning/Covering/Packing Generalisations: Different Possibilities:

Right Hand Side: Binary or Integer (or Real)
Variables: Binary or Integer
A-Matrix coefficients: Binary or Integer (or Real)
Constraints: Mix of ≤, =, ≥


Generalised Set Covering Example: Single-Day Shift Generation

Columns are:

Costs:

[Matrix: A has one column per candidate shift; each column is a consecutive run of 1’s covering the periods that shift works. Blocks of 3 hour, 4 hour, 5 hour, and 8 hour shifts are shown, and the right hand side b = (b1, …, bm) gives the staff required in each period.]

Variables:
Matrix Coefficients:
Right hand side:
Constraints:

[Figure: staffing profile over the day (0500–2300) showing arrivals, departures, the smoothed workload, and officers on duty (part time and full time). Generalised Set Covering: Example from NZ Customs.]


Generalised Set Covering Example: Personalised Single-Day Shift Generation

Note: The Gx = e constraints are known as… GUB (generalised upper bound) or convexity constraints.

[Blank LP template: min cᵀx s.t. Ax ≥ b, Gx = e.]

Variables:
Matrix Coefficients:
Right hand side:
Constraints:


Set Partitioning Example: ToD Allocation to Crew (“Rostering”) The problem here is to take the optimal Tours of Duty (ToDs) produced earlier and allocate them to staff, eg cabin crew. We assume that each ToD requires 4 cabin crew.

[Figure: 12 ToDs laid out over the 4-week roster (day columns M T W T F S S for each week). IP template: min cᵀx s.t. Ax = b.]

Variables:
Matrix Coefficients:
Right hand side:
Constraints:


Generalised Set Covering Example: Group Single-Day Shift Generation

Building shifts with couples who prefer to work together.

[Blank IP template: min cᵀx s.t. Ax ≥ b.]

Variables:
Matrix Coefficients:
Right hand side:
Constraints:

Fractional matrix coefficients can arise, eg…


Elastic Constraints Eg for generalised set partitioning:

Introduce costed slack and surplus variables

Min  (c | c_slack | c_surplus) (x, x_slack, x_surplus)
St.  (A | I | −I) (x, x_slack, x_surplus) = b

Notes:
· don’t normally enforce integrality of slack/surplus variables (happens naturally)
· can put bounds on the slack and surplus variables, and/or use piecewise linear costs

This problem is more stable in the sense that the LP Dual Variables are now….. bounded: −c_surplus ≤ π ≤ c_slack (componentwise)
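A minimal PuLP sketch of one elasticised ‘= b’ row; all costs and coefficients are invented for illustration.

import pulp

prob = pulp.LpProblem("elastic_row", pulp.LpMinimize)
x = pulp.LpVariable("x", lowBound=0)
s = pulp.LpVariable("slack", lowBound=0)    # undercoverage, cost c_slack
u = pulp.LpVariable("surplus", lowBound=0)  # overcoverage, cost c_surplus

c, c_slack, c_surplus, b = 1.0, 10.0, 2.0, 5.0  # invented data
prob += c * x + c_slack * s + c_surplus * u
prob += 2 * x + s - u == b  # the elasticised constraint: Ax + s - u = b
prob.solve()
# This row's dual now satisfies -c_surplus <= pi <= c_slack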


Solution Strategies

“Enumeration with Implication” (Constraint Logic Programming) for Set Partitioning

[Example: a 6-row, 10-column 0/1 set partitioning problem:
    min (5, 4, 3, 5, 2, 2, 2, 1, 9, 6)·x   s.t.  Ax = e,  x binary,
where A is a 6×10 0/1 matrix. The enumeration below works through it.]

Branch: choose x2 (c = 4). Possible successors: x6, x10.

[Reduced sub-LP over the uncovered rows, candidate columns x6 and x10.] Constraint 5 ⇒ x10 = 1

{x2, x10}: feasible, cost = 7

Branch: choose x4 (c = 5). Possible successors: x7, x8.

[Reduced sub-LP, candidate columns x7 and x8.] Constraint 6 ⇒ infeasible

Branch: choose x9 (c = 9). Possible successors: x1, x6, x7.

[Reduced sub-LP, candidate columns x1, x6 and x7.] Constraint 6 ⇒ x6 = 1

{x9, x6}: cost = 11

[Reduced sub-LP.] Require x7 = 1

{x9, x6, x7}: cost = 13

Comments: Could we bound? Yes, if all cj > 0; eg bound the partial solution {x9, x6} by the first solution found. But such bounds are weak: for partial solutions there is no LP to give better ones. Can incorporate implication into IP B&B… see later


Can often preprocess the A matrix to identify cost/constraint implications

Can use for Covering/Packing, but implications not so strong

CLP can be much better than IP

NB: Cplex preprocessing “probing” is close to CLP

References: eg INFORMS Journal on Computing Volume 10, Number 3, 1998 (pubsonline.informs.org)

Heuristics Many based on Lagrangian relaxation, genetic algorithms, simulated annealing etc. Some are very good.


COLUMN GENERATION AND DECOMPOSITION

Andrew Mason [email protected], www.esc.auckland.ac.nz/Mason Linköping 1999, Auckland 2000, Auckland 2001

Integer Programming for Set Partitioning Solve the linear programming relaxation, then use “branch and bound” or “branch and cut” to integerise.

When there is a choice of set partitioning or set covering as a formulation, set covering is preferred (Barnhart et al 1998):
· Its linear programming relaxation is numerically far more stable and thus easier to solve;
· It is trivial to construct a feasible integer solution from a solution to the linear programming relaxation

Solving the Set Partitioning Linear Programming Relaxation We assume A is an m×n matrix, with n >> m. We solve:

    min z = cᵀx
    s.t. Ax = b
         x ≥ 0, integer

NB: Generally hard to solve, as these LPs are very …………degenerate

Standard LP procedure:
Repeat
    Price all (non-basic) columns to find an entering variable
    If an entering variable is found
        Enter variable into basis, remove leaving column, update x, and π’s
Until no entering column is found

Note:

Let aj denote the j’th column of A, so A ≡ (a1 | a2 | a3 | … | an).

For binary A matrices, let I(aj)={i:aij = 1} be the indices of the rows column j contributes to.

Now, the reduced cost for xj is given by

    rc(xj) = cj − πᵀaj = cj − Σ_{i∈I(aj)} π_i

Does it matter if we price basic columns? No. Why? They will have 0 reduced cost. But... beware numerical error: we don’t want a basic column to enter!
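In code, pricing a binary column is just a sum of duals. A minimal sketch (naming mine), including the tolerance guard the note above calls for:

def reduced_cost(c_j, rows_j, pi):
    """rc(x_j) = c_j - sum of pi_i over the rows I(a_j) that column j covers."""
    return c_j - sum(pi[i] for i in rows_j)

# A basic column prices to 0 in exact arithmetic, but in floating point it may
# come out at, say, -1e-12; never let such a column be chosen to enter.
TOLERANCE = 1e-9

def is_attractive(rc):
    return rc < -TOLERANCE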


Standard LP with Partial Pricing

Assume the variables are divided (perhaps naturally) into p subsets X1, X2, …, Xp

Eg, for the Personalised Shift Generation problem, Xs = columns for staff member s

[Blank LP template: min cᵀx s.t. Ax = b, with the columns partitioned into the subsets X1, X2, …, Xp.]

s = 1
Repeat
    Repeat
        Price all (non-basic) columns in subset Xs
        s = s + 1; if (s > p) s = 1
    Until a ‘sufficiently good’ entering variable is found or all columns priced
    If an entering variable is found
        Enter variable into basis, remove leaving column, update x, and π’s
Until no entering column is found

Absolutely vital for fast solution of large problems. Much more efficient memory access. [Plot: LP cost falling over time.]

What is sufficiently good?



LP with Sprint Pricing (Multiple Pricing) Maintain an active set. Use active set for fast iterations. Do big pricing occasionally.

[Blank LP template: the master with an active subset of columns held in memory and the rest inactive.]

Let A_active denote the active subset of columns from A.
Repeat
    Repeat
        Price all (non-basic) columns in the active subset A_active
        If a ‘sufficiently good’ entering variable is found
            Enter variable into basis, remove leaving column, update x and π’s
    Until no ‘sufficiently good’ entering column is found
    Price A to find a set of good entering columns (−ve r.c.)
    If good (or any) entering columns are found
        Add entering columns to A_active
        Remove non-basic (high reduced cost?) columns from A_active
Until no entering column is found

Note: The “Price A” step does not need to price all columns except in the final iterations.

Advantages: small active set in memory; can bring in columns off disk if required.

We stop our minor iterations and price A when the most negative reduced cost in Aactive is not “sufficiently good”. What’s good enough?


[Plot: LP cost falling over time under Sprint pricing.]

SPRINT successfully used within IBM for a number of big problems. Ideas also appear in Lagrangian-based heuristics.

Efficient Pricing in the LP – Simple (Trivial?) Column Generation Pricing is all about giving the basis a new entering column if one exists.

Example 1: The A-matrix consists of 2^m columns, being all possibilities of a 1 or a 0 in each position. The cost of each column is 1.

    min 1ᵀx   s.t.  Ax = b   (duals π1, …, π6)

[Matrix: the columns of A enumerate every 0/1 vector on the m = 6 rows.]

We could store all our columns in an A-matrix. Then our pricing algorithm could price all the columns in sequence using rc(xj) = cj − πᵀaj, and then return that column (if any) with the most negative reduced cost.

Or we could be smart:

We do not store any columns (except those in the basis). Whenever we need to price columns, use the following algorithm:
    Place 1’s in each row with a positive π_i (each such row pays towards the column’s cost of 1)
    Calculate the reduced cost rc = 1 − Σ_{i: π_i > 0} π_i
    Return the constructed column if rc < 0
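A minimal Python sketch of this generator (naming mine): since every column costs 1, the best possible column covers exactly the rows with positive duals.

def generate_column(pi):
    """Best column for Example 1: a 1 in every row whose dual is positive."""
    column = [1 if p > 0 else 0 for p in pi]
    rc = 1.0 - sum(p for p in pi if p > 0)  # cost 1 minus the duals collected
    return (column, rc) if rc < 0 else None  # None: no attractive column exists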


Example 2: The A-matrix consists of all columns with 1, 2, 3, 4, or 5 1’s in any rows. The cost of a column is the number of 1’s it contains.

    min cᵀx   s.t.  Ax = b   (duals π1, …, π6)

[Matrix: the columns of A enumerate every 0/1 vector with one to five 1’s; the cost row c holds the number of 1’s in each column (1, 1, …, 2, 2, …, 3, 3, …).]

The Smart approach:

Do not store an A-matrix. To find an entering column:
    For each row, calculate h_i = 1 − π_i
    Find the rows (up to 5) with the most negative h_i’s
    Place 1’s in these rows
    Return the constructed column if it has negative reduced cost
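A minimal Python sketch of this generator (naming mine), following the h_i = 1 − π_i recipe:

def generate_column(pi, max_ones=5):
    """Best column for Example 2: up to max_ones 1's in the rows where
    h_i = 1 - pi_i is most negative (each 1 costs 1 but earns pi_i)."""
    h = sorted((1.0 - p, i) for i, p in enumerate(pi))
    rows = [i for h_i, i in h[:max_ones] if h_i < 0]
    rc = sum(1.0 - pi[i] for i in rows)  # reduced cost of the built column
    if not rows or rc >= 0:
        return None                      # no attractive column exists
    column = [1 if i in rows else 0 for i in range(len(pi))]
    return column, rc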

Gilmore Gomory (Delayed) Column Generation for Stock Cutting

The Better Food Company produces cream-filled sponge rolls with a standard width of 20 cm each. Each 20cm roll costs the company $2.00 to produce. Special customer orders with different widths are produced by cutting (slitting) the standard rolls of sponge into shorter lengths. Typical orders (which may vary from day to day) are summarized in the following table. These orders need to be met at least cost.

Order   Desired Width (cm)   Desired Number of Rolls
  A             5                     150
  B             7                     200
  C             9                     300

An order is filled by setting the cutting knives to the desired widths. Usually, there are a number of ways in which a standard roll can be slit to fill a given order. The figure below shows three possible knife settings for the 20-cm roll. Although there are other feasible settings, we limit the discussion for the moment to considering settings 1, 2, and 3 in the figure. Note that the shaded area in each diagram represents lengths of sponge that are too short to be used in meeting orders, and so these pieces must be thrown away. Such wastage is called trim loss.


[Figure: three knife settings for the 20-cm roll. Setting 1: 7 + 9 with 4 cm trim; Setting 2: 5 + 5 + 7 with 3 cm trim; Setting 3: 5 + 5 + 9 with 1 cm trim.]

The effect of all the different ‘sensible’ cutting patterns is summarised in the following table.

                      Pattern 1  Pattern 2  Pattern 3  Pattern 4  Pattern 5  Pattern 6
5 cm rolls produced       0          2          2          4          1          0
7 cm rolls produced       1          1          0          0          2          0
9 cm rolls produced       1          0          1          0          0          2

We note that each pattern uses no more than 20cm

Mathematical Representation We seek to determine the knife setting combinations (variables) that will fill the required orders (constraints) while using the least number of rolls (objective). To express the model mathematically, we define the variables as xj = number of standard rolls to be slit according to pattern j, j = 1, 2, …, 6

Objective: We wish to minimise the number of the rolls we cut: min x1 + x2 + x3 + x4 + x5 + x6

Constraints We must ensure we cut at least the number of 5, 7 and 9 cm rolls ordered.

5-cm rolls:  2x2 + 2x3 + 4x4 + x5 ≥ 150

7-cm rolls:  x1 + x2 + 2x5 ≥ 200

9-cm rolls:  x1 + x3 + 2x6 ≥ 300

Logical constraints:  x1, x2, x3, x4, x5, x6 ≥ 0, integer
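A minimal PuLP sketch of this IP, with the pattern data transcribed from the table above:

import pulp

patterns = {  # rolls of each width produced by each knife setting
    1: {5: 0, 7: 1, 9: 1}, 2: {5: 2, 7: 1, 9: 0}, 3: {5: 2, 7: 0, 9: 1},
    4: {5: 4, 7: 0, 9: 0}, 5: {5: 1, 7: 2, 9: 0}, 6: {5: 0, 7: 0, 9: 2},
}
demand = {5: 150, 7: 200, 9: 300}

prob = pulp.LpProblem("cutting_stock", pulp.LpMinimize)
x = {j: pulp.LpVariable(f"x{j}", lowBound=0, cat="Integer") for j in patterns}
prob += pulp.lpSum(x.values())  # number of standard rolls slit
for w, d in demand.items():
    prob += pulp.lpSum(patterns[j][w] * x[j] for j in patterns) >= d
prob.solve()
print(pulp.value(prob.objective), {j: x[j].value() for j in patterns})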


Finding the Entering Column

The LP relaxation to the above IP can be written

           x1  x2  x3  x4  x5  x6              orders
min         1   1   1   1   1   1    x
s.t.  5cm       2   2   4   1        x   ≥  150
      7cm   1   1           2            ≥  200
      9cm   1       1           2        ≥  300

Assume we are solving the LP relaxation, and we want to find the best entering variable (most negative reduced cost). Now, the reduced costs for all columns (including the basic ones) are: r.c.(xj) = cj − πᵀaj

Now, any column of the A-matrix can be represented as a vector: the column for xj has cost cj = 1 and entries

    (a1j, a2j, a3j) = (y1, y2, y3),

where y1, y2, & y3 are the integer numbers of 5cm, 7cm, and 9cm lengths cut from the 20cm roll.

When we generated the A-matrix, we considered all (sensible) combinations for which

5y1 + 7y2 + 9y3 ≤ 20

All combinations of y1, y2, & y3 that satisfy this constraint (and are ‘sensible’ in that they could not fit another roll) appear in the A matrix, and so represent possible entering columns.

The reduced cost of this general ‘y’ column is: 1 − y1π1 − y2π2 − y3π3

Therefore, the problem of generating our most negative reduced cost column can be formulated:

min 1 − y1π1 − y2π2 − y3π3,   or equivalently   max y1π1 + y2π2 + y3π3

s.t. 5y1 + 7y2 + 9y3 ≤ 20      ← a Knapsack Problem

     y1, y2, y3 ≥ 0, integer

Notes: This idea was first used by Gilmore and Gomory in the early 1960s.

How do we implement the column generator?


[Figure: the knapsack column generator solved as a shortest path problem over 21 nodes, one node per used width 0–20 cm.]
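The figure suggests the implementation: a dynamic program (equivalently, a shortest path) over the 21 possible used widths. A minimal Python sketch, with my own naming; the master would call this each pricing round with the current duals:

def best_pattern(duals, widths=(5, 7, 9), capacity=20):
    """Most valuable pattern: max sum(y_i * pi_i) s.t. sum(y_i * w_i) <= capacity,
    y integer >= 0. Solved as an unbounded-knapsack DP over used width 0..capacity."""
    value = [0.0] * (capacity + 1)
    pattern = [[0] * len(widths) for _ in range(capacity + 1)]
    for c in range(1, capacity + 1):
        value[c], pattern[c] = value[c - 1], pattern[c - 1][:]  # waste 1 cm
        for i, w in enumerate(widths):
            if w <= c and value[c - w] + duals[i] > value[c]:
                value[c] = value[c - w] + duals[i]
                pattern[c] = pattern[c - w][:]
                pattern[c][i] += 1
    rc = 1.0 - value[capacity]  # each new pattern costs one standard roll
    return (pattern[capacity], rc) if rc < 0 else None  # enter column if rc < 0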


Extreme Columns… Will all the columns above actually be generated by the column generator? Consider a simplified example:

Columns are a mix of 5cm and 7cm pieces cut from 30cm

            x1  x2  x3  x4  x5          orders
min          1   1   1   1   1    x
s.t.  5cm    6   4   3   1   0    x  ≥  9     π1
      7cm    0   1   2   3   4       ≥  6     π2

Consider the column generator problem of finding the most negative reduced cost column given π1, π2:

min 1 − y1π1 − y2π2,   or equivalently   max y1π1 + y2π2

s.t. 5y1 + 7y2 ≤ 30      ← a Knapsack Problem

     y1, y2 ≥ 0, integer

What do the solutions to this problem look like? Note all duals are non-negative.

Case 1: π1 > (2/3)π2. The most negative reduced cost column is (6, 0)ᵀ.

Case 2: π1 < (2/3)π2. The most negative reduced cost column is (0, 4)ᵀ.

[Figure: the integer points satisfying 5y1 + 7y2 ≤ 30; the generator only ever returns the extreme solutions (6, 0) and (0, 4).]

What is the best optimal LP solution using the generated columns? x1 = 1.5, x5 = 1.5, cost = 3.

What is the best integer solution? x1 = x3 = x5 = 1 (or x3 = 3), cost = 3.

Note: Column x3 exists in LP solution as a linear combination of other cols

Moral of the story:
· The column generator gives extreme columns
· May need to generate new columns during integerisation

Don’t believe AMPL’s/CPlex stock cutting example! (CPlex does not generate in B&B, but only uses columns generated during the LP solve.)


Dantzig Wolfe Decomposition and Column Generation Dantzig & Wolfe developed a technique that takes some LP or IP problem, and forms from it a new problem. This new problem can, if we wish, be solved using column generation.

Example: We have 2 staff available to cover today’s 3 shifts, each 4 hours long. We require 1 or more staff on each shift. Each shift costs 1 unit to staff. Person A can work between 4 and 8 hours, person B between 8 and 12 hours.

Let yij = 1 if person i does shift j, and 0 otherwise, yij ∈ {0,1}

We can write this problem out as follows to emphasise the independence of the yA’s and yB’s:

min yA1 + yA2 + yA3 + yB1 + yB2 + yB3
s.t. yA1 + yB1 ≥ 1
     yA2 + yB2 ≥ 1      ← Complicated constraints (involve yA’s and yB’s)
     yA3 + yB3 ≥ 1

     yA1 + yA2 + yA3 ≥ 1        yB1 + yB2 + yB3 ≥ 2
     yA1 + yA2 + yA3 ≤ 2        yB1 + yB2 + yB3 ≤ 3

     (yA1, yA2, yA3) ∈ {0,1}³,  (yB1, yB2, yB3) ∈ {0,1}³      ← Easy constraints

If we replace the ‘easy’ constraints 4 and 5, and 6 and 7, by the set of solutions that they (and the binary restrictions on the y’s) allow, we can write this as:

min yA1 + yA2 + yA3 + yB1 + yB2 + yB3
s.t. yA1 + yB1 ≥ 1
     yA2 + yB2 ≥ 1
     yA3 + yB3 ≥ 1

     (yA1, yA2, yA3) ∈ SA,  (yB1, yB2, yB3) ∈ SB

where SA is the set of all possible legal values for (yA1, yA2, yA3):

    SA = { (1,0,0)ᵀ, (0,1,0)ᵀ, (0,0,1)ᵀ, (1,1,0)ᵀ, (1,0,1)ᵀ, (0,1,1)ᵀ }

and SB is the set of all possible legal values for (yB1, yB2, yB3):

    SB = { (1,1,0)ᵀ, (1,0,1)ᵀ, (0,1,1)ᵀ, (1,1,1)ᵀ }


We can now represent each set as an integer convex combination of its members:

    SA = { y : y = Σ_{j=1..6} xAj a_j,  Σ_{j=1..6} xAj = 1,  xAj binary },
where a_1, …, a_6 are the six members of SA listed above, and

    SB = { y : y = Σ_{j=1..4} xBj b_j,  Σ_{j=1..4} xBj = 1,  xBj binary },
where b_1, …, b_4 are the four members of SB.

Notice the use of convexity constraints, also termed GUB (generalised upper bound) constraints.

The above give us expressions for yA1 etc in terms of the x’s, so we can now substitute back into the original formulation,

min yA1 + yA2 + yA3 + yB1 + yB2 + yB3
s.t. yA1 + yB1 ≥ 1
     yA2 + yB2 ≥ 1          (yA1, yA2, yA3) ∈ SA,  (yB1, yB2, yB3) ∈ SB,
     yA3 + yB3 ≥ 1,

to form a new problem in which the decision variables are the x’s. This new problem has to include the extra convexity constraints and the binary restrictions on the x’s that are used to define SA and SB. This gives us our new formulation:

         xA1  xA2  xA3  xA4  xA5  xA6  xB1  xB2  xB3  xB4
min       1    1    1    2    2    2    2    2    2    3     x
s.t.      1              1    1         1    1         1     ≥ 1
               1         1         1    1         1    1     ≥ 1
                    1         1    1         1    1    1     ≥ 1
          1    1    1    1    1    1                         = 1
                                        1    1    1    1     = 1
x ∈ {0,1}


This new problem is called the IP Master Problem. Its LP relaxation is called the LP Master Problem.

Note that in the relaxed (LP) master problem, the x’s can be fractional, and so the requirements of yA and yB belonging to SA and SB respectively are relaxed instead to yA and yB being in their convex hulls (polyhedrons), conv(SA) and conv(SB), respectively. The key ideas in Dantzig-Wolfe are (1) the convex hulls of SA and SB can be defined by their extreme points. (2) In general, SA and SB could have millions of members, and indeed millions of extreme points, so we can’t add all of these to the master. (3) Instead, we generate new extreme points (columns) and add these columns to the master whenever these new columns will improve the objective (i.e. have negative reduced cost).

Note: When the master has only a subset of the columns, we say it is restricted.

We have decomposed the original problem into a master and 2 column-generation subproblems, 1 for each person.

The Column Generation SubProblems: In this example, we know that Person A’s columns are of the form

    (yA1, yA2, yA3, 1, 0)ᵀ

(the three shift rows, a 1 in person A’s convexity row, and a 0 in person B’s), where the points (yA1, yA2, yA3) ∈ SA, i.e. are the solutions to

    1 ≤ yA1 + yA2 + yA3 ≤ 2,   yA1, yA2, yA3 binary

Each column defined by (yA1, yA2, yA3) has a cost given by yA1 + yA2 + yA3.

As part of our pricing, we want to find the column (yA1, yA2, yA3) in SA that has the most negative reduced cost (i.e. is the best possible ‘Person A’ entering column). Now, given a vector of duals (π1, π2, π3, πA, πB), any column defined by (yA1, yA2, yA3) has reduced cost:

    rc = yA1 + yA2 + yA3 − yA1π1 − yA2π2 − yA3π3 − πA

Thus the problem of finding the most negative reduced cost column for person A, i.e. the ‘Person A’ column generation problem for the LP Master, is

    min rc = (1 − π1)yA1 + (1 − π2)yA2 + (1 − π3)yA3 − πA
    s.t. 1 ≤ yA1 + yA2 + yA3 ≤ 2,   yA1, yA2, yA3 binary


Note 1: If the members of SA (or SB) are all (0,1) vectors (i.e. the y’s are binary), then SA is exactly the set of extreme points of the convex hull conv(SA) of SA. However, this is not true if the y’s are general integer. Column generators tend to produce extreme points, and so, in the latter case, some feasible integer columns may never be generated for the LP Restricted Master. Eg… the stock cutting problem seen before

Note 2: In this case, SA and SB could be described by linear constraints; the column generators were easy problems. However, this need not be the case. Indeed, the beauty of column generation is that we can embed very complicated rules in the column generators. The column generator handles the complexity, not the IP.

Note 3: Given any (possibly fractional) x’s for the LP, we can calculate the original variables, i.e. the y’s. Eg, for our example:

    yA1 = Σ_j xAj a1j,   yA2 = Σ_j xAj a2j,   yA3 = Σ_j xAj a3j,

i.e. y_A = Σ_j xAj a_j (and similarly y_B = Σ_j xBj b_j).

Note 4: The Relaxed LP Master is often stronger than the original formulation as some fractional solutions in the original may not be solutions to the Dantzig-Wolfe reformulation; i.e. the Dantzig-Wolfe reformulation has a worse LP objective than the original formulation, and thus is easier to integerise. However, if the column generation problems are not NP-hard (eg their LP forms have naturally integer solutions), then the Dantzig-Wolfe reformulation and the original problem have… the same objective.

So, for our example above, the sub-problem is naturally integer, and so the Dantzig-Wolfe reformulation is not stronger.


Branch and Bound with Column Generation If our original problem was an IP, then so will be our new Dantzig-Wolfe master problem. Solving integer programs using a column generator almost always requires that we generate during branch and bound. If we generate columns during the branch and bound process, we call it “Branch and Price” (Barnhart et al, 1998), or “IP Column Generation” (Wolsey 1998)

Branching Possibilities If column generating, branches must be respected by the column generator. That is, columns must satisfy the branches imposed. We don’t want this to complicate the generator too much.

Variable Branching: Force a variable xj up or down to ⌈xj^i⌉ or ⌊xj^i⌋ respectively (xj^i is the value of xj at node i), ie adding xj ≤ ⌊xj^i⌋ or xj ≥ ⌈xj^i⌉. Why not use variable branching? Problems occur if variables are forced down:
· Applying an upper bound on a variable means it cannot be generated again in the column generator. How do we stop it reappearing? Need k’th shortest path – hard!

· With binary variables, a zero branch (xj=0) says little about the solution; many feasible solutions remain. (The 1-branch xj=1 is much more powerful.)

Constraint Branching: Binary Variables, Binary A-matrix, GUB Constraints · Developed by David Ryan and Brian Foster in 1981 · Column Generation Friendly

[Example: a fractional LP solution to the master above. The x’s take values such as 1/2, 1/4, 1/4 (person A) and 1/3, 2/3 (person B), giving

    yA1 = 3/4, yA2 = 1/4, yA3 = 1/2    and    yB1 = 1, yB2 = 1, yB3 = 2/3.]

Constraint branch on the constraint pair (row 3, row 4), ie on yA3:
    1-Branch: ban every person-A column with no 1 in the shift 3 row (forces yA3 = 1)
    0-Branch: ban every person-A column with a 1 in the shift 3 row (forces yA3 = 0)

We branch to force (1-branch) or ban (0-branch) tasks for a specific person. Branch choice is based on ‘original y variables’ in Dantzig-Wolfe view.

Two branches possible above. Pick one of these. Each side of the branch is enforced by banning columns.

These branches are easy to enforce in a column generator. Eg, to force person A to undertake task 3, we tell the Person A column generator that π3 = ∞


Column Generator Structures Column generators are typically:
· Shortest Path (Dynamic Programming)
· Nested shortest path
· Shortest Path with Resource Constraints
· TSP solutions (vehicle routing)
· TSP solutions with resource constraints (eg vehicle routing)
· General IP’s
· Enumerators
· Randomised enumerators

Note: Shortest Path/Dynamic Programming is only useful if there is significant merging of states. Otherwise, it is just inefficient enumeration.

Example: Simplified rostering. Must determine which days are worked, and which days are off, for each staff member. Staff like to work 5 days on, then 2 days off (a ‘5/2’), but can also work 3/1, 4/2, 6/2, or 6/3. Each day worked is paid 8 hours. Staff need to work 80 hours over the fortnight roster period. (A DP sketch of the generator follows the network figure below.)

[Figure: the column generator as a shortest path network: nodes are (day of roster, hours worked 0h–80h) across the fortnight Monday–Sunday, Monday–Sunday, with arcs from ‘start’ to ‘end’ representing the allowed work stretches.]
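A minimal Python sketch of this generator as a dynamic program (naming and simplifications mine): the state is (first free day, hours worked so far), and different stretch sequences reaching the same state merge, which is what makes the DP worthwhile. The roster value is taken as minus the sum of the day duals over worked days; convexity duals and shift costs are ignored here for simplicity.

def best_roster(pi, horizon=14, hours_needed=80):
    """Most attractive on/off roster via DP on states (day, hours worked).
    pi[d] = dual for day d; stretches are (days on, days off) pairs."""
    stretches = [(3, 1), (4, 2), (5, 2), (6, 2), (6, 3)]
    states = {(0, 0): (0.0, [])}  # state -> (value, stretches used)
    for day in range(horizon):    # states are reached in increasing day order
        for hours in range(0, hours_needed, 8):
            if (day, hours) not in states:
                continue
            value, plan = states[(day, hours)]
            for on, off in stretches:
                if day + on > horizon or hours + 8 * on > hours_needed:
                    continue
                new_value = value - sum(pi[day:day + on])
                state = (min(day + on + off, horizon), hours + 8 * on)
                if state not in states or new_value < states[state][0]:
                    states[state] = (new_value, plan + [(on, off)])  # merge
    return states.get((horizon, hours_needed))  # None if no legal roster

# Eg with pi = [1.0] * 14 this returns a 10-days-worked roster such as 5/2 + 5/2.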


L.P. Dantzig-Wolfe Decomposition (Formally)

Consider the L.P.

    min z = cᵀx
    s.t. [S; T] x = [b_S; b_T]
         x ≥ 0

We assume Sx = b_S involves m_S constraints and Tx = b_T involves m_T constraints, i.e. S is m_S × n and T is m_T × n.

Let R_T = {x : Tx = b_T, x ≥ 0}. Then we can change our LP into:

    min z = cᵀx                          min z = cᵀx
    s.t. [S; T] x = [b_S; b_T]    ⇒     s.t. Sx = b_S
         x ≥ 0                                x ∈ R_T

R_T is the set of feasible solutions to a subset of the constraints, Tx = b_T, x ≥ 0. Now, R_T is polyhedral (defined by hyperplanes), and so any point in a bounded polyhedral set can be written as a convex combination of its extreme points, i.e., if x ∈ R_T then (assuming R_T bounded)

    x = Σ_j λ_j x^j,   eᵀλ = 1,   λ ≥ 0,

ie x = Xλ, eᵀλ = 1, λ ≥ 0, where X = (x¹ x² … xᵗ) is an n × t matrix (t is the number of extreme points), with x^j being the j’th extreme point of R_T. (Don’t worry yet about how to find the x^j or how many there are.)

We can substitute x = Xλ, eᵀλ = 1, λ ≥ 0 into the original LP, i.e. our LP becomes:

    min z = cᵀx              min z = cᵀXλ              min z = cᵀXλ = fᵀλ
    s.t. Sx = b_S     ⇒     s.t. SXλ = b_S     ⇒     s.t. SXλ = Pλ = b_S
         x ∈ R_T                  eᵀλ = 1                   eᵀλ = 1
                                  λ ≥ 0                     λ ≥ 0

where P = SX, i.e. p_j = Sx^j, and fᵀ = cᵀX, i.e. f_j = cᵀx^j. That is, the new LP has columns P = SX and costs fᵀ = cᵀX.


NB:
1. This LP is called the Master LP (MLP). It has m_S + 1 explicit constraints and as many variables as extreme points of R_T – often large.
2. The new columns are P = [p₁ = Sx¹ | … | p_t = Sxᵗ].
3. The explicit constraints in the new problem are [P; eᵀ] λ = [b_S; 1].
4. If m_T is large ⇒
   a. m_S + m_T constraints in the original LP – large. But just m_S + 1 in the Master LP.
   b. n vars in the original L.P., but t vars in the new Master L.P. (t often very large).

Pricing using a Column Generator
Each extreme point x^j ∈ R_T leads to a column in the master of the form [Sx^j; 1], with cost cᵀx^j. We want to find an entering variable for the master given some duals πᵀ, ie we want some x^j ∈ R_T with reduced cost rc(x^j) < 0.

If we write πᵀ = (π₁ᵀ, π₀), where the vector π₁ holds the duals on the Pλ = b_S rows and the scalar π₀ is the dual on the convexity row, we see that the reduced cost rc(x^j) is given by

    rc(x^j) = (cᵀ − π₁ᵀS) x^j − π₀

The term π₀ is a constant, and so can be ignored when choosing x^j. The usual Simplex criterion for the entering variable is to minimise the reduced cost, and we therefore define a linear column generation sub-problem:

    Given some master duals πᵀ = (π₁ᵀ, π₀), find the extreme point x ∈ R_T = {x : Tx = b_T, x ≥ 0} that has the most negative reduced cost (cᵀ − π₁ᵀS)x − π₀, i.e.

    min (cᵀ − π₁ᵀS)x
    s.t. Tx = b_T
         x ≥ 0

Q: Do we know we will get an extreme point of R_T? Why? Yes; the LP only finds extreme points.

Q: Do we need to solve this problem to optimality? Why? Often no; we can stop if we get a −ve reduced cost column.


The solution x^s of this LP subproblem defines a new variable λ_s, which enters the master program if the reduced cost is negative, ie if (cᵀ − π₁ᵀS)x^s − π₀ < 0. The column that enters the Master LP basis is then [p_s; 1] = [Sx^s; 1], with objective coefficient f_s = cᵀx^s.

Having found a new variable λ_s to enter the Master LP basis, we can determine a leaving variable using the usual LP criterion (applied to the MLP). The Master LP basis is then updated, thus leading to a new π-vector etc. (Each π vector defines a new subproblem, and each subproblem generates a column for the master, i.e. an extreme point of R_T.)

Special Case - Multiple Sub-Problems

Consider some L.P.

    min z = cᵀx
    s.t. [S; T] x = [b_S; b_T]
         x ≥ 0

Consider the case where T is block diagonal, T = diag(T₁, T₂, …, T_p). Write cᵀ = [c₁ᵀ, c₂ᵀ, …, c_pᵀ] and S = [S₁ S₂ … S_p] to correspond with the structure of T. (Same with x and b_T.)

The LP is then

    min z = c₁ᵀx₁ + … + c_pᵀx_p
    s.t. S₁x₁ + … + S_px_p = b_S
         T₁x₁ = b₁
              ⋱
         T_px_p = b_p
         x₁, x₂, …, x_p ≥ 0

Master L.P.

The master looks like it did before:


    min z = fᵀλ        f_j = cᵀx^j
    s.t. Pλ = b_S      p_j = Sx^j
         eᵀλ = 1
         λ ≥ 0

Subproblem

    min (cᵀ − π₁ᵀS)x − π₀
    s.t. Tx = b_T
         x ≥ 0

Using the above matrix partitions, we find the subproblem becomes

    min Σᵢ (cᵢᵀ − π₁ᵀSᵢ) xᵢ
    s.t. Tᵢxᵢ = bᵢ   ∀i
         xᵢ ≥ 0   ∀i.

This sub-problem can be treated as p separate problems, since the xᵢ’s don’t affect each other in the constraints. We therefore solve p subproblems

    min (cᵢᵀ − π₁ᵀSᵢ) xᵢ
    s.t. Tᵢxᵢ = bᵢ          → optimal solution xᵢˢ
         xᵢ ≥ 0.

Then xˢ = (x₁ˢ, x₂ˢ, …, x_pˢ)ᵀ, so the new column is [pₛ; 1] with cost fₛ, where pₛ = Sxˢ and fₛ = cᵀxˢ.

NB: Each subproblem is usually relatively small.

Alternative Approach for the above problem
In the above example, the column generator returned one large column that contained solutions for each sub-problem Tᵢxᵢ = bᵢ, xᵢ ≥ 0. An alternative way of solving this problem is to add a convexity constraint to the master for each sub-problem Tᵢxᵢ = bᵢ, xᵢ ≥ 0, i = 1, 2, …, p. This gives the rostering-type formulations we saw before, with p column generators, one solving each sub-problem defined over Tᵢxᵢ = bᵢ, xᵢ ≥ 0.


Interpretation of Decomposition:

The objective function of each subproblem has the form (cᵢᵀ − π₁ᵀSᵢ)xᵢ. The i’th subsystem proposes a solution, say xᵢˢ, to the master program. The master finds that this involves Sᵢxᵢˢ units of the shared resources, which are therefore not available to other subsystems. The direct cost of xᵢˢ is given by cᵢᵀxᵢˢ, but the i’th subsystem, in proposing xᵢˢ, must pay for its use of shared resources. The MLP (with knowledge of the other subsystems’ demands) charges the price π₁ on the shared resources. Note that there is one element of π₁ for each shared resource of the MLP (i.e. constraint of the MLP).

Then dz_MLP = π₁ᵀ db_MLP, and if the i’th subproblem uses the k’th resource then db_k < 0 (i.e. less is available for the other subproblems, i.e. it is in demand); if the k’th resource is valuable then dz > 0 (if we seek to minimise), and therefore π_k < 0.

The indirect cost incurred by the i’th subsystem in choosing xᵢˢ is represented by −π₁ᵀSᵢxᵢˢ, where Sᵢxᵢˢ is the amount (≥ 0) of each resource used. Therefore the indirect cost on the i’th subproblem is > 0 (i.e. a charge against the i’th subproblem).

The MLP then repeatedly asks the subproblems to propose solutions, and each time adjusts the prices of shared resources to:

a. discourage the use of shared resources in great demand, i.e. sets π₁k << 0;

b. encourage the use of underutilised resources, i.e. sets π₁k = 0 and charges each subproblem nothing to use them.

The solution of the LP is provided by the MLP, which computes weights (i.e. λ) for each of the proposals provided by the subproblems. As proposals are considered by the MLP and π₁ is adjusted, early proposals will no longer be attractive and will be removed from consideration by setting λ_j = 0 (i.e. nonbasic). When no profitable new proposal is made, the solution is given as x = Σ_j λ_j x^j = Xλ.


Solving the LP: Different Pricing Calculations for the Entering Variable

Steepest Edge Pricing eg: John J. Forrest and Donald Goldfarb, Steepest-edge simplex algorithms for linear programming, Mathematical Programming 57 (1992), pp. 341-374

Minimisation Example:

[Spreadsheet: a two-variable minimisation example plotted in (x1, x2) space. The current basic solution has total cost 8.60; the non-basic variables x5 and x6 have reduced costs −0.1 and −0.3.]

Which is the most negative reduced cost (termed Dantzig reduced cost) entering variable? x6

If we increase this variable by 1, we move to the following (non-basic) solution.

[Spreadsheet: after increasing x6 by 1, the total cost falls to 8.33.]

Why is this a bad choice of edge to be travelling along?

What happens if we try the other direction?


[Spreadsheet: after instead increasing x5 by 1, the total cost falls to 8.50.]

Comment: The step taken when increasing x5 is smaller than that taken when increasing x6. This makes its reduced cost smaller.

We can scale the problem to make the step sizes similar.

[Spreadsheet: the same problem with x5 rescaled so the step sizes are similar; the total cost after the step is 8.20, and x5’s reduced cost is now −0.4.]

Comment: x5 now has the better reduced cost

Steepest Edge Pricing: Calculate rc(xj) / normj

Scale factor ‘normj’ normalises to avoid above effect. Must calculate initial scale factors... eg CPlex’s “Steepest Edge with Slack Initial Norms” Can make a huge improvement (particularly for .....degenerate problems). Scale factors have to be updated at each pivot for all non-basic variables... possibly slow (eg increase time per iteration by 8%). But what about Sprint approach...? Have to get norms for variables added to active set


Steepest Edge Pricing – formally

In the Dantzig rule we compute the variable with the smallest reduced cost c_j − πᵀa_j. If x_j is measured in different units, this has the effect of scaling c_j and a_j by some value γ, say. Then r.c._j = γ(c_j − πᵀa_j). We would prefer that the choice of entering variable be independent of the scaling of the variables. Steepest edge pricing is one way of addressing this issue.

In a single RSM iteration we have

x̂ = x + α y_s, where x = (x_B, 0)ᵀ is a basic feasible solution and

    y_s = (−B⁻¹a_s, 0, …, 0, 1, 0, …, 0)ᵀ   (the single 1 is in position s)

gives the change in all variables (basic and non-basic) when the non-basic variable x_s, s > m, is increased in value. (Note that a_s is the column of the A matrix corresponding to x_s, and so −B⁻¹a_s is the change in the basic variables x_B as x_s increases.)

For x_s to be the entering variable, the direction y_s must be downhill with respect to c, i.e., we have the common entering variable condition for a negative reduced cost rc(x_s):

    rc(x_s) = cᵀy_s = c_s − c_Bᵀ B⁻¹a_s = c_s − πᵀa_s < 0.

Steepest edge pricing involves choosing the direction that is most downhill with respect to c, i.e., at the greatest angle θ to c. [Figure: candidate directions y_s plotted against the objective vector c and the ideal direction.]


The angle θ can be found from cᵀy_s = ‖c‖‖y_s‖ cos θ  ⇒  cos θ = cᵀy_s / (‖c‖‖y_s‖).

For θ close to 180°, we want cos θ as small as possible, i.e., choose the entering (hence non-basic) variable index s so that

    cᵀy_s / ‖y_s‖ = min_{j>m} cᵀy_j / ‖y_j‖ = min_{j>m} rc(x_j) / ‖y_j‖

since ‖c‖ is a constant. (We see that we are simply scaling each reduced cost rc(x_j) by its associated step size ‖y_j‖, and then choosing the best of these scaled values.) However, we need to compute y_j = (−B⁻¹a_j, 0, …, 1, …, 0)ᵀ for each candidate entering variable, and this requires calculating B⁻¹a_j for each possible entering variable x_j; this will be slow. However, recurrences have been developed to keep track of these vectors from iteration to iteration, so they do not have to be calculated from scratch.
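A from-scratch numpy sketch of these scaled reduced costs (naming mine; illustration only, since real codes maintain the norms with the update recurrences mentioned above rather than recomputing B⁻¹a_j):

import numpy as np

def steepest_edge_scores(A, c, basic):
    """Scaled reduced costs rc(x_j)/||y_j|| for all non-basic columns j."""
    A, c = np.asarray(A, dtype=float), np.asarray(c, dtype=float)
    B_inv = np.linalg.inv(A[:, basic])
    pi = c[basic] @ B_inv                     # duals: c_B^T B^-1
    scores = {}
    for j in range(A.shape[1]):
        if j in basic:
            continue
        rc = c[j] - pi @ A[:, j]              # Dantzig reduced cost
        y = np.append(-B_inv @ A[:, j], 1.0)  # nonzero parts of direction y_j
        scores[j] = rc / np.linalg.norm(y)
    return scores                             # enter the most negative

# basic is a list of column indices forming the basis, eg basic = [0, 2, 5].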


PRICING STRATEGIES:

Up to now, we have discussed a single method for determining the entering variable, i.e. entering var = the one with minimum reduced cost, i.e. s = arg min_{j∈N} {c_j − πᵀa_j}, and x_s is the entering variable. This is known as the Dantzig rule. But there are many pricing strategies:

(a) Full pricing (see above)

(b) Multiple pricing: find best p r.c.’s. Then price only on subset (p=5 or 10) until r.c.’s not sufficiently negative.

(c) Partial pricing: Like multiple pricing, but one just chooses any p columns (not necessarily ones that have negative r.c.).

(d) Steepest edge (see later)

(e) Lambda pricing: Similar to steepest edge pricing in that it attempts to avoid problems with scaling of variables.

(f) Column generation: (see later).

Lambda Pricing

Try to take into account the objective function cj.

    λ_j = c_j / (c_j − rc(x_j)) = c_j / (πᵀa_j)

Example
    c_j      πᵀa_j    rc(x_j) = c_j − πᵀa_j    λ_j
    1000     1010     −10                       0.990
    10       12       −2                        0.833
    1        2        −1                        0.5

Of those columns with negative reduced cost, choose that with the …smallest λ_j

Not used much in practice (eg not in CPlex)


Integerisation In general our LP solutions will be fractional, and so we have to integerise. But, for some choices of A, we get naturally integer solutions.

Totally Unimodular Matrices see Hoffman + Kruskal, 1956, and attachment. Naturally give integer x for any rhs b.

Totally unimodular (0,1) matrices arise if there is a unique subsequence… there is an ordering of the rows in which all columns with a one in row i have their next one (if any) in row j.

If the one’s are ordered activities, this means that all columns doing activity i do the same next activity, activity j. Limited sub-sequence can lead to solutions that are close to integer.

Balanced 0/1 Matrices with Unit Right Hand Side Berge (1972), Fulkerson, Hoffman & Oppenheim (1974)

For 0/1 right hand sides and ≥, ≤ or = constraints (i.e. set covering, partitioning, and packing), fractions can only occur if there exist odd-order 2-cycles, i.e. p×p sub-matrices, with p odd, having row and column sums of 2 (Berge (1972)). Some sample fractional structures demonstrating the odd-order 2-cycles (starred entries mark the 2-cycle):

A:  1*  0   1*   = 1
    1*  1*  0    = 1      “2 from 3” (each of the rows (& columns) has 2 1’s)
    0   1*  1*   = 1
x:  ½   ½   ½

A:  1*  0   1*  1    = 1
    1*  1*  0   1    = 1      “3 from 4”
    1   1   1   0    = 1
    0   1*  1*  1    = 1
x:  1/3 1/3 1/3 1/3

[Matrix: a third, larger 8-column example with every variable equal to ½ and two embedded “2 from 3” odd-order 2-cycles.]

Odd-order 2-cycles are necessary, but not sufficient, for there to be fractions. (See next.) Constraint branching (see next) removes columns, breaking cycles.
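The “2 from 3” structure is easy to verify numerically; a small sketch, assuming SciPy is available:

from scipy.optimize import linprog

# "2 from 3": each row and each column of A has exactly two 1's (odd order 3)
A = [[1, 0, 1],
     [1, 1, 0],
     [0, 1, 1]]
res = linprog(c=[1, 1, 1], A_eq=A, b_eq=[1, 1, 1],
              bounds=(0, None), method="highs")
print(res.x)  # [0.5 0.5 0.5]: the LP relaxation is inherently fractional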


Perfect Matrices with Unit Right-hand Side (Padberg 1974)

A (0,1) matrix is perfect if all extreme points of {x : Ax ≤ e, x ≥ 0} are integer. Note: Adding slacks turns this set packing problem into set partitioning (but not set covering).

In perfect matrices, odd-order 2-cycles can occur, but are neutralised (stopped from producing fractions) by other constraints.

Eg A:  1   1   1    ≤ 1    ← eg ‘≤’ GUB constraint
       1*  0   1*   ≤ 1
       1*  1*  0    ≤ 1     “2 from 3” (each of the starred rows (& columns) has 2 1’s)
       0   1*  1*   ≤ 1

The solution ½ ½ ½ (which we had before) is now not feasible because of the GUB constraint.

A non-singular p×p sub-matrix with row and column sums equal to b may have a fractional solution with each variable being 1/b. If b ≥ 2, these variables are fractional. However, if some other ‘integerising’ ≤ 1 constraint on these variables includes more than b of these variables (has more than b 1’s), this 1/b solution becomes infeasible because it violates this constraint.

Perfect Matrices guarantee integer solutions because they do not contain any sub-matrices with row and column sums of b unless these sub-matrices have associated integerising constraints.

Note: GUB constraints are “integerising” for all the columns they contain. So, if a solution is fractional, it must fractionate across the GUB constraints.

A:  1   1   1   0   0   0    = 1   GUB 1
    0   0   0   1   1   1    = 1   GUB 2
    0   1*  0   0   1*  1    = 1
    0   1*  1*  0   0   1    = 1
    0   1   1   0   1   0    = 1
    0   0   1*  0   1*  1    = 1
x:  1/3 1/3 1/3 1/3 1/3 1/3

NB: Adding cuts adds ‘perfect matrix’ structure to matrices.

Ideal Matrices P. Nobili, A. Sassano, (0, ±1) Ideal Matrices, Mathematical Programming 80 (1998) 265-281
· Give integer solutions to set covering (Ax ≥ e), but not well characterised (yet!)


Branch and Bound Basically “Enumeration with Upper and Lower Bounds”. Also called “divide and conquer”.
· Upper bounds from heuristics and naturally integer solutions to LP relaxations
· Lower bounds from LP relaxation

If we generate columns during the branch and bound process, we call it “Branch and Price” (Barnhart et al, 1998), or “IP Column Generation” (Wolsey 1998)

See [LS97] Linderoth, J.T., Savelsburgh, M.W.P., A computational study of search strategies for mixed integer programming, Georgia Inst. of Technology, 1997, for a good analysis of branch and bound strategies; much of this summary comes from here.

Branch and Bound Issues
· Which node to explore next?
· What branch to make next?
· How much processing to do at each step?
· Do we want to prove optimality?
· Big or small LP–IP duality gap?
· Will we be back-tracking?
· Is feasibility hard? Is ‘good quality’ hard?

Branching Decisions
· “What about our solution are we going to decide next?”
· How will our children nodes differ?
· Which variable, constraint, SOS, or other branch do we choose?
· How balanced is the branch (number of solutions on each side of the branch)?
  · Unbalanced ok if we don’t intend to explore the other side
  · Variable branching on binary variables is very unbalanced
· Make the important decisions first
  · If a costly decision has to be made, make that branch earlier, not later
· How much do we change the solution?
  · Branching 0.9 to 1 (‘gentle branch’); small objective increase
  · Branching 0.5 to 0 or 1; both sides may increase objective
· Are we trying to find a good solution, or prove that some solution is optimal?
  · Finding a good solution – choose the gentle branches
  · Proving optimality – choose the “0.5” branches
· How much do we believe the LP vs our external knowledge?

Predicting LP-Objective after Branching We can often estimate the impact of a branch on the LP objective function.

Assume z^i is the objective at the current node, node i, and that some x_j^i is fractional at node i’s optimal solution. Let z^i↑(x_j) and z^i↓(x_j) be the new objective when x_j is increased to (at least) ⌈x_j^i⌉ or decreased to (no more than) ⌊x_j^i⌋ respectively and the problem is resolved.


Three methods to estimate z^i↑(x_j) (estimating z^i↓(x_j) is analogous):
(1) Find some variable x_h that, when it enters and increases to value x_h,new, increases x_j up to ⌈x_j^i⌉. Then z^i↑(x_j) ≈ z^i + rc(x_h)(x_h,new − x_h^i). (See Nemhauser and Wolsey 1998, p364.)
(2) Strong Branching (see below): test the branch by branching the variable and performing a limited number of iterations on the new problem. (One dual pivot works well.) Provides an upper bound on z^i↑(x_j).
(3) PseudoCosts [LS97]: Let p_j↑(k) = [z^k↑(x_j) − z^k] / [⌈x_j^k⌉ − x_j^k] be the actual rate of change in the objective function that occurred when x_j was increased from x_j^k to ⌈x_j^k⌉ at some node k. Let p_j↑ be the average over all nodes where x_j was branched up; p_j↑ is called x_j’s ‘(up-)pseudo cost’. We assume the objective will change at the same rate this time:
    z^i↑(x_j) ≈ z^i + p_j↑ (⌈x_j^i⌉ − x_j^i)
If x_j has never been branched before, we can put p_j↑ = c_j (not very good), or test the branch using partial solving (above) (recommended). Not so useful with many 0/1 variables, as the same variable won’t be branched often in the tree. What about constraint branching?

These methods can be combined; eg weighted combination of pseudo-cost (global) information and (local) strong branching results.
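A minimal sketch of one variable’s up-pseudo-cost record (naming mine), implementing the averaging and estimate above:

import math

class UpPseudoCost:
    """Running-average up-pseudo-cost p_j for one variable x_j."""
    def __init__(self):
        self.total, self.count = 0.0, 0

    def record(self, z_parent, z_up, x_frac):
        # rate observed when x_j was branched up from x_frac to ceil(x_frac)
        self.total += (z_up - z_parent) / (math.ceil(x_frac) - x_frac)
        self.count += 1

    def estimate(self, z, x_frac):
        # z^i_up(x_j) ~= z^i + p_j * (ceil(x_frac) - x_frac)
        rate = self.total / self.count if self.count else 0.0
        return z + rate * (math.ceil(x_frac) - x_frac)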

Predicting Integer Solution Objectives These are mainly used in node selection rules (see later).

Techniques exist for obtaining bounds on the best integer solution that can be obtained from a fractional LP solution. These use reduced costs and the requirement that variables be integer; see [LS97]. Other techniques include:

· Best (Integer) Projection
  · Integer infeasibility at node i is s_i = Σ_j min(x_j^i − ⌊x_j^i⌋, ⌈x_j^i⌉ − x_j^i)
  · The best integer solution that can be found from fractional node i, z^i[int], is given by z^i[int] ≈ z^i + s_i (z_U − z⁰)/s₀, where z_U is the current upper bound and node 0 is the root node.
· Best (Pseudo-cost + Rounding) Estimate
  · Modify the above approach to use pseudo costs, assuming each fractional variable can round in its cheapest direction
· Probabilistic Pseudo-cost Estimate
  · Modify the best estimate to assume a variable rounds down or up with various probabilities
  · Probabilities are based on the number of non-zeros in heuristically obtained solutions.

Choice of Branching Variable Range of choices to pick which variable to branch on:
· Pick most fractional (good for proving optimality)
· Pick least fractional that is not integer (good for finding integer solutions)
· Use a user-provided priority order on variables
· Enhanced Branching: of a set of most (or sufficiently) fractional variables, pick the variable with greatest cost [Savelsbergh’s MINTO default]


· Use one of the above LP objective estimates, eg
· Strong Branching (developed by CPlex). Given a set of fractional variables, try each side of each branch by performing a fixed number of simplex iterations.
· Must use the z^i↑(x_j) and z^i↓(x_j) estimates in some sensible way; examples in the literature include:
  · eg, choose the branch that maximises min[ z^i↓(x_j), z^i↑(x_j) ]
  · eg, choose the branch that maximises z^i↓(x_j) + z^i↑(x_j)
  · eg, choose the branch that maximises z^i↓(x_j) − z^i↑(x_j)

For an example of strong branching (using the last rule above) with constraint branching and column generation, see · Klabjan, D., Johnson, E.L., Nemhauser, G.L., Solving Large Airline Crew Scheduling Problems: Random Pairing Generation and Strong Branching, Georgia Inst. of Technology, 1999 [KJN99]

Node Choice
· “Which partial solution are we going to explore next?”
· Depth first search = “LP Dive”
  · Choose one of the just created children
  · If both children are bounded or infeasible, step back up tree to first unexplored node
  · focuses on finding a (hopefully good) solution quickly
  · number of active (unexplored, unbounded and feasible) nodes stays small
  · finding a solution early allows bounding
  · may be hard to prove optimality
· Best first search
  · Always explore best active node in tree
  · choice of ‘best’ can use LP and IP estimates discussed above
  · good for (eventually) finding a very good solution
  · large number of active nodes at any time
· To choose between children
  · choose least fractional (good for quick solutions) or use estimates discussed above
  · can backtrack if first child gives big objective increase
  · if branch is unclear, eg on 0.5’s,
    · use estimates discussed above
    · fully solve both children, and continue from the better
· Blended strategies often used in practice
  · ‘Depth first’ to find a solution
  · Switch to ‘best first’ to prove optimality
  · ‘Multi-start’ depth first
    · if depth first starts back-tracking, switch to a new better (higher) node
    · eg use depth first to explore both sides of
      · ‘uncertain branches’, eg 0.5’s
      · critical branches, eg at top of tree
· Avoid ‘playing in the muck’; don’t generate sequences of similar solutions at the bottom of the tree


Node Evaluation
· What solution do we start from?
  · Normally use the parent LP solution
· What algorithm do we use to re-optimise with the new branch?
  · Dual simplex can't use the column generator
    · Resolve without column generation?
  · Use primal simplex
    · Phase 1 or Big M to drive out banned columns, or push variables to new bounds
· How much time do we spend now vs later evaluating a node?
  · Can partially solve a node (eg using a limited number of iterations) to get bounds on the objective function; see the estimate methods above
  · Use heuristics to get node upper bounds
  · Can solve to 'objective function optimality' only
    · If the objective equals the parent's, it must be optimal, even if not dual feasible
  · Can leave a node totally un-evaluated
    · Allows multiple branches to be applied in succession

Partial Node Solution
We can partially solve nodes:
· Stop the LP before optimality is obtained
  · Eg a limited number of iterations, or stop when reduced costs become near zero
· Obtaining the optimal solution is not important
  · The LP objective is only used for bounding
    · Can generate lower bounds on the optimal LP value from a partial solution (see below), allowing the node to be bounded in the normal way
  · The LP optimal variable values are only used for branching decisions
    · The LP decision variables (probably) change only slightly in the final iterations
· Saves time spent in 'tail-off'
  · Particularly important for column generators, as finding an entering column often gets harder as the master gets closer to optimality
· Objective function bounds can help node selection
· Save the basis of partially solved nodes for re-use if the node is explored later.

Bounds on LP Solutions Consider some minimisation LP solution that is sub-optimal, i.e. there are columns with –ve reduced costs. What can we say about the optimal solution?

Clearly, the current solution is an upper bound. We seek good lower bounds.

Two types of lower bounds: Dual variable based and Lagrangian. Both require a full pricing of the variables, or perhaps an iterative application (modify duals, generate, repeat if not dual feasible) if column generation is used.


Farley's Dual-Based Bounds – Dual Variable Scaling
See A.A. Farley, "A note on bounding a class of linear programming problems, including cutting stock problems", Operations Research 38, 1990, p. 922.

Basic Idea: Modify the duals to become dual feasible; by weak duality, this dual solution gives a lower bound on the primal problem.

Example: Reduce all dual variables to 0. For positive column costs, this solution is dual feasible. The dual objective πᵀb is 0. This is a (useless but valid) lower bound on cᵀx.

Consider the primal/dual pair:
(P) min cᵀx : Ax ≥ b, x ≥ 0
(D) max bᵀπ : Aᵀπ ≤ c, π ≥ 0

As usual, we will assume c ≥ 0.

Assume we have some basic sub-optimal solution to (P) and associated duals π. We are sub-optimal (i.e. not dual feasible), so for some j,
rc(x_j) = c_j − (A_j)ᵀπ < 0

Consider some new set of duals π' = απ, 0 ≤ α < 1. We know α = 0 gives dual feasibility (example above), and that α = 1 is not dual feasible. What is the largest value of α that is dual feasible?

Solve: max α : rc'(x_j) = c_j − (A_j)ᵀ(απ) ≥ 0 for all columns j in A

This gives α = min_j { c_j / ((A_j)ᵀπ) : (A_j)ᵀπ > 0 }

Noting that π ≥ 0 and hence π' = απ ≥ 0, the new duals π' = απ satisfy both feasibility requirements for (D) above. From weak duality, the objective of any feasible solution to (D) is a lower bound on the optimal value of (P). Therefore, a lower bound on the optimal objective value cᵀx* of (P) is given by

cᵀx* ≥ bᵀπ' = bᵀ(απ) = α bᵀπ = α πᵀb = α c_Bᵀ B⁻¹ b = α cᵀx
i.e. α times the current (sub-optimal) objective value.

Notes:
· For column generation, this must be applied iteratively, as finding α requires pricing every column (including those not yet generated), which is hard. A sketch for an explicit column pool follows.
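A minimal sketch of Farley's bound for an explicitly known column pool; with column generation the minimisation must instead be approached iteratively, as noted above. Columns are held as dense lists, and all data below is made up:

def farley_bound(columns, c, pi, z):
    # alpha = min_j { c_j / (A_j^T pi) : A_j^T pi > 0 }; bound = alpha * z.
    # Assumes c >= 0 and pi >= 0, as in the derivation above.
    alpha = 1.0
    for A_j, c_j in zip(columns, c):
        denom = sum(a * p for a, p in zip(A_j, pi))
        if denom > 0:
            alpha = min(alpha, c_j / denom)
    return alpha * z

# Example: two columns, current (sub-optimal) objective 10.0.
cols = [[1, 0, 1], [0, 1, 1]]
print(farley_bound(cols, c=[3.0, 2.0], pi=[1.0, 1.5, 1.0], z=10.0))   # 8.0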

Lower Bounds from Dual Feasibility by Additive π Changes
A dual feasible solution can often be formed by decreasing the dual variables of successive constraints one by one until all reduced costs are non-negative. If none of the dual variables for ≥ constraints become negative, this gives a valid lower bound. (This technique is used in some Lagrangian heuristics.)


Lower Bounds from Dual Feasibility with GUB Constraints
Consider a problem with equality GUB constraints. We will assume that any columns that are not part of a GUB constraint (eg slacks) have non-negative reduced costs. Let X_i be the set of columns a_j associated with GUB constraint i, and let rcmin(X_i) = min( rc(x_j) : a_j ∈ X_i ) be the minimum reduced cost in X_i for some current solution with objective value z. A lower bound on the optimal solution, z* = cᵀx*, is given by

z* ≥ z + Σ_i rcmin(X_i)

To see this, we note that at least one column in each X_i is basic, and so rcmin(X_i) ≤ 0 for all X_i. Assume rcmin(X_i) < 0 for some i, and hence the solution is not dual feasible. If the dual variable π_i associated with GUB constraint i is replaced by π_i' = π_i + rcmin(X_i), then all rc(x_j) : a_j ∈ X_i become non-negative. Applying this process to all required X_i gives a dual feasible solution, with bound (π')ᵀb. The result follows from the unit right hand sides on the GUB constraints, and πᵀb = cᵀx at the current solution.

Implementation: Partial pricing means we don't normally price all individuals, and so cannot calculate the bound exactly. However, we can bound the bound as follows! Assume we stop pricing individual i whenever its best reduced cost satisfies rcmin(X_i) > −ε_i, and stop overall when the objective satisfies z < z* + ε, where z* is the unknown optimal value and ε = Σ_i ε_i. Having just priced individual p, we price individuals p+1, then p+2, p+3, ..., p+k (with p+k wrapping around back to 1) until rcmin(X_{p+k}) ≤ −ε_{p+k}, in which case individual p+k's column(s) enter, or until all individuals have been priced with rcmin(X_i) > −ε_i, in which case the bound above gives z* ≥ z − Σ_i ε_i and we may stop (see the sketch below for the underlying bound).
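A minimal sketch of the GUB bound itself, assuming every column of every GUB set has been priced (otherwise only the ε-based 'bound on the bound' above applies). The data layout is illustrative:

def gub_lower_bound(z, gub_sets, reduced_cost):
    # z* >= z + sum_i rcmin(X_i); gub_sets maps each GUB constraint i
    # to the indices of its columns, reduced_cost maps column -> rc.
    bound = z
    for cols in gub_sets.values():
        rc_min = min(reduced_cost[j] for j in cols)
        bound += min(rc_min, 0.0)      # rcmin(X_i) <= 0 when a column is basic
    return bound

# Example: objective 100 with two GUB sets.
rcs = {1: -0.4, 2: 0.1, 3: -0.2, 4: 0.0}
print(gub_lower_bound(100.0, {"A": [1, 2], "B": [3, 4]}, rcs))        # 99.4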

Lagrangian Lower Bound with GUB Constraints
Standard Lagrangian techniques can be used to get lower bounds using the current set of LP duals. Assuming the GUB constraints are not relaxed, the Lagrangian sub-problem solution for a given set of duals is simply the column with the most negative reduced cost from each X_i (ignoring any slacks/surpluses).

Branching Possibilities If column generating, branches must be respected by the column generator. That is, columns must satisfy the branches imposed. We don’t want this to complicate the generator too much.

Variable Branching:
Force a variable x_j up or down to ⌈x_j^i⌉ or ⌊x_j^i⌋ respectively (x_j^i is the value of x_j at node i), ie adding x_j ≤ ⌊x_j^i⌋ or x_j ≥ ⌈x_j^i⌉. Why not use variable branching? Problems occur if variables are forced down:
· Applying an upper bound on a variable means it cannot be generated again in the column generator. How do we stop it reappearing? Need k'th shortest path.

· With binary variables, a zero branch (xj=0) says little about the solution; many feasible solutions remain. (The 1-branch xj=1 is much more powerful.) Fixing variables at 1 that are 1 in the LP can be a useful heuristic for big IP’s.

General form: "The number of times the column a_j appears in the solution must be integer."


Constraint Branching: Binary Variables, Binary A-matrix, GUB Constraints

[Worked tableau: columns xA1..xA6 for person A and xB1..xB4 for person B, with fractional LP values (1/2, 1/4, 1/4 and 1/3, 2/3); two GUB rows (= 1) and three task rows (≥ 1), with implied y values yA1 = 1, yA2 = 0, yA3 = 3/4, yB1 = 1, yB2 = 1, yB3 = 2/3. Constraint branch on a constraint pair: 1-branch and 0-branch.]

We branch to force (1-branch) or ban (0-branch) tasks for a specific person. Branch choice is based on ‘original y variables’ in Dantzig-Wolfe view.

Two branches possible above. Pick one of these. Each side of the branch is enforced by banning columns.

These branches are easy to enforce in a column generator. Eg, to force person A to undertake task 1, we can set π_A1 = ∞ in the generator.

General form: “The number of times we have person p working shift q in the solution must be integer (0 or 1 in fact).”

Constraint Branching: Binary Variables, Binary A-matrix, no GUB Constraints
General case of the above.

[Worked tableau: columns xA1..xA10 with fractional LP values (3/8, 1/8, 1/8, 1/2, 1/2 among them) and five = 1 rows, together with a 'pair coverage' table giving, for each pair of constraints (s, t), the total value Σ_{j∈J(s,t)} x_j of the columns covering both. Constraint branch on a constraint pair: 1-branch and 0-branch.]
General form: "The number of times shifts p and q occur together in a solution must be integer (0 or 1 in fact)."

(c) A. Mason www.esc.auckland.ac.nz/Mason/ 34+

Follow-on Branching
A special case of the above where the 'constraint pair' is restricted to tasks that occur in immediate succession. Eg if after flight sector 5 you can do one of sectors 6, 7, or 8, then a constraint branch can force (or ban) each of these options. Good for column generation, as the sectors become one 'multi-sector': you either choose all of it or none of it. Eg, see [KJN99].

General Constraint Branching: Binary Variables, Binary A-Matrix (both above cases)
Developed by Ryan and Foster. Let J(s,t) = { j | a_sj = 1 and a_tj = 1 }; J(s,t) is the set of columns covering both constraints s and t. Suppose activities (constraints) s and t appear together (i.e. occur in the same column)

at a fractional value in the optimal LP solution (i.e. 0 < Σ_{j∈J(s,t)} x_j < 1). Then in an integer solution:

either activities s and t must occur together (i.e. Σ_{j∈J(s,t)} x_j = 1)

or activities s and t must not occur together (i.e. Σ_{j∈J(s,t)} x_j = 0).
So we find the constraints s and t achieving
max_{s,t} Σ_{j∈J(s,t)} x_j (over pairs with the sum < 1),
and then force s and t to occur together by setting x_j = 0 for all j ∈ J¹ban(s,t), where
J¹ban(s,t) = { j | (a_sj = 1 and a_tj = 0) or (a_sj = 0 and a_tj = 1) }.
This is called the 1-branch.

In the 0-branch, we force s and t not to occur together by banning the variables where they happen together, i.e. we ban the variables in
J⁰ban(s,t) = { j | a_sj = 1 and a_tj = 1 }.
A small sketch of the pair selection and ban sets follows.
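A sketch of Ryan and Foster's pair selection and the two ban sets for an explicit 0/1 column pool, where a[s][j] is the constraint-column incidence and x[j] the LP values; with column generation the bans must also be imposed inside the generator. Names are illustrative:

def ryan_foster_pair(a, x, tol=1e-6):
    # Find (s, t) maximising sum_{j in J(s,t)} x_j, with the sum fractional.
    m, n = len(a), len(x)
    best_pair, best_cov = None, 0.0
    for s in range(m):
        for t in range(s + 1, m):
            cov = sum(x[j] for j in range(n) if a[s][j] == 1 and a[t][j] == 1)
            if tol < cov < 1 - tol and cov > best_cov:
                best_pair, best_cov = (s, t), cov
    return best_pair, best_cov

def ban_sets(a, s, t):
    # 1-branch bans columns covering exactly one of s, t;
    # 0-branch bans columns covering both.
    n = len(a[0])
    one_branch = [j for j in range(n) if a[s][j] != a[t][j]]
    zero_branch = [j for j in range(n) if a[s][j] == 1 and a[t][j] == 1]
    return one_branch, zero_branch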

Note: A sequence of constraint branches leads (eventually) to a balanced matrix, and hence an integer solution for all variables in at least 1 constraint with unit right hand side.

Constraint Branching: Binary Variables, Integer A, GUB Constraints

[Worked tableau: columns xA1..xA6 and xB1..xB4 with fractional LP values (1/2, 1/4, 1/4 and 1/3, 2/3); two GUB rows (= 1) and three task rows with integer coefficients (≥ 5, ≥ 8, ≥ 3), giving fractional y values such as yA1 = 3 1/2, yA2 = 2 3/4, yA3 = 1 1/2, yB1 = 2 1/3, yB2 = 5 1/3, yB3 = 1 2/3. Constraint branch on a constraint pair: 1-branch and 0-branch.]

We impose branches on the y variables of the form y_pq ≤ ⌊y_pq^i⌋ or y_pq ≥ ⌈y_pq^i⌉, where y_pq^i is the value of y_pq at node i. These are enforced by banning all columns for person p which have a_pq > ⌊y_pq^i⌋ or a_pq < ⌈y_pq^i⌉ respectively for some task (non-GUB constraint) q, where A = (a_pq) is the non-GUB matrix in the problem.


Note that branching may be required even if y_pq^i is integer. This can occur when fractional x values sum to give an integer y_pq^i.

Branches are generally easy to enforce in column generators.

General form: “The number of times person p works shift q must be integer.”

SOS Branching: Binary Variables, Arbitrary A, GUB Constraints

[Worked tableau: columns xA1..xA6 and xB1..xB4 with fractional LP values (1/2, 1/4, 1/4 and 1/3, 2/3); GUB rows (= 1) and arbitrary-coefficient rows (≥ 5, ≥ 8, ≥ 3). Weights 2, 3, 4, 6, 7, 9 on person A's columns and 2, 3, 5, 7 on person B's give solution weights 5 and 5.33 respectively. SOS branch on a GUB constraint: 1-branch and 0-branch.]

We form 'specially ordered sets' (SOSs) of variables with the property that at most one variable from the set can appear in the solution. Specially ordered sets arise from = 1 (or ≤ 1) GUB constraints. Consider some specially ordered set X_p. Each variable x_j in X_p has a (unique) weight w_j. At some node i in the branch and bound tree, the average weight w^i(X_p) of the variables in X_p is given by
w^i(X_p) = Σ_{j∈X_p} w_j x_j^i
We impose branches on w(X_p) of the form w(X_p) ≤ ⌊w^i(X_p)⌋ or w(X_p) ≥ ⌈w^i(X_p)⌉. These are enforced by banning all columns in X_p which have w_j > ⌊w^i(X_p)⌋ or w_j < ⌈w^i(X_p)⌉ respectively.

Note: If the weights are not unique, SOS branching may be insufficient to force an integer solution for the x_j in X_p. The w_j can be randomly perturbed before starting if required. The above description is for 'Type 1' SOS branching; it can be used for any type of variables (integer, binary, real), but without a GUB constraint this information must be given externally (it is not in the model). 'Type 2' allows up to 2 variables from X_p in the solution as long as they are adjacent in X_p, where X_p is now ordered. 'Type 3' allows only +1's and −1's in the coefficients, but decreases the right hand side for each −1.
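A minimal sketch of enforcing the Type 1 SOS branch above, where cols, weights and x are the columns of X_p, their (perturbed, unique) weights and their LP values. The data is illustrative:

import math

def sos_branch_bans(cols, weights, x):
    # Average weight w(X_p) = sum_j w_j x_j; the down-branch bans columns
    # with w_j > floor(w), the up-branch bans those with w_j < ceil(w).
    w = sum(weights[j] * x[j] for j in cols)
    down_bans = [j for j in cols if weights[j] > math.floor(w)]
    up_bans = [j for j in cols if weights[j] < math.ceil(w)]
    return w, down_bans, up_bans

# Example: a GUB set with weights 2, 3, 4, 6, 7, 9 and fractional LP values.
cols = [0, 1, 2, 3, 4, 5]
w, down, up = sos_branch_bans(cols, [2, 3, 4, 6, 7, 9],
                              [0.5, 0.0, 0.25, 0.25, 0.0, 0.0])
# w = 3.5: the down-branch bans weights > 3, the up-branch bans weights < 4.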

SOS branching works with overlapping X_p, i.e. a column can belong to any number of sets. It is typically easy to handle in column generators if the weight is 'column generation friendly', i.e. derived from a column property associated with an arc or a node in the shortest path.

Example: wj=‘time spent once sector 5 is completed before starting the next sector.’ (KJN99)

General Form: "The number of times a column from X_p with weight above (or below) some specified value appears in the solution must be integer (0 or 1 in fact)."


Generalised SOS Branching ('Attribute Branching'): Integer Variables, Arbitrary A, no GUB Constraints

[Worked tableau: columns xA1..xA10 with fractional LP values and five arbitrary-coefficient rows (≥ 2, ≥ 3, ≥ 5, ≥ 8, ≥ 3). Branch on an attribute set X: 1-branch and 0-branch.]

Consider any set X of integer variables. At some node i in the branch and bound tree, the number of times columns from X are used is given by
n^i(X) = Σ_{j∈X} x_j^i
We impose branches on n(X) of the form n(X) ≤ ⌊n^i(X)⌋ or n(X) ≥ ⌈n^i(X)⌉.

Each branch is enforced by adding a constraint (cut) to the problem. These cuts are local; they are not valid inequalities, and have to be removed when stepping up the tree.

Where possible, choose X to be some 'attribute' that is important to cost and is 'column generator friendly'. Eg, branch on 'the number of full weekends off': for each full weekend off that a generated column includes, its reduced cost changes by the dual π of the added cut.

General Form: “The total number of times that we use columns from X in the solution must be integer.”
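A sketch of forming an attribute branch: compute n(X) and return the two local cuts. Representing a cut as a (columns, sense, rhs) triple is an assumption of this sketch, not a solver API; in a real code the cut is added to the LP and its dual is priced into the generator:

import math

def attribute_branch(X, x):
    # n(X) = sum_{j in X} x_j; branch with n(X) <= floor(n) on one side
    # and n(X) >= ceil(n) on the other.
    n = sum(x[j] for j in X)
    down_cut = (X, "<=", math.floor(n))
    up_cut = (X, ">=", math.ceil(n))
    return n, down_cut, up_cut

# Example: X = columns giving a full weekend off, with fractional usage.
n, down, up = attribute_branch([0, 3, 7], {0: 0.5, 3: 0.7, 7: 0.5})
# n = 1.7: branch with n(X) <= 1 on one side and n(X) >= 2 on the other.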

Generalised Constraint Branching: Integer Variables, Arbitrary A-matrix, no GUB Constraints

[Worked tableau: the same columns xA1..xA10 and rows as above. Branch on a reference column a: 1-branch and 0-branch.]

We impose branches of the generalised SOS form, where


X = X≥a = { j : a_j ≥ a }, where a is some 'reference column' (typically chosen from the A-matrix).

To choose our reference column at a non-integer node i (Vanderbeck and Wolsey, 1996):
Find some column a_j for which x_j^i is the only fractional value in X≥a.

Note that a_j is any maximal (undominated) column from F^i = { j : x_j^i is fractional }.

Must be enforced by adding a constraint.

Vanderbeck and Wolsey show how these constraint branches can be enforced in general IP-based column generators by using 0/1 variables that determine when a column belongs to X≥a, and hence when its reduced cost includes the dual variable for the cut associated with X≥a.

General Form: "The number of times we have columns a_j ≥ a appearing in the solution must be integer."

See: F. Vanderbeck and L.A. Wolsey, "An exact algorithm for IP column generation", Operations Research Letters 19(4), 1996, pp. 151-159.

Column Generator Structures
Column generators are typically:
· Shortest Path (Dynamic Programming)
  · Nested shortest path
  · Shortest Path with Resource Constraints
· TSP solutions (vehicle routing)
  · TSP solutions with resource constraints (eg vehicle routing)
· General IP's
· Enumerators
  · Randomised enumerators

Note: Shortest Path/Dynamic Programming is only useful if there is significant merging of states. Otherwise, it is just inefficient enumeration.

Can blend enumeration and shortest path (see the sketch below):
· Enumerate the high level structure of the column, eg 'all days on/off patterns that give a 40 hour week'
· Fill in the column detail using dual information, eg via shortest path, eg 'what is done during days on'
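A minimal sketch of shortest-path pricing on an acyclic task network, assuming each arc covers at most one task: the arc's reduced length is its original cost minus the dual of the task it covers, and a source-sink path with negative length is a column with negative reduced cost (any convexity/GUB dual would also be subtracted). The network and duals below are illustrative:

def price_column(nodes, arcs, pi, source, sink):
    # DP over nodes given in topological order; arcs[(u, v)] = (cost, task).
    dist = {v: float("inf") for v in nodes}
    pred = {v: None for v in nodes}
    dist[source] = 0.0
    for u in nodes:
        for (a, b), (cost, task) in arcs.items():
            if a == u and dist[u] + cost - pi.get(task, 0.0) < dist[b]:
                dist[b] = dist[u] + cost - pi.get(task, 0.0)
                pred[b] = u
    path, v = [], sink                      # recover the best path found
    while v is not None:
        path.append(v)
        v = pred[v]
    return dist[sink], list(reversed(path))  # prices out iff dist[sink] < 0

nodes = ["s", "t1", "t2", "e"]
arcs = {("s", "t1"): (3.0, "task1"), ("t1", "t2"): (2.0, "task2"),
        ("t2", "e"): (0.0, None), ("t1", "e"): (0.0, None)}
print(price_column(nodes, arcs, {"task1": 4.0, "task2": 1.5}, "s", "e"))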


Column Generation Issues:
· Many issues are similar to SPRINT pricing.
· How many columns do we return in each generate?
  · Dynamic programs often end in multiple states, offering many 'good' columns
  · The number to return depends on the speed of the generator

[Figure: solution time (s) for the LP relaxation and for B&B versus the minimum quality of returned columns. The effect of multiple column generation (fewer columns per generate on the right).]

· When many columns are available, select a range of different columns

· When do we call the generator?
[Figure: objective value and reduced cost of the entering variable versus iteration, with column generation occurring at the iterations marked by dark bars: the impact of calling the column generator.]
· Crashing the basis
  · Best to start from a good basis.
  · One strategy is to use the 'remaining b' (the remaining workload to cover), possibly scaled, as a substitute for π, updating the remaining workload as columns are generated and contribute to covering the work (see the sketch below).

Plots taken from: Mark Smith, Optimal Nurse Scheduling using Column Generation, Masters thesis, Department of Engineering Science, School of Engineering, University of Auckland, 1995.
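A sketch of the 'remaining b' crash described above, under the assumption that the generator can be called with an arbitrary price vector; generate_column is a hypothetical user-supplied routine returning a column (its coefficient vector) for the given prices, or None:

def crash_columns(b, generate_column, max_cols, scale=1.0):
    # Surrogate duals = scale * remaining workload; each generated column
    # reduces the workload it covers.
    remaining = list(b)
    pool = []
    for _ in range(max_cols):
        if all(r <= 0 for r in remaining):
            break                                    # all work covered
        surrogate_pi = [scale * max(r, 0.0) for r in remaining]
        col = generate_column(surrogate_pi)          # hypothetical generator
        if col is None:
            break
        pool.append(col)
        remaining = [r - a for r, a in zip(remaining, col)]
    return pool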


Integerisation Strategies
Branch and Bound

The course will introduce set partitioning, set covering and set packing models and illustrate their use with several case studies.

Column generation will be discussed from both a ‘natural’ and a Dantzig-Wolfe viewpoint.

We will then consider branch and bound and use this to motivate constraint branching and its links to perfect and balanced matrices, and also limited subsequence.

Alternative constraint branches will be discussed and their uses contrasted. This will include recent work on constraint branching via cuts for non-binary problems. General problem-motivated branching will be mentioned.

The choice of alternative column generators (enumerative, (k’th-)shortest path, and ‘delicate’ blends of the two) will be covered.

Other topics covered will include a selection from branch and cut, practical IP implementations, heuristic solution processes, ‘lift and project’ cuts, new work on stabilized column generation, heuristics for set partitioning, and bounded early LP termination. Some of this material will be covered in seminars researched and presented by students enrolled in the course.


[Tableau: the original formulation in the y variables yA1..yA3, yB1..yB3: min Σ y subject to seven constraints with duals π1..π7 (right hand sides ≥ 1, ≥ 1, ≥ 1, ≥ 1, ≤ 2, ≥ 2, ≤ 3), y ∈ {0,1}.]

Note 3: The original objective function can be arbitrarily complex in each of the subproblem variables, but for linear programming, must be additive across sub-problems, eg

min z = fA(yA1, yA2, yA3) + fB(yB1, yB2, yB3).

This is possible because ....we evaluate fA(yA1, yA2, yA3) for each column generated.

However, a complex fA(yA1, yA2, yA3) can make for a complex generator.

Note 4: If some column, eg a_Ap, is a convex combination of other columns,
a_Ap = Σ_k λ_k a_Ak, with Σ_k λ_k = 1, but
f_A(a_Ap) > Σ_k λ_k f_A(a_Ak),
then column a_Ap will never appear in the LP solution.

Our original formulation:

[Tableau repeated: the original formulation, min Σ y over yA1..yA3, yB1..yB3, subject to the seven constraints with duals π1..π7 as above; yA1 ... yB3 ∈ {0,1}.]

Our Column Generation Reformulation:


[Tableau: the column generation reformulation, min Σ x over xA1..xA6, xB1..xB4, subject to three task rows (≥ 1) and two convexity (GUB) rows (= 1); x_ij ∈ {0,1}.]

Example solution feasible for original, but not new formulation: yA1 yA2 yA3 yB1 yB2 yB3

Note 7: Where is the ‘convexity constraint’ in the stock cutting problem?

Hint: The stock cutting problem does not have natural sub-problems, so ...create them! All sub-problems are the same


FIXES TO MAKE

2001:
· We started with a simple rostering example, then moved into column generation forms.
· Add something about NP-hardness of the sub-problem and LP complexity in Dantzig-Wolfe.
· Draw a price/pivot diagram.
· Do Benders by introducing the form of the master first, then talking about how we find the cuts; merge this into the main notes.

2000: DONE! In generating extreme columns, the 2 cases are not π(1) < π(2) and π(1) > π(2), but are more complicated. (They depend on the slope of the frontier; in this case the frontier is a line from (6,0) to (0,4). Ratio is 2/3?)

In Dantzig-Wolfe, the costs are wrong... we have unit costs for the master and the original. Also, the sub-problem has to include the dual π_A.

An example of a solution feasible in the original but not in the master is 0.5 for all x's.

Master problem has a generator sub-problem that is naturally integer, hence the LP is NOT strengthened, contrary to example in notes.

See the 2001 second handout for more material, including a rostering column generator.


