Computational Optimisation

Gregory Gutin

April 4, 2008

Contents

1 Introduction to Computational Optimisation
  1.1 Introduction
  1.2 Algorithm efficiency and problem complexity
  1.3 Optimality and practicality

2 Introduction to Linear Programming
  2.1 Linear programming (LP) model
  2.2 Formulating problems as LP problems
  2.3 Graphical solution of LP problems
  2.4 Questions

3 Simplex Method
  3.1 Standard Form
  3.2 Solutions of Linear Systems
  3.3 The Simplex Method
  3.4 Questions

4 Linear Programming Approaches
  4.1 Artificial variables and Big-M Method
  4.2 Two-Phase Method
  4.3 Shadow prices
  4.4 Dual problem
  4.5 Decomposition of LP problems
  4.6 LP software
  4.7 Questions

5 Modeling
  5.1 Integer Programming vs. Linear Programming
  5.2 IP problems
    5.2.1 Travelling salesman problem
    5.2.2 Knapsack Problem
    5.2.3 Bin packing problem
    5.2.4 Set partitioning/covering/packing problems
    5.2.5 Assignment problem and generalized assignment problem
  5.3 Questions

6 Branch-and-Bound Algorithm
  6.1 A Simple Example for Integer and Mixed Programming
  6.2 Knapsack example
  6.3 Branching strategies
  6.4 MAX-SAT Example
  6.5 Questions

7 Construction Heuristics and Local Search
  7.1 Combinatorial optimisation problems
  7.2 Greedy-type algorithms
  7.3 Special algorithms for the ATSP
  7.4 Improvement local search
  7.5 Questions

8 Computational Analysis of Heuristics
  8.1 Experiments with ATSP heuristics
  8.2 Testbeds
  8.3 Comparison of TSP heuristics

9 Theoretical Analysis of Heuristics
  9.1 Property of 2-Opt optimal tours
  9.2 Approximation Analysis
    9.2.1 Travelling Salesman Problem
    9.2.2 Knapsack Problem
    9.2.3 Bin Packing Problem
    9.2.4 Online Problems and Algorithms
  9.3 Domination Analysis
  9.4 Questions

10 Advanced Local Search and Meta-heuristics
  10.1 Advanced Local Search Techniques
  10.2 Meta-heuristics
    10.2.1 Simulated Annealing
    10.2.2 Genetic Algorithms
    10.2.3 Tabu Search

Abstract

These notes accompany the final-year course CS3490: Computational Optimisation. We will study basic results, approaches and techniques in such important areas as linear and integer programming and combinatorial optimisation. Many applications are surveyed.

This document is © Gregory Gutin, 2005.

Permission is given to freely distribute this document electronically and on paper. You may not change this document or incorporate parts of it in other documents: it must be distributed intact. Please send errata to the authors at the address on the title page or electronically to [email protected].

Chapter 1

Introduction to Computational Optimisation

1.1 Introduction

Computational Optimisation (CS3490) will have 3 lectures a week. The aim is to introduce classical and modern methods and approaches in computational optimisation, and to overview applications and available software packages. The course covers both classical and very recent developments in the area.

The main topics of the course are: linear and integer programming, construction heuristics and local search, polynomial time solvable problems, computational and theoretical analysis of heuristics, and meta-heuristics. Most of the theory will be taught through examples, with theoretical results formulated but not proved. Only a few results will be proved.

There will be a final exam (100% of the mark). Any material taught at the lectures may be in the exam paper. A basic knowledge of graphs and matrices is assumed.

These notes contain areas of blank space in various places. Their purpose is to leave room for examples given in the lectures.

Unfortunately, it is impossible to recommend only one or two books covering the whole course. Several books and articles will be used in a supporting role to these notes, including the following:

• J. Bang-Jensen and G. Gutin, Digraphs: Theory, Algorithms and Applications, Springer, 2000.
• J.A. Bondy and U.S.R. Murty, Graph Theory with Applications, North Holland, 1976.
• M.W. Carter and C.C. Price, Operations Research: A Practical Introduction, CRC, 2001.
• F. Glover and M. Laguna, , Kluwer, 1997.
• G. Gutin and A. Punnen (eds.), Traveling Salesman Problem and its Variations, Kluwer, 2002.
• G. Gutin and A. Yeo, Anti-matroids. Operations Research Letters 30 (2002) 97–99.
• J. Hromkovič, Algorithmics for Hard Problems, Springer, 2001.
• Z. Michalewicz and D.B. Fogel, How to Solve It: Modern Heuristics, Springer, 2000.
• W.L. Winston, Operations Research, 3rd edition, Duxbury Press, 1994.

Inevitably there are misprints in the notes. Please let me know if you spot one.

1.2 Algorithm efficiency and problem complexity

We start from two particular optimisation problems.

Assignment Problem (AP) We have \(n\) persons \(p_1, \dots, p_n\) and \(n\) jobs \(w_1, \dots, w_n\), and the cost \(c_{ij}\) of performing job \(i\) by person \(j\). We wish to find an assignment of the persons to the jobs (one person per job) such that the total cost of performing the jobs is minimum. The costs are normally given by a matrix \(C = [c_{ij}]\). Example (an instance):

\[
C = \begin{pmatrix}
1 & 2 & 3 & 2 \\
2 & 5 & 0 & 3 \\
2 & 1 & 6 & 7 \\
3 & 4 & 2 & 1
\end{pmatrix}.
\]
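To make the definition concrete, here is a minimal sketch that solves the instance above by trying all \(n!\) assignments. The function name `solve_ap_brute_force` is hypothetical, and exhaustive enumeration is only an illustration for tiny \(n\); it is not an efficient algorithm for the AP.

```python
from itertools import permutations

# Cost matrix of the example instance: C[i][j] = cost of job i done by person j.
C = [[1, 2, 3, 2],
     [2, 5, 0, 3],
     [2, 1, 6, 7],
     [3, 4, 2, 1]]

def assignment_cost(cost, perm):
    """Total cost when job i is given to person perm[i]."""
    return sum(cost[i][perm[i]] for i in range(len(cost)))

def solve_ap_brute_force(cost):
    """Enumerate all n! assignments and return (best assignment, its cost)."""
    n = len(cost)
    best = min(permutations(range(n)), key=lambda p: assignment_cost(cost, p))
    return best, assignment_cost(cost, best)

assignment, total = solve_ap_brute_force(C)
print(assignment, total)  # persons are numbered from 0 here
```

For this instance every row minimum lies in a different column, so the optimal cost equals the sum of the row minima.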

Travelling Salesman Problem (TSP) There are \(n\) cities \(t_1, \dots, t_n\). Given distances \(d_{ij}\) from any city \(t_i\) to any other city \(t_j\), we wish to find a tour of shortest total distance that starts at city \(t_1\), visits all cities (in some order) and returns to \(t_1\). Example (an instance):

\[
C = \begin{pmatrix}
0 & 2 & 3 & 2 \\
2 & 0 & 1 & 3 \\
2 & 1 & 0 & 7 \\
3 & 4 & 2 & 0
\end{pmatrix}.
\]
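As an illustration of the problem statement, the following sketch enumerates all \((n-1)!\) tours of the instance above that start at the first city. The names `tour_length` and `solve_tsp_brute_force` are hypothetical, and the approach is only feasible for very small \(n\).

```python
from itertools import permutations

# Distance matrix of the example instance: D[i][j] = distance from city i to city j.
D = [[0, 2, 3, 2],
     [2, 0, 1, 3],
     [2, 1, 0, 7],
     [3, 4, 2, 0]]

def tour_length(dist, tour):
    """Length of the closed tour, including the return to the first city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def solve_tsp_brute_force(dist):
    """Try every order of cities 1..n-1 after the fixed start city 0."""
    n = len(dist)
    tours = ((0,) + rest for rest in permutations(range(1, n)))
    best = min(tours, key=lambda t: tour_length(dist, t))
    return best, tour_length(dist, best)

tour, length = solve_tsp_brute_force(D)
print(tour, length)  # cities are numbered from 0 here
```

Note that the matrix is not symmetric, so a tour and its reverse may have different lengths.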

The parameter \(n\) for both AP and TSP is the size of the problem. Algorithms for most optimisation problems are non-trivial and cannot be performed by hand even for relatively small problem sizes. Hence computers have to be used.

One way to predict whether a certain computer code C for a certain problem P can handle all instances of P of size, say, \(n = 100\) within a CPU hour is to carry out computational experiments on various instances of P of size 100. However, even if we have carried out computational experiments with 2000 instances of P and C solved each of them within a CPU hour, it does not mean that C will spend less than one hour on the 2001st instance.

To predict the running time of algorithms and of the corresponding computer codes, researchers and practitioners compute the number of elementary operations required by a given algorithm to solve any instance of a certain problem (depending on the instance size). Elementary operations are arithmetic operations, logic operations, shifts, etc. Examples:

In most cases, we are interested in knowing how the number of performed operations depends on \(n\) asymptotically. For example, there is an algorithm \(A_{AP}\) for the AP that requires at most \(O(n^3)\) operations. This means that the number of operations is at most \(cn^3\), where \(c\) is a constant not depending on \(n\). For the TSP there is an algorithm \(A_{TSP}\) that requires at most \(O(2^n)\) operations. To sort \(n\) different integers whose values are between 1 and \(n\) there is an algorithm \(A_{BS}\) (basket sort) that requires at most \(O(n)\) operations.

We may draw some conclusions on the three algorithms without carrying out any computational experiments. For simplicity, assume that the constant \(c\) in each of the algorithms equals 10 and every operation takes \(10^{-6}\) sec to perform. Then for \(n = 20\), \(A_{BS}\) will take at most \(2 \times 10^{-4}\) sec, \(A_{AP}\) 0.08 sec and \(A_{TSP}\) 10 sec. For \(n = 40\), \(A_{BS}\) will take at most \(4 \times 10^{-4}\) sec, \(A_{AP}\) 0.64 sec and \(A_{TSP}\) 127 days. For \(n = 60\), \(A_{BS}\) will take at most \(6 \times 10^{-4}\) sec, \(A_{AP}\) 2.16 sec and \(A_{TSP}\) about 366000 years.

Already this example indicates that while \(A_{BS}\) and \(A_{AP}\) can be used for moderate sizes, \(A_{TSP}\) may quickly become unusable. In fact, this example shows the difference between polynomial time and exponential time algorithms. Clearly, polynomial time algorithms are "good" and exponential ones are "bad". Unfortunately, for many optimisation problems, called NP-hard problems, polynomial time algorithms are unknown. Many thousands of NP-hard problems are polynomially equivalent in the sense that if one of them admits a polynomial time algorithm then so does every other one. Many researchers have tried to find polynomial algorithms for various NP-hard problems, but failed. Thus, we believe that NP-hard problems cannot have polynomial time algorithms. Since TSP is NP-hard, it is very likely that there does not exist a polynomial time algorithm for TSP.
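The running-time estimates above can be recomputed with a few lines of code. This is a minimal sketch under the same assumptions as the text (constant \(c = 10\), \(10^{-6}\) sec per operation); the names `running_time`, `C_CONST` and `SECONDS_PER_OP` are illustrative only.

```python
SECONDS_PER_OP = 1e-6   # time per elementary operation, as assumed in the text
C_CONST = 10            # the constant c, assumed equal to 10 for all three algorithms

def running_time(operations):
    """Worst-case time in seconds for c * operations elementary operations."""
    return C_CONST * operations * SECONDS_PER_OP

for n in (20, 40, 60):
    t_bs = running_time(n)        # linear-time sort, O(n)
    t_ap = running_time(n ** 3)   # assignment problem, O(n^3)
    t_tsp = running_time(2 ** n)  # travelling salesman, O(2^n)
    print(f"n={n}: A_BS {t_bs:.0e} s, A_AP {t_ap:.2f} s, "
          f"A_TSP {t_tsp:.3g} s = {t_tsp / 86400:.3g} days "
          f"= {t_tsp / (86400 * 365):.3g} years")
```

Going from \(n = 20\) to \(n = 40\) multiplies the cubic bound by 8 but the exponential bound by \(2^{20}\), which is the whole point of the comparison.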

1.3 Optimality and practicality

Many people with a mathematical education are trained to search for exact solutions to problems. If we are solving a quadratic equation, there is a formula for the exact solution. If a list of names needs to be sorted, we use an algorithm which produces a perfectly ordered list. Especially concerning mathematical theorems and their proofs, a respect for truth and perfection has evolved historically, and a nearly correct but incomplete or slightly flawed proof is considered of little or no value at all. Therefore the idea of solving a problem and not giving the "right" answer looks disappointing and disturbing. Yet there are justifiable reasons for accepting computational results that are imperfect or suboptimal.

First of all, models created by analysts are not perfect representations of real systems. So even if we could obtain exact solutions to the models, they would not necessarily constitute exact solutions or perfect managerial advice to be applied to the real systems. Hence, costly efforts to achieve perfect solutions to mathematical models may not be warranted.

Every computer has only finitely many digits to represent a number. Therefore some numbers have to be rounded off. Further calculations with such numbers produce accumulated error. Often these errors may lead us far away from an optimal solution even if the algorithm is exact.

As we saw above, many optimisation problems are NP-hard and are believed not to admit polynomial time algorithms. Hence we cannot expect to solve those problems to optimality, even when their instances are of moderate size. However, many NP-hard problems are practically important.

Even polynomial time algorithms may be impractical. Indeed, algorithms of running time \(O(n^3)\) become impractical for values of \(n\) exceeding a few thousand.

Therefore we cannot and should not solve all problems to optimality. In practice, more often than not, researchers and practitioners settle for suboptimal rather than optimal solutions. Hence, in this course we will consider both exact and approximate methods and approaches in computational optimisation.

Question: Assume that every operation takes \(10^{-6}\) sec to perform. We have two algorithms to solve a certain problem, one with running time at most \(10n^5\), the other with running time at most \(10 \times 2^n\). Which of the two algorithms is faster for \(n = 20\), \(n = 40\)? Which conclusion can one draw?

Chapter 2

Introduction to Linear Programming

2.1 Linear programming (LP) model

An optimisation problem is called an LP problem (or an instance of LP) if both the objective function and the constraints are linear. For example,

\[
\begin{array}{rl}
\text{maximize} & 2x_1 + 3x_2 \\
\text{subject to} & 3x_1 - 5x_2 \le 7 \\
 & 4x_1 - x_2 = 3 \\
 & x_2 \ge 0
\end{array}
\]

is an LP problem.

In general, the objective function is of the form \(c_1x_1 + c_2x_2 + \cdots + c_nx_n\), a linear function of the decision variables \(x_i\) with coefficients \(c_i\), which is to be minimized or maximized. All constraints are of the form \(a_1x_1 + a_2x_2 + \cdots + a_nx_n = (\le, \ge)\ b\).
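As a small illustration of these definitions, the sketch below encodes the example LP of this section and checks whether a given point is feasible. The helper names `objective` and `is_feasible` and the tolerance `tol` are assumptions made for the illustration.

```python
# Example LP of this section:
#   maximize   2*x1 + 3*x2
#   subject to 3*x1 - 5*x2 <= 7,  4*x1 - x2 = 3,  x2 >= 0

def objective(x1, x2):
    """Value of the linear objective function at the point (x1, x2)."""
    return 2 * x1 + 3 * x2

def is_feasible(x1, x2, tol=1e-9):
    """True if (x1, x2) satisfies all three constraints (up to a small tolerance)."""
    return (3 * x1 - 5 * x2 <= 7 + tol
            and abs(4 * x1 - x2 - 3) <= tol
            and x2 >= -tol)

print(is_feasible(1, 1), objective(1, 1))  # (1, 1) satisfies every constraint
print(is_feasible(0, 0))                   # (0, 0) violates 4*x1 - x2 = 3
```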

2.2 Formulating problems as LP problems

Example 2.2.1. (W.L. Winston) An American company manufactures luxury cars and trucks. The company believes that its most likely customers are high-income women and men. To reach these groups the company has embarked on an ambitious TV advertising campaign and has decided to purchase one-minute commercial spots on two types of programmes: comedy shows and football games. Each comedy commercial is seen by 7 million high-income women and 2 million high-income men. Each football commercial is seen by 2 million high-income women and 12 million high-income men. A one-minute comedy advert costs $50000 and a one-minute football advert costs $100000. The company would like the commercials to be seen by at least 28 million high-income women and 24 million high-income men. Write down an LP model of this problem.

Solution: The company must decide how many comedy and football adverts should be purchased, so the decision variables are: \(x_1\) = number of one-minute comedy adverts, \(x_2\) = number of one-minute football adverts. The company wants to minimise the total advertising cost (in thousands of dollars). Total advertising cost = cost of comedy adverts + cost of football adverts = \(50x_1 + 100x_2\). Thus, the company's objective function is

\[ \min z = 50x_1 + 100x_2. \]

The company faces the following constraints:

Constraint 1 Commercials must reach at least 28 million high-income women.

Constraint 2 Commercials must reach at least 24 million high-income men.

Constraint 1 may be expressed as \(7x_1 + 2x_2 \ge 28\) and Constraint 2 may be expressed as \(2x_1 + 12x_2 \ge 24\). The sign restrictions \(x_1 \ge 0\) and \(x_2 \ge 0\) are necessary, so the company's model is given by:

\[
\begin{array}{rl}
\min & z = 50x_1 + 100x_2 \\
\text{s.t.} & 7x_1 + 2x_2 \ge 28 \\
 & 2x_1 + 12x_2 \ge 24 \\
 & x_1, x_2 \ge 0.
\end{array}
\]

Example 2.2.2. (W.L. Winston) An auto company manufactures cars and trucks. Each vehicle must be processed in the paint shop and the body assembly shop. If the paint shop were only painting trucks, 40 per day could be painted. If the paint shop were only painting cars, 60 per day could be painted. If the body shop were only producing cars, it could process 50 per day. If the body shop were only producing trucks, it could process 50 per day. Each truck contributes $300 to profit and each car contributes $200 to profit. Write down an LP model of this problem.

Solution: The company must decide how many cars and trucks should be produced daily. This leads us to define the following decision variables: \(x_1\) = number of trucks produced daily, \(x_2\) = number of cars produced daily. The company's daily profit (in hundreds of dollars) is \(3x_1 + 2x_2\), so the company's objective function may be written as:

\[ \max z = 3x_1 + 2x_2. \]

The company’s two constraints are the following:

Constraint 1 The fraction of the day during which the paint shop is busy is less than or equal to 1.

Constraint 2 The fraction of the day during which the body shop is busy is less than or equal to 1.

We have

Fraction of day paint shop works on trucks = \(\frac{1}{40}x_1\)
Fraction of day paint shop works on cars = \(\frac{1}{60}x_2\)
Fraction of day body shop works on trucks = \(\frac{1}{50}x_1\)
Fraction of day body shop works on cars = \(\frac{1}{50}x_2\)

Thus, Constraint 1 may be expressed by

\[ \tfrac{1}{40}x_1 + \tfrac{1}{60}x_2 \le 1 \]

and Constraint 2 may be expressed by

\[ \tfrac{1}{50}x_1 + \tfrac{1}{50}x_2 \le 1. \]

Since \(x_1 \ge 0\) and \(x_2 \ge 0\) must hold, the relevant model is

\[
\begin{array}{rl}
\max & z = 3x_1 + 2x_2 \\
\text{s.t.} & \frac{1}{40}x_1 + \frac{1}{60}x_2 \le 1 \\
 & \frac{1}{50}x_1 + \frac{1}{50}x_2 \le 1 \\
 & x_1, x_2 \ge 0.
\end{array}
\]

Example 2.2.3. (W.L. Winston) You have decided to enter the candy business. You are considering producing two types of candies: Slugger Candy and Easy Out Candy, both of which consist solely of sugar, nuts, and chocolate. At present, you have in stock 100 oz of sugar, 20 oz of nuts, and 30 oz of chocolate. The mixture used to make Easy Out Candy must contain at least 20% nuts. The mixture used to make Slugger Candy must contain at least 10% nuts and 10% chocolate. Each ounce of Easy Out Candy can be sold for 25p, and each ounce of Slugger Candy for 20p. Formulate an LP model that will enable you to maximize your revenue from candy sales.

Solution: (Fill in the details yourself)

The LP model is

\[
\begin{array}{rl}
\max & z = 25x_1 + 20x_2 \\
\text{s.t.} & x_1 + x_2 \le 150\ (= 100 + 20 + 30) \\
 & 0.2x_1 + 0.1x_2 \le 20 \\
 & 0.1x_2 \le 30 \\
 & x_1, x_2 \ge 0.
\end{array}
\]

Example 2.2.4. (M.W. Carter and C.C. Price) A dual-processor computing facility is to be dedicated to administrative and academic application jobs for at least 10 hours each day. Administrative jobs require 2 seconds of CPU time on processor 1 and 6 seconds on processor 2, while academic jobs require 5 seconds on processor 1 and 3 seconds on processor 2. A scheduler must choose how many of each type of job (administrative and academic) to execute, in such a way as to minimize the amount of time that the system is occupied with these jobs. The system is considered to be occupied even if one processor is idle. (Assume that the sequencing of the jobs on each processor is not an issue here, just the selection of how many of each type of job.)

Solution: Let \(x_1\) and \(x_2\) denote respectively the number of administrative and academic jobs selected for execution on the dual-processor system. Because policies require that each processor be available for at least 10 hours, we have the following two constraints: \(2x_1 + 5x_2 \ge 10 \times 3600\), \(6x_1 + 3x_2 \ge 10 \times 3600\), together with \(x_1 \ge 0\) and \(x_2 \ge 0\). The system is considered occupied as long as either processor is busy. Therefore, to minimize the completion time for the set of jobs, we must minimize \(\max\{2x_1 + 5x_2,\ 6x_1 + 3x_2\}\). This nonlinear objective can be made linear if we introduce a new variable \(x_3\), where

\[ x_3 = \max\{2x_1 + 5x_2,\ 6x_1 + 3x_2\} \ge 0. \]

Now if we require \(x_3 \ge 2x_1 + 5x_2\) and \(x_3 \ge 6x_1 + 3x_2\) and make our objective to minimize \(x_3\), we have the desired linear formulation (\(x_1, x_2, x_3 \ge 0\)).
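Written out in full, the linear formulation reads as follows; this is only a restatement of the constraints above, with \(36000 = 10 \times 3600\) seconds. At an optimal solution \(x_3\) equals the larger of the two processor times, since otherwise \(x_3\) could be decreased without violating any constraint.

```latex
\[
\begin{array}{rl}
\min & z = x_3 \\
\text{s.t.} & 2x_1 + 5x_2 \ge 36000 \\
 & 6x_1 + 3x_2 \ge 36000 \\
 & x_3 - 2x_1 - 5x_2 \ge 0 \\
 & x_3 - 6x_1 - 3x_2 \ge 0 \\
 & x_1, x_2, x_3 \ge 0
\end{array}
\]
```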

2.3 Graphical solution of LP problems

Example 2.3.1. Graph the set of points \(2x_1 + 3x_2 \le 6\), \(x_1, x_2 \ge 0\).

Solution: See Figure 2.1. We start by drawing the line \(2x_1 + 3x_2 = 6\). To draw it we find two points in the \((x_1, x_2)\) plane: \(x_1 = 0, x_2 = 2\) and \(x_2 = 0, x_1 = 3\). To graph \(2x_1 + 3x_2 \le 6\), we check whether the point \(x_1 = 0 = x_2\) satisfies the inequality. We see that \(2 \times 0 + 3 \times 0 \le 6\), so \(x_1 = 0 = x_2\) does satisfy \(2x_1 + 3x_2 \le 6\), and hence the South-West half-plane is the graph of \(2x_1 + 3x_2 \le 6\).

Figure 2.1: Area for \(2x_1 + 3x_2 \le 6\), \(x_1, x_2 \ge 0\).

Example 2.3.2. Consider the following instance of LP

\[
\begin{array}{rl}
\min & z = 50x_1 + 100x_2 \\
\text{s.t.} & 7x_1 + 2x_2 \ge 28 \\
 & 2x_1 + 12x_2 \ge 24 \\
 & x_1, x_2 \ge 0.
\end{array}
\]

Solve the problem graphically.

Solution: See Figure 2.2. The line \(7x_1 + 2x_2 = 28\) contains the points \(x_1 = 0, x_2 = 14\) and \(x_2 = 0, x_1 = 4\), and the line \(2x_1 + 12x_2 = 24\) the points \(x_1 = 0, x_2 = 2\) and \(x_2 = 0, x_1 = 12\). The point of intersection of the two lines can be found by solving the system of the two equations which describe them. From the second equation \(x_1 = 12 - 6x_2\); substituting into the first equation yields \(7(12 - 6x_2) + 2x_2 = 28\), thus \(56 - 40x_2 = 0\). As a result we get \(x_2 = 1.4\) and \(x_1 = 12 - 6 \times 1.4 = 3.6\). The graph of the four constraints, that is, the set of points which satisfy all the constraints, is the entire (infinite) North-East area of Figure 2.2.

The objective function lines are \(50x_1 + 100x_2 = \text{const}\). It is easy to see that const decreases as the line moves towards the South-West corner. Thus the minimum is achieved at \(x_1 = 3.6, x_2 = 1.4\). The optimal value is \(z = 50 \times 3.6 + 100 \times 1.4 = 320\).

Example 2.3.3. Add the constraint \(x_1 + x_2 \le 2\) to Example 2.3.2. Investigate the new LP problem.

Figure 2.2: Figure for Example 2.3.2

Solution: We see that no point satisfies the 5 constraints (see Figure 2.3). So, the LP problem has no feasible solution.

Example 2.3.4. Consider Example 2.3.2, but with the objective to maximize rather than minimize.

Solution: For any const > 320 there are feasible points satisfying \(50x_1 + 100x_2 = \text{const}\). Thus, the new LP problem is unbounded.
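One explicit way to check the unboundedness claim is to follow a ray of feasible points along the \(x_1\)-axis. The parameter \(t\) below is introduced only for this check.

```latex
\[
(x_1, x_2) = (t, 0),\quad t \ge 12:\qquad
7t \ge 28,\qquad 2t \ge 24,\qquad
z = 50t \to \infty \ \text{ as } t \to \infty .
\]
```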

Example 2.3.5. Consider Example 2.3.2, but with objective function \(\min z = x_1 + 6x_2\).

Solution: We see that one optimal solution is still \(x_1 = 3.6, x_2 = 1.4\), but there are infinitely many optimal points (an entire straight line segment). See Figure 2.4.

Conclusions There are four possibilities for an LP problem:

• The LP problem has a unique optimal solution.
• The LP problem has infinitely many optimal solutions.
• The LP problem has no feasible solutions.
• The LP problem is unbounded.
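The graphical method of Examples 2.3.2, 2.3.3 and 2.3.5 can be mimicked numerically for two variables: intersect every pair of constraint boundary lines, keep the feasible intersection points (the corner points), and compare objective values there. The sketch below is an illustration with hypothetical names (`feasible_vertices`, `minimise`). It assumes every constraint is written as \(a_1x_1 + a_2x_2 \ge b\) and that the objective is bounded below on the feasible region, so it cannot detect the unbounded case.

```python
from fractions import Fraction
from itertools import combinations

def feasible_vertices(constraints):
    """Corner points of {x : a1*x1 + a2*x2 >= b for every (a1, a2, b)}."""
    points = set()
    for (a1, a2, b), (c1, c2, d) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue  # parallel boundary lines have no single intersection point
        # Cramer's rule for the intersection of the two boundary lines
        x1 = Fraction(b * c2 - a2 * d, det)
        x2 = Fraction(a1 * d - b * c1, det)
        if all(p * x1 + q * x2 >= r for p, q, r in constraints):
            points.add((x1, x2))
    return points

def minimise(cost, constraints):
    """Return (optimal value, sorted optimal vertices), or None if infeasible."""
    def value(v):
        return cost[0] * v[0] + cost[1] * v[1]
    verts = feasible_vertices(constraints)
    if not verts:
        return None
    best = min(value(v) for v in verts)
    return best, sorted(v for v in verts if value(v) == best)

# Example 2.3.2; x1 >= 0 and x2 >= 0 are written as ordinary constraints.
ex232 = [(7, 2, 28), (2, 12, 24), (1, 0, 0), (0, 1, 0)]
print(minimise((50, 100), ex232))                   # one optimal vertex
print(minimise((50, 100), ex232 + [(-1, -1, -2)]))  # Example 2.3.3: x1 + x2 <= 2 added
print(minimise((1, 6), ex232))                      # Example 2.3.5: two optimal vertices
```

When two vertices attain the same optimal value, as in the last call, every point of the segment between them is optimal as well.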

Figure 2.3: Figure for Example 2.3.3

Figure 2.4: Figure for Example 2.3.5

2.4 Questions

Question 2.4.1. (W.L. Winston) Leary Chemicals manufactures three chemicals: A, B, and C. These chemicals are produced via two production processes: 1 and 2. Running Process 1 for an hour costs $4 and yields 3 units of A, 1 of B, and 1 of C. Running Process 2 for an hour costs $1 and produces 1 unit of A and 1 of B. To meet customer demands, at least 10 units of A, 5 of B, and 3 of C must be produced daily. Write down an LP model to minimize the cost of meeting Leary Chemical’s daily demands.

Question 2.4.2. A small toy company manufactures two types of wooden toys: cars and trains. Each car sells for $25 and uses $20 of raw material and labour. Each train sells for $30 and uses $24 of raw material and labour. The manufacture of both types of toys requires two types of skilled labour: carpentry and finishing. A car requires 2 hours of carpentry labour and 3 hours of finishing labour. A train requires 3 hours of carpentry labour and 4 hours of finishing labour. Each week the company can get all the material it needs, but only 150 finishing hours and 100 carpentry hours. The company already has orders for 20 trains, and expects to sell all manufactured toys. The company wants to decide what number of cars and trains to manufacture in order to maximize its profit. Write down an LP model to maximize the toy company's profit.

Question 2.4.3. A television manufacturing company has to decide on the number of 27- and 20-inch sets to be produced at one of its factories. Market research indicates that at most 40 of the 27-inch sets and 10 of the 20-inch sets can be sold per month. The maximum number of work hours available is 500 per month. A 27-inch set requires 20 work hours, and a 20-inch set requires 10 work hours. Each 27-inch set sold produces a profit of £120, and each 20-inch set produces a profit of £80. A wholesaler has agreed to purchase all the television sets produced if the numbers do not exceed the maxima indicated by the market research. (a) Formulate a linear programming model for this problem. (b) Solve this model graphically.

Question 2.4.4. Goldilocks needs to find at least 16 lb of gold and at least 18 lb of silver to pay the monthly rent. There are two mines in which Goldilocks can find gold and silver. Each day that Goldilocks spends in mine 1, she finds 2 lb of gold and 2 lb of silver. Each day that Goldilocks spends in mine 2, she finds 1 lb of gold and 3 lb of silver. Formulate an LP to help Goldilocks meet her requirements while spending as little time as possible in the mines. Graphically solve the LP.

Question 2.4.5. Find out which of the four possibilities this LP problem belongs to.

\[
\begin{array}{rl}
\max & z = x_1 + x_2 \\
\text{s.t.} & x_1 + x_2 \le 4 \\
 & x_1 - x_2 \ge 5 \\
 & x_1, x_2 \ge 0.
\end{array}
\]

Justify your answer.

Question 2.4.6. Find out which of the four possibilities this LP problem belongs to.

\[
\begin{array}{rl}
\max & z = 4x_1 + x_2 \\
\text{s.t.} & 8x_1 + 2x_2 \le 16 \\
 & 5x_1 + 2x_2 \le 12 \\
 & x_1, x_2 \ge 0.
\end{array}
\]

Justify your answer.

Question 2.4.7. Find out which of the four possibilities this LP problem belongs to.

\[
\begin{array}{rl}
\max & z = -x_1 + 3x_2 \\
\text{s.t.} & x_1 - x_2 \le 4 \\
 & 5x_1 + 2x_2 \ge 4 \\
 & x_1, x_2 \ge 0.
\end{array}
\]

Justify your answer.

Question 2.4.8. A computer manufacturing company has to decide on the number of 64-processor and 32-processor computers to be assembled at one of its factories. Market research indicates that at most 20 of the 64-processor computers and 30 of the 32-processor computers can be sold per month. The maximum number of work hours available is 5100 per month. A 64-processor computer requires 200 work hours, and a 32-processor computer requires 100 work hours. Each 64-processor computer sold produces a profit of £9000, and each 32-processor computer produces a profit of £6000. A wholesaler has agreed to purchase all the computers assembled if the numbers do not exceed the maxima indicated by the market research. Write down a linear programming model to maximize the computer company's profit. Why may a linear programming model not be adequate for the above problem?

Question 2.4.9. Solve the following LP problem graphically:

\[
\begin{array}{rl}
\max & z = 25x_1 + 50x_2 \\
\text{s.t.} & 7x_1 + 2x_2 \le 28 \\
 & 2x_1 + 12x_2 \le 24 \\
 & x_1, x_2 \ge 0.
\end{array}
\]

Chapter 3

Simplex Method

3.1 Standard Form

In preparation for using the Simplex Method, it is necessary to express the linear programming problem in standard form. For an LP problem with \(n\) variables and \(m\) constraints the standard form is

\[
\begin{array}{rl}
\max & z = c_1x_1 + c_2x_2 + \cdots + c_nx_n \\
\text{s.t.} & a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\
 & a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2 \\
 & \qquad\vdots \\
 & a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m \\
 & x_1, x_2, \dots, x_n \ge 0,
\end{array}
\]

where the constants \(b_1, \dots, b_m\) are non-negative. Often the standard form is written in vector-matrix form:

\[
\begin{array}{rl}
\max & z = cx \\
\text{s.t.} & Ax = b \\
 & x \ge 0,
\end{array}
\]

where \(c = (c_1, \dots, c_n)\),

\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad
b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}, \quad
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.
\]

Although this standard form is required by the Simplex Method, it is not necessarily the form that arises naturally when we first formulate our LP model. Thus, some transformations are required to translate the initial form into the standard form.

To convert a minimization problem into a maximization problem, we can simply multiply the objective function by \(-1\) and replace min by max. For example, the problem of minimizing \(z = 2x_1 - 3x_2\) is equivalent to that of maximizing \(z' = -2x_1 + 3x_2\).

Equality constraints require no modification. Less-than-or-equal-to (\(\le\)) inequalities require the introduction of slack variables. For example, \(2x_1 + 5x_2 \le 7\) becomes \(2x_1 + 5x_2 + s_1 = 7\). Greater-than-or-equal-to (\(\ge\)) inequalities are modified by introducing surplus variables. For example, \(4x_1 - 6x_2 \ge 8\) becomes \(4x_1 - 6x_2 - s_2 = 8\).

Finally, the LP standard form requires that every variable is non-negative. If some variable, say \(x_3\), is not required to be non-negative in the initial formulation, then we modify it as follows: we replace \(x_3\), in each constraint and the objective function, by \(x_3' - x_3''\) and add \(x_3', x_3'' \ge 0\).

Example 3.1.1. Transform the following LP problem into standard form.

\[
\begin{array}{rl}
\min & z = -50x_1 + 100x_2 \\
\text{s.t.} & 7x_1 + 2x_2 \ge 28 \\
 & 2x_1 + 12x_2 \le 24 \\
 & x_1, x_2 \ge 0.
\end{array}
\]

Solution: We have

\[
\begin{array}{rl}
\max & z' = 50x_1 - 100x_2 \\
\text{s.t.} & 7x_1 + 2x_2 - s_1 = 28 \\
 & 2x_1 + 12x_2 + s_2 = 24 \\
 & x_1, x_2, s_1, s_2 \ge 0.
\end{array}
\]

Example 3.1.2. Transform the following LP problem into standard form.

\[
\begin{array}{rl}
\max & z = 50x_1 - 10x_2 \\
\text{s.t.} & x_1 + 2x_2 \le 28 \\
 & 2x_1 + 15x_2 \ge 24 \\
 & x_2 \ge 0.
\end{array}
\]

Solution: We have

\[
\begin{array}{rl}
\max & z = 50x_1' - 50x_1'' - 10x_2 \\
\text{s.t.} & x_1' - x_1'' + 2x_2 + s_1 = 28 \\
 & 2x_1' - 2x_1'' + 15x_2 - s_2 = 24 \\
 & x_1', x_1'', x_2, s_1, s_2 \ge 0.
\end{array}
\]
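The transformation rules of this section are mechanical, so they can be sketched in a few lines of code. The function `to_standard_form` and its argument layout are assumptions made for this illustration, and the sketch assumes all right-hand sides are already non-negative.

```python
def to_standard_form(sense, c, constraints, free=()):
    """Return (c, A, b, names) describing: max c.x  s.t.  A x = b, x >= 0.

    sense:       "max" or "min"
    c:           objective coefficients of the original variables
    constraints: list of (coefficients, relation, rhs), relation in {"<=", ">=", "="}
    free:        0-based indices of variables without a sign restriction
    """
    sign = -1 if sense == "min" else 1       # min z  is replaced by  max -z
    names, cols = [], []                     # cols[j] = (original index, multiplier)
    for j in range(len(c)):
        if j in free:                        # x_j is replaced by x_j' - x_j''
            names += [f"x{j + 1}'", f"x{j + 1}''"]
            cols += [(j, 1), (j, -1)]
        else:
            names.append(f"x{j + 1}")
            cols.append((j, 1))
    new_c = [sign * c[j] * m for j, m in cols]
    # One extra column per inequality: +1 for a slack, -1 for a surplus variable.
    extra = [k for k, (_, rel, _) in enumerate(constraints) if rel != "="]
    A, b = [], []
    for k, (coef, rel, rhs) in enumerate(constraints):
        row = [coef[j] * m for j, m in cols]
        row += [0 if e != k else (1 if rel == "<=" else -1) for e in extra]
        A.append(row)
        b.append(rhs)
    names += [f"s{k + 1}" for k in extra]
    new_c += [0] * len(extra)                # slack and surplus variables do not affect z
    return new_c, A, b, names

# Example 3.1.1
print(to_standard_form("min", [-50, 100], [([7, 2], ">=", 28), ([2, 12], "<=", 24)]))
# Example 3.1.2, where x1 (index 0) has no sign restriction
print(to_standard_form("max", [50, -10], [([1, 2], "<=", 28), ([2, 15], ">=", 24)], free={0}))
```

Naming the extra variable of constraint \(k\) as \(s_k\) follows the convention used in the two examples.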

3.2 Solutions of Linear Systems

Consider a system of independent linear equations, \(Ax = b\), consisting of \(m\) equations and \(n\) unknowns \(x_i\). The \(n\) unknowns include the original decision variables and any other variables that may have been introduced in order to achieve standard form. If a system of equations is independent, then \(m \le n\). If \(m = n\) and \(\det A \ne 0\), then there is a unique solution \(x = A^{-1}b\). Optimization is not an issue here. From now on, suppose that \(m < n\).

The Simplex Method performs such a search, although in a very efficient way. We define two extreme points of the feasible region (or two basic feasible solutions) as being adjacent if all but one of their basic variables are the same. Thus, a transition from one basic feasible solution to an adjacent basic feasible solution can be thought of as exchanging the roles of one basic variable and one non-basic variable. The Simplex Method performs a sequence of such transitions and thereby examines a succession of adjacent extreme points. A transition to an adjacent extreme point will be made only if by doing so the objective function is improved (or stays the same). It is a property of linear programming problems that this type of search will lead us to the discovery of an optimal solution (if one exists). The Simplex Method is not only successful in this sense, but it is remarkably efficient because it succeeds after examining only a fraction of the basic feasible solutions. Since the Simplex Method is an algorithm, we must specify how an initial feasible solution is obtained, how a transition is made to a better basic feasible solution, and how to recognize an optimal solution. From any basic feasible solution, we have the assurance that, if a better solution exists at all, then there is an adjacent solution that is better than the current one. This is the principle on which the Simplex Method is based; thus, an optimal solution is accessible from any starting basic feasible solution.

3.3 The Simplex Method

We will use the following simple problem from M.W. Carter and C.C. Price.

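\[
\begin{array}{rl}
\max & z = 8x_1 + 5x_2 \\
\text{s.t.} & x_1 \le 150 \\
 & x_2 \le 250 \\
 & 2x_1 + x_2 \le 500 \\
 & x_1, x_2 \ge 0.
\end{array}
\]

The standard form of this problem is

\[
\begin{array}{rl}
\max & z = 8x_1 + 5x_2 + 0s_1 + 0s_2 + 0s_3 \\
\text{s.t.} & x_1 + s_1 = 150 \\
 & x_2 + s_2 = 250 \\
 & 2x_1 + x_2 + s_3 = 500 \\
 & x_1, x_2, s_1, s_2, s_3 \ge 0.
\end{array}
\]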

(Zero coefficients are given to the slack variables in the objective function because slack variables do not contribute to \(z\).) The constraints constitute a system of \(m = 3\) equations in \(n = 5\) unknowns. To obtain an initial basic feasible solution, we need to select \(n - m = 5 - 3 = 2\) variables as non-basic variables. We can readily see in this case that by choosing the two variables \(x_1, x_2\) as the non-basic variables, and setting their values to zero, no significant computation is required in order to solve for the three basic variables: \(s_1 = 150, s_2 = 250, s_3 = 500\). The value of the objective function at this solution is 0.

Once we have a solution, a transition to an adjacent solution is made by a pivot operation. A pivot operation is a sequence of elementary row operations applied to the current system of equations, with the effect of creating an equivalent system in which one new (previously non-basic) variable now has a coefficient of one in one equation and zeros in all other equations.

During the process of applying pivot operations to an LP problem, it is convenient to use a tabular representation of the system of equations. This representation is referred to as a Simplex tableau. In order to conveniently keep track of the value of the objective function as it is affected by the pivot operations, we treat the objective function as one of the equations in the system of equations, and we include it in the tableau. In our example, the objective function equation is written as

\[ 1z - 8x_1 - 5x_2 - 0s_1 - 0s_2 - 0s_3 = 0. \]

The tableau for the initial solution is as follows:

\[
\begin{array}{l|rrrrrr|r}
\text{Basis} & z & x_1 & x_2 & s_1 & s_2 & s_3 & \text{Solution} \\ \hline
z & 1 & -8 & -5 & 0 & 0 & 0 & 0 \\
s_1 & 0 & 1 & 0 & 1 & 0 & 0 & 150 \\
s_2 & 0 & 0 & 1 & 0 & 1 & 0 & 250 \\
s_3 & 0 & 2 & 1 & 0 & 0 & 1 & 500
\end{array}
\]

Observe that the objective function row represents an equation that must be satisfied for any feasible solution. Since we want to maximize \(z\), some other (non-basic) term must decrease in order to offset the increase in \(z\). But all of the non-basic variables are already at their lowest value, zero. Therefore, we want to increase some non-basic variable that has a negative coefficient. As a simple rule, we will choose the variable with the most negative coefficient. The chosen variable is called the entering variable, i.e., the one that will enter the basis. In our example, \(x_1\) is the entering variable. In general, we denote the entering variable by \(x_k\).

How much can we increase the value of xk (away from zero)? To answer this question consider a row i with aik > 0. For basic variable xi, after xk is increased, the new value of xi will be

\[ x_i = b_i - a_{ik} x_k. \]

Since xi ≥ 0, we can increase xk only to the point where

\[ x_k = b_i / a_{ik}. \]

One of the basic variables xi must leave the basis (leaving variable). To find this variable we consider the column k of the entering variable and calculate Θi = bi/aik for every row i for which aik > 0. In our example k = 1, and Θ1 = 150/1 = 150 and Θ3 = 500/2 = 250. This means x1 can be increased to 150 without s1 becoming negative and x1 can be increased to 250 without s3 becoming negative. We do not want either of s1,s3 to become negative, and thus we choose Θ = min Θj = 150. So, s1 is the leaving variable.

Let us consider what happens when none of the aik is positive. In this case, xk can be increased by any amount without any basic variable becoming negative. This means that the corresponding LP problem is unbounded.
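The ratio test and the unboundedness check described above can be sketched in a few lines of Python. This is only an illustration: the function name `ratio_test` and its argument layout are assumptions made for this sketch, not notation from the text.

```python
from fractions import Fraction

def ratio_test(column, rhs):
    """Return (row index, theta) of the leaving row, or None if no a_ik is positive.

    column[i] is a_ik for the entering variable x_k and rhs[i] is b_i (integers here).
    """
    best = None
    for i, (a, b) in enumerate(zip(column, rhs)):
        if a > 0:                      # only rows with a_ik > 0 limit the increase of x_k
            theta = Fraction(b, a)
            if best is None or theta < best[1]:
                best = (i, theta)
    return best                        # None signals an unbounded problem

# Column of x1 in the initial tableau (rows s1, s2, s3) and the Solution column.
leaving = ratio_test([1, 0, 2], [150, 250, 500])
```

For the example, the returned row index 0 corresponds to s1, with Θ = 150.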

Returning to our example, recall that x1 is entering and s1 is leaving. This means that the intersection of the row of s1 and the column of x1 is the pivot element. The pivot element must become 1 and the other entries of the column of x1 must become zero. To achieve this we multiply the 1st row by 8 and add it to row 0 (the objective function row); we also multiply the 1st row by −2 and add it to the 3rd row. As a result, we get

Basis   z   x1   x2   s1   s2   s3   Solution
z       1    0   -5    8    0    0       1200
x1      0    1    0    1    0    0        150
s2      0    0    1    0    1    0        250
s3      0    0    1   -2    0    1        200

This shows the new basic feasible solution x1 = 150, s2 = 250, s3 = 200, x2 = s1 = 0, z = 1200. Now x2 is entering and Θ = min{250/1, 200/1} = 200. Thus, s3 is leaving. The intersection of the row of s3 and the column of x2 is the pivot element and every entry of the column of x2 except for the pivot element must become 0. After making those entries 0, we get

Basis   z   x1   x2   s1   s2   s3   Solution
z       1    0    0   -2    0    5       2200
x1      0    1    0    1    0    0        150
s2      0    0    0    2    1   -1         50
x2      0    0    1   -2    0    1        200

Now s1 is entering and Θ = min{150/1, 50/2} = 25. Thus, s2 is leaving. The intersection of the row of s2 and the column of s1 is the pivot element and every entry of the column of s1 except for the pivot element must become 0. After making those entries 0, we get

Basis   z   x1   x2   s1    s2     s3   Solution
z       1    0    0    0     1      4       2250
x1      0    1    0    0  -1/2    1/2        125
s1      0    0    0    1   1/2   -1/2         25
x2      0    0    1    0     1      0        250

Because the objective function row coefficients are all non-negative (the Solution column is not taken into consideration), the current solution is optimal. The optimal values of the decision variables are x1 = 125 and x2 = 250. The optimal value of the objective function is z∗ = 2250.

Example 3.3.1. (W.L. Winston) Solve the following LP problem using the Simplex Method:

\[
\begin{array}{rl}
\max & z = 60x_1 + 30x_2 + 20x_3 \\
\text{s.t.} & 8x_1 + 6x_2 + x_3 \le 48 \\
& 4x_1 + 2x_2 + 1.5x_3 \le 20 \\
& 2x_1 + 1.5x_2 + 0.5x_3 \le 8 \\
& x_2 \le 5 \\
& x_1, x_2, x_3 \ge 0.
\end{array}
\]

Solution: Computations are done in the following tableau:

Basis   z    x1     x2     x3   s1     s2    s3   s4   Solution   Ratio
z       1   -60    -30    -20    0      0     0    0          0
s1      0     8      6      1    1      0     0    0         48   48/8
s2      0     4      2    1.5    0      1     0    0         20   20/4
s3      0     2    1.5    0.5    0      0     1    0          8   8/2
s4      0     0      1      0    0      0     0    1          5   –

z       1     0     15     -5    0      0    30    0        240
s1      0     0      0     -1    1      0    -4    0         16   –
s2      0     0     -1    0.5    0      1    -2    0          4   4/0.5
x1      0     1   0.75   0.25    0      0   0.5    0          4   4/0.25
s4      0     0      1      0    0      0     0    1          5   –

z       1     0      5      0    0     10    10    0        280
s1      0     0     -2      0    1      2    -8    0         24
x3      0     0     -2      1    0      2    -4    0          8
x1      0     1   1.25      0    0   -0.5   1.5    0          2
s4      0     0      1      0    0      0     0    1          5

Since the objective function row contains only non-negative coefficients in all non-Solution columns, we have obtained an optimal solution: x1 = 2, x2 = 0, x3 = 8, z∗ = 280.
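The tableau computations of this section can be reproduced with a short program. The following is a minimal sketch, assuming a maximization problem with only ≤ constraints and non-negative right-hand sides; the function name `simplex_max` is chosen for this illustration, exact rational arithmetic is used to avoid rounding, and no anti-cycling rule is included.

```python
from fractions import Fraction

def simplex_max(c, A, b):
    """Maximise c.x subject to A x <= b, x >= 0, assuming every b_i >= 0.

    Returns (z, x) at an optimum, or None if the problem is unbounded.
    """
    m, n = len(A), len(c)
    # Row 0 is the objective row z - c.x = 0; rows 1..m hold the constraints with slacks.
    T = [[Fraction(-cj) for cj in c] + [Fraction(0)] * (m + 1)]
    for i in range(m):
        slack = [Fraction(int(i == j)) for j in range(m)]
        T.append([Fraction(a) for a in A[i]] + slack + [Fraction(b[i])])
    basis = [n + i for i in range(m)]                     # the slacks form the initial basis
    while True:
        k = min(range(n + m), key=lambda j: T[0][j])      # most negative coefficient
        if T[0][k] >= 0:
            break                                         # all coefficients non-negative: optimal
        rows = [i for i in range(1, m + 1) if T[i][k] > 0]
        if not rows:
            return None                                   # no positive a_ik: unbounded
        r = min(rows, key=lambda i: T[i][-1] / T[i][k])   # ratio test
        pivot = T[r][k]
        T[r] = [v / pivot for v in T[r]]                  # pivot element becomes 1
        for i in range(m + 1):
            factor = T[i][k]
            if i != r and factor != 0:
                T[i] = [v - factor * p for v, p in zip(T[i], T[r])]
        basis[r - 1] = k
    x = [Fraction(0)] * n
    for i, j in enumerate(basis):
        if j < n:
            x[j] = T[i + 1][-1]
    return T[0][-1], x

# Example 3.3.1
result = simplex_max([60, 30, 20],
                     [[8, 6, 1], [4, 2, Fraction(3, 2)], [2, Fraction(3, 2), Fraction(1, 2)], [0, 1, 0]],
                     [48, 20, 8, 5])
```

On the data of Example 3.3.1 the sketch follows the same pivots as the tableau above; on the illustrative problem with objective 8x1 + 5x2 it ends at z = 2250.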

Example 3.3.2. Solve the following LP problem using the Simplex Method:

\[
\begin{array}{rl}
\min & z = -36x_1 - 30x_2 + 3x_3 + 4x_4 \\
\text{s.t.} & x_1 + x_2 - x_3 \le 5 \\
& 6x_1 + 5x_2 - x_4 \le 10 \\
& x_1, x_2, x_3, x_4 \ge 0.
\end{array}
\]

Solution: Let z′ = −z. Then max z′ = 36x1 + 30x2 − 3x3 − 4x4. Computations are done in the following tableau:

Basis   z′    x1    x2   x3     x4   s1     s2   Solution   Ratio
z′       1   -36   -30    3      4    0      0          0
s1       0     1     1   -1      0    1      0          5   5/1
s2       0     6     5    0     -1    0      1         10   10/6

z′       1     0     0    3     -2    0      6         60
s1       0     0   1/6   -1    1/6    1   -1/6       10/3   20
x1       0     1   5/6    0   -1/6    0    1/6        5/3   –

z′       1     0     2   -9      0   12      4        100
x4       0     0     1   -6      1    6     -1         20   –
x1       0     1     1   -1      0    1      0          5   –

We see that no entry in the column of the entering variable x3 is positive, so x3 can be increased by any amount, which means that the maximization problem is unbounded. Hence the original minimization problem is unbounded as well.
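As a sanity check of this conclusion, one can read a direction of unboundedness off the last tableau: with x2 = 0 and x3 = t, the rows of x1 and x4 give x1 = 5 + t and x4 = 20 + 6t. The sketch below (function names are chosen for this illustration) verifies that these points stay feasible for the original problem while z decreases without limit.

```python
def z(x1, x2, x3, x4):
    """Objective of the original minimisation problem in Example 3.3.2."""
    return -36 * x1 - 30 * x2 + 3 * x3 + 4 * x4

def feasible(x1, x2, x3, x4):
    """Constraints of Example 3.3.2, including non-negativity."""
    return (x1 + x2 - x3 <= 5
            and 6 * x1 + 5 * x2 - x4 <= 10
            and min(x1, x2, x3, x4) >= 0)

for t in (0, 1, 10, 1000):
    point = (5 + t, 0, t, 20 + 6 * t)     # x1, x2, x3, x4 along the ray
    assert feasible(*point)
    print(t, z(*point))                    # z = -100 - 9t along the ray
```

Since z′ = −z, this agrees with the objective row of the final tableau, which reads z′ − 9x3 = 100 when the other non-basic variables are zero.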

3.4 Questions

Question 3.4.1. Transform the following LP problem into standard form and solve it using the Simplex Method:

\[
\begin{array}{rl}
\max & z = 2x_1 - x_2 + x_3 \\
\text{s.t.} & 3x_1 + x_2 + x_3 \le 60 \\
& x_1 - x_2 + 2x_3 \le 10 \\
& x_1 + x_2 - x_3 \le 20 \\
& x_1, x_2, x_3 \ge 0.
\end{array}
\]

Question 3.4.2. (W.L. Winston) Solve the following LP problem using the Simplex Method:

\[
\begin{array}{rl}
\max & z = 2x_2 \\
\text{s.t.} & x_1 - x_2 \le 4 \\
& -x_1 + x_2 \le 1 \\
& x_1, x_2 \ge 0.
\end{array}
\]

Question 3.4.3. Solve the following modification of the previous LP problem using the Simplex Method:

\[
\begin{array}{rl}
\max & z = 2x_2 \\
\text{s.t.} & x_1 - x_2 \le 4 \\
& x_1 + 2x_2 \le 1 \\
& x_1, x_2 \ge 0.
\end{array}
\]

Question 3.4.4. Solve the following LP problem using the Simplex Method:

\[
\begin{array}{rl}
\max & z = 2x_1 + x_2 \\
\text{s.t.} & 3x_1 - x_2 \le 2 \\
& 2x_1 - x_2 \le 3 \\
& x_1, x_2 \ge 0.
\end{array}
\]

Question 3.4.5. Solve the following LP problem using the Simplex Method:

\[
\begin{array}{rl}
\max & z = 3x_1 + 5x_2 \\
\text{s.t.} & x_1 \le 4 \\
& 2x_2 \le 12 \\
& 2x_1 + 3x_2 \le 18 \\
& x_1, x_2 \ge 0.
\end{array}
\]

Question 3.4.6. (Hillier and Lieberman, 4.3-7) Consider the following LP problem.

\[
\begin{array}{rl}
\max & z = 5x_1 + 3x_2 + 4x_3 \\
\text{s.t.} & 2x_1 + x_2 + x_3 \le 20 \\
& 3x_1 + x_2 + 2x_3 \le 30 \\
& x_1, x_2, x_3 \ge 0.
\end{array}
\]

You are given the information that the non-zero variables in the optimal solution are x2 and x3. Describe how you can use this information to adapt the Simplex Method to solving the problem in the minimum possible number of iterations (when you start from the usual initial basic feasible solution). Do not actually perform any iterations.

Question 3.4.7. Consider the following LP problem.

\[
\begin{array}{rl}
\min & z = 4x_1 - 5x_2 - 3x_3 \\
\text{s.t.} & x_1 - x_2 + x_3 \ge -2 \\
& x_1 + x_2 + 2x_3 \le 3 \\
& x_1, x_2, x_3 \ge 0.
\end{array}
\]

Transform this LP problem into standard form. Construct the simplex table and perform the first iteration of the Simplex Method for the LP problem. Knowing that the second iteration of the Simplex Method yields the optimal solution, find the optimal solution by performing the second iteration partially (only necessary computations).

Chapter 4

Linear Programming Approaches

4.1 Artificial variables and Big-M Method

Consider the following constraint: 3x1 − 7x2 = −5. Since the right hand side of any LP constraint in standard form must be non-negative, to transform this constraint into standard form, we multiply it by −1: −3x1 + 7x2 = 5.

Consider the following constraint: 3x1 − 7x2 ≥ −9. Since the right hand side of any LP constraint in standard form must be non-negative, to transform this constraint into standard form, we multiply it by −1: −3x1 + 7x2 ≤ 9.

If all constraints in an LP problem are of type ≤, then introduction of slack variables results in getting an initial feasible set of basic variables. Often, some of the constraints are equalities, others are of type ≥ (with non-negative right hand sides). Consider the following LP problem:

\[
\begin{array}{rl}
\max & z = 2x_1 - 3x_2 + 9x_3 \\
\text{s.t.} & -x_1 + x_2 - 4x_3 = -10 \\
& 7x_1 - 3x_2 - 5x_3 \le -2 \\
& x_1, x_2, x_3 \ge 0.
\end{array}
\]

After transformation into standard form, we get:

\[
\begin{array}{rl}
\max & z = 2x_1 - 3x_2 + 9x_3 \\
\text{s.t.} & x_1 - x_2 + 4x_3 = 10 \\
& -7x_1 + 3x_2 + 5x_3 - s_1 = 2 \\
& x_1, x_2, x_3, s_1 \ge 0.
\end{array}
\]
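The sign handling in this transformation is mechanical and can be sketched as a small helper. The function name `to_standard_form` and the encoding of the added variable (+1 for a slack, −1 for a surplus, 0 for none) are assumptions made for this illustration.

```python
def to_standard_form(coeffs, sense, rhs):
    """Rewrite one constraint as an equation with a non-negative right-hand side.

    Returns (coeffs, extra, rhs), where extra is the coefficient of the added
    variable: +1 for a slack, -1 for a surplus, 0 if the constraint is an equation.
    """
    if rhs < 0:
        # Multiplying by -1 flips every sign and reverses the inequality.
        coeffs = [-a for a in coeffs]
        rhs = -rhs
        sense = {"<=": ">=", ">=": "<=", "=": "="}[sense]
    extra = {"<=": 1, ">=": -1, "=": 0}[sense]
    return coeffs, extra, rhs

# The two constraints of the problem above.
first = to_standard_form([-1, 1, -4], "=", -10)
second = to_standard_form([7, -3, -5], "<=", -2)
```

For the second constraint the helper reports a surplus variable with coefficient −1, matching the term −s1 in the standard form.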

It is not easy to find an initial set of basic variables. Thus, we introduce so-called artificial variables a1 and a2:

\[
\begin{array}{rl}
\max & z = 2x_1 - 3x_2 + 9x_3 - Ma_1 - Ma_2 \\
\text{s.t.} & x_1 - x_2 + 4x_3 + a_1 = 10 \\
& -7x_1 + 3x_2 + 5x_3 - s_1 + a_2 = 2 \\
& x_1, x_2, x_3, s_1, a_1, a_2 \ge 0,
\end{array}
\]

where M is a very large positive number. The role of M is to make sure that if the LP problem in hand has a feasible solution, then the optimal solution of this new problem will have a1 = a2 = 0, since positive values of ai decrease z considerably. Thus, a1 > 0 or a2 > 0 is only possible if the LP problem has no feasible solution at all. If we obtain an optimal solution for the transformed LP problem in which a1 = a2 = 0, then that solution can be considered as an initial feasible solution for the original LP problem. The method described above is called the Big-M Method.

There is a reason why the Big-M Method is not always considered practical. The value of M must be significantly larger than that of any other coefficient. However, this may lead to large arithmetic errors during operation of the Simplex Method. So in the end, the solution could be different from the optimal one.
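The kind of arithmetic error meant here can be illustrated with ordinary double-precision numbers. This is a generic floating-point effect, shown under the assumption that the tableau is stored in doubles; it does not reproduce the behaviour of any particular LP solver, and the function name is chosen for this sketch.

```python
from fractions import Fraction

def recovered_unit(M):
    """Add 1 to M and subtract M again, once in floating point and once exactly."""
    as_float = (float(M) + 1.0) - float(M)
    exact = (Fraction(M) + 1) - Fraction(M)
    return as_float, exact

small = recovered_unit(10**6)    # the unit survives in both computations
large = recovered_unit(10**16)   # the unit is lost in floating point
```

When M is too large relative to the other coefficients, contributions of ordinary size can vanish in sums that involve M, which is why exact arithmetic or the Two-Phase Method is often preferred.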

4.2 Two-Phase Method

This method of solving an (initial) LP problem (with artificial variables) consists of two phases:

First phase: Solve a minimization LP problem, whose objective function is a sum of artificial variables, and whose constraints are the constraints of the initial LP problem. If the optimal solution contains a positive artificial variable, then the initial problem is infeasible. Otherwise, we proceed to the next phase.

Second phase: Solve the initial problem using the optimal solution from the first phase as a starting solution.

Consider the following LP problem from M.W. Carter and C.C. Price:

\[
\begin{array}{rl}
\max & z = x_1 + 3x_2 \\
\text{s.t.} & 2x_1 - x_2 \le -1 \\
& x_1 + x_2 = 3 \\
& x_1, x_2 \ge 0.
\end{array}
\]

In the first phase we solve

\[
\begin{array}{rl}
\max & z_a = -a_1 - a_2 \\
\text{s.t.} & -2x_1 + x_2 - s_1 + a_1 = 1 \\
& x_1 + x_2 + a_2 = 3 \\
& x_1, x_2, s_1, a_1, a_2 \ge 0.
\end{array}
\]

The initial tableau for this phase is

Basis   za   x1   x2   s1   a1   a2   Solution
za       1    0    0    0    1    1          0
a1       0   -2    1   -1    1    0          1
a2       0    1    1    0    0    1          3

We perform row operations to change the coefficients of a1 and a2 in the za row to 0 (a necessary condition to start the Simplex Method with basis a1, a2). To do that, we add the a1 and a2 rows, each multiplied by −1, to the za row. We get:

Basis   za   x1   x2   s1   a1   a2   Solution
za       1    1   -2    1    0    0         -4
a1       0   -2    1   -1    1    0          1
a2       0    1    1    0    0    1          3

After two iterations of the Simplex Method we get the following final tableau (perform the iterations yourself):

Basis   za   x1   x2     s1     a1    a2   Solution
za       1    0    0      0      1     1          0
x2       0    0    1   -1/3    1/3   2/3        7/3
x1       0    1    0    1/3   -1/3   1/3        2/3

This indicates that the initial LP problem has a feasible solution (the ai are not basic). This allows us to proceed to the second phase in which we replace the tableau with a new one in which the columns of ai are deleted and the objective function row is replaced by that of the initial problem. We get

Basis   z   x1   x2     s1   Solution
z       1   -1   -3      0          0
x2      0    0    1   -1/3        7/3
x1      0    1    0    1/3        2/3

Now we need to create zeroes in the objective function row in place of −1 and −3 since x1 and x2 are basic. To do this, we add the x1 row to the z row and the x2 row multiplied by 3 to the z row. We get

Basis   z   x1   x2     s1   Solution
z       1    0    0   -2/3       23/3
x2      0    0    1   -1/3        7/3
x1      0    1    0    1/3        2/3

Applying one more iteration of the Simplex Method we get

Basis   z   x1   x2   s1   Solution
z       1    2    0    0          9
x2      0    1    1    0          3
s1      0    3    0    1          2

The optimal solution is x1 = 0, x2 = 3, z∗ = 9.

If one of the artificial variables remains positive in the optimal solution of the transformed problem (containing the ai), then the original problem is infeasible. It may require some amount of computational time to discover this. Still it can be quite useful to make a note of the artificial variables that remain positive. Should infeasibility occur in any real world problem, then it usually indicates an error in the formulation of the particular constraint associated with such an artificial variable. Knowing where the error is likely to be found, it may more easily be corrected. This remark applies equally to the Big-M Method.
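Both phases of the example begin with the same preparatory step: the objective row is adjusted so that its entries in the basic columns become zero. A minimal sketch of that step is given below; the helper name `price_out` is an assumption of this illustration, and it relies on each basic row having 1 in its own basic column and 0 in the other basic columns.

```python
from fractions import Fraction as F

def price_out(obj_row, basic_rows, basic_cols):
    """Zero the objective-row entries in the basic columns using the basic rows."""
    row = list(obj_row)
    for basic_row, col in zip(basic_rows, basic_cols):
        factor = row[col]
        row = [v - factor * p for v, p in zip(row, basic_row)]
    return row

# Phase one: columns x1, x2, s1, a1, a2, Solution; basis a1 (column 3), a2 (column 4).
phase1 = price_out([0, 0, 0, 1, 1, 0],
                   [[-2, 1, -1, 1, 0, 1], [1, 1, 0, 0, 1, 3]], [3, 4])

# Phase two: columns x1, x2, s1, Solution; basis x2 (column 1), x1 (column 0).
phase2 = price_out([-1, -3, 0, 0],
                   [[0, 1, F(-1, 3), F(7, 3)], [1, 0, F(1, 3), F(2, 3)]], [1, 0])
```

The two results are the objective rows (without the leading z entry) of the tableaux from which the iterations of each phase start.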

4.3 Shadow prices

Consider the following tableau:

Basis   z   x1     x2   x3    s1     s2    s3   s4   Solution
z       1    0      5    0    30     10    10    0        280
s1      0    0     -2    0    -4      2    -8    0         24
x3      0    0     -2    1    -2      2    -4    0          8
x1      0    1   1.25    0   0.5   -0.5   1.5    0          2
s4      0    0      1    0     0      0     0    1          5

The coefficients of the z row are called shadow prices as they allow us to see whether extra resources could give us higher profit (we consider the objective function here as a profit function). For example, the number 30 below s1 shows that if we increase the right hand side of the first constraint by one, then we will be able to increase z by 30. Decision-makers and analysts can use shadow prices to alter economic policy (to increase or decrease certain resources).

4.4 Dual problem

We say that an LP problem is in normal form if it is a maximization problem and all constraints are ≤ inequalities, i.e.

\[
\begin{array}{rl}
\max & z = c_1x_1 + c_2x_2 + \dots + c_nx_n \\
\text{s.t.} & a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n \le b_1 \\
& a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n \le b_2 \\
& \dots \\
& a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n \le b_m \\
& x_1, x_2, \dots, x_n \ge 0.
\end{array}
\]

Observe that we do not require that bi ≥ 0. The following LP problem is called dual:

\[
\begin{array}{rl}
\min & w = b_1y_1 + b_2y_2 + \dots + b_my_m \\
\text{s.t.} & a_{11}y_1 + a_{21}y_2 + \dots + a_{m1}y_m \ge c_1 \\
& a_{12}y_1 + a_{22}y_2 + \dots + a_{m2}y_m \ge c_2 \\
& \dots \\
& a_{1n}y_1 + a_{2n}y_2 + \dots + a_{mn}y_m \ge c_n \\
& y_1, y_2, \dots, y_m \ge 0.
\end{array}
\]

Example 4.4.1. Find the dual of the following LP problem.

\[
\begin{array}{rl}
\min & z = -50x_1 + 100x_2 - x_3 \\
\text{s.t.} & 7x_1 + 2x_2 + 2x_3 \ge 28 \\
& 2x_1 + 12x_2 \le 24 \\
& x_1, x_2, x_3 \ge 0.
\end{array}
\]

Solution: We first transform the original LP problem into normal form:

\[
\begin{array}{rl}
\max & z' = 50x_1 - 100x_2 + x_3 \\
\text{s.t.} & -7x_1 - 2x_2 - 2x_3 \le -28 \\
& 2x_1 + 12x_2 \le 24 \\
& x_1, x_2, x_3 \ge 0.
\end{array}
\]

Now we can find the dual:

\[
\begin{array}{rl}
\min & w = -28y_1 + 24y_2 \\
\text{s.t.} & -7y_1 + 2y_2 \ge 50 \\
& -2y_1 + 12y_2 \ge -100 \\
& -2y_1 \ge 1 \\
& y_1, y_2 \ge 0.
\end{array}
\]

Example 4.4.2. Find the dual of the following LP problem.

\[
\begin{array}{rl}
\min & z = -50x_1 + 100x_2 - x_3 \\
\text{s.t.} & 7x_1 + 2x_2 + 2x_3 = 28 \\
& x_1, x_2, x_3 \ge 0.
\end{array}
\]

Solution: We get

\[
\begin{array}{rl}
\max & z' = 50x_1 - 100x_2 + x_3 \\
\text{s.t.} & 7x_1 + 2x_2 + 2x_3 \le 28 \\
& 7x_1 + 2x_2 + 2x_3 \ge 28 \\
& x_1, x_2, x_3 \ge 0.
\end{array}
\]

Normal form:

\[
\begin{array}{rl}
\max & z' = 50x_1 - 100x_2 + x_3 \\
\text{s.t.} & 7x_1 + 2x_2 + 2x_3 \le 28 \\
& -7x_1 - 2x_2 - 2x_3 \le -28 \\
& x_1, x_2, x_3 \ge 0.
\end{array}
\]

Dual:

\[
\begin{array}{rl}
\min & w = 28y_1 - 28y_2 \\
\text{s.t.} & 7y_1 - 7y_2 \ge 50 \\
& 2y_1 - 2y_2 \ge -100 \\
& 2y_1 - 2y_2 \ge 1 \\
& y_1, y_2 \ge 0.
\end{array}
\]

There is a very apparent structural similarity between the primal and the dual in a dual pair of problems, but how are their solutions related? In the course of solving a (primal) maximization problem, the Simplex Method generates a series of feasible solutions with successively larger objective function values (cx). Solving the corresponding (dual) minimization problem may be thought of as a process of generating a series of feasible solutions with successively smaller objective function values (yb). Assuming that an optimal solution does exist, the current objective function value for the primal problem will converge to its maximum value from below, and that for the dual problem will converge to its minimum value from above. The primal objective function evaluated at x never exceeds the dual objective function evaluated at y; and at optimality, the two problems actually have the same objective function value. This can be summarized in the following duality property:

Duality Property: If x and y are feasible solutions to the primal and dual problems, respectively, then cx ≤ yb throughout the optimization process; and finally at optimality cx∗ = y∗b.

It follows from this property that, if feasible objective function values are found for a primal and dual pair of problems, and if these values are equal to each other, then both of the solutions are optimal solutions.

Primal and dual problems share more than the same objective function value at their optima: in order to find the complete solutions to both problems, it is actually sufficient to solve just one of them by the Simplex Method. In fact, the shadow prices, which appear in the top row of the optimal tableau of the primal problem, are precisely the optimal values of the dual variables. Similarly, if the dual problem were solved using the Simplex Method, the shadow prices in the optimal tableau would be the optimal values of the primal variables.
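Once a problem is in normal form, writing down its dual amounts to transposing the constraint matrix and swapping the roles of b and c. A minimal sketch follows; the function name `dual_of_normal_form` and the list-based data layout are assumptions of this illustration.

```python
def dual_of_normal_form(c, A, b):
    """Dual of  max c.x  s.t.  A x <= b, x >= 0.

    Returns (objective, rows, rhs) describing  min objective.y  s.t.  rows y >= rhs, y >= 0.
    """
    A_T = [list(col) for col in zip(*A)]   # row j of the dual is column j of A
    return list(b), A_T, list(c)

# Normal form of Example 4.4.1.
objective, rows, rhs = dual_of_normal_form([50, -100, 1],
                                           [[-7, -2, -2], [2, 12, 0]],
                                           [-28, 24])
```

The three returned pieces correspond to the dual objective −28y1 + 24y2, the left-hand sides of the three dual constraints, and their right-hand sides 50, −100 and 1.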
In the illustrative problem (considered in Section ”Simplex Method”)

\[
\begin{array}{rl}
\max & z = 8x_1 + 5x_2 \\
\text{s.t.} & x_1 \le 150 \\
& x_2 \le 250 \\
& 2x_1 + x_2 \le 500 \\
& x_1, x_2 \ge 0
\end{array}
\]

the dual objective of minimizing w = 150y1 + 250y2 + 500y3 is met when the dual variables (shadow prices) have the values y1 = 0, y2 = 1, y3 = 4. Thus, from the dual point of view,

\[ w^* = 150 \times 0 + 250 \times 1 + 500 \times 4 = 2250, \]

which is equal to the primal objective value

\[ z^* = 8x_1 + 5x_2 = 8 \times 125 + 5 \times 250 = 2250 \]

for optimal x values of x1 = 125, x2 = 250.

Because the pertinent parameters and goals of any LP problem can be expressed in either a primal or dual form, and because solving either problem yields enough information to easily construct a solution to the other, we might reasonably wonder which problem, primal or dual, we should solve when using the Simplex Method. From the standpoint of computational complexity, we might wish to choose to solve the problem with the fewest constraints. This choice becomes more compelling when the LP problem has thousands of constraints, and is of much less importance for more moderate-sized problems of a few hundred constraints or fewer.
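The interpretation of the dual values as shadow prices can be checked numerically on the illustrative problem by increasing one right-hand side at a time and re-solving. The sketch below uses vertex enumeration, which is adequate only for a small two-variable problem with a bounded feasible region; the function name `solve_2d` is chosen for this illustration and is not a method from the text.

```python
from fractions import Fraction
from itertools import combinations

def solve_2d(c, rows, rhs):
    """Maximise c.x over {x >= 0 : rows x <= rhs} in two variables by vertex enumeration."""
    # Non-negativity is written as -x1 <= 0 and -x2 <= 0 (integer data assumed).
    lines = [(r[0], r[1], b) for r, b in zip(rows, rhs)] + [(-1, 0, 0), (0, -1, 0)]
    best = None
    for (a1, a2, p), (b1, b2, q) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue                                   # parallel lines: no vertex
        x = Fraction(p * b2 - a2 * q, det)             # Cramer's rule
        y = Fraction(a1 * q - p * b1, det)
        if all(u * x + v * y <= w for u, v, w in lines):
            value = c[0] * x + c[1] * y
            if best is None or value > best:
                best = value
    return best

c, rows, b = [8, 5], [[1, 0], [0, 1], [2, 1]], [150, 250, 500]
base = solve_2d(c, rows, b)
# Gain in z when the right-hand side of constraint i is increased by one unit.
gains = [solve_2d(c, rows, [bi + (i == k) for i, bi in enumerate(b)]) - base
         for k in range(3)]
```

The gains obtained this way coincide with the dual values y1 = 0, y2 = 1, y3 = 4 quoted above.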

4.5 Decomposition of LP problems

Even though the Simplex Method is relatively fast, it is too slow when the numbers n and m are very large. Practical problems sometimes require solution of LP problems with n and m being several tens of thousands. At the same time practical problems have sparse matrices A in which at least 95% of the entries equal 0. For this reason, several methods have been derived to deal with large sparse LP problems. Here we present one of them.

We start with a few notions in graph theory. A graph G = (V, E) is connected if there is a path between every pair of vertices in G. (G is "in one piece".) If G is not connected, then it consists of several connectivity components that are largest connected subgraphs of G (i.e., "pieces" of G). See Figure 4.1 for a graph with 3 connectivity components.

Let Ax = b, x ≥ 0 be the constraints of an LP problem in standard form. We construct a graph G corresponding to A as follows. The variables xi of A correspond to vertices vi of G. Two variables xi and xj are linked by an edge in G if and only if they both appear (with non-zero coefficients) in the same constraint (row of A). If G is not connected, then an LP problem with the constraints Ax = b, x ≥ 0 can be decomposed into several LP problems of smaller sizes. Let us consider the following simple example.

Figure 4.1: Disconnected graph with three components

Figure 4.2: Two components (vertices v1, v2, v3)

An objective function is max z = 2x1 + 6x2 + 3x3. The constraints are x1 + 5x3 ≤ 7, x2 ≤ 5, x1, x2, x3 ≥ 0. The graph G has vertices v1, v2, v3 and edge v1v3. So, it has two components with vertex sets {v1, v3} and {v2}. See Figure 4.2. This shows that the initial LP problem can be decomposed into two problems, one with variables x1 and x3, and the other with just one variable x2, as follows.

1. The problem with objective function max z′ = 2x1 + 3x3 and constraints x1 + 5x3 ≤ 7, x1, x3 ≥ 0;

2. The problem with objective function max z′′ = 6x2 and constraints x2 ≤ 5, x2 ≥ 0.
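The grouping of variables into components can be computed mechanically. A minimal union-find sketch is given below; it assumes each constraint is supplied as the list of (0-based) indices of the variables that occur in it, and the function name `components` is chosen for this illustration.

```python
def components(num_vars, constraints):
    """Group variables that are linked, directly or indirectly, by shared constraints.

    constraints: one list per constraint with the indices of its variables.
    """
    parent = list(range(num_vars))

    def find(v):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for row in constraints:
        for v in row[1:]:
            parent[find(v)] = find(row[0])   # merge all variables of one constraint
    groups = {}
    for v in range(num_vars):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# x1 + 5x3 <= 7 links indices 0 and 2; x2 <= 5 involves index 1 only.
parts = components(3, [[0, 2], [1]])
```

Each group returned by the sketch gives the variables of one subproblem; the constraints and objective terms are then split accordingly.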

Problem 1 has optimal solution x1 = 7, x3 = 0, z′ = 14. Problem 2 has optimal solution x2 = 5, z′′ = 30. Thus, the optimal solution of the initial problem is x1 = 7, x2 = 5, x3 = 0, z = 44.

4.6 LP software

LP problems are normally solved on computers because of the significant amount of calculation needed. One of the methods normally implemented in commercial and free LP software is the Revised Simplex Method, a variation of the Simplex Method implemented in matrix form and avoiding unnecessary computations. Another algorithm often implemented in commercial software is the Interior Point Method (IPM). Unlike the Simplex Method, which is not of polynomial time complexity in the worst case, IPM is of polynomial complexity. In practice, IPM is normally slower than Simplex for n + m ≤ 2000 and both methods compete evenly for 2000 ≤ n + m ≤ 10000. Many software packages allow Simplex to be combined with IPM when solving very large LP problems. There are a number of commercial LP solvers (packages for LP problems). A short list includes CPLEX Linear Optimizer, LINDO, IBM OSL, FortLP and MINOS.

4.7 Questions

Question 4.7.1. Why do we need the Two-Phase Method? Provide an example of an LP problem, where the Two-Phase Method is needed.

Question 4.7.2. (W.L. Winston) Using the Two-Phase Method solve the following:

\[
\begin{array}{rl}
\max & z = -2x_1 - 3x_2 \\
\text{s.t.} & \frac{1}{2}x_1 + \frac{1}{4}x_2 \le 4 \\
& x_1 + 3x_2 \ge 20 \\
& x_1 + x_2 = 10 \\
& x_1, x_2 \ge 0.
\end{array}
\]

Hint: Use s1 in the first constraint as a basic variable.

Question 4.7.3. Find the dual of the following LP problem.

\[
\begin{array}{rl}
\min & z = 7x_1 - 100x_2 - x_3 \\
\text{s.t.} & 9x_1 + 12x_2 - 2x_3 \le 18 \\
& -2x_1 + 22x_2 - x_3 \ge 14 \\
& x_1, x_2, x_3 \ge 0.
\end{array}
\]

Question 4.7.4. Find the dual of the following LP problem.

\[
\begin{array}{rl}
\max & z = 8x_1 + 100x_2 - 5x_3 \\
\text{s.t.} & 9x_1 - 12x_2 - 9x_3 = 8 \\
& 2x_1 - 22x_2 + x_3 \ge 4 \\
& x_1, x_2, x_3 \ge 0.
\end{array}
\]

Question 4.7.5. Formulate the Duality Property in Linear Programming.

Question 4.7.6. Let x′ and y′ be feasible solutions to the primal and dual LP problems, respectively. Suppose that cx′ = y′b, i.e., the objective functions of both problems coincide for x′ and y′, respectively. Why are both x′ and y′ optimal?

Question 4.7.7. Explain the main ideas of the Decomposition Method in Linear Programming (an application of graphs) using your own example.

Question 4.7.8. Transform the LP problem

\[
\begin{array}{rl}
\max & z = 4x_1 - 5x_2 - 3x_3 \\
\text{s.t.} & x_1 - x_2 + x_3 \ge -2 \\
& x_1 + x_2 + 2x_3 \le 3 \\
& x_1, x_2, x_3 \ge 0
\end{array}
\]

into normal form and find the dual. The value of the optimal solution of the dual is wopt = 12. Why for x1 = 4, x2 = x3 = 0 do we have z = 16 > wopt?

Question 4.7.9. Find the optimal value of the objective function of the following LP problem by directly solving the dual.

\[
\begin{array}{rl}
\max & z = 6x_1 - 15x_2 - 4x_3 \\
\text{s.t.} & -3x_1 + 2x_2 - 2x_3 \ge -20 \\
& x_1, x_2, x_3 \ge 0.
\end{array}
\]

Chapter 5

Integer Programming Modeling

5.1 Integer Programming vs. Linear Programming

For many LP problems, we cannot be satisfied by non-integer values of decision variables xi. Indeed, we cannot be satisfied if x3 = 12.3, where x3 is the number of lorries required to transport a certain product from place to place. LP problems with the additional requirement that all decision variables are integers are called Integer Programming (IP) problems.

An obvious approach to solving IP problems is to "forget" the integrality requirement (i.e., that the decision variables are integer) and solve the corresponding LP problem (called the LP relaxation of the IP problem). In general, the relaxation will give us fractional values of decision variables. In certain cases, rounding the decision variables up or down will give, if not optimal, then near-optimal solutions of the IP problem, but there are many IP problems for which the rounding procedure will often bring "bad" solutions.

One such family of IP problems is that of so-called 0,1-problems, in which all decision variables are required to be 0 or 1 (they are IP problems as we can require that 0 ≤ xi ≤ 1 and xi is integer for every xi). Indeed, our choice for each xi is 0 or 1, and we often do not have enough information to decide whether to choose 0 or 1.

Observe that, unlike LP problems, IP problems do not have a continuous feasible region. For example, see a graph of a simple two-dimensional IP problem in Figure 5.1. We will start our Integer Programming part of the course by considering some very important IP problems.

Figure 5.1: Graphical representation of an IP problem (axes x1 and x2)

5.2 IP problems

We will provide IP formulations of a few important optimisation problems.

5.2.1 Travelling salesman problem

We have already considered the travelling salesman problem (TSP). In short, the TSP is the problem of visiting a number of cities and coming back to the point of origin, all in the cheapest possible way. This is one of the most challenging and most extensively studied problems in the field of combinatorics. The formulation is deceptively simple, and yet it has proven to be notoriously difficult to solve. Define zero-one variables xij = 1 if city i is visited immediately prior to city j. Let cij represent the distance between cities i and j. Suppose that there are n cities that must be visited. Then the TSP can be expressed as:

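\[
\begin{array}{rl}
\min & z = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij} \\
\text{s.t.} & \sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, 2, \dots, n \\
& \sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, 2, \dots, n \\
& \sum_{i \in S} \sum_{j \in S} x_{ij} \le |S| - 1 \quad \text{for all proper subsets } S \text{ of the cities.}
\end{array}
\]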

The first constraint says that you must go into every city j exactly once, and the second constraint says that you must leave every city i exactly once. These constraints ensure that there are two edges adjacent to each city, one in and one out, as we would expect. However, this does not prevent so-called sub-tours. A sub-tour occurs when there is a cycle containing a subset of the cities. Instead of having one tour of all of the cities, the solution can be composed of two or more sub-tours. The third constraint eliminates sub-tours; it states that no proper subset of cities, S, can have a total of |S| edges.

The TSP has a number of practical industrial applications. Consider the problem of placing components on a circuit board. To minimize the time required to produce a board, one of the primary considerations is often the distance that a placement head has to travel between components. Another example occurs in routing trucks or ships delivering products to customers. (When we allow multiple trucks, this problem becomes the vehicle routing problem.) Another application occurs in a production environment when it is desired to minimize sequence-dependent setup times. When multiple jobs are to be processed on a machine, the total setup time for each job frequently depends on which job preceded it. This situation can be modeled as a TSP, where we sequence jobs rather than sequencing the order in which cities are visited.
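To see how sub-tours arise, suppose the first two constraints hold, so that every city has exactly one successor. The solution can then be stored as a list of successors, and its cycles can be extracted as in the sketch below. The representation by a `successor` list and both function names are assumptions of this illustration.

```python
def subtours(successor):
    """Split a successor list into its cycles; successor[i] is the city visited right after city i."""
    seen, cycles = set(), []
    for start in range(len(successor)):
        if start in seen:
            continue
        cycle, city = [], start
        while city not in seen:
            seen.add(city)
            cycle.append(city)
            city = successor[city]
        cycles.append(cycle)
    return cycles

def violates_subtour_constraint(successor):
    """A cycle on a proper subset S uses |S| edges inside S, more than the |S| - 1 allowed."""
    n = len(successor)
    return any(len(cycle) < n for cycle in subtours(successor))

two_loops = [1, 0, 3, 2]      # 0 -> 1 -> 0 and 2 -> 3 -> 2
single_tour = [1, 2, 3, 0]    # 0 -> 1 -> 2 -> 3 -> 0
```

The first successor list satisfies the in-degree and out-degree constraints but consists of two sub-tours, so the third constraint rejects it; the second is a proper tour.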

5.2.2 Knapsack Problem

Assume that we have a number of items, and we must choose some subset of the items to fill our "knapsack", which has limited space b. Each item, i, has a value vi and takes up wi units of space in the knapsack. We wish to choose a collection of the items with total space at most b and with maximum total value. Let the zero-one variable xi = 1 if item i is selected, and xi = 0 otherwise. Then we can formulate the knapsack problem as follows:

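\[
\begin{array}{rl}
\max & z = \sum_{i=1}^{n} v_i x_i \\
\text{s.t.} & \sum_{i=1}^{n} w_i x_i \le b \\
& x_i = 0 \text{ or } 1 \quad \text{for all } i.
\end{array}
\]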

The zero-one version of the knapsack problem states that every item is unique, and each can either be selected or not. A slight generalization of the knapsack problem states that you can choose more than one copy of each item, so that the variables can take on general integer values (probably with upper bounds on each variable).
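For small integer data the zero-one knapsack problem can be solved by dynamic programming over the remaining space. The sketch below is one standard way of doing this and is not a method presented in the text; the instance at the end is hypothetical, and the weights and b are assumed to be non-negative integers.

```python
def knapsack(values, weights, b):
    """Zero-one knapsack: return (best total value, sorted indices of the chosen items)."""
    n = len(values)
    # best[i][cap] = best value achievable with the first i items and space cap.
    best = [[0] * (b + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for cap in range(b + 1):
            best[i][cap] = best[i - 1][cap]                    # x_i = 0
            if w <= cap and best[i - 1][cap - w] + v > best[i][cap]:
                best[i][cap] = best[i - 1][cap - w] + v        # x_i = 1
    chosen, cap = [], b
    for i in range(n, 0, -1):                                  # trace the decisions back
        if best[i][cap] != best[i - 1][cap]:
            chosen.append(i - 1)
            cap -= weights[i - 1]
    return best[n][b], sorted(chosen)

# Hypothetical instance: three items, knapsack space b = 50.
value, items = knapsack([60, 100, 120], [10, 20, 30], 50)
```

In this instance the second and third items fill the knapsack exactly and give the largest total value.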

5.2.3 Bin packing problem

Bin packing is somewhat similar to the knapsack problem. Suppose that we are given a set of m bins of equal size, b; and a set of n items that must be placed in the bins. Let wi be the size of item i. We define the zero-one variable xij = 1 if item i is placed in bin j. Bin packing is usually expressed as a problem of minimizing the number of bins required to pack all of the items. We can let yj = 1 if we need to use bin j. (Note that if yj = 0, then the corresponding bin has no capacity.) The objective function minimizes the number of bins required:

\[
\begin{array}{rl}
\min & z = \sum_{j=1}^{m} y_j \\
\text{s.t.} & \sum_{i=1}^{n} w_i x_{ij} \le b y_j \quad \text{for all } j \\
& \sum_{j=1}^{m} x_{ij} = 1 \quad \text{for all } i \\
& x_{ij} = 0 \text{ or } 1 \quad \text{for all } i, j \\
& y_j = 0 \text{ or } 1 \quad \text{for all } j.
\end{array}
\]

Bin packing has applications in industry where, for example, there is a limited amount of work that can be assigned to each person working at stations on an assembly line. This model may also be applicable when deciding which products should be produced at each of several possible manufacturing plants, or which customer should be assigned to each delivery truck. Of course, each of these problems involves additional criteria and constraints.
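On a tiny instance the bin packing model can be checked by brute force: every item is assigned to exactly one bin, assignments that overfill a bin are discarded, and the number of bins actually used is minimized. The sketch below is for illustration only (its running time grows as m to the power n); the function name and the instance are hypothetical.

```python
from itertools import product

def min_bins(weights, b, m):
    """Try every assignment of items to m bins; return (bins used, assignment) or None."""
    best = None
    for assign in product(range(m), repeat=len(weights)):   # assign[i] = bin of item i
        load = [0] * m
        for item, j in enumerate(assign):
            load[j] += weights[item]
        if all(l <= b for l in load):                       # capacity constraint of each bin
            used = len(set(assign))                         # number of bins with y_j = 1
            if best is None or used < best[0]:
                best = (used, assign)
    return best

# Hypothetical instance: four items, bins of size 6, at most three bins.
packing = min_bins([4, 3, 3, 2], 6, 3)
```

Here two bins suffice, for example by packing the items of sizes 4 and 2 together and the two items of size 3 together.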

5.2.4 Set partitioning/covering/packing problems

Many problems in combinatorial optimization include (as subproblems) partitioning a group of items into ”optimal” subsets. For example, vehicle routing requires that we allocate customers to vehicles. Airline crew scheduling requires that we allocate flight legs to a crew. Municipal garbage pickup requires that we allocate specific street blocks to trucks. Each of these subproblems can be modeled in the following form as a set partitioning problem:

\[
\begin{array}{rl}
\min & z = \sum_{j=1}^{m} c_j y_j \\
\text{s.t.} & \sum_{j=1}^{m} a_{ij} y_j = 1 \quad \text{for all } i = 1, \dots, n \\
& y_j = 0 \text{ or } 1 \quad \text{for all } j = 1, \dots, m,
\end{array}
\]

where aij = 1 if item i belongs to (potential) subset j and aij = 0, otherwise. Each column of the n × m constraint matrix A represents a feasible combination of items. For example, each column might represent the items that could feasibly be loaded into a truck for delivery to customers; or the items could be road segments that require garbage collection, and a column would represent a feasible route for a truck to pick up garbage. The cost cj represents the cost of delivering (or travelling, or producing) that subset of items. A variable yj = 1 if we decide to include that particular subset in our solution.

In the set partitioning problem, all of the items must be included exactly once. In vehicle routing, for example, we might typically require that exactly one truck travel to each customer. In a slightly different problem, the set covering problem, we require that each item be selected at least once. For example, in the garbage collection problem, and in the crew scheduling problem, every street (every flight leg) must be covered at least once; but it is also feasible to cover the same street (flight leg) twice, if this turns out to be the most efficient solution. (The second truck would not pick up any garbage, and the second flight crew would ride as passengers.) Set covering differs from set partitioning by having ≥ inequality constraints instead of equalities.

The set packing problem describes another similar situation. In some production scheduling problems, we are given a list of orders, and we have possible subsets of orders that can be combined on different machines. In some cases, there may not be sufficient resources to satisfy all of the demands. The problem is to select the optimal subset of orders to maximize the combined profit of those orders that are processed. This problem can be formulated as:

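\[
\begin{array}{rl}
\max & z = \sum_{j=1}^{n} p_j x_j \\
\text{s.t.} & \sum_{j=1}^{n} a_{ij} x_j \le 1 \quad \text{for all } i = 1, \dots, m \\
& x_j = 0 \text{ or } 1 \quad \text{for all } j = 1, \dots, n.
\end{array}
\]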

We select as many items as possible, but we are not allowed to process any items more than once.

5.2.5 Assignment problem and generalized assignment problem

We have already stated the assignment problem in Chapter 1. We have n persons p1,...,pn and n jobs j1,...,jn, and the cost cij of having person i perform job j. We wish to find an assignment of the persons to the jobs (one person per job) such that the total cost of performing the jobs is minimum. The costs are normally given by matrix c = [cij ]. The assignment problem (AP) can be formulated as follows.

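\[
\begin{array}{rl}
\min & z = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij} \\
\text{s.t.} & \sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, 2, \dots, n \\
& \sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, 2, \dots, n \\
& x_{ij} = 0 \text{ or } 1, \quad i, j = 1, \dots, n.
\end{array}
\]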

Thus, xij = 1 if employee i has been assigned to job j and xij = 0, otherwise. The first constraint requires every job to be assigned to exactly one employee; and the second constraint states that every employee must do exactly one job.
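Because a feasible solution of the assignment problem is simply a permutation (person i receives job p[i]), a very small instance can be solved by trying all n! permutations. The sketch below does exactly that; it is meant only to make the formulation concrete, and the cost matrix is hypothetical.

```python
from itertools import permutations

def assignment(cost):
    """Return (minimum total cost, p) where p[i] is the job assigned to person i."""
    n = len(cost)
    return min((sum(cost[i][p[i]] for i in range(n)), p)
               for p in permutations(range(n)))

# Hypothetical 3 x 3 cost matrix c = [c_ij].
total, jobs = assignment([[9, 2, 7],
                          [6, 4, 3],
                          [5, 8, 1]])
```

In this instance the cheapest assignment gives job 2 to person 1, job 1 to person 2 and job 3 to person 3 (in 1-based numbering).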

The generalized assignment problem is a fairly simple extension in which every job must be assigned to one employee, but each employee has the capacity to perform more than one job. In particular, suppose that each employee, i, has a limited amount of time, bi hours available, and that job j will occupy employee i for a total of aij hours. Then, the generalized assignment problem can be formulated as:

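\[
\begin{array}{rl}
\min & z = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij} \\
\text{s.t.} & \sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, 2, \dots, n \\
& \sum_{j=1}^{n} a_{ij} x_{ij} \le b_i, \quad i = 1, 2, \dots, n \\
& x_{ij} = 0 \text{ or } 1, \quad \text{for all } i, j.
\end{array}
\]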

5.3 Questions

Question 5.3.1. What is the difference between Linear Programming and Integer Programming problems? Why can one not in general use the Simplex algorithm to solve Integer Programming problems?

Question 5.3.2. Provide an Integer Programming formulation of the travelling salesman problem. Explain the meaning of all parameters and variables.

Question 5.3.3. Provide an Integer Programming formulation of the knapsack problem. Explain the meaning of all parameters and variables.

Question 5.3.4. Provide an Integer Programming formulation of the bin packing problem. Explain the meaning of all parameters and variables.

Question 5.3.5. Provide an Integer Programming formulation of the generalized assignment problem. Explain the meaning of all parameters and variables.

Question 5.3.6. (a) Of which problem is the following an instance:

\[
\begin{aligned}
\max\ z = {} & 2x_1 + 3x_2 + 5x_3 - 4x_4 \\
\text{s.t. } & x_1 + x_2 + 2x_3 + 7x_4 \le 12 \\
& x_i = 0 \text{ or } 1 \quad \text{for all } i\,?
\end{aligned}
\]

(b) Formulate the problem whose instance is given in (a).

Question 5.3.7. (a) Of which problem is the following an instance:

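\[
\begin{aligned}
\max\ z = {} & 2x_{1,1} + x_{1,2} + 5x_{2,1} - 5x_{2,2} \\
\text{s.t. } & x_{1,1} + x_{2,1} = 1 \\
& x_{1,2} + x_{2,2} = 1 \\
& x_{1,1} + x_{1,2} = 1 \\
& x_{2,1} + x_{2,2} = 1 \\
& x_{i,j} = 0 \text{ or } 1 \quad \text{for } 1 \le i, j \le 2\,?
\end{aligned}
\]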

(b) Formulate the problem whose instance is given in (a).

Chapter 6

Branch-and-Bound Algorithm

6.1 A Simple Example for Integer and Mixed Programming

Branch-and-Bound algorithms are widely considered to be the most effective methods for solving medium-sized general integer programming problems. These algorithms make no assumptions about the structure of a problem except that the objective function and the constraints are linear. Even these restrictions can be relaxed without changing the basic framework of the technique.

In its simplest form, Branch-and-Bound is just an organized way of taking a hard problem and splitting it into two or more smaller (and hence easier) subproblems. If these subproblems are still too hard, we "branch" again and further subdivide the problems. The process is repeated until each of the subproblems can be easily solved. Branching is done in such a way that solving each of the subproblems (and selecting the best answer found) is equivalent to solving the original problem.

Consider the following simple example (from M.W. Carter and C.C. Price) in two variables. A manufacturer has 300 person-hours available this week and 1800 units of raw material. These resources can be used to build two products A and B. The requirements and the profit for each item are given as follows:

Product   Person-hours   Raw Material   Profit ($)
A         150            300            600
B         10             400            100

Let $x_1$ and $x_2$ represent the integer number of units of products A and B, respectively. We can formulate this problem as an integer programming (IP) problem:

\[
\begin{aligned}
\text{maximize } z = {} & 600x_1 + 100x_2 \\
\text{subject to } & 150x_1 + 10x_2 \le 300 \\
& 300x_1 + 400x_2 \le 1800 \\
& x_1, x_2 \ge 0 \text{ and integer}
\end{aligned}
\]

[Figure 6.1: An IP problem. Axes $x_1$ and $x_2$; the plot marks the constraint $150x_1 + 10x_2 \le 300$ and the LP optimum $z = 1389.47$.]

This problem is illustrated in Figure 6.1. The feasible region is given by the discrete set of integer points within the constraint region. The optimal LP solution occurs at $x_1 = 1.789$ and $x_2 = 3.158$ with a profit of $z = 1389.47$. Unfortunately, we cannot sell a fractional number of items. One obvious alternative is to round down both values to $x_1 = 1$ and $x_2 = 3$, for a profit of \$900. We will call the feasible integer solution $x^I = (1, 3)$ the current incumbent solution, and we will update the incumbent whenever we find a better integer solution.

Before reading any further, try to locate the optimal integer solution to the problem in Figure 6.1, and consider how integer solutions might be found in general.

The basic branch-and-bound algorithm stems from the following observations:

The feasible integer solution $x = (1, 3)$ with $z = 900$ was fairly easy to find. The optimal integer solution cannot have a lower value of $z$ than \$900, and we call this a lower bound on the optimal solution. Each time we find a higher valued integer solution, we replace the lower bound $z^I$. This is the "bound" part of branch-and-bound methods.

Over the whole feasible region, the largest possible value of $z$ is 1389.47, which is the real valued solution obtained from the LP. We call this an upper bound on the optimal integer function value.

The graphical solution shows that $x_2 = 3.158$. This is infeasible because it is a fractional solution. Since $x_2$ must be an integer, either $x_2 \le 3$ or $x_2 \ge 4$. This is equivalent to saying that $x_2$ cannot lie part way between 3 and 4. Consider the following two subproblems:

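\[
\begin{aligned}
(1)\quad \text{maximize } z = {} & 600x_1 + 100x_2 \\
\text{subject to } & 150x_1 + 10x_2 \le 300 \\
& 300x_1 + 400x_2 \le 1800 \\
& x_1 \ge 0 \text{ and integer} \\
& x_2 \ge 4 \text{ and integer}
\end{aligned}
\]

\[
\begin{aligned}
(2)\quad \text{maximize } z = {} & 600x_1 + 100x_2 \\
\text{subject to } & 150x_1 + 10x_2 \le 300 \\
& 300x_1 + 400x_2 \le 1800 \\
& x_1, x_2 \ge 0 \text{ and integer} \\
& x_2 \le 3
\end{aligned}
\]

[Figure 6.2: Problems (1) and (2). Axes $x_1$ and $x_2$; the plot marks the lines $x_2 \le 3$ and $150x_1 + 10x_2 \le 300$.]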

Observe that if we find the best integer solutions of these subproblems, then one of them must be the optimal solution to the original problem. These subproblems are represented graphically in Figure 6.2. We say that we have separated on variable $x_2$.

Consider problem (1) first. The LP solution occurs at $x = (2/3, 4)$ with an objective function value of $z = 800$. Notice that $x_2$ is now integer valued. We will see that each time we separate, the chosen variable will always be integer, although it does not necessarily stay integer on subsequent iterations.

By definition, the linear programming solution is the largest value possible for the problem. Therefore, the value $z = 800$ is an upper bound on all possible solutions in the feasible region for problem (1): any integer solution to (1) must have $z \le 800$. However, we already have a feasible integer solution with $z^I = 900$. Therefore problem (1) can be ignored. It cannot contain any answer better than 900. In branch-and-bound terminology, we say that problem (1) can be "fathomed". In general, a subproblem is called fathomed whenever it is no longer necessary to branch on it: in a maximization problem, this happens when its LP bound is no better than the incumbent, when the LP is infeasible, or when the LP relaxation produces an integer solution.

Problem (2) has its optimal LP solution at $x = (1.8, 3)$ with a function value of $z = 1380$. This value gives us a new upper bound on the optimal integer solution. At each iteration of the branch-and-bound process, the upper and lower bounds can be revised until they eventually converge to the optimal solution. We now know that the optimal value lies between 900 and 1380. Variable $x_2$ is integer valued, but $x_1$ is still fractional. We can now further divide problem (2) into two subproblems based on the fact that $x_1 \le 1$ or $x_1 \ge 2$ as follows:

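\[
\begin{aligned}
(3)\quad \text{maximize } z = {} & 600x_1 + 100x_2 \\
\text{subject to } & 150x_1 + 10x_2 \le 300 \\
& 300x_1 + 400x_2 \le 1800 \\
& x_1, x_2 \ge 0 \text{ and integer} \\
& x_1 \le 1 \\
& x_2 \le 3
\end{aligned}
\]

\[
\begin{aligned}
(4)\quad \text{maximize } z = {} & 600x_1 + 100x_2 \\
\text{subject to } & 150x_1 + 10x_2 \le 300 \\
& 300x_1 + 400x_2 \le 1800 \\
& x_2 \ge 0 \text{ and integer} \\
& x_1 \ge 2 \text{ and integer} \\
& x_2 \le 3
\end{aligned}
\]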

For problem (3), it is easy to see that the optimal LP solution occurs at point $x = (1, 3)$ with a function value $z = 900$. Since $x$ is now integer valued, it must be optimal for this subproblem. This subproblem is considered to be fathomed because it gives us an integer solution; there is no need for further branching. It is also considered fathomed because the solution of 900 is no better than the one we already obtained earlier. In either case, problem (3) is finished.

Problem (4) consists of the single point $x = (2, 0)$ with a function value of $z = 1200$. This solution is both integer and better than the previous lower bound. Since $x$ is integer, subproblem (4) is fathomed and no further branching is required. Our new lower bound increases to $z^I = 1200$ and $x^I = (2, 0)$ becomes the new current incumbent.

At this point, we observe that all of our subproblems have been fathomed. Therefore, $x^I = (2, 0)$ is the optimal integer solution and $z^I = 1200$ is the optimal function value. This example illustrates, in particular, the fact that the rounding up/down approach does not work in general.

It is often convenient to display this procedure in the form of a branch-and-bound tree. The tree corresponding to the previous example is illustrated in Figure 6.3. Each subproblem is represented by a node in the tree. Each node must either be fathomed or split into subproblems, which are shown by lower level nodes.

[Figure 6.3: Branch-and-Bound tree, with node 0 at the root, nodes 1 and 2 below it, and nodes 3 and 4 at the lowest level.]

Now consider a mixed programming (MP) problem

\[
\begin{aligned}
\text{maximize } z = {} & 600x_1 + 100x_2 \\
\text{subject to } & 150x_1 + 10x_2 \le 300 \\
& 300x_1 + 400x_2 \le 1800 \\
& x_1, x_2 \ge 0 \text{ and } x_2 \text{ integer},
\end{aligned}
\]

which is a modification of the IP problem above. To solve this problem by the branch-and-bound algorithm, it suffices to consider only Problems (1) and (2). Problem (2) gives the optimal solution to the MP problem: $x_1 = 1.8$, $x_2 = 3$, $z^* = 1380$.
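The procedure of this section can be sketched in Python for problems in two variables. This is a minimal illustration under stated assumptions, not the notes' own implementation: the LP at each node is solved by enumerating the intersection points of pairs of constraint lines (adequate only for tiny, bounded two-variable problems), subproblems are kept on a stack, and the names `solve_lp` and `branch_and_bound` are hypothetical. Unlike the walk-through above, the sketch starts without the rounded incumbent $(1, 3)$, so it may examine a few more nodes.

```python
from fractions import Fraction
from itertools import combinations
from math import floor

def solve_lp(c, cons):
    """Maximise c[0]*x1 + c[1]*x2 subject to a1*x1 + a2*x2 <= b for each
    (a1, a2, b) in cons, by checking every vertex. Returns (z, x) or None."""
    best = None
    for (a1, a2, b), (d1, d2, e) in combinations(cons, 2):
        det = a1 * d2 - a2 * d1
        if det == 0:
            continue                                   # parallel lines
        x = (Fraction(b * d2 - a2 * e, det), Fraction(a1 * e - b * d1, det))
        if all(p * x[0] + q * x[1] <= r for p, q, r in cons):
            z = c[0] * x[0] + c[1] * x[1]
            if best is None or z > best[0]:
                best = (z, x)
    return best

def branch_and_bound(c, cons, integer=(True, True)):
    incumbent = None                 # best (z, x) found so far: the lower bound
    active = [list(cons)]            # stack of subproblems still to be solved
    while active:
        sub = active.pop()
        lp = solve_lp(c, sub)
        if lp is None:
            continue                                   # fathomed: infeasible
        z, x = lp
        if incumbent is not None and z <= incumbent[0]:
            continue                                   # fathomed: bound
        frac = [k for k in range(2) if integer[k] and x[k].denominator != 1]
        if not frac:
            incumbent = (z, x)                         # fathomed: new incumbent
            continue
        k = frac[-1]                 # separate on x2 before x1, as in the text
        lo = floor(x[k])
        row = [0, 0]
        row[k] = 1
        active.append(sub + [(row[0], row[1], lo)])            # x_k <= lo
        active.append(sub + [(-row[0], -row[1], -(lo + 1))])   # x_k >= lo + 1
    return incumbent

# The example of this section; the last two rows encode x1 >= 0 and x2 >= 0.
cons = [(150, 10, 300), (300, 400, 1800), (-1, 0, 0), (0, -1, 0)]
print(branch_and_bound((600, 100), cons))                          # IP
print(branch_and_bound((600, 100), cons, integer=(False, True)))   # MP
```

Exact rational arithmetic (`Fraction`) is used so that the integrality test is not disturbed by rounding errors.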

6.2 Knapsack example

Solve the following instance of the knapsack problem:

\[
\begin{aligned}
\max\ z = {} & 10x_1 + 8x_2 + 4x_3 + 7x_4 \\
\text{s.t. } & 2x_1 + 2x_2 + 4x_3 + 5x_4 \le 8 \\
& x_1, x_2, x_3, x_4 = 0 \text{ or } 1.
\end{aligned}
\]

Recall: When the 0-1 constraints are relaxed to solve the LP, we replace them with the linear constraints:

\[ 0 \le x_i \le 1. \]

To solve the LP relaxation of the knapsack problem, the ratio choice rule defined below is used. Recall the general knapsack formulation:

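\[
\begin{aligned}
\max\ z = {} & \sum_{j=1}^{n} v_j x_j \\
\text{s.t. } & \sum_{j=1}^{n} w_j x_j \le b \\
& x_j = 0 \text{ or } 1 \quad \text{for all } j,
\end{aligned}
\]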

where $v_j$ denotes the value of item $j$ and $w_j$ denotes the units of space taken by item $j$. The LP relaxation of the knapsack problem is:

\[
\begin{aligned}
\max\ z = {} & \sum_{j=1}^{n} v_j x_j \\
\text{s.t. } & \sum_{j=1}^{n} w_j x_j \le b \\
& 0 \le x_j \le 1 \quad \text{for all } j.
\end{aligned}
\]

For each $j$ we compute the ratio $v_j/w_j$, which indicates the relative value of item $j$. It is intuitively clear that we should assign $x_j = 1$ first to the item $j$ of highest ratio $v_j/w_j$, then $x_j = 1$ to the one with the next highest ratio, etc. When we cannot assign $x_j = 1$ to the item $j$ of the highest remaining ratio, we assign the corresponding fraction to $x_j$ and 0 to all variables of smaller ratio. (Ties are broken in an arbitrary manner.) We call this rule the ratio choice. The ratio choice gives an optimal solution to the LP relaxation, i.e., we do not need to use the Simplex algorithm for the LP relaxation of the knapsack problem.

Let us return to our example and call (P0) the linear relaxation of

\[
\begin{aligned}
\max\ z = {} & 10x_1 + 8x_2 + 4x_3 + 7x_4 \\
\text{s.t. } & 2x_1 + 2x_2 + 4x_3 + 5x_4 \le 8 \\
& x_1, x_2, x_3, x_4 = 0 \text{ or } 1.
\end{aligned}
\]

To solve (P0), find the ratios

\[ r_1 = 10/2 = 5, \quad r_2 = 8/2 = 4, \quad r_3 = 4/4 = 1, \quad r_4 = 7/5 = 1.4. \]

Ordering the ratios gives $r_1 > r_2 > r_4 > r_3$.

To obtain a solution for (P0), we will now start to fill up the knapsack in the order the ratios give us. Therefore, we assign $x_1 = 1$ ($2 \times 1 \le 8$, true). Next assign $x_2 = 1$ ($2 \times 1 + 2 \times 1 = 4 \le 8$, true). We now have to look at $x_4$, and we can see that it is only possible to assign a fraction of item 4 to the knapsack: the remaining capacity is 4 and item 4 needs 5, i.e. $x_4 = 4/5$. Therefore, the solution of (P0) is $x_1 = x_2 = 1$, $x_4 = 4/5$, $x_3 = 0$, with $z = 10 + 8 + 7 \times 4/5 = 23.6$, which is an infeasible solution of the original IP problem.
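The ratio choice rule is short enough to state as code. The following Python sketch is an illustration only (the function name `ratio_choice` is an assumption); it uses exact fractions and reproduces the solution of (P0) computed above.

```python
from fractions import Fraction

def ratio_choice(values, weights, capacity):
    """Solve the LP relaxation of a 0-1 knapsack problem by the ratio choice:
    fill the knapsack in order of decreasing value/weight ratio."""
    order = sorted(range(len(values)),
                   key=lambda j: Fraction(values[j], weights[j]),
                   reverse=True)
    x = [Fraction(0)] * len(values)
    room = capacity
    for j in order:
        if room <= 0:
            break                                   # knapsack is full
        x[j] = min(Fraction(1), Fraction(room, weights[j]))  # whole item or a fraction
        room -= x[j] * weights[j]
    z = sum(v * xj for v, xj in zip(values, x))
    return z, x

z, x = ratio_choice([10, 8, 4, 7], [2, 2, 4, 5], 8)
print(z, x)
```

At most one variable ends up fractional, which is why branching on that variable is the natural next step.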

Let us now consider the subproblems (P1) = (P0) plus the extra constraint $x_4 = 1$, and (P2) = (P0) with the extra constraint $x_4 = 0$.

To work out a solution of (P1), we assign $x_4 = x_1 = 1$ ($5 \times 1 + 2 \times 1 \le 8$, true), and $x_2$ has to be a fraction again to satisfy the constraint ($(8 - 7)/2 = 1/2$). The solution of (P1) is $x_4 = x_1 = 1$, $x_2 = 1/2$, $x_3 = 0$, $z = 21$.

Finding the solution of (P2), we assign $x_4 = 0$, $x_1 = x_2 = 1$, as required ($5 \times 0 + 2 \times 1 + 2 \times 1 = 4 \le 8$, true). We can now see that by assigning $x_3 = 1$ we are still satisfying the constraint, and the solution is $z = 22$. The last solution is feasible for the IP problem and is better than the upper bound $z = 21$ given by (P1), so (P1) is fathomed. Thus, the optimal solution is $x_4 = 0$, $x_1 = x_2 = x_3 = 1$, $z = 22$.

6.3 Branching strategies

To control the selection of the next node for branching, it is typical to restrict the choice of nodes from the list of currently active nodes in one of the following ways.

The Backtracking or Depth-First-Search Strategy: Always select a node that was most recently added to the tree. Evaluate all nodes in one branch of the tree completely to the bottom, and then work back up to the top following all indicated side branches. A typical order of evaluating nodes is illustrated in Figure 6.4 (the upper tree). The number inside each node represents the time at which it is selected.

The Jumptracking (unrestricted) Strategy: As the name implies, each time the algorithm selects a node, it can choose any active node anywhere in the tree. For example, it might always choose the active node corresponding to the highest LP solution, $z^*$. A possible order of solving subproblems under Jumptracking is illustrated in Figure 6.4 (the lower tree).

At first glance, the Backtracking procedure appears to be unnecessarily restrictive. The major advantages are conservation of storage required and a reduction in the amount of computation required to solve the corresponding LP at each node. Observe that the number of active subproblems in the list at any time is equal to the number of levels in the current branch of the tree. Using Jumptracking, the size of the active list can grow exponentially. Each node in the active list corresponds to a linear programming problem with its own set of constraints. Consequently, storage space for subproblems is an important consideration.

Computation time is an even more serious issue with Jumptracking. Observe that each time we solve a subproblem, we solve an LP complete with a full Simplex tableau. When we move down the tree, we add one new constraint to the LP. This can be done relatively efficiently if the old tableau is still available. To do this using the Jumptracking strategy, we would have to save the Simplex tableau for each node (or at least enough information to generate the tableau easily). Hence, Backtracking can save a large amount of LP computation time at each node. The efficiency of solving subproblems is crucial to the success of a branch-and-bound method because practical problems will typically generate trees with literally thousands of nodes.

The major advantage of Jumptracking is that, by judicious selection of the next active node, we can usually solve the problem by examining far fewer nodes. Observe that when we find the optimal integer solution, many of the nodes can be eliminated by the bounding test. With Jumptracking, the integer solution is represented by a node at the bottom of the branching tree. With Backtracking, each time we choose a branch, one is "correct" and the other is "wrong". If we choose the wrong branch, we must evaluate all nodes in that branch before we can get back on the correct branch. Using Jumptracking, we can return to the correct branch as soon as we realize that we may have made a mistake. When we find the optimal solution, many of the nodes in the "wrong" branch will be fathomed at a higher level of the tree by the bounding test.

[Figure 6.4: Branching strategies: Backtracking (the upper tree) and Jumptracking (the lower tree). Node labels by level, left to right. Upper tree: 1; 2, 9; 3, 6, 10, 13; 4, 5, 7, 8, 11, 12, 14, 15. Lower tree: 1; 2, 4; 3, 6, 5, 7; 15, 14, 13, 12, 8, 9, 10, 11.]

In short, there is a trade-off between Backtracking and Jumptracking, and many commercial algorithms use a mixed strategy. Backtracking is used until there is a strong indication of being in the wrong branch; then there is a jump to a more promising node in the tree and a resumption of a Backtracking strategy from that point. The amount of Jumptracking is determined by the definition of "wrong".
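The two strategies differ only in how the next active node is taken from the list. The Python sketch below illustrates this on the knapsack instance of Section 6.2, with the ratio choice supplying the LP bound: Backtracking pops the most recently added node from a stack, while Jumptracking pops the node with the highest LP bound from a priority queue. The function names, the order in which the two children are added, and the node counter are assumptions made for the illustration.

```python
import heapq
from fractions import Fraction

def lp_bound(values, weights, capacity, fixed):
    """Ratio-choice LP bound with the variables in `fixed` set to 0 or 1.
    Returns (z, x), or None if the fixed items alone overfill the knapsack."""
    room = capacity - sum(weights[j] for j, v in fixed.items() if v == 1)
    if room < 0:
        return None
    x = {j: Fraction(v) for j, v in fixed.items()}
    z = Fraction(sum(values[j] for j, v in fixed.items() if v == 1))
    free = [j for j in range(len(values)) if j not in fixed]
    for j in sorted(free, key=lambda j: Fraction(values[j], weights[j]), reverse=True):
        x[j] = min(Fraction(1), Fraction(room, weights[j]))
        room -= x[j] * weights[j]
        z += x[j] * values[j]
    return z, x

def knapsack_bb(values, weights, capacity, strategy="backtracking"):
    best_z, best_x = -1, None
    active, tick, solved = [], 0, 0        # solved = number of LP bounds computed

    def add(fixed):
        nonlocal tick, solved
        solved += 1
        lp = lp_bound(values, weights, capacity, fixed)
        if lp is None:
            return                          # infeasible subproblem
        tick += 1                           # unique tie-breaker for the heap
        entry = (-lp[0], tick, fixed, lp[1])
        if strategy == "jumptracking":
            heapq.heappush(active, entry)   # ordered by highest LP bound
        else:
            active.append(entry)            # plain stack

    add({})
    while active:
        entry = heapq.heappop(active) if strategy == "jumptracking" else active.pop()
        neg_z, _, fixed, x = entry
        if -neg_z <= best_z:
            continue                        # fathomed by the bounding test
        frac = [j for j, xj in x.items() if xj.denominator != 1]
        if not frac:
            best_z, best_x = -neg_z, x      # integer solution: new incumbent
            continue
        j = frac[0]                         # separate on the fractional variable
        add({**fixed, j: 0})
        add({**fixed, j: 1})
    return best_z, [int(best_x[j]) for j in range(len(values))], solved

V, W = [10, 8, 4, 7], [2, 2, 4, 5]
print(knapsack_bb(V, W, 8, "backtracking"))
print(knapsack_bb(V, W, 8, "jumptracking"))
```

Both runs return the optimum $z = 22$, but on this instance the Backtracking run first descends into the branch $x_4 = 1$ and computes more LP bounds than the Jumptracking run, which is the trade-off described above. The price is that the priority queue must keep every active node together with its stored LP solution.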

6.4 MAX-SAT Example

Often one is interested in variables and functions that take only two possible values, say TRUE/FALSE or 1/0. These are called Boolean variables/functions, and are used to model situations when it is only desirable to know whether something occurs or not, e.g. whether a switch is on or off, or whether there is current running through a wire or not (but we do not care how much current). We will assume that the value of a Boolean variable or function is always either 0 or 1. We denote negation by $\neg$, that is, $\neg 0 = 1$ and $\neg 1 = 0$. If $x, y$ are Boolean variables then the conjunction $x \wedge y$ has value 1 precisely when both $x$ and $y$ have value 1 (logical AND), and the disjunction $x \vee y$ has value 1 precisely when at least one of $x$ and $y$ has value 1 (logical OR). An example of a Boolean function of three Boolean variables would be $F(x_1, x_2, x_3) = (\neg x_1 \vee x_2) \wedge \neg x_3$. A clause is a special Boolean function consisting only of disjunctions (and no conjunctions) between single variables or their negations. For example, $\neg x_1 \vee x_2 \vee \neg x_3$ is a clause containing three variables. If values are assigned to the variables of a clause in a way which makes its value equal to 1, then we say that this assignment satisfies the clause.

An instance of the Maximum Satisfiability Problem (MAX-SAT) is a list of clauses $F_1, F_2, \dots, F_m$ containing Boolean variables $x_1, x_2, \dots, x_n$. The goal is to find an assignment of values to the variables such that the number of satisfied clauses is as large as possible. Consider the following example (from J. Hromkovič) with 10 clauses and 4 variables:

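\[
\begin{aligned}
F_1 &= x_1 \vee \neg x_2 \\
F_2 &= x_1 \vee x_3 \vee \neg x_4 \\
F_3 &= \neg x_1 \vee x_2 \\
F_4 &= x_1 \vee \neg x_3 \vee x_4 \\
F_5 &= x_2 \vee x_3 \vee \neg x_4 \\
F_6 &= x_1 \vee \neg x_3 \vee \neg x_4 \\
F_7 &= x_3 \\
F_8 &= x_1 \vee x_4 \\
F_9 &= \neg x_1 \vee \neg x_3 \\
F_{10} &= x_1.
\end{aligned}
\]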

First we will solve the problem with Backtracking. At each node the following rule is used: Assign a value to the first variable among $x_1, x_2, x_3, x_4$ which currently has no value assigned to it, and let this value be 1 the first time when visiting the node, 0 the second time. The search tree is shown in Figure 6.5. At each interior node of the tree we note the clauses which become violated at the moment the node is reached (they can no longer be satisfied no matter which values are assigned to the variables having no values assigned to them yet). Below any bottom node in the tree we note the value of the objective function which follows from the current assignments. Thus at the bottom node corresponding to the assignments $x_1 = x_2 = x_3 = x_4 = 1$, all clauses except $F_9$ are satisfied, hence the number of satisfied clauses is 9. With $z^I$ now equal to 9, we will never separate on any node for which some clause gets violated, since the optimal solution at such a node cannot be better than the current incumbent.

[Figure 6.5: Backtracking in MAX-SAT to the left. Branches $x_1 = 1$ and $x_1 = 0$ ($F_{10}$ violated); $x_2 = 1$ and $x_2 = 0$ ($F_3$ violated); $x_3 = 1$ ($F_9$ violated) and $x_3 = 0$ ($F_7$ violated); $x_4 = 1$ leading to a bottom node of value 9.]

Figure 6.6 shows the tree which is searched by Backtracking if, when separating each node, we first assign the value 0 instead of 1 to the next variable. We observe that the number of nodes visited by the algorithm, and hence its efficiency, depends very much on the order in which the nodes are searched.
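The Backtracking search for this instance can be sketched in Python as follows. The encoding of a literal as a signed integer ($k$ for $x_k$, $-k$ for $\neg x_k$), the function names, and the way visited nodes are counted are assumptions of this illustration; in particular the count includes nodes that are visited and immediately fathomed, so it need not equal the number of nodes drawn in the figures.

```python
# The ten clauses F1, ..., F10 of the example; k stands for x_k, -k for its negation.
CLAUSES = [[1, -2], [1, 3, -4], [-1, 2], [1, -3, 4], [2, 3, -4],
           [1, -3, -4], [3], [1, 4], [-1, -3], [1]]

def violated(clause, assign):
    """A clause is violated once all of its literals are assigned and false."""
    return all(abs(l) in assign and assign[abs(l)] != (l > 0) for l in clause)

def max_sat(clauses, n, first_value=1):
    """Backtracking over x_1, ..., x_n, trying `first_value` first at each node."""
    best = {"count": -1, "assign": None, "nodes": 0}

    def visit(assign):
        best["nodes"] += 1
        # Clauses not yet violated: an upper bound on what this node can reach.
        bound = len(clauses) - sum(violated(c, assign) for c in clauses)
        if bound <= best["count"]:
            return                                   # fathomed by the bound
        if len(assign) == n:
            best["count"], best["assign"] = bound, dict(assign)
            return
        var = len(assign) + 1                        # first unassigned variable
        for value in (first_value, 1 - first_value):
            assign[var] = bool(value)
            visit(assign)
            del assign[var]

    visit({})
    return best

left = max_sat(CLAUSES, 4, first_value=1)    # value 1 first
right = max_sat(CLAUSES, 4, first_value=0)   # value 0 first
print(left["count"], left["nodes"], right["count"], right["nodes"])
```

Both orders find 9 satisfied clauses, which is optimal because $F_7$, $F_9$ and $F_{10}$ cannot all hold at once; the run that tries the value 0 first visits more nodes.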

[Figure 6.6: Backtracking in MAX-SAT to the right. The tree branches on $x_1$, $x_2$, $x_3$ and $x_4$ in turn; the violated clauses noted at its nodes are $F_{10}$, $F_3$, $F_1$, $F_9$, $F_7$, $F_6$, $F_4$, $F_8$, $F_2$, $F_5$, and the bottom nodes carry the values 9, 8, 7, 6, 7.]

6.5 Questions

Question 6.5.1. Solve the problem stated in Section 6.1 by first separating on $x_1$ instead of $x_2$.

Question 6.5.2. Solve the following problem by the Branch-and-Bound method:

\[
\begin{aligned}
\text{maximize } z = {} & x_1 + 5x_2 \\
\text{subject to } & x_1 + 10x_2 \le 20 \\
& x_1 \le 2 \\
& x_1, x_2 \ge 0 \text{ and integer}
\end{aligned}
\]

Question 6.5.3. Solve the following IP problem by the Branch-and-Bound method:

\[
\begin{aligned}
\text{maximize } z = {} & x_1 + x_2 \\
\text{subject to } & x_1 \le 2 \\
& x_1 + 2x_2 \le 5 \\
& x_1, x_2 \ge 0 \text{ and integer}
\end{aligned}
\]

Question 6.5.4. Solve the following IP problem by the Branch-and-Bound method:

\[
\begin{aligned}
\text{maximize } z = {} & x_1 + 2x_2 \\
\text{subject to } & x_1 + x_2 \le 2 \\
& x_2 \le 1.5 \\
& x_1, x_2 \ge 0 \text{ and integer}
\end{aligned}
\]

Question 6.5.5. Solve the following MP problem by the Branch-and-Bound method:

\[
\begin{aligned}
\text{minimize } z = {} & x_1 + x_2 \\
\text{subject to } & 2x_1 + 3x_2 \ge 6 \\
& 3x_1 + x_2 \le 3 \\
& x_1, x_2 \ge 0 \text{ and } x_1 \text{ integer}
\end{aligned}
\]

Question 6.5.6. Describe the Branch-and-Bound algorithm for IP maximization problems. What does it mean that a node (i.e., subproblem) is fathomed?

Question 6.5.7. (a) Describe the Backtracking and Jumptracking strategies for branching. (b) What are the advantages and disadvantages of each of the two strategies?

Question 6.5.8. Solve the MAX-SAT example of Section 6.4 by Jumptracking, using the strategy to always separate on an active node with a minimal number of currently violated clauses. Can you say something about the efficiency of Jumptracking compared to Backtracking applied to this problem?

Question 6.5.9. Solve the following instance of the knapsack problem:

\[
\begin{aligned}
\max\ z = {} & 10x_1 + 8x_2 + 4x_3 + 7x_4 \\
\text{s.t. } & 2x_1 + 2x_2 + 4x_3 + 5x_4 \le 8 \\
& x_1, x_2, x_3, x_4 = 0 \text{ or } 1.
\end{aligned}
\]

Chapter 7

Construction Heuristics and Local Search

Unfortunately, the vast majority of optimisation problems are NP-hard, and Branch-and-Bound algorithms cannot solve them to optimality even for moderate instances, since sometimes they need to search every node of the Branch-and-Bound tree. If we have an IP problem with only $n = 20$ variables, each taking $m = 3$ possible values, then the Branch-and-Bound tree may have as many as $m^n = 3^{20}$ nodes at the lowest level of the tree. However, $3^{20}$ is already more than 3 billion. Of course, one may stop branching and accept the current best solution of the Branch-and-Bound algorithm (partial Branch-and-Bound algorithms). We will consider approaches that produce good solutions (but normally not optimal) faster than partial Branch-and-Bound algorithms.

7.1 Combinatorial optimisation problems

To illustrate possible approaches, we will consider some selected optimisation problems. One of them is the asymmetric travelling salesman problem (ATSP). This problem is one of the most famous and studied optimisation problems. In the ATSP, we are given a complete digraph $D$ with vertices $V = \{1, \dots, n\}$ and cost $c_{ij}$ of every arc $(i, j)$, and we wish to find a cheapest tour in $D$. (A tour starts at a vertex, traverses a sequence of arcs in their forward direction, thereby visiting every other vertex once, and returns to the initial vertex.) See Figure 7.1.

The second problem is Max Cut. Here we are given an undirected graph $G = (V, E)$ with vertices $V$ and edges $E$. A cost $c(e)$ is assigned to every edge $e \in E$. A cut $(X, V - X)$ is the set of edges between $X$ and $V - X$. We are to find a cut of maximum total cost.

[Figure 7.1: A complete digraph on the vertices a, b, c and d, with a cost on each of its twelve arcs.]

Max Cut is of interest in many practical problems when we are to break graphs into pieces with the minimum possible number of edges inside the pieces (and thus the maximum possible number of edges between the pieces).

The two problems are among very many combinatorial optimisation problems. A combinatorial optimisation problem is given by a set $S = \{s_1, \dots, s_n\}$ of elements, each of some cost $c(s_i)$, and a collection $\mathcal{F}$ of subsets of $S$. We wish to find a set $F$ in $\mathcal{F}$ such that the sum of the costs of elements in $F$ is minimum/maximum among all sets in $\mathcal{F}$.

7.2 Greedy-type algorithms

The greedy algorithm for a minimisation combinatorial optimisation problem works as follows. We choose an element $s_{i_1}$ of $S$ that is contained in at least one set of $\mathcal{F}$ and is of minimum cost among such elements. Form $X = \{s_{i_1}\}$. At every iteration choose an element $s_{i_k} \in S - X$ such that $X \cup \{s_{i_k}\}$ is a subset of some set in $\mathcal{F}$ and $s_{i_k}$ is of minimum cost among all such elements. Add $s_{i_k}$ to $X$ and proceed to the next iteration.

For example, let $S = \{a, b, d, e, f\}$, $c(a) = 0$, $c(b) = 1$, $c(d) = 5$, $c(e) = 0.5$, $c(f) = 2$, and let $\mathcal{F} = \{Y, Z\}$, where $Y = \{b, d, e\}$, $Z = \{e, f\}$. At iteration 0 we choose $X = \{e\}$. At iteration 1, we have $X = \{e, b\}$. At iteration 2, we have $X = \{e, b, d\} = Y$.

This example shows that the greedy algorithm does not always give an optimal solution: indeed, $c(Y) = 1 + 5 + 0.5 = 6.5 > c(Z) = 0.5 + 2 = 2.5$. In "Theoretical Analysis of Heuristics", we will see that even for the assignment problem the greedy algorithm may produce the worst solution!

In Figure 7.1 for the ATSP, we see that the greedy algorithm may have a problem to start: there are two cheapest arcs $(b, a)$ and $(b, c)$. If the greedy algorithm chooses $(b, a)$, then it proceeds to choosing $(c, b)$, $(d, c)$ and $(a, d)$. (It does not choose $(a, c)$ rather than $(d, c)$ because addition of $(a, c)$ would create a cycle shorter than a tour.) The total cost of this tour $T = badcb$ is $3 + 4 + 5 + 10 = 22$. If the greedy algorithm chooses $(b, c)$, then it proceeds to choosing $(a, b)$. (It cannot choose $(b, a)$, $(c, b)$ or $(d, c)$ since, if we added any of them, we would not be able to complete a tour: according to the description of the greedy algorithm, at every iteration we can only add arcs such that a subset of some tour is created.) Then we choose $(c, d)$ and $(d, a)$. The cost of this tour, $T' = bcdab$, is 29. Thus, the issue of ties for the greedy algorithm might be important.

A natural problem for the instance of the ATSP in Figure 7.1 is to find a cheapest tour. You can solve this question by examining all six tours of the instance. In general, if the ATSP has $n$ vertices, then there are $(n-1)! = (n-1) \times (n-2) \times (n-3) \times \cdots \times 3 \times 2 \times 1$ tours there. To see that, fix a vertex, say vertex 1. We can move to one of the remaining vertices $2, 3, \dots, n$ ($n - 1$ vertices). After choosing one of them, we can move to one of the $n - 2$ remaining vertices, etc.

Despite the fact that the greedy algorithm does not always produce "good" solutions, it is of use since it is a very simple algorithm and it gives relatively good results for some problems, see "Computational Analysis of Heuristics". There are even problems for which the greedy algorithm always gives optimal solutions. The most famous such problem is the minimum spanning tree problem.

Even though the greedy algorithm is easy to describe, it is not always easy to implement, since we need to check which elements can be added to $X$ and which cannot. This also leads to the fact that the greedy algorithm is not as fast as we would like an algorithm to be for a given combinatorial optimisation problem. In such difficult cases, specialized greedy-type algorithms can be useful. For the ATSP, such an algorithm is the nearest neighbour algorithm (NN). NN proceeds as follows. We start from some vertex, say 1. We move to the vertex nearest to 1, from there to the unvisited vertex nearest to that one, etc. (we never create a cycle shorter than a tour). For example, in Figure 7.1, if we start at $a$, then we move to $c$, to $b$, to $d$, to $a$, and obtain the tour $T' = acbda$ of cost 30. If we start NN at $d$, however, we move to $c$, to $b$, to $a$, to $d$. We have obtained the tour $T = dcbad$ of cost 22. This example suggests that it is a good idea to start from each vertex in turn. However, this strategy slows down the algorithm. Indeed, NN is of complexity $O(n^2)$ (see below), but if we start from every vertex we get $O(n \times n^2) = O(n^3)$.

Observe that $O(n^3)$ is too large for the ATSP, as we cannot use repeated NN even for $n$ equal to a few thousand (we would need several days to get the result). We consider the behaviour of NN and its repetitive modification in "Computational Analysis of Heuristics" and "Theoretical Analysis of Heuristics" and see that NN produces results similar to the greedy algorithm even though it is computationally more efficient for the ATSP.

In practice, NN is faster than the greedy algorithm. This is easy to predict by calculating and comparing the time complexities of the two algorithms. The complexity of NN is $O(n^2)$. The greedy algorithm starts from sorting the costs of arcs in increasing order. Algorithms to sort $N$ numbers are of complexity $O(N \log N)$. There are $N = n(n-1)$ numbers to sort for the greedy algorithm, one for each arc of the complete digraph. Hence, we need $O(n^2 \log n)$ time to sort the costs. The greedy algorithm can be implemented such that its complexity is $O(n^2 \log n)$. Clearly, $n^2 \log n > n^2$ and thus NN is faster than the greedy algorithm.
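The greedy algorithm and NN for the ATSP can be sketched in Python as follows. The arc costs in `COST` are an assumption: they are not read off Figure 7.1 but inferred so that they reproduce every cost quoted in the worked examples of this chapter. The function names are hypothetical, ties are broken alphabetically (so the greedy run corresponds to the case where $(b, a)$ is chosen first), and the greedy implementation is a straightforward one, not tuned to reach $O(n^2 \log n)$.

```python
from itertools import permutations

# Inferred arc costs (see the lead-in): COST[u, v] is the cost of arc (u, v).
COST = {("a", "b"): 6, ("a", "c"): 5, ("a", "d"): 10,
        ("b", "a"): 3, ("b", "c"): 3, ("b", "d"): 9,
        ("c", "a"): 7, ("c", "b"): 4, ("c", "d"): 8,
        ("d", "a"): 12, ("d", "b"): 7, ("d", "c"): 5}

def tour_cost(cost, tour):
    return sum(cost[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def greedy_tour(cost):
    """Add arcs in order of increasing cost, keeping a subset of some tour."""
    n = len({u for u, _ in cost})
    succ, pred = {}, {}
    for u, v in sorted(cost, key=lambda arc: (cost[arc], arc)):
        if u in succ or v in pred:
            continue                       # u already left, or v already entered
        length, w = 1, v                   # follow the path that starts at v
        while w in succ:
            w, length = succ[w], length + 1
        if w == u and length < n:
            continue                       # would close a cycle shorter than a tour
        succ[u], pred[v] = v, u
    tour = [min(succ)]
    while len(tour) < n:
        tour.append(succ[tour[-1]])
    return tour, tour_cost(cost, tour)

def nearest_neighbour(cost, start):
    """Repeatedly move to the cheapest-to-reach unvisited vertex."""
    vertices = sorted({u for u, _ in cost})
    tour, current = [start], start
    while len(tour) < len(vertices):
        current = min((v for v in vertices if v not in tour),
                      key=lambda v: cost[current, v])
        tour.append(current)
    return tour, tour_cost(cost, tour)

def all_tours(cost):
    """All (n-1)! tours, each written starting from the first vertex."""
    first, *rest = sorted({u for u, _ in cost})
    return [[first, *p] for p in permutations(rest)]

print(greedy_tour(COST))
print(nearest_neighbour(COST, "a"), nearest_neighbour(COST, "d"))
print(min(tour_cost(COST, t) for t in all_tours(COST)))
```

With these costs, enumerating the six tours shows that 22 is the optimal value, so the greedy run and NN started at $d$ both happen to find a cheapest tour, while NN started at $a$ does not.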

7.3 Special algorithms for the ATSP

The random insertion heuristic (RI) chooses randomly two initial vertices $i_1$ and $i_2$ and forms the cycle $i_1 i_2 i_1$. Then, in every iteration, it chooses randomly a vertex $\ell$ which is not in the current cycle $i_1 i_2 \dots i_s i_1$ and inserts $\ell$ in the cycle (i.e., replaces an arc $i_m i_{m+1}$ of the cycle with the path $i_m \ell i_{m+1}$) such that the cost of the cycle increases as little as possible. The heuristic stops when all vertices have been included in the current cycle, i.e., a tour is formed.

We illustrate RI using Figure 7.1. Let us start from the cycle $aba$ and insert $c$ in the optimal manner. We have to choose between the cycles $acba$ and $abca$. While the first cycle increases the cost of $aba$ by $cost(ac) + cost(cb) - cost(ab) = 3$, the second one increases the cost of $aba$ by $cost(bc) + cost(ca) - cost(ba) = 7$. Hence we choose $acba$. Now we have to choose between the cycles $adcba$, $acdba$ and $acbda$. They cause an increase to the cost of the current cycle of $cost(ad) + cost(dc) - cost(ac) = 10$, $cost(cd) + cost(db) - cost(cb) = 11$ and $cost(bd) + cost(da) - cost(ba) = 18$, respectively. Hence, we choose the tour $T = adcba$ of cost 22.

The complexity of RI is $O(n^2)$.
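A minimal Python sketch of RI follows. As an assumption, the arc costs in `COST` are inferred from the worked examples of this chapter rather than read off Figure 7.1, and the function names and the optional `order` argument (which fixes the insertion order so that the example above can be replayed) are hypothetical.

```python
import random

# Inferred arc costs (an assumption, see the lead-in): COST[u, v] = cost of arc (u, v).
COST = {("a", "b"): 6, ("a", "c"): 5, ("a", "d"): 10,
        ("b", "a"): 3, ("b", "c"): 3, ("b", "d"): 9,
        ("c", "a"): 7, ("c", "b"): 4, ("c", "d"): 8,
        ("d", "a"): 12, ("d", "b"): 7, ("d", "c"): 5}

def tour_cost(cost, tour):
    return sum(cost[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def cheapest_insertion(cost, cycle, v):
    """Return (increase, position) for the cheapest way of inserting v."""
    best = None
    for m in range(len(cycle)):
        i, j = cycle[m], cycle[(m + 1) % len(cycle)]
        delta = cost[i, v] + cost[v, j] - cost[i, j]   # replace arc ij by path i v j
        if best is None or delta < best[0]:
            best = (delta, m + 1)
    return best

def random_insertion(cost, rng=None, order=None):
    """RI: `order` lists the vertices in the order they enter the cycle;
    if it is omitted, a random order is drawn from `rng`."""
    if order is None:
        order = sorted({u for u, _ in cost})
        (rng or random.Random()).shuffle(order)
    cycle = list(order[:2])                            # the initial cycle i1 i2 i1
    for v in order[2:]:
        _, position = cheapest_insertion(cost, cycle, v)
        cycle.insert(position, v)
    return cycle, tour_cost(cost, cycle)

# Replaying the example: start from the cycle aba, insert c, then d.
print(random_insertion(COST, order=["a", "b", "c", "d"]))
print(random_insertion(COST, rng=random.Random(0)))
```

Each insertion scans the current cycle once, so building a tour on $n$ vertices takes $O(n^2)$ steps in total, in line with the complexity stated above.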

Our next heuristic is based on the operation called patching. Let $C = i_1 i_2 \dots i_k i_1$ and $Z = j_1 j_2 \dots j_\ell j_1$ be a pair of disjoint cycles. For any pair $s, t$ of indices the corresponding patching is deletion of the arcs $(i_s, i_{s+1})$ and $(j_t, j_{t+1})$ and addition of arcs $(i_s, j_{t+1})$ and $(j_t, i_{s+1})$. As a result, we get one cycle $X = i_s j_{t+1} j_{t+2} \dots j_1 j_2 \dots j_t i_{s+1} i_{s+2} \dots i_1 i_2 \dots i_s$.

There are $k\ell$ choices of pair $s, t$ and thus $k\ell$ different patchings of the pair of cycles above. The cheapest patching is the one for which the resulting cycle $X$ is cheapest. We can choose the cheapest by selecting $X$ with minimum $cost(i_s j_{t+1}) + cost(j_t i_{s+1}) - cost(i_s i_{s+1}) - cost(j_t j_{t+1})$. So, to find the cheapest patching we need $O(k\ell)$ time.

The patching algorithm can be outlined as follows:

1. Construct a collection F of disjoint cycles covering all vertices of minimum cost.

2. Choose two longest (not cheapest) cycles C and Z in the current F and replace C and Z in F by their cheapest patching.

3. Repeat Step 2 until the current F is reduced to a single cycle, i.e., a tour.
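The three steps above can be sketched in Python as follows. Several things here are assumptions made for illustration: the arc costs in `COST` are inferred from the worked examples of this chapter rather than read off Figure 7.1, the function names are hypothetical, and Step 1 is done by brute force over all permutations without fixed points, which is only feasible for tiny instances (a real implementation would use an Assignment Problem solver).

```python
from itertools import permutations

# Inferred arc costs (an assumption, see the lead-in): COST[u, v] = cost of arc (u, v).
COST = {("a", "b"): 6, ("a", "c"): 5, ("a", "d"): 10,
        ("b", "a"): 3, ("b", "c"): 3, ("b", "d"): 9,
        ("c", "a"): 7, ("c", "b"): 4, ("c", "d"): 8,
        ("d", "a"): 12, ("d", "b"): 7, ("d", "c"): 5}

def tour_cost(cost, tour):
    return sum(cost[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def min_cycle_cover(cost):
    """Step 1 by brute force: cheapest successor map without fixed points."""
    V = sorted({u for u, _ in cost})
    succ = dict(zip(V, min(
        (p for p in permutations(V) if all(u != v for u, v in zip(V, p))),
        key=lambda p: sum(cost[u, v] for u, v in zip(V, p)))))
    cycles, seen = [], set()
    for v in V:                              # split the map into its cycles
        cycle = []
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            v = succ[v]
        if cycle:
            cycles.append(cycle)
    return cycles

def cheapest_patching(cost, C, Z):
    """Try all k*l pairs (s, t); return (increase, merged cycle)."""
    best = None
    for s in range(len(C)):
        for t in range(len(Z)):
            i_s, i_next = C[s], C[(s + 1) % len(C)]
            j_t, j_next = Z[t], Z[(t + 1) % len(Z)]
            delta = (cost[i_s, j_next] + cost[j_t, i_next]
                     - cost[i_s, i_next] - cost[j_t, j_next])
            if best is None or delta < best[0]:
                merged = C[:s + 1] + Z[t + 1:] + Z[:t + 1] + C[s + 1:]
                best = (delta, merged)
    return best

def patching_algorithm(cost):
    F = min_cycle_cover(cost)
    while len(F) > 1:
        F.sort(key=len, reverse=True)        # Step 2: the two longest cycles
        C, Z = F[0], F[1]
        F = [cheapest_patching(cost, C, Z)[1]] + F[2:]
    return F[0], tour_cost(cost, F[0])       # Step 3: a single cycle remains

print(min_cycle_cover(COST))
print(patching_algorithm(COST))
```

On the inferred costs the cover found in Step 1 consists of the two 2-cycles on $\{a, b\}$ and $\{c, d\}$, and their cheapest patching costs nothing extra, giving a tour of cost 22.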

To find a collection $F$ of disjoint cycles covering all vertices of minimum cost, it suffices to solve the Assignment Problem. Indeed, consider a complete digraph $D = (V, A)$ with cost $cost_D(a)$ on every arc $a$. Construct a complete bipartite graph $B = (V, V'; E)$ in which $V' = \{v' : v \in V\}$, i.e., $V'$ is a copy of $V$, and $cost_B(uv') = cost_D(uv)$ if $u \ne v$ and $cost_B(uv') = M$, where $M$ is a very large constant, otherwise. A minimum cost perfect matching $u_1 v_1', u_2 v_2', \dots, u_n v_n'$ in $B$ corresponds to a minimum cost collection of cycles in $D$ with arcs $u_1 v_1, u_2 v_2, \dots, u_n v_n$.

Taking into consideration that to find a cheapest patching for a current pair of cycles we need to consider only arcs not considered before, and that each arc is considered only once, we conclude that the search for a patching will take $O(n^2)$ time. However, to find a collection of disjoint cycles covering all vertices we need to solve the AP. Algorithms for the AP take $O(n^3)$ time (in practice, they are much faster and close to $O(n^2)$). Thus, the patching algorithm's complexity is $O(n^3)$. In practice, however, the complexity is lower and not much larger than $O(n^2)$, which makes the patching algorithm fast enough.

7.4 Improvement local search

The algorithms presented so far are called construction heuristics. They produce a solution (a tour for the ATSP) and stop. Their advantage is the fact they are fast, but their disadvantage is that their solution could be of poor quality. To improve a solution produced by a construction heuristic, several approaches are used. The simplest one is improvement local search, which we discuss here. However, there are other approaches called meta-heuristics, which we consider later.

The idea of improvement local search is as follows. We have a solution $sol_1$ produced by a construction heuristic. We look at a collection of solutions somewhat close to $sol_1$ and try to find there a better (or best) solution $sol_2$. (We call a collection of solutions somewhat close to $sol_1$ a neighbourhood of $sol_1$.) At iteration $i$, we have $sol_i$ found in iteration $i - 1$. We proceed by looking at a neighbourhood of $sol_i$ and try to find a solution $sol_{i+1}$ better than $sol_i$. If $sol_{i+1}$ is not found (none of the solutions in the neighbourhood is better than $sol_i$) we stop.

There are two types of improvement local search: the one where $sol_{i+1}$ is any solution better than $sol_i$ and the other one where $sol_{i+1}$ is the best in the neighbourhood of $sol_i$. In practice mostly the first type is used; in theoretical investigations the second type is mostly used.

To specify improvement local search, we have to define a neighbourhood for every solution. So, we proceed by considering neighbourhoods for the ATSP. Among the easiest and most useful are k-Opt neighbourhoods. Local search that uses k-Opt neighbourhoods is called k-Opt. The k-Opt neighbourhood of a tour T is obtained by deleting k arcs from T followed by adding k arcs to form a tour. If we delete three arcs from a tour, there are only two ways to add three arcs (not necessarily different from the deleted ones) in order to reconstruct a tour. To see this, contract each of the three paths obtained after deletion of three arcs to a vertex. As a result, we get a complete digraph with 3 vertices. This digraph has only two tours.

We can choose three arcs to delete in $n(n-1)(n-2)/6$ ways and each such way leads to 2 tours. Thus, a 3-Opt neighbourhood has $O(n^3)$ tours. Similarly, one can see that a k-Opt neighbourhood has $O(n^k)$ tours. In order to see whether deletion of three arcs and addition of three arcs to form a tour leads to an improvement, it suffices to find the difference between the cost of the deleted arcs and the cost of the added arcs. If the difference is positive, we have found an improvement. It is important that to examine any new tour we need only constant time. This means that $O(n^k)$ time is enough to find the best tour in a k-Opt neighbourhood. However, even for k = 3, the time is too large for even moderate instances of the ATSP. Hence we have to try to restrict our choice of candidates for improvement to only some tours in a 3-Opt neighbourhood, and only if this fails, we may look at the entire neighbourhood. Several possibilities to implement this strategy are considered in the literature on the ATSP, but we will not look at them here.

Question 7.4.1. Design neighbourhoods for Max Cut and show how we can economically find a better cut in the neighbourhood of a given cut.
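The constant-time evaluation of a 3-Opt move for the ATSP discussed in this section can be sketched as follows. The tour is assumed to be a Python list of vertices and `c` a cost matrix; the function names are hypothetical. Deleting the arcs leaving positions i < j < k splits the tour into three paths, and the only tour other than the original one is obtained by swapping the two inner paths.

```python
from itertools import combinations

def tour_cost(tour, c):
    """Total cost of the closed tour."""
    return sum(c[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def three_opt_gain(tour, c, i, j, k):
    """Cost of the three deleted arcs minus cost of the three added arcs (O(1) time)."""
    n = len(tour)
    a, a1 = tour[i], tour[i + 1]
    b, b1 = tour[j], tour[j + 1]
    d, d1 = tour[k], tour[(k + 1) % n]
    removed = c[a][a1] + c[b][b1] + c[d][d1]
    added = c[a][b1] + c[d][a1] + c[b][d1]
    return removed - added  # positive means the new tour is cheaper

def apply_three_opt(tour, i, j, k):
    """Swap the paths tour[i+1..j] and tour[j+1..k]; no path is reversed."""
    return tour[:i + 1] + tour[j + 1:k + 1] + tour[i + 1:j + 1] + tour[k + 1:]

def best_three_opt_move(tour, c):
    """Scan all n(n-1)(n-2)/6 triples and return the best one with its gain."""
    best = max(combinations(range(len(tour)), 3),
               key=lambda t: three_opt_gain(tour, c, *t))
    return best, three_opt_gain(tour, c, *best)
```

Because no path is reversed, the evaluation touches only six arcs, which is what makes it valid for asymmetric costs.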

Normally, only 3-Opt neighbourhoods are used in practical implementations since otherwise restrictions on the fraction of tours to consider must be very strong. The reason is that we spend a constant time on a tour. Can we do better? A positive answer to this question is provided in the rest of this section.

Let $C = x_1x_2 \dots x_kx_1$ be a cycle. The operation of removal of a vertex $x_i$ ($1 \le i \le k$) results in the cycle $x_1x_2 \dots x_{i-1}x_{i+1} \dots x_kx_1$ (thus, removal of $x_i$ is not deletion of $x_i$ from $C$; deletion of $x_i$ gives the path $x_{i+1}x_{i+2} \dots x_kx_1x_2 \dots x_{i-1}$). Let $y$ be a vertex not in $C$. The operation of insertion of $y$ into an arc $(x_i, x_{i+1})$ results in the cycle $x_1x_2 \dots x_i y x_{i+1} \dots x_kx_1$. The cost of the insertion is defined as $c(x_i, y) + c(y, x_{i+1}) - c(x_i, x_{i+1})$. For a set $Z = \{z_1, \dots, z_s\}$ ($s \le k$) of vertices not in $C$, an insertion of $Z$ into $C$ results in the tour obtained by inserting the nodes of $Z$ into different arcs of the cycle. In particular, insertion of $y$ into $C$ involves insertion of $y$ into one of the arcs of $C$.

Let $T = x_1x_2 \dots x_nx_1$ be a tour and let $Z = \{x_{i_1}, x_{i_2}, \dots, x_{i_s}\}$ be a set of non-adjacent vertices of $T$, i.e., $2 \le |i_k - i_r| \le n-2$ for all $1 \le k < r \le s$. The assign neighbourhood of $T$ with respect to $Z$, $N(T, Z)$, consists of the tours that can be obtained from $T$ by removal of the vertices in $Z$ one by one followed by an insertion of $Z$ into the cycle derived after the removal. For example,

$N(x_1x_2x_3x_4x_5x_1, \{x_1, x_3\}) = \{x_2x_ix_4x_jx_5x_2,\ x_2x_ix_4x_5x_jx_2,\ x_2x_4x_ix_5x_jx_2 : \{i, j\} = \{1, 3\}\}.$

The neighbourhood $N(T, Z)$ has exactly $(n-s)!/(n-2s)!$ tours. In particular, if $s = n/2$, then $N(T, Z)$ has $(n/2)!$ tours. Thus, $N(T, Z)$ may have an exponential number of tours. Interestingly, we do not need to spend exponential time to find the best tour in that neighbourhood.

Theorem 7.4.2. The best tour in the neighbourhood $N(T, Z)$ can be found in time $O(n^3)$.

Proof: Let $C = y_1y_2 \dots y_{n-s}y_1$ be the cycle obtained from $T$ after removal of $Z$ and let $Z = \{z_1, z_2, \dots, z_s\}$. By the definition of insertion, we have $n - s \ge s$. Let $\phi$ be an injective mapping from $Z$ to $Y = \{y_1, y_2, \dots, y_{n-s}\}$. (The requirement that $\phi$ is injective means that $\phi(z_i) \ne \phi(z_j)$ if $i \ne j$.) If we insert some $z_i$ into an arc $(y_j, y_{j+1})$, then the cost of $C$ will be increased by $c(y_j, z_i) + c(z_i, y_{j+1}) - c(y_j, y_{j+1})$. Therefore, if we insert every $z_i$, $i = 1, 2, \dots, s$, into $y_{\phi(i)}y_{\phi(i)+1}$, the cost of $C$ will be increased by

$$g(\phi) = \sum_{i=1}^{s} \bigl( c(y_{\phi(i)}, z_i) + c(z_i, y_{\phi(i)+1}) - c(y_{\phi(i)}, y_{\phi(i)+1}) \bigr).$$

Clearly, to find a cheapest tour of $N(T, Z)$, it suffices to minimize $g(\phi)$ on the set of all injections $\phi$ from $Z$ to $Y$. This can be done using the following complete bipartite graph $G$. The partite sets of $G$ are $Z$ and $Y$. The cost of an edge $z_iy_j$ is set to be $c(y_j, z_i) + c(z_i, y_{j+1}) - c(y_j, y_{j+1})$. By the definition of $G$, every maximum matching $M$ of $G$ corresponds to an injection $\phi_M$ from $Z$ to $Y$. Moreover, the costs of $M$ and $\phi_M$ coincide. A cheapest maximum matching in $G$ can be found by solving the assignment problem. Therefore, in $O(n^3)$ time, we can find the best tour in $N(T, Z)$. QED
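The construction in the proof can be sketched in Python as follows. This is an illustration under assumptions, not the notes' own implementation: the tour is a list of vertices, `z_positions` gives the positions of the (non-adjacent) vertices of Z in the tour, and `solve_assignment` is a standard Hungarian method with potentials written out here so that the block is self-contained. All function names are hypothetical.

```python
def solve_assignment(w):
    """Assign every row to a distinct column at minimum total cost (rows <= columns).

    Hungarian method with potentials; returns phi with phi[i] = column of row i.
    """
    n, m = len(w), len(w[0])
    inf = float("inf")
    u, v = [0] * (n + 1), [0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)  # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = w[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    phi = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            phi[p[j] - 1] = j - 1
    return phi

def tour_cost(tour, c):
    return sum(c[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def best_in_assign_neighbourhood(tour, z_positions, c):
    """Best tour of N(T, Z): remove Z, then re-insert it via an assignment problem."""
    z_pos = set(z_positions)
    z = [tour[i] for i in z_positions]
    y = [x for i, x in enumerate(tour) if i not in z_pos]  # cycle C after removal
    m = len(y)
    # w[i][j]: cost of inserting z_i into the arc (y_j, y_{j+1}) of C.
    w = [[c[y[j]][zi] + c[zi][y[(j + 1) % m]] - c[y[j]][y[(j + 1) % m]]
          for j in range(m)] for zi in z]
    phi = solve_assignment(w)
    after = {phi[i]: z[i] for i in range(len(z))}
    best = []
    for j, x in enumerate(y):
        best.append(x)
        if j in after:
            best.append(after[j])
    return best
```

Since T itself belongs to N(T, Z), the returned tour is never more expensive than the input tour.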

7.5 Questions

Question 7.5.1. Give the definition of a combinatorial optimisation (CO) problem. Formulate the greedy algorithm for CO. Give examples when the greedy algorithm finds the worst solution.

Question 7.5.2. Describe the greedy, nearest neighbour and random insertion algorithms for the asymmetric travelling salesman problem. Illustrate the algorithms on an instance of the ATSP with 5 vertices.

Question 7.5.3. Describe the greedy algorithm for Max Cut.

Question 7.5.4. Describe the ideas of local search algorithms for the Symmetric TSP, in general, and k-Opt, in particular.

Chapter 8

Computational Analysis of Heuristics

8.1 Experiments with ATSP heuristics

This section is based on a chapter by D.S. Johnson, G. Gutin, L. McGeoch, A. Yeo, W. Zhang and A. Zverovich in the book "The Traveling Salesman Problem and its Variations", G. Gutin and A. Punnen (eds.), Kluwer, 2002. In this section we consider only some of the heuristics analyzed in the chapter, namely, Patch, COP, 3opt, Greedy and NN. We described all these heuristics, apart from COP, earlier. COP is an improved version of Patch, which is not described in these notes.

In many cases it is impossible to find optimal tours for the instances of the ATSP considered here. To see how far the tour obtained by a heuristic is from optimum, we use lower bounds. One lower bound is the AP lower bound, i.e., the cost of a cheapest collection of disjoint cycles. To see that this is indeed a lower bound, it suffices to notice that a tour is a collection of disjoint cycles (that consists of a unique cycle). Another lower bound is the so-called Held-Karp lower bound (HK), which is normally better than AP. To compute HK one needs to solve several LP problems related to the ATSP (some LP relaxations of the ATSP); we will not provide details on HK.

8.2 Testbeds

There are various families of instances of the ATSP that are of practical and theoretical interest. Thus, it makes sense to study the behaviour of ATSP heuristics not on one family of instances, but on a set of families. We start by giving a short description of the families of instances and analyzing their properties. The first family, rect, was not used in the experiments, but provided a basis for some other families.

Random 2-Dimensional Rectilinear Instances (rect). The cities correspond to random points uniformly distributed in a $10^6$ by $10^6$ square, and the distance between points $(x_1, y_1)$ and $(x_2, y_2)$ is $|x_2 - x_1| + |y_2 - y_1|$.

Random Asymmetric Matrices (amat). The random asymmetric distance matrix generator chooses each distance $d(c_i, c_j)$ as an independent random integer $x$, $0 \le x \le 10^6$. For these instances it is known that both the optimal tour length and the AP bound approach a constant (the same constant) as $N \to \infty$. The rate of approach appears to be faster if the upper bound $U$ on the distance range is smaller, or if the upper bound is set to the number of vertices $n$, a common assumption in papers about optimization algorithms for the ATSP.

Shortest-Path Closure of amat (tmat). One of the reasons the previous class is uninteresting is the total lack of correlation between distances. Note that instances of this type are unlikely to obey the triangle inequality, i.e., there can be three cities $c_1, c_2, c_3$ such that $d(c_1, c_3) > d(c_1, c_2) + d(c_2, c_3)$. A somewhat more reasonable instance class can be obtained by taking Random Asymmetric Matrices and closing them under shortest path computation. That is, if $d(c_i, c_j) > d(c_i, c_k) + d(c_k, c_j)$ then set $d(c_i, c_j) = d(c_i, c_k) + d(c_k, c_j)$ and repeat until no more changes can be made. This is also a commonly studied class.
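A minimal sketch of the two generators follows. It assumes a zero diagonal and uses the Floyd-Warshall all-pairs shortest path scheme, which reaches the same fixed point as repeating the update rule above until nothing changes. The function names, the `seed` parameter and the default `upper` bound are assumptions made for illustration.

```python
import random

def random_asymmetric_matrix(n, upper=10**6, seed=0):
    """amat-style instance: independent random integers in [0, upper], zero diagonal."""
    rng = random.Random(seed)
    return [[0 if i == j else rng.randint(0, upper) for j in range(n)]
            for i in range(n)]

def shortest_path_closure(d):
    """tmat-style instance: replace every distance by the shortest path length."""
    n = len(d)
    d = [row[:] for row in d]  # do not modify the caller's matrix
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][j] > d[i][k] + d[k][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

closed_example = shortest_path_closure([[0, 1, 10], [5, 0, 2], [1, 7, 0]])
# closed_example == [[0, 1, 3], [3, 0, 2], [1, 2, 0]]
```

After the closure every triple of cities satisfies the triangle inequality, while the matrix generally remains asymmetric.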

Tilted Drilling Machine Instances, Additive Norm (rtilt). These instances correspond to the following potential application. One wishes to drill a collection of holes on a tilted surface, and the drill is moved using two motors. The first moves the drill to its new x-coordinate, after which the second moves it to its new y-coordinate. Because the surface is tilted, the second motor can move faster when the y-coordinate is decreasing than when it is increasing. The generator starts with an instance of rect and modifies it based on three parameters: $u_x$, the multiplier on $|\Delta x|$ that tells how much time the first motor takes, and $u_y^+$ and $u_y^-$, the multipliers on $|\Delta y|$ when the direction is up/down. For this class, the parameters $u_x = 1$, $u_y^+ = 2$, and $u_y^- = 0$ were chosen, which yields the same optimal tour lengths as the original symmetric rect instances because in a cycle the sum of the upward movements is precisely balanced by the sum of the downward ones.
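The balancing argument can be checked on a small example. The sketch below assumes that "up" means increasing y; the parameter names `ux`, `uy_up` and `uy_down` are hypothetical stand-ins for $u_x$, $u_y^+$ and $u_y^-$.

```python
def rect_distance(p, q):
    """Symmetric rectilinear distance of the rect family."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])

def rtilt_distance(p, q, ux=1, uy_up=2, uy_down=0):
    """Asymmetric additive-norm distance: moving up costs more than moving down."""
    dx = abs(q[0] - p[0])
    dy = q[1] - p[1]
    vertical = uy_up * dy if dy > 0 else uy_down * (-dy)
    return ux * dx + vertical

def tour_length(points, dist):
    """Length of the closed tour visiting the points in the given order."""
    n = len(points)
    return sum(dist(points[i], points[(i + 1) % n]) for i in range(n))

example_points = [(0, 0), (3, 4), (1, 7), (5, 2)]
rect_len = tour_length(example_points, rect_distance)    # 7 + 5 + 9 + 7
rtilt_len = tour_length(example_points, rtilt_distance)  # 11 + 8 + 4 + 5
```

Individual arcs differ (from (0, 0) to (3, 4) costs 11, the reverse costs 3), but every closed tour has the same length under both metrics.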

Tilted Drilling Machine Instances, Sup Norm (stilt). For many drilling machines, the motors operate in parallel and so the proper metric is the maximum of the times to move in the x and y directions rather than the sum. This generator has the same three parameters as for rtilt, although now the distance is the maximum of $u_x|\Delta x|$ and $u_y^-|\Delta y|$ (downward motion) or $u_y^+|\Delta y|$ (upward motion). For this class, the parameters $u_x = 2$, $u_y^+ = 4$, and $u_y^- = 1$ were chosen.

Random Euclidean Stacker Crane Instances (crane). In the Stacker Crane Problem one is given a collection of source-destination pairs $s_i, d_i$ in a metric space where for each pair the crane must pick up an object at location $s_i$ and deliver it to location $d_i$. The goal is to order these tasks so as to minimize the time spent by the crane going between tasks, i.e., moving from the destination of one pair to the source of the next one. This can be viewed as an ATSP in which city $c_i$ corresponds to the pair $s_i, d_i$ and the distance from $c_i$ to $c_j$ is the metric distance between $d_i$ and $s_j$. The generator has a single parameter $u \ge 1$, and constructs its source-destination pairs as follows. The sources are picked as in an instance of rect. Then we pick two integers $x$ and $y$ uniformly and independently from the interval $[-10^6/u, 10^6/u]$. The destination is the vector sum $s + (x, y)$. In order to preserve a sense of geometric locality, we let $u$ vary as a function of $N$, choosing values so that the expected number of other sources that are closer to a given source than its destination is roughly a constant, independent of $N$. These instances do not necessarily obey the triangle inequality since the time for traveling from source to destination is not counted.

Disk Drive Instances (disk). These instances attempt to capture some of the structure of the problem of scheduling the read head on a computer disk. This problem is similar to the stacker crane problem in that the files to be read have a start position and an end position in their tracks. Sources are again generated as in rect instances, but now the destination has the same y-coordinate as the source. To determine the destination's x-coordinate, we generate a random integer $x \in [0, 10^6/u]$ and add it to the x-coordinate of the source modulo $10^6$, thus capturing the fact that tracks can wrap around the disk. The distance from a destination to the next source is computed based on the assumption that the disk is spinning in the x-direction at a given rate and that the time for moving in the y direction is proportional to the distance traveled at a significantly slower rate. To get to the next source we first move to the required y-coordinate and then wait for the spinning disk to deliver the x-coordinate to us.

Pay Phone Coin Collection Instances (coin). These instances model the problem of collecting money from pay phones in a grid-like city. We assume that the city is a k by k grid of city blocks with 2-way streets running between them and a 1-way street running around the exterior boundary of the city. The pay phones are uniformly distributed over the boundaries of the blocks. We can only collect from a pay phone if it is on the same side of the street as we are currently driving on, and we cannot make “U-turns” either between or at street corners. Finding the shortest route is trivial if there are so many pay phones that most blocks have one on all four of their sides. This class is thus generated by letting k grow with N, in particular as the nearest integer to 10√N.

No-Wait Flowshop Instances (shop). In a k-processor no-wait flowshop, a job $\bar{u}$ consists of a sequence of tasks $(u_1, u_2, \dots, u_k)$ that must be performed by a fixed sequence of machines. The processing of $u_{i+1}$ must start on machine $i+1$ as soon as processing of $u_i$ is complete on machine $i$. This models the processing of heated materials that must not be allowed to cool down and situations where there is no storage space to hold waiting jobs. These instances have k = 50 processors and task lengths are independently chosen random integers between 0 and 1000. The distance from job $\bar{v}$ to job $\bar{u}$ is the minimum possible amount by which the finish time for $u_k$ can exceed that for $v_k$ if $\bar{u}$ is the next job to be started after $\bar{v}$.

Approx. Shortest Common Superstring Instances (super). This class is intended to capture some of the structure in a computational biology problem relevant to genome reconstruction. Given a collection of strings C, we wish to find a short superstring S in which all are (at least approximately) contained. If we did not allow mismatches the distance from string A to string B would be the length of B minus the length of the longest prefix of B that is also a suffix of A. Here we add a penalty equal to twice the number of mismatches, and the distance from string A to string B is the length of B minus $\max\{j - 2k$ : there is a prefix of B of length $j$ that matches a suffix of A in all but $k$ positions$\}$. The generator uses this metric applied to random binary strings of length 20.

In what follows we shall consider which measurable properties of instances correlate with heuristic performance. Likely candidates include (1) the gap between the AP and HK bounds, (2) the extent to which the distance metric departs from symmetry, and (3) the extent to which it violates the triangle inequality. The specific metrics we use for these properties are as follows. For (1) we use the percentage by which the AP bound falls short of the HK bound. For (2) we use the ratio of the average value of $|d(c_i, c_j) - d(c_j, c_i)|$ to the average value of $|d(c_i, c_j) + d(c_j, c_i)|$, a quantity that is 0 for symmetric matrices and has a maximum value of 1. For (3) we first compute, for each pair $c_i, c_j$ of distinct cities, the minimum of $d(c_i, c_j)$ and $\min\{d(c_i, c_k) + d(c_k, c_j) : 1 \le k \le N\}$ (call it $d'(c_i, c_j)$). The metric is then the average, over all pairs $c_i, c_j$, of $(d(c_i, c_j) - d'(c_i, c_j))/d(c_i, c_j)$. A value of 0 implies that the instance obeys the triangle inequality. Table 8.1 reports the values for these metrics on our randomly generated classes. For the random instances, average values are given for n = 100, 316, and 1,000.
In Table 8.1 the classes are ordered by increasing value of the HK-AP gap for the 1,000-city entry. For the random instance classes, there seems to be little correlation between the three metrics (1),(2) and (3), although for some there is a dependency on the number n of vertices.
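Metrics (2) and (3) defined above can be computed directly from a distance matrix. The sketch below assumes at least three cities and strictly positive off-diagonal distances (to avoid division by zero); it skips k = i and k = j in the inner minimum, which changes nothing when the diagonal is zero. The function names are hypothetical.

```python
def asymmetry_metric(d):
    """Average |d(i,j) - d(j,i)| divided by average |d(i,j) + d(j,i)|."""
    n = len(d)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    num = sum(abs(d[i][j] - d[j][i]) for i, j in pairs)
    den = sum(abs(d[i][j] + d[j][i]) for i, j in pairs)
    return num / den  # the two averages share the same number of pairs

def triangle_metric(d):
    """Average relative amount by which d(i,j) exceeds the best two-arc detour."""
    n = len(d)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            detour = min(d[i][k] + d[k][j] for k in range(n) if k != i and k != j)
            d_prime = min(d[i][j], detour)
            total += (d[i][j] - d_prime) / d[i][j]
            count += 1
    return total / count
```

A symmetric matrix gives an asymmetry of 0, and a matrix obeying the triangle inequality gives a triangle metric of 0.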

8.3 Comparison of TSP heuristics

Since TSP heuristics are normally used when $n \ge 1000$, to analyze the relative value of the heuristics we will mostly consider computational results for n = 1000 and 3162 (see Tables 8.2 and 8.3), but the remaining results are also of interest for predicting the behaviour of the heuristics for n > 3162. We first analyze the relative performance of the heuristics for each family of instances separately and then draw more general conclusions.

For tmat, COP appears to be the best heuristic w.r.t. both time and quality. Patch appears to be the second best choice. For amat, all heuristics have similar running times, but the COP and Patch solutions are of higher quality, with COP being the best. For shop, Greedy, NN and 3opt are fast heuristics, while Patch and COP are rather slow. COP is particularly slow, and moreover it is too slow for a heuristic algorithm. But, if the running time is not an issue, then Patch and COP are the heuristics of choice as they produce tours close to optimal. If the time is an issue, then 3opt should be used. Similar comments apply to disk. For super, COP is again the heuristic of choice if the time is not very limited. If the time is limited, 3opt provides the best choice. We leave it to the reader to comment on crane. We observe that 3opt is of higher quality than Patch or COP for coin, stilt and rtilt and is the heuristic of choice for these three families of TSP instances.

Overall, we see that COP and 3opt are the best candidates and should be used together, with 3opt running first. If the quality of its solution is sufficient, we stop, and otherwise run COP. The problem with COP is that its running time grows very quickly and the heuristic becomes too slow for large values of n.

             % HK-AP                 Asymmetry                Triangle
Class      100     316   1,000     100     316   1,000     100     316   1,000
tmat       .34     .16     .03    .232    .189    .165       –       –       –
amat       .64     .29     .05    .333    .332    .333    .633    .752    .837
shop       .50     .22     .15    .508    .498    .515       –       –       –
disk      2.28     .71     .34    .044    .045    .046    .250    .313    .354
super     1.04    1.02    1.17    .076    .075    .075       –       –       –
crane     7.19    6.34    5.21    .061    .035    .020    .101    .087    .066
coin     15.04   13.60   13.96    .010    .007    .003       –       –       –
stilt    18.41   14.98   14.65    .329    .333    .336       –       –       –
rtilt    20.42   17.75   17.17    .496    .500    .503       –       –       –

Table 8.1: For the 100-, 316-, and 1000-city instances of each class, the average percentage shortfall of the AP bound from the HK bound and the average asymmetry and triangle inequality metrics as defined in the text. “–” stands for .000.

Greedy

            Percent above HK                    Time in Seconds
Class      100      316     1000     3162     100    316   1000   3162
tmat     31.23    29.04    26.53    26.25     .03    .26    1.7     20
amat    243.09   362.86   418.56   695.29     .04    .27    1.9     21
shop     49.34    56.07    61.55    66.29     .03    .26    2.1     40
disk    188.82   307.14   625.76  1171.62     .03    .28    2.7     23
super     6.03     5.40     5.16     5.79     .03    .22    1.5     18
crane    41.86    44.09    39.70    41.60     .03    .27    1.9     21
coin     48.73    46.76    42.33    35.94     .04    .24    1.7     20
stilt   106.25   143.89   178.34   215.84     .04    .28    1.9     23
rtilt   350.12   705.56  1290.63  2350.38     .03    .28    2.0     23

NN

            Percent above HK                    Time in Seconds
Class      100      316     1000     3162     100    316   1000   3162
tmat     38.20    37.10    37.55    36.66     .03    .24    1.7     20
amat    195.23   253.97   318.79   384.90     .03    .26    1.9     21
shop     16.97    14.65    13.29    11.87     .03    .23    2.5     20
disk     96.24   102.54   115.51   161.99     .04    .27    1.9     23
super     8.57     8.98     9.75    10.62     .03    .21    1.5     18
crane    40.72    41.66    43.88    43.18     .03    .26    1.9     21
coin     26.08    26.71    26.80    25.60     .03    .23    1.7     20
stilt    30.31    30.56    27.62    24.79     .03    .30    1.9     22
rtilt    28.47    28.28    27.52    24.60     .04    .26    1.9     22

3opt

            Percent above HK                    Time in Seconds
Class      100      316     1000     3162     100    316   1000   3162
tmat      6.44     9.59    12.66    16.20     .19   1.71    5.5     20
amat     39.23    58.57    83.77   112.08     .19   1.75    5.8     21
shop      3.02     7.25    10.22    10.88     .23   1.78    5.6     21
disk     12.11    16.96    20.85    25.64     .19   1.82    6.1     23
super     3.12     4.30     5.90     7.94     .15   1.43    4.8     18
crane     9.48     9.41    10.65    10.64     .19   1.76    7.3     22
coin      8.06     9.39     9.86     9.92     .18   1.62    5.3     20
stilt    11.39    12.65    12.62    12.27     .19   1.80    8.2     22
rtilt    10.04    13.09    18.00    19.83     .19   2.05    6.6     23

Table 8.2: Tour quality and running times for Greedy, NN, and 3-Opt.

Patch

            Percent above HK                    Time in Seconds
Class      100      316     1000     3162     100    316   1000   3162
tmat       .84      .64      .17      .00     .03    .22    1.8     29
amat     10.95     6.50     2.66     1.88     .03    .22    1.9     18
shop      1.15      .59      .39      .24     .04    .48    8.4    260
disk      9.40     2.35      .88      .30     .03    .26    2.9     75
super     1.86     2.84     3.99     6.22     .02    .19    1.7     29
crane     9.40    10.18     9.45     8.24     .03    .21    1.5     23
coin     16.48    16.97    17.45    18.20     .02    .18    1.4     17
stilt    23.33    22.79    23.18    24.41     .03    .24    2.2     29
rtilt    17.03    18.91    18.38    19.39     .03    .28    2.9     54

COP

            Percent above HK                    Time in Seconds
Class      100      316     1000     3162     100    316   1000   3162
tmat       .57      .36      .16      .00     .01    .12     .7     15
amat      9.31     3.15     2.66     1.01     .01    .15     .6     26
shop       .68      .36      .19      .10     .08   1.41   29.1   1152
disk      6.00     1.13      .51      .15     .03    .31    8.7    297
super     1.01     1.20     1.22     2.06     .03    .24    4.6    243
crane    10.32     9.08     7.28     6.21     .04    .44    3.5     53
coin     16.44    17.68    16.23    16.06     .02    .10    1.2     22
stilt    22.48    23.31    22.80    22.90     .07    .94    8.1    105
rtilt    19.62    22.86    20.95    20.37     .05    .33    5.6    117

Table 8.3: Tour quality and running times for the patching algorithm and COP.

Chapter 9

Theoretical Analysis of Heuristics

Previously we have considered the experimental performance of TSP heuristics. While experimental analysis is of a certain importance, it cannot cover all possible families of TSP instances and, in particular, it normally does not cover the hardest ones. Experimental analysis provides little theoretical explanation of why certain heuristics are successful while others are not. This limits our ability to improve on the quality and efficiency of existing algorithms. It also limits our ability to extend approaches successful for the TSP to other combinatorial optimization (CO) problems. This part of the course is devoted to theoretical approaches that allow one to analyze properties of optimal solutions of heuristics.

9.1 Property of 2-Opt optimal tours

An instance of the Euclidean TSP is given by a collection of points in the plane (vertices); the distance between any pair of points $u = (x_u, y_u)$, $v = (x_v, y_v)$ is the Euclidean distance between the points, i.e. $dist(u, v) = \sqrt{(x_u - x_v)^2 + (y_u - y_v)^2}$. Clearly, the Euclidean TSP is symmetric, i.e. $dist(u, v) = dist(v, u)$. Recall that the 2-Opt neighbourhood of a tour T consists of all tours that can be obtained from T by deleting two edges of T and then adding two edges.

Theorem 9.1.1. For the Euclidean TSP, a tour T which is the best in its 2-Opt neighbourhood does not have self-intersections.

Proof: Let T be a tour in an instance of the Euclidean TSP that has a self-intersection; let y be such an intersection point, which is of course not a vertex (no vertex can be visited by the travelling salesman twice!). See Figure 9.1. To prove that the intersection in the figure is impossible, it suffices to see that the tour T′ that is obtained from T by deleting edges xv and uw and adding edges uv and xw is shorter than T, i.e. dist(x, v) + dist(u, w) > dist(u, v) + dist(x, w). (Indeed, T′ is in the 2-Opt neighbourhood of T and thus cannot be shorter than T by the condition of the theorem.)

[Figure 9.1: A tour with self-intersection]

By the triangle inequality, we have:

dist(u, y) + dist(y, v) > dist(u, v) and dist(x,y) + dist(y, w) > dist(x, w).

Add these two inequalities:

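(dist(u, y) + dist(y, w)) + (dist(x, y) + dist(y, v)) > dist(u, v) + dist(x, w).

Since y lies on both segments uw and xv, we have dist(u, y) + dist(y, w) = dist(u, w) and dist(x, y) + dist(y, v) = dist(x, v), so dist(x, v) + dist(u, w) > dist(u, v) + dist(x, w). QED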
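The theorem can be illustrated with a small 2-Opt local search for Euclidean instances. This is a sketch under assumptions: the tour is a list of indices into `points`, a move is accepted as soon as it shortens the tour, and the function names are hypothetical.

```python
import math

def euclid(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def tour_length(points, tour):
    n = len(tour)
    return sum(euclid(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))

def two_opt(points, tour):
    """Repeat improving 2-Opt moves until the tour is best in its 2-Opt neighbourhood."""
    tour = tour[:]
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a vertex
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Replace edges ab and cd by ac and bd (reverses the path b..c).
                gain = euclid(a, b) + euclid(c, d) - euclid(a, c) - euclid(b, d)
                if gain > 1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

# Unit square visited along both diagonals: the edges 0-1 and 2-3 cross.
square = [(0, 0), (1, 1), (1, 0), (0, 1)]
uncrossed = two_opt(square, [0, 1, 2, 3])
```

On the unit square the crossing tour of length 2 + 2√2 is replaced by the perimeter of length 4, which has no self-intersection.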

9.2 Approximation Analysis

In Approximation Analysis, we study the approximation ratios of heuristics. The approximation ratio of a heuristic H for the TSP is an upper bound on the ratio $c/c^*$, where $c$ is the cost of a tour found by H and $c^*$ is the optimal cost of a tour.

9.2.1 Travelling Salesman Problem

In what follows, we assume that all costs are non-negative. As an example, we consider the double tree heuristic (DTH) for the Symmetric TSP with triangle inequality, i.e. with the inequality $cost(x, y) + cost(y, z) \ge cost(x, z)$ for any triple $x, y, z$ of vertices.

To introduce DTH we recall some notions of graph theory. In an undirected multigraph G a trail is a sequence of edges such that every two consecutive edges have a common vertex. If every edge belongs to a trail, without repetition of any edge, then the trail is called Euler. Not every multigraph has an Euler trail. Multigraphs that have Euler trails are called Euler.

[Figure 9.2: An example for DTH]

Theorem 9.2.1 (Euler Theorem). A multigraph G has an Euler trail if and only if G is connected and every vertex has even degree.

In DTH, we

1. Find a minimum cost spanning tree $S^*$ in the complete graph of the STSP

2. Double its edges obtaining an Euler multigraph $G_E$ (see Euler Theorem)

3. Find an Euler trail $F$ in $G_E$

4. Going along $F$ delete any repetition of vertices in $F$, obtaining a tour $T$

DTH is illustrated in Figure 9.2. Suppose that a minimum cost spanning tree is given in the left hand side of the figure. After doubling its edges, we get the graph in the right hand side of the figure. We find an Euler trail (as a sequence of vertices for simplicity):

acbcdfdedghgdca.

After deleting vertex repetitions, we get a tour T = acbdfegha.
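The four steps of DTH can be sketched compactly. The sketch relies on the observation that a depth-first walk of the spanning tree traverses every tree edge twice, i.e. it is an Euler trail of the doubled tree, and that recording each vertex at its first visit is exactly the deletion of repeated vertices. The function name and the choice of Prim's algorithm for the spanning tree are assumptions of this illustration.

```python
import math

def double_tree_tour(cost):
    """Return (tour, mst_cost) for a symmetric cost matrix."""
    n = len(cost)
    # Step 1: minimum cost spanning tree by Prim's algorithm, rooted at vertex 0.
    in_tree = [False] * n
    parent = [None] * n
    best = [math.inf] * n
    best[0] = 0
    children = [[] for _ in range(n)]
    mst_cost = 0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        mst_cost += best[u]
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and cost[u][v] < best[v]:
                best[v], parent[v] = cost[u][v], u
    # Steps 2-4: walk the doubled tree depth-first, keeping first visits only.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour, mst_cost

pts = [(0, 0), (0, 1), (1, 1), (1, 0), (2, 0)]
dist = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in pts] for p in pts]
dth_tour, tree_cost = double_tree_tour(dist)
dth_cost = sum(dist[dth_tour[i]][dth_tour[(i + 1) % 5]] for i in range(5))
```

On these five points the spanning tree has cost 4 and the resulting tour has cost 6, within twice the tree cost as the next theorem guarantees.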

Theorem 9.2.2. For the Symmetric TSP with triangle inequality, a tour found by DTH is at most twice as long as an optimal tour.

Proof: Consider an instance of the Symmetric TSP with triangle inequality. Let $T^*$ be an optimal tour of the instance. Observe that after deleting an edge from $T^*$ we get a spanning tree $S$ of the complete graph. Clearly, $cost(S) \le cost(T^*)$. By the definition of $S^*$, $cost(S^*) \le cost(S)$. Observe that by the definitions of $G_E$ and $F$, we have $cost(F) = cost(G_E) = 2cost(S^*)$. By the triangle inequality, any short cuts (i.e. deleting repetitive vertices in $F$) cannot increase the cost of the derived tour $T$. Hence, $cost(T) \le cost(F)$. Thus,

$$cost(T) \le cost(F) = 2cost(S^*) \le 2cost(S) \le 2cost(T^*).$$

QED

9.2.2 Knapsack Problem

Recall the knapsack problem. Assume that we have a number of items, and we must choose some subset of the items to fill our "knapsack", which has limited space $b$. Each item $i$ has a value $v_i$ and takes up $b_i$ units of space in the knapsack. We wish to choose a collection of the items with total space at most $b$ and with maximum total value.

In what follows we assume that $b_i \le b$ since any item $j$ with $b_j > b$ cannot be put in the knapsack. Also assume that $b_1 + b_2 + \dots + b_n > b$ since otherwise the problem is trivial.

For the branch-and-bound algorithm above, we used the following simple heuristic H1: order items in the non-increasing value of the ratio ri = vi/bi and place in the knapsack as many items as possible putting them in the obtained order. The heuristic H1 seems very good, but there are examples that show that this is not true.

Consider the following example: let $b = 400$ and let there be four items with $v_1 = v_2 = v_3 = 1$, $b_1 = b_2 = b_3 = 1$, $v_4 = 399$, $b_4 = 400$. The heuristic will compute the ratios $r_1 = r_2 = r_3 = 1$ and $r_4 = 399/400$ and put the first three items in the knapsack. The value obtained is 3. However, if we put only the fourth item in the knapsack, we get value 399. This is the optimal solution. Thus, the value obtained by the heuristic is 133 times smaller than the optimal value. This example shows that the heuristic is not that good after all. Perhaps the reason is that we do not consider the largest value item for inclusion in our solution. Consider another heuristic H2 that puts the most valuable item in the knapsack first and then applies H1 to the remaining items. This heuristic would find the optimal solution in the example above, but it will fail badly on other examples (for example, take $v_4 = 2$).

Let us combine H1 and H2, i.e., run both of them on input and choose the best among the obtained solutions. We denote this heuristic by H. We show that there is a guarantee of the quality of solutions obtained by H.
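The three heuristics can be sketched as follows. Two points are assumptions of this illustration: H1 stops at the first item that does not fit (as in the proof below), and H2 applies H1 to the remaining items with the remaining capacity. The function names and the 0-based item indices are likewise hypothetical.

```python
def h1(values, sizes, b):
    """Greedy by ratio v_i / b_i; stops at the first item that does not fit."""
    order = sorted(range(len(values)), key=lambda i: values[i] / sizes[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        if used + sizes[i] > b:
            break
        chosen.append(i)
        used += sizes[i]
    return chosen

def h2(values, sizes, b):
    """Most valuable item first, then H1 on the rest with the remaining capacity."""
    top = max(range(len(values)), key=lambda i: values[i])
    rest = [i for i in range(len(values)) if i != top]
    sub = h1([values[i] for i in rest], [sizes[i] for i in rest], b - sizes[top])
    return [top] + [rest[k] for k in sub]

def h(values, sizes, b):
    """Run H1 and H2 and keep the more valuable of the two solutions."""
    total = lambda items: sum(values[i] for i in items)
    return max(h1(values, sizes, b), h2(values, sizes, b), key=total)

example_values, example_sizes = [1, 1, 1, 399], [1, 1, 1, 400]
```

On the example above, `h1` selects the three unit items (value 3) while `h` returns the single item of value 399.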

Theorem 9.2.3. The value of the solution obtained by H is always at least half of the value of the optimal solution.

Proof: Consider an arbitrary instance of the knapsack problem given by $n$ items with values $v_1, v_2, \dots, v_n$ and sizes $b_1, b_2, \dots, b_n$. Let $r_i = v_i/b_i$ as before. We may assume that $r_1 \ge r_2 \ge \dots \ge r_n$. Let $v_{opt}$ be the value of the optimal solution and $v_H$ the value of the solution obtained by H. Clearly, $v_H = \max\{v_{H_1}, v_{H_2}\}$, where $v_{H_k}$ is the value of the solution obtained by $H_k$.

For some $j$, the heuristic $H_1$ will put, in the knapsack, items $1, 2, \dots, j-1$ one by one. Suppose that item $j$ will not fit into the knapsack with the first $j-1$ items already there. Thus,

$$\hat{v}_j := v_1 + v_2 + \dots + v_{j-1} = v_{H_1} \le v_H$$

and

$$\hat{b}_j = b_1 + b_2 + \dots + b_{j-1} \le b.$$

Since $r_1 \ge r_2 \ge \dots \ge r_n$, if we were allowed to place, in the knapsack, part of item $j$, we would get the (LP) optimal solution $\hat{v}_j + (b - \hat{b}_j)r_j$. This is not worse than $v_{opt}$. Thus,

$$v_{opt} \le \hat{v}_j + (b - \hat{b}_j)v_j/b_j < \hat{v}_j + v_j.$$

The last inequality follows from the fact that $(b - \hat{b}_j)/b_j < 1$.

To complete the proof we consider two possible cases. If $v_j \le \hat{v}_j$ then

$$v_{opt} < \hat{v}_j + v_j \le 2\hat{v}_j \le 2v_{H_1} \le 2v_H.$$

If $v_j > \hat{v}_j$ then $v_{max} > \hat{v}_j$, where $v_{max}$ is the maximum value of an item. Thus,

$$v_{opt} < \hat{v}_j + v_j \le \hat{v}_j + v_{max} < 2v_{max} \le 2v_H.$$

In both cases, the theorem follows. QED

Section 6.3 has an instance of the knapsack problem for which H gives the optimal solution, while the use of H1 requires lengthy computations by branch-and-bound to get the optimal solution.

9.2.3 Bin Packing Problem

Recall that in the Bin Packing Problem (BPP), we are given a positive integer number $b$ and a sequence of $N$ items of sizes $L = (s_1, s_2, \dots, s_N)$ such that $0 < s_i \le b$. Our aim is to pack the items into the minimum number of bins, each of capacity $b$. For example, if $b = 1$ and $L = (\frac{1}{2}, \frac{1}{2}, \frac{1}{3}, \frac{1}{3}, \frac{1}{3})$, then the minimum number of bins is 2 as we can pack the first two items in Bin 1 and the remaining items in Bin 2.

One of the simplest heuristics for BPP is the Next Fit heuristic (NF). In NF we start from Bin 1. We pack items into Bin 1 one by one as long as its capacity $b$ is not exceeded. Once the capacity would be exceeded, we put the current item (not fitting into Bin 1) into Bin 2 and pack items into Bin 2 as long as its capacity $b$ is not exceeded, etc.

Example. Let $b = 1$ and

$$L = (\tfrac{1}{2}, \tfrac{1}{6}, \tfrac{1}{2}, \tfrac{1}{6}, \tfrac{1}{2}, \tfrac{1}{6}, \tfrac{1}{2}, \tfrac{1}{6}, \tfrac{1}{2}, \tfrac{1}{6}, \tfrac{1}{2}, \tfrac{1}{6}).$$

Since a pair 1/2, 1/2 completely fills a bin and six items of size 1/6 also fill a bin, the optimal number of bins is 4 (no wasted space at all). Check that NF fills 6 bins!

In what follows, assume $b = 1$.
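Next Fit is short enough to sketch directly. The function name `next_fit` is an assumption, and exact fractions are used to avoid rounding errors in the capacity test.

```python
from fractions import Fraction

def next_fit(sizes, b=1):
    """Pack items in the given order, opening a new bin when the current one overflows."""
    bins, load = [[]], 0
    for s in sizes:
        if load + s > b:
            bins.append([])  # the current bin is closed and never revisited
            load = 0
        bins[-1].append(s)
        load += s
    return bins

nf_example = [Fraction(1, 2), Fraction(1, 6)] * 6
nf_bins = next_fit(nf_example)  # six bins, each holding one 1/2 and one 1/6
```

Each bin ends up with load 2/3, so NF uses 6 bins on the example although 4 suffice.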

Theorem 9.2.4. Let p be the minimum number of bins required in BPP; then NF always packs items in at most 2p bins.

Proof: Let $p'$ be the number of bins used by NF and let $b_1, b_2, \dots, b_{p'}$ be the packed parts of Bins $1, 2, \dots, p'$, respectively. By the definition of NF, $b_i + b_{i+1} > 1$ for each $i = 1, 2, \dots, p'-1$ (otherwise Bin $i$ can be used instead of Bins $i$ and $i+1$). Summing up and adding $b_1$ and $b_{p'}$, we get $2(b_1 + b_2 + \dots + b_{p'}) > p' - 1$. However, $p \ge b_1 + b_2 + \dots + b_{p'}$ and, thus,

$$2p \ge 2(b_1 + b_2 + \dots + b_{p'}) > p' - 1.$$

Therefore, $2p > p' - 1$ and, since $2p$ is an integer, $2p \ge p'$. QED

Examples generalizing the one above show that the theorem is asymptotically sharp (i.e., cannot be improved).

Consider a slightly more sophisticated heuristic, First Fit (FF). In FF we start from Bin 1. We pack items into Bin 1 one by one as long as its capacity $b$ is not exceeded. Once the capacity would be exceeded, we put the current item (not fitting into Bin 1) into Bin 2. The next item is placed in Bin 1 if it can be put there. If not, it is placed in Bin 2 if it can be put there. If neither Bin 1 nor Bin 2 can accommodate the item, the item is put in Bin 3, etc. The following theorem (without proof) holds.

Theorem 9.2.5. Let p be the minimum number of bins required in BPP; then FF always packs items in at most 1.7p + 2 bins.

As we can see, FF appears to be better than NF in the worst case. The First Fit Decreasing heuristic (FFD) first sorts the items in non-increasing order of their sizes and then applies FF.

Theorem 9.2.6. Let p be the minimum number of bins required in BPP; then FFD always packs items in at most 1.5p bins.

Proof (scheme): Let $s_1 \le s_2 \le \ldots \le s_N$. Partition the items into four sets $A = \{i : s_i > 2/3\}$, $B = \{i : 1/2 \ldots$

9.2.4 Online Problems and Algorithms

Often in practice, items become available one by one and, once they are available, they must be packed irreversibly. In such cases we have the online BPP. Heuristics NF and FF can be used for the online BPP; FFD cannot, since it must see all items before sorting them. In fact, it is proved that no algorithm for the online BPP can achieve the bound of Theorem 9.2.6. There are many other online problems, for example the online AP and TSP. Algorithms for online problems are called online algorithms.
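The difference between FF (usable online) and FFD (which needs the whole list in advance) can be seen in a small Python sketch. The function names and the four-item list `tricky` are illustrative assumptions, not taken from the notes.

```python
from fractions import Fraction

def first_fit(sizes, capacity=1):
    """Place each item into the lowest-numbered bin that still has room."""
    loads = []  # loads[i] is the packed part of Bin i+1
    for s in sizes:
        for i, load in enumerate(loads):
            if load + s <= capacity:
                loads[i] = load + s
                break
        else:
            loads.append(s)  # no open bin can take the item: open a new one
    return len(loads)

def first_fit_decreasing(sizes, capacity=1):
    """Sort all items by non-increasing size (offline step), then run FF."""
    return first_fit(sorted(sizes, reverse=True), capacity)

# The 12-item list from the NF example, and a made-up list on which the
# arrival order hurts FF.
items = [Fraction(1, 2), Fraction(1, 6)] * 6
tricky = [Fraction(1, 3), Fraction(1, 3), Fraction(2, 3), Fraction(2, 3)]

print(first_fit(items), first_fit_decreasing(items))
print(first_fit(tricky), first_fit_decreasing(tricky))
```

The call to `sorted` is exactly the step that is unavailable online: it requires every item size before the first item is packed.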

9.3 Domination Analysis

Domination analysis provides an alternative to approximation analysis. In domination analysis, we are interested in the number of solutions that are worse or equal in quality to the heuristic one, which is called the domination number of the heuristic solution. In many cases, domination analysis is very useful. In particular, some heuristics have domination number 1 for the TSP. In other words, those heuristics, in the worst case, produce the unique worst possible solution. At the same time, the approximation ratio is not bounded by any constant. In this case, the domination number provides a far better insight into the performance of the heuristics.

The domination number domn(H, n) of a heuristic H for the TSP is the maximum integer d such that, for every instance of the TSP on n vertices, H produces a tour that is not more expensive than at least d tours (the tour produced by H is counted as well). For example, domn(H, n) ≥ (n − 2)! means that H always produces a tour that is better than or of the same cost as at least (n − 2)! tours for every instance on n vertices.

Theorem 9.3.1. For the STSP with triangle inequality, DTH has domination number 1.

Proof: Consider an instance of the STSP with vertices $v_1, v_2, \ldots, v_{n-1}, v_n$. Let the cost of the edges $v_3v_4, \ldots, v_{n-1}v_n, v_nv_2$ be 2 and the cost of the rest of the edges be 1. Since all costs are 1 or 2, the triangle inequality holds. Notice that the tour $v_1v_3v_4 \ldots v_{n-1}v_nv_2v_1$ is the unique most expensive tour, as it is the only tour that includes all edges of cost 2.

The tree $S$ with edges $v_1v_i$, $i = 2, 3, \ldots, n$, is a minimum cost spanning tree. Suppose that DTH chooses $S$ (DTH may choose any minimum cost spanning tree). After doubling the edges of $S$, we get an Euler graph $G$; $F = v_2v_1v_3v_1v_4v_1v_5v_1 \ldots v_1v_nv_1v_2$ is an Euler trail of $G$, and suppose that DTH constructs $F$. After deleting repeated vertices, we get the tour $v_1v_3v_4 \ldots v_{n-1}v_nv_2v_1$, which is the unique most expensive tour. QED
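For a small case the construction in the proof can be verified by brute force. The sketch below takes n = 6 (an arbitrary choice made here), builds the Euler trail F, applies the shortcut step, and compares the result with all tours; the helper names are hypothetical.

```python
from itertools import permutations

n = 6  # small illustrative size so that all tours can be enumerated

def cost(u, v):
    """Cost 2 on v3v4, ..., v(n-1)vn and vnv2; cost 1 on every other edge."""
    a, b = min(u, v), max(u, v)
    heavy = (a >= 3 and b == a + 1) or (a, b) == (2, n)
    return 2 if heavy else 1

def tour_cost(tour):
    """Cost of the closed tour visiting the vertices in the given order."""
    return sum(cost(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))

# Euler trail F = v2 v1 v3 v1 v4 v1 ... v1 vn v1 v2 of the doubled star S
trail = [2, 1]
for i in range(3, n + 1):
    trail += [i, 1]
trail.append(2)

# Shortcut step of DTH: keep only the first occurrence of each vertex
tour = list(dict.fromkeys(trail))

# Fix vertex 1 as the start; every tour then appears twice (once per direction)
all_costs = [tour_cost((1,) + p) for p in permutations(range(2, n + 1))]
worst = max(all_costs)
print(tour, tour_cost(tour), worst, all_costs.count(worst))
```

A count of 2 for the maximum cost means that a single tour (traversed in its two directions) attains it, which is the uniqueness claimed in the proof.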

Recall that in the Assignment Problem, we are given a complete bipartite graph B with n vertices in each partite set and a non-negative cost assigned to each edge of B. We are required to find a perfect matching (i.e. matching with n edges) in B of minimum cost.

Theorem 9.3.2. For the Assignment Problem, the greedy algorithm has domination number 1.

Proof: Let $B$ be a complete bipartite graph with $n$ vertices in each partite set and let $u_1, u_2, \ldots, u_n$ and $v_1, v_2, \ldots, v_n$ be the two partite sets of $B$. Let $M$ be any number greater than $n$. We assign cost $M \times i$ to the edge $u_iv_i$ for $i = 1, 2, \ldots, n$ and cost $M \times \min\{i, j\} + 1$ to every edge $u_iv_j$, $i \ne j$; see Figure 9.3 for an illustration in the case $n = 3$.

We classify the edges of $B$ as follows: $u_iv_j$ is horizontal (forward, backward) if $i = j$ ($i < j$, $i > j$). See Figure 9.3.

Figure 9.3: Assignment of costs for n = 3; classification of edges

The greedy algorithm will choose edges $u_1v_1, u_2v_2, \ldots, u_nv_n$ (and in that order). We call this perfect matching $P$ and we will prove that $P$ is the unique most expensive perfect matching in $B$. The cost of $P$ is $cost(P) = M + 2M + \ldots + nM$.

Choose an arbitrary perfect matching $P'$ in $B$ distinct from $P$. Assume that $P'$ has edges $u_1v_{p_1}, u_2v_{p_2}, \ldots, u_nv_{p_n}$. By the definition of the costs in $B$, $cost(u_iv_{p_i}) \le Mi + 1$. Since $P'$ is distinct from $P$, it must have edges that are not horizontal. This means it has backward edges. If $u_kv_{p_k}$ is a backward edge, i.e. $p_k < k$, then $cost(u_kv_{p_k}) \le M(k - 1) + 1 = (Mk + 1) - M$. Hence,

$cost(P') \le (M + 2M + \ldots + nM) + n - M = cost(P) + n - M$.

Thus, since $M > n$, $cost(P') < cost(P)$. QED
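The instance from the proof is small enough to check exhaustively. The following Python sketch uses n = 4 and M = 5 (illustrative values; the proof only needs M > n) and a straightforward implementation of the greedy algorithm, whose name and details are assumptions made here.

```python
from itertools import permutations

n, M = 4, 5  # illustrative values; the proof only requires M > n

def cost(i, j):
    """Cost of edge u_i v_j in the instance from the proof (1-based indices)."""
    return M * i if i == j else M * min(i, j) + 1

def greedy_matching():
    """Repeatedly take the cheapest edge whose endpoints are both unmatched."""
    edges = sorted((cost(i, j), i, j)
                   for i in range(1, n + 1) for j in range(1, n + 1))
    used_u, used_v, matching = set(), set(), []
    for _, i, j in edges:
        if i not in used_u and j not in used_v:
            matching.append((i, j))
            used_u.add(i)
            used_v.add(j)
    return matching

def matching_cost(p):
    """Cost of the perfect matching that pairs u_i with v_{p[i-1]}."""
    return sum(cost(i, p[i - 1]) for i in range(1, n + 1))

greedy = greedy_matching()
greedy_cost = sum(cost(i, j) for i, j in greedy)
all_costs = [matching_cost(p) for p in permutations(range(1, n + 1))]
print(greedy, greedy_cost, max(all_costs), all_costs.count(max(all_costs)))
```

The enumeration covers all n! perfect matchings, so it is only practical for very small n.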

Notice that there are algorithms for the Assignment Problem with much larger domination number. The Assignment Problem can be solved to optimality by the $O(n^3)$-time Hungarian algorithm. The Hungarian algorithm is of domination number equal to the number of all perfect matchings in $B$, hence equal to $n!$.

It is possible to prove that the greedy algorithm and NN for the TSP are of domination number 1. However, the vertex insertion heuristic is proved to be of domination number at least $(n - 2)!$ for the Asymmetric TSP. Clearly, the vertex insertion heuristic appears to be more robust than the greedy algorithm and NN.

9.4 Questions

Question 9.4.1. Find an optimal tour for the STSP with cost matrix

(a) $C = \begin{pmatrix} 0 & 2 & 4 & 3 \\ 2 & 0 & 2.5 & 5 \\ 4 & 2.5 & 0 & 3.5 \\ 3 & 5 & 3.5 & 0 \end{pmatrix}$  (b) $C = \begin{pmatrix} 0 & 2 & 6 & 3 \\ 2 & 0 & 1 & 4 \\ 6 & 1 & 0 & 7 \\ 3 & 4 & 7 & 0 \end{pmatrix}$

Find a tour by DTH. Compare the tours. Can we apply Theorem 9.2.2 to the instances above? Why?

Figure 9.4: NN is of domination number 1 for the ATSP

Question 9.4.2. Prove that for the Euclidean TSP, if a tour T is the best in its 2-Opt neighbourhood, then T does not have self-intersections.

Question 9.4.3. Describe the Double Tree Heuristic (DTH) for the Symmetric TSP. Illustrate DTH using an instance of the STSP with 5 vertices (given by the cost matrix).

Question 9.4.4. Prove that, for the STSP with triangle inequality, the Double Tree Heuristic always produces a tour of cost at most twice the cost of the optimal tour.

Question 9.4.5. Define the domination number of a heuristic. What does it mean that a heuristic has domination number 1 for the problem at hand?

Question 9.4.6. Prove that for the Assignment Problem, the greedy algorithm has domination number 1.

Question 9.4.7. (For interested students) Prove that NN is of domination number 1 for the Asymmetric TSP. Hint: Consider an instance of the ATSP with vertices $1, 2, \ldots, n$ and arc costs given as follows: $cost(i, i + 1) = in$ for $1 \le i \le n - 1$, $cost(n, 1) = n^3$ and $cost(i, j) = n \times \min\{i, j\} + 1$ for any arc $(i, j)$ whose cost is not defined earlier. Prove that NN will choose the tour $(1, 2, 3, \ldots, n, 1)$ and that this tour is the unique most expensive tour. See Figure 9.4.

Question 9.4.8. Let $b = 1$ and

$L = (\frac{1}{2}, \frac{1}{8}, \frac{1}{2}, \frac{1}{8}, \frac{1}{2}, \frac{1}{8}, \frac{1}{2}, \frac{1}{8}, \frac{1}{2}, \frac{1}{8}, \frac{1}{2}, \frac{1}{8}, \frac{1}{2}, \frac{1}{8}, \frac{1}{2}, \frac{1}{8})$.

What is the minimum number of bins required and what is the number of bins used by Next Fit? Justify your answers.

Chapter 10

Advanced Local Search and Meta-heuristics

10.1 Advanced Local Search Techniques

Most improvement local search algorithms have the advantage of being generally applicable and flexible, i.e., they require only a specification of solutions, a cost function, a neighbourhood function, and an efficient method of exploring a neighbourhood, all of which can be readily obtained for most optimisation problems. Nevertheless, poor local optima are found in several cases. To remedy this drawback, while maintaining the paradigm of neighbourhood search, many researchers have been investigating the possibilities of broadening the scope of local search to obtain neighbourhood search algorithms that can find high-quality solutions in possibly low running times.

A straightforward extension of local search would be to run a simple local search algorithm a number of times using different start solutions and to keep the best solution found as the final solution. Several researchers have investigated this multistart approach, but no major successes have been reported.

Instead of restarting from an arbitrary solution, one could consider an approach that applies multiple runs of a local search algorithm by combining several neighbourhoods, i.e., by restarting the search in one neighbourhood from a solution found in another one. Such approaches are called multilevel. An example is iterated local search, in which the start solutions of subsequent local searches are obtained by modifying the local optima of a previous run. Examples for the Symmetric TSP are iterated Lin-Kernighan algorithms investigated by some researchers, which are currently the best known heuristics for the Symmetric TSP.

In addition, there are several approaches that take advantage of search strategies in which cost-deteriorating neighbours are accepted. We briefly describe these strategies in the following section.

10.2 Meta-heuristics

All of the approaches below are together called meta-heuristics as they are very general and can be applied to a vast number of different optimisation problems. The approaches model certain kinds of optimisation in Nature.

10.2.1 Simulated Annealing

Simulated annealing (SA) is based on an analogy with the physical process of annealing, in which a pure lattice structure of a solid is made by heating up the solid in a heat bath until it melts, then cooling it down slowly until it solidifies into a low-energy state. From the point of view of combinatorial optimisation, simulated annealing is a randomized neighbourhood search algorithm. In addition to better-cost neighbours, which are always accepted if they are selected, worse-cost neighbours are also accepted, although with a probability that is gradually decreased in the course of the algorithm's execution. The lowering of the acceptance probability is controlled by a set of parameters whose values are determined by a cooling schedule.

SA has been widely used with considerable success. The randomized nature enables asymptotic convergence to optimal solutions under certain mild conditions. Unfortunately, the convergence typically requires exponential time, making simulated annealing impractical as a means of obtaining optimal solutions. Instead, like most local search algorithms, it is normally used as an approximation algorithm, for which much faster convergence rates are acceptable.
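A minimal Python sketch of this scheme is given below. The notes do not fix an acceptance rule or a cooling schedule, so the sketch assumes the commonly used rule of accepting a worse neighbour with probability exp(-delta/t) and a geometric cooling schedule; the function and parameter names, and the toy instance, are hypothetical.

```python
import math
import random

def simulated_annealing(cost, random_neighbour, start, t0=10.0, alpha=0.95,
                        steps=2000, seed=0):
    """Return the best solution seen during one annealing run (minimisation)."""
    rng = random.Random(seed)
    current = best = start
    t = t0
    for _ in range(steps):
        candidate = random_neighbour(current, rng)
        delta = cost(candidate) - cost(current)
        # Neighbours that are not worse are always accepted; worse ones are
        # accepted with probability exp(-delta / t), which shrinks as t falls.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
            if cost(current) < cost(best):
                best = current
        t = max(t * alpha, 1e-9)  # assumed geometric cooling, floored above 0
    return best

# Toy minimisation instance (made up): positions 0..7 on a line with these costs.
values = [5, 3, 4, 6, 2, 7, 1, 8]

def step_left_or_right(i, rng):
    """Random neighbour: move one position left or right, staying in range."""
    return min(len(values) - 1, max(0, i + rng.choice((-1, 1))))

best = simulated_annealing(lambda i: values[i], step_left_or_right, start=0)
print(best, values[best])
```

Because the best solution seen so far is stored separately from the current one, accepting worse neighbours never makes the returned solution worse than the start solution.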

10.2.2 Genetic Algorithms

Genetic algorithms use concepts from population genetics and evolution theory to construct algorithms that try to optimise the fitness of a population of elements through recombination and mutation of their genes. There are many variations known in the literature of algorithms that follow these concepts. As an example, we discuss genetic local search, also called the Memetic Algorithm. The general idea of genetic local search is explained below.

• Step 1, initialize. Construct an initial population of n solutions.

• Step 2, improve. Use local search to replace the n solutions in the population by n local optima.

• Step 3, recombine. Select pairs of individuals from the population to participate in recombination. Augment the population by adding the m offspring solutions; the population size now equals n + m.

• Step 4, improve. Use local search to replace the m offspring solutions by m local optima.

• Step 5, select. Reduce the population to its original size by selecting n solutions from the current population.

• Step 6, evolve. Repeat Steps 3-5 until a stop criterion is satisfied.

i         String x_i   Fitness f_i = f(x_i)   Selection probability f_i / Σ f_i
1         11000        48                     0.381
2         00101        10                     0.079
3         10110        44                     0.349
4         01100        24                     0.191
Sum                    126                    1.000
Average                31.5                   0.250
Max                    48                     0.381

Table 10.1: Four strings and their fitness values.

Evidently, recombination is an important step, since here one must try to take advantage of the fact that more than one local optimum is available, i.e., one must exploit the structure present in the available local optima. The above scheme has been applied to several problems, and good results have been reported. The general class of genetic algorithms contains many other approaches that are significantly different from the scheme above.

To illustrate the main ideas, we will consider an example (due to Stephen D. Scott) of a genetic local search applied to a simple optimization problem. To make the example even simpler, we will skip Steps 2 and 4 of the general description, which involve the local search part of the heuristics.

Consider strings with five bits 0 or 1, $x = x^5x^4x^3x^2x^1$, where $x^j = 0$ or 1, for $j = 1, 2, 3, 4, 5$, and an objective function $f(x) = 2x^1 + 4x^2 + 8x^3 + 16x^4 + 32x^5$ (the $j$'s stand for superscripts, not powers). For example, $x = 10101$ is such a string, having objective function value $f(x) = 2 + 8 + 32 = 42$. The goal is to maximize the objective function $f(x)$ over all such strings.

Now imagine a population of the four strings in Table 10.1, generated at random. The fitness values come from the objective function $f(x)$.

After reprod.   Parents   Parent strings   Crossover point   After crossover   Fitness f_i = f(x_i)
x_5             x_1       11|000           2                 11110             60
x_6             x_3       10|110           2                 10000             32
x_7             x_1       1|1000           1                 11100             56
x_8             x_4       0|1100           1                 01000             16
Sum                                                                            164
Average                                                                        41
Max                                                                            60

Table 10.2: The next generation after selection and crossover.

The values in the "f_i / Σ f_i" column provide the probability of each string's selection. So initially 11000 has a 38.1% chance of selection, 00101 has a 7.9% chance, and so on. Based on these probabilities, we randomly select two pairs of strings to be recombined; suppose that the random selection produces the pairs $(x_1, x_3)$ and $(x_1, x_4)$.

After selecting the pairs, the genetic algorithm looks at each pair individually. For each pair (e.g. $x_1 = A = 11000$ and $x_3 = B = 10110$), the algorithm decides whether or not to perform crossover recombination. If it does not, then both strings in the pair are placed into the population with possible mutations (described below). If it does, then a random crossover point is selected and crossover proceeds as follows: $A = 11|000$ and $B = 10|110$ are crossed and become $A' = 11110$ and $B' = 10000$. Then the children $A'$ and $B'$ are placed in the population with possible mutations.

The genetic algorithm invokes the mutation operator on the new strings very rarely (usually with probability of the order of 0.01 or less), generating a random number for each bit and flipping that particular bit only if the random number is less than or equal to the mutation probability.

After the current generation's selections, crossovers, and mutations are complete, the new strings representing the next generation are shown in Table 10.2. In this example, average fitness increased from one generation to the next by approximately 30% and maximum fitness increased by 25%.

Reducing the population to its original size 4, we select the best solutions x1,x3,x5,x7 from the current population, see Table 10.3. This finishes one iteration of the algorithm. The simple process would continue for several generations until a stopping criterion is met.
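The iteration just described can be replayed in a few lines of Python. In this sketch the pairs and crossover points are fixed to those of the worked example instead of being drawn at random, and mutation is left out; the function names are illustrative assumptions.

```python
def fitness(x):
    """f(x) = 2x^1 + 4x^2 + 8x^3 + 16x^4 + 32x^5 for the string x = x^5 x^4 x^3 x^2 x^1."""
    return sum(int(bit) << j for j, bit in enumerate(reversed(x), start=1))

def crossover(a, b, point):
    """One-point crossover: the two strings exchange everything after `point` bits."""
    return a[:point] + b[point:], b[:point] + a[point:]

population = ["11000", "00101", "10110", "01100"]  # x_1, ..., x_4 of Table 10.1
total = sum(fitness(x) for x in population)
selection_prob = [fitness(x) / total for x in population]

# Pairs (x_1, x_3) and (x_1, x_4) with crossover points 2 and 1, as in Table 10.2
offspring = []
for a, b, point in [(0, 2, 2), (0, 3, 1)]:
    offspring.extend(crossover(population[a], population[b], point))

# Step 5: keep the 4 fittest strings among parents and offspring
survivors = sorted(population + offspring, key=fitness, reverse=True)[:4]

print(total, [round(p, 3) for p in selection_prob])
print(offspring, [fitness(x) for x in offspring])
print(survivors)
```

In a full implementation the pairs would be sampled according to `selection_prob` (for example with `random.choices`), and each offspring bit would be flipped with a small mutation probability.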

Question 10.2.1. Give a brief description of genetic local search.

10.2.3 Tabu Search

Tabu search "models" intelligent human argumentation while searching for a good solution.

i         String x_i   Fitness f_i = f(x_i)   Selection probability f_i / Σ f_i
1         11000        48                     0.231
3         10110        44                     0.212
5         11110        60                     0.288
7         11100        56                     0.269
Sum                    208                    1.000
Average                52.0                   0.250
Max                    60                     0.288

Table 10.3: The population after one iteration.

Tabu search combines the deterministic iterative improvement algorithm with a possibility to accept cost-increasing solutions. In this way the search is directed away from local minima (we are speaking of minimisation problems; a similar approach applies to maximisation problems), such that other parts of the search space can be explored. The next solution visited is always chosen to be a legal neighbour of the current solution with the best cost, even if that cost is worse than that of the current solution. The set of legal neighbours is restricted by a tabu list designed to prevent us from going back to recently visited solutions. The tabu list is dynamically updated during the execution of the algorithm. The tabu list defines solutions that are not acceptable in the next few iterations. However, a solution on the tabu list may be accepted if its quality is in some sense good enough, in which case it is said to attain a certain aspiration level.

Tabu search has also been applied to a large variety of problems with considerable success. Tabu search is a general search scheme that must be tailored to the details of the problem at hand. Unfortunately, there is little theoretical knowledge that guides this tailoring process, and users have to resort to the available practical experience.

We will now consider a basic algorithm of tabu search, where only so-called short term memory is used and no aspiration criteria are used. We will describe the basic algorithm applied to the so-called k-tree problem. In this problem, we are given a positive integer $k$ and a graph $G$ with costs on its edges. We are required to compute a minimum cost subgraph of $G$ which is a tree with exactly $k$ edges.

Consider the graph $G$ depicted in Figure 10.1. We will try to solve the 3-tree problem for $G$. We start from an initial feasible solution $T_0$ consisting of the edges $\{1, 2\}$, $\{1, 5\}$ and $\{1, 6\}$. The cost of $T_0$ is 22, see Figure 10.2 (a). No edge of $T_0$ is tabu.

Figure 10.1: Graph G.

Now we will move to another solution by deleting a non-tabu edge in $T_0$ and adding to the remaining graph another non-tabu edge such that the result is a tree. (This way, we have a so-called swap neighbourhood.) We will make such a move that brings us the cheapest possible solution. It is easy to see that we will obtain $T_1$ with edges $\{1, 5\}$, $\{1, 6\}$, $\{2, 6\}$ and of cost 14. See Figure 10.2 (b). We have deleted $\{1, 2\}$ and make it tabu for 2 iterations: we cannot add $\{1, 2\}$ back during the next two iterations. In other words, the tabu time for deleted edges is fixed to 2. We have added $\{2, 6\}$ and make it tabu for 1 iteration: we cannot delete $\{2, 6\}$ during the next iteration. In other words, the tabu time for added edges is fixed to 1.
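The swap move and the tabu bookkeeping can be sketched in Python as follows. The edge costs in `COST` are assumptions: they form a hypothetical nine-edge graph chosen here so that the start solution costs 22 and the first move yields cost 14, as in the text; it is not claimed to be the graph of Figure 10.1. No aspiration criterion is modelled, and all names are illustrative.

```python
# Hypothetical edge costs (assumed for illustration, not read from Figure 10.1)
COST = {
    (1, 2): 11, (2, 3): 9, (3, 4): 10,
    (1, 5): 1, (1, 6): 10, (2, 6): 3,
    (5, 6): 14, (6, 7): 4, (7, 8): 5,
}

def is_tree(edges):
    """True if the edges are connected and span exactly len(edges) + 1 vertices."""
    vertices = {v for e in edges for v in e}
    if len(vertices) != len(edges) + 1:
        return False
    seen, stack = set(), [next(iter(vertices))]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(w for e in edges if v in e for w in e if w != v)
    return seen == vertices

def tabu_search(start, iterations, del_tenure=2, add_tenure=1):
    """Basic tabu search with the swap neighbourhood; returns (cost, tree) per move."""
    current = set(start)
    no_add, no_del = {}, {}  # edge -> last iteration in which it is still tabu
    history = []
    for it in range(1, iterations + 1):
        moves = []
        for out in current:
            if no_del.get(out, 0) >= it:
                continue  # recently added edge: may not be deleted yet
            for inn in COST:
                if inn in current or no_add.get(inn, 0) >= it:
                    continue  # already present, or recently deleted edge
                candidate = (current - {out}) | {inn}
                if is_tree(candidate):
                    moves.append((sum(COST[e] for e in candidate), out, inn))
        if not moves:
            break
        cost, out, inn = min(moves)  # cheapest legal neighbour, even if worse
        current = (current - {out}) | {inn}
        no_add[out] = it + del_tenure
        no_del[inn] = it + add_tenure
        history.append((cost, sorted(current)))
    return history

history = tabu_search([(1, 2), (1, 5), (1, 6)], iterations=3)
for cost, tree in history:
    print(cost, tree)
```

Storing, for each edge, the last iteration in which it is tabu avoids maintaining an explicit list that has to be pruned at every step.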

Observe that no swap of an edge in $T_1$ can produce a solution cheaper than $T_1$. In other words, improvement local search in the swap neighbourhood of $T_1$ will not be able to find a better solution, i.e., $T_1$ is a local minimum and improvement local search would be stuck in it.

Since edge $\{1, 2\}$ is tabu, our next move gives us $T_2$ consisting of edges $\{1, 6\}$, $\{2, 6\}$, $\{6, 7\}$ of cost 17. (The increase of cost is not surprising in the light of the previous paragraph.)

In the next iteration, we can delete $\{1, 6\}$ and add $\{7, 8\}$. We get a solution of cost 12, see Figure 10.2 (d). The results of the last two iterations are depicted in Figure 10.2 (e) and (f). Interestingly, the last solution is optimal.

Figure 10.2: Iterations of the basic tabu search algorithm.

One can modify the basic tabu search algorithm by adding aspiration criteria. One of the most used aspiration criteria is the record-improving one. Here we may add or delete tabu edges as long as we get a solution of cost lower than the best known one.

Question 10.2.2. Consider the example for the 3-tree problem above. Replace the basic tabu search by that with the record-improving aspiration criterion. Will any solution be changed?

Question 10.2.3. Consider the graph G in Figure 10.1. (a) Apply the nearest neighbour algorithm to compute an initial solution for the 2-tree problem on G. (b) Do five iterations of the basic tabu search algorithm to improve the initial solution. The tabu times for deleted and added edges are 2 and 1, respectively. (c) Replace the basic tabu search by that with the record-improving aspiration criterion. Will any solution be changed?

Figure 10.3: Graph H.

Question 10.2.4. Consider the graph H in Figure 10.3. (a) Apply the nearest neighbour algorithm to compute an initial solution for the 2-tree problem on H. (b) Do five iterations of the basic tabu search algorithm to improve the initial solution. The tabu times for deleted and added edges are 2 and 1, respectively. (c) Replace the basic tabu search by that with the record-improving aspiration criterion. Will any solution be changed?