Norwegian University of Science and Technology

TPK4161 - Supply chain analytics

Author: Jørn Vatn

September 10, 2020

Contents

1 Introduction
  1.1 Course content
  1.2 Learning outcome

2 Probability theory
  2.1 Basic probability notation
    2.1.1 Event
    2.1.2 Probability
    2.1.3 Probability and Kolmogorov's axioms
    2.1.4 The law of total probability
    2.1.5 Bayes theorem
    2.1.6 Stochastic variables
  2.2 Common probability distributions
    2.2.1 The normal distribution
    2.2.2 The exponential distribution
    2.2.3 The Weibull distribution
    2.2.4 The gamma distribution
    2.2.5 The inverted gamma distribution
    2.2.6 The lognormal distribution
    2.2.7 The binomial distribution
    2.2.8 The Poisson distribution
    2.2.9 The inverse-Gauss distribution
    2.2.10 The triangular distribution
    2.2.11 The PERT distribution
  2.3 Assessment of parameters in parametric distributions
  2.4 Distribution of sums, products and maximum values
    2.4.1 Distribution of sums
    2.4.2 Distribution of a product
    2.4.3 Distribution of maximum values

3 Introduction to modelling
  3.1 Deterministic and probabilistic models
  3.2 Problem formulation - Modelling

4 Discrete event simulation
  4.1 Introduction
  4.2 Components of a Discrete-Event Simulation
  4.3 Simulation Engine Logic
  4.4 Implementing the pending event set (PES)
  4.5 Library of functions for generating pseudorandom numbers
  4.6 A simple failure and repair model

5 Linear, dynamic, non-linear and stochastic programming
  5.1 Introduction to programming problems
  5.2 Linear programming
    5.2.1 Motivating example
    5.2.2 Linear programming problem on standard form
    5.2.3 Solving the linear programming problem by the SIMPLEX method
    5.2.4 Demonstration of the SIMPLEX method
    5.2.5 Summing up the SIMPLEX method
    5.2.6 Unique optimal, multiple optimal and unbounded solutions
    5.2.7 Shadow prices
    5.2.8 Solving the LP problem by a computer
    5.2.9 Mixed integer programming
  5.3 Dynamic programming
    5.3.1 Worked example
  5.4 Nonlinear programming
  5.5 Stochastic programming
    5.5.1 Introduction
    5.5.2 Discretization
    5.5.3 The Value of the Stochastic Solution
    5.5.4 The Expected Value of Perfect Information
    5.5.5 Scenario building
    5.5.6 How to perform discretization?

6 Flow and network modelling
  6.1 Transportation problems
    6.1.1 Worked example in MS Excel
  6.2 Job assignment problems
    6.2.1 Maximal-flow problems
    6.2.2 A worked maximal-flow problem in MS Excel
  6.3 Project management
  6.4 Critical Path Method (CPM)
  6.5 Linear programming (LP)
    6.5.1 Slack
  6.6 Program Evaluation and Review Technique (PERT)
  6.7 Successive schedule planning (SSP)
  6.8 Monte Carlo simulation (MCS)
  6.9 Penalty for default

7 Markov processes and queueing theory
  7.1 Markov processes
    7.1.1 Markov state equations
    7.1.2 Time dependent solution for the Markov process
    7.1.3 Steady state solution for the Markov process
    7.1.4 Mean time to first passage to a given state
  7.2 Birth-death processes
  7.3 Queue theory models
    7.3.1 The M/M/1 queue
    7.3.2 The M/M/1/N queue
    7.3.3 The M/M/C/N queue
    7.3.4 The M/Ek/1/N queue
    7.3.5 Final remarks

8 Inventory models
  8.1 Introduction
  8.2 The classical economic order quantity
  8.3 Probabilistic models
    8.3.1 The newsboy problem
    8.3.2 A lot size, reorder point policy; (r, Q)

9 Reliability and maintenance
  9.1 Definitions
  9.2 Reliability terminology
    9.2.1 System structure analysis
    9.2.2 Systems of independent components
  9.3 Maintenance model - Single component considerations
    9.3.1 Preventive maintenance
    9.3.2 Single activity - Calendar based maintenance
    9.3.3 Synchronization of maintenance and production
    9.3.4 Predictive maintenance
    9.3.5 Predictive maintenance and cyber physical systems

10 Decision under uncertainties
  10.1 Introduction
    10.1.1 Overview of the method
  10.2 Basic concepts
    10.2.1 Discrete end consequences vs attribute vector
    10.2.2 Maximising expected utility
    10.2.3 Examples with one decision node
    10.2.4 Decision trees

11 Parameter estimation
  11.1 Introduction
  11.2 The MLE principle
  11.3 Method of moments – PERT distribution
  11.4 The LS principle
  11.5 Bayesian methods

12 Forecasting
  12.1 Introduction
  12.2 Naïve approach
  12.3 Average approach
  12.4 Moving average method
  12.5 Simple exponential smoothing method
  12.6 Holt's method
  12.7 Holt-Winters additive method
  12.8 Holt-Winters multiplicative method

Bibliography

Index

Chapter 1

Introduction

This course compendium covers the main aspects of the topics in the course TPK4161 - Supply Chain Analytics. TPK4161 was first lectured in the autumn of 2017. Excel files used in the course will be uploaded on Blackboard.

1.1 Course content

Introduction to mathematical modelling as a tool to address challenges in production logistics and supply chains. Problem formulation and choice of modelling. Linear, dynamic and non-linear programming. Flow and network modelling. Queueing models and Markov chains. Some analytical results and use of discrete event simulation. Monte Carlo simulation. Systems dynamics modelling. Forecasting. Statistical process control. Reliability and maintenance of the production line. Synchronization of maintenance and production activities. Multivariate regression analysis for analysis of performance data. Statistical techniques for estimation of model parameters. Machine learning and Big Data. Quantitative methods for artificial intelligence. Methods for economic analysis, especially activity-based costing. Models and visualization of cyber-physical systems in real time. Decision trees. Expected utility theory.

1.2 Learning outcome

Knowledge: Basic insight into the mathematical formulation of operations and supply chain management problems. Ability to analyse and understand real problems in order to develop realistic models where textbook examples are not sufficient. Understanding of the strengths and weaknesses of various modelling approaches.

Skills: Ability to perform practical quantitative analysis related to operations and supply chain management. The students will be trained in using easily available tools such as MS Excel with Visual Basic programming, as well as dedicated tools for system dynamics and discrete event simulation.

General competence: Understanding of the relevance of quantitative analysis related to operations and supply chain management, and its implications for competence requirements in the companies.

Chapter 2

Probability theory

2.1 Basic probability notation

In this chapter the basic elements of probability theory are reviewed. Readers familiar with probability theory can skip this chapter. Readers who are unfamiliar with the topic are advised to read an introductory textbook in probability theory.

2.1.1 Event

In order to define probability, we need to work with events. Let as an example A be the event that there is an operator error in a control room. This is written:

A = {operator error}

An event may occur, or not. We do not know the outcome in advance of the experiment or situation in "real life". We also use the word event to denote a set of distinct events, for example the event that we get an even number when throwing a die.

2.1.2 Probability

When events are defined, the probability that the event occurs is of interest. Probability is denoted by Pr(·), i.e.

Pr(A) = Probability that A occurs

The numeric value of Pr(A) may be found by:

• Studying the sample space.

• Analysing collected data.

• Looking up the value in data handbooks.

• “Expert judgement” [11].

The sample space defines all possible events. As an example let A = {It is Sunday}, B = {It is Monday}, ..., G = {It is Saturday}. The sample space is then given by S = {A, B, C, D, E, F, G}. So-called Venn diagrams are useful when we want to analyse a subset of the sample space S. A rectangle represents the entire sample space, and closed curves such as circles are used to represent subsets of the sample space, as illustrated in Figure 2.1. In the following we will illustrate frequently used combinations of events:

Figure 2.1: Venn diagram

Union. We write A ∪ B to denote the union of A and B, i.e., the occurrence of A or B or (A and B). Let A be the event that tossing a die results in a "six", and B be the event that we get an odd number of eyes. We then have A ∪ B = {1, 3, 5, 6}.

Intersection. We write A ∩ B to denote the intersection of A and B, i.e. the occurrence of both A and B. As an example, let A be the event that a project is not completed in due time, and let B be the event that the budget limits are exceeded. A ∩ B then represents the situation that the project is not completed in due time and the budget limits are exceeded.

Disjoint events. A and B are said to be disjoint if they cannot occur simultaneously, i.e. A ∩ B = Ø = the empty set. Let A be the event that tossing a die results in a "six", and B be the event that we get an odd number of eyes. A and B are disjoint since they cannot occur simultaneously, and we have A ∩ B = Ø.

Complementary events. The complement of an event A is all events in the sample space S except for A. The complement of A is denoted by A^C. Let A be the event that tossing a die results in an odd number of eyes. A^C is then the event that we get an even number of eyes.

2.1.3 Probability and Kolmogorov’s axioms

Probability is a set function Pr(·) which maps events A1, A2, ... in the sample space S to real numbers. The function Pr(·) can only take values in the interval from 0 to 1, i.e. values greater than or equal to 0 and less than or equal to 1. Kolmogorov established the following axioms, from which all probability rules can be derived:

Figure 2.2: Mapping of events on the interval [0, 1]

1. 0 ≤ Pr(A)

2. Pr(S) = 1

3. If A1, A2, A3, ... is a sequence of disjoint events, we shall then have: Pr(A1 ∪ A2 ∪ ...) = Pr(A1) + Pr(A2) + ...

The axioms are the basis for establishing calculation rules when dealing with probabilities, but they do not help us in establishing numerical values for the basic probabilities Pr(A1), Pr(A2), etc. Historically two lines of thought have been established, the classical (frequentist) and the Bayesian approach. In the classical thinking we introduce the concept of a random experiment, where Pr(Ai) is the relative frequency with which the event Ai occurs. The probability can then be interpreted as a property of the experiment, or a property of the world. By letting nature reveal itself through experiments, we could in principle establish all probabilities that are of interest. Within the Bayesian framework probabilities are interpreted as subjective beliefs about whether Ai will occur or not. Probability is then not a property of the world, but rather a measure of the knowledge and understanding we have about a phenomenon. Before we set up the basic rules for probability theory that we will need, we introduce the concepts of conditional probability and independent events.

Conditional probability. Pr(A|B) denotes the conditional probability that A will occur given that B has occurred.

Independent events. A and B are said to be independent if information about whether B has occurred does not influence the probability that A will occur, i.e., Pr(A|B) = Pr(A).

Basic rules for probability. The following calculation rules for probability apply:

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)   (2.1)

Pr(A ∩ B) = Pr(A) · Pr(B) if A and B are independent   (2.2)

Pr(A^C) = Pr(A does not occur) = 1 − Pr(A)   (2.3)

Pr(A|B) = Pr(A ∩ B) / Pr(B)   (2.4)

Example 2.1 Let the two events A and B be defined by A = {It is Sunday} and B = {It is between 6 and 8 pm}. First we note that A and B are independent but not disjoint. We will find Pr(A ∩ B), Pr(A ∪ B) and Pr(A|B):

Pr(A ∩ B) = Pr(A) · Pr(B) = (1/7) · (2/24) = 1/84

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) = 1/7 + 2/24 − 1/84 = 9/42

Pr(A|B) = Pr(A ∩ B) / Pr(B) = (1/84) / (2/24) = 1/7

♢
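The numbers in Example 2.1 can be checked with a few lines of code. The sketch below (Python is used here purely for illustration; the course itself relies on MS Excel with Visual Basic) evaluates Equations (2.1), (2.2) and (2.4) with exact fractions:

```python
from fractions import Fraction

# Events from Example 2.1: A = {It is Sunday}, B = {It is between 6 and 8 pm}.
p_A = Fraction(1, 7)    # one day out of seven
p_B = Fraction(2, 24)   # two hours out of twenty-four

# A and B are independent, so Equation (2.2) gives the intersection:
p_A_and_B = p_A * p_B

# Equation (2.1): the addition rule for the union.
p_A_or_B = p_A + p_B - p_A_and_B

# Equation (2.4): conditional probability.
p_A_given_B = p_A_and_B / p_B

print(p_A_and_B)    # 1/84
print(p_A_or_B)     # 3/14 (= 9/42)
print(p_A_given_B)  # 1/7
```

Working with Fraction avoids rounding errors, so the results can be compared directly with the hand calculation.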

2.1.4 The law of total probability

In many situations it is easier to assess the probability of an event B conditionally on some other events, say A1, A2, ..., Ar, than unconditionally. The law of total probability can then be used to assess the unconditional probability. Now, we say that A1, A2, ..., Ar is a division of the sample space if the union of all Ai's covers the entire sample space, i.e. A1 ∪ A2 ∪ ... ∪ Ar = S, and the Ai's are pairwise disjoint, i.e. Ai ∩ Aj = Ø for i ≠ j. An example is shown in Figure 2.3.

Figure 2.3: Division of the sample space

Let A1,A2, . . ., Ar represent a division of the sample space S, and let B be an arbitrary event in S. The law of total probability now states:

Pr(B) = ∑_{i=1}^{r} Pr(Ai) · Pr(B|Ai)   (2.5)

Example 2.2 A special component type is ordered from two suppliers A1 and A2. Experience has shown that components from supplier A1 have a defect probability of 1%, whereas components from supplier A2 have a defect probability of 2%. On average 70% of the components are provided by supplier A1. Assume that all components are put on a common stock, and that we are not able to trace the supplier of a component in the stock. A component is now fetched from the stock, and we will calculate the defect probability, Pr(B):

Pr(B) = ∑_{i=1}^{r} Pr(Ai) · Pr(B|Ai) = Pr(A1) · Pr(B|A1) + Pr(A2) · Pr(B|A2) = 0.7 · 0.01 + 0.3 · 0.02 = 1.3%
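Once the division of the sample space is stored as lists, the sum in Equation (2.5) is a one-liner. A sketch (in Python, for illustration) reproducing Example 2.2:

```python
# Law of total probability (2.5) applied to Example 2.2:
# supplier shares Pr(A_i) and conditional defect probabilities Pr(B|A_i).
p_supplier = [0.7, 0.3]
p_defect_given_supplier = [0.01, 0.02]

p_defect = sum(p_a * p_b for p_a, p_b in zip(p_supplier, p_defect_given_supplier))
print(round(p_defect, 3))  # 0.013
```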

2.1.5 Bayes theorem

Now consider the example above, and assume that we have got a defect component from the stock (event B). We will derive the probability that the component originates from supplier A1. We then use Bayes' formula, which states that if A1, A2, ..., Ar represent a division of the sample space, and B is an arbitrary event, then:

Pr(Aj|B) = Pr(B|Aj) · Pr(Aj) / ∑_{i=1}^{r} Pr(Ai) · Pr(B|Ai)   (2.6)

Example 2.3 We have

Pr(A1|B) = Pr(B|A1) · Pr(A1) / ∑_{i=1}^{r} Pr(Ai) · Pr(B|Ai) = 0.01 · 0.7 / 0.013 = 0.54

Thus, the probability of A1 is reduced from 0.7 to 0.54 when we know that the component is defect. The reason is that components from supplier A1 are the best ones; hence, when we know that the component is defect, it is less likely that it came from supplier A1. ♢
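Bayes' formula (2.6) reuses the denominator from the law of total probability. A sketch (Python, for illustration) reproducing Example 2.3:

```python
# Bayes' theorem (2.6): posterior probability of supplier A1 given a defect.
p_A = [0.7, 0.3]            # prior supplier shares Pr(A_i)
p_B_given_A = [0.01, 0.02]  # defect probability per supplier Pr(B|A_i)

# Denominator: the law of total probability, Equation (2.5).
p_B = sum(pa * pb for pa, pb in zip(p_A, p_B_given_A))

posterior_A1 = p_B_given_A[0] * p_A[0] / p_B
print(round(posterior_A1, 2))  # 0.54
```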

2.1.6 Stochastic variables

Stochastic variables are used to describe quantities which cannot be predicted exactly. Note that the term 'random quantity' is often used to denote a stochastic variable.

X is stochastic ⇔ Cannot say precisely the value X has or will take

To be more precise, a stochastic variable X is a real valued function that assigns a quantitative measure to each event ei in the sample space S. Often the underlying events, ei are of little interest. We are only interested in the stochastic variable X measured by some means. Examples of stochastic variables are given below:

• X = Life time of a component (continuous)

• R = Repair time after a failure (continuous)

• T = Duration of a construction project (continuous)

• C = Total cost of a renewal project (continuous)

• M = Number of delayed trains next month (discrete)

• N = Number of customers arriving today (discrete)

• S = Service time for the first customer arriving today (continuous)

• W = Maintenance and operational cost next year (continuous)

Remark: We distinguish between continuous and discrete stochastic variables. Continuous stochastic variables can take any value among the real numbers, whereas discrete variables can take only a finite (or countably infinite) number of values. ♢

Cumulative distribution function (CDF). A stochastic variable X is characterized by its cumulative distribution function:

FX (x) = Pr(X ≤ x) (2.7)

We use the subscript X to emphasise that FX(x) is the cumulative distribution function of the quantity X. The argument (lowercase x) states which values the stochastic variable X could take, or which value is of interest. From the expression we observe that FX(x) states the probability that the random quantity X is less than or equal to (the numeric value of) x. A typical distribution function is shown in Figure 2.4. Note that the distribution function is non-decreasing, and 0 ≤ FX(x) ≤ 1. From FX(x) we can obtain the probability that X will fall within a specified interval (a, b]:

Pr(a < X ≤ b) = FX (b) − FX (a) (2.8)

Figure 2.4: Cumulative distribution function, FX(x)

Note that the index X representing the stochastic variable is often dropped when it is obvious which stochastic variable we are working with. Note also the distinction between lowercase and uppercase letters. The uppercase X denotes the stochastic variable itself, for example the number of customers arriving next day. The lowercase x is just a representation of possible values X can take, for example x = 3.

Example 2.4 Assume that the cumulative distribution function of X is given by FX(x) = 1 − e^(−(0.01x)²), and we will find the probability that X is in the interval (100, 200]. From Equation (2.8) we have:

Pr(100 < X ≤ 200) = FX(200) − FX(100) = [1 − e^(−(0.01·200)²)] − [1 − e^(−(0.01·100)²)] = e^(−1) − e^(−4) ≈ 0.35
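Equation (2.8) turns an interval probability into a difference of two CDF values. A sketch (Python, for illustration) reproducing Example 2.4; the function name F is chosen only for the example:

```python
import math

# CDF from Example 2.4: F_X(x) = 1 - exp(-(0.01 x)^2).
def F(x):
    return 1.0 - math.exp(-(0.01 * x) ** 2)

# Equation (2.8): interval probability as a difference of CDF values.
p = F(200) - F(100)
print(round(p, 2))  # 0.35
```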

Probability density function (PDF). For a continuous stochastic variable, the probability density function is given by:

fX(x) = (d/dx) FX(x)   (2.9)

The probability density function expresses how likely the various x-values are. Note that for continuous random variables the probability that X will take a specific value vanishes. However, the probability that X will fall into a small interval around a specific value is positive. For each x-value given in Figure 2.5, fX(x) can be interpreted as the probability that X falls within a small interval around x, divided by the length of this interval.

Figure 2.5: Probability density function, fX(x)

In particular we have:

FX(x) = ∫_{−∞}^{x} fX(u) du   (2.10)

and

Pr(a < X ≤ b) = ∫_{a}^{b} fX(x) dx   (2.11)

The last expression is illustrated in Figure 2.6.

Figure 2.6: The shaded area equals Pr(a < X ≤ b)

Random quantities that take discrete values are said to be discretely distributed. For such quantities we introduce the point probability for X in the point xj:

p(xj) = Pr(X = xj) (2.12) where x1, x2,... are possible values X could take.

Expectation. The expectation (mean) of X is given by:

E(X) = ∫_{−∞}^{∞} x · fX(x) dx if X is continuous

E(X) = ∑_j xj · p(xj) if X is discrete   (2.13)

The expectation can be interpreted as the long-run average of X, if an infinite number of observations were available.

Median. The median of a distribution is the value m0 of the stochastic variable X such that Pr(X ≤ m0) ≥ 1/2 and Pr(X ≥ m0) ≥ 1/2. In other words, the probability at or below m0 is at least 1/2, and the probability at or above m0 is at least 1/2.

Mode. The mode of a distribution is the value M of the stochastic variable X such that the probability density function, or point probability, at M is higher than or equal to that at any other value of the stochastic variable. We sometimes use the term 'most likely value' rather than mode.

Variance. The variance of a random quantity expresses the variation in the value X will take in the long run. We denote the variance of X by:

Var(X) = ∫_{−∞}^{∞} [x − E(X)]² · fX(x) dx if X is continuous

Var(X) = ∑_j [xj − E(X)]² · p(xj) if X is discrete   (2.14)

Standard deviation. The standard deviation of X is given by:

SD(X) = +√Var(X)   (2.15)

The standard deviation defines an interval into which observations are likely to fall, i.e., if 100 observations are available, we expect that approximately¹ 67 of these observations fall in the interval [E(X) − SD(X), E(X) + SD(X)].

Precision. The precision, P, is the reciprocal of the variance, i.e. P = 1/Var(X).

α-percentiles. The upper α-percentile, xα, in a distribution FX(x) is the value satisfying α = Pr(X > xα) = 1 − FX(xα).

We end this section by giving some results regarding expectation and variance. These results apply when it is easier to express the expectation and variance of one variable by conditioning on the value of another variable.

Result 2.1 Double expectation Let X and Y be stochastic variables. We then have:

E(X) = E(E(X|Y))   (2.16)

Var(X) = E(Var(X|Y)) + Var(E(X|Y))   (2.17)

It follows easily that

E(X) = E(X|B) Pr(B) + E(X|B^C) Pr(B^C)   (2.18)

Var(X) = Var(X|B) Pr(B) + Var(X|B^C) Pr(B^C) + [E(X|B) − E(X)]² Pr(B) + [E(X|B^C) − E(X)]² Pr(B^C)   (2.19)
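Equations (2.18) and (2.19) can be verified numerically on any small discrete example. The sketch below (Python, for illustration) uses two hypothetical conditional distributions with equal-weight values, chosen only to demonstrate the identities, and checks that conditioning on B reproduces the mean and variance computed directly from the joint distribution:

```python
p_B = 0.3
# Hypothetical conditional distributions of X (equal weights within each list).
x_given_B = [10, 12]
x_given_Bc = [0, 2, 4]

def mean(xs): return sum(xs) / len(xs)
def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Direct computation from the joint distribution of X.
joint = [(p_B / len(x_given_B), x) for x in x_given_B] + \
        [((1 - p_B) / len(x_given_Bc), x) for x in x_given_Bc]
E_X = sum(p * x for p, x in joint)
Var_X = sum(p * (x - E_X) ** 2 for p, x in joint)

# Equations (2.18) and (2.19): condition on B and its complement.
E_via = mean(x_given_B) * p_B + mean(x_given_Bc) * (1 - p_B)
Var_via = (var(x_given_B) * p_B + var(x_given_Bc) * (1 - p_B)
           + (mean(x_given_B) - E_via) ** 2 * p_B
           + (mean(x_given_Bc) - E_via) ** 2 * (1 - p_B))

print(abs(E_X - E_via) < 1e-9 and abs(Var_X - Var_via) < 1e-9)  # True
```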

2.2 Common probability distributions

In this section we will present some common probability distributions. We write X ∼ dist(θ) to express that X follows the distribution class 'dist' with parameter vector θ. Sometimes we also use an abbreviation for the distribution; for example, we write X ∼ N(3, 4) to express that X is normally distributed with expectation 3 and variance 4.

1 This result is valid for the normal distribution. For other distributions there may be deviation from this result.

2.2.1 The normal distribution

X is said to be normally distributed if the probability density function of X is given by:

fX(x) = (1/(√(2π)·σ)) e^(−(x−µ)²/(2σ²))   (2.20)

where µ and σ are parameters that characterise the distribution. The mean and variance are given by:

E(X) = µ, Var(X) = σ²   (2.21)

The distribution function of X cannot be written in closed form; numerical methods are required to find FX(x). It is convenient to introduce a standardised normal distribution for this purpose. We say that U is standard normally distributed if its probability density function is given by:

fU(u) = ϕ(u) = (1/√(2π)) e^(−u²/2)   (2.22)

We then have

FU(u) = Φ(u) = ∫_{−∞}^{u} ϕ(t) dt = ∫_{−∞}^{u} (1/√(2π)) e^(−t²/2) dt   (2.23)

and we observe that the distribution function of U does not contain any parameters. We therefore only need one look-up table or function representing Φ(u). A look-up table is given in Table 2.1. To calculate probabilities in the non-standardised normal distribution we use the following result:

Result 2.2 If X is normally distributed with parameters µ and σ, then

U = (X − µ)/σ   (2.24)

is standard normally distributed. ♢

In many situations we are interested in calculating the "truncated expectation" ∫_{−∞}^{a} x f(x) dx. For the normal distribution the following result may be used:

Result 2.3 Let X be normally distributed with parameters µ and σ. We then have:

∫_{−∞}^{a} x f(x) dx = µ Φ((a − µ)/σ) − σ ϕ((a − µ)/σ)   (2.25)

where Φ(·) and ϕ(·) are the CDF and PDF of the standard normal distribution, respectively. ♢

To prove Equation (2.25), first substitute u = (x − µ)/σ, yielding ∫_{−∞}^{a} x f(x) dx = ∫_{−∞}^{(a−µ)/σ} (µ + σu) ϕ(u) du. The µϕ(u) part of the integral is directly given by the Φ(·) function, whereas for the σuϕ(u) part one may use that uϕ(u) = −ϕ′(u), so that ∫_{−∞}^{(a−µ)/σ} σu ϕ(u) du = −σϕ((a − µ)/σ). The result then follows.

Example 2.5 Calculation in the normal distribution Let X be normally distributed with parameters µ = 5 and σ = 3. We will find Pr(3 < X ≤ 6). We have:

Pr(3 < X ≤ 6) = Pr((3 − µ)/σ < (X − µ)/σ ≤ (6 − µ)/σ) = Pr((3 − 5)/3 < U ≤ (6 − 5)/3)
= Φ(1/3) − Φ(−2/3) = Φ(0.33) − (1 − Φ(0.67)) = 0.629 − 1 + 0.749 = 0.378

♢
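Φ(·) is available in most software; in Python, for instance, it can be built from the error function math.erf, which avoids a look-up table such as Table 2.1. A sketch reproducing Example 2.5:

```python
import math

def Phi(u):
    """Standard normal CDF expressed via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

mu, sigma = 5.0, 3.0
# Standardise with Result 2.2 and take the difference of Phi values.
p = Phi((6 - mu) / sigma) - Phi((3 - mu) / sigma)
print(round(p, 3))  # 0.378
```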

Problem 2.1 Consider Example 2.5, and carry out the calculation by means of the pRisk.xlsm program. ♢

Problem 2.2 Let X be the height of men in a population, and assume X is normally distributed with parameters µ = 181 and σ = 4. What percentage of the population is taller than 190 cm? ♢

2.2.2 The exponential distribution

X is said to be exponentially distributed if the probability density function of X is given by:

fX(x) = λ e^(−λx)   (2.26)

The cumulative distribution function is given by:

FX(x) = 1 − e^(−λx)   (2.27)

and the mean and variance are given by:

E(X) = 1/λ, Var(X) = 1/λ²   (2.28)

Note that for the exponential distribution, X will always be greater than 0. The parameter λ is often denoted the intensity of the distribution.

Example 2.6 We will obtain the probability that X is greater than its expected value. We have:

Pr(X > E(X)) = 1 − Pr(X ≤ E(X)) = 1 − FX(E(X)) = e^(−λE(X)) = e^(−1) ≈ 0.37

♢
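Example 2.6 illustrates that Pr(X > E(X)) = e^(−1) regardless of the intensity. A minimal numerical check (Python, for illustration; the value λ = 0.5 is arbitrary):

```python
import math

lam = 0.5            # an arbitrary intensity; the result holds for any lam > 0
mean = 1.0 / lam     # E(X) = 1/lambda, Equation (2.28)

# Pr(X > E(X)) = 1 - F_X(1/lam) = exp(-lam * (1/lam)) = exp(-1)
p = math.exp(-lam * mean)
print(round(p, 2))  # 0.37
```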

2.2.3 The Weibull distribution

X is said to be Weibull distributed if the probability density function of X is given by:

fX(x) = αλ(λx)^(α−1) e^(−(λx)^α)   (2.29)

The cumulative distribution function is given by:

FX(x) = 1 − e^(−(λx)^α)   (2.30)

and the mean and variance are given by:

E(X) = (1/λ) Γ(1/α + 1)

Var(X) = (1/λ²) [Γ(2/α + 1) − Γ²(1/α + 1)]   (2.31)

where Γ(·) is the gamma function. Note that in the Weibull distribution X will also always be positive.

2.2.4 The gamma distribution

X is said to be gamma distributed if the probability density function of X is given by:

fX(x) = (λ^α / Γ(α)) x^(α−1) e^(−λx)   (2.32)

α is denoted the shape parameter, whereas λ is denoted the scale parameter. For integer values of α the gamma distribution is often denoted the Erlang distribution. The cumulative distribution function can then be found in closed form:

FX(x) = 1 − e^(−λx) ∑_{n=0}^{α−1} (λx)^n / n!   (2.33)

For non-integer values of α numerical methods are required to obtain the cumulative distribution function. The mean and variance are given by:

E(X) = α/λ

Var(X) = α/λ²   (2.34)

If we know the expectation E and the variance V in the gamma distribution, we may obtain the parameters α and λ by λ = E/V and α = λ · E. The gamma distribution is often used as a prior distribution in a Bayesian approach. For integer values of α the gamma distribution, and in particular the Erlang distribution, may be seen as the distribution of a sum of exponentially distributed stochastic variables:

Result 2.4 Let Z1, Z2, ..., Zk be independent and exponentially distributed with parameter λ. The variable X = ∑_{i=1}^{k} Zi is then gamma distributed with shape parameter k and scale parameter λ.
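Result 2.4 can be checked by Monte Carlo simulation: sum k exponential variables and compare the empirical CDF with Equation (2.33). A sketch (Python, for illustration; the parameter values are chosen arbitrarily):

```python
import math
import random

random.seed(1)
k, lam = 3, 2.0  # shape and scale, chosen only for illustration

def erlang_cdf(x, k, lam):
    """Equation (2.33) for integer shape parameter k."""
    return 1.0 - math.exp(-lam * x) * sum((lam * x) ** n / math.factorial(n)
                                          for n in range(k))

# Monte Carlo: sum of k independent Exp(lam) variables (Result 2.4).
N = 200_000
x0 = 1.5
hits = sum(sum(random.expovariate(lam) for _ in range(k)) <= x0 for _ in range(N))
print(abs(hits / N - erlang_cdf(x0, k, lam)) < 0.01)  # True
```

With 200,000 replications the Monte Carlo standard error is about 0.001, so the empirical fraction and the analytical CDF should agree to well within the 0.01 tolerance.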

2.2.5 The inverted gamma distribution

X is said to be inverted gamma distributed if the probability density function of X is given by:

fX(x) = (λ^α / Γ(α)) (1/x)^(α+1) e^(−λ/x)   (2.35)

The mean and variance are given by:

E(X) = λ/(α − 1)

Var(X) = λ² (α − 1)^(−2) (α − 2)^(−1)   (2.36)

Note that if X is gamma distributed with parameters α and λ, then Y = X^(−1) has an inverted gamma distribution with parameters α and 1/λ. If we know the expectation E and the variance V of an inverted gamma distribution, we can obtain α and λ by α = E²/V + 2 and λ = E · (α − 1).

2.2.6 The lognormal distribution

X is said to be lognormally distributed if the probability density function of X is given by:

fX(x) = (1/(√(2π) τ x)) e^(−(ln x − ν)²/(2τ²))   (2.37)

We write X ∼ LN(ν, τ). The mean and variance of X are given by:

E(X) = e^(ν + τ²/2)

Var(X) = e^(2ν)(e^(2τ²) − e^(τ²))   (2.38)

The following result could be utilised:

Result 2.5 If X is lognormally distributed with parameters ν and τ, then Y = ln X is normally distributed² with expected value ν and variance τ². ♢

2.2.7 The binomial distribution

Before the binomial distribution is defined, binomial trials are defined. Let A be an event, and assume that the following holds:

i) n trials are performed, and in each trial we record whether A occurs or not.

ii) The trials are stochastically independent of each other.

iii) For each trial Pr(A) = p

When i)-iii) are satisfied, we say that we have binomial trials. Now let X be the number of times event A occurs in such a series of binomial trials. X is then a stochastic variable with a binomial distribution. This is written X ∼ Bin(n, p). The probability function is given by:

Pr(X = x) = (n choose x) · p^x · (1 − p)^(n−x) for x = 0, 1, 2, ..., n   (2.39)

The cumulative distribution function Pr(X ≤ x) is given in statistical tables. For the binomial distribution, expectation and variance are given by:

E(X) = np Var(X) = np(1 − p) (2.40)
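The moments in Equation (2.40) follow directly from the point probabilities in Equation (2.39). A small numerical check (Python, for illustration; n = 10 and p = 0.3 are arbitrary):

```python
from math import comb

n, p = 10, 0.3
# Point probabilities from Equation (2.39).
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(pmf))
var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
print(round(mean, 6), round(var, 6))  # 3.0 2.1  (i.e. n*p and n*p*(1-p))
```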

2.2.8 The Poisson distribution

The Poisson distribution is often appropriate in situations where the stochastic variable may take the values 0, 1, 2, ..., and where the expected number of occurrences is proportional to an exposure measure such as time or space. For the Poisson distribution we have the following point distribution:

p(x) = Pr(X = x) = (λ^x / x!) e^(−λ)   (2.41)

² ln(·) is the natural logarithm function.

For the Poisson distribution, expectation and variance are given by:

E(X) = λ Var(X) = λ (2.42)

It can be proved that the Poisson distribution is appropriate if the following situation applies: Consider the occurrence of a certain event A (e.g. a component failure) in an interval (a, b), and assume the following:

1. A can occur anywhere in (a, b), and the probability that A occurs in (t, t + ∆t) is approximately equal to λ∆t and is independent of t (∆t should be small).

2. The probability that A occurs several times in (t, t + ∆t) is approximately 0 for small values of ∆t.

3. Let I1 and I2 be disjoint intervals in (a, b). Whether A occurs within I1 is then independent of whether A occurs within I2.

When the criteria above are fulfilled, we say that we have a Poisson point process with intensity λ. The number of occurrences X of A in (a, b) is then Poisson distributed with parameter λ(b − a):

p(x) = Pr(X = x) = ([λ(b − a)]^x / x!) e^(−λ(b−a))   (2.43)

Result 2.6 In a Poisson point process with parameter λ the times between the occurrence of the event A are exponentially distributed with parameter λ.
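Result 2.6 also gives a convenient way to simulate a Poisson point process: accumulate exponentially distributed gaps until the interval is exhausted. The sketch below (Python, for illustration; interval and intensity chosen arbitrarily) checks that the simulated count has mean close to λ(b − a), in line with Equation (2.43):

```python
import random

random.seed(42)
lam, a, b = 1.5, 0.0, 4.0   # intensity and observation interval (a, b)

def count_events():
    """One realisation: accumulate exponential inter-event gaps (Result 2.6)."""
    t, n = a, 0
    while True:
        t += random.expovariate(lam)
        if t > b:
            return n
        n += 1

N = 100_000
counts = [count_events() for _ in range(N)]
# The count should be Poisson with parameter lam*(b - a) = 6, Equation (2.43).
mean_est = sum(counts) / N
print(abs(mean_est - lam * (b - a)) < 0.05)  # True
```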

2.2.9 The inverse-Gauss distribution

The inverse-Gauss distribution is often used when we have an underlying deterioration process. If this deterioration process follows a Wiener process with drift η and diffusion constant δ², the time³ T until the process first reaches the value ω will be inverse-Gauss distributed with parameters µ = ω/η and λ = ω²/δ². If the failure progression Ω(t) follows a Wiener process, it can be proven that Ω(t) − Ω(s) is normally distributed with expected value η(t − s) and variance δ²(t − s). That is, η is the average growth rate of the process, whereas δ² expresses the variation of the growth around the average value. For the inverse-Gauss distribution we have:

FT(t) = Φ((√λ/µ)√t − √λ/√t) + e^(2λ/µ) Φ(−(√λ/µ)√t − √λ/√t)   (2.44)

and

E(T) = µ   (2.45)

Var(T) = µ³/λ   (2.46)

³ We use the symbol T rather than the more general symbol X here since this model is so explicitly linked to time.

2.2.10 The triangular distribution

The triangular distribution has a probability density function that comprises a triangle. The lower left corner points out the lowest value (L), the upper right corner points out the highest value (H), and the x-value of the third corner points out the most probable value, or mode (M). The probability density function for the triangular distribution is given by:

fX(x) = 2(x − L) / [(M − L)(H − L)] if L ≤ x ≤ M

fX(x) = 2(H − x) / [(H − M)(H − L)] if M ≤ x ≤ H   (2.47)

The cumulative distribution function is given by:

FX(x) = (x − L)² / [(M − L)(H − L)] if L ≤ x ≤ M

FX(x) = 1 − (H − x)² / [(H − M)(H − L)] if M ≤ x ≤ H   (2.48)

and the mean and variance are given by:

E(X) = (L + M + H)/3

Var(X) = (L² + M² + H² − LM − LH − MH)/18   (2.49)

Problem 2.3 Assume that the completion time of a project is triangular distributed with parameters L = 200, M = 240 and H = 350 days. In our contract we have committed ourselves to finish the project within 220 days; after 220 days we have to pay a penalty for default of 100 Euro per day. Find the total expected penalty for default in this project. ♢

Problem 2.4 Consider Problem 2.3 and assume that a special building method can reduce H from 350 to 300, leaving L and M unchanged. This will cost 2,000 Euro extra. Do a cost-benefit analysis of this option. ♢
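The cumulative distribution function (2.48) can be inverted in closed form, which gives a simple way to sample from the triangular distribution, e.g. for Monte Carlo studies of problems like 2.3 and 2.4. A sketch (Python, for illustration) using the parameters of Problem 2.3, checking the sample mean against Equation (2.49):

```python
import random

def triangular_inverse_cdf(u, L, M, H):
    """Invert Equation (2.48); u is a uniform number in [0, 1]."""
    Fm = (M - L) / (H - L)          # CDF value at the mode M
    if u <= Fm:
        return L + (u * (M - L) * (H - L)) ** 0.5
    return H - ((1 - u) * (H - M) * (H - L)) ** 0.5

random.seed(7)
L, M, H = 200, 240, 350
N = 100_000
xs = [triangular_inverse_cdf(random.random(), L, M, H) for _ in range(N)]
mean_est = sum(xs) / N
print(abs(mean_est - (L + M + H) / 3) < 1.0)  # True
```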

2.2.11 The PERT distribution The PERT distribution has as the triangular distribution three parameters, L (lowest value), M (most likely value), and H (highest value). To give the probability density function for the triangular distribution we first define: 4M + H − 5L α = 1 H − L 5H − 4M − L α = 2 H − L x − L z = (2.50) H − L The probability density function is now given by:

f_X(x) = (x − L)^(α1−1) (H − x)^(α2−1) / [B(α1, α2)(H − L)^(α1+α2−1)]   (2.51)

where B(·, ·) is the beta function. The cumulative distribution function is given by:

F_X(x) = Bz(α1, α2) / B(α1, α2)   (2.52)

where Bz(·, ·) is the incomplete beta function. The mean and variance are given by:

E(X) = (L + 4M + H)/6
Var(X) = [E(X) − L][H − E(X)]/7   (2.53)
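The PERT density is a beta density rescaled to [L, H]. The Python sketch below (illustrative only; the triple estimate is made up, and B(α1, α2) is computed from gamma functions) checks the mean formula in Equation (2.53) by numerical integration:

```python
from math import gamma

def pert_pdf(x, L, M, H):
    """PERT probability density, Equations (2.50)-(2.51)."""
    a1 = (4.0 * M + H - 5.0 * L) / (H - L)
    a2 = (5.0 * H - 4.0 * M - L) / (H - L)
    beta = gamma(a1) * gamma(a2) / gamma(a1 + a2)   # the beta function B(a1, a2)
    return (x - L) ** (a1 - 1.0) * (H - x) ** (a2 - 1.0) / (beta * (H - L) ** (a1 + a2 - 1.0))

L, M, H = 3.0, 6.0, 10.0        # made-up triple estimate
dx = 0.001
num_mean = sum(x * pert_pdf(x, L, M, H) * dx
               for x in (L + dx * (i + 0.5) for i in range(int((H - L) / dx))))
exact_mean = (L + 4.0 * M + H) / 6.0            # Equation (2.53)
```
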

Problem 2.5 Use the pRisk.xlsm program to find Pr(X ≤ 7) if X ∼ PERT(L = 3,M = 6,H = 10). ♢

Problem 2.6 Consider a situation where the unconditional distribution of the duration of a project groundwork activity is PERT distributed with parameters L = 0.5, M = 1.5 and H = 3.5 days. By a detailed analysis of the uncertainty in the situation we recognize that frozen soil is a major contributor to a long duration. Let B represent the event that the soil is frozen. We now make the following assessment: Given frozen soil, the duration of the activity, T|B, is PERT distributed with parameters L = 2, M = 2.5 and H = 3.5, and if the soil is not frozen the duration of the activity, T|B^C, is PERT distributed with parameters L = 0.5, M = 1 and H = 2.5. Find p = Pr(B) such that the expectation in the conditional situation is the same as in the unconditional situation. Hint: You may use that E(T) = E(T|B) Pr(B) + E(T|B^C) Pr(B^C), see Equation (2.18). ♢

Problem 2.7 Make a sketch of the unconditional probability distribution function in the situation in Problem 2.6 when the consideration of frozen soil is taken into account. ♢

Problem 2.8 Find the unconditional variance of the duration in Problem 2.6. Hint: You may use equation (2.19). ♢

Problem 2.9 Consider again the situation in Problem 2.6, i.e. we let in the first place T ∼ PERT(L = 0.5, M = 1.5, H = 3). Also let B represent frozen soil and Pr(B) = 0.2. We now introduce three factors, f_B, f_B^C and f_V, that relate the conditional situation to the original situation. The parameters relevant in the conditional situation are {L_B, M_B, H_B} and {L_B^C, M_B^C, H_B^C} in the situation where B occurs, and where B does not occur, respectively. We now let M_B = f_B · M, L_B = M_B − f_V(M − L), H_B = M_B + f_V(H − M), M_B^C = f_B^C · M, L_B^C = M_B^C − f_V(M − L), and H_B^C = M_B^C + f_V(H − M). Let f_B = 1.5 and f_V = 0.5. Find by an iterative procedure the value of f_B^C such that the expectation of T is equal to the original expectation. Next find f_V by a similar iterative procedure such that the variance of T is equal to the original variance. ♢

2.3 Assessment of parameters in parametric distributions

We have in the previous section discussed parametric probability distributions. Common for all these distributions is that they involve parameters. When using a parametric distribution, we also need to assess the parameters. In this presentation we will not discuss in detail how this could be done. If we have access to experience data, we could estimate these parameters by e.g. the maximum likelihood principle. In other situations where we have no, or very little data we

would use expert judgement to assess the parameters, see e.g., [11] for further discussion on expert judgement. In this presentation we will very often assume that the uncertainty in a quantity, e.g. the duration of an activity, can be described by a so-called triple estimate {L, M, H}. We will then as a general rule assume that the corresponding parametric distribution is the PERT distribution. We will further assume that the L value is the absolute minimum, and that the H value is the absolute maximum the quantity could take. It is, however, important to realise that in other presentations the L and H values are treated as lower and upper quantiles in the distribution, and often a 90% interval is assumed. This is even the case for the PERT distribution, which is defined on a finite domain. So if we for a given triple estimate should establish the expected value and the standard deviation, we should be careful regarding the interpretation of the triple estimate.

2.4 Distribution of sums, products and maximum values

2.4.1 Distribution of sums

If X1, X2,…,Xn are random variables we can obtain the expected value, the variance and the standard deviation of the sum of the x-es:

E(X1 + X2 + ... + Xn) = E(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} E(Xi)   (2.54)
Var(X1 + X2 + ... + Xn) = Var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} Var(Xi)   (2.55)
SD(Σ_{i=1}^{n} Xi) = √( Σ_{i=1}^{n} [SD(Xi)]² )   (2.56)

Note that Equations (2.55) and (2.56) are only valid if the x-es are stochastically independent. If there is dependency between the x-es we need to include a covariance term, e.g., if we only have two variables X1 and X2 we have:

Var(X1 + X2) = Var(X1) + Var(X2) + 2 Cov(X1, X2)   (2.57)

where Cov(X1, X2) is the covariance between X1 and X2. The results above help us determine the expectation and variance of a sum of stochastic variables, but they cannot be used to establish the probability distribution of the sum. In the following we present some results that can be utilised in many situations.

Result 2.7 Sum of normally distributed stochastic variables Let X1, X2,…,Xn be independent and normally distributed. Let Y be the sum of the x-es, i.e. Y = Σ_{i=1}^{n} Xi. Y is then normally distributed with E(Y) = Σ_{i=1}^{n} E(Xi) and Var(Y) = Σ_{i=1}^{n} Var(Xi). ♢

Result 2.8 Sum of exponentially distributed stochastic variables Let X1, X2,…,Xn be independent and exponentially distributed with parameter λ. Let Y be the sum of the x-es, i.e. Y = Σ_{i=1}^{n} Xi. Y is then gamma distributed with parameters n and λ. ♢

Result 2.9 Sum of gamma distributed stochastic variables Let X1, X2,…,Xn be independent and gamma distributed with parameters α and λ. Let Y be the sum of the x-es, i.e. Y = Σ_{i=1}^{n} Xi. Y is then gamma distributed with parameters nα and λ. ♢

Result 2.10 Central limit theorem Let X1, X2,…,Xn be a sequence of independent, identically distributed stochastic variables with expected value µ and standard deviation σ. As n approaches infinity, the average value of the x-es will asymptotically have a normal distribution with expected value µ and standard deviation σ/√n. Similarly, the sum of the x-es will asymptotically have a normal distribution with expected value nµ and standard deviation σ√n. ♢ Several generalizations for finite variance exist which do not require identical distributions but incorporate some conditions which guarantee that none of the variables exerts a much larger influence than the others. Two such conditions are the Lindeberg condition and the Lyapunov condition. Now, as n approaches infinity, the sum of the x-es will asymptotically have a normal distribution with expected value Σ_{i=1}^{n} E(Xi) and variance Σ_{i=1}^{n} Var(Xi).
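Result 2.10 is easy to see in simulation. The Python sketch below (illustrative only; the choice of uniform variables and all numbers are made up) compares an empirical probability for a sum of n variables with the value predicted by the normal approximation:

```python
import random
from math import erf, sqrt

random.seed(42)
n, reps = 30, 20_000
mu, sigma = 0.5, sqrt(1.0 / 12.0)     # mean and SD of a single Uniform(0,1) variable

# The CLT predicts Pr(sum <= n*mu + sigma*sqrt(n)) to be close to Phi(1)
threshold = n * mu + sigma * sqrt(n)
hits = sum(1 for _ in range(reps)
           if sum(random.random() for _ in range(n)) <= threshold)
empirical = hits / reps
phi_1 = 0.5 * (1.0 + erf(1.0 / sqrt(2.0)))     # Phi(1), approximately 0.841
```
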

Problem 2.10 Consider a project consisting of n activities that follow each other in time. Let each activity have a PERT distribution with parameters L = 3, M = 5 and H = 10. Use the Monte Carlo simulation procedure in the pRisk.xlsm program to find the cumulative distribution function for the total duration of the project. Compare the result with using the Central Limit Theorem for various values of n. How large should n be in order to give a reasonable approximation by using the normal distribution? ♢

2.4.2 Distribution of a product

If X1, X2,…,Xn are independent stochastic variables we can obtain the expected value, the variance and the standard deviation of the product of the x-es:

E(X1 · X2 · ... · Xn) = E(Π_{i=1}^{n} Xi) = Π_{i=1}^{n} E(Xi)   (2.58)

The results for the variance and standard deviation are more complicated, and we only present the results for n = 2.

Var(X1X2) = Var(X1)Var(X2) + Var(X1)[E(X2)]² + Var(X2)[E(X1)]²   (2.59)
SD(X1X2) = √( Var(X1)Var(X2) + Var(X1)[E(X2)]² + Var(X2)[E(X1)]² )   (2.60)
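Equation (2.59) can be checked by Monte Carlo simulation. A Python sketch (illustrative only; the means and standard deviations are made up):

```python
import random

random.seed(7)
mu1, s1 = 10.0, 2.0                   # made-up mean and SD of X1
mu2, s2 = 5.0, 1.0                    # made-up mean and SD of X2
N = 200_000

prods = [random.gauss(mu1, s1) * random.gauss(mu2, s2) for _ in range(N)]
m = sum(prods) / N
mc_var = sum((p - m) ** 2 for p in prods) / (N - 1)

# Exact variance from Equation (2.59): 4*1 + 4*25 + 1*100 = 204
exact = s1**2 * s2**2 + s1**2 * mu2**2 + s2**2 * mu1**2
```
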

Problem 2.11 Show that Equation (2.59) is correct by using the fact that Var(X) = E(X2) − [E(X)]2. ♢

Problem 2.12 Use the program pRisk.xlsm to simulate the mean and standard deviation of the product X1X2 if both X1 and X2 are independent and normally distributed with expected value 10 and standard deviation 2. Compare the result with the exact result. ♢

2.4.3 Distribution of maximum values

Let X1 and X2 be independent stochastic variables, and let Y = max(X1, X2). The cumulative distribution function of Y is given by:

F_Y(x) = Pr(Y ≤ x) = Pr(X1 ≤ x ∩ X2 ≤ x) = Pr(X1 ≤ x) Pr(X2 ≤ x) = F_X1(x) F_X2(x)   (2.61)

In this situation we could easily obtain the distribution of the maximum of two stochastic variables, but it is not so easy to obtain the expectation and variance. However, since the probability density function f_Y(x) is the derivative of F_Y(x) we find:

E(Y) = ∫_{−∞}^{∞} x · f_Y(x) dx = ∫_{−∞}^{∞} x · [f_X1(x)F_X2(x) + f_X2(x)F_X1(x)] dx   (2.62)

Var(Y) = ∫_{−∞}^{∞} [x − E(Y)]² · [f_X1(x)F_X2(x) + f_X2(x)F_X1(x)] dx   (2.63)
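Equation (2.62) can be evaluated by straightforward numerical integration. The Python sketch below (illustrative only) uses two independent standard normal variables, a case where the exact answer E(Y) = 1/√π is known and can serve as a check:

```python
from math import erf, exp, pi, sqrt

def norm_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# For X1, X2 iid N(0,1) the integrand in Equation (2.62) becomes x * 2 * f(x) * F(x);
# midpoint rule over [-8, 8] covers the tails
dx = 0.001
e_max = sum(x * 2.0 * norm_pdf(x) * norm_cdf(x) * dx
            for x in (-8.0 + dx * (i + 0.5) for i in range(int(16.0 / dx))))
# Known exact value for this special case: 1/sqrt(pi), approximately 0.5642
```
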

Problem 2.13 Find the expectation and standard deviation of Y = max(X1,X2) if X1 and X2 are independent and normally distributed with µ1 = E(X1) = 10, µ2 = E(X2) = 7, σ1 = SD(X1) = 2, and σ2 = SD(X2) = 3. Hint: You might use the routine for numerical integration implemented in the pRisk.xlsm program. ♢

Problem 2.14 Consider the problem above, but now find the result by using the Monte Carlo simulation procedure in the pRisk.xlsm program. ♢

Problem 2.15 Consider the problem above, but now find the result by using the EMax and VarMax functions in the pRisk.xlsm program. ♢

Table 2.1: The Cumulative Standard Normal Distribution

Φ(z) = Pr(Z ≤ z) = (1/√(2π)) ∫_{−∞}^{z} e^(−u²/2) du

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.0 .500 .504 .508 .512 .516 .520 .524 .528 .532 .536
0.1 .540 .544 .548 .552 .556 .560 .564 .567 .571 .575
0.2 .579 .583 .587 .591 .595 .599 .603 .606 .610 .614
0.3 .618 .622 .626 .629 .633 .637 .641 .644 .648 .652
0.4 .655 .659 .663 .666 .670 .674 .677 .681 .684 .688
0.5 .691 .695 .698 .702 .705 .709 .712 .716 .719 .722
0.6 .726 .729 .732 .736 .739 .742 .745 .749 .752 .755
0.7 .758 .761 .764 .767 .770 .773 .776 .779 .782 .785
0.8 .788 .791 .794 .797 .800 .802 .805 .808 .811 .813
0.9 .816 .819 .821 .824 .826 .829 .831 .834 .836 .839
1.0 .841 .844 .846 .849 .851 .853 .855 .858 .860 .862
1.1 .864 .867 .869 .871 .873 .875 .877 .879 .881 .883
1.2 .885 .887 .889 .891 .893 .894 .896 .898 .900 .901
1.3 .903 .905 .907 .908 .910 .911 .913 .915 .916 .918
1.4 .919 .921 .922 .924 .925 .926 .928 .929 .931 .932
1.5 .933 .934 .936 .937 .938 .939 .941 .942 .943 .944
1.6 .945 .946 .947 .948 .949 .951 .952 .953 .954 .954
1.7 .955 .956 .957 .958 .959 .960 .961 .962 .962 .963
1.8 .964 .965 .966 .966 .967 .968 .969 .969 .970 .971
1.9 .971 .972 .973 .973 .974 .974 .975 .976 .976 .977
2.0 .977 .978 .978 .979 .979 .980 .980 .981 .981 .982
2.1 .982 .983 .983 .983 .984 .984 .985 .985 .985 .986
2.2 .986 .986 .987 .987 .987 .988 .988 .988 .989 .989
2.3 .989 .990 .990 .990 .990 .991 .991 .991 .991 .992
2.4 .992 .992 .992 .992 .993 .993 .993 .993 .993 .994
2.5 .994 .994 .994 .994 .994 .995 .995 .995 .995 .995
2.6 .995 .995 .996 .996 .996 .996 .996 .996 .996 .996
2.7 .997 .997 .997 .997 .997 .997 .997 .997 .997 .997
2.8 .997 .998 .998 .998 .998 .998 .998 .998 .998 .998
2.9 .998 .998 .998 .998 .998 .998 .999 .999 .999 .999
3.0 .999 .999 .999 .999 .999 .999 .999 .999 .999 .999

Φ(-z) = 1 - Φ(z)

Chapter 3

Introduction to modelling

3.1 Deterministic and probabilistic models

Mathematical models are essential in TPK4161. A mathematical model is a description of a system using mathematical concepts and language. The process of developing a mathematical model is termed mathematical modelling. Mathematical models are useful to explain a system, to study the effects of different decisions, and to make predictions about future system behaviour. A distinction is made between deterministic and probabilistic models. A deterministic model is primarily used to describe relations between physical quantities and other real-world observables. Equation (3.1) is a deterministic model for profit where it is assumed a fixed production cost of 100 Euro per unit produced, and the sale price is 300 Euro per unit but drops exponentially with increasing sales volume. x is the production volume.

p(x) = 300xe−x/10 − 100x (3.1)
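A deterministic model like Equation (3.1) can be explored directly. The Python sketch below (illustrative only) evaluates p(x) on a grid to locate the profit-maximizing production volume:

```python
from math import exp

def profit(x):
    """Deterministic profit model, Equation (3.1): p(x) = 300 x e^(-x/10) - 100 x."""
    return 300.0 * x * exp(-x / 10.0) - 100.0 * x

# Grid search over production volumes 0.01, 0.02, ..., 20.00
xs = [0.01 * i for i in range(1, 2001)]
best_x = max(xs, key=profit)        # the maximizer lies roughly at x = 4.7
```
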

A probabilistic model is a model that enables the analyst to apply the law of total probability in an efficient way when expressing uncertainty, i.e., performing probability calculus. It is important to emphasize that a probabilistic model is not a model of the world, but a model used to express uncertainty regarding observables in the real world. Examples of modelling tools are Markov models and queueing theory models. Equation (3.2) is a probabilistic model expressing that exactly n customers arrive on a given day when the customer arrival rate per day is µ.

Pr(N = n) = (µ^n / n!) e^(−µ)   (3.2)

3.2 Problem formulation - Modelling

1. Structuring. Structuring is an important step required before the modelling may start. Structuring means to present tacit knowledge, system understanding etc. in such a way that the analyst is able to start modelling.

2. Identification of variables. Variables are those quantities in the model that vary. Decision variables are those variables the decision maker can control by implementing measures, deciding upon the number of items to order and so on. Uncontrollable variables are quantities that the decision maker cannot influence. Uncontrollable variables are often stochastic variables.

3. Determination of cause and effects. Usually variables are dependent in one way or another. These dependencies must be specified.

4. Identification of model parameters. Model parameters are fixed quantities in the model that describe variables or other parts of the model. For example, stochastic variables are often described by their mean values and standard deviations.

5. Specify the objective function. The objective function is used in optimization problems where the objective function represents the function to minimize or maximize. The objective function is a function of the decision variables.

6. Specify constraints. In most optimization problems there are constraints that apply. For example there might be limited resources, a machine can only produce a certain amount of units per hour and so on.

7. Modelling. Modelling means to put everything together in a consistent mathematical model. Both deterministic and probabilistic models are usually required.

8. Identification of the need of data. All models will require input data such as demand rates, production rates, cost figures. Depending on the format and level of detail in the modelling, the main objective of this step is to be specific on the need for data.

9. Data collection and assessment of model parameters. When the need for data is specified, the next step is to collect data and estimate/assign model parameters based on raw data and the use of expert judgement.

10. Run the model. The objective of the modelling is to get insight into a complex problem area. By playing around with the model we will hopefully gain such insight. In particular, the model should be run with different values of the decision variables to search for good decisions.

11. Optimization. In cases where an objective function is specified it is required to optimize the objective function. Various approaches exist.

Example 3.1 The various elements of the problem formulation are elaborated by an example. We are the project manager of an Engineer-to-order (ETO) project where we are approaching the final delivery of an offshore supply vessel (OSV) to an important customer. The due date of delivery is DD = 30 days from now (DD = Due Date). If we are not able to finalize the OSV before the due date we have to pay a penalty for default equal to 10 000 Euro per day. Data has been collected for other similar ETO projects, and for this finalization phase the following data set for the number of days until finalization has been collected: 25, 35, 28, 41, 22, 31, 28 and 25. From the historical data it is quite obvious that there is a large risk that we will not reach the due date. Overtime is considered as a means to reduce risk. If overtime is used, we assume that the extra cost is 1 000 Euro per % overtime used. Further we assume that the production rate for this last period increases linearly with the amount of overtime used. We are not allowed to use more than 10% overtime. The following elements are now part of the problem structuring:

• Stochastic variable: Let X be the number of days required to finalize the OSV. We assume that X is normally distributed with mean µ and standard deviation σ.

• Let f_X(x|µ(p), σ²) be the probability density function of X, where µ(p) depends on the amount of overtime used.

• Decision variable: Let p be the percentage of overtime used, i.e., p is the decision variable.

• Model parameter: The cost of overtime is C_O(p) = c_O·p = 1000p, where c_O = 1000 is a model parameter.

• Model parameter: The cost c_PD = 10 000 Euro per day we are delayed is a model parameter.

• Cause and effect: Let µ0 be the expected duration of the last phase if overtime is not used. We then assume that the expected duration as a function of p is given by µ(p) = µ0/(1 + p/100).

• Model parameter: We assume that the standard deviation of the duration, σ, is not influenced by the amount of overtime used.

• The expected penalty for default is C_PD(p) = ∫_{DD}^{∞} c_PD (x − DD) f_X(x|µ(p), σ²) dx.

• The objective function to minimize is C(p) = C_O(p) + C_PD(p).

• Constraints: p ≤ 10.

• Data analysis: To estimate µ0 and σ the =AVERAGE() and =STDEV() functions in MS Excel are applied to the historical data points.
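Putting the elements of Example 3.1 together, the objective function can be explored by simulation. The Python sketch below is illustrative only: it is not the pRisk.xlsm/Excel route, and µ0 = 30 and σ = 6 are made-up values, not the estimates Problem 3.1 asks for:

```python
import random

random.seed(1)
DD = 30.0                      # due date (days)
c_O, c_PD = 1000.0, 10000.0    # overtime cost per % and penalty per day
mu0, sigma = 30.0, 6.0         # made-up values, NOT estimated from the data set

def total_cost(p, reps=100_000):
    """C(p) = C_O(p) + Monte Carlo estimate of C_PD(p)."""
    mu_p = mu0 / (1.0 + p / 100.0)                 # expected duration with p% overtime
    mean_delay = sum(max(random.gauss(mu_p, sigma) - DD, 0.0)
                     for _ in range(reps)) / reps  # estimate of E[(X - DD)^+]
    return c_O * p + c_PD * mean_delay

costs = {p: total_cost(p) for p in range(0, 11)}   # p = 0, 1, ..., 10 percent
```

With these made-up numbers the reduction in expected penalty outweighs the overtime cost, so the cost at p = 10 comes out below the cost at p = 0.
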

Problem 3.1 Consider Example 3.1. Use MS Excel to estimate µ0 and σ. Use the pRisk.xlsm program for numerical integration, and minimize the objective function by a graphical solution. ♢

Problem 3.2 Consider Example 3.1. Improve the solution by applying Equation (2.25) rather than using numerical integration. ♢

Problem 3.3 Assume that X ∼ N(µ, σ²). Use Equation (2.25) to derive an expression for the integral

∫_{x0}^{∞} (x − x0) f_X(x|µ, σ²) dx

and implement the solution as a VBA function B(x0, mu, sigma) in MS Excel. Test the function and compare with the numInt() function in pRisk.xlsm. ♢

Problem 3.4 At a service desk customers arrive with a daily arrival rate of λ customers per day. Assume that we have observed the numbers of customers over the last week to be 8, 5, 9, 15, 6. We only have capacity to handle 10 customers per day. For each customer we cannot serve, there is a penalty cost of 1 000 Euro. To increase the daily capacity to 13 we can pay an extra 500 Euro. Follow the model structuring process described above, and determine if it pays off. Test the solution both when the number of customers arriving is normally distributed, and when it is Poisson distributed. ♢

Chapter 4

Discrete event simulation

4.1 Introduction

In discrete-event simulation, the operation of a system is represented as a chronological sequence of events. Each event occurs at an instant in time and marks a change of state in the system (Robinson, 2004). For example, if the up- and down-times of a system of components are simulated, an event could be “component 1 fails”, and another event could be “component 2 is repaired”. A third event could be that a customer arrives at a service desk. Rather than explicitly assessing the performance of a system by the laws of probability, we just “simulate” the system, and calculate the relevant statistics to get the performance. This presentation gives the core elements of discrete-event simulation and some indications regarding implementing simple situations. An Excel file, DiscreteEventSimulation.xlsm, is available. There are basically three ways to perform discrete-event simulation, i.e., by application of:

1. A tailored tool to perform simulation for special situations. Miriam Regina is such a tool for simulating offshore oil and gas production systems.

2. A dedicated discrete-event simulation program with predefined functions and procedures to handle events, housekeeping of resources etc. SimEvent is such a tool, which builds on Matlab. Other tools are ExtendSim and Arena. Experience shows that many such tools come with user manuals which emphasize one application area, and that it is not straightforward to apply the tool to a specific application without considerable effort to understand the logic of the tool.

3. Ordinary programming languages like FORTRAN, C/C++ and Visual Basic for Application (VBA). With the ordinary programming languages the programmer has full control over everything, but has to develop all the code needed.

In this presentation we will start with the basics, hence the starting point is option 3 from the above list. Note that code written in VBA is generally slow to execute compared to e.g., FORTRAN or C++ code. The advantage of VBA code is that it is very easy to integrate the code with a model specification either from an MS Excel worksheet, or an MS Access application. In the code listings that follow we only provide pseudo code. Essentially VBA syntax is used. To simplify the code, variable declarations are generally omitted. It is, however, good practice to declare all variables used. This will help verify the code, and also ensure more optimal code. VBA allows the use of user-defined types by the TYPE statement. To refer to type elements the standard reference by a period (.) is used. Also note that code is optimized by using the following construct

With <object>
    .Time = .Time + random variable
End With

where an element inside the With block is referred to only by a period (.) and the element name.

Lists etc. that are used are sometimes linked. Generally the syntax (^Element) is used as a pointer statement. VBA does not support pointers, so here the pointer is just an integer pointing to an element in a table. Linked lists are implemented as arrays of user defined types with one element named Next. To traverse a linked list the following structure is then used:

Next = pointer to first element in the list
Do While Next <> NIL
    ... Code ...
    Next = List(Next).Next
Loop

We exit the loop when some criterion is met.

4.2 Components of a Discrete-Event Simulation

In addition to the representation of system state variables and the logic of what happens when system events occur, discrete event simulations include the following concepts:

• Clock. The simulation must keep track of the current simulation time, in whatever mea- surement units are suitable for the system being modelled. In discrete-event simulations, time advances in discrete jumps, that is, the clock skips to the next event start time as the simulation proceeds.

• Event. A change in state of a system.

• Events List (PES). The simulation maintains at least one list of simulation events. This is sometimes called the pending event set because it lists events that are pending as a result of previously simulated events but have yet to be simulated themselves. In other presentations this list is denoted the future event set. The event list, or pending event set, will in this presentation be denoted PES. The pending event set is typically organized as a priority queue, sorted by event time. That is, regardless of the order in which events are added to the event set, they are removed in strictly chronological order.

• Event notice. An element in the PES describing when an event is to be executed. An event is described by the time at which it occurs and a type, indicating the code that will be used to simulate that event. It is common for the event code to be parametrised, in which case, the event description also contains parameters to the event code. In a programming context the EventNotice() is a function that places an event in the PES. Logically an event notice is an event in the PES waiting to be executed at a given point of time. When implementing the EventNotice() function we refer to an event notice as the action to insert the event in the PES.

• Activity. A pair of events, one initiating and the other completing an operation that transform the state of the entity. Time elapses in an activity. Repairing a component is treated as an

activity. At the start of the activity the component is in a fault state, and at the end of the activity the state of the component is a functioning state. Another activity could be the service of a customer. We usually assume that no continuous changes are taking place during the activity. If continuous changes do take place, it is much more demanding to implement activities in the computer code.

• Random-Number Generators. The simulation needs to generate random variables of various kinds, depending on the system model. This is accomplished by one or more pseudorandom number generators. To generate pseudorandom numbers a two-step procedure may be used: (i) generate a pseudorandom number from a uniform distribution on the interval [0,1], and then (ii) use this number (or more such numbers) as a basis for generating a pseudorandom number from the required distribution, for example by a call to the inverse cumulative distribution function (CDF). Modern programming languages provide at least a simple subroutine to generate [0,1] variables, but in most cases the user needs to provide subroutines to transform this number to the required distribution. Various subroutine libraries exist for more efficient random-number generation.

• Statistics. The simulation typically keeps track of the system’s statistics, which quantify the aspects of interest. In the example above, it is of interest to track the relative portion of time the system is in a fault, or a degenerated state.

• Ending Condition. A discrete-event simulation could run forever, so the simulation designer must decide when the simulation will end. Typical choices are “at time t” or “after processing n events”.

4.3 Simulation Engine Logic

The main loop of a discrete-event simulation is basically:

• Start of simulation.

• Initialize Ending Condition to FALSE.

• Initialize system state variables.

• Initialize Clock (usually starts at simulation time zero).

• Schedule one or more events in the PES.

• “Do loop” or “While loop”, i.e., While (Ending Condition is FALSE) then do the following:

– Get the NextEvent from the PES. – Set clock to NextEvent time. – Execute the NextEvent code and remove NextEvent from the PES. – Update statistics.

• Generate statistical report.

• End of simulation.
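The engine logic above can be sketched compactly in Python. The listing below is illustrative only: the course material implements the PES as a linked list in VBA, while this sketch uses a heap-based priority queue, and the failure/repair rates are made up:

```python
import heapq
import random

class Engine:
    """Minimal discrete-event simulation engine: a clock plus a pending event set (PES)."""

    def __init__(self):
        self.clock = 0.0
        self.pes = []      # heap of (time, sequence number, handler, parameters)
        self._seq = 0      # tie-breaker so events with equal times stay ordered

    def event_notice(self, time, handler, *params):
        heapq.heappush(self.pes, (time, self._seq, handler, params))
        self._seq += 1

    def run(self, horizon):
        # Ending condition of the type "at time t"
        while self.pes and self.pes[0][0] <= horizon:
            time, _, handler, params = heapq.heappop(self.pes)
            self.clock = time              # the clock jumps to the next event time
            handler(self, *params)

# Toy failure/repair usage: OnFailure and OnRepair schedule each other
random.seed(3)
MTTF, MTTR = 10.0, 2.0                     # made-up mean time to failure / repair
downtime = [0.0]                           # accumulated repair time (toy statistic)

def on_failure(eng):
    repair = random.expovariate(1.0 / MTTR)
    downtime[0] += repair                  # repairs running past the horizon count in full
    eng.event_notice(eng.clock + repair, on_repair)

def on_repair(eng):
    eng.event_notice(eng.clock + random.expovariate(1.0 / MTTF), on_failure)

eng = Engine()
eng.event_notice(random.expovariate(1.0 / MTTF), on_failure)
eng.run(horizon=1000.0)
```

With MTTF = 10 and MTTR = 2 the long-run unavailability should be around MTTR/(MTTF + MTTR) = 1/6, so the accumulated downtime ends up in the neighbourhood of 167 over a horizon of 1000.
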

Although the main program is very simple, it is usually much harder to write the code for processing the events. It is informative to use the prefix On in the names of the functions to be called when the NextEvent is retrieved and executed by running the event function. This emphasises that the Event is what is happening. For example, if a “failure” is put in the PES, then when the time comes to “execute” this failure, the OnFailure event is called. The OnFailure code then has to programmatically change state variables and “post” new events, for example for when the repair is completed. Another On function could be OnArrival, which is a function to call when a new customer arrives into a waiting room.

4.4 Implementing the pending event set (PES)

Logically the PES may be seen as a queue of events waiting for execution from left to right. As simulation proceeds we may think of the queue as a number of Post-it notes placed on the blackboard, sorted in chronological order with respect to time of execution. The main simulation loop will then fetch the leftmost Post-it note and execute the code described. The code to be executed will typically add one or more new Post-it notes on the blackboard. When a new Post-it note is added it is placed at its appropriate time. There are several ways to manage the PES in a computer program. Examples are binary search trees, splay trees, skip lists and calendar queues. The following properties need to be balanced:

• It should be fast to insert a new event in the PES

• It should be fast to delete an event from the PES

• It should be fast to get the first event in the PES

• The PES should not occupy too much memory

• The code to implement the PES should not be too complex

A very simple implementation of a PES would be to use a double-linked list. For a double-linked list it is easy both to insert and delete records. If the list is rather short, searching for events is also very efficient. However, if the number of elements in the PES becomes large, searching time is proportional to the number of elements. Since we for each execution of an event typically add a new event, this means that searching for the appropriate place to insert the new event will require much computer time. In the following we provide a rather simple approach for implementation of the PES. The implementation comprises the following elements:

• A linked list to represent the PES.

• An indexed list (Indx) to enable fast access to the PES

Figure 4.1 shows the structure. The PES contains the pending events. In addition it has a header and a tail with corresponding times equal to −1 and infinity, respectively. Since the header may be considered a dummy event, we will always have access to it, and this element is never removed from the list. This makes it easy to maintain the list as the simulation proceeds. The indexed list is a list of representative times sorted in chronological order. Since this list is sorted, it is very fast to access a given point of time. The indexed list has pointers into the PES. There are fewer elements in the indexed list than in the PES, so when we search for an event in the PES we only get a rough position where to start searching. For example, if we would like to insert a new element in the PES at time t = 9.1, we first search the indexed list and find that 9.1 is between entry 2 and 3 in the indexed list. Entry 2 points to the entry corresponding to t = 7.5 in the PES. Hence, we start searching in the PES from that point until we find the place where to insert the new element with t = 9.1.

[Figure 4.1: Implementation of the PES with a supporting indexed list]

Searching time now consists of (i) the time to (binary) search the indexed list, which is proportional to log2 NIL, where NIL is the number of events in the indexed list, and (ii) the time to search the PES from the starting point until we find an event, or the place where to insert a new event. For example, if the size of the PES, say NPES, is 50·NIL, we need on average 25 comparisons in the PES. As elements are inserted and deleted from the PES, the indexed list becomes out of date. We therefore now and then update the indexed list. This requires approximately NPES operations. Obviously there will be a need to optimize the parameter NIL and the frequency of re-indexing of the indexed list. This is not discussed further here. The following functions are now required to operate on the PES and the indexed list:
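The binary search into the indexed list can be illustrated with Python's bisect module (an illustrative sketch, not the VBA implementation; the times and pointer values are made up to match the t = 9.1 example above):

```python
import bisect

# Indexed list: a sorted sample of event times, each paired with a position in the PES
indx_times = [0.0, 4.2, 7.5, 11.0, 16.3]
indx_ptrs = [0, 5, 10, 15, 20]     # made-up positions in the (much longer) PES

def find_low(t):
    """Return the PES position from which the linear search for time t should start."""
    i = bisect.bisect_right(indx_times, t) - 1   # largest i with indx_times[i] <= t
    return indx_ptrs[i]
```

Here find_low(9.1) returns the position stored for t = 7.5, so the linear search in the PES starts from there.
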

Function FindLow(t)
    Search Indx to find i where Indx(i).t ≤ t < Indx(i+1).t
    Return a pointer to PES, i.e. FindLow = Indx(i).^PES
End Function

Function EventNotice(time, functionPointer, parameters)
    EventNotice = InsertElementInList(time, Data=[functionPointer, parameters])
End Function

Function InsertElementInList(t, Data)
    Start = FindLow(t)
    Traverse PES from PES(Start) until PES(i).t ≤ t < PES(i+1).t
    Insert a new element in PES at position i+1
    PES(i+1).Data = Data
    If NeedToUpdate > IndexFrequency
        NeedToUpdate = 0
        IndexPES()
    Else
        NeedToUpdate = NeedToUpdate + 1
    End If
    InsertElementInList = i + 1
End Function

Function ReleaseEventNotice(EventNotice)
    i = FindLow(t)
    Traverse PES from PES(i) until i = EventNotice
    If PES(EventNotice).^Indx <> NIL Then SetFlag
    Disconnect and Free EventNotice
    If SetFlag Then IndexPES()
End Function

Function IndexPES()
    Make Indx be an empty list
    Run through PES
    For every k'th element, insert new entry in Indx, and make links
End Function

Function GetNxtEvent()
    PrevClock = Clock
    Clock = PES(PES(1).Next).t
    GetNxtEvent = PES(PES(1).Next).Data
    ReleaseEventNotice PES(1).Next
End Function

Function InitPES()
    Empty tables
    Create dummy events for header (t = -1), and tail (t = infinity)
End Function

Function GetClock()
    GetClock = Clock
End Function

Function TimeElapsed()
    TimeElapsed = Clock - PrevClock
End Function

The functions listed above have been implemented in the DiscreteEventSimulationLib VBA module of the DiscreteEventSimulation.xlsm Excel file. The user only needs to care about the following functions:

• InitPES()

• EventNotice(time, ”subroutine”, [P1], [P2], [P3], [P4])

• GetNxtEvent()

• ReleaseEventNotice(eventNotice)

• GetClock()

• TimeElapsed()
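As a sketch of the same contract in Python (function names mirror the VBA ones, but a heap replaces the indexed linked list; ReleaseEventNotice is omitted for brevity):

```python
import heapq
import itertools

_pes = []                  # pending event set as a heap of event notices
_seq = itertools.count()   # tie-breaker: equal times pop in insertion order
_clock = 0.0
_prev_clock = 0.0

def init_pes():
    """Empty the pending event set and reset the clocks (InitPES)."""
    global _pes, _clock, _prev_clock
    _pes, _clock, _prev_clock = [], 0.0, 0.0

def event_notice(time, handler, *params):
    """Schedule handler with params at the given simulated time (EventNotice)."""
    heapq.heappush(_pes, (time, next(_seq), handler, params))

def get_nxt_event():
    """Advance the clock to the earliest event and return it (GetNxtEvent)."""
    global _clock, _prev_clock
    time, _, handler, params = heapq.heappop(_pes)
    _prev_clock, _clock = _clock, time
    return handler, params

def get_clock():
    return _clock                 # GetClock

def time_elapsed():
    return _clock - _prev_clock   # TimeElapsed
```

The heap keeps insertion and removal at O(log NPES) without any re-indexing, which is why most modern implementations prefer it over a linked list.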

Also an ”Execute” function is required to execute the On-function returned by the GetNxtEvent function. The On-function needs to work with parameters, for example which component failed, since the same function call is used for all component failures. Up to 4 parameters of integer type can be passed to the EventNotice function. GetNxtEvent returns data of Variant type, which can be passed to the Execute() function. The Execute() function will execute the named subroutine, where the parameters P1 to P4 are passed on to the subroutine.

4.5 Library of functions for generating pseudorandom numbers

In order to run a probabilistic discrete-event simulation we need a library of pseudorandom number generators. In the rndLib VBA module of the DiscreteEventSimulation.xlsm Excel file some standard functions are provided. Among these are:

Function rndExponential(MU)
    Returns a random number, exponentially distributed with mean MU
End Function

Function rndWeibull(A, B)
    Returns a random number, Weibull distributed with shape parameter A and
    scale parameter B, i.e., the parametrization is
    f(x) = ( A/(B^A) ) * x^(A-1) * EXP( -(x/B)^A )
End Function

Function rndErlang(K, B)
    Returns a random number, Erlang-k distributed with shape parameter K and
    scale parameter B
End Function

Function rndNormal(MU, SIGMA)
    Returns a random number, normally distributed with mean MU and standard
    deviation SIGMA
End Function

Note that many of the standard lifetime distributions exist with different parametrizations. It is therefore necessary to check the parametrization used in the library of random number generators.
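To illustrate, the Weibull parametrization stated for rndWeibull can be checked by sampling. With U uniform on (0, 1), the inverse transform X = B(−ln U)^(1/A) follows exactly the stated density, and the sample mean should approach B·Γ(1 + 1/A). A Python sketch (the function name is my own):

```python
import math
import random

def rnd_weibull(a, b, rng=random):
    """Inverse-transform sample from the Weibull density
    f(x) = (a / b**a) * x**(a-1) * exp(-(x/b)**a), shape a and scale b.
    The CDF is F(x) = 1 - exp(-(x/b)**a), so solving F(x) = 1 - u gives
    x = b * (-ln u)**(1/a) for u uniform on (0, 1]."""
    u = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    return b * (-math.log(u)) ** (1.0 / a)

# quick check of the parametrization: the mean should be b * Gamma(1 + 1/a)
rng = random.Random(0)
sample_mean = sum(rnd_weibull(2.0, 10.0, rng) for _ in range(20000)) / 20000
print(sample_mean, "vs", 10.0 * math.gamma(1.5))
```

A library that instead treats B as a location parameter, or swaps the argument order, would fail this comparison, which is exactly the kind of mismatch the note above warns about.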

4.6 A simple failure and repair model

We are now able to write our first discrete-event simulation program. We consider a very simple model with one component that is either in a fault state or functioning. A global variable DownTime is used to store the accumulated downtime. To run the program we need two subroutines: OnFailure() handles the situation when the component fails, and OnRepair() handles the situation when the component is repaired:

Sub OnFailure()
    EventNotice GetClock() + rndExponential(10), "OnRepair"
    compStatus = Down
End Sub

Sub OnRepair()
    EventNotice GetClock() + rndExponential(1000), "OnFailure"
    compStatus = Up
End Sub

In each of these subroutines EventNotice is called with a point of time, generated by adding an exponentially distributed number to the current clock, and a function pointer to a subroutine. The main program now looks like:

Sub MainProgram()
    downTime = 0
    prevClock = 0
    compStatus = Up
    InitPES
    EventNotice rndExponential(1000), "OnFailure"
    Do While GetClock() < 100000
        nxtEvent = GetNxtEvent()
        If compStatus = Down Then downTime = downTime + (GetClock() - prevClock)
        prevClock = GetClock()
        Execute nxtEvent
    Loop
    Debug.Print "U = " & downTime / GetClock()
End Sub

Note that it is not straightforward to implement the steps required to execute a function. In the example above we have passed a so-called function pointer to the EventNotice() function. How this is actually implemented will vary from one implementation language to another. In for example VBA (Visual Basic for Applications, used by MS Office) we can write an Execute() function that executes the subroutine pointed to by the data structure returned by the GetNxtEvent() function. In Windows Excel it is rather easy to implement an Execute() that uses the address of the function to be called when an event is fetched from the pending event set. In Mac Excel this is not so easy. Therefore, to have VBA code that runs under Windows as well as on a Mac, a slower implementation may be used that takes advantage of the Application.Run command. This command can run a subroutine specified by its name. With such an approach we instead pass the name of the function to the EventNotice routine. In the example we have considered exponentially distributed failure and repair times. In a more general setting we may use any distribution for both time to failure and time to repair, as long as we are able to generate the corresponding pseudorandom numbers.
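For comparison, the whole model can be sketched in a few lines of Python, with a heap playing the role of the pending event set (a self-contained illustration, not the workbook's VBA). With MTTF = 1000 and MTTR = 10, the estimate should lie close to the theoretical unavailability MTTR/(MTTF + MTTR) = 10/1010 ≈ 0.0099:

```python
import heapq
import random

def simulate(mttf=1000.0, mttr=10.0, horizon=100_000.0, seed=1):
    """Single repairable component with exponential times to failure (mean
    mttf) and to repair (mean mttr). Returns the fraction of time spent in
    the fault state, estimated over the given horizon."""
    rng = random.Random(seed)
    pes = []  # pending event set: a heap of (time, event) pairs
    heapq.heappush(pes, (rng.expovariate(1.0 / mttf), "OnFailure"))
    down_time, down_since = 0.0, None
    while pes and pes[0][0] < horizon:
        clock, event = heapq.heappop(pes)
        if event == "OnFailure":
            down_since = clock
            heapq.heappush(pes, (clock + rng.expovariate(1.0 / mttr), "OnRepair"))
        else:  # "OnRepair"
            down_time += clock - down_since
            down_since = None
            heapq.heappush(pes, (clock + rng.expovariate(1.0 / mttf), "OnFailure"))
    if down_since is not None:  # still down when the horizon is reached
        down_time += horizon - down_since
    return down_time / horizon

print("U =", simulate())
```

Strings such as "OnFailure" stand in for the function pointers discussed above; in Python one could equally well push the handler functions themselves onto the heap.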

Problem 4.1 Write VBA code that calls EventNotice with the points of time 5, 4, 6, 1, 3. Use GetNxtEvent to retrieve the events and verify that they are picked in the correct order. Then repeat the same procedure, but use ReleaseEventNotice to remove the point of time 4 before retrieving the events.

Problem 4.2 Implement the model described in Section 4.6 in a VBA program and find the unavailability.

Problem 4.3 In a workshop there are two production lines in parallel. Each production line has a critical machine with constant failure rate λ = 0.01 failures per hour. There is one (common) spare machine that can replace a failed machine. We assume that switching time can be ignored. The repair rate of the machines is assumed constant and equal to µ = 0.2 per hour. If a production line is down the loss is assumed to be cU = 10 000 NOKs per hour. Only one repair man is available.

• Write a VBA code that simulates the situation.

• Find the expected loss due to downtime.

• If production is not 24/7 but runs from 07:00 to 15:00 it is reasonable to assume that we do not lose production during night, but that repair may continue. Implement this in the VBA code.

• Repeat the analysis, but assume that two repair men are available.

• How much should one be willing to pay per hour for having this extra backup on repair resources? ♢

Chapter 5

Linear, dynamic, non-linear and stochastic programming

5.1 Introduction to programming problems

Programming problems deal with the use or allocation of resources in the best possible manner, in terms of minimizing cost or maximizing profit. Resources are materials, labour, machines, transportation capacity, energy, capital and so forth. In general, resources are limited. In this chapter four types of programming are introduced:

• Linear programming

• Dynamic programming

• Non-linear programming

• Stochastic programming

The learning objective of this chapter is to understand these types of programming and to obtain skills in applying the programming techniques. This means that after completing this chapter the student shall be able to formulate problems and solve them with appropriate tools such as MS Excel. A deeper understanding of the mathematical ideas behind these methods is left for courses in operations research.

5.2 Linear programming

Linear programming deals with problems defined by the following conditions:

1. The decision variables are non-negative

2. The objective function is given as a linear function of the decision variables. The linearity assumption implies that only the first powers of the decision variables are included, and no cross terms are allowed.

3. Constraints in terms of limitation of resources can be expressed as a set of linear equations or linear inequalities.

5.2.1 Motivating example

As an introductory example, consider the shipment of goods from Trondheim to Oslo. A truck has the capacity to take 100 units. There are four product types that could be transported. The profit per unit depends on the product type and is given by c1 = 10, c2 = 5, c3 = 8 and c4 = 4 for product types 1, 2, . . . , 4. Let xi denote the number of units to take of product type i. The objective function is then given by:

Z(x1, x2, x3, x4) = 10x1 + 5x2 + 8x3 + 4x4 = c1x1 + c2x2 + c3x3 + c4x4    (5.1)

There are some constraints in addition to the truck capacity. The customer in Oslo has storage restrictions implying that 2x1 + 3x2 cannot exceed 50. Further, there is a maximum of 50 units of product type 3 available in Trondheim. The constraints can thus be formulated as:

x1 + x2 + x3 + x4 ≤ 100 = b1

2x1 + 3x2 ≤ 50 = b2 (5.2)

x3 ≤ 50 = b3

In this problem it is quite obvious that the number of units cannot be negative, hence x1 ≥ 0, x2 ≥ 0, . . . , x4 ≥ 0. The LinearProgrammingIntroExample.xlsx file provides a solution to this transportation problem. The optimal solution is given by x1 = 25, x2 = 0, x3 = 50 and x4 = 25 giving a final profit equal to Z = 750.

5.2.2 Linear programming problem on standard form

A linear programming problem generally comprises an objective function to maximize or minimize, and a set of constraints. The objective function Z() is always a function of the decision variables xj, j = 1, . . . , n, but usually we do not state this relation explicitly on the left-hand side of the equation. Further, we assume m restrictions in terms of linear combinations of the decision variables. Finally, all decision variables and the right-hand-side parameters have to be non-negative. This is written:

Maximize: Z = c1x1 + c2x2 + ··· + cnxn (5.3)

Subject to: a11x1 + a12x2 + ··· + a1nxn = b1

a21x1 + a22x2 + · · · + a2nxn = b2
. . .                                    (5.4)
am1x1 + am2x2 + · · · + amnxn = bm

x1 ≥ 0, x2 ≥ 0, . . . , xn ≥ 0

b1 ≥ 0, b2 ≥ 0, . . . , bm ≥ 0

Note that the LP problem formulation given by Equation (5.3) and the following constraints can be written in matrix form:

Maximize: Z = cx

Subject to: Ax = b, x ≥ 0, b ≥ 0

A linear programming problem can be solved by the SIMPLEX method, which we will come back to. The motivating example was not exactly on standard form because there were some inequalities in the constraints, i.e., · · · ≤ bi rather than · · · = bi. In other cases there are other types of restrictions that do not completely match the standard format of the problem. The following steps may be used to transform a linear programming problem into standard form:

• If the problem is to minimize Z convert the problem to a problem that maximizes −Z.

• If there is a less-than-or-equal inequality in one or more rows of the constraints, i.e., ai1x1 + ai2x2 + · · · + ainxn ≤ bi, convert it into an equality constraint by adding a nonnegative slack variable si, yielding the resulting constraint ai1x1 + ai2x2 + · · · + ainxn + si = bi, where si ≥ 0.

• If there is a greater-than-or-equal inequality in one or more rows of the constraints, i.e., ai1x1 + ai2x2 + · · · + ainxn ≥ bi, convert it into an equality constraint by subtracting a nonnegative surplus variable si, yielding the resulting constraint ai1x1 + ai2x2 + · · · + ainxn − si = bi, where si ≥ 0.

• If some of the bi’s are negative, multiply that constraint equation by -1.

• If one of the decision variables, say xj, is unrestricted in sign, replace it by xj = xj^a − xj^b everywhere in the problem formulation and add the constraints xj^a ≥ 0 and xj^b ≥ 0.

Note that si, xj^a and xj^b will be renamed to maintain a set of n decision variables, xj, j = 1, . . . , n.
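The first of these transformation steps, adding slack variables, can be written down directly. A small Python sketch (the function name is my own; surplus and sign-free variables would be handled analogously):

```python
def add_slacks(c, A_ub, b_ub):
    """Convert max c.x subject to A_ub x <= b_ub into standard (equality)
    form by appending one nonnegative slack variable per constraint row."""
    m = len(A_ub)
    A_eq = [row[:] + [1.0 if j == i else 0.0 for j in range(m)]
            for i, row in enumerate(A_ub)]
    c_eq = list(c) + [0.0] * m  # slacks do not contribute to the objective
    return c_eq, A_eq, list(b_ub)
```

Applied to the constraints (5.2) of the motivating example, this yields exactly the augmented system with slack variables that is used in the SIMPLEX demonstration later in the chapter.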

5.2.3 Solving the linear programming problem by the SIMPLEX method

To maximize the objective function in Equation (5.3) under the related constraints, some definitions are required. The term system is used to denote an objective function with related constraints. Two systems are said to be equivalent if they have the same solution set. In the SIMPLEX method the original system is transformed stepwise into equivalent systems until we can easily read the solution from such an equivalent system. This procedure is similar to the classical Gauss-Jordan elimination procedure for solving a system of linear equations. To obtain equivalent systems the following matrix operations are allowed:

1. Multiply any equation in the system by a constant.

2. Add to any equation a constant multiple of any other equation in the system.

The required definitions we need are:

• Basic variable. A variable xj is a basic variable if and only if aij = 1 for exactly one equation, say i, and akj = 0 for all other equations, i.e., k ≠ i.

• A pivot operation for a specific variable is a set of matrix operations where one system is transformed to an equivalent system in which that variable becomes a basic variable.

• A canonical system is a system where there is at least one basic variable in each equation. This means that the number of basic variables equals the number of equations.

• A basic solution is obtained from a canonical system by setting the non-basic variables to zero and solving for the basic variables.

• A basic feasible solution is a basic solution where the basic variables are non-negative.

• An adjacent basic feasible solution differs from the present basic feasible solution in exactly one basic variable.

The SIMPLEX method can now be explained. The first step is to express the problem in standard form as discussed above. The idea is to identify a set of basic variables by modifying the constraint equations so that we get a canonical system. The reason we need a canonical system is that we can then easily find a basic feasible solution. A basic feasible solution is not necessarily an optimal solution, but it is a starting point. The SIMPLEX method then seeks to improve the solution by finding a better adjacent basic feasible solution. This is performed by testing all non-basic variables with respect to whether they would improve the solution if entered into the system; the one to be added is the one that gives the best relative improvement of the solution. If a non-basic variable is to be entered into the system, one of the existing basic variables has to be taken out. The one to take out is found by first calculating an upper limit the new basic variable can take in each constraint equation. Since all constraint equations have to be fulfilled, the equation where the least increase can be achieved limits the upper value of the new variable, and in that constraint equation the existing basic variable is reduced to zero, and hence is no longer a basic variable. That is, a better adjacent basic feasible solution has been found. This process continues until no better adjacent basic feasible solution can be found.

5.2.4 Demonstration of the SIMPLEX method

The introduction example will be used to demonstrate the SIMPLEX method. Some extra calculation formulas will be introduced during the presentation. The original objective function and constraints are given in Equations (5.1) and (5.2) respectively. Since the constraints contain inequalities, slack variables x5, x6 and x7 are added as shown in Equations (5.5) and (5.6) respectively:

Maximize: Z = 10x1 + 5x2 + 8x3 + 4x4 + 0x5 + 0x6 + 0x7 (5.5)

Subject to: x1 + x2 + x3 + x4 + x5 = 100

2x1 + 3x2 + x6 = 50 (5.6)

x3 + x7 = 50

The problem has now been formulated on standard form and the manipulation of equations may start. To organize the equations and intermediate results a so-called tableau is used. Figure 5.1 shows the initial tableau, where row 2 contains the objective function coefficients and rows 7-9 contain the constraint coefficients. Column B of rows 7-9 is used to keep track of the basic variables. The choice of the initial basic variables is a challenge.

Figure 5.1: Initial tableau

When slack variables are introduced in the model, as is the case in this example, the slack variables will automatically act as basic variables. In other situations it is not always easy to spot the basic variables. The so-called big M method is often used if basic variables cannot be found by inspection. The big M method adds new artificial variables to the system, and these variables are then taken as basic variables by setting aij equal to 1 for artificial variable j in equation i and equal to 0 in the other rows. In the objective function the corresponding cost coefficient cj is set to a large negative number for a maximization problem, and a large positive number for a minimization problem. Note that in total we need one basic variable for each constraint equation in the system. The SIMPLEX method now proceeds by checking whether the system performance improves if one non-basic variable replaces one basic variable. The non-basic variables, which in the existing basic solution per definition take zero values, are now considered one by one, increasing the value of each by one unit. For non-basic variable xj this means that the objective function will increase by an amount cj. However, if variable xj is increased by one unit, the constraints in the basic solution are no longer fulfilled. To maintain constraint equation i we have to reduce the value of the existing basic variable. Let xki be the basic variable in constraint equation i. Since ai,ki per definition equals one, we need to reduce the value of xki by ai,j units to maintain constraint equation i. Such a reduction will reduce the objective function by cki ai,j. The relative profit of increasing xj by one unit is then:

RPj = cj − Σ_{i=1..m} cki ai,j    (5.7)

where m is the number of constraints and ki is the index of the basic variable in the i'th constraint equation. Figure 5.2 shows the updated tableau where the relative profit for each non-basic variable is calculated. Note that since cki = 0 for the initial basic variables, the relative profit equals the corresponding coefficients in the objective function.

Figure 5.2: Tableau with relative profits calculated

Since RP1 has the highest value, x1 is the non-basic variable to add to the system. The question is then which basic variable to remove. To find out, we calculate an upper limit for the increase of xj (j = 1 in the example) in each constraint equation such that the basic variable stays non-negative. The upper limit is given by:

ULi = bi/ai,j (5.8)

Figure 5.3 shows the tableau where the upper limit for the increase in x1 is calculated. We see that it is constraint equation number 2 that limits the increase, and hence the improvement in the objective function. Therefore basic variable x6 in the second row of constraint equations will now be replaced by x1, which will be the new basic variable. The second row is denoted the pivot row in the system. A pivot operation is applied to make xj, i.e., x1, the new basic variable in the pivot row (i = 2 in the example). The first step is to divide all the numbers in the pivot row by ai,j = a2,1 = 2.

Figure 5.3: Tableau with upper limit for increase in the candidate variable calculated

This will ensure that the coefficient of the new basic variable in the pivot row equals 1. Then, for each of the other equations, say row k, the coefficient of the new basic variable should be 0. This is obtained by subtracting a constant multiple of the pivot row equation from each row. The multiplication constant is ak,j for equation k.

Figure 5.4: Tableau with variable x1 inserted as the new basic variable

Figure 5.4 shows the tableau with variable x1 inserted as the new basic variable. When the new values of the basic variables are calculated we see that the objective function has increased from 0 to 250. Now the SIMPLEX method continues to search for the next non-basic variable to test for inclusion in the model. Figure 5.5 shows the tableau with the calculation of relative profit and the corresponding upper limit for the best non-basic variable to add in the second iteration. The calculations show that variable x3 should replace variable x7 in row 3.

Figure 5.5: Tableau with calculation of relative profit and corresponding upper limit for the best nonbasic variable

In the third iteration variable x4 will replace variable x5 in row 1, and a further calculation of relative profit shows that none of the non-basic variables will improve the solution, as seen in the final tableau in Figure 5.6. The final solution is given by x1 = 25, x2 = 0, x3 = 50 and x4 = 25, giving a final objective function value of Z = 750.

5.2.5 Summing up the SIMPLEX method

1. Make sure that the system is on standard form. Add extra variables, multiply by -1 and so on.

Figure 5.6: Final tableau

2. Choose a set of basic variables to ensure we have a system on canonical form. If not found by inspection, use the big M method.

3. Set up the tableau, and ensure that each initial basic variable has ai,j = 1 in the equation where it is included and ai,j = 0 in the equations where it is not included, i.e., that they really are basic variables.

4. Check for all non-basic variables whether the relative profit RPj = cj − Σ_{i=1..m} cki ai,j is positive, and then choose the non-basic variable with the highest relative profit for inclusion in the set of basic variables. If none of the non-basic variables has a positive relative profit we are done. The solution can be read from the tableau.

5. If a positive relative profit was found, let xj be the variable found. For each constraint equation calculate the upper limit ULi = bi/ai,j allowed for xj to increase without violating the corresponding constraint.

6. The basic variable in the constraint equation with the lowest upper limit is replaced by the new basic variable xj, i.e., the minimum ratio rule is applied

7. Use pivot operations to ensure we have a new canonical system, i.e., first get aij = 1 for the constraint equation we are inserting xj into, and then get aij = 0 for the other constraint equations

8. GoTo Step 4.
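The eight steps above can be sketched as a compact tableau implementation for the common case where all constraints are ≤ with nonnegative right-hand sides, so that the slack variables form the initial basis (a teaching sketch, without the big M method or anti-cycling safeguards):

```python
def simplex_max(c, A, b):
    """Tableau SIMPLEX sketch for: max c.x subject to A x <= b, x >= 0,
    with all b_i >= 0 so that the slack variables form the initial basis."""
    m, n = len(A), len(c)
    # tableau rows: [a_1 .. a_n, slack_1 .. slack_m, b_i]
    T = [list(map(float, A[i])) + [1.0 if j == i else 0.0 for j in range(m)]
         + [float(b[i])] for i in range(m)]
    cost = list(c) + [0.0] * m           # objective incl. slack coefficients
    basis = [n + i for i in range(m)]    # slack variables start as basic
    while True:
        # step 4: relative profits RP_j = c_j - sum_i c_ki * a_ij
        rp = [cost[j] - sum(cost[basis[i]] * T[i][j] for i in range(m))
              for j in range(n + m)]
        j = max(range(n + m), key=lambda k: rp[k])
        if rp[j] <= 1e-9:
            break                        # no improving variable: optimum
        # steps 5-6: minimum ratio rule picks the leaving (pivot) row
        ratios = [(T[i][-1] / T[i][j], i) for i in range(m) if T[i][j] > 1e-9]
        if not ratios:
            raise ValueError("unbounded LP")
        _, r = min(ratios)
        # step 7: pivot so that column j becomes a unit vector
        piv = T[r][j]
        T[r] = [v / piv for v in T[r]]
        for i in range(m):
            if i != r and abs(T[i][j]) > 1e-12:
                f = T[i][j]
                T[i] = [vi - f * vr for vi, vr in zip(T[i], T[r])]
        basis[r] = j
    x = [0.0] * n
    for i, bv in enumerate(basis):
        if bv < n:
            x[bv] = T[i][-1]
    return x, sum(ci * xi for ci, xi in zip(c, x))
```

Applied to the motivating example of Section 5.2.1 (three ≤ constraints), it reproduces x1 = 25, x2 = 0, x3 = 50, x4 = 25 and Z = 750, following exactly the iterations shown in Figures 5.1-5.6.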

5.2.6 Unique optimal, multiple optimal and unbounded solutions

In the example presented, the solution found is the only feasible solution that gives the highest value of the objective function. Such a solution is said to be a unique optimal solution. In other LP problems there may exist more than one feasible solution that all give the highest value of the objective function. Such an LP problem is said to have multiple optimal solutions. Finally, there might be situations where an LP problem does not have an optimal solution because it is always possible to find a better solution. Such an LP problem is said to have an unbounded solution. Typically an LP problem may be specified without sufficient constraints to prevent a solution that gives an infinite objective function. If this is the case, one should carefully investigate whether the constraints are sufficiently described.

5.2.7 Shadow prices

In constrained optimization, such as an LP problem, the shadow price is the change, per infinitesimal unit of the constraint, in the optimal value of the objective function obtained by relaxing the constraint. This typically means that the shadow price associated with a resource tells how much more profit one would get by increasing the amount of that resource by one unit, or put another way, “how much one would be willing to pay for an additional unit of the resource”.
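As a small worked illustration, consider the motivating example from Section 5.2.1 and relax the Oslo storage constraint from b2 = 50 to b2 = 51 (assuming, as is the case here, that the optimal basis x1, x3, x4 is unchanged by the unit change):

```latex
x_1 = \tfrac{51}{2} = 25.5, \qquad x_3 = 50, \qquad x_4 = 100 - x_1 - x_3 = 24.5
\qquad\Rightarrow\qquad
Z = 10(25.5) + 8(50) + 4(24.5) = 753 = 750 + 3
```

so the shadow price of the storage constraint is 3: one extra unit of storage in Oslo is worth 3 units of profit.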

5.2.8 Solving the LP problem by a computer

For practical purposes we need a computer to solve an LP problem. The method presented here is the basis for most computer codes, but refinements of the algorithm have made it much more efficient. We will not discuss the content of such refinements but refer to standard textbooks in operations research. For very many users MS Excel will be the computer tool for solving a large range of LP problems. In order to utilize MS Excel the ”Solver” needs to be installed; it is found under ->File->Options->Add-ins. When the ”Solver” is installed it is found under ->Data. The solver needs 5 types of input:

1. The cell to optimize. This should be the cell that contains the objective function. Although we have used the name Z for the objective function, any cell may be used. (Set objective:)

2. The type of optimization. The choice can be set to either minimization or maximization, or the objective function can even be targeted to a given value. In MS Excel we therefore do not need to convert a minimization problem to a maximization problem by multiplying the objective function by -1 as we do in the manual procedure. (To:)

3. Define the decision variables, i.e., those variables we can change in order to find the best solution. (By Changing Variable Cells:)

4. Define the constraints, i.e., specify the equations that define the constraints. Note that MS Excel will accept inequalities in the constraints, hence we do not need to add slack variables. (Subject to the Constraints:)

5. Define the solving method. For linear models we always choose the Simplex LP method. For nonlinear models the Simplex LP method does not work. (Select a solving method:)

Figure 5.7 shows the specification screen for the MS Excel Solver. Note the following:

• The cell containing the objective function has been named Z. Further, to calculate Z we used the =SUMPRODUCT() function. This function accepts two arguments: the first argument is the range of x-values, and the second argument is the range of objective function coefficients.

• The cells containing the x-variables have been given the name xValues.

• To simplify the specification of the constraints it is convenient to move the constant term to the left-hand side of the inequality sign, i.e., we can create cells containing Σj ai,j xj − bi. These cells are then used in the specification of constraints. Also here the =SUMPRODUCT() function is useful for calculating the sum of products. The three cells containing the constraint equations have been given the name Constraints. When the ≤ comparison is specified it applies to all the constraint equations.

Example 5.1 Production scheduling
This example is from [33], where we are scheduling the next 4 weeks of production.

Figure 5.7: Specification screen for the MS Excel Solver

The weekly demands are 300, 700, 900 and 800 items for the next 4 weeks. Production cost is $5 per item in weeks 1 and 2, and $10 in weeks 3 and 4. Normal production cannot exceed 700 units per week. It is possible to use overtime to produce up to 200 extra units per week in weeks 2 and 3, but then at an additional cost of $5 per item. Excess production can be stored at a cost of $3 per item (from one week to the next). Table 5.1 shows the decision variables to use and other important quantities for the problem.

Table 5.1: Variables and quantities in Example 5.1

Variable/Quantity   Explanation
xi, i = 1:4         Normal production week i
di, i = 1:4         Demand week i, = {300, 700, 900, 800}
x5, x6              Overtime in weeks 2 and 3 respectively
xi, i = 7:10        Stock level at end of week i − 6

Minimize: Z = 5x1 + 5x2 + 10x3 + 10x4 + 10x5 + 15x6 + 3x7 + 3x8 + 3x9 + 3x10

Subject to: x1 ≤ 700

x2 ≤ 700

x3 ≤ 700

x4 ≤ 700

x5 ≤ 200

x6 ≤ 200

x1 − x7 = 300

x2 + x5 + x7 − x8 = 700

x3 + x6 + x8 − x9 = 900

x4 + x9 − x10 = 800

where it is implicitly assumed that all decision variables are non-negative. An MS Excel solution is shown in Figure 5.8. The constraint equations, given in the Σj ai,j xj − bi column, are named constraintsLE for the ≤ constraints and constraintsEQ for the = constraints respectively, for easy specification in MS Excel.

Figure 5.8: MS Excel solution of Example 5.1
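A solver answer can be sanity-checked by evaluating candidate plans directly against the constraints and the objective. A small Python sketch (helper names are my own; the plan shown is feasible with cost 22 600, but no optimality claim is made here):

```python
def plan_cost(x, v, u, c_prod=(5, 5, 10, 10), c_ot=(10, 15), c_hold=3):
    """Objective of Example 5.1: x = normal production per week,
    v = (overtime week 2, overtime week 3), u = end-of-week stock levels."""
    return (sum(c * q for c, q in zip(c_prod, x))
            + c_ot[0] * v[0] + c_ot[1] * v[1]
            + c_hold * sum(u))

def feasible(x, v, u, d=(300, 700, 900, 800)):
    """Check the capacity and flow-balance constraints of Example 5.1."""
    if any(q > 700 for q in x) or any(q > 200 for q in v):
        return False
    ot = (0, v[0], v[1], 0)  # overtime is only available in weeks 2 and 3
    prev = 0
    for i in range(4):
        # production + overtime + carried stock - ending stock = demand
        if x[i] + ot[i] + prev - u[i] != d[i]:
            return False
        prev = u[i]
    return all(q >= 0 for q in x + v + u)

# one feasible plan: pre-produce in week 1 for the week 3 and 4 peaks
x, v, u = (600, 700, 700, 700), (0, 0), (300, 300, 100, 0)
print(feasible(x, v, u), plan_cost(x, v, u))
```

Such a check is a useful habit when building spreadsheet models: if the solver's reported solution fails the feasibility test, the constraint specification in the sheet is wrong.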

Example 5.2 Product mix
This example is also from [33], where a company manufactures three products, A, B and C. Table 5.2 shows the resources required for each product.

Table 5.2: Required resources for the various products

Product   Engineering hours   Direct labour hours   Materials (kg)
A         1                   10                    3
B         2                   4                     2
C         1                   5                     1

The company offers discounts for bulk purchases, giving a sales-dependent profit per unit as shown in Table 5.3. The resource constraints for the scheduling are 100 hours of engineering, 700 hours of labour and 400 kg of materials available.

Table 5.3: Unit prices depending on product type and volume

         A                        B                        C
Sales (units)  Profit ($)    Sales (units)  Profit ($)    Sales (units)  Profit ($)
0-40           10            0-50           6             0-100          5
40-100         9             50-100         4             Over 100       4
100-150        8             Over 100       3
Over 150       7

For product A we label the first 39 units A1, units from 40 up to 100 A2, units from 100 up to 150 A3, and units from 150 and above A4. Similarly for product B, the first units up to 50 are labelled B1, and so on. The decision variables are xi, i = 1:4 for the number of units with label Ai, xi, i = 5:7 for the number of units with label Bi−4, and xi, i = 8, 9 for the number of units with label Ci−7. The objective function to maximize and the constraints are:

Maximize: Z = 10x1 + 9x2 + 8x3 + 7x4

+ 6x5 + 4x6 + 3x7 + 5x8 + 4x9

Subject to: x1 ≤ 39

x2 ≤ 60

x3 ≤ 50

x5 ≤ 49

x6 ≤ 50

x8 ≤ 99

x1 + x2 + x3 + x4 + 2x5 + 2x6 + 2x7 + x8 + x9 ≤ 100

10x1 + 10x2 + 10x3 + 10x4 + 4x5 + 4x6 + 4x7 + 5x8 + 5x9 ≤ 700

3x1 + 3x2 + 3x3 + 3x4 + 2x5 + 2x6 + 2x7 + x8 + x9 ≤ 400

where it is implicitly assumed that all decision variables are nonnegative. The MS Excel specification is shown in Figure 5.9. It is easy to use the model to investigate whether it pays off to allocate more resources. For example, adding 10 extra engineering hours increases profit by $15, which then has to pay for the extra overtime cost. This could have been attractive in 1976 but most likely not today. An interesting result is that access to more engineering resources reduces the amount of product A and increases the amount of product B. A further increase of engineering resources, up to approximately 150 hours, will result in starting to produce product C. Note that when increasing the engineering hours available from 100 to 101 the marginal increase in profit is $1.5, whereas when increasing from 150 to 151 the marginal increase in profit is reduced to $1. Similar sensitivity analyses could be performed with respect to direct labour hours. Since the available material is not spent, it makes no sense to perform sensitivity analysis on the materials available unless both engineering and direct labour hours are increased.

Figure 5.9: MS Excel solution of Example 5.2

5.2.9 Mixed integer programming

Many LP problem formulations put additional logical constraints on the decision variables. Typically some of the decision variables are restricted to integer values or even binary values. If this is the case the problem is said to be a mixed integer programming problem. A naive approach to mixed integer programming is to ignore the integer constraints, solve the problem as if it were an ordinary LP problem, and then take the nearest integer from the continuous solution for those variables. This solution is, however, not particularly good. There is no general efficient approach that is guaranteed to find an optimal solution, because in the worst case all possible combinations of the decision variables have to be investigated. The branch and bound (B&B) algorithm is perhaps the most widely used method for solving mixed integer problems in commercial computer codes. The B&B algorithm is essentially an efficient enumeration procedure for examining all possible integer feasible solutions. The idea of the branch and bound algorithm is presented below, but technical details regarding the implementation are left out and the reader is referred to an operations research textbook for further reading. A mixed integer programming problem (MIP) is generally given on the following form:

Maximize: Z = cx (5.9)

Subject to: Ax = b, x ≥ 0, b ≥ 0

xj is an integer for j ∈ I, where I is the set of all integer variables. The B&B algorithm starts by maximizing the objective function in Equation (5.9), ignoring the integer restrictions. This problem is denoted LP-1 and will give a maximum objective function value, say Z1. Figure 5.10 is used as an illustrative example, where we assume x1 and x2 are restricted to integer values. Since the integer restrictions were ignored in LP-1, it is likely that the solution for some integer variables contains fractional values. The idea now is to split the LP-1 problem into two new problems, say LP-2 and LP-3. These problems are identical to the LP-1 problem

but a new restriction is introduced for each of LP-2 and LP-3. The restrictions relate to one of the integer variables that contains a fractional value.

Figure 5.10: B&B example. LP-1 (Z1 = 9) branches via x2 ≤ 1 into LP-2 (Z2 = 8) and via x2 ≥ 2 into LP-3 (Z3 = 8.5); LP-3 branches via x1 ≤ 4 into LP-4 (Z4 = 7) and via x1 ≥ 5 into LP-5 (infeasible).

Various principles for selecting this branching variable exist in the literature but will not be discussed here. Assume that variable xj is taken as the branching variable and that the LP-1 solution gave an optimal value of xj = βj. Since xj = βj is not a feasible solution due to the integer restriction, we add xj ≤ int(βj) as a new restriction to the LP-2 problem, and similarly xj ≥ int(βj) + 1 as a new restriction to the LP-3 problem. For example, if LP-1 has an optimal solution with x2 = 1.5, Figure 5.10 shows that we split the problem into LP-2 with the additional restriction x2 ≤ 1 and LP-3 with the additional restriction x2 ≥ 2. The B&B algorithm now proceeds by solving the LP-2 and LP-3 problems, again as if they were standard LP problems, i.e., ignoring the integer restrictions. Figure 5.10 shows as an example that LP-2 has an optimal solution of Z2 = 8 and LP-3 has an optimal solution of Z3 = 8.5. Assume that the LP-3 solution still contains fractional values for some integer variables. We then proceed with LP-3 and split it into two new problems, say LP-4 and LP-5, and so on. Here the integer variable x1 is used to put new restrictions on the LP-problems. Occasionally the new restrictions put on the integer variables will give integer values for all the integer variables in the optimal solution, and we are closer to the overall solution. If this is the case we have found a feasible solution for that particular LP-problem. There is no guarantee that it is optimal, but it is at least a lower bound for the optimal solution. This also means that all LP-problems with a Z-value lower than the best feasible solution found so far can be ignored for further investigation.
In the example in Figure 5.10, LP-4 has a solution that is less favourable than the best feasible solution found in LP-2, and since LP-5 has no feasible solution, the best solution found is the one from LP-2. There is no guarantee that an optimal solution is found, but we are able to squeeze the solution into limits which might be good enough. When one LP problem is branched into two new problems it is not evident which one to proceed with. Some principles are discussed in the literature, and again we refer to an operations research textbook for further reading. In MS Excel we may specify that some decision variables are integer or binary variables. This is done when specifying the constraint comparison symbol, where we can use "int" or "bin". For example, in Example 5.2 we may add integer restrictions on all variables by adding another constraint equation: xValues = integer. The solution is slightly changed, and the maximum value is reduced from 714 to 711.
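The branch and bound logic can be illustrated with a small sketch in Python (the course material otherwise uses VBA and MS Excel). A 0/1 knapsack problem with made-up data is used here, so that each relaxation can be solved greedily instead of by a SIMPLEX routine; apart from that the steps mirror the description above: solve a relaxation to obtain an upper bound Z for the subproblem, branch on a variable that is fractional in the relaxed solution, and prune subproblems whose bound cannot beat the best feasible solution found so far.

```python
# Minimal branch-and-bound sketch for a 0/1 knapsack problem. The LP
# relaxation (fractional knapsack) plays the role of LP-1, LP-2, ...:
# it gives an upper bound for each subproblem, and we branch on the
# variable that is fractional in the relaxed solution.
# The data below are made up for illustration.
values = [8, 11, 6, 4]
weights = [5, 7, 4, 3]
capacity = 14

def relax(fixed):
    """Solve the LP relaxation with some variables fixed to 0/1.
    Returns (upper bound, index of fractional variable or None)."""
    cap = capacity - sum(weights[i] for i, v in fixed.items() if v == 1)
    if cap < 0:
        return float("-inf"), None            # infeasible subproblem
    z = sum(values[i] for i, v in fixed.items() if v == 1)
    free = [i for i in range(len(values)) if i not in fixed]
    # greedy by value/weight ratio is optimal for the relaxation
    for i in sorted(free, key=lambda i: values[i] / weights[i], reverse=True):
        if weights[i] <= cap:
            cap -= weights[i]
            z += values[i]
        elif cap > 0:
            return z + values[i] * cap / weights[i], i   # fractional variable
        else:
            break
    return z, None                            # relaxation is already integral

def branch_and_bound():
    best = float("-inf")
    stack = [dict()]                          # each node fixes a subset of variables
    while stack:
        fixed = stack.pop()
        bound, frac = relax(fixed)
        if bound <= best:
            continue                          # prune: cannot beat the incumbent
        if frac is None:
            best = bound                      # integral solution: new incumbent
        else:                                 # branch on the fractional variable
            stack.append({**fixed, frac: 0})
            stack.append({**fixed, frac: 1})
    return best

print(branch_and_bound())
```

For these data the search returns the optimal value 21. The pruning step is exactly the rule stated above: a subproblem whose Z-value is below the best feasible solution found so far can be ignored.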

Example 5.3 Production scheduling with fixed costs
We will now revisit Example 5.1, but assume that there is a fixed cost in addition to the variable cost if there is production in period j. The numbers are slightly changed from Example 5.1 and are shown in Table 5.4. The introduction of a binary variable is a trick to take fixed costs into account.

Table 5.4: Variables and quantities in Example 5.3
Quantity Explanation
xj Normal production in period j, i.e., number of items produced
CN,j Normal production capacity in period j, = 700 ∀j
cI,j Cost per item produced without overtime in period j, = {5, 5, 10, 10}
cF,j Fixed cost in period j if xj > 0, = 3000 ∀j
yj Indicator variable for production in period j, yj = 1 if xj > 0
dj Demand in period j, = {500, 300, 900, 800}
vj Items produced by means of overtime in period j
CO,j Overtime production capacity in period j, = 200 ∀j
cO,j Cost per item using overtime in period j, = {10, 10, 15, 15}
cH,j Holding cost per item at end of period j, = 3 ∀j
uj Stock level at end of period j

The LP problem is now given by:

Minimize: Z = ∑j cI,j xj + ∑j cF,j yj + ∑j cO,j vj + ∑j cH,j uj

Subject to: xj ≤ CN,jyj

vj ≤ CO,jyj

xj + vj + uj−1 − uj = dj (with u0 = 0)

yj is binary

Note that the constraints xj ≤ CN,jyj ensure that there is no production in period j if yj = 0. Note further that the notation deviates from the standard notation, but it is straightforward to modify the notation to get the problem on standard form. Figure 5.11 shows the MS Excel specification and the solution. Note that the large fixed cost gives a solution where there is no production in period 3. The fixed cost savings can justify rather large holding costs in the first two periods.

Problem 5.1 Introduce slack variables to solve the following LP problem, and verify the solution by the Solver in MS Excel:

Maximize: Z = 8x1 + 10x2 + 6x3 + 3x4

Subject to: x1 + x2 + x3 + x4 ≤ 50

x2 + x3 ≤ 20

2x3 − x4 ≤ 40

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0

Problem 5.2 Transform the following linear program to the standard form:

Maximize: Z = −5x1 + 4x2 − 3x3 + 5x4

Figure 5.11: MS Excel solution of Example 5.3

Subject to: 4x1 − x2 + 3x3 − x4 = −3

x1 + x2 + 4x3 − 2x4 ≤ 15

−2x1 + 3x2 − 2x3 + 2x4 ≥ 3

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 unrestricted in sign

Problem 5.3 Solve the LP problem in Problem 5.2 by using the SIMPLEX method. Verify the solution with MS Excel. Note: You will run into problems, also reported by MS Excel.

Problem 5.4 This problem is adapted from [33]. A shoe store sells rubber shoes for the winter season. The sales division has forecasted the demand for the next season as given in Table 5.5. We assume a perfect forecast, i.e., a deterministic model will be applied.

Table 5.5: Monthly demand
Month Demand
October 40
November 20
December 30
January 40
February 30
March 20

The store purchases pairs of shoes from a manufacturer at a purchasing cost of 100 NOKs per pair. Due to seasonal effects there is an extra cost of 10 NOKs per pair in December and an extra cost of 20 NOKs per pair in January. The maximum order size is 50 pairs. For each order there is a fixed cost of 200 NOKs to account for transportation, insurance, packaging, and so on. The shop has limited capacity and cannot carry an inventory level higher than 30 at the end of each month. There is a holding cost of 5 NOKs per pair based on the end-of-month inventory level. The season starts with an empty inventory, and should also end with an empty inventory.

1. Discuss the fixed cost, and why this information cannot be included in the LP model, unless you have ideas for how to do it

2. Formulate the problem as an LP problem, not necessarily on standard form

3. Solve the problem, for example by using MS Excel

4. What would be a reasonable price to pay to increase the allowed inventory level at the end of each month to 40?

Problem 5.5 Introduce binary variables in Problem 5.4 to model fixed cost and find a mixed integer solution to the problem by using MS Excel.

Problem 5.6 To be solved without a computer
Consider a fish processing facility with a holding of high and low quality fish (raw material) as shown in Table 5.6. Based on the holding, the objective is to plan today's production. Relevant parameters for the optimization are shown in Table 5.7. Production rates are shown in Table 5.8, where the rates are processing rates of raw materials and not of the final products. There are 5 workers on the job today. Production of fish fillet requires high quality raw material, whereas fish au gratin can utilize both high and low quality raw material.

• Formulate the problem as a linear programming problem on standard form

• Use the SIMPLEX method to obtain the optimal production profile.

5.3 Dynamic programming

Dynamic programming (DP) is a method for finding an optimal strategy to apply to a dynamical system that evolves over time. Let N be the number of stages a system runs through. Further the following quantities are defined for stage number n:

• Sn = input state, i.e., the system state when the system enters stage n

• dn = decision at stage n, i.e., what we can do to influence the system performance

• S˜n = output state, i.e., the system state when the system leaves stage n

• rn(Sn, dn) = return function, typically a cost or profit function that depends on the system state and the decision made

Figure 5.12 shows the main elements of an N-stage multistage system. The stage numbering is typically in an opposite direction compared to the flow of information. Usually we solve the problem computationally by starting at stage 1 and proceed from right to left to stage N. Such an approach is denoted backward analysis or backward recursion. In a

Table 5.6: Current level and expected supply under two scenarios for Problem 5.6
Parameters Values
Holding high quality 2 000
Holding low quality 3 000

Table 5.7: Parameters for Problem 5.6
Parameters Values
Profit per kg frozen fish fillet 20
Profit per kg fresh fish fillet 40
Profit per kg fish au gratin 10
Utilization frozen fish fillet 70%
Utilization fresh fish fillet 60%
Utilization fish au gratin 80%

Table 5.8: Production rates for Problem 5.6
Production rates Values
Frozen fish fillet per worker per day 500
Fresh fish fillet per worker per day 750
Fish au gratin per worker per day 1 000

backward analysis we always have that Sn−1 = S˜n. The idea is that for the rightmost stage it is rather easy to find the optimal solution for a given input, i.e., for a given S1 we easily find the best decision d1. When proceeding left to stage 2 we then know all the stage 1 optimal decisions and corresponding return functions r1 for each of the output values S˜2. Then adding r1 and r2 for the possible d2 decisions makes it possible to find optimal decisions at stage 2 for a given input S2. We then proceed until stage N is reached. The change in system state in a given stage is governed by some transition function, i.e.:

S˜n = gn(Sn, dn) (5.10)

The return function rn(Sn, dn) specifies the cost or profit related to what happens in stage n. The accumulated total return calculated over n stages, i.e., from right to left or numerically from 1 to n, is denoted fn(Sn, dn). Further, f∗n(Sn) denotes the accumulated optimal return up to stage n. Mathematically it is a challenge to be explicit on the formulation of fn(Sn, dn) and f∗n(Sn); typically these functions are formulated recursively.

Elements of an optimal decision policy
The idea of dynamic programming is to start at the rightmost stage, i.e., at stage 1. If the input to stage 1 is known, i.e., we know the value of S1, then it is rather easy to obtain the optimal decision, say d∗1, at stage 1, since what has happened previously should not affect the best thing to do at this final stage. Note that d∗1 = d∗1(S1) and obviously depends on the return function r1(S1, d1).

Figure 5.12: An N-stage multistage system (stages N, N−1, ..., n, ..., 1, each with input state Sn, output state S˜n, decision dn and return rn)

Then at stage 2 we may in a similar way ignore what has happened up to stage 2 as long as we know S2. To find an optimal decision, say d∗2, at stage 2 we need to consider both the return function r2(S2, d2) and all possible scenarios for stage 1. Since S1 = S˜2 this means that we may use d∗1 = d∗1(S1) = d∗1(S˜2) to find the optimal return function contribution from stage 1 when optimizing at stage 2. More generally, at stage n we ignore what has happened previously, i.e., for stages to the left of n (higher values than n), when finding the optimal decision at stage n. We have to consider the accumulated contribution to the return function to the right of stage n. That is, in the optimization at stage n we utilize f∗n−1(S˜n), which in principle is known for each output state of stage n. Finally, at stage N the final optimization is performed, i.e., the calculation steps performed at stage n = N yield the optimal policy.

To solve such a problem by a computer code we need some functions. Let g(n,S,d) be a function that implements the transition that takes place in stage n given that decision d is implemented, see Equation (5.10). Further, let r(n,S,d) be a function that implements the return in stage n given that decision d is implemented. In most cases the accumulated returns are found by adding the return function contributions. In such situations we could write a recursive function, f(), implementing f∗n(Sn) on the form:

f(n, S) = opt{r(n, S, d) + f(n-1, g(n, S, d))}

where opt is the optimization with respect to the decision d at stage n. Now, assuming that S_N is the initial state of the system, the optimal strategy is found by calling f(N, S_N). Note that f() needs to be implemented in a programming language that allows recursive calls, because this function calls itself. Further note that all the functions need to be carefully designed to take care of all boundary conditions, for example to stop when stage 0 is invoked.
Further, since the function calls are recursive, the number of calls may explode in magnitude.

5.3.1 Worked example
This example is adapted from [33]. A shoe store sells rubber shoes for the winter season. The sales division has forecasted the demand for the next season as given in Table 5.9. We assume a perfect forecast, i.e., a deterministic model will be applied.

Table 5.9: Monthly demand
Month Demand
October 40
November 20
December 30
January 40
February 30
March 20

The store purchases pairs of shoes from a manufacturer at a purchasing cost of 4 $ per pair (1976 prices). The supplier only sells in lots of 10, 20, 30, 40 or 50 pairs. A discount is achieved for large orders as given in Table 5.10. For each order there is a fixed cost of 10 $ to account for transportation, insurance, packaging, and so on. The shop has limited capacity and cannot carry an inventory level higher than 40 at the end of each month. There is a holding cost of 0.2 $ per pair based on the end-of-month inventory level.

Table 5.10: Discount depending on order quantity
Lot size Discount
0 0%
10 5%
20 5%
30 10%
40 20%
50 25%

The season starts with an empty inventory, and should also end with an empty inventory. Table 5.11 gives summarized information.

Table 5.11: Additional parameters
Parameter Value Description
cP 4 Purchasing cost per unit
cF 10 Fixed cost per order
cH 0.2 Holding cost per unit at end of month
SL 40 Storage limitation at end of month
S6 0 Inventory level at the beginning of the season
S0 = S˜1 0 Inventory level at the end of the season

The cost structure favours large orders, but the downside is a holding cost at the end of each month. A large order could also save the fixed cost if it allows us to skip ordering one month. It is assumed that the pairs of shoes arrive the first day of the month. The backward analysis means that the system runs through 6 stages, where October corresponds to n = 6, November to n = 5 and so on, down to March where n = 1. The decision variable dn is the number of pairs to order for each month. The state variable Sn is the number of pairs at the beginning of the month when the order from the manufacturer has arrived. From Table 5.9 we can read the demand Dn for each month, again remembering that October corresponds to n = 6 and so on. The transition function must relate the state variable at stage n to that of stage n − 1:

Sn−1 = Sn + dn − Dn (5.11)

The return function comprises the cost of an order and holding cost at the end of the month:

rn(Sn, dn) = ϕ(dn) + cH(Sn + dn − Dn) (5.12)

where ϕ(dn) is the cost of an order, comprising the fixed cost and the discounted variable cost per pair of shoes. It is assumed that ϕ(0) = 0. Now, assume that the transition function in Equation (5.11) and the return function in Equation (5.12) are implemented by the functions g(n,S,d) and r(n,S,d) respectively. Then we need to write a function for the accumulated return. Below is the main structure of the computer code we need. The programming language is VBA, i.e., the built-in Visual Basic for Applications in the Microsoft Office applications.

Function f(n As Integer, S As Integer)
    ' Accumulated optimal cost for stages n, n-1, ..., 1 with input state S.
    ' infty (a large number) and lotSizes (the feasible order sizes)
    ' are assumed to be declared elsewhere.
    If n = 0 Then
        f = 0
        Exit Function
    End If
    fOptimum = infty
    For Each d In lotSizes
        isFeasible = True
        ' storage limit of 40 pairs, and no negative inventory
        If (g(n, S, d) > 40 Or g(n, S, d) < 0) Then isFeasible = False
        ' the season must end with an empty inventory
        If (n = 1 And g(n, S, d) <> 0) Then isFeasible = False
        If isFeasible Then
            fTest = r(n, S, d) + f(n - 1, g(n, S, d))
            If fTest < fOptimum Then
                fOptimum = fTest
            End If
        End If
    Next
    f = fOptimum
End Function

The recursive formulation causes the f(n,S) function to call itself recursively, starting from n = N = 6. At the lowest level, i.e., n = 1, there are only 3 feasible values of S1 to consider, i.e., 0, 10 and 20. It is easy to verify that the optimal decisions are as given in Table 5.12.

Table 5.12: Intermediate results for n = 1
S1 d∗1 f∗1
0 20 86
10 10 48
20 0 0

These results for n = 1 are needed several times when the function is called from stage n = 2. Therefore it is a good idea to save the optimal values. This applies to all other states as well. In fact, for this example we need to calculate fn() more than 1 100 times, which reduces to around 30 calculations by clever saving of intermediate results. In computing, memoization is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again. Memoization here means to save the calculated values of f∗ for the (S, n) combinations. The intermediate results for the other (Sn, n) combinations, i.e., n > 1, are not given in this presentation. Note that the function f(n,S) only returns the accumulated cost; we have not kept track of the optimal decisions at the various stages. A technical solution to this challenge is left for the reader as an exercise. Table 5.13 summarizes the final results of the DP analysis.
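The recursion with memoization can also be sketched in Python (the course code above is in VBA). The data are taken from Tables 5.9–5.11, and functools.lru_cache provides the memoization:

```python
from functools import lru_cache

# Data for the worked shoe-store example (Tables 5.9-5.11).
# Stage n = 6 is October, ..., n = 1 is March (backward numbering).
demand = {6: 40, 5: 20, 4: 30, 3: 40, 2: 30, 1: 20}
discount = {0: 0.0, 10: 0.05, 20: 0.05, 30: 0.10, 40: 0.20, 50: 0.25}
c_P, c_F, c_H, S_LIMIT = 4.0, 10.0, 0.2, 40

def order_cost(d):
    """phi(d): fixed cost plus discounted purchase cost; phi(0) = 0."""
    return 0.0 if d == 0 else c_F + c_P * d * (1 - discount[d])

def g(n, S, d):
    """Transition: inventory at the start of stage n-1 (Equation 5.11)."""
    return S + d - demand[n]

@lru_cache(maxsize=None)
def f(n, S):
    """Minimum accumulated cost for stages n, ..., 1 with input state S."""
    if n == 0:
        return 0.0
    best = float("inf")
    for d in discount:                   # feasible lot sizes 0, 10, ..., 50
        S_out = g(n, S, d)
        if S_out < 0 or S_out > S_LIMIT:
            continue                     # storage limit / demand must be met
        if n == 1 and S_out != 0:
            continue                     # season must end with empty inventory
        best = min(best, order_cost(d) + c_H * S_out + f(n - 1, S_out))
    return best

print(f(6, 0))   # total cost for the season, starting with empty inventory
```

Calling f(6, 0) reproduces the seasonal total of 606 in Table 5.13, and f(1, 0), f(1, 10) and f(1, 20) reproduce the optimal values in Table 5.12.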

Problem 5.7 A directed acyclic graph (DAG) is a finite directed graph with no directed cycles. That is, it consists of finitely many nodes (or more precisely vertices) and arrows (or more precisely edges) with each arrow directed from one node to another, such that there is no way to start at any node v and follow a sequence of arrows that eventually loops back to v again. In this problem

Table 5.13: Final results
Month f∗n Sn S˜n Dn d∗n
October 606 0 0 40 40
November 468 0 30 20 50
December 302 30 0 30 0
January 302 0 0 40 40
February 164 0 20 30 50
March 0 20 0 20 0

we investigate the shortest path from a source s to a destination node n. Generally we assume n + 1 nodes. Figure 5.13 shows the DAG to be analysed, where the travelling times are shown on the arrows.

Figure 5.13: DAG with travelling times

We let di,j denote the distance from node i to node j. In this notation the source node is given the number 0. Write a dynamic programming function in VBA that finds the shortest path from the source s to the destination node n. Hint: Let Pj be the set of predecessors of node j, i.e., the set of nodes with an arrow from each node in Pj to node j. Further let D(j) denote the length of the shortest path from node s to node j. Verify the following:

D(j) = min_{i∈Pj} (di,j + D(i))

where D(0) = 0. Use this result to write the required recursive function and find the length of the minimal path from node s to each of the other nodes in Figure 5.13.
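The recursion D(j) = min_{i∈Pj}(di,j + D(i)) can be sketched in Python; since the graph in Figure 5.13 is not reproduced here, a small made-up DAG is used, represented by its predecessor sets:

```python
from functools import lru_cache

# Predecessor sets with travelling times, P[j] = [(i, d_ij), ...].
# This DAG is made up for illustration (it is not the one in Figure 5.13);
# node 0 is the source s and node 3 is the destination.
P = {
    1: [(0, 2)],
    2: [(0, 4), (1, 1)],
    3: [(1, 5), (2, 1)],
}

@lru_cache(maxsize=None)
def D(j):
    """Length of the shortest path from the source node 0 to node j."""
    if j == 0:
        return 0
    return min(d_ij + D(i) for i, d_ij in P[j])

print([D(j) for j in range(4)])  # -> [0, 2, 3, 4]
```

The memoization keeps the number of recursive evaluations proportional to the number of edges, exactly as the clever saving of intermediate results discussed for the worked example.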

Problem 5.8 The dynamic lot-size model in inventory theory takes into account that demand for the product varies over time, and was introduced by Harvey M. Wagner and Thomson M. Whitin in 1958 [35]. The model is identical whether we are purchasing the items or producing them ourselves. In the following we assume that we are producing the items ourselves. Let xt denote the number of items to produce in period t, where the number of periods to consider is N. The production cost per item in period t is cI,t, and the fixed cost in period t is cF,t if there is some production. The demand in period t is dt. Finally, there is a holding cost of cH,t per item at the end of period t. The problem is how many units xt to produce in each period to minimize the sum of production, setup and inventory costs. Let It denote the inventory level at the end of period t:

It = ∑_{j=1}^{t} xj − ∑_{j=1}^{t} dj ≥ 0

Wagner and Whitin [35] proved the following four theorems that are useful for finding an optimal solution:

1. There exists an optimal program such that It−1 xt = 0, ∀t. This means that for an optimal program, either the inventory level is empty at the beginning of period t, and/or there is no production in period t.

2. There exists an optimal program such that for each t either xt = 0 or xt = dt + dt+1 + ··· + dk for some k, t ≤ k ≤ N. This means that either we do not produce in period t, or we produce exactly the demand for one or more consecutive periods.

3. There exists an optimal program such that if dt is satisfied by some xr, r < t, then ds, s = r + 1, . . . , t − 1, is also satisfied by xr. This means that there is no additional production in between r and t.

4. Given that It = 0, it is optimal to consider periods 1 through t by themselves. That is, we do not need to take into account what is happening after period t.

These theorems state that if there is production in period t, the inventory is always empty at the start of that period. The amount to produce is xt = dt + ··· + dk for some k, t ≤ k ≤ N. Let fk be the minimum cost for periods 1 through k when Ik = 0. Further, let mjk be the cost incurred by producing in period j for all periods j through k, i.e., the cost of producing xj = dj + ··· + dk in period j. mjk is given by:

mjk = cF,j + cI,j xj + ∑_{t=j}^{k−1} cH,t ∑_{r=t+1}^{k} dr

The recursive formulation of the optimization problem for periods 1 through k is given by:

fk = min_{1≤j≤k} [fj−1 + mjk] for k = 1, ..., N

where f0 = 0. Note that if we for k − 1 found an optimal j, i.e., j∗k−1 such that the inventory is empty at the start of period j∗k−1, we do not need to look for j values below j∗k−1 when searching for the minimum. Write the recursive optimization algorithm in VBA and test it with the data in Table 5.14.

Table 5.14: Test data for the Wagner and Whitin algorithm
Cost t = 1 t = 2 t = 3 t = 4
dt 60 100 140 200
cF,t 150 140 160 160
cI,t 7 7 8 7
cH,t 1 1 2 2

Problem 5.9 To be solved without a computer
Assume you have an initial capital of cI = 10 units (for example million NOKs) that you are going to consume over N = 5 years. Let ct be the consumption in year t. The utility of consuming a capital of c in one year is u(c) = ln c. Further, consumption later is considered less valuable, hence the utility is discounted by a factor b = 0.8 each year. Explain how to use dynamic programming to find the optimal consumption per year over the period. Argue why the following formula is relevant for solving the problem:

f(n, S) = max{Log(c) + b * f(n - 1, S - c)}

Table 5.15 shows the optimal values f∗4(S4). Use these to find the optimal consumption in the first year. Assume you can only consume integer amounts each year.

Table 5.15: Intermediate results for Problem 5.9, n = 4
Sn f∗n(Sn)
5 0.69
6 1.25
7 1.69
8 2.1
9 2.45
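For checking the hand calculation afterwards, the recursion can be sketched in Python. It is assumed that f(1, S) = ln S, i.e., all remaining capital is consumed in the last year, and that at least one unit is consumed every year:

```python
from functools import lru_cache
from math import log

b = 0.8   # yearly discount factor for the utility

@lru_cache(maxsize=None)
def f(n, S):
    """Maximum discounted utility of consuming capital S over n years
    (integer consumption, at least 1 unit per year, everything consumed)."""
    if n == 1:
        return log(S)        # consume the remaining capital in the last year
    return max(log(c) + b * f(n - 1, S - c) for c in range(1, S - (n - 1) + 1))

# check against Table 5.15 ...
print([round(f(4, S), 2) for S in range(5, 10)])
# ... and find the optimal first-year consumption for cI = 10, N = 5
best_c = max(range(1, 7), key=lambda c: log(c) + b * f(4, 10 - c))
print(best_c)
```

The first print statement reproduces the values in Table 5.15, and the last line gives the optimal first-year consumption.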

5.4 Nonlinear programming

If the problem at hand can be formulated as a standard LP problem as given by Equation (5.3) and the corresponding constraint equations, it is computationally straightforward to find an optimal solution. Although integer variable restrictions may cause some trouble, we are usually able to solve the problem by branch and bound algorithms. Many real life problems can, however, not be formulated as an LP problem, either because of a nonlinear objective function, or because the constraints cannot be formulated as linear functions of the decision variables.

An intuitive approach to nonlinear optimization is to take the derivatives of the objective function and search for roots in the set of equations obtained. Newton's method (also known as the Newton–Raphson method) is a method for finding successively better approximations to the roots of a real-valued function. However, this approach has two major disadvantages. First of all it requires calculation of the derivatives, and further, the method may converge very slowly or even fail to converge. Therefore other approaches are usually required.

Gradient based methods calculate, numerically or analytically, the gradient of the objective function, which can be applied to improve a current solution. MS Excel has implemented the so-called Generalized Reduced Gradient method in the Solver. This approach is recommended for problems that are smooth nonlinear. Genetic algorithms represent a class of approaches to optimization inspired by natural selection. The idea is to construct and breed a population of candidate solution points. Fitness measures are calculated for each candidate solution, and candidates are matched randomly to see if better solutions can be obtained. Such approaches are also denoted evolutionary methods and are recommended by MS Excel for problems that are non-smooth.
Nonlinear programming is one of the core objectives of courses in numerical methods and the reader is advised to consult a textbook in numerical methods for further reading.
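As a small illustration of the Newton–Raphson idea, the sketch below (a made-up example, not from the course material) searches for a stationary point of the smooth function f(x) = e^x − 2x by applying Newton iterations to the first-order condition f′(x) = 0:

```python
from math import exp, log

def newton(df, d2f, x, tol=1e-10, max_iter=100):
    """Find a root of df (here: a stationary point of f) by Newton's method,
    given the first and second derivatives of the objective function."""
    for _ in range(max_iter):
        step = df(x) / d2f(x)      # Newton step: f'(x) / f''(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

# f(x) = exp(x) - 2x, so f'(x) = exp(x) - 2 and f''(x) = exp(x);
# the minimum is at x = ln(2)
x_star = newton(lambda x: exp(x) - 2, lambda x: exp(x), x=0.0)
print(x_star)  # close to ln(2) ≈ 0.6931
```

For this well-behaved function the iteration converges in a handful of steps, but as noted above it requires explicit derivatives and may fail to converge for less friendly objective functions.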

Example 5.4 Maintenance optimization
Preventive maintenance is a means to reduce the effective failure rate of a component. Let τ denote the length of the intervals between preventive maintenance actions. The effective failure rate is often approximated by:

λE(τ) ≈ (Γ(1 + 1/α)/MTTF)^α τ^(α−1) [1 − 0.1ατ/MTTF + (0.09α − 0.2)τ²/MTTF²]

where MTTF = Mean Time To Failure = the expected time from a new component is put into operation until it fails when no preventive maintenance is performed, and α is a parameter describing the strength of the degradation leading to failures. In MS Excel the gamma function is available as GAMMA(x). Assume that there is a cost cPM each time a preventive maintenance action is carried out. Further assume that there is an unavailability loss per hour of cU when the component is down for a corrective repair action. In addition to the unavailability loss there is a corrective repair cost of cCM. The MDT = Mean Down Time depends on whether a spare part is available in the stock. Let s denote the stock size after replenishment of spares. Assume there is a demand from the spare part stock of ρ spares in the period between replenishments. Table 5.16 shows numerical values for the relevant parameters in the optimization.

Table 5.16: Data for nonlinear programming
Parameter Value Comment
MTTF 8 760 MTTF without preventive maintenance
α 3 Medium ageing
cPM 10 000 Cost per preventive maintenance action
cCM 20 000 Cost per corrective maintenance action
cU 50 000 Unavailability cost per hour downtime
MDTW 4 Mean downtime with a spare part available
MDTWO 72 Mean downtime without a spare part available
ρ 5 Rate of withdrawals from the stock
cS 1 Cost to keep one spare in stock per hour

Let p denote the probability that there is no spare part available upon a failure. Let N(t) be the number of spares withdrawn from the stock at time t in a replenishment period. The probability that the stock is empty at time t is then given by p(t) = Pr(N(t) ≥ s). It is reasonable to assume that N(t) is Poisson distributed with parameter ρt/t0, where t0 is the length of a replenishment period. We let p be the average value of p(t) over one period. The unconditional mean downtime is then given by:

MDT = MDT(s) = pMDTWO + (1 − p)MDTW where p is a function of s implicitly given by the Poisson argument. The objective function to minimize is:

C(τ, s) = cPM/τ + λE(τ)[cCM + cUMDT(s)] + cSs

This is a mixed integer programming problem. The optimal solution is found for s∗ = 6 and τ∗ = 1 968 hours. For a fixed value of s the MS Excel Solver finds a solution using the GRG Nonlinear engine, but it fails to find a solution for both s and τ simultaneously. The Evolutionary engine finds a solution after a few seconds.

Problem 5.10 Write a simple code in VBA to minimize the objective function in Example 5.4 where you test all integer values of s up to 10 in combination with all values for τ = 1000, 1100,..., 2500. ♢
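A possible sketch of such a grid search in Python (the problem asks for VBA) is given below. The average probability p of an empty stock is approximated here by numerically averaging the Poisson tail probability over a replenishment period. With this approximation the grid optimum lands close to the values reported in Example 5.4, although the crude averaging may shift the optimal s by one:

```python
from math import exp, factorial, gamma

# Parameters from Table 5.16
MTTF, alpha = 8760.0, 3.0
c_pm, c_cm, c_u, c_s = 10_000.0, 20_000.0, 50_000.0, 1.0
mdt_w, mdt_wo, rho = 4.0, 72.0, 5.0

def lam_e(tau):
    """Approximation of the effective failure rate with PM interval tau."""
    return ((gamma(1 + 1 / alpha) / MTTF) ** alpha * tau ** (alpha - 1)
            * (1 - 0.1 * alpha * tau / MTTF
               + (0.09 * alpha - 0.2) * tau ** 2 / MTTF ** 2))

def poisson_tail(lam, s):
    """Pr(N >= s) for N ~ Poisson(lam)."""
    return 1 - sum(exp(-lam) * lam ** k / factorial(k) for k in range(s))

def p_bar(s, steps=1000):
    """Average probability of an empty stock over one replenishment period
    (midpoint rule on N(t) ~ Poisson(rho * t / t0))."""
    return sum(poisson_tail(rho * (i + 0.5) / steps, s) for i in range(steps)) / steps

def cost(tau, s):
    """Objective C(tau, s) from Example 5.4."""
    mdt = p_bar(s) * mdt_wo + (1 - p_bar(s)) * mdt_w
    return c_pm / tau + lam_e(tau) * (c_cm + c_u * mdt) + c_s * s

# grid search as suggested in Problem 5.10
grid = [(tau, s) for tau in range(1000, 2501, 100) for s in range(0, 11)]
tau_opt, s_opt = min(grid, key=lambda ts: cost(*ts))
print(tau_opt, s_opt)
```

The cost surface is quite flat around the optimum, so the grid minimum is only accurate to the resolution of the grid; a finer grid (or the Solver) is needed to recover τ∗ = 1 968 exactly.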

5.5 Stochastic programming

5.5.1 Introduction
In the standard linear programming problem discussed in Section 5.2 we assumed all quantities to be known in advance. That is, the LP problem formulation was given by Equation (5.3), and repeated in matrix form we have:

Maximize: Z = cx (5.13)

Subject to: Ax = b x ≥ 0 (5.14) b ≥ 0

The notation used is that x is a decision vector controlled by the decision maker, c is the profit/cost vector, A is the coefficient matrix and b is the requirement vector. Up to now we have assumed that c, A and b are all known at the time the decision is to be made, that is, we have a deterministic optimization problem at hand. In real life we often have to make decisions when some of these quantities are not known. Typically the requirement vector is uncertain, and we need to treat this vector as a stochastic variable. The requirement vector often represents resource constraints which are only known, or partly known, at a later stage in the decision process. Stochastic programming is a technique to solve such challenges. In this presentation we limit the treatment of stochastic programming to linear programming situations.

To motivate the formalism to be introduced, consider a fish processing facility where we have to plan tomorrow's production before we actually know the catch, i.e., the amount and quality of fish available for production. Fish of high quality can be used for production of fresh fillet. Frozen fish can be produced from both high and low quality fish. It is assumed that the selling price of fresh fillet is higher than that of frozen fish, but production of fresh fillet requires more workload. Today we have to decide upon how many workers, x, to allocate for the production. x is often referred to as the here and now decision. The production profile, y, can be determined at a later stage, i.e., when we know the catch. The objective function is now split into two objective functions, Z1 and Z2. The maximization (minimization) of Z1 = cx subject to Ax = b is denoted the first stage optimization. In the example this corresponds to finding the number of workers to allocate for tomorrow's production. This is obviously not straightforward, since we need to take into account the catch during the night.
Now, assume that we can describe tomorrow's situation by some stochastic vector U. Given that we know the value of this vector, say U = u, there is a second stage optimization problem defined by:

Maximize: Z2 = q(u)y (5.15)

Subject to: B(u)x + C(u)y = d(u) y ≥ 0 (5.16) d(u) ≥ 0

Since the decision to be made tomorrow is influenced by the decision we make today through the constraint term B(u)x, we are facing the classical two stage linear stochastic programming problem, where the decisions on the two days are interconnected. In the first stage we need to take into account what will be the result of the second stage decision, but this again depends on the unknown vector

U. The idea now is to add the expected value of the best solution tomorrow to today's objective function:

Maximize: Z1,2 = cx + EU [Q(x, U)] (5.17)

Subject to: Ax = b x ≥ 0 (5.18) b ≥ 0 where the expectation is taken with respect to the stochastic vector U and Q(x, u) is the solution to:

Maximize: Z2 = q(u)y (5.19)

Subject to: B(u)x + C(u)y = d(u) y ≥ 0 (5.20) d(u) ≥ 0

In the second stage problem the profit/cost vector q(u) and the requirement vector d(u) both depend on the value, u, the stochastic vector U will take. Further, the coefficient matrix in B(u)x + C(u)y is built up of two terms: B(u) relates to the first stage decision vector x and C(u) relates to the second stage decision vector y. Both matrices depend on the value the stochastic vector U will take. In some presentations the second stage problem is considered as a recourse action, where the term C(u)y compensates for a possible inconsistency of the system B(u)x = d(u), and q(u)y is the cost of this recourse action. In the example, −cx represents the cost of allocating x workers for tomorrow's production, q(u)y is the profit of tomorrow's production, and B(u)x + C(u)y = d(u) gives the constraints taking the number of workers and the catch into account.

5.5.2 Discretization
To solve the two-stage stochastic problem numerically it is common to perform a discretization where the stochastic vector U can only take a finite number of possible realizations. These realizations are often denoted scenarios, say u1, u2, ..., uk, with corresponding probabilities pi = Pr(U = ui). The expectation in the second-stage problem is then given by:

EU [Q(x, U)] = ∑_{i=1}^{k} pi Q(x, ui) (5.21)

It is now possible to write the stochastic problem on a so-called extensive form, that is, as a deterministic equivalent problem:

Maximize: Z1,2 = cx + ∑_{i=1}^{k} pi q(ui) yi (5.22)

Subject to: Ax = b
B(ui)x + C(ui)yi = d(ui), i = 1, 2, . . . , k
x ≥ 0 (5.23)
b ≥ 0
yi ≥ 0
d(ui) ≥ 0, i = 1, 2, . . . , k

For the scenarios i = 1, 2, . . . , k we have to specify B(ui), C(ui) and d(ui) one by one based on the understanding of the problem at hand.

Example 5.5 Fish processing facility We consider a fish processing facility where tomorrow’s supply is uncertain. A discretization into 3 different supply scenarios is performed as shown in Table 5.17. Table 5.18 shows additional parameters required for the optimization.

Table 5.17: Supply scenarios
Scenario, i pi High quality Low quality
1 0.15 16 000 12 000
2 0.5 8 000 6 000
3 0.35 6 000 3 000

Table 5.18: Parameters
Parameter Value Description
ρFi 0.6 Utilization ratio fresh fillet
ρFz 0.7 Utilization ratio frozen fillet
pFi 80 Selling price, fresh fillet (NOKs/kg)
pFz 30 Selling price, frozen fillet (NOKs/kg)
µFi 70 Processing capacity, kg fresh fillet/man-hour
µFz 300 Processing capacity, kg frozen fillet/man-hour
cMHR 500 Man-hour cost (NOKs)

For fish of high quality the most beneficial product is fresh fillet, despite the higher workload required. This means that if we knew the supply, we would always allocate work-force to process all high quality fish into fresh fillet, and the low quality into frozen fillet. However, at the decision point for the work-force we do not know the supply. If we allocate sufficient work-force for the best scenario, i.e., 250 man-hours (some 30 persons), and actually get less raw material, we have to pay for man-hours we are not utilizing. The following decision variables are introduced:

• x1 = Number of man-hours allocated

• y1,i = Number of kg processed for fresh fish fillet, scenario i

• y2,i = Number of kg processed for frozen fish fillet, scenario i

In the objective function the profit coefficient corresponding to x1 is −cMHR, the profit coefficients corresponding to y1,i are pi ρFi pFi, and the profit coefficients corresponding to y2,i are pi ρFz pFz. In the constraint matrix for the deterministic equivalent problem we have, for each scenario, to make sure that y1,i does not exceed the supply of high quality fish, and that y1,i + y2,i does not exceed the total supply of fish. Further, for each scenario we have to make sure that y1,i/µFi + y2,i/µFz ≤ x1. Solving the deterministic equivalent problem gives x1 ≈ 134. If scenario 1 occurs, 3 739 kg of high quality fish is used for fresh fillet, and 24 260 kg (= remaining high quality fish + low quality fish) is used to produce frozen fish fillet. The Excel model used is shown in Figure 5.14.

Figure 5.14: Specification of the stochastic programming example in Excel
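The deterministic equivalent LP can also be checked outside Excel. Below is a minimal sketch using scipy.optimize.linprog (an assumed tooling choice; the variable layout [x1, y1,1, y2,1, ..., y1,3, y2,3] and all names are ours). Since linprog minimizes, the profit is negated.

```python
from scipy.optimize import linprog

# Supply scenarios from Table 5.17: probability, high-quality kg, total kg
p = [0.15, 0.5, 0.35]
hq = [16000, 8000, 6000]
tot = [28000, 14000, 9000]          # high + low quality

# Parameters from Table 5.18
rho_fi, rho_fz = 0.6, 0.7           # utilization ratios
p_fi, p_fz = 80, 30                 # selling prices (NOK/kg)
mu_fi, mu_fz = 70, 300              # kg per man-hour
c_mhr = 500                         # man-hour cost (NOK)

# Variables: [x1, y1_1, y2_1, y1_2, y2_2, y1_3, y2_3]
# Objective: negated profit (linprog minimizes)
c = [c_mhr]
for pi in p:
    c += [-pi * rho_fi * p_fi, -pi * rho_fz * p_fz]

A_ub, b_ub = [], []
for i in range(3):
    r1 = [0.0] * 7; r1[1 + 2 * i] = 1.0                # y1_i <= high-quality supply
    r2 = [0.0] * 7; r2[1 + 2 * i] = r2[2 + 2 * i] = 1.0  # y1_i + y2_i <= total supply
    r3 = [0.0] * 7; r3[0] = -1.0                       # y1_i/mu_fi + y2_i/mu_fz <= x1
    r3[1 + 2 * i], r3[2 + 2 * i] = 1 / mu_fi, 1 / mu_fz
    A_ub += [r1, r2, r3]
    b_ub += [hq[i], tot[i], 0.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 7)
print(round(res.x[0], 1))    # man-hours allocated, x1 ≈ 134.3
print(round(res.x[1]))       # fresh-fillet kg in scenario 1, ≈ 3739
```

Running the sketch reproduces the values reported above: x1 ≈ 134 man-hours and about 3 739 kg of high quality fish used for fresh fillet in scenario 1.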

5.5.3 The Value of the Stochastic Solution

The Value of the Stochastic Solution (VSS) is introduced to measure the benefit of solving the stochastic problem rather than treating the problem as a deterministic one. By treating the problem as deterministic we mean that the stochastic variables are replaced by their expected values, so that the uncertainties affecting the second stage decision are ignored. This is referred to as an expected value (EV) approach. To solve the two stage problem the decision maker then anticipates the value of U by ū = E(U), and optimizes:

Maximize: Z_EV = cx + q(ū)y    (5.24)

Subject to: Ax = b
B(ū)x + C(ū)y = d(ū)
x ≥ 0                                         (5.25)
b ≥ 0
y ≥ 0
d(ū) ≥ 0

The first stage solution is denoted x_EV. This solution is optimal if the stochastic variable U actually takes the value ū. However, since the problem is stochastic in nature, other realizations may occur. Therefore, the decision maker needs to modify the second stage decision when the value of the stochastic variable U is revealed. The expected value of the expected value solution is now

the value of the following optimization problem:

Maximize: Z_EEV = cx_EV + ∑_{i=1}^{k} p_i q(u_i) y_i    (5.26)

Subject to: B(u_i)x_EV + C(u_i)y_i = d(u_i),  i = 1, 2, . . . , k
y_i ≥ 0                                       (5.27)
d(u_i) ≥ 0,  i = 1, 2, . . . , k

Here the solution for each scenario i could be found individually. The difference compared to the optimization problem in Equation (5.22) is that x_EV is fixed. This also means that the constraint B(u_i)x_EV + C(u_i)y_i = d(u_i) could be very costly for some scenarios, since these constraints were only “tested” for ū. Let the value of the optimization problem in Equation (5.26) be denoted Z*_EEV and the value of the optimization problem in Equation (5.22) be denoted Z*_{1,2}. The difference between these two is denoted the value of the stochastic solution, i.e.,

VSS = Z*_{1,2} − Z*_EEV    (5.28)
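For Example 5.5 the VSS can be computed numerically. The sketch below (our own helper names; scipy is an assumed tooling choice) solves the extensive form for Z*_{1,2}, fixes the EV first-stage decision x_EV, re-optimizes the second stage per scenario, and takes the difference:

```python
from scipy.optimize import linprog

p = [0.15, 0.5, 0.35]
hq = [16000.0, 8000.0, 6000.0]       # high-quality supply per scenario (kg)
tot = [28000.0, 14000.0, 9000.0]     # total supply per scenario (kg)
rev_fi, rev_fz = 0.6 * 80, 0.7 * 30  # profit per kg: fresh 48, frozen 21 NOK
mu_fi, mu_fz, c_mhr = 70, 300, 500

def second_stage(x1, h, t):
    """Best second-stage profit for fixed man-hours x1 in one scenario."""
    res = linprog([-rev_fi, -rev_fz],
                  A_ub=[[1, 0], [1, 1], [1 / mu_fi, 1 / mu_fz]],
                  b_ub=[h, t, x1], bounds=[(0, None)] * 2)
    return -res.fun

def extensive_form():
    """Two-stage problem on extensive form; returns (Z*, optimal x1)."""
    c = [c_mhr]                       # linprog minimizes negated profit
    A_ub, b_ub = [], []
    for i, pi in enumerate(p):
        c += [-pi * rev_fi, -pi * rev_fz]
        r1 = [0.0] * 7; r1[1 + 2 * i] = 1.0
        r2 = [0.0] * 7; r2[1 + 2 * i] = r2[2 + 2 * i] = 1.0
        r3 = [0.0] * 7; r3[0] = -1.0
        r3[1 + 2 * i], r3[2 + 2 * i] = 1 / mu_fi, 1 / mu_fz
        A_ub += [r1, r2, r3]; b_ub += [hq[i], tot[i], 0.0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 7)
    return -res.fun, res.x[0]

z_star, x_sp = extensive_form()

# EV solution: replace the stochastic supplies by their means; since both
# products earn more than 500 NOK per man-hour, all mean supply is processed.
hq_mean = sum(pi * h for pi, h in zip(p, hq))                 # 8 500 kg
lq_mean = sum(pi * (t - h) for pi, h, t in zip(p, hq, tot))   # 5 850 kg
x_ev = hq_mean / mu_fi + lq_mean / mu_fz                      # ~140.9 hours

# Expected result of sticking to x_ev (Z_EEV), and the VSS
z_eev = -c_mhr * x_ev + sum(pi * second_stage(x_ev, h, t)
                            for pi, h, t in zip(p, hq, tot))
vss = z_star - z_eev
print(round(x_ev, 1), round(z_star), round(z_eev), round(vss))
```

With the numbers of Example 5.5 this gives a small but positive VSS (a few hundred NOK against an expected profit of roughly 414 000 NOK), illustrating that the EV approach over-allocates man-hours here.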

5.5.4 Expected value of perfect information

In decision theory, the expected value of perfect information (EVPI) is the price that one would be willing to pay in order to gain access to perfect information. In our context this means the difference between the expected value of the stochastic solution and the expected value of the solution if we knew u at the time of the first stage decision. For the latter case we often say that this corresponds to a wait and see situation, where we imagine that we can wait and observe u before we need to decide upon x. The optimization problem for the wait and see situation given that we observe u = u_i is given by:

Maximize: Zi = cxi + q(ui)yi (5.29)

Subject to: Axi = b

B(u_i)x_i + C(u_i)y_i = d(u_i)
x_i ≥ 0                                       (5.30)
b ≥ 0
y_i ≥ 0
d(u_i) ≥ 0

Let Z*_i be the result of the optimization problem in Equation (5.29). The expected profit when averaging over all scenarios is given by ∑_i p_i Z*_i. The expected value of perfect information is thus:

EVPI = ∑_{i=1}^{k} p_i Z*_i − Z*_{1,2}    (5.31)

There are several reasons why we would like to calculate the EVPI. For example, if we know the EVPI we can decide how much effort we should spend on reducing uncertainty. In Example 5.5 we could imagine that we are able to establish a better prediction model for the catch. Such a model could take the weather forecast, the type of catch tools and so on into account. A better prediction model will not reduce the variability in the catch, but it will reduce the epistemic uncertainty, and hence we are able to take better decisions.
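For Example 5.5 the EVPI in Equation (5.31) can be computed as follows (a sketch with our own names; scipy is an assumed tooling choice). Each wait-and-see problem chooses x1 and the processing quantities jointly for a known supply:

```python
from scipy.optimize import linprog

p = [0.15, 0.5, 0.35]
hq = [16000.0, 8000.0, 6000.0]       # high-quality supply per scenario (kg)
tot = [28000.0, 14000.0, 9000.0]     # total supply per scenario (kg)
rev_fi, rev_fz = 0.6 * 80, 0.7 * 30  # profit per kg: fresh 48, frozen 21 NOK
mu_fi, mu_fz, c_mhr = 70, 300, 500

def wait_and_see(h, t):
    """Optimal profit when supply (h, t) is known before choosing x1.
    Variables: [x1, y1, y2]."""
    res = linprog([c_mhr, -rev_fi, -rev_fz],
                  A_ub=[[0, 1, 0],                    # y1 <= h
                        [0, 1, 1],                    # y1 + y2 <= t
                        [-1, 1 / mu_fi, 1 / mu_fz]],  # hours used <= x1
                  b_ub=[h, t, 0.0], bounds=[(0, None)] * 3)
    return -res.fun

# Expected profit with perfect information, sum_i p_i Z*_i
ws = sum(pi * wait_and_see(h, t) for pi, h, t in zip(p, hq, tot))

# Z*_{1,2}: optimum of the extensive form (Section 5.5.2)
c = [c_mhr]; A_ub = []; b_ub = []
for i, pi in enumerate(p):
    c += [-pi * rev_fi, -pi * rev_fz]
    r1 = [0.0] * 7; r1[1 + 2 * i] = 1.0
    r2 = [0.0] * 7; r2[1 + 2 * i] = r2[2 + 2 * i] = 1.0
    r3 = [0.0] * 7; r3[0] = -1.0
    r3[1 + 2 * i], r3[2 + 2 * i] = 1 / mu_fi, 1 / mu_fz
    A_ub += [r1, r2, r3]; b_ub += [hq[i], tot[i], 0.0]
z_star = -linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 7).fun

evpi = ws - z_star
print(round(ws), round(z_star), round(evpi))
```

With these numbers the EVPI comes out at roughly 46 000 NOK, much larger than the VSS: most of the loss stems from not knowing the supply at all, not from using the EV approximation.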

5.5.5 Scenario building

In our two-stage model the uncertainty is defined through the stochastic vector U. In Example 5.5 we assumed that U could only take three different values. In general U could have many elements, where each element could take many different values, or even be a continuous variable. This will lead to a very large number of scenarios, which is a challenge both from a modelling and from a computational point of view. There are many approaches to overcome these challenges. The sample average approximation (SAA) is one approach that is often used. The idea is to use Monte Carlo techniques to generate a limited sample of the stochastic vector U. That is, we realize N vectors, say ui, with the same distribution as U. The optimization problem to solve is then:

Maximize: Z_{1,2} = cx + (1/N) ∑_{i=1}^{N} q(u_i) y_i    (5.32)

subject to the same constraints as in Section 5.5.2. Another approach is to investigate element by element in the stochastic vector U. That is, for each element Ui we first calculate the VSS where we use 3 values, say low, medium and high, for the discretization. If the VSS is relatively small, we may be tempted to ignore the uncertainty of element Ui. However, if the VSS is large the uncertainty of element Ui should be taken into account. The next question is then whether a discretization using only 3 values is sufficient. If Ui is a continuous variable we would expect to get a better result if we approximate Ui with a 5-point discrete variable compared to a 3-point discrete variable. Again, we calculate the value of the stochastic solution (VSS). The situation is now more challenging. We have to compare the 5-point solution with the 3-point solution. If we assume that the 5-point discretization of Ui is “perfect”, we can then calculate the expected cost of assuming a 3-point discretization for the first stage decision, and then face the 5-point discretization for the second stage decision. The procedure for calculating the expected profit for the 3-point discretization is then:

1. Formulate the problem as a two stage optimization problem on extensive form where a 3-point discretization is used

2. Find the solution for both x and yi, i = 1, 2, 3. Denote the solution for x by x3p

3. Switch to the 5-point discretization

4. For each of the 5 scenarios, solve the second stage problem assuming that the first stage solution is fixed at x3p

5. For each of the 5 scenarios, calculate the expected profit

6. Calculate an average weighted profit by using corresponding pi’s for each scenario

The calculated weighted average is then subtracted from the expected value where a 5-point discretization is used in the ordinary manner. If the difference is significant, we could compare the 5-point discretization with the situation where a 7-point discretization is assumed to be “perfect”. We proceed in this manner until the value of a higher point discretization is relatively low. Here we need to be pragmatic, so that the total number of combinations taking all elements of U into account is manageable.

5.5.6 How to perform discretization?

[36] propose a standard approach for approximating a continuous distribution by a discrete distribution where (i) the outcome region is divided into intervals, then (ii) for each interval a representative point is chosen, and (iii) a probability, pi, is assigned to each point. Usually the intervals are found by dividing the outcome region into k equally probable intervals, where the representative point is the mean of the corresponding interval, and the assigned probability is 1/k. When generating a limited number of discrete outcomes, some statistical properties should be specified. It is common to include the first four central moments as properties, i.e., the moments of the discrete stochastic variable should be as close as possible to the moments of the variable we are approximating. [37] have used such an approach for the case where the stochastic variable is normally distributed with mean value µ and standard deviation σ. Odd numbers of points are used, and Table 5.19 presents standardized distances from the mid point. The midpoint is always given by µ. If a k point approximation is required, the two nearest points to the mid point are given by the (left and right) distance dk,1σ from the midpoint at µ, the second nearest points to the mid point are given by the distance dk,2σ from the midpoint at µ, and so forth.

Table 5.19: Standardized distances from mid point

k = # of scenarios   dk,1      dk,2      dk,3      dk,4      dk,5
 3                   1.22474
 5                   0.87889   1.31436
 7                   0.16787   0.49042   1.79758
 9                   0.21902   0.60872   0.67431   1.90442
11                   0.26459   0.43883   0.70498   0.89051   1.98681
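The equal-probability interval approach of [36] (steps (i) to (iii) above) is easy to implement for a normal variable: each of the k intervals gets probability 1/k and is represented by its conditional mean. The sketch below uses only the standard library; all function names are ours. Note that this is the interval-mean approach, not the moment-matching distances of Table 5.19, so the resulting points differ from dk,j σ.

```python
import math

def phi(z):     # standard normal pdf
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):     # standard normal cdf
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def Phi_inv(u, lo=-10.0, hi=10.0):   # quantile by bisection
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if Phi(mid) < u else (lo, mid)
    return (lo + hi) / 2

def discretize_normal(mu, sigma, k):
    """k equally probable points for N(mu, sigma^2): each interval of
    probability 1/k is represented by its conditional mean, using
    E[Z | a < Z <= b] = (phi(a) - phi(b)) / (Phi(b) - Phi(a))."""
    bounds = [-math.inf] + [Phi_inv(j / k) for j in range(1, k)] + [math.inf]
    points = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        pa = phi(a) if math.isfinite(a) else 0.0
        pb = phi(b) if math.isfinite(b) else 0.0
        points.append(mu + sigma * (pa - pb) * k)   # divide by prob. 1/k
    return points

pts = discretize_normal(0.0, 1.0, 3)
print([round(x, 3) for x in pts])   # three equally probable points, symmetric about 0
```

By construction the probability-weighted mean of the points equals µ exactly, while the variance is somewhat underestimated; this is one motivation for the moment-matching alternative of [37].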

Problem 5.11 The farmer's problem [38]

Farmer Tom can grow wheat, corn, and sugar beets on his 500 acres. Tom requires 200 tons of wheat and 240 tons of corn to feed his cattle. These can be grown on his land or bought from a wholesaler. Any production in excess of these amounts can be sold for $170/ton (wheat) and $150/ton (corn). Any shortfall must be bought from the wholesaler at a cost of $238/ton (wheat) and $210/ton (corn). Tom can also grow sugar beets on his 500 acres. Sugar beets can be sold for $36/ton for the first 6 000 tons, but due to economic quotas on sugar beets, sugar beets in excess of 6 000 tons can only be sold at $10/ton. The planting costs are:

• Wheat: $150/acre

• Corn: $230/acre

• Sugar beets: $260/acre

Based on experience Tom estimates the mean yield to be roughly:

• Wheat: 2.5 tons/acre

• Corn: 3 tons/acre

• Sugar beets: 20 tons/acre

The mean yield is achieved if the weather is normal. In case of bad weather the yield is only 80% of the mean yield, but if the weather is good the yield is 120% of the mean yield. We assign equal probabilities to the three different weather scenarios.

(a) Find the optimal use of land if it is known that it will be normal weather.

(b) Repeat if it is known that it will be bad and good weather respectively.

(c) Find the profit if Tom uses the land as if it was known that it would be normal weather, but it turned out to be bad and good weather respectively.

(d) Formulate the problem as a stochastic programming problem and find the stochastic program- ming solution

(e) Calculate the value of the stochastic solution (VSS)

(f) Calculate the expected value of perfect information (EVPI)

(g) Assume that the 80% and 120% numbers were obtained by a discretization using 3 points when the underlying weather condition is normally distributed with µ = 1 and some σ. Use Table 5.19 to find σ. Find the expected value of using a 5 point discrete distribution rather than a 3 point distribution.

Chapter 6

Flow and network modelling

This chapter reviews some of the classical models used in flow and network analysis. Most situations can be formulated as LP problems, although there exist other problem formulations and solutions that are more efficient. Whenever we are able to specify an LP formulation of a problem we do so, and leave it to the reader to consult the operations research literature for the more efficient solutions. Further, in situations where we are able to come up with an LP formulation, the main objective is to give hints on efficient solving by use of MS Excel rather than tackling the specific challenges of finding an initial basic feasible solution.

6.1 Transportation problems

In supply chain management one often deals with the problem of distributing products from m > 1 sources to n > 1 locations. In the retail industry the sources could be warehouses and the locations could be markets or specific stores. In the problem formulation we will use the term warehouse, Wi, i = 1, . . . , m, for the sources and market, Mj, j = 1, . . . , n, for the locations receiving the products. In the model we assume only one product type and deterministic supply and demand. The supply provided by warehouse Wi is ai and the demand by market Mj is bj. Further, there is a unit cost cij of shipping products from warehouse Wi to market Mj. The total shipment from one warehouse cannot exceed the supply provided by that warehouse; further, the demand requirement at a given market should be met, although we could bring more products to a market than the actual demand. The formulation of the problem is thus:

Minimize: Z = ∑_{i=1}^{m} ∑_{j=1}^{n} c_ij x_ij    (6.1)

Subject to: ∑_{j=1}^{n} x_ij ≤ a_i,  i = 1, . . . , m
∑_{i=1}^{m} x_ij ≥ b_j,  j = 1, . . . , n          (6.2)

x_ij ≥ 0

where x_ij is the shipment from warehouse i to market j. The system is balanced if ∑_{i=1}^{m} a_i = ∑_{j=1}^{n} b_j. For a balanced system the supply inequality ∑_{j=1}^{n} x_ij ≤ a_i translates to an equality, and the demand inequality ∑_{i=1}^{m} x_ij ≥ b_j also translates to an equality. If this is the case the problem is said to be a standard transportation problem. For an unbalanced problem where the total supply exceeds the total demand, a dummy market may be created to absorb the excess supply. The shipping cost from any warehouse to the dummy market is set to zero since no physical transport of goods will take place. Similarly, if the total demand exceeds the total supply, a dummy warehouse is introduced to supply the shortage. The transportation cost from that warehouse to any market is also set to zero. The balancing is only required if the LP problem is to be solved from scratch. If we stick to MS Excel, as we will do, the only issue is to specify the data in an appropriate manner.

6.1.1 Worked example in MS Excel

Consider a situation with m = 3 warehouses and n = 4 markets with shipping costs given in Table 6.1. Further assume supply parameters given by a1 = 3, a2 = 8 and a3 = 6, and demand parameters given by b1 = 4, b2 = 3, b3 = 2 and b4 = 5.

Table 6.1: Shipping costs from warehouse Wi to market Mj

      M1   M2   M3   M4
W1     3    3    3    2
W2    12    8    4    3
W3     9    7    7   10

Figure 6.1 shows how the parameters have been entered into MS Excel.

Figure 6.1: MS Excel specification of the transportation problem

The cells B3:E5 contain the cost parameters c_ij. These cells are also given the name c_ij for easy reference. The cells F9:F12 contain the supply parameters and the cells B12:E12 contain the demand parameters. The decision variables are given in cells B9:E11. For easy reference these cells are named x_ij. Cell B15 contains the objective function specified by =SUMPRODUCT(c_ij,x_ij) and the cell is given the name Z. To simplify specification of the constraints they are reformulated as follows:

∑_{j=1}^{n} x_ij − a_i ≤ 0,  i = 1, . . . , m    (6.3)
∑_{i=1}^{m} x_ij − b_j ≥ 0,  j = 1, . . . , n    (6.4)

For example, in cell G9 Equation (6.3) reads =SUM(B9:E9)-F9 and in cell B13 Equation (6.4) reads =SUM(B9:B11)-B12. The supply constraints in cells G9:G11 and the demand constraints in B13:E13 are given the names SupplyConstraints and DemandConstraints respectively. Figure 6.2 shows the specification entered into the MS Excel solver.

Figure 6.2: MS Excel Solver specification of the transportation problem

Note that it is not required to specify each individual constraint, since constraints of the same form, for example <=0, apply for all cells in the SupplyConstraints range, and hence only one specification line is required for these constraints. Table 6.2 shows the optimal solution.

Table 6.2: Optimal shipping from warehouse Wi to market Mj

      M1   M2   M3   M4
W1     3    0    0    0
W2     0    0    1    7
W3     2    3    1    0
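The same LP (6.1)-(6.2) can be sketched outside Excel, for instance with scipy.optimize.linprog (an assumed tooling choice; the flattening of x_ij into a vector is ours). Note that since all shipping costs are positive, the solver will ship exactly the demanded quantities and leave any excess supply unused:

```python
from scipy.optimize import linprog

# Costs c_ij from Table 6.1, supplies a_i and demands b_j
cost = [[3, 3, 3, 2],
        [12, 8, 4, 3],
        [9, 7, 7, 10]]
supply = [3, 8, 6]
demand = [4, 3, 2, 5]
m, n = 3, 4

# Variables x_ij flattened row by row: index i * n + j
c = [cost[i][j] for i in range(m) for j in range(n)]
A_ub, b_ub = [], []
for i in range(m):                   # sum_j x_ij <= a_i
    row = [0.0] * (m * n)
    for j in range(n):
        row[i * n + j] = 1.0
    A_ub.append(row); b_ub.append(supply[i])
for j in range(n):                   # sum_i x_ij >= b_j  ->  -sum_i x_ij <= -b_j
    row = [0.0] * (m * n)
    for i in range(m):
        row[i * n + j] = -1.0
    A_ub.append(row); b_ub.append(-demand[j])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (m * n))
x = [[round(res.x[i * n + j], 6) for j in range(n)] for i in range(m)]
print(res.fun)   # minimum total shipping cost
print(x)
```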

6.2 Job assignment problems

In this section a standard assignment problem is introduced. In a manufacturing shop machining is required for several jobs. The jobs are denoted Jj, j = 1, . . . , n, and there are machines Mi, i = 1, . . . , n, available for the machining. We assume that each job only needs one machine, that the machines can work in parallel, and that one machine can only perform one job. Let cij denote the cost of doing job Jj on machine Mi. In this problem we introduce binary decision variables, i.e., xij equals one if job Jj is done on machine Mi, and zero otherwise.

Minimize: Z = ∑_{i=1}^{n} ∑_{j=1}^{n} c_ij x_ij    (6.5)

Subject to: ∑_{j=1}^{n} x_ij = 1,  i = 1, . . . , n
∑_{i=1}^{n} x_ij = 1,  j = 1, . . . , n             (6.6)

x_ij are all binary

The job assignment problem is almost identical to the standard transportation problem. The constraints are simpler in the sense that ai = bj = 1 for all i and j. The complication is the binary constraints on the decision variables. Figure 6.3 shows an example with n = 4 machines and n = 4 jobs. The cells B3:E5 contain the cost parameters c_ij.

Figure 6.3: MS Excel job assignment specification

An almost identical setup as for the transportation example is used, however, the constraints are:

JobConstraints = 0
MachineConstraints = 0
x_ij = binary

The MS Excel solution shows that job J1 should be done on machine M2, job J2 should be done on machine M4, job J3 should be done on machine M1, and job J4 should be done on machine M3.
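Because of the special structure of (6.5)-(6.6), assignment problems can also be solved directly with a Hungarian-type algorithm instead of a mixed integer LP. Below is a sketch using scipy.optimize.linear_sum_assignment; the cost matrix is purely hypothetical, as the actual numbers of Figure 6.3 are not reproduced in the text:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical cost matrix c_ij: rows = machines M1..M4, columns = jobs J1..J4
# (illustrative numbers only, not those of Figure 6.3)
cost = np.array([[9, 2, 7, 8],
                 [6, 4, 3, 7],
                 [5, 8, 1, 8],
                 [7, 6, 9, 4]])

rows, cols = linear_sum_assignment(cost)   # minimum-cost perfect matching
total = cost[rows, cols].sum()
for i, j in zip(rows, cols):
    print(f"Job J{j + 1} on machine M{i + 1}")
print("Total cost:", total)
```

The LP relaxation of (6.5)-(6.6) is in fact integral (the constraint matrix is totally unimodular), which is why dedicated algorithms and plain LP solvers both recover a binary solution.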

6.2.1 Maximal-flow problems

A flow network is a directed graph with arcs and nodes. An arc defines the maximum flow that can be transferred from one node to another node. The capacity on flow between node i and node j is

72 2 10 7

s 1 1 n

8 10 1 Figure 6.4: Flows for example system

denoted kij. A flow could be number of items transported per day, number of tasks performed per hour and so on. The source node s is the location from where the flow starts, and the sink node n is the location receiving the flow. The objective of the analysis is to determine the maximum flow f that can be sent from the source node to the sink node. The conservation rule has to be fulfilled at all nodes, this means that total flow into anode must equal total flow out of that node. The conservation rule needs to be implemented aspartof the constraint specification. We will solve the maximal-flow problem by use of an LP formulation, although such problems could be solved much more efficiently by other techniques. The LP problem is now formulated as follows:

Maximize: Z = ∑_{j=1}^{n} x_0j    (6.7)

Subject to: ∑_{j=1}^{n} x_lj = ∑_{i=0}^{n−1} x_il,  l = 1, . . . , n − 1
∑_{j=1}^{n} x_0j = ∑_{i=0}^{n−1} x_in                (6.8)

x_ij ≤ k_ij,  i = 0, . . . , n − 1,  j = 1, . . . , n

where index i = 0 is used for the source node, and k_ii = 0 for all nodes per definition.

6.2.2 A worked maximal-flow problem in MS Excel

To fabricate a certain product a factory has two parallel production lines, line 1 and line 2. The fabrication is a two stage operation where an A-type machine is used for the first stage and a B-type machine is used for the second stage. For production line 1 the A-machine has a capacity k_s,1 = 8 and the B-machine has a capacity of k_1,n = 10. For production line 2 the A-machine has a capacity k_s,2 = 10 and the B-machine has a capacity of k_2,n = 7. It is possible to take products machined by the A-machine of line 2 and complete the machining on the B-machine on line 1, and vice versa. The capacities to transfer products between the two production lines are k_1,2 = k_2,1 = 1. Figure 6.4 shows the corresponding flow network diagram. Figure 6.5 shows the MS Excel setup for this problem. Cells B3:D5 contain the capacities, k_ij, cells B9:D11 contain the decision variables, and cells B16:D18 contain the slack, i.e., k_ij − x_ij. Cells E9:E11 sum up the flow out of each node, calculated node by node. Similarly, cells B12:D12 sum up the flow into each node, calculated node by node. This is utilized in cells G11:G13 where

Figure 6.5: MS Excel flow and capacity specification

the conservation constraints are calculated, i.e., the sum of flow out of the source node minus the sum of flow into the sink node should equal zero. Further, the sum out of node 1 minus the sum into node 1 should equal zero, which should also hold for node 2. The final step in the MS Excel Solver constraint window is to specify that all slack cells should be nonnegative. Note that the easiest way to specify the model to the MS Excel Solver is to let the solver run through all decision variables, i.e., the range B9:D11, and similarly include all slack specifications in the range B16:D18. This is, however, a waste of computer time, since we know that slack constraints and decision variables corresponding to zero capacities will not be used. Therefore Figure 6.5 shows the cells required to be entered into the model with yellow background. From the solution shown in Figure 6.5 we observe that machine A in production line 2 is not fully utilized, i.e., there is a slack of two production units per time unit for this machine. Further, machine B in production line 1 has a slack of one. If the transfer capacity between the two production lines could be increased by one, the maximum flow could also be increased by one.
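As a cross-check of the Excel LP, the same network can be fed to a dedicated max-flow routine, here the one in the networkx library (an assumed tooling choice; the node labels follow Figure 6.4):

```python
import networkx as nx

# Directed capacities from Figure 6.4: two production lines with a
# transfer capacity of 1 between them
G = nx.DiGraph()
G.add_edge("s", "1", capacity=8)    # A-machine, line 1
G.add_edge("s", "2", capacity=10)   # A-machine, line 2
G.add_edge("1", "n", capacity=10)   # B-machine, line 1
G.add_edge("2", "n", capacity=7)    # B-machine, line 2
G.add_edge("1", "2", capacity=1)    # transfer line 1 -> line 2
G.add_edge("2", "1", capacity=1)    # transfer line 2 -> line 1

flow_value, flow = nx.maximum_flow(G, "s", "n")
print(flow_value)   # 16: consistent with the slacks discussed above
```

The maximum flow of 16 agrees with the Excel solution: machine A on line 2 carries 8 of its capacity 10 (slack two), and machine B on line 1 carries 9 of 10 (slack one).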

Problem 6.1

Figure 6.6 shows transportation capacities for transportation of goods from the source s to the sink n. Find the maximum flow that can be achieved. In 10% of the time the capacity from node 3 to node 4 is reduced from 4 to 1, and similarly the capacity from node 2 to node n is reduced from 4 to 1 due to queueing problems. The logistic manager proposes to open the road from node 2 to 4 for traffic in both directions. The cost per unit time for such a measure is 15% of the value of one unit of capacity. Does it pay off?

Figure 6.6: Flows for Problem 6.1

6.3 Project management

Flow network models are also used in project management. Challenges relate to planning, scheduling and control of project activities, where activities have to be performed in a specified technological sequence. The methods presented in this section are all used in the analysis of projects, but could also be used to analyse a specific production process in manufacturing. In order to analyse the duration of a project, or a project activity, we use flow network models. In the literature there are two different ways to represent a project in a network model. Activity on node (AON) networks use the nodes to represent the project activities, whereas the arcs are used to represent the sequence and dependencies between activities. Activity on arc (AOA) networks use the arcs to represent the project activities, whereas the nodes are used to represent events in the project, i.e., starting and finalizing activities. In this presentation we will use the AON representation of a project. Visually, a flow network model to be developed is similar to a Gantt diagram. However, we usually indicate dependencies between activities with arrows, and the y and x axes are usually not labelled. The symbols used in a flow network in this presentation are shown in Figure 6.7. An example flow network diagram is shown in Figure 6.8.

Figure 6.7: Symbols used in flow network diagrams (AON representation): activity, uncertain activity, milestone, start point (S), end point/finish (F), and couplings between activities or nodes

There exist several methods for analysing flow networks. All these methods require that the flow network is described completely in terms of dependencies between the activities. Further, the duration of the activities should be described by probability distribution functions with numeric values for the parameters. When analysing such flow networks we differentiate between:

• Analytical methods.

• Monte Carlo simulation methods.

Generally we let T denote the duration of the project we are analysing, or a part of the project, e.g. a work package. If the project comprises n activities, we often denote these activities Ai, and the duration of activity Ai is denoted Ti. Sometimes in this presentation we also use the simpler notation where each activity is described by a letter, e.g. A, B etc. The main purpose of the schedule analysis is to establish the cumulative distribution function for the entire project

duration. We might also want to establish the cumulative distribution function for parts of the project, milestones etc. Another important measure of interest is the probability that an activity will delay the project, i.e. the criticality index.

Figure 6.8: Example flow network from [3]

The methods we will investigate are:

• Critical Path Method (CPM)

• Linear programming (LP)

• Program Evaluation and Review Technique (PERT)

• Successive schedule planning (SSP)

• Monte Carlo simulation (MCS)

The example diagram shown in Figure 6.8 will be used to demonstrate the various methods. This example is adapted from [3]. The parameters to describe the duration of each activity are given in Table 6.3.

Table 6.3: Data for the schedule demonstration examples

Activity   L (Lowest)   M (Most likely)   H (Highest)
A          2            5                 9
B          4            6                 9
C          7            12                21
D          5            7                 10
E          4            7                 11
F          2            3                 6
G          3            5                 9
H          5            7                 10

Fundamental for all methods is to understand the term ‘path’. A path in a flow network is a set of activities from the starting point to the end point in the network, where each activity in the set follows another activity in the set, except the first activity, which follows the starting point. This means that all activities in a path have to be executed in order to complete the project. Usually there are several paths in a flow network. Formally, we also include uncertain activities in a path, even if they might not be necessary to execute.

6.4 Critical Path Method (CPM)

The idea of the CPM method is to find all paths in the flow network. Next, we assume that the duration of all activities is deterministic, and typically equal to the most likely duration (M).

The duration of each path is given as the sum of the durations of all activities in the path. The path with the longest duration is denoted a critical path, and the duration of the project is found as the duration of the critical path (or all critical paths in case of several critical paths). In Figure 6.8 we have the following paths: P1 = {A,B,D,F,H}, P2 = {A,B,E,F,H} and P3 = {A,C,G,H}. Inserting the duration of each activity, we get the durations TP1 = 5 + 6 + 7 + 3 + 7 = 28, TP2 = 5 + 6 + 7 + 3 + 7 = 28 and TP3 = 5 + 12 + 5 + 7 = 29 for P1, P2 and P3 respectively. Since P3 has the longest duration, P3 is a critical path, and the project duration is found to be 29. A disadvantage of the CPM method is that it cannot handle the uncertainty in the duration of each activity, i.e. it is a deterministic approach. Float or slack is the amount of time that an activity in a project network can be delayed without causing a delay to:

• subsequent tasks (“free float”)

• project completion date (“total float”).

By investigating the paths it is rather easy to obtain the two float measures.
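The CPM forward pass is easy to automate. The sketch below (our own naming; the dictionaries encode the network of Figure 6.8 with the most likely durations from Table 6.3) computes the earliest finish of every activity, from which the project duration follows:

```python
# Most likely durations (M) from Table 6.3 and predecessors from Figure 6.8
duration = {"A": 5, "B": 6, "C": 12, "D": 7, "E": 7, "F": 3, "G": 5, "H": 7}
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"], "E": ["B"],
         "F": ["D", "E"], "G": ["C"], "H": ["F", "G"]}

def earliest_finish(durs):
    """Forward pass: EF(a) = duration(a) + max EF over predecessors."""
    ef = {}
    def visit(a):
        if a not in ef:
            ef[a] = durs[a] + max((visit(p) for p in preds[a]), default=0)
        return ef[a]
    for a in durs:
        visit(a)
    return ef

ef = earliest_finish(duration)
print(ef["H"])   # project duration: 29, the length of the critical path P3
```

This avoids enumerating all paths explicitly, which is exactly the scaling problem the LP formulation in the next section also addresses.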

6.5 Linear programming (LP)

The CPM method is an intuitive and simple approach for analysing a flow network for project duration. For larger projects a computerized tool will be required for identification of the paths. As an alternative the minimum project duration and related concepts may be found by linear programming. The following steps are required to specify the problem on standard LP format:

1. For each activity, identify the preceding activity or activities.

2. For each activity ensure that the starting point of the activity is greater or equal to the end of the preceding activity. I.e., the difference between the starting point of the activity and the ending point of the preceding activity should be nonnegative. A separate row in the constraint table is required for each preceding activity.

3. Let all the starting points be the decision variables

4. The objective function is the starting point of the final node in the network

The project in Figure 6.8 is used to demonstrate the solution in MS Excel. The data for each activity is organised in various columns. To specify the constraints systematically it is convenient to give the cells names. In the following, cell names of the form ES_X are used for the earliest start of activity X; for example ES_A is the name of the cell containing the earliest start of activity A. Similarly, EF_X is used as the cell name for the earliest finish of activity X. Constraints for the example project in Figure 6.8 are given as:

A         =ES_A-EF_SN
B         =ES_B-EF_A
C         =ES_C-EF_A
D         =ES_D-EF_B
E         =ES_E-EF_B
F         =ES_F-EF_D
F         =ES_F-EF_E
G         =ES_G-EF_C
H         =ES_H-EF_F
H         =ES_H-EF_G
FinalNode =ES_FN-EF_H

where the equal sign indicates the formulas entered in the relevant MS Excel cells. A, B, C and so on are the corresponding nodes. For nodes F and H there are two lines since these nodes have two predecessors. The cells containing the starting point of all activities are used as the decision variables, and the objective function should have a link to the cell ES_FN.
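The same LP can be sketched with scipy.optimize.linprog instead of the Excel Solver (an assumed tooling choice; the node list and index mapping are ours). Each precedence constraint ES_a ≥ ES_p + D_p is rewritten as ES_p − ES_a ≤ −D_p for the solver:

```python
from scipy.optimize import linprog

# Durations (most likely values) and precedence from Figure 6.8;
# S is the start node and FN the final node, both with zero duration
duration = {"S": 0, "A": 5, "B": 6, "C": 12, "D": 7, "E": 7,
            "F": 3, "G": 5, "H": 7}
preds = {"A": ["S"], "B": ["A"], "C": ["A"], "D": ["B"], "E": ["B"],
         "F": ["D", "E"], "G": ["C"], "H": ["F", "G"], "FN": ["H"]}
nodes = ["S", "A", "B", "C", "D", "E", "F", "G", "H", "FN"]
idx = {a: i for i, a in enumerate(nodes)}

# Decision variables: earliest start ES of each node
A_ub, b_ub = [], []
for a, plist in preds.items():
    for pr in plist:
        row = [0.0] * len(nodes)
        row[idx[pr]], row[idx[a]] = 1.0, -1.0   # ES_pr - ES_a <= -D_pr
        A_ub.append(row)
        b_ub.append(-duration[pr])

# Minimizing the sum of all starts (rather than only ES_FN) also forces
# non-critical activities to their earliest possible start
c = [1.0] * len(nodes)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * len(nodes))
print(round(res.x[idx["FN"]]))   # minimum project duration: 29
```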

6.5.1 Slack

In the set-up we assume that all activities start as early as they can. For the project manager there might be arguments for postponing the starting point of some activities, for example due to resource optimization. It is therefore of interest to investigate the slack in the project. We introduce the following notation:

• ES = Earliest Start

• EF = Earliest Finish

• LS = Latest Start

• LF = Latest Finish

• TF = Total Float = Maximum delay in an activity without causing a delay to the project completion date

• FF = Free Float = Maximum delay in an activity without causing a delay to subsequent tasks

It is relatively easy to find the free float of all activities if we have found the earliest starting points. It is more troublesome to find the total float. A method for this is described in the following:

• Let the minimum project duration be denoted T and the activity durations be denoted Di

• Assume that T = LP(D) could be found by an LP module in some software, and let T0 be the minimum project duration for the initial Di values

• If activity i is on a critical path, we know that increasing Di by ∆Di will give a new minimum project duration equal to T0 + ∆Di

• For all project activities we now increase Di by ∆Di, activity by activity to investigate the result

• Let ∆Ti be the increase in the calculated new minimum project duration when Di is increased by ∆Di. If ∆Ti = ∆Di we know that activity i is on a critical path, which means that the total float for this activity is zero.

• If ∆Ti < ∆Di this means that the duration of activity i has been increased but the total project impact is less than the increase in activity duration, and activity i is therefore not critical. Let the total float, TFi, be how much we may increase Di without delaying the project. It is now straightforward to verify that TFi = ∆Di − ∆Ti, since we know that T(Di) = LP(D) is not changing for ∆Di ≤ TFi and then starts increasing with a slope of 1 (45 degrees) as illustrated in Figure 6.9.

Figure 6.9: Total float calculation

Note that if ∆Ti = 0 in the above procedure, we have increased the duration of activity i, but the activity is still not critical. In such a case, we need to test with a higher value of ∆Di to force activity i to become critical. Table 6.4 shows the results for the example project.

Table 6.4: Example project results - D = Activity Duration, TF = Total Float, ES = Earliest Start, EF = Earliest Finish, LS = Latest Start, and LF = Latest Finish

Activity/Node   D    TF   ES   LS   EF   LF   Is critical?
S=Start node    0    0    0    0    0    0    True
A               5    0    0    0    5    5    True
B               6    1    5    6    11   12   False
C               12   0    5    5    17   17   True
D               7    1    11   12   18   19   False
E               7    1    11   12   18   19   False
F               3    1    18   19   21   22   False
G               5    0    17   17   22   22   True
H               7    0    22   22   29   29   True
F=Final node    29
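The perturbation procedure for the total float can be sketched as follows (our own helper names; a forward-pass longest-path routine stands in for the LP module, and a large ∆ is used so that every activity is forced onto a critical path, covering the ∆Ti = 0 caveat above):

```python
# Total float by perturbation: inflate each activity duration by Delta
# and see how much of the increase propagates to the project duration.
duration = {"A": 5, "B": 6, "C": 12, "D": 7, "E": 7, "F": 3, "G": 5, "H": 7}
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"], "E": ["B"],
         "F": ["D", "E"], "G": ["C"], "H": ["F", "G"]}

def project_duration(durs):
    """Minimum project duration via a forward pass (longest path)."""
    ef = {}
    def visit(a):
        if a not in ef:
            ef[a] = durs[a] + max((visit(p) for p in preds[a]), default=0)
        return ef[a]
    return max(visit(a) for a in durs)

T0 = project_duration(duration)      # 29 for the example project
delta = 100                           # large enough to force criticality
tf = {}
for a in duration:
    bumped = dict(duration, **{a: duration[a] + delta})
    dT = project_duration(bumped) - T0
    tf[a] = delta - dT               # TF_a = Delta - Delta_T
print(T0, tf)                        # total floats match Table 6.4
```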

A final note relates to the fact that the LP model as formulated above will not necessarily start all activities at their earliest time if they are not critical. Rather than minimizing the starting point of the final node, we could minimize the sum of all starting points; this will force all activities to start as early as possible.

6.6 Program Evaluation and Review Technique (PERT)

The PERT method is similar to the CPM method for finding the project duration. Also here we find the critical path, i.e., P3. But rather than using deterministic values for the durations, we now treat uncertainties in the activities on the critical path. The expectation and variance of a sum are given by Equations (2.54) and (2.55). If the expectations and variances are given for the various activities, we can proceed directly. However, very often the activities are described by a low (L), most likely (M) and high (H) value as in Table 6.3. If we assume that these parameters describe the PERT distribution, we recall that the expected value is given by µ = (L + 4M + H)/6 and the variance by σ² = (µ − L)(H − µ)/7. Note that the PERT method only includes one critical path. In case of more than one critical path, it is appropriate to use the path with the highest variance. A weakness of the PERT method is that the project duration will be underestimated when there are many paths with expected duration in the same range as the critical path.
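As a sketch of the PERT calculation for the example project, the code below sums the moment formulas above along the critical path A-C-G-H (found by CPM on the expected durations) using the (L, M, H) data of Table 6.3, and applies the normal approximation for a tail probability:

```python
from math import sqrt, erf

# PERT estimate along the critical path A-C-G-H of Figure 6.8,
# with (L, M, H) data as in Table 6.3.
data = {"A": (2, 5, 9), "C": (7, 12, 21), "G": (3, 5, 9), "H": (5, 7, 10)}

def pert_moments(L, M, H):
    mu = (L + 4 * M + H) / 6            # expected value of an activity
    return mu, (mu - L) * (H - mu) / 7  # and its variance

mu_T = sum(pert_moments(*lmh)[0] for lmh in data.values())
var_T = sum(pert_moments(*lmh)[1] for lmh in data.values())

Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))   # standard normal cdf
print(round(mu_T, 2), round(var_T, 2))          # 30.33 10.58
print(round(1 - Phi((35 - mu_T) / sqrt(var_T)), 3))  # Pr(T > 35) ≈ 0.076
```

Note that the estimate 30.33 is below the SSP result derived later, illustrating the underestimation caused by the near-critical parallel paths.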

6.7 Successive schedule planning (SSP)

The idea behind the SSP method is that we need to consider more than the critical path. To motivate the algorithm we are going to present, consider a project with two activities A and B executed in parallel. In this situation it is reasonable to assume that the project duration will be the maximum of the durations of the two activities, and we might apply Equations (2.62) and (2.63). In this example A and B are both paths in the network, and the two paths have no common activities. In a general situation, the various paths might have common activities, which complicates the calculation. Now, consider a situation with two activities B and C in parallel, where the startup of these two activities is immediately after the finalization of activity A, as shown in Figure 6.10. We may establish the two paths {A, B} and {A, C}, but we realise that the two paths have a common activity, A. For each path we could find the expectation and variance similar to what we did in the PERT method, but since the two paths share a common activity, they will not be independent, and the result for the maximum of the two paths will not be correct. To overcome this problem, we could in this situation first find the expectation and variance of activity A, and next add the expectation and variance of the maximum of activities B and C. In the following we assume that we are able to find the expected value and the variance of a parallel structure by the eMax() and varMax() functions respectively.

Figure 6.10: Flow diagram, B and C executed in parallel after execution of A

This means that we essentially have E(T) = E(A) + eMax(B, C) and Var(T) = Var(A) + varMax(B, C), where T is the duration of the project, and the notation for the activities is only indicative.

Problem 6.2 Consider the situation above with activities B and C in parallel following activity A. Further, let the expectation and variance of the activity durations be given by: µA = 10, µB = 7, µC = 8, σ²A = 2², σ²B = 3² and σ²C = 2². Find the expectation and variance of the project duration by first treating the two paths {A, B} and {A, C} as independent. Next, carry out a correct calculation and compare the result with the first result. ♢

To motivate the approach described in the following, we summarize the “building blocks” required:

• For a series structure of activities with no parallel branches the expected value and variance for the series structure equals the sum of expectations and variances of the activities respectively.

• For a parallel structure of independent activities with no common activities, the eMax() and varMax() functions may be used.

• For a structure like the one in Figure 6.10 and the notation used in Problem 6.2 we may use

– E(T) = µA + eMax(µB, σ²B, µC, σ²C)

– Var(T) = σ²A + varMax(µB, σ²B, µC, σ²C)

• Using the result from Problem 6.3 we may also use

– E(T) = eMax(µAB, σ²AB − σ²A, µAC, σ²AC − σ²A)

– Var(T) = varMax(µAB, σ²AB − σ²A, µAC, σ²AC − σ²A) + σ²A

where the indexes AB and AC are used for the sum of activities A and B, and of A and C, respectively. Further, σ²AB − σ²A and σ²AC − σ²A are the accumulated variances over each of the branches in the parallel structure.

• The latter approach is convenient when specifying the schedule model in an MS Excel spreadsheet.

To structure the analysis we need some definitions. We define a meeting point as a point where two or more arrows join before or into an activity or the endpoint. For example, in Figure 6.8 the activities D and E join into a meeting point just before activity F. A branching point is a point where one activity is followed by two or more activities in parallel, i.e., one branch splits into two or more branches. For example, in Figure 6.8 the activities B and C follow in parallel after activity A, and the branching point is just to the right of activity A. We also need some numerical routines for solving the integrals in Equations (2.62) and (2.63). Assume that we have access to the routines eMax = eMax(µ1, σ²1, µ2, σ²2) and varMax = varMax(µ1, σ²1, µ2, σ²2) for solving Equations (2.62) and (2.63) respectively. Here µ1, σ²1, µ2 and σ²2 are the expectations and variances of the two variables we are taking the maximum of. We will only consider the situation where eMax and varMax are implemented under the assumption of independent and normally distributed variables. See pRisk.xlsm for such an implementation.
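The compendium refers to pRisk.xlsm for the implementation. As an illustrative sketch (not the pRisk.xlsm code), the classical closed forms for the moments of the maximum of two independent normal variables can be implemented as follows:

```python
from math import sqrt, exp, pi, erf

# eMax/varMax for two independent, normally distributed variables.
phi = lambda z: exp(-z * z / 2) / sqrt(2 * pi)   # standard normal pdf
Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))     # standard normal cdf

def e_max(mu1, var1, mu2, var2):
    theta = sqrt(var1 + var2)
    a = (mu1 - mu2) / theta
    return mu1 * Phi(a) + mu2 * Phi(-a) + theta * phi(a)

def var_max(mu1, var1, mu2, var2):
    theta = sqrt(var1 + var2)
    a = (mu1 - mu2) / theta
    e2 = ((mu1**2 + var1) * Phi(a) + (mu2**2 + var2) * Phi(-a)
          + (mu1 + mu2) * theta * phi(a))        # E[max(X1, X2)^2]
    return e2 - e_max(mu1, var1, mu2, var2) ** 2

# Virtual node V1 in Example 6.1: two branches, both with expectation 18.5
print(round(e_max(18.5, 0.88, 18.5, 1.74), 2))   # ≈ 19.15 (19.14 in the text)
print(round(var_max(18.5, 0.88, 18.5, 1.74), 2)) # ≈ 0.89
```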

Problem 6.3 Show that

• eMax(µ1, σ²1, µ2, σ²2) = ∆µ + eMax(µ1 − ∆µ, σ²1, µ2 − ∆µ, σ²2)

• varMax(µ1, σ²1, µ2, σ²2) = varMax(µ1 − ∆µ, σ²1, µ2 − ∆µ, σ²2) ♢

The procedure for successive schedule planning with respect to describing the project duration is as follows¹:

1. For each activity i, establish the expectation µi and the variance σ²i of the duration of activity i.

2. Identify all meeting points, i.e., points where one or more branches join into one arrow.

3. Repeat and follow all activities from left to right in the flow network. This process is iterative, since each activity to the left of the current activity has to be processed before we can proceed.

4. For each activity i, establish the expected start (E^S_i) and the expected finalisation (E^F_i). The expected start is equal to the expected finalisation of the preceding activity (or meeting point, in case branches join just before activity i). The expected finalisation is given by the expected start plus the expected duration of activity i, i.e., E^F_i = E^S_i + µi. Note that this step cannot be executed if one or more of the activities to the left have not been processed.

¹The presentation is slightly different from the original presentation by Lichtenberg (1990).

5. For each activity i, establish the accumulated variance, V^F_i. Here V^F_i is the accumulated variance of the activity (or meeting point) preceding activity i plus σ²i, i.e., V^F_i = V^F_k + σ²i, where k is the activity preceding activity i.

6. If there is a meeting point in the network just before the entry into an activity, we have to process this meeting point. Note that this means that two or more branches join, and the succeeding activity cannot start before all the branches, or paths up to this point, have been finalised (completed). Technically, we now introduce a virtual node at the meeting point, representing the finalisation of the two (or more) branches going into the meeting point. The virtual nodes are enumerated V1, V2, .... If three or more branches join into one meeting point, we first process two branches into one virtual node; this virtual node then represents one branch, which is processed together with the third branch into another virtual node, and so on.

7. Let Vk be the virtual node we are processing, and assume that it is activities i and j that join into Vk. If one of the activities (or virtual nodes) immediately to the left of Vk has not been processed, we have to go to the left in the network until we meet processed activities or nodes. The expectation and variance for the finalisation of activity i are now given by E^F_i and V^F_i respectively. Similarly, we have E^F_j and V^F_j for the finalisation of activity j. If the two paths up to activities i and j were disjoint, we could easily find the expectation and variance of the finalisation of the virtual node Vk by Equations (2.62) and (2.63), or numerically by eMax and varMax. Typically, however, the two branches that join after activities i and j split up from one single branch at a branching point. Let l be the activity at which the branches split up before joining again at the virtual node k. When finding the expectation and variance up to the virtual node k, we first find the expectation and variance up to the branching point l, and then add the expectation and variance of the maximum of the two branches from the branching point l to the virtual node k. The accumulated variance along the path from branching point l to the end of activity i is found by ∆V^F_i = V^F_i − V^F_l. We get similar results for the other branch, i.e., the one with activity j preceding the virtual node k. The expectation and variance for the finalisation of the virtual node k are now given by E^F_Vk = eMax(E^F_i, V^F_i − V^F_l, E^F_j, V^F_j − V^F_l) and V^F_Vk = V^F_l + varMax(E^F_i, V^F_i − V^F_l, E^F_j, V^F_j − V^F_l).

8. If there are more branches not yet processed into the meeting point, repeat until all branches are processed by creating new virtual nodes.

9. When we reach the end node, we are done.

Example 6.1 We want to demonstrate the calculation process for the flow network shown in Figure 6.8. We further assume that we have a spreadsheet program available. The results of the calculations are shown in Table 6.5. In addition to the activity column, and the three columns for the low, most likely and high values, we add four columns for µi, σ²i, E^F_i and V^F_i respectively. For each row corresponding to a normal activity we calculate µi = (L + 4M + H)/6 and σ²i = (µi − L)(H − µi)/7. Then we calculate the expected finalisation of each activity, E^F_i, as the expected finalisation of the previous activity (or virtual node) plus the expected duration of activity i, µi. Similarly, the variance of the finalisation of activity i, V^F_i, is the variance of the finalisation of the previous activity (or virtual node) plus the variance of activity i, σ²i.

Figure 6.11: Example flow network with virtual nodes, adapted from [3]

For activity A, the expected finalisation and the variance of the finalisation equal the expectation and variance of the duration of activity A, since it is the first activity. We note that after activity A we have a branching point that meets again after activities F and G. For activity B we see that E^F_B = E^F_A + µB = 5.17 + 6.17 ≈ 11.33, and V^F_B = V^F_A + σ²B = 1.73 + 0.88 = 2.61. We proceed similarly with activities C, D and E. We now proceed to the virtual node V1. It is convenient to insert a new row in the spreadsheet just before activity F. In order to find the expectation and variance for this node we take advantage of the functions eMax and varMax. The arguments to these functions are the expectation of the finalisation of each of the preceding activities, and the accumulated variance through the branches from the branching point, which in this case is after activity B: E^F_V1 = eMax(18.5, 3.49−2.61, 18.5, 4.35−2.61) = 19.14. To obtain the variance we use the varMax function, but we have to remember to add the accumulated variance up to the finalisation of activity B: V^F_V1 = varMax(18.5, 3.49−2.61, 18.5, 4.35−2.61) + 2.61 = 3.50. We complete the sheet for the remaining activities, including the virtual node V2. The expectation and variance of the duration of the entire project are now given by E^F_H and V^F_H respectively. ♢

Note that in the SSP method we have used the PERT distribution as a basis. The method could be used with any distribution for the activities; the essential point is to assess the expectation and variance of each activity duration.
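The whole of Example 6.1 can also be reproduced programmatically. The sketch below uses Python instead of a spreadsheet, with eMax/varMax implemented for independent normal variables as assumed in the text (an illustration, not the pRisk.xlsm code):

```python
from math import sqrt, exp, pi, erf

# SSP walk of the Figure 6.8 network, reproducing Table 6.5.
phi = lambda z: exp(-z * z / 2) / sqrt(2 * pi)   # standard normal pdf
Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))     # standard normal cdf

def e_max(m1, v1, m2, v2):
    t = sqrt(v1 + v2); a = (m1 - m2) / t
    return m1 * Phi(a) + m2 * Phi(-a) + t * phi(a)

def var_max(m1, v1, m2, v2):
    t = sqrt(v1 + v2); a = (m1 - m2) / t
    e2 = (m1**2 + v1) * Phi(a) + (m2**2 + v2) * Phi(-a) + (m1 + m2) * t * phi(a)
    return e2 - e_max(m1, v1, m2, v2) ** 2

def pert(L, M, H):                               # PERT moments
    mu = (L + 4 * M + H) / 6
    return mu, (mu - L) * (H - mu) / 7

mu, var = {}, {}
for a, lmh in {"A": (2, 5, 9), "B": (4, 6, 9), "C": (7, 12, 21),
               "D": (5, 7, 10), "E": (4, 7, 11), "F": (2, 3, 6),
               "G": (3, 5, 9), "H": (5, 7, 10)}.items():
    mu[a], var[a] = pert(*lmh)

E, V = {}, {}                                    # E^F_i and V^F_i
E["A"], V["A"] = mu["A"], var["A"]
for a, p in [("B", "A"), ("C", "A"), ("D", "B"), ("E", "B")]:
    E[a], V[a] = E[p] + mu[a], V[p] + var[a]
# Virtual node V1: D and E join; the branches split after B
E["V1"] = e_max(E["D"], V["D"] - V["B"], E["E"], V["E"] - V["B"])
V["V1"] = V["B"] + var_max(E["D"], V["D"] - V["B"], E["E"], V["E"] - V["B"])
E["F"], V["F"] = E["V1"] + mu["F"], V["V1"] + var["F"]
E["G"], V["G"] = E["C"] + mu["G"], V["C"] + var["G"]
# Virtual node V2: F and G join; the branches split after A
E["V2"] = e_max(E["F"], V["F"] - V["A"], E["G"], V["G"] - V["A"])
V["V2"] = V["A"] + var_max(E["F"], V["F"] - V["A"], E["G"], V["G"] - V["A"])
E["H"], V["H"] = E["V2"] + mu["H"], V["V2"] + var["H"]

print(round(E["H"], 2), round(V["H"], 2))  # ≈ 31.3 and ≈ 6.6 (31.30 / 6.63
                                           # in Table 6.5, rounded intermediates)
```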

Problem 6.4 Consider Example 6.1 and carry out the calculations by yourself in a spreadsheet program. ♢

Problem 6.5 Find the cumulative distribution function for the entire project duration (T ) based on the calculation in Problem 6.4, and especially find Pr(T > 35). ♢

6.8 Monte Carlo simulation (MCS)

The analytical models for project duration evaluation are often not flexible enough to capture relevant aspects of a project. Monte Carlo simulation is a supplement to analytical methods when the situation is too complex to be analysed by analytical models. The idea of Monte Carlo simulation is that we establish a set of stochastic variables and events. Then we establish deterministic relations between these variables and the events, e.g., the order in which activities are executed, which activities could be executed in parallel, etc. It is important to realise that the model describing these relations is a deterministic model. Such a model could be implemented in, e.g., an MS Excel spreadsheet. The next idea in Monte Carlo simulation is to generate the stochastic variables and the events (indicator variables). Most computer languages and program systems have a function that generates uniformly distributed stochastic variables on the interval from 0 to 1. Given such a function, it is in principle straightforward to generate the stochastic variables we need. By inserting these stochastic variables into the deterministic model (e.g., an MS Excel model) we get one realisation of the system, or more specifically of the project duration. Let t1 be the numeric value obtained the first time this process is done. Now, we repeat the process by generating another set of random quantities and inserting these into the deterministic model to yield another value, say t2. By repeating this process we can regard the generated values t1, t2, ... as realisations of the project, and use them to obtain statistical properties such as the mean, the standard deviation, the cumulative distribution function, etc. We will now illustrate how this process could be carried out with the pRisk.xlsm program.

Example 6.2 It will be convenient to establish one row in MS Excel for each activity. The first column (A) could contain the activity number; the second, third and fourth columns (B, C and D) could then contain the parameters of the PERT distribution, similar to Table 6.3. Now we introduce three new columns (E, F and G) to contain the duration, start and finalisation of each activity respectively. We start by entering the duration of each activity. Assume that activity A is described in row 2 of the Excel sheet. In cell E2 we now enter the following expression for the duration:

=RndPert(Rand(),B2,C2,D2)

Here the RndPert() function is a pRisk.xlsm specific function, whereas the Rand() function is a standard Excel function. The procedure is repeated for all activities, and we simply copy the formula in cell E2 into the cells E3, E4, etc. We now proceed to the start and finalisation of each activity. It will be convenient to give the cells containing the start and finalisation names in Excel. For activity A we give the names D_A, S_A and F_A to the duration, start and finalisation respectively. Similarly, we give the names S_B, D_B and F_B to the start, duration and finalisation of activity B, and so on for the remaining activities. By giving names to the cells, it is easy to access them in formulas in other cells. We now use the convention cell name = expression, where the cell name is the name of the cell to which we want to assign an expression. By inspecting the network in Figure 6.8 we easily verify the following statements for the start of the various activities:

S_A = 0
S_B = F_A
S_C = F_A
S_D = F_B
S_E = F_B
S_F = Max(F_D, F_E)
S_G = F_C
S_H = Max(F_F, F_G)

The finalisation of each activity is given as the start point plus the duration; e.g., for activity A we enter:

F_A = S_A + D_A

and similarly for the other activities. We have now specified the model and are prepared to simulate several runs. First we note that each time we press the F9 key, Excel updates the model by generating new random numbers, since we used the Rand() function in the cells containing the duration of each activity. Next we switch to the RunSimul sheet and press the Run button. ♢

Table 6.5: Data for the successive schedule planning demonstration in Example 6.1

Act.  L  M   H   µi     σ²i    E^F_i   V^F_i   Comment
A     2  5   9   5.17   1.73   5.17    1.73    Branching point for V2
B     4  6   9   6.17   0.88   11.33   2.61    Branching point for V1
C     7  12  21  12.67  6.75   17.83   8.48
D     5  7   10  7.17   0.88   18.50   3.49
E     4  7   11  7.17   1.73   18.50   4.35
V1                             19.14   3.50    D & E join, branches after B
F     2  3   6   3.33   0.51   22.48   4.01
G     3  5   9   5.33   1.22   23.17   9.70
V2                             24.13   5.75    F & G join, branches after A
H     5  7   10  7.17   0.88   31.30   6.63
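The spreadsheet model above has a direct Python analogue. The sketch below samples activity durations from the beta-PERT distribution (shape parameters α = 1 + 4(M − L)/(H − L) and β = 1 + 4(H − M)/(H − L), which reproduce the µ and σ² formulas used earlier) and evaluates the same start/finalisation logic; this is an illustration, not the pRisk.xlsm code:

```python
import random
from statistics import mean, stdev

# Monte Carlo simulation of the Figure 6.8 project.
data = {"A": (2, 5, 9), "B": (4, 6, 9), "C": (7, 12, 21), "D": (5, 7, 10),
        "E": (4, 7, 11), "F": (2, 3, 6), "G": (3, 5, 9), "H": (5, 7, 10)}

def rnd_pert(L, M, H):
    a = 1 + 4 * (M - L) / (H - L)      # beta-PERT shape parameters
    b = 1 + 4 * (H - M) / (H - L)
    return L + (H - L) * random.betavariate(a, b)

def one_run():
    d = {k: rnd_pert(*v) for k, v in data.items()}
    f_A = d["A"]                       # the same logic as the S_/F_ formulas
    f_B, f_C = f_A + d["B"], f_A + d["C"]
    f_D, f_E = f_B + d["D"], f_B + d["E"]
    f_F = max(f_D, f_E) + d["F"]
    f_G = f_C + d["G"]
    return max(f_F, f_G) + d["H"]      # finalisation of H = project duration

random.seed(1)
runs = [one_run() for _ in range(20000)]
print(round(mean(runs), 1), round(stdev(runs), 1))   # ≈ 31.3 and ≈ 2.6
```

The mean agrees well with the SSP result E^F_H ≈ 31.3; the simulated distribution can also be used directly for quantities such as Pr(T > 35).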

Problem 6.6 Consider the example in Figure 6.8. We will now consider an alternative execution method for the last part of the project. Rather than executing activity H as one activity, it is possible to split this activity into two parallel activities H and I. Each of these activities could be described by the PERT distribution with parameters L = 3, M = 5 and H = 8. Set up the flow network for this situation, and use the pRisk.xlsm program to find the expectation and standard deviation of the project duration by Monte Carlo Simulation. ♢

6.9 Penalty for default

The contracting party might issue penalties for default to ensure that the contractor puts the necessary resources and effort into the project execution. Penalties for default could be linked to milestones and to the finalisation of the entire project. In the following discussion we only consider the situation where a penalty for default is defined if the project as such is delayed. Let T be the duration of the project measured from a defined startup date. Let D be the number of days (from the startup date) before the penalty for default is initiated. Finally, let PD be the size of the penalty per day. The total penalty for default is then max(0, (T − D)PD). The expected total penalty for default in a project is thus:

PDTot = ∫_D^∞ (t − D) PD fT(t) dt (6.9)

where fT(t) is the probability density function of the project duration. In principle we have to perform the integration in Equation (6.9) to find the expected total penalty for default in a project. In most cases we also need to carry out numerical integration. However, if we have a Monte Carlo simulation model for the project, we may utilise that for a given project duration T the total penalty for default is max(0, (T − D)PD), and in e.g. pRisk.xlsm we could specify in the 'Cell to analyse':

=max(0,PD*(T_End-D_Start))

where D_Start is the name of the cell where we have specified when the penalty for default is initiated, and T_End is the name of the cell where the total project duration is found.

Problem 6.7 Consider the example in Figure 6.8. Assume that D = 34 and PD = 1,000 Euro. Find the total expected penalty for default in this project. ♢

Problem 6.8 For the normal distribution we have from Equation (2.25) that ∫_{−∞}^{a} x f(x) dx = µΦ((a − µ)/σ) − σϕ((a − µ)/σ). Use this to verify that if the duration T of a project is normally distributed with mean µ and standard deviation σ, then we have:

PDTot = ∫_D^∞ (t − D) PD fT(t) dt (6.10)
      = PD(µ − D)[1 − Φ((D − µ)/σ)] + PDσϕ((D − µ)/σ) (6.11)

where Φ() and ϕ() are the cumulative distribution function and probability density function of the standard normal distribution respectively.
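The closed form in Equation (6.11) is straightforward to implement and to sanity-check against a Monte Carlo estimate. The numbers below are illustrative only (not those of Problem 6.7):

```python
import random
from math import sqrt, exp, pi, erf

# Expected penalty for default, Equation (6.11), for a normally
# distributed duration, checked against a crude Monte Carlo estimate.
def pd_tot(mu, sigma, D, PD):
    z = (D - mu) / sigma
    phi = exp(-z * z / 2) / sqrt(2 * pi)      # standard normal pdf
    Phi = 0.5 * (1 + erf(z / sqrt(2)))        # standard normal cdf
    return PD * (mu - D) * (1 - Phi) + PD * sigma * phi

mu, sigma, D, PD = 30.0, 3.0, 34.0, 1000.0    # illustrative numbers
ana = pd_tot(mu, sigma, D, PD)

random.seed(3)
n = 200_000
mc = sum(max(0.0, (random.gauss(mu, sigma) - D) * PD) for _ in range(n)) / n
print(round(ana, 1), round(mc, 1))            # both ≈ 127
```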

Problem 6.9 The flow diagram in Figure 6.12 describes the production of a system.

• Find the critical path by manual inspection of the diagram.

• Use LP to verify the results. That is, set up an LP model for the system, find the minimum duration, and find the total float for all activities. The most likely durations (M) for each activity, in hours, are given in parentheses.

• Assume that for all activities Li = 0.75Mi and Hi = 1.5Mi. Use the PERT-, SSP- and MCS- methods to find E(T ), SD(T ), Pr(T > 9.5), and E(TO) where T is the number of hours to complete the production, and TO is the number of overtime hours required, where overtime runs from 9.5 hours.

• The production starts at 07:00 in the morning. When you show the production manager the latest starts (LS) for all activities, he observes that the latest start for activity C is after 6 hours, i.e., at 13:00. He then decides that the team working on activities C and D should stay home and not start working until 13:00. What will be the increase in E(TO) caused by this decision?

• The cost of overtime (for the entire production team) is 200 Euro per hour. On the other hand, the specialized team working on activities C and D is a critical resource, since they are very busy with other tasks. So for each hour prior to 13:00 that this team has to be at work, there is an additional cost of 60 Euro per hour, i.e., 1 Euro per minute. Find the optimal time for this specialized team to come to work.

Figure 6.12: Flow diagram for Problem 6.9, M-values given in parentheses: A(1), B(3), C(2), D(1), E(2), F(2), G(3), H(5)

Problem 6.10 (To be solved without a computer.) In this problem you are going to evaluate a major maintenance work on an offshore safety valve.

The work is to be considered as a small project. There are basically two groups of activities to be conducted, B: maintenance of the safety valve, and C: maintenance of supporting components (i.e., detectors and the control logic (computer)). The B-activities are conducted in series in one branch, in parallel with the C-activities, which are conducted in series in the other branch. The activities, together with relevant duration parameters (in hours), are shown in Table 6.6. The A-activities are preparation activities and are executed in series before the B- and C-activities can start. The D-activities are finalization activities executed in series after completion of the B- and C-activities.

Table 6.6: Data to use in Problem 6.10

Activity                                                        µ     σ²
A1 - Preparation                                                1     0.036
A2 - Isolate valve to be maintained                             1.4   0.033
A3 - Verify that pipeline is depressurized                      0.6   0.016
B1 - Perform functional test of the valve                       1.5   0.036
B2 - Perform other safety verification activities               1.5   0.036
C1 - Replace detectors according to a prev. maint'mce prog.     4.2   0.306
C2 - Perform functional test of detectors and logic (CPU)       2     0.143
D1 - System verification (safety loop = Detec., logic & valve)  0.8   0.017
D2 - De-isolate valve                                           1.4   0.033

Find the expected duration and the variance of the duration with the PERT method. Find the probability that the work is not finalized after 13 hours (i.e., the due time). The production loss is cU = 100,000 NOK per hour if not completed in due time. Find the expected production loss. ♢

Chapter 7

Markov processes and queueing theory

7.1 Markov processes

A Markov process is a special type of stochastic process that possesses the so-called Markov property. A stochastic process {X(t), t ∈ Θ} is a collection of random variables. The set Θ is called the index set of the process. For each index t in Θ, X(t) is called the state of the process at time t. In the general presentation we always assume that X(t) can only take the values 1, 2, ..., r. In practical examples it is often convenient to allow a zero value for the state variable. We now define the Markov property: A process is said to have the Markov property if:

Pr(X(t + s) = j|X(s) = i ∩ some history up to time s) = Pr(X(t + s) = j|X(s) = i)

This means that given that the process is in state i at some time s, the probability of being in another state, say j, t time units later is independent of the history up to time s, i.e., we may ignore all information about the past of the process when looking into the future. The only thing that counts is the current state. This general presentation also treats only Markov processes with stationary transition probabilities. This means that:

Pr(X(t + s) = j|X(s) = i) = Pr(X(t) = j|X(0) = i) for all s, t ≥ 0

that is, the probability of going from state i to j during a time period of t is independent of the starting point of such a “journey”. The following notation is introduced:

Pij(t) = Pr(X(t) = j|X(0) = i)

The so-called sojourn time, T̃i, is the time the process spends in state i from when it arrives in state i until it jumps out of state i. Further, let Tij denote the time the process spends in state i before it eventually jumps to state j. The transition rate from state i to state j is denoted aij and is the limiting conditional probability of jumping to state j given that the process is in state i, divided by the length of the interval considered. It may be argued that the Markov property and the stationary transition probabilities imply that all transition times are exponentially distributed. The total rate of transition out of state i is denoted αi, where

αi = ∑_{j≠i} aij

From the fact that the sojourn time and all other transition times are exponentially distributed it follows that:

Pii(∆t) = Pr(T˜i > ∆t) ≈ 1 − αi∆t

Pij(∆t) = Pr(Tij ≤ ∆t) ≈ aij∆t

Rearranging and letting ∆t approach 0, we get:

lim_{∆t→0} [1 − Pii(∆t)]/∆t = αi (7.1)

lim_{∆t→0} Pij(∆t)/∆t = aij (7.2)

These two equations will later be used to obtain the Kolmogorov differential equations. From the Markov property and the law of total probability we have

Pij(t + s) = ∑_{k=1}^{r} Pik(t) Pkj(s)

This equation is denoted the Chapman-Kolmogorov equations. We utilize this equation to find:

Pij(t + ∆t) = Pij(∆t + t) = ∑_{k=1}^{r} Pik(∆t) Pkj(t)

Rearranging (having in mind that we are seeking the derivative) we get:

Pij(t + ∆t) − Pij(t) = ∑_{k=1, k≠i}^{r} Pik(∆t) Pkj(t) − [1 − Pii(∆t)] Pij(t)

Now dividing by ∆t, inserting equations 7.1 and 7.2, letting ∆t → 0, and defining aii = −αi, we get after some rearrangements:

Ṗij(t) = ∑_{k=1}^{r} aik Pkj(t) (7.3)

These differential equations are denoted the Kolmogorov backward equations. Similarly, we may obtain the Kolmogorov forward equations:

Ṗij(t) = ∑_{k=1}^{r} akj Pik(t) (7.4)

7.1.1 Markov state equations

We now assume that we know the initial state, say that the process started in state i. We then simplify notation by omitting the index for the initial state, hence we write Pj(t) instead of Pij(t). It is convenient to introduce matrix and vector notation. First we define the transition rate matrix, A:

A = | a11 a12 ··· a1r |
    | a21 a22 ··· a2r |
    |  ⋮   ⋮  ···  ⋮  |
    | ar1 ar2 ··· arr |

where

aii = −αi = −∑_{j≠i} aij

which means that the diagonal elements are defined such that the sum of each row equals zero. Further we define the row vectors P(t) = [P1(t), P2(t), ..., Pr(t)] and Ṗ(t) = [Ṗ1(t), Ṗ2(t), ..., Ṗr(t)]. We may then write the Kolmogorov forward equations on matrix format:

[P1(t), P2(t), ..., Pr(t)] · A = [Ṗ1(t), Ṗ2(t), ..., Ṗr(t)]

that is:

P(t) · A = Ṗ(t) (7.5)

7.1.2 Time dependent solution for the Markov process To solve Equation (7.5) as a function of time we may use an analogy to ordinary differential equations in one dimension and we get:

P(t) = P(0) e^{tA}

Although this is a very elegant solution, it is not very attractive since taking the exponential of a matrix is not that easy. Computer tools such as Matlab are required. We may, however, rewrite Equation (7.5) as:

Ṗ(t) = lim_{∆t→0} [P(t + ∆t) − P(t)]/∆t = P(t) · A

yielding

P(t + ∆t) ≈ P(t)[A∆t + I] (7.6)

where I is the identity matrix. This equation may now be used iteratively, with a sufficiently small time interval ∆t and starting point P(0), to find the time dependent solution. Only simple matrix multiplication is required. When implementing a solution in for example VBA, some considerations are required regarding the step length ∆t. Choosing a very small value increases the computational time and may introduce numerical round-off problems, whereas choosing a too large step length makes the approximation in Equation (7.6) inaccurate. A rule of thumb is to use a value of one tenth of the inverse of the highest transition rate.
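As a sketch of the iteration in Equation (7.6), consider a simple two-state failure/repair process (illustrative rates λ = 0.1 and µ = 1, not an example from the text), for which the exact solution P1(t) = µ/(λ+µ) + λ/(λ+µ) e^{−(λ+µ)t} is known:

```python
from math import exp

# Iterate P(t + dt) ≈ P(t)(A dt + I) for a two-state process:
# state 1 = functioning, state 2 = failed.
lam, mu = 0.1, 1.0                      # failure and repair rate
A = [[-lam, lam],
     [mu, -mu]]

P = [1.0, 0.0]                          # the process starts in state 1
dt = 0.1 / max(lam, mu)                 # rule-of-thumb step length
steps = int(10.0 / dt)                  # integrate up to t = 10
for _ in range(steps):
    P = [P[j] + dt * sum(P[k] * A[k][j] for k in range(2)) for j in range(2)]

exact = mu / (lam + mu) + lam / (lam + mu) * exp(-(lam + mu) * 10.0)
print(round(P[0], 4), round(exact, 4))  # both ≈ 0.9091
```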

7.1.3 Steady state solution for the Markov process

In the long run we have that Ṗ(t) → 0 when t → ∞, hence P(t) · A = 0. We define the steady state probabilities by the vector P = [P1, P2, ..., Pr], where we have omitted the time dependency (t) to reflect that in the long run the state probabilities do not change anymore. To solve the steady state equations we realize that the matrix A does not have full rank, due to the way we have established the diagonal elements. To overcome this problem we remove one (arbitrarily chosen) equation from the following set of equations:

[P1, P2, ..., Pr] · | a11 a12 ··· a1r |
                    | a21 a22 ··· a2r |
                    |  ⋮   ⋮  ···  ⋮  |
                    | ar1 ar2 ··· arr | = [0, 0, ..., 0]

and replace it by the following equation:

∑_{j=1}^{r} Pj = 1

For example, replacing the first equation gives:

[P1, P2, ..., Pr] · | 1 a12 ··· a1r |
                    | 1 a22 ··· a2r |
                    | ⋮  ⋮  ···  ⋮  |
                    | 1 ar2 ··· arr | = [1, 0, ..., 0]

In matrix form we get:

P · A1 = b (7.7)

where b is a row vector of zeros except for the first element, which equals one. Note that Equation (7.7) is not on the standard form A · x = b. Transposing each side of Equation (7.7) gives A1ᵀ · Pᵀ = bᵀ, which could be solved by standard Gauss-Jordan elimination. Ideally we could obtain an analytical solution of the steady state equations, but for r > 3 we usually resort to numerical solutions.
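The replace-one-equation procedure is easy to mechanise. The sketch below (an illustration, with assumed rates λ = 0.1 and µ = 1) solves A1ᵀ Pᵀ = bᵀ by Gauss-Jordan elimination for a two-state failure/repair process, for which the exact steady state is P = [µ/(λ+µ), λ/(λ+µ)]:

```python
# Steady-state probabilities from Equation (7.7): replace the first
# equation of P.A = 0 by sum(P) = 1 and solve A1^T P^T = b^T.
def steady_state(A):
    n = len(A)
    # Row 0 of [A1^T | b] is the equation sum_j P_j = 1;
    # row i > 0 is column i of A (the i'th balance equation).
    M = [[1.0] * n + [1.0]] + \
        [[A[k][i] for k in range(n)] + [0.0] for i in range(1, n)]
    for col in range(n):                     # Gauss-Jordan elimination
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n] for row in M]

# Two-state failure/repair process: state 1 = up, state 2 = down
lam, mu = 0.1, 1.0
P = steady_state([[-lam, lam], [mu, -mu]])
print([round(p, 4) for p in P])              # [0.9091, 0.0909]
```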

Visit frequency

The visit frequency, ν, is one of several system performance measures that we define for the steady-state situation. The visit frequency of state j, νj, is the unconditional transition rate into state j. We could make different arguments for the arrival rate, say νj^arr, and the departure rate, say νj^dep. Considering departures we may argue directly that:

νj^dep = αj Pj (7.8)

Similarly, for arrivals we have from the law of total probability:

νj^arr = ∑_{k≠j} Pk akj (7.9)

Since in the long run we should fulfil the balance equations, stating that the total rate into a state equals the total rate out of that state, we get:

νj = αj Pj = ∑_{k≠j} Pk akj

7.1.4 Mean time to first passage to a given state

The visit frequency νj is the unconditional transition rate into state j, whereas 1/νj is the unconditional mean time between visits to state j. In some situations we would rather find the mean time until the first time the system enters state j. To solve this problem we can make state j an absorbing state. An absorbing state is a state that cannot be left. To make a state absorbing we just remove all arcs out of that state. Since we are considering state j as an absorbing state, the transition rate matrix is identical to the original transition rate matrix, except that the j'th row (corresponding to departures) contains only zeros. From before we know that the transition matrix does not have full rank, and we may therefore remove one of the equations. This corresponds to removing the j'th column of the matrix. Further, since row j only contains zeros, the equation for Pj(t) disappears. We may therefore also remove the j'th row of the transition rate matrix. This means that we have a set of r − 1 differential equations with r − 1 unknowns, P1(t), ..., Pj−1(t), Pj+1(t), ..., Pr(t). Note that when establishing the reduced system by removing the j'th row and the j'th column, we need to treat the modified system with j as an absorbing state. To solve the set of differential equations we introduce the Laplace transform. The Laplace transform of a function f(t) is given by f*(s) = L[f(t)] = ∫_0^∞ e^{−st} f(t) dt. The following rule applies for the Laplace transform:

    L[f′(t)] = sL[f(t)] − f(0) = s·f*(s) − f(0)

In addition, the Laplace transform of a sum of functions equals the sum of the Laplace transforms of those functions. Taking the Laplace transform on both sides of the set of differential equations, we observe that the side containing the derivative of the state probabilities transforms to sPi*(s) − Pi(0), where Pi(0) = 1 only for the initial state, and 0 otherwise.

The result is a set of r − 1 equations with r − 1 unknowns, P1*(s), ..., Pj−1*(s), Pj+1*(s), ..., Pr*(s). In principle we may solve these equations by elimination.

The Laplace transform of the survivor function is R*(s) = Σ_{i≠j} Pi*(s). If we are able to take the inverse Laplace transform, we may also find the survivor function R(t) of the system. A trick to do this is to arrange the denominator on the form (s − k1)(s − k2), then factorize, and hope that we get something we recognize from the table of Laplace transforms of known functions.

Figure 7.1: Markov transition diagram

Our objective is, however, to find the mean time until the first time the system enters state j. We have that E(T) = ∫₀^∞ R(t) dt. Further, R*(s) = L[R(t)] = ∫₀^∞ e^(−st) R(t) dt. Thus, by inserting s = 0 we have E(T) = ∫₀^∞ R(t)e⁰ dt = R*(0). Since R*(0) = Σ_{i≠j} Pi*(0), we obtain the mean time to first system failure by MTTF = Σ_{i≠j} Pi*(0).

Note that by this procedure we may establish the mean time to the first visit to state j without actually calculating the inverse Laplace transforms. What we actually do is to solve a set of linear equations, where the unknown variables are the Pi*(0)'s of the reduced system obtained by removing the j'th row and column. Further note that the right hand side equals 0 for all equations except the equation representing the initial state, where the right hand side equals −1, since sPi*(s) = 0 for s = 0.
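As a sketch of this procedure, consider a small hypothetical system of my own choosing (not an example from the text): two identical machines in parallel with failure rate λ each and one repairman with rate µ, where the state with both machines down is made absorbing. Solving the linear system x·A_red = −P(0) with numpy gives the MTTF directly:

```python
import numpy as np

def mttf(A, init):
    """Mean time to first entry into the absorbing state, which is assumed
    to be the LAST row/column of the transition rate matrix A.
    init = index of the initial state in the reduced system."""
    r = A.shape[0]
    A_red = A[:r - 1, :r - 1]        # remove the absorbing state's row/column
    b = np.zeros(r - 1)
    b[init] = -1.0                   # RHS is -1 for the initial state, else 0
    x = np.linalg.solve(A_red.T, b)  # solve x A_red = b; x_i = P_i*(0)
    return x.sum()                   # MTTF = sum of the P_i*(0)'s

lam, mu = 0.01, 0.1                  # states: 2 up, 1 up, 0 up (absorbing)
A = np.array([[-2 * lam,      2 * lam,  0.0],
              [       mu, -(lam + mu),  lam],
              [      0.0,          0.0, 0.0]])
print(mttf(A, 0))   # closed form for this small system: (3*lam + mu)/(2*lam**2) = 650.0
```

For this two-machine example the hand calculation gives MTTF = (3λ + µ)/(2λ²), which the numerical solve reproduces.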

7.2 Birth-death processes

A birth-death process is a special type of Markov process where the transitions only go to the state immediately above or immediately below the current state. The states have some natural ordering, for example the number of customers being served by one or more servers. For that reason we also usually start the numbering from zero rather than one. The transition matrix is then tridiagonal as shown in Equation (7.10):

    A = | a00 a01  0   0  ... |
        | a10 a11 a12  0  ... |
        |  0  a21 a22 a23 ... |   (7.10)
        |  0   0  a32 a33 a34 |
        | ...                 |

The above-diagonal elements, aij with j − i = 1, are denoted births and cause the system state to increase by one, whereas the below-diagonal elements, aij with i − j = 1, are denoted deaths and cause the system state to decrease by one. In birth-death processes it is common to use λ as the transition symbol for births and µ as the transition symbol for deaths. A birth-death process may have a finite or an infinite number of states.

Example 7.1 Consider a workshop with three critical machines. Each machine has a constant failure rate equal to λ, and there is one repair man who can repair failed machines. The rate of repair is µ, meaning that the mean repair time is 1/µ. The state variable represents the number of functioning machines. The transition matrix is given by:

    A = |  ?   µ   0   0 |
        |  λ   ?   µ   0 |
        |  0  2λ   ?   µ |
        |  0   0  3λ   ? |

Figure 7.1 shows the Markov transition diagram corresponding to the transition matrix. Note that when the system is in state 3, all machines are functioning and there are 3 machines that potentially may fail, hence the transition rate from state 3 to state 2 equals 3λ. In state 2 there are only two machines that may fail, hence the transition rate from state 2 to state 1 is 2λ. Since there is only one repair man, all the above-diagonal elements equal the repair rate µ.

The question marks in the transition matrix represent the diagonal elements. They are completed at the end, when all the “real” transitions are specified, by applying the rule that all rows should sum to zero, i.e., we get:

    A = | −µ      µ      0     0  |
        |  λ   −λ−µ      µ     0  |
        |  0     2λ   −2λ−µ    µ  |
        |  0      0     3λ   −3λ  |

Figure 7.2 shows the specification of this model in MS Excel. It is convenient to give names to the cells containing λ and µ. The numerical values used are λ = 0.001 and µ = 0.1.

Figure 7.2: MS Excel specification of the transition matrix

Table 7.1 shows the calculated steady state probabilities. Full production is achieved in 97% of the operating hours. For some 3% of the time one machine is down for corrective maintenance, whereas the probability of two or more failed machines is very low.

Table 7.1: Steady state probabilities

State    Pi
  3      0.9703
  2      2.91E-02
  1      5.82E-04
  0      5.82E-06
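The numbers in Table 7.1 can be reproduced by solving the steady state equations πA = 0 numerically; a minimal Python sketch (numpy assumed, variable names mine):

```python
import numpy as np

lam, mu = 0.001, 0.1
# States ordered 0, 1, 2, 3 = number of functioning machines (Example 7.1)
A = np.array([[-mu,         mu,            0.0,      0.0],
              [lam,  -lam - mu,             mu,      0.0],
              [0.0,    2 * lam,  -2 * lam - mu,       mu],
              [0.0,        0.0,        3 * lam, -3 * lam]])
# Steady state: pi A = 0 with sum(pi) = 1; replace one balance equation
# by the normalization equation.
M = A.T.copy()
M[-1, :] = 1.0
b = np.zeros(4)
b[-1] = 1.0
pi = np.linalg.solve(M, b)
print(pi)   # [P0, P1, P2, P3], matching Table 7.1 read bottom-up
```

The solve returns P3 ≈ 0.9703 and P0 ≈ 5.82E-06, in agreement with the table.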

Problem 7.1 In a workshop there are two production lines in parallel. Each production line has a critical machine with constant failure rate λ = 0.01 failures per hour. There is one (common) spare machine that can replace a failed machine. We assume that the switching time can be ignored. The repair rate of the machines is assumed constant and equal to µ = 0.2 per hour. If a production line is down, the loss is assumed to be cU = 10 000 NOK per hour. Only one repair man is available.

• Construct the Markov diagram and find the steady state solution.

• Calculate the expected loss due to downtime.

• If production is not 24/7 but runs from 07:00 to 15:00 it is reasonable to assume that each morning we start with 3 functioning machines. Find the time dependent solution and find the expected loss due to downtime.

• Repeat the analysis, but assume that two repair men are available.

• How much should one be willing to pay per hour for having this extra backup on repair resources? ♢

7.3 Queue theory models

Queue theory deals with modelling the situation where one or more “servers” serve “customers” arriving for service. The customers may be real customers, or they could be items awaiting repair, products to be processed and so on. The servers may be people, machines and so on. In Example 7.1 the customers are machines entering a workshop for corrective maintenance, whereas the server is the repair man. The probabilistic nature of a queue model is that arrivals and service times are random. Interarrival times, i.e., the times between arrivals, will here be denoted Xi, whereas the service times will be denoted Si. The interarrival and service times are governed by so-called arrival processes and service processes respectively. A letter is used to specify the distributions, and the following convention is usually applied:

• M for exponentially distributed variables, i.e., the Markov property holds

• Ek for Erlang-k distributed variables

• D for deterministic or fixed variables

• GI for general distributed interarrival times

• G for general distributed service times

A special notation is used for the specification of queue models. A set of parameters is separated by the slash symbol (“/”), where the order of the variables is:

1. The arrival process, M, Ek, D or GI.

2. The service process, M, Ek, D or G.

3. The number of servers, could be infinite.

4. Limit on number in system, could be infinite. If not specified it is assumed to be infinite. The limitation could for example be related to the number of places available in the waiting room.

5. Number in the source, could be infinite. If not specified it is assumed to be infinite. In Example 7.1 the source was limited to 3 since there were only 3 machines.

6. Queue discipline. If not specified it is assumed to be first come, first served (FCFS), but other principles exist, for example last come, first served (LCFS).

In Example 7.1 the situation could be specified by M/M/1/∞/3/FCFS. When analysing queue models the most interesting output variables to consider are:

• L = Expected number in the system under steady-state conditions.

• Lq = Expected number in the queue under steady-state conditions, i.e., customers being served are excluded.

Figure 7.3: Markov transition diagram for the M/M/1 queue

• W = Expected time spent in the system by a customer under steady-state conditions.

• Wq = Expected time spent in the queue by a customer under steady-state conditions.

There is a unique relation between these variables, and if one of them is known the others can easily be derived. The formulas will be presented later on. In the following some standard queue models are discussed.

7.3.1 The M/M/1 queue

For the M/M/1 queue we assume that customers arrive according to a Markov process, and that the rate of arrivals is constant and equal to λ, independent of the number of customers being served. There is only one server, which has a completion rate of µ. The transition matrix A is thus given on the following form, where the indexing of A starts at 0:

    A = | −λ      λ       0     ...  |
        |  µ   −(λ+µ)     λ      0   |
        |  0      µ    −(λ+µ)    λ   |
        | ...                        |

Figure 7.3 shows the Markov transition diagram. To obtain the steady state solution we have

−λP0 + µP1 = 0 for the first equation, and

λP0 − (λ + µ)P1 + µP2 = 0 for the second equation, and in the general case we have:

λPj−1 − (λ + µ)Pj + µPj+1 = 0, j > 0

We now obtain the steady state probabilities by:

    P1 = (λ/µ) P0

and by substituting P1 into the second equation:

    P2 = (λ/µ)² P0

and in general:

    Pj = (λ/µ)^j P0

Since all probabilities sum to unity, we have

    1 = P0 + P1 + ··· = P0 [1 + (λ/µ) + (λ/µ)² + ...]

The infinite sum of a geometric series is given by Σ_{k=0}^∞ r^k = 1/(1 − r), which is inserted into the brackets to give 1/(1 − λ/µ), provided that λ/µ < 1. Hence,

    1 = P0 / (1 − λ/µ)
    P0 = 1 − λ/µ = 1 − ρ

where ρ = λ/µ is denoted the traffic intensity. Substituting for the other Pj's we get:

    Pj = (λ/µ)^j (1 − λ/µ) = ρ^j (1 − ρ)   (7.11)

The mean number of customers in the system is given by:

    L = Σ_{j=0}^∞ j Pj = Σ_{j=1}^∞ j ρ^j (1 − ρ) = ρ(1 − ρ) Σ_{j=1}^∞ j ρ^(j−1) = ρ/(1 − ρ)   (7.12)

where we have used that Σ_{k=1}^∞ k r^(k−1) = 1/(1 − r)² for r < 1. To find the expected number of customers in the queue, Lq, we distinguish between the customers that are in the queue and those being served. Since there is only one server we have the following: if there are zero or one customers, there are no customers in the queue, whereas if there are j > 1 customers there will be j − 1 customers in the queue. Hence:

    Lq = 0·(P0 + P1) + 1·P2 + 2·P3 + 3·P4 + ...
       = 1ρ²(1 − ρ) + 2ρ³(1 − ρ) + 3ρ⁴(1 − ρ) + ...   (7.13)
       = ρ²(1 − ρ) [1ρ⁰ + 2ρ¹ + 3ρ² + ...] = ρ²/(1 − ρ)

where we again have used that Σ_{k=1}^∞ k r^(k−1) = 1/(1 − r)² for r < 1. To obtain the mean waiting time we use Little's formula, which reads:

L = λW (7.14) or, equivalently:

W = L/λ (7.15)

A similar result also applies for the mean time spent in the queue, i.e.,

Wq = Lq/λ (7.16)

For the M/M/1 queue we thus have the following main results:

    Pj = ρ^j (1 − ρ)
    L  = ρ/(1 − ρ)
    Lq = ρ²/(1 − ρ)                      (7.17)
    W  = ρ/(λ(1 − ρ))
    Wq = ρ²/(λ(1 − ρ))
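The closed formulas in Equation (7.17) are easily wrapped in a few lines of code. The Python sketch below uses illustrative rates of my own choosing, and the dictionary-based interface is mine, not from the text:

```python
def mm1_metrics(lam, mu):
    """Closed-form M/M/1 results from Eq. (7.17); requires lam < mu."""
    rho = lam / mu
    L = rho / (1 - rho)
    Lq = rho**2 / (1 - rho)
    # W and Wq follow from Little's formula, Eqs. (7.15)-(7.16)
    return {"rho": rho, "L": L, "Lq": Lq, "W": L / lam, "Wq": Lq / lam}

m = mm1_metrics(1.0, 2.0)   # illustrative rates: one arrival/hour, two services/hour
print(m)                    # rho = 0.5, L = 1.0, Lq = 0.5, W = 1.0, Wq = 0.5
```

Note that the function computes W and Wq from L and Lq via Little's formula, so the identity L = λW holds by construction.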

Example 7.2 Consider an M/M/1 queue where λ = 2 and µ = 3. This means that on average a customer arrives every half hour, and it takes on average 20 minutes to serve a customer. ρ = λ/µ = 2/3 is the average fraction of time the server is occupied with serving a customer, and he or she will be idle one third of the time. Relevant numerical values are:

P0 = (1 − ρ) ≈ 33.33%

P1 = ρP0 ≈ 22.22%

P2 = ρP1 ≈ 14.81%

P3 = ρP2 ≈ 9.88%

P4 = ρP3 ≈ 6.58%
L = ρ/(1 − ρ) = 2
Lq = ρ²/(1 − ρ) = 4/3
W = ρ/(λ(1 − ρ)) = 1 = one hour
Wq = ρ²/(λ(1 − ρ)) = 2/3 = 40 minutes

Observe that the probability that a customer will experience waiting time is as high as ρ ≈ 67%. In fact there will on average be L = 2 customers in the system when a new customer arrives. The Markovian assumptions imply that we may ignore how long these customers have been in the system, hence the new customer has to wait 20 minutes per customer already in the system; the waiting time for the newly arrived customer is thus 20 minutes times L = 2, that is, 40 minutes.

The M/M/1 queue is so simple that we are able to find closed formula representations of the interesting quantities, and simulation seems to be overkill. However, the simplicity of the M/M/1 queue makes it attractive for demonstrating discrete event simulation principles. The VBA code below assumes that we have access to a library implementing the manipulation of the pending event set (PES) and a library for random number generation.

Function MainProg_MM1()
    MaxTime = 100000
    InitPES
    initSystem
    Do While MaxTime > GetClock()
        Execute GetNxtEvent()
    Loop
    Debug.Print "Mean number of customers in system:" & getL()
End Function

Function initSystem()
    L = 0#
    PrevTime = 0#
    lambda = 2#
    mu = 3#
    systemState = 0
    EventNotice rndExponential(1 / lambda), AddressOf OnArrival
End Function

Sub OnArrival()
    L = L + systemState * (GetClock() - PrevTime)
    If systemState = 0 Then
        EventNotice rndExponential(1 / mu) + GetClock(), AddressOf OnServiceCompleted
    End If
    systemState = systemState + 1
    EventNotice rndExponential(1 / lambda) + GetClock(), AddressOf OnArrival
    PrevTime = GetClock()
End Sub

Sub OnServiceCompleted()
    L = L + systemState * (GetClock() - PrevTime)
    systemState = systemState - 1
    If systemState > 0 Then
        ' Proceed with the next customer in the queue
        EventNotice rndExponential(1 / mu) + GetClock(), AddressOf OnServiceCompleted
    Else
        ' Do nothing, server goes idle
    End If
    PrevTime = GetClock()
End Sub

Function getL()
    getL = L / GetClock()
End Function

The pseudo code only focuses on the mean number of customers in the system (L). Each time an event occurs, i.e., a new customer arrives or the server completes the treatment of a customer, we add the time elapsed since the last event, multiplied by the number of customers (systemState) just prior to the event, to the accumulated customer hours in the system. Dividing by the total simulation time at the end gives the desired statistic, i.e., L.

In the system initiation initSystem() the arrival and service rates are specified, the initial system state is set to zero customers, and the time for the first customer to arrive is put into the pending event set, i.e., we initiate the arrival process. We do not need to initiate the service process, since service first takes place when a customer arrives. The OnArrival() function performs some housekeeping work, in addition to initiating the service process if the queue is empty by adding an OnServiceCompleted event to the PES. In addition, the arrival time for the next customer is specified by adding an OnArrival event to the PES.

The OnServiceCompleted() function performs housekeeping, in addition to “calling itself” whenever there are more customers to treat. This means that the “service” process runs independently of the arrival process as long as there are customers in the system. The only interaction between these processes is that the arrival process triggers the service process when the queue is empty.

With the numbers specified, the resulting statistic L is around 2, as it should be. The variation is around 1%. This variation is due to the probabilistic nature of the simulation. Increasing the simulation period reduces the variation, but the system must run for a very long time to get really precise results. This is in particular due to the Markovian assumptions. Recall that the system occasionally visits relatively high states, hence very long simulations are needed to explore the states that contribute to the relevant statistics.

The VBA code above did not include statistics for the steady state probabilities nor waiting times. To treat the steady state probabilities we would have needed to add a table, say Dim P(0 To 100)¹. We would then need to update this table each time an event is triggered, by a statement like P(systemState) = P(systemState) + TimeElapsed(), and at the end the elements in the probability table are normalized by dividing by the simulation time.

Note that it is not straightforward to obtain statistics for W, because the individual customers are not explicitly modelled in the pseudo code given. To obtain statistics for W as well as Wq it is possible to embed a recording process following selected customers from the time they enter the system after an OnArrival() event until they leave the system. In OnArrival() a counter could be set to the number of customers in the system and the current time kept in another variable. The counter is reduced by one each time OnServiceCompleted() is called. When the counter equals zero, the service of that customer is completed, and the total time spent in the system can be calculated and aggregated into a statistics variable.
Note that only one customer is followed at a time, meaning that many customers are not followed. It is also a challenge to select the customers to follow. It is necessary to establish an independent monitoring process, such that a new monitoring period starts independently of the arrival/service process, since the average waiting time should reflect a customer arriving to an average number of customers in the system.
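As a cross-check of the VBA pseudo code above, the same M/M/1 simulation can be sketched in Python with a heap used as the pending event set (a sketch under my own naming, not code from the text):

```python
import heapq
import random

def simulate_mm1(lam, mu, max_time, seed=1):
    """Discrete-event simulation of an M/M/1 queue; returns the time-average
    number of customers in the system, an estimate of L."""
    rng = random.Random(seed)
    pes = [(rng.expovariate(lam), "arrival")]   # pending event set: (time, kind)
    clock = prev = 0.0
    n = 0          # customers currently in the system (systemState)
    area = 0.0     # accumulated customer-time (the L variable in the VBA code)
    while pes:
        clock, kind = heapq.heappop(pes)
        if clock > max_time:
            break
        area += n * (clock - prev)   # housekeeping: time since last event
        prev = clock
        if kind == "arrival":
            if n == 0:               # idle server starts service immediately
                heapq.heappush(pes, (clock + rng.expovariate(mu), "departure"))
            n += 1
            heapq.heappush(pes, (clock + rng.expovariate(lam), "arrival"))
        else:                        # departure
            n -= 1
            if n > 0:                # next customer in the queue enters service
                heapq.heappush(pes, (clock + rng.expovariate(mu), "departure"))
    return area / prev

L_hat = simulate_mm1(2.0, 3.0, 100_000)
print(L_hat)   # theory: L = rho/(1 - rho) = 2
```

With λ = 2 and µ = 3 the estimate lands close to the theoretical L = 2, with the percent-level run-to-run variation described above.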

7.3.2 The M/M/1/N queue

The M/M/1/N queue differs from the M/M/1 queue in that there is a limited capacity to hold customers. The transition matrix is now finite, and given by:

    A = | −λ      λ       0    ...          0  |
        |  µ   −(λ+µ)     λ     0   ...     0  |
        |  0      µ    −(λ+µ)   λ   0  ...  0  |
        | ...                                  |
        |  0     ...            µ         −µ   |

To obtain steady state probabilities we proceed as for the M/M/1 queue. There is now a limited number of equations, and the last equation, λP(N−1) − µPN = 0, deviates slightly from the equations between the first and the last one. However, as before we can replace one of the equations by the fact that all probabilities sum to one; in this normalization equation the infinite sum is now replaced by a finite sum. Applying results for geometric series it can be shown that the steady state probabilities are given by:

    Pj = ρ^j (1 − ρ)/(1 − ρ^(N+1)),   ρ ≠ 1   (7.18)

¹This table should in principle be infinite, but this is not possible in computer code.

and

    Pj = 1/(N + 1),   ρ = 1   (7.19)

where we note that for the M/M/1/N queue it still makes sense if the service rate is less than the arrival rate; this only means that some customers leave the system when there is no space in the waiting room.

The probability that the system is in a state where the waiting room is full is given by PN = ρ^N (1 − ρ)/(1 − ρ^(N+1)), assuming ρ ≠ 1. If this is the case, customers are lost, and the rate of lost customers thus equals:

    η(N) = λρ^N (1 − ρ)/(1 − ρ^(N+1)),   ρ ≠ 1   (7.20)

Note that N is a decision variable, and cost optimization could be used to optimize the system size. The mean number in the system can be found by applying results for finite geometric series, and it may be shown that:

    L = ρ(1 − ρ^N) / ((1 − ρ)(1 − ρ^(N+1))) − Nρ^(N+1)/(1 − ρ^(N+1)),   ρ ≠ 1   (7.21)

When ρ = 1 all states are equally probable, hence the mean number in the system equals L = N/2. To obtain the mean waiting time, Little's formula may be applied, i.e., L = λW.
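Equations (7.18)–(7.21) can be wrapped directly in code; the Python sketch below (function names are mine) also lets Equation (7.21) be checked against the direct sum Σ j·Pj:

```python
def mm1n_probs(rho, N):
    """Steady-state probabilities for the M/M/1/N queue, Eqs. (7.18)-(7.19)."""
    if abs(rho - 1.0) < 1e-12:
        return [1.0 / (N + 1)] * (N + 1)
    return [rho**j * (1 - rho) / (1 - rho**(N + 1)) for j in range(N + 1)]

def mm1n_L(rho, N):
    """Mean number in the M/M/1/N system, Eq. (7.21), with the rho = 1 case."""
    if abs(rho - 1.0) < 1e-12:
        return N / 2.0
    return (rho * (1 - rho**N) / ((1 - rho) * (1 - rho**(N + 1)))
            - N * rho**(N + 1) / (1 - rho**(N + 1)))

P = mm1n_probs(0.5, 3)        # illustrative values: rho = 0.5, N = 3
print(sum(P))                 # probabilities sum to one
print(mm1n_L(0.5, 3))         # agrees with sum(j * P[j])
```

Comparing mm1n_L against the explicit sum over mm1n_probs is a convenient unit test of the closed formula.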

7.3.3 The M/M/C/N queue

The M/M/C/N queue opens for more than one server, which affects the below-diagonal elements:

    A = | −λ      λ        0       ...           |
        |  µ   −(λ+µ)      λ        0   ...      |
        |  0     2µ     −(λ+2µ)     λ   0  ...   |
        |  0      0       3µ     −(λ+3µ)  λ ...  |
        | ...                                    |

where the service rate is proportional to the number of customers in the system, up to j = C, which limits the total service rate to Cµ. It is beyond the scope of this text to derive the relevant formulas for the M/M/C/N queue, but it may be shown that for C = N we have:

    Pj = (ρ^j / j!) / Σ_{i=0}^C (ρ^i / i!)   (7.22)

where in particular the so-called Erlang's lost call formula is of interest:

    PN = (ρ^N / N!) / Σ_{i=0}^C (ρ^i / i!)   (7.23)

since the rate of lost customers is given by η(N) = λPN. The mean number of occupied servers equals the mean number of customers in the system and is given by:

L = ρ (1 − PN ) (7.24)

For general C and N closed formulas cannot be obtained. However, a numerical solution is straightforward to obtain. Assume that Equation (7.7) is implemented in a routine denoted MarkovAsymptotic, returning the steady state probabilities in a vector [1:N+1]. The only challenge now is to place the appropriate transition rates into the transition matrix. The following VBA code will be sufficient:

Function P_MMCN(lambda, mu, C, N)
    For i = 1 To N
        A(i, i + 1) = lambda
        If i > C Then
            A(i + 1, i) = C * mu
        Else
            A(i + 1, i) = i * mu
        End If
    Next
    P_MMCN = MarkovAsymptotic(A)
End Function

where we assume that MarkovAsymptotic fixes the diagonal elements so that we do not need to consider those.
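Erlang's lost call formula (7.23) is a one-liner in most languages. A Python sketch (names mine), with an illustrative check: for ρ = 2 and N = 2 the formula gives (2²/2!)/(1 + 2 + 2) = 2/5 exactly:

```python
from math import factorial

def erlang_b(rho, N):
    """Erlang's lost call formula, Eq. (7.23): blocking probability for
    C = N servers and no waiting room."""
    denom = sum(rho**i / factorial(i) for i in range(N + 1))
    return (rho**N / factorial(N)) / denom

print(erlang_b(2.0, 2))               # 2/5 = 0.4
# Rate of lost customers: eta = lam * erlang_b(rho, N),
# mean number of occupied servers (Eq. 7.24): L = rho * (1 - erlang_b(rho, N))
```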

7.3.4 The M/Ek/1/N queue

So far we have assumed that both the arrival and service processes are Markovian. Especially the assumption of exponentially distributed service times is usually unrealistic. A more realistic approach is to assume that service times are Erlang-k distributed. At first sight this is not very attractive, since we then have to give up the Markov assumptions. However, the following trick is applied to overcome this challenge. It is well known in probability theory that a sum of k independent and identically distributed exponential random variables is Erlang-k distributed. This means that we may think of the service process as a k-stage process where the server completes the service task in k subtasks, each with an exponentially distributed completion time.

In the modelling we now introduce k − 1 artificial states for each “physical” state j. These artificial states are used for bookkeeping of the service process. Assume we start with j = 0. Then a customer arrives, which brings the system to state j = 1. Now the server starts to complete the k steps needed to handle the customer, i.e., bringing the system back to state j = 0. Let the artificial states linked to state j = 1 be denoted 1_2, 1_3, ..., 1_k. We can then think of the system going from state j = 1 through the states 1_2, 1_3, ..., 1_k before it finally returns to state j = 0. For example, for k = 3 the return trajectory will be: 1 → 1_2 → 1_3 → 0. Obviously, the system might be disturbed by a new arrival. Assume that the server is processing the second subtask, i.e., the system is in state 1_2. A newly arrived customer will not affect the ongoing service, but just shifts the “physical” system state one to the right, i.e., to state 2_2, where indexes are introduced for state j = 2 in the same manner as was done for state j = 1. Generally we use the notation j_l to denote that the server is handling subtask l when there are j customers in the system. In particular, note that j_1 corresponds to state j. Figure 7.4 shows the Markov transition diagram for N = 3 and k = 3.

We now have the following transitions in the system:

    a(j_l, (j+1)_l) = λ,   0 ≤ j < N, 1 ≤ l ≤ k
    a(j_l, j_(l+1)) = µ,   1 ≤ j ≤ N, 1 ≤ l < k   (7.25)
    a(j_k, (j−1)_1) = µ,   1 ≤ j ≤ N

Figure 7.4: Markov transition diagram for the M/E3/1/3 queue

where elements not specified equal zero and the sum of each row shall equal zero. Equation (7.25) specifies the so-called embedded Markov chain method. In this presentation we proceed with numerical methods. For the numerical approach we need a systematic way to convert the j_l indexing notation to an ordinary index regime. To hold all transition rates we need an (N+1)k times (N+1)k matrix. We may then use the following index conversion formula for an index j_l:

INDEX(j, l) = kj + l (7.26)

Now, assume that the transition matrix has been specified by Equation (7.25) using the indexing formula (7.26). Let further P̃ be the resulting vector of steady state probabilities from the (N+1)k times (N+1)k system with artificial nodes. There are k elements in P̃ that represent the “physical” state j, and we need to sum these:

    Pj = Σ_{l=kj+1}^{k(j+1)} P̃(l)   (7.27)

The expected number of customers in the system is given by:

    L = Σ_{j=1}^N j [ Σ_{l=kj+1}^{k(j+1)} P̃(l) ]   (7.28)

It is relatively easy to apply the formula in Equation (7.28) to find a numerical solution for the expected number of customers. The result may then be compared to the famous Pollaczek-Khinchine formula, which applies for an M/G/1 queue. The formula reads:

    L = ρ + (ρ² + λ²Var(S)) / (2(1 − ρ))   (7.29)

where Var(S) is the variance of the service times. Note that for the Erlang-k distribution with parameters k and λ, the mean value and variance are given by k/λ and k/λ² respectively. Here “λ” represents the intensity of each “subtask”, so that λ = kµ, where µ is the intensity for the “entire task”. Hence Var(S) = 1/(kµ²), which can be inserted in Equation (7.29). Note that Equation (7.29) applies for N = ∞, whereas Equation (7.28) applies for a finite N value. If N = ∞ and a numerical solution is required, for example for the state probabilities, one may choose a (very) large N value as an approximation.
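The comparison suggested above can be sketched end to end in Python (numpy assumed; function names and the 0-based index convention are mine): build the expanded transition matrix from Equations (7.25)–(7.26), solve for the steady state, aggregate with Equations (7.27)–(7.28), and compare with Equation (7.29):

```python
import numpy as np

def pk_L(lam, mu, var_s):
    """Pollaczek-Khinchine mean number in system, Eq. (7.29), for M/G/1."""
    rho = lam / mu
    return rho + (rho**2 + lam**2 * var_s) / (2 * (1 - rho))

def mek1n_L(lam, mu, k, N):
    """Expected number in the M/Ek/1/N system via the expanded state space of
    Eqs. (7.25)-(7.28). Each of the k subtasks runs at rate k*mu, so the
    overall service rate is mu. 0-based index of phase state j_l: k*j + l - 1."""
    n = (N + 1) * k
    idx = lambda j, l: k * j + l - 1
    A = np.zeros((n, n))
    for j in range(N):                          # arrivals: j_l -> (j+1)_l
        for l in range(1, k + 1):
            A[idx(j, l), idx(j + 1, l)] += lam
    for j in range(1, N + 1):
        for l in range(1, k):                   # subtask done: j_l -> j_(l+1)
            A[idx(j, l), idx(j, l + 1)] += k * mu
        A[idx(j, k), idx(j - 1, 1)] += k * mu   # last subtask: j_k -> (j-1)_1
    np.fill_diagonal(A, -A.sum(axis=1))         # each row sums to zero
    M = A.T.copy()                              # steady state: pi A = 0, sum pi = 1
    M[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    pi = np.linalg.solve(M, b)
    # Aggregate phase states back to "physical" states, Eqs. (7.27)-(7.28):
    return sum(j * pi[idx(j, 1):idx(j, k) + 1].sum() for j in range(1, N + 1))

lam, mu, k = 2.0, 3.0, 3
print(mek1n_L(lam, mu, k, 50))              # finite but large N
print(pk_L(lam, mu, 1.0 / (k * mu**2)))     # Eq. (7.29) with Var(S) = 1/(k mu^2)
```

With these rates (ρ = 2/3) the truncation at N = 50 is negligible, and the two numbers agree to several decimals; for k = 1 the Pollaczek-Khinchine formula reduces to the M/M/1 result ρ/(1 − ρ).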

7.3.5 Final remarks

Several closed formulas for queue problems have been presented in this chapter. Modelling ideas have also been provided so that the reader may specify their own conditions and try to solve the problem at hand numerically. However, a wide range of situations remains to be described. Further reference is therefore made to a standard textbook on queueing theory.

Problem 7.2 In this problem an M/M/C = 3/N = 5 system is considered. The rate of customers is λ = 60/15 = 4 per hour and µ = 60/25 per hour.

• Assume first that there is no waiting room, i.e., N = 3. Write a VBA program implementing the Markov steady state solution to find the steady state probabilities, and compare with the analytic result that applies for an M/M/C = 3/N = 3 queue.

• Then change the program to find the steady state probabilities for the M/M/C = 3/N = 5 queue.

• Find the rate of lost customers, and discuss whether it pays off to increase the waiting room.

• Assume this is a hairdressing salon with opening hours 10:00 to 18:00. Find the time dependent probabilities at 15:00 and at closing time. ♢

Problem 7.3 Consider the situation in Problem 7.2. Write a VBA program to verify the result but now use discrete event simulation.

Problem 7.4 Consider an M/Ek/1/N = 10 queue with λ = 60/30 = 2 per hour and µ = 60/20 = 3 per hour. Write a VBA program that implements the so-called embedded Markov chain method. First assume k = 1 (exponential) and then compare with the steady state probabilities for k = 3.

Problem 7.5 (To be solved without a computer) Consider an M/M/2/2 queue with λ = 1/20 (arrivals per minute) and µ = 1/15 (treatment rate per server). Find the steady state probabilities for this queue. Find the time dependent solution after 15 minutes by a numerical approach with a step length of ∆t = 5 minutes. Compare with the steady state solution.

Problem 7.6 (To be solved without a computer) Consider Problem 7.5. Explain the main elements of a discrete event simulation model to solve this problem. Assume pseudo random numbers have been generated such that the first customers arrive after 35, 55, 65, 73 and 90 minutes. Assume that it takes 15 minutes to treat the first customer, 20 minutes to treat the second customer, and 15 minutes to treat the third customer. Show the sequence of EventNotice() and GetNxtEvent() calls for these data.

Chapter 8

Inventory models

8.1 Introduction

This chapter deals with inventory control. The term “inventory” refers to the measured amount of some good, which varies in time. The inventory level is reduced over time due to a demand process, and increased due to a replenishment process. From the decision maker's perspective the aim is to optimize the replenishment process. The simplest models assume that all quantities are deterministic, whereas more realistic models are introduced to cope with uncertainties in the demand and in the lead times of the replenishment process.

8.2 The classical economic order quantity

The aim of the classical economic order quantity (EOQ) model is to determine the order size when a single commodity is to be ordered. The following assumptions are made:

• The demand per time unit is constant and deterministic and the demand rate d is the number of items demanded per unit time (one “item” could be one litre for liquids, one kilogram for fruits and so on).

• The replenishment occurs immediately after an order is made, i.e., there is zero lead time.

• A replenishment order is made when the inventory level equals zero.

• There is a fixed cost cF for handling each order, for example transportation and administrative costs.

• There is a storage cost cH for holding one item one unit time in the stockroom.

• There is a variable cost per item cI, but this cost will not influence the order size.

• Each time a replenishment order is made, Q items are ordered.

The total cost per unit time comprises two cost elements: the fixed order cost and the holding cost. The fixed order cost per unit time will be the order cost per cycle times the number of cycles per unit time. Since we do not need to consider the variable cost per item associated with an order, the order cost per cycle is cF. To find the number of order cycles per unit time we calculate the time between orders. Since the demand rate is d and the order size is Q, it will take T = Q/d time units to consume one order. Hence the number of orders per unit time equals 1/T = d/Q.

The holding cost per item per time unit is cH. To find the total holding cost we need to find how many items we have in the stockroom. Immediately after the replenishment order is received, the inventory level equals Q, and it then drops linearly down to zero before a new order is made. Hence, the average inventory level in a cycle equals Q/2. To sum up, the total cost per unit time is given by:

    C(Q) = cF·d/Q + Q·cH/2   (8.1)

The optimum order size is found by setting the derivative to zero, i.e.,

    dC(Q)/dQ = −cF·d/Q² + cH/2 = 0   (8.2)

yielding the well known EOQ formula:

    Q = √(2·cF·d / cH)   (8.3)

8.3 Probabilistic models

The EOQ model described in the previous section makes assumptions that are not realistic in most cases. First of all the demand is usually stochastic, and lead times may also be stochastic in nature. In this section some classical probabilistic models are introduced.

8.3.1 The newsboy problem

The newsboy problem is a classical problem in operations management that is also known under other names: the newsvendor, single period, or perishable model. The basic elements to be taken into account are that the sales price is fixed, the demand is stochastic (uncertain), and demand above the quantity ordered Q is treated as lost sales. Ordered quantities not sold have only a salvage value at the end of the period considered. The challenge is to determine the optimal quantity to order. The quantities of interest are:

• cF = Fixed cost to place an order

• cI = Cost per item ordered

• s = Selling price per item

• v = Salvage value per item, i.e., value (positive or negative) of an item not sold

• cP = Lost sale penalty per item, for example future reduction in sale due to unsatisfied customers

• D = Demand, which is stochastic

• fD(d) = Probability density function for the demand

• P(Q) = Profit when Q items are ordered; P(Q) is stochastic

The objective now is to obtain the expected value of P(Q). For a given demand d we have:

    P(Q|d < Q) = sd + v(Q − d) − cF − cIQ   (8.4)

    P(Q|d > Q) = sQ − cP(d − Q) − cF − cIQ   (8.5)

The expected profit is found by integrating over the demand distribution, i.e.:

    E[P(Q)] = ∫₀^Q [sd + v(Q − d) − cF − cIQ] fD(d) dd
            + ∫_Q^∞ [sQ − cP(d − Q) − cF − cIQ] fD(d) dd
            = ∫₀^Q [(s − v)d + vQ] fD(d) dd                    (8.6)
            + ∫_Q^∞ [(s + cP)Q − cPd] fD(d) dd − cF − cIQ

The maximum profit is found by differentiating E[P(Q)] with respect to Q. A challenge to be solved is that Q is involved both in the limits of integration and in the integrand. To overcome this problem Leibniz's rule is applied. This rule states that if:

    g(u) = ∫_{a(u)}^{b(u)} h(u, x) dx

then

    dg/du = ∫_{a(u)}^{b(u)} [dh(u, x)/du] dx + h[u, b(u)]·db/du − h[u, a(u)]·da/du

Applying Leibniz's rule yields:

    dE[P(Q)]/dQ = v ∫₀^Q fD(d) dd + (s + cP) ∫_Q^∞ fD(d) dd − cI

Setting the derivative to zero and recognizing that fD(d) integrates to one gives:

    0 = v [1 − ∫_Q^∞ fD(d) dd] + (s + cP) ∫_Q^∞ fD(d) dd − cI
      = v + (s − v + cP) ∫_Q^∞ fD(d) dd − cI

and the optimum ordering quantity Q is given by the critical ratio policy:

    ∫_Q^∞ fD(d) dd = (cI − v)/(s + cP − v)   (8.7)

or

    Q = FD⁻¹( (s + cP − cI)/(s + cP − v) )   (8.8)

where FD⁻¹ is the inverse cumulative distribution function for the demand. Note that MS Excel provides the NORM.INV(probability, mean, standard_dev) and GAMMA.INV(probability, alpha, beta) functions for obtaining the inverse cumulative distribution. For the exponential and Weibull distributions closed formulas can be obtained.
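The critical ratio policy (8.8) is straightforward to implement. For exponentially distributed demand the inverse CDF has the closed form Q = −E(D)·ln(1 − p); the Python sketch below uses illustrative numbers of my own, not taken from the text:

```python
from math import log

def critical_ratio(s, c_i, c_p, v):
    """Critical ratio F_D(Q) = (s + cP - cI) / (s + cP - v), Eq. (8.8)."""
    return (s + c_p - c_i) / (s + c_p - v)

def newsboy_q_exponential(s, c_i, c_p, v, mean_demand):
    """Optimal order quantity when demand is exponential with the given mean:
    F(q) = 1 - exp(-q/mean)  =>  Q = -mean * ln(1 - p)."""
    p = critical_ratio(s, c_i, c_p, v)
    return -mean_demand * log(1 - p)

# Illustrative numbers: sell at 10, buy at 4, salvage 1, no lost-sale
# penalty, mean demand 100 items -> p = 6/9 and Q = 100 * ln(3)
print(newsboy_q_exponential(s=10, c_i=4, c_p=0, v=1, mean_demand=100))
```

The same pattern applies to any demand distribution with an available inverse CDF, e.g. the Weibull case used in Problem 8.1.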

Problem 8.1 How many Christmas trees to stock for the Christmas season? The selling price of a Christmas tree is s = NOK 350. The cost per Christmas tree when ordered from the farmer is cI = NOK 50. Lost sale penalty per item is cP = NOK 30. The salvage value (i.e., using the tree as firewood) is v = NOK 5. The demand is assumed to be Weibull distributed with mean value E(D) = 100 and shape parameter α = 4. Determine the optimal number of Christmas trees to order.

8.3.2 A lot size, reorder point policy; (r, Q)

We now return to the situation discussed for the EOQ-model where multiple periods are considered, but where demand and lead time are stochastic. The following notation is introduced:

• L = Lead time, i.e., the time from an order is placed until the goods are received. Initially L is considered fixed, but later on L is treated as a stochastic variable

• δ = Demand rate, i.e., expected number of items demanded per time unit.

• D = Demand during the lead time. D is stochastic and depends on the lead time L.

• fD(d) = Probability density function for D

• CO() = Expected ordering cost per unit time

• CS() = Expected stock-out cost per unit time, i.e., cost if demand is not fulfilled

• CH() = Expected holding cost per unit time

• cF = Fixed cost per order

• cI = Cost per item ordered

• cP = Lost sale penalty per item

• cH = Holding cost per item per unit time

• r = Reorder point, i.e., the critical inventory level that triggers an order. In many presentations r is denoted ROP.

• Q = The quantity to order when the inventory level drops below r

Two situations are usually considered regarding the stock-out situation. In the back-order case, demands that cannot be met during a stock-out are accumulated as back-orders and fulfilled when the next replenishment takes place, whereas in the lost sales case these orders are never fulfilled. In order to find an optimal policy we consider the expected cost per unit time:

E[C(r, Q)] = CO() + CS() + CH()   (8.9)

The ordering cost is rather straightforward to obtain. δ is the demand rate per unit time, and if the ordered quantity per order is Q, the number of orders per unit time will be δ/Q, hence:

CO(Q) = cFδ/Q + cIδ   (8.10)

provided that no demands are lost, i.e., in the back-order case. The formula gives a reasonable approximation for the lost sales case provided that the stock-out is not large.

The stock-out cost is the product of the lost sale penalty per item, the expected number of back-orders/lost sales per cycle, and the number of cycles per unit time. We denote the expected number of back-orders or lost sales per cycle by B(r), given by:

B(r) = ∫_r^∞ (d − r) fD(d) dd   (8.11)

The expected stock-out cost per time unit is then given by:

CS(r) = cPB(r)δ/Q   (8.12)

For the normal distribution we have from Equation (2.25) that ∫_{−∞}^a x f(x) dx = µΦ((a − µ)/σ) − σϕ((a − µ)/σ), which yields the following expression for B(r) when D is normally distributed with mean µ and standard deviation σ:

B(r) = (µ − r)[1 − Φ((r − µ)/σ)] + σϕ((r − µ)/σ)   (8.13)

To find the holding cost a simple argument is given, although a more rigorous argument is required to prove the result we obtain. Consider a cycle from when an order arrives until just before the next order arrives. When the first order was placed the inventory level was, per definition, r. From that time on the inventory depletes to the expected value r − µ, where µ = δL is the expected demand in the period from the order is placed until it arrives. This means that the expected inventory level just after an order arrives is on average r − µ + Q. The average inventory over a cycle is the mean of the level at the beginning and the end of the cycle, hence the expected holding cost per unit time is:

CH(r) = cH(Q/2 + r − δL)   (8.14)

The argument holds for the back-order case. For the lost sales case a slightly different argument applies, which gives:

CH(r) = cH(Q/2 + r − δL + B(r))   (8.15)

The total expected cost per unit time is then given by:

E[C(r, Q)] = cFδ/Q + cIδ + cPB(r)δ/Q + cH[Q/2 + r − δL + ILS·B(r)]   (8.16)

where LS represents the lost sales situation, and Ix is the indicator function equal to one if x is true and zero otherwise. Seen as a function of Q, Equation (8.16) is on the standard form f(x) = a/x + bx + c. The optimum of the standard form is x = √(a/b), hence:

Q = √(2δ[cF + cPB(r)]/cH)   (8.17)

which is a function of r. If we are able to find an expression that minimizes Equation (8.16) without involving Q we are done. When minimizing Equation (8.16) with respect to r we observe that it involves B(r), which has r as a limit of integration. Again Leibniz's rule applies, and after differentiation the solution is found from:

∫_r^∞ fD(d) dd = cHQ/(δcP + ILS·cHQ)   (8.18)

or

r = FD^(-1)[(δcP + (ILS − 1)cHQ)/(δcP + ILS·cHQ)]   (8.19)

Observe that the solution contains Q, hence Equations (8.17) and (8.19) are mutually dependent and must be solved jointly. Note that neither of these equations contains the item cost, cI. This is not surprising, since in Equation (8.10) we assume that the item cost is proportional to the demand rate and independent of the policy. The following iterative scheme may be applied to obtain a solution:

1. Initially let B(r) = 0 and i = 1. Obtain Qi from Equation (8.17).

2. Obtain ri from Equation (8.19) where Qi is inserted for Q

3. Let i = i + 1 and obtain Qi from Equation (8.17) where ri−1 is inserted for r

4. Go to step 2 as long as the solution is changing.

Note that in some situations cHQ > δcP and a solution for r cannot be obtained. In such situations we set r = 0.
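A minimal sketch of this iterative scheme in Python (an illustration, not code from the course), assuming normally distributed lead time demand so that B(r) is given by Equation (8.13), and using the Table 8.1 data:

```python
from math import sqrt, exp, pi
from statistics import NormalDist

def B_normal(r, mu, sigma):
    """Expected back-orders/lost sales per cycle, Eq. (8.13)."""
    z = (r - mu) / sigma
    phi = exp(-z * z / 2) / sqrt(2 * pi)          # standard normal density
    return (mu - r) * (1 - NormalDist().cdf(z)) + sigma * phi

def rq_policy(delta, sigma_day, L, cF, cP, cH, lost_sales=False, tol=0.5):
    """Iterate Eqs. (8.17) and (8.19) until Q stabilizes."""
    mu, sigma = delta * L, sigma_day * sqrt(L)    # lead time demand moments
    ILS = 1 if lost_sales else 0
    B, Q_prev = 0.0, None
    while True:
        Q = sqrt(2 * delta * (cF + cP * B) / cH)                     # Eq. (8.17)
        p = (delta * cP + (ILS - 1) * cH * Q) / (delta * cP + ILS * cH * Q)
        r = NormalDist(mu, sigma).inv_cdf(p) if 0 < p < 1 else 0.0   # Eq. (8.19)
        B = B_normal(r, mu, sigma)
        if Q_prev is not None and abs(Q - Q_prev) < tol:
            return round(Q), round(r)
        Q_prev = Q

# Backorder case with the Table 8.1 data: converges to Q = 328, r = 308
Q, r = rq_policy(delta=100, sigma_day=10, L=3, cF=500, cP=10, cH=1)
```

The guard on p handles the cHQ > δcP situation mentioned above, where r is set to 0.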

Example 8.1 The method is demonstrated by the data given in Table 8.1. It takes three days before the goods arrive after the order has been placed. Since the expected number of demands per day is 100, the expected number of demands during the lead time is E(D) = 3 × 100 = 300, and the standard deviation is SD(D) = √3 × 10 ≈ 17. The results from the iteration are shown in Table 8.2. Seven iterations are required for the backorder case, fewer for the lost sales case. ♢

Table 8.1: Data for the (r, Q) inventory policy, costs in Euro, time unit days

Parameter   Value   Explanation
L           3       Expected lead time
δ           100     Demand rate
σ           10      Standard deviation in number of demands per unit time
cF          500     Fixed cost per order
cP          10      Lost sale penalty per item
cH          1       Holding cost per item per unit time

Table 8.2: Iteration results for the (r, Q) inventory policy

        Backorder case          Lost sales case
i       Qi    ri    B(ri)       Qi    ri    B(ri)
1       316   308   3.5         316   312   2.5
2       327   308   3.7         324   312   2.5
3       328   308   3.7         324   312   2.5

Up to now the lead times have been treated as deterministic quantities. If lead times are stochastic, the variance of the number of demands during the lead time will increase. To find the expected number and variance of demands we may apply Wald's formula for the expected value and the Blackwell–Girshick equation for the variance. Wald's formula states the following: Let Xi be independent and identically distributed stochastic variables with expected value E(X) and variance Var(X). Further, pick a random number, say N, of X's. The expected sum of the X's picked is given by:

E(∑_{i=1}^N Xi) = E(N)E(X)   (8.20)

where E(N) is the expected number picked. The Blackwell–Girshick equation applies in the same situation, but gives the result for the variance:

Var(∑_{i=1}^N Xi) = E(N)Var(X) + E(X)²Var(N)   (8.21)

where Var(N) is the variance in the number picked. These results apply even in the continuous situation, i.e., let δ and σ be the expected value and standard deviation of the demand per time unit, and let E(L) and SD(L) be the expected value and standard deviation of the lead time, then:

E(D) = E(L)δ   (8.22)

and

SD(D) = √(σ²E(L) + δ²Var(L))   (8.23)

Equations (8.22) and (8.23) are used when calculating B(r) in Equation (8.17) and when finding the inverse distribution function FD^(-1) in Equation (8.19).
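As a quick sketch of Equations (8.22)–(8.23), using the lead time figures of Example 8.2 (E(L) = 3, SD(L) = 0.5) together with δ = 100 and σ = 10:

```python
from math import sqrt

delta, sigma = 100.0, 10.0   # demand per day: mean and standard deviation
EL, SDL = 3.0, 0.5           # lead time: mean and standard deviation

ED = EL * delta                                  # Eq. (8.22): 300 items
SDD = sqrt(sigma**2 * EL + delta**2 * SDL**2)    # Eq. (8.23): ≈ 52.9 items
```

Note how the lead time variability term δ²Var(L) dominates: the standard deviation of the lead time demand jumps from about 17 (deterministic L) to about 53.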

Example 8.2 We revisit Example 8.1, but now assume that the lead time is stochastic with SD(L) = 0.5. For the back-order case the Q value increases from 328 to 354, whereas the r value increases from 308 to 320. This is not surprising: to account for the variability in lead times, a higher stock level is required at the time of ordering to prevent stock-out, so the reorder point r is reached “earlier”. The Q value has also increased, but this is because the average cycle length increases from 3.3 to 3.5 days. It is also interesting to investigate the total cost. If we consider the situation with fixed lead times for the back-order case and apply Equation (8.16), the minimum cost for the optimal (r, Q) combination is 371 Euro per time unit. When the lead time is stochastic this number increases to 375 Euro per time unit. This means that “uncertainty” is not “free of charge”. If we ignore uncertainty and order according to a policy that incorrectly assumes SD(L) = 0, we get an average cost of 509 Euro per time unit. This is an important result: not taking uncertainty into account is a costly strategy. ♢

Problem 8.2 An (r, Q) model for the Christmas tree situation in Problem 8.1 shall be investigated. We now assume that it is possible to order new Christmas trees during the entire Christmas tree season (December). The selling price of a Christmas tree is still s = NOK 350. The cost per Christmas tree when ordered from the farmer is now increased to cI = NOK 100. Lost sale penalty per item is cP = NOK 30. The salvage value (e.g., using the tree as firewood) is v = NOK 5. Lead time is 2 days, but we have expanded our business such that the demand per day is normally distributed with mean value 25 and standard deviation 7. Holding cost per Christmas tree is cH = NOK 5 per day, and fixed cost per order is cF = NOK 1 500. Determine the optimal values for r and Q. Discuss the assumption of a constant demand rate per day. Discuss the role of cI, s and v. Discuss what will happen if cH increases to NOK 10 per day. ♢

Problem 8.3 To be solved without a computer Consider an (r, Q) policy with data given in Table 8.3. Find the first iteration results for the back-order case. Assume that demand during the lead time can be approximated by the normal distribution. ♢

Table 8.3: Data for the (r, Q) inventory policy, costs in Euro, time unit days (Problem 8.3)

Parameter   Value   Explanation
L           5       Deterministic lead time
δ           100     Demand rate
σ           10      Standard deviation in number of demands per unit time
cF          500     Fixed cost per order
cP          10      Lost sale penalty per item
cH          1       Holding cost per item per unit time

Chapter 9

Reliability and maintenance

9.1 Definitions

Reliability and maintenance of a production line are essential for modern production. Some important definitions related to reliability and maintenance are:

Reliability The ability of an item to perform a required function, under given environmental and operational conditions and for a stated period of time [ISO8402].

Maintenance All actions necessary for retaining an item in or restoring it to a specified condition [MIL-STD 721C]. Maintenance can be corrective or preventive.

Availability The ability of an item (under combined aspects of its reliability, maintainability, and maintenance support) to perform its required function at a stated instant of time or over a stated period of time [BS 4778]. Availability at time t: A(t) = Pr(item is functioning at time t)

Reliability measures

• Mean time to failure (MTTF)

• Number of failures per time unit (failure rate)

• The probability that the item does not fail in a time interval (survival probability)

• The probability that the item is able to function at time t (availability)

9.2 Reliability terminology

In the following T denotes the stochastic variable of interest. Typically, T is the lifetime of a component. We have:

• Distribution function: F(t) = Pr(T ≤ t) (CDF)

• Probability density function: f(t) = dF(t)/dt (PDF)

• Survivor function: R(t) = Pr(T > t) = 1 − F(t)

• Failure rate function: z(t) = f(t)/R(t)

• Mean time to failure: MTTF = ∫_0^∞ t·f(t) dt = ∫_0^∞ R(t) dt

• Mean residual life: MRL(t) = (1/R(t)) ∫_t^∞ R(x) dx

• Mean down time: MDT = ∫_0^∞ t·f(t) dt, where f(t) here is the down time distribution

The failure rate function, often denoted the hazard rate, gives the conditional probability that a component that has survived up to t fails in a small time interval [t, t + ∆t⟩, in the sense that ∆t·z(t) ≈ Pr(t < T ≤ t + ∆t | T > t). To express the conditional survivor function we introduce R(x|t) = Pr(T > x + t | T > t), i.e., the conditional probability of surviving another x time units given that the component has survived t time units. It follows that

R(x|t) = Pr(T > x + t ∩ T > t)/Pr(T > t) = Pr(T > x + t)/Pr(T > t) = R(t + x)/R(t)

Table 9.1 gives relationships between the functions F (t), f(t), R(t), and z(t).

Table 9.1: Relationships between the functions F(t), f(t), R(t), and z(t)

Expressed by:  F(t)                    f(t)                   R(t)              z(t)
F(t) =         –                       ∫_0^t f(u) du          1 − R(t)          1 − exp(−∫_0^t z(u) du)
f(t) =         dF(t)/dt                –                      −dR(t)/dt         z(t)·exp(−∫_0^t z(u) du)
R(t) =         1 − F(t)                ∫_t^∞ f(u) du          –                 exp(−∫_0^t z(u) du)
z(t) =         (dF(t)/dt)/(1 − F(t))   f(t)/∫_t^∞ f(u) du     −(d/dt) ln R(t)   –

Reliability Block Diagram (RBD) Reliability block diagrams are valuable when we want to visualise the performance of a system comprised of several (binary) components. Figures 9.1 and 9.2 show reliability block diagrams for simple structures. The interpretation of the diagram is that the system is functioning if there is a connection between a and b, i.e., there is a path of functioning components from a to b. The system is in a fault state (is not functioning) if there does not exist a path of functioning components between a and b.

Figure 9.1: Reliability block diagram for a series structure

Figure 9.2: Reliability block diagram for a parallel structure

State variable The state variable of component i is given by:

xi(t) = 1 if component i is functioning at time t, and xi(t) = 0 if component i is in a fault state at time t   (9.1)

Structure function For the system we now introduce

ϕ(x, t) = 1 if the system is functioning at time t, and ϕ(x, t) = 0 if the system is in a fault state (not functioning) at time t   (9.2)

ϕ denotes the structure function and depends on the xi's (x is the vector of all the xi's). ϕ(x, t) is thus a mathematical function that uniquely determines whether the system functions or not for a given value of the x-vector. Note that it is not always straightforward to find a mathematical expression for ϕ(x, t).

The structure function for some simple structures In the following we omit the time dependence from the notation.

For a series structure we have

ϕ(x) = x1 · x2 · ... · xn = ∏_{i=1}^n xi   (9.3)

For a parallel structure we have

ϕ(x) = 1 − (1 − x1)(1 − x2) ... (1 − xn) = 1 − ∏_{i=1}^n (1 − xi) = ⨿_{i=1}^n xi   (9.4)

Note that for two components we may simplify:

ϕ(x1, x2) = x1 ⨿ x2 = 1 − (1 − x1)(1 − x2) = x1 + x2 − x1x2 (9.5)

For a k-out-of-n structure we have

ϕ(x) = 1 if ∑_{i=1}^n xi ≥ k, and ϕ(x) = 0 if ∑_{i=1}^n xi < k   (9.6)

A k-out-of-n system is a system that functions if and only if at least k of the n components in the system are functioning. We often write k oo n to denote a k-out-of-n system, for example 2 oo 3. The expression for the structure function of a k-out-of-n structure is not attractive from a calculation point of view. We may instead represent the structure by a parallel structure where each of the parallels comprises a series structure of k components. There are altogether C(n, k) = n!/(k!(n − k)!) ways to choose k components out of n, hence we will have C(n, k) branches. For structures comprised of series and parallel structures we may combine the above formulas by splitting the reliability block diagram into sub-blocks, applying the formulas within a block, and then treating a sub-block as a block on a higher level. Figure 9.3 shows how we may split

Figure 9.3: Splitting the reliability block diagram into sub-blocks

the reliability block diagram into sub-blocks, here I and II. We may then write ϕ(x) = ϕI × ϕII because I and II are in series. Further, we have ϕI = x1 and ϕII = 1 − (1 − x2)(1 − x3), thus ϕ(x) = x1[1 − (1 − x2)(1 − x3)].
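The structure functions above translate directly into code. The sketch below evaluates Equations (9.3), (9.4) and (9.6), and the structure of Figure 9.3:

```python
from math import prod

def series(xs):                 # Eq. (9.3)
    return prod(xs)

def parallel(xs):               # Eq. (9.4)
    return 1 - prod(1 - x for x in xs)

def k_oo_n(xs, k):              # Eq. (9.6), e.g. k_oo_n(xs, 2) for a 2 oo 3 system
    return 1 if sum(xs) >= k else 0

# Figure 9.3: component 1 in series with the parallel structure of 2 and 3
def phi(x1, x2, x3):
    return series([x1, parallel([x2, x3])])

print(phi(1, 0, 1))   # 1: the system functions via components 1 and 3
print(phi(0, 1, 1))   # 0: component 1 breaks the series connection
```

Because the state variables are binary, these functions return exactly the 0/1 values of the structure function.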

9.2.1 System structure analysis

A system of components is said to be coherent if all its components are relevant and the structure function is nondecreasing in each argument. The results that follow are only valid for coherent structures.

Paths and Cuts Cut and path sets are often valuable for analysing a structure of n components, C = {1, 2, ..., n}:

• A cut set K is a set of components in C which by failing causes the system to fail. A cut set is minimal if it cannot be reduced without losing its status as a cut set.

• A path set P is a set of components in C which by functioning ensures that the system is functioning. A path set is minimal if it cannot be reduced without losing its status as a path set.

Structure Represented by Minimal Path Series Structures

A path, say Pj, may be considered as a series structure. Since any functioning path ensures that the system functions, each path may be considered as one branch in a parallel structure of all the minimal paths, hence we have:

ϕ(x) = ⨿_{j=1}^p ∏_{i∈Pj} xi

Structure Represented by Minimal Cut Parallel Structures

A cut, say Kj, may be considered as a parallel structure. Since any failed cut causes the system to fail, each cut may be considered as one element in a series structure of all the minimal cuts, hence we have:

ϕ(x) = ∏_{j=1}^k ⨿_{i∈Kj} xi

Pivotal Decomposition Introduce the following notation:

• ϕ(1i, x) = The structure function of the structure when it is given that component i is in a functioning state, i.e., xi = 1.

• ϕ(0i, x) = The structure function of the structure when it is given that component i is in a fault state, i.e., xi = 0.

We then have:

ϕ(x) ≡ xiϕ(1i, x) + (1 − xi)ϕ(0i, x) for all x (9.7)

This result is often used when obtaining the structure function of a complex structure. The idea is to decompose around the component that makes the structure troublesome. Given that this component is functioning, i.e., xi = 1, we can usually obtain the structure function of the remaining structure, ϕ(1i, x), rather easily; similarly, given xi = 0 we obtain ϕ(0i, x). The pivotal decomposition result then yields ϕ(x).
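To illustrate, consider the classic five-component bridge structure, with components 1 and 2 as the left branches, 4 and 5 as the right branches, and component 3 as the bridge between them. This layout and numbering is an assumed illustration, not taken from a figure in the text. Decomposing around component 3 per Equation (9.7):

```python
def bridge(x1, x2, x3, x4, x5):
    """Bridge structure via pivotal decomposition around component 3, Eq. (9.7).
    The component layout is an assumed textbook example."""
    # Given x3 = 1: (1 parallel 2) in series with (4 parallel 5)
    phi_1 = (1 - (1 - x1) * (1 - x2)) * (1 - (1 - x4) * (1 - x5))
    # Given x3 = 0: (1 series 4) in parallel with (2 series 5)
    phi_0 = 1 - (1 - x1 * x4) * (1 - x2 * x5)
    return x3 * phi_1 + (1 - x3) * phi_0

print(bridge(1, 0, 1, 0, 1))  # 1: path 1-3-5 keeps the system functioning
print(bridge(1, 0, 0, 0, 1))  # 0: with the bridge down no path remains
```

Note how both conditional structures are plain series/parallel combinations, which is exactly what makes the decomposition useful.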

Summary: Finding the structure function The following principles may be used to find the structure function for a reliability block diagram:

• For a series structure we have ϕ(x) = x1 · x2 · ... · xn = ∏_{i=1}^n xi.

• For a parallel structure we have ϕ(x) = 1 − (1 − x1)(1 − x2) ... (1 − xn) = 1 − ∏_{i=1}^n (1 − xi) = ⨿_{i=1}^n xi.

• For a k-out-of-n structure we may represent the structure by a parallel structure where each of the parallels comprises a series structure of k components. There are altogether C(n, k) ways to choose k components out of n, hence we will have C(n, k) branches.

• For bridge structures and other structures where it is not easy to “see” series and parallel structures, we may use pivotal decomposition, i.e., ϕ(x) ≡ xiϕ(1i, x) + (1 − xi)ϕ(0i, x), where we decompose around the “troublesome” component.

• If the minimal cut sets are available for a system (or a subsystem), the corresponding structure function is given by ϕ(x) = ∏_{j=1}^k ⨿_{i∈Kj} xi, and similarly for the path sets: ϕ(x) = ⨿_{j=1}^p ∏_{i∈Pj} xi.

• We may identify sub-blocks in the diagram (modules), and for each module represent its state by a structure function of the elements in the module, which is also binary; the resulting “formula” can then replace the module as if it were a component. In principle any combination of series and parallel structures may be analysed this way. Bridge structures and k-out-of-n structures may also be handled this way. The same principle may also be used if a sub-block is represented by its minimal cut sets or minimal path sets.

9.2.2 Systems of independent components

Up to now we have used the symbol xi to represent the value a state variable may take. In order to assess component and system reliability, we need to treat the state variables and the structure function as random quantities (stochastic variables). We let Xi(t) denote the state variable of component i, and X(t) = (X1(t), X2(t), ..., Xn(t)) the state vector. The structure function is now also a random quantity, ϕ(X(t)). Now, introduce the following probabilities:

pi(t) = Pr(Xi(t) = 1) = Component reliability

pS(t) = Pr(ϕ(X(t)) = 1) = System reliability

Since both the state variables and the structure function are binary we have:

E(Xi(t)) = pi(t)

E(ϕ(X(t))) = pS(t)

Now assume that we are able to write the structure function as a sum of products of the state variables. Further assume stochastically independent components and that we have removed any exponents in the expressions, i.e., replaced xi^n by xi. We then use the results that “the expectation of a sum equals the sum of the expectations” and, for independent variables, “the expectation of a product equals the product of the expectations”. This means that pS(t) = E(ϕ(X(t))) will equal a sum of products of E(Xi(t))'s. Further, since E(Xi(t)) = pi(t), we have in fact proved that

the system reliability pS(t) may be found by replacing all the xi's in the structure function with the corresponding pi(t)'s, as long as the components are independent. Note that this approach is only valid if we have carried out the multiplication, i.e., resolved any parentheses in the expression for the structure function, and removed any exponents. It now remains to find the component reliabilities pi(t). Assume that components fail according to a Markovian process, i.e., failure times are exponentially distributed with failure rate λi. Consider only component i, and assume that when this component fails a repairman immediately starts to repair it. Assume further that the repair process also is Markovian, i.e., repair times are exponentially distributed with repair rate µi. This failure and repair process can be modelled by the standard Markov setup discussed in Chapter 7. Let state 1 represent that the component is functioning, and state 0 that it is in a fault state. With rows and columns ordered 0, 1, the transition matrix is given by:

A = [a00 a01; a10 a11] = [−µi µi; λi −λi]

The steady state solution is found to be P0 = 1/(1 + µ/λ) and P1 = 1/(1 + λ/µ). P1 is then the probability that the component is in state 1, i.e., the component reliability. It is also possible to find the time dependent solution P1(t), but usually we do not need it and use P1 = 1/(1 + λ/µ). Rather than using the failure and repair rates it is more common to use the mean time to failure and the mean down time, MTTF = 1/λ and MDT = 1/µ, where the mean down time in general might be longer than the actual repair time, for example due to logistic delay. With these two concepts the reliability of component i is given by

pi = MTTFi/(MTTFi + MDTi)   (9.8)

Example 9.1 Assume that we have component 1 in series with a parallel structure of components 2 and 3, as shown in Figure 9.4.

Figure 9.4: Example RBD

The structure function is

ϕ(x) = x1 · (x2 ⨿ x3) = x1x2 + x1x3 − x1x2x3

We now replace all the xi's in the structure function with the corresponding pi's and get

pS = p1p2 + p1p3 − p1p2p3

The reliability parameters are specified in Table 9.2. Component reliabilities are calculated by Equation (9.8). Inserting the component steady state reliabilities gives pS = p1p2 + p1p3 − p1p2p3 ≈ 0.9973. ♢

Table 9.2: Reliability parameters with calculated reliabilities

i   MTTFi    MDTi   pi
1   10 000   24     0.9976
2   500      8      0.9842
3   700      12     0.9831
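A quick numerical check of Example 9.1, combining Equation (9.8) with the structure function:

```python
def p_comp(mttf, mdt):
    """Steady state component reliability, Eq. (9.8)."""
    return mttf / (mttf + mdt)

p1 = p_comp(10_000, 24)   # ≈ 0.9976 (Table 9.2)
p2 = p_comp(500, 8)       # ≈ 0.9842
p3 = p_comp(700, 12)      # ≈ 0.9831

# Structure function of Example 9.1 with each x_i replaced by p_i
pS = p1 * p2 + p1 * p3 - p1 * p2 * p3
print(round(pS, 4))       # 0.9973
```

The substitution of pi for xi is valid here because the three components are assumed independent and the structure function is fully multiplied out.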

Problem 9.1 To be solved without a computer Consider a production facility where production is carried out in two steps (PU = processing unit). Items may be transported between the processing units by two (identical) redundant conveyor belts (CB). Reliability data is given in Table 9.3 where the time unit is hours.

Table 9.3: Reliability data for Problem 9.1

Component           MTTF    MDT
Processing unit 1   4 000   8
Processing unit 2   6 000   8
Conveyor belt       1 000   24

Find the structure function, and the corresponding system reliability. Assume that production loss per hour is cU = 100 000 NOKs. Assume that installing a redundant processing unit 2 yields a yearly increase in capital and operation cost of cY = 5 000 NOKs. Does it pay off to install such a redundant unit? ♢

9.3 Maintenance model - Single component considerations

9.3.1 Preventive maintenance

Due to wear and tear, the failure probability of a component often increases with its age. In order to reduce the likelihood of failures, components are often preventively maintained. For example, a switch machine in the Norwegian railway system is tentatively overhauled every six years. Such an overhaul includes replacement of components exposed to wear, lubrication, cleaning, etc. In maintenance theory a distinction is made between age or calendar based activities and condition based activities. Historically, preventive maintenance was carried out based on the age of a component: the only trigger for maintenance was age, running hours, mileage run for a car, etc. Since the correlation between age and failure is rather vague, one seeks more precise indicators that can be correlated to the actual failures of the component. This has led to increased use of condition monitoring techniques, where the objective is to utilize the condition of a component as an indicator of whether a failure will occur in the near future. An example of a condition based activity is replacement of a rail when cracks of critical length are found by ultrasonic inspection. The main rationale for preventive maintenance is that preventing a failure by a preventive maintenance task is cheaper than repairing the failure after it occurs. A range of optimization models have been derived in order to determine the appropriate level of preventive maintenance.

9.3.2 Single activity - Calendar based maintenance

Consider a component where a preventive maintenance (PM) action is conducted at predetermined intervals due to an increasing failure rate function z(t). Typically we assume that failure times are Weibull distributed, where the failure rate function is given by:

z(t) = αλαtα−1 (9.9)

In order to find an optimal interval for a maintenance action we may establish the average cost per time unit as a function of the maintenance interval, say τ:

C(τ) = cPM/τ + λE(τ)[cCM + cEP + cES]   (9.10)

where cPM is the cost of a preventive maintenance action (to prevent failures), cCM is the cost of a corrective maintenance (CM) action (given that a failure did occur), λE(τ) is the effective failure rate, i.e., the expected number of failures per time unit when the component is preventively maintained every τ time units, cEP is the expected production loss upon a component failure, and finally cES is the expected safety cost upon a component failure, including material damages and environmental losses. In the following we let cU = cCM + cEP + cES denote the expected unplanned cost upon a failure. The effective failure rate depends on the lifetime distribution of the component. The Weibull distribution is widely used for ageing components, and in the case of Weibull distributed lifetimes we may find approximation formulas for the effective failure rate. If we know the mean time to failure, MTTF (without maintenance), and the ageing parameter α of the lifetime distribution of the component, the effective failure rate may be approximated by:

λE(τ) = [Γ(1 + 1/α)/MTTF]^α τ^(α−1)   (9.11)

where Γ(·) is the gamma function. The approximation is good when the maintenance interval is small compared to the MTTF. If the maintenance interval approaches the MTTF value, the approximation in Equation (9.11) is not very accurate, and we might use the following improved approximation:

λE(τ) = [Γ(1 + 1/α)/MTTF]^α τ^(α−1) [1 − 0.1ατ/MTTF + (0.09α − 0.2)τ²/MTTF²]   (9.12)

In MS Excel the gamma function may be found by EXP(GAMMALN(x)). In the following we will always assume that the approximation in Equation (9.11) is sufficient for our purpose. By setting the derivative of C(τ) in Equation (9.10) equal to zero, we find the optimal interval to be:

τ* = MTTF/Γ(1 + 1/α) · [cPM/(cU(α − 1))]^(1/α)   (9.13)

Example 9.2 In a production line a packing machine is critical for the production. Mean time to failure is estimated to be MTTF = 600 hours (≈ one month) if the machine is not preventively maintained. Failure times are assumed to be Weibull distributed and the ageing parameter is estimated to be α = 4. The cost of a preventive maintenance activity is cPM = 5 000 NOKs. If the machine fails, the total cost of corrective maintenance and lost production is cU = 35 000 NOKs. In the example we assume 24/7, i.e., continuous production using several shifts.

τ* = MTTF/Γ(1 + 1/α) · [cPM/(cU(α − 1))]^(1/α) = 600/0.906 · [5 000/(35 000 × 3)]^(1/4) ≈ 309 hours ♢
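The calculation in Example 9.2 can be checked with a few lines of Python (a sketch of Equation (9.13)):

```python
from math import gamma

def tau_star(mttf, alpha, c_pm, c_u):
    """Optimal calendar based PM interval, Eq. (9.13)."""
    return mttf / gamma(1 + 1 / alpha) * (c_pm / (c_u * (alpha - 1))) ** (1 / alpha)

tau = tau_star(mttf=600, alpha=4, c_pm=5_000, c_u=35_000)
print(round(tau))   # 309 hours, as in Example 9.2
```

As expected from Equation (9.13), a higher PM cost relative to the unplanned cost stretches the optimal interval.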

Problem 9.2 Find the solution of example 9.2 by using the Solver in MS Excel. ♢

Problem 9.3 Improve the solution of example 9.2 by using the improved formula in Equation 9.12 for the effective failure rate. ♢

Problem 9.4 To be solved without a computer Consider Problem 9.1. Assume that a corrective maintenance strategy is applied for the processing unit 2 today. Further assume that cPM = 7 000 NOKs and cCM = 15 000 NOKs for preventive and corrective maintenance respectively. Assume failure times are Weibull distributed with ageing parameter α = 4. Determine the optimal maintenance interval if a preventive maintenance strategy is followed. You may use that Γ(1 + 1/4) ≈ 0.906. ♢

9.3.3 Synchronization of maintenance and production

We will use Example 9.2 to illustrate aspects of synchronizing maintenance and production. The optimal interval was found to be 309 hours, which is 12.9 days. The production manager is not very happy with closing down production in order to perform preventive maintenance of the machine. However, every week there is a small production stop for reconfiguration of the production setup to account for variants in the production. The production manager therefore proposes to utilize every second of these stops for carrying out the maintenance. Figure 9.5 shows the cost as a function of the maintenance interval, and it is rather obvious that shifting the interval from 309 hours to 336 hours (14 days) does not make much difference. In fact the yearly cost increases from 189 000 to 191 000, an increase of only 1%. The maintenance manager is not very happy, since he did all the calculations and claims that 2 000 NOKs here and 2 000 NOKs there add up. When the production manager counters that the PM cost would in fact increase by at least 1 000 NOKs due to the interference with production, it is quite obvious that synchronization pays off. In an intense production period the production manager is reluctant to carry out the scheduled maintenance that takes place every fourteen days. The machine has been operated for t = 14 days = 336 hours. Rather than executing the PM activity on the due date, the production manager proposes to wait another week, i.e., to synchronize with the weekly production stop at its next occurrence. The maintenance manager hesitates at this proposal. What are the arguments? Since the challenge now is a “once in a lifetime” situation, the long term consideration does not apply. The approach is therefore to analyse the increase in maintenance related cost caused by this shift and compare it to the original strategy.
It will be sufficient to compare the first week, since for both the original strategy and the situation with the proposed postponement of maintenance, the average cost from next week on is given by C(τ*) = cPM/τ* + λE(τ*)cU ≈ 3 660 NOKs per week. For the original strategy the average cost for the coming week is also 3 660 NOKs. When considering deferred maintenance we are facing a higher risk of failure since we are "climbing" on the increasing failure rate function. We will now assess the failure related cost from now on until the next PM.

Figure 9.5: Cost per hour split into PM and unplanned (U) maintenance cost

More generally we assume that t time units have elapsed since the last preventive maintenance, and that we are considering to run the system for another x time units before the next PM activity is performed. The total unplanned failure cost in the entire time period t + x is:

CU(0, t + x) = λE(t + x) · cU · (t + x)    (9.14)

but we have already "paid" unplanned cost up to time t equal to CU(0, t) = λE(t) · cU · t, hence the cost in the coming x time units will be:

CU(t, t + x) = CU(0, t + x) − CU(0, t)    (9.15)

= λE(t + x) · cU · (t + x) − λE(t) · cU · t    (9.16)

In the example t = 2 weeks = 336 hours and x = 1 week = 168 hours yielding

CU (t = 336, t + x = 336 + 168) = CU (0, 504) − CU (0, 336) (9.17)

= λE(504) · cU · 504 − λE(336) · cU · 336    (9.18)
≈ 11 758 − 2 320 = 9 438 NOKs    (9.19)

This cost may be treated as the cost of a "high risk strategy", in the sense that we are postponing a maintenance activity that in the optimal case should be executed at time t, while we continue to "climb" on the increasing failure rate function. The cost of 9 438 NOKs should then be compared with the base case of 3 660 NOKs. The increase in cost is almost 6 000 NOKs, and this amount should be compared to the value of not interrupting the production with maintenance. Note that when calculating the unplanned failure cost in the period of deferred maintenance we have assumed that cU = 35 000. This figure is an average cost of a failure including production interruption. In this case with intense production it is reasonable to assume that cU could be even higher, making the deferred maintenance strategy even riskier. The production manager therefore has to argue that the "production value" of deferred maintenance is at least 6 000 NOKs.
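The deferred-maintenance calculation above can be sketched in Python. The power-law (Weibull-type) form of the effective failure rate and the calibration of λ against CU(0, 336) ≈ 2 320 NOKs are illustrative assumptions, as is the ageing parameter α = 4; only the cost figures appear in the text:

```python
import math

# Assumed effective failure rate lambda_E(t) = lam**alpha * t**(alpha - 1),
# so that CU(0, t) = lambda_E(t) * cU * t = (lam * t)**alpha * cU.
alpha = 4.0          # ageing parameter (assumed)
cU = 35_000          # average cost per unplanned failure, NOKs

def cu(t, lam):
    # CU(0, t): expected unplanned failure cost accumulated in (0, t]
    return (lam * t) ** alpha * cU

# Calibrate lam so that CU(0, 336) matches the text's 2 320 NOKs
lam = (2_320 / cU) ** (1 / alpha) / 336

t, x = 336, 168      # two weeks elapsed, PM deferred by one week
deferred = cu(t + x, lam) - cu(t, lam)   # Equations (9.15)-(9.16)
print(round(deferred))                   # roughly 9 400 NOKs
```

Under these assumptions the extra cost of deferring comes out close to the figure in the text; the small discrepancy reflects the rounding in the calibrated parameters.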

9.3.4 Predictive maintenance

The idea of predictive maintenance is to utilize the condition of a component and the future expected loads in order to judge the correct time for "hard" maintenance such as overhaul, replacement of worn parts, calibration and so on. Sensor technology is usually used to capture the condition of components or a system, and the term 'condition monitoring' is often used to describe the collection and analysis of state data relevant for predictive maintenance. It should be noted that manual inspection and the use of "human sensors" to capture noise, smell and vibration could also be treated as condition monitoring. Concepts like digital twin and cyber physical systems are used to describe the situation where computerized system models interact with the physical systems in real time, often by means of the internet of things (IoT) and the internet of services (IoS).

Cyber-physical systems (CPS) refers to smart systems that include engineered interacting networks of physical and computational components. The term digital twin refers to a digital replica of physical assets, processes and systems that can be used in real time for control and decision purposes. The digital twin representation is seen as a prerequisite for effective synchronization of operation and maintenance within the manufacturing industry as well as in other industries. The relation between production plans and activities and actual production can to some extent be described deterministically. The relation between maintenance plans and activities and the production system availability on the other hand requires a probabilistic representation. The term stochastic digital twin is introduced whenever probabilistic models are required to represent the physical assets, processes and/or systems.

A wide range of mathematical models exist for predictive maintenance. In this presentation only two models will be presented to illustrate the relation between maintenance and production. Assume that we at a given point in time t0 observe the state of a component. Let y be the vector describing the state at time t0. Let T denote the point of time when the component fails. Let t denote running time from t0.
Given y, assume that it is possible to describe the probability distribution of T. For simplicity we will as a starting point assume that T is Weibull distributed with shape parameter α and scale parameter λ. We assume that E(T|y) is relatively small, in the range of days, weeks or months, since we are in the condition monitoring situation. For example, assume that we measure vibration in the bearing of a motor. An experienced maintenance engineer suggests that the mean time to failure is E(T|y) = 2 months = 60 days. When challenging him further and asking about the chances of failing earlier, after some discussions, he assesses the probability that the component will fail within one month to be 10%. We will later on see how these assessments may be used to estimate α and λ.

Since the component is likely to fail rather soon, it is obvious that a preventive maintenance activity should be conducted, for example replacing the bearing with a new one. The production manager is, however, not very happy with shutting down the production for maintenance. After re-thinking, the production manager opens for using the weekly production shutdown for maintenance purposes. The first shutdown will come in τ1 = 3 days, the next in τ1 + τ days where τ = 7 days, and so on. The cost of the preventive maintenance action will decrease the longer we can wait, because planning may be improved. We assume a very simple cost structure where cPM(t) = cPM,0e^(−t/θ). Here θ is a characteristic time for the decrease, in the sense that for t = θ the cost has dropped to e^(−1) ≈ 37%. The challenge is to determine which opportunity to apply, i.e., t = τ1, τ1 + τ, τ1 + 2τ, . . . . The cost function to minimize is:

C(t) = cPM,0e^(−t/θ) + cUFT(t)    (9.20)

Assuming that θ = 30 days, cPM,0 = 10 000 NOKs, and cU = 35 000 NOKs, we are ready to start the minimization. The only challenge is to find the parameters α and λ. For the Weibull distribution we have that E(T) = Γ(1 + 1/α)/λ and FT(t) = 1 − e^(−(λt)^α). Let x be such that FT(x) = px, where both x and px are known. Further, since E(T) also is known, we may in principle determine α and λ. A closed formula solution is not possible to obtain, but the following iteration scheme may be applied to obtain α:

αi+1 = ln(− ln(1 − px)) / ln(xΓ(1 + 1/αi)/E(T))    (9.21)

and then we solve for the second parameter, i.e., λ = Γ(1 + 1/α)/E(T).

Example - Synchronization with only age information

For the example with x = 30 days, px = 10% and E(T ) = 60 days we obtain α ≈ 2.78 and λ ≈ 0.0148. Table 9.4 shows the result when applying Equation (9.20) for the calculations.
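The fixed-point iteration in Equation (9.21) can be sketched in Python with the example's input values:

```python
import math

# Fit the Weibull parameters from the example's assessments:
# F_T(30 days) = 0.10 and E(T) = 60 days.
x, px, ET = 30.0, 0.10, 60.0

alpha = 2.0  # starting guess
for _ in range(50):
    # Equation (9.21): fixed-point update of the shape parameter
    alpha = math.log(-math.log(1 - px)) / math.log(x * math.gamma(1 + 1 / alpha) / ET)

lam = math.gamma(1 + 1 / alpha) / ET  # solve for the scale parameter
print(round(alpha, 2), round(lam, 4))  # ≈ 2.78 and ≈ 0.0148
```

The iteration converges in a few steps from almost any positive starting guess here, since the update map is a mild contraction around the solution.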

Table 9.4: PM-, unplanned failure (U)-, and total cost for Example 9.3.4

t    PM     U      Total
3    9 048  6      9 054
10   7 165  173    7 339
17   5 674  752    6 426
24   4 493  1 928  6 421
31   3 558  3 815  7 373

This means that the optimal time for performing the PM task will be the third or fourth opportunity. ♢
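As a sketch, Table 9.4 can be reproduced with the fitted parameters α ≈ 2.78 and λ ≈ 0.0148; the small deviations from the table values are due to rounding of the parameters:

```python
import math

# Re-computing Table 9.4: C(t) = c_PM,0 * exp(-t/theta) + cU * F_T(t),
# with the Weibull parameters fitted from Equation (9.21).
theta, c_pm0, cU = 30.0, 10_000, 35_000
alpha, lam = 2.78, 0.0148

totals = {}
for t in (3, 10, 17, 24, 31):
    pm = c_pm0 * math.exp(-t / theta)            # decreasing PM cost
    u = cU * (1 - math.exp(-((lam * t) ** alpha)))  # expected unplanned cost
    totals[t] = pm + u
    print(t, round(pm), round(u), round(pm + u))
```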

9.3.5 Predictive maintenance and cyber physical systems

Predictive maintenance is about utilizing information regarding the condition of a component and the future expected loads in order to judge the correct time for intervention. In the previous section a simple model was derived, but the current condition of the component and the future expected loads were not explicitly used. Some formalism is required for such a utilization. This will be crucial for cyber physical systems (CPS), where a computerized mathematical model of the system is established and real time information regarding state, production profile, plans etc. is connected via IoT. A reasonably simple extension of the model used in the previous section will be derived.

The starting point is the failure rate function, z(t). We stick to the Weibull distribution, where the failure rate function is given by z(t) = αλ^α t^(α−1). We observe that z(t) contains neither the current state nor the future loads. The so-called Cox proportional hazard model is often used to incorporate the current state in the failure rate function. Let y be the vector of current relevant state information for the component, for example temperature, vibration level and so on. Next, let x(t) be the vector of average loads in the time period [0, t). The failure rate function may be written on the form:

z(t|y, x(t)) = z0(t)e^(β1y) e^(β2x(t))    (9.22)

where β1 and β2 are regression coefficient vectors established by, for example, statistical analysis of data. Data analysis is discussed in Chapter 11. z0(t) is a baseline failure rate function, typically on the form z0(t) = αλ^α t^(α−1).

Figure 9.6: PF-model (failure progression over time t, from the potential failure P until the failure limit is exceeded at F; the PF-interval is the time between P and F)

Now assume that the parameters α, λ, β1 and β2 are all known. Further assume that the current component state, y, is known and that we have an estimate of future load x(t). The cost equation to minimize is:

C(t) = cPM,0e^(−t/θ) + cUFT(t|y, x(t))    (9.23)

where the cumulative distribution function, according to Table 9.1, is given by:

FT(t|y, x(t)) = 1 − exp(−∫_0^t z(u|y, x(u)) du)    (9.24)

A main objective for cyber physical systems is to set up a regime for data collection and analysis. It is beyond the scope of this presentation to describe relevant statistical methods. Typically a partial likelihood approach is recommended, where the impact of the regression coefficients is estimated, and then a separate approach is used for estimation of the baseline failure rate function. If no data is available we might use expert judgements for elicitation of the relevant model parameters. The so-called PF-model is used as a basis for the elicitation. Figure 9.6 shows a principal sketch of the failure progression of a component. Up to the point of time P there is no indication of a failure. Then failure progression starts and continues until it exceeds the failure limit at the point of time F. The point of time P is often referred to as a potential failure, whereas the point of time F is a real failure. The time interval between the points P and F is denoted the PF-interval. The PF-interval is treated as a stochastic variable. In the assessment, FT(t|y, x(t)) corresponds to the cumulative distribution function of the PF-interval. The procedure for the elicitation is as follows:

1. Assess the expected length of the PF-interval under the assumption of insignificant future load x(t). Denote this value by ξ.

2. Assess the consistency of the PF-interval by the shape parameter α in the Weibull distribution. As a rule of thumb use:

• α = 2 corresponds to a variety of failure mechanisms and causes leading to a failure.
• α = 3 corresponds to a few failure mechanisms and causes leading to a failure.
• α = 4 corresponds to a rather specific failure mechanism / failure cause leading to a failure.

3. Calculate the scale parameter by λ = Γ (1/α + 1) /ξ.

4. For each yi in y let yi = 0 correspond to the condition at the point of time P in Figure 9.6. This corresponds to no significant degradation for the actual regression variable.

5. For each yi in y, let yi,C be a critical value for that particular regression variable. Under the assumption that all other regression variables yj = 0, j ≠ i, assess the reduction in ξ by some factor, say ki. Note that there is no specific "rule" to determine yi,C, and the higher the value chosen, the lower the value assessed for ξ will be.

6. Calculate the corresponding regression parameters by βi = −(ln ki)/yi,C, i.e., for the elements in β1.

7. Repeat the procedure for each xi(t) in x(t) to find the elements of β2.
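The elicitation steps above can be sketched in Python. The input values below are illustrative assessments (ξ = 120 days, α = 4, critical values and reduction factors for one state variable and one load variable):

```python
import math

# Steps 1-7 of the elicitation procedure for one state variable (e.g.
# vibration) and one load variable; all input values are assumed assessments.
xi = 120.0                      # step 1: expected PF-interval, days
alpha = 4.0                     # step 2: shape parameter (specific mechanism)
lam = math.gamma(1 + 1 / alpha) / xi   # step 3: scale parameter

y_crit, k_y = 4.5, 0.05         # step 5: critical state value, reduction factor
beta_y = -math.log(k_y) / y_crit       # step 6: regression parameter (beta_1)

x_crit, k_x = 0.25, 0.10        # step 7: same inputs for the load variable
beta_x = -math.log(k_x) / x_crit       # step 7: regression parameter (beta_2)

print(round(lam, 5), round(beta_y, 3), round(beta_x, 2))
```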

Example 9.3 Using vibration data and expected average loads

Example 9.3.4 is now slightly modified to take into account explanatory variables. Let y be the vibration level measured by the so-called RMS value (Root Mean Square), which is an ISO convention. Technically the RMS value is calculated by multiplying the peak amplitude by 0.707. For machines of medium size the vibration level is mapped into zones, where zone A is the normal level which we here assume corresponds to y = 0, zone B which is still considered acceptable ranges from y = 1.8 to y = 4.5, zone C which is critical ranges from y = 4.5 to y = 11.2, and zone D corresponds to y > 11.2. A machine in zone D is considered to suffer serious damage within a very short time and is therefore often protected by a protection system causing the machine to shut down (TRIP). Further, let x measure the portion of time the machine is run at more than 90% of maximum capacity.

In the elicitation process the maintenance engineer assesses the mean residual time to failure, i.e., the time until the protection system will trip the system (the PF-interval), to be ξ = 120 days when an anomaly situation occurs, i.e., when drifting into zone B. Since only vibration and excessive load are considered as influencing factors of the PF-interval, the shape parameter is assessed as α = 4. This gives λ = Γ(1/α + 1)/ξ = Γ(1.25)/120 ≈ 0.00755. The critical value for the vibration is set to yC = 4.5, and the corresponding reduction factor for the remaining time to trip is assessed to kY = 0.05 (only 6 days to failure on average). This gives βY = −(ln kY)/yC = −(ln 0.05)/4.5 ≈ 0.666. A machine running at 90% of maximum capacity or more in xC = 0.25 = 25% of the time is assessed to have a reduction factor of kX = 0.1 (12 days to failure on average). This gives βX = −(ln kX)/xC = −(ln 0.1)/0.25 ≈ 9.2. The relevant parameters to calculate the cumulative distribution function for the PF-interval in Equation (9.24) have now been established.
Now assume that we have observed y = 3 and we assess future loads to be x = 0.1. Inserting into Equation (9.24) and using the cost function in Equation (9.23), Table 9.5 indicates that we should use the opportunity that comes after 31 days.

Table 9.5: Results for Example 9.3

t    PM      U      Total
3    13 573  0      13 573
10   10 748  21     10 769
17   8 511   176    8 687
24   6 740   693    7 433
31   5 337   1 894  7 231
38   4 227   4 132  8 358
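The numbers in Table 9.5 can be re-computed with a short script. Since the load x is taken as constant, the integral in Equation (9.24) reduces to a closed form:

```python
import math

# Re-computing Table 9.5. With constant y and x the integral in (9.24)
# has the closed form F_T(t|y,x) = 1 - exp(-(lam*t)**alpha * e^(bY*y + bX*x)).
alpha = 4.0
lam = math.gamma(1.25) / 120               # ≈ 0.00755 (step 3 of the elicitation)
bY = -math.log(0.05) / 4.5                 # ≈ 0.666
bX = -math.log(0.10) / 0.25                # ≈ 9.21
theta, c_pm0, cU = 30.0, 15_000, 35_000
y, x = 3.0, 0.1                            # observed vibration, expected load

totals = {}
for t in (3, 10, 17, 24, 31, 38):
    F = 1 - math.exp(-((lam * t) ** alpha) * math.exp(bY * y + bX * x))
    pm = c_pm0 * math.exp(-t / theta)
    totals[t] = pm + cU * F
    print(t, round(pm), round(cU * F), round(pm + cU * F))
```

The minimum total cost is found at t = 31, in agreement with the table.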

Note that the cumulative distribution function calculated by Equation (9.24) is the unconditional distribution function given that we were at the point of time P in Figure 9.6. In reality, since y = 3, it is reasonable to believe that some days have elapsed since the potential failure became evident. A conditional distribution function is therefore more appropriate. This means that we also need to assess the time since the potential failure occurred. Let t0 be the current time, and assume that the time since the potential failure (P) is s time units. Let t denote the running time from now on, i.e., t0 corresponds to t = 0. Using the rule for conditional probabilities we obtain the following modified cost function:

C(t) = cPM,0e^(−t/θ) + cU[1 − (1 − FT(t + s|y, x(t + s)))/(1 − FT(s|y, x(s)))]

In the example calculation this conditional approach is not used. ♢

Example 9.4 Towards a real time model - The digital twin

Example 9.3 is now used as motivation for developing a simple stochastic digital twin. A digital twin may be viewed as a digital simulation model with built-in analytics, decision support, and self-learning features. Learning features will not be discussed in this example, and only a glimpse of analytics is provided. The digital twin is represented by two models, one maintenance model and one production model, where these models interact via the Internet of Things. In the following, the maintenance model is denoted the maintenance twin and the production model is denoted the production twin. The physical counterpart of the maintenance twin is the actual component state, the physical load on the machine, the actual maintenance carried out, the actual time the machine cannot produce due to preventive and/or corrective maintenance, and so on. The physical counterpart of the production twin is what is actually being produced, when the production takes place, the economic value of the production, the cost of production, the various machines being used, the use of personnel and resources, and so on.

Let T be the set of operational windows for execution of a preventive maintenance task of the packing machine, i.e., the points of time τ1, τ1 + τ, τ1 + 2τ, . . . . The decision support to be provided by the maintenance twin upon a potential failure situation is now:

min_{t∈T} C(t) = cPM,0e^(−t/θ) + cUFT(t|y, x(t))    (9.25)

The maintenance twin represented by Equation (9.25) has to be implemented on a digital platform, for example MS Excel. The maintenance twin needs to be fed with data from the production twin. Here the production twin is very simple, only a set of predefined scenarios combining different values of cPM,0, cU, y, and x(t). Table 9.6 shows the data used in this simple MS Excel representation of the two twins interacting. In a real life implementation the data in Table 9.6 would need to be generated by the ERP system, the SCADA system and so on.

Table 9.6: Data used in the production twin

cPM,0   cU       y   x     CPS message/Comment
15 000  35 000   3   0.1   Base line (from example)
15 000  35 000   3   0.3   High future loads
5 000   35 000   3   0.1   Cheap PM due to low production
15 000  35 000   2   0.15  Lower degradation
15 000  35 000   4   0.15  Very high degradation
15 000  100 000  3   0.15  Very high failure cost
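A minimal sketch of the maintenance twin evaluating Equation (9.25) on some of the scenarios in Table 9.6 can look as follows. The model and parameters are taken from Example 9.3; the opportunity grid and the scenario subset are illustrative:

```python
import math

# Maintenance twin: pick the operational window t in T that minimizes
# C(t) = c_PM,0 * exp(-t/theta) + cU * F_T(t|y,x)  (Equation (9.25)).
alpha = 4.0
lam = math.gamma(1.25) / 120
bY = -math.log(0.05) / 4.5
bX = -math.log(0.10) / 0.25
theta = 30.0
opportunities = [3 + 7 * k for k in range(10)]   # t = 3, 10, 17, ... days

def best_opportunity(c_pm0, cU, y, x):
    def cost(t):
        F = 1 - math.exp(-((lam * t) ** alpha) * math.exp(bY * y + bX * x))
        return c_pm0 * math.exp(-t / theta) + cU * F
    return min(opportunities, key=cost)

scenarios = [                      # (c_PM,0, cU, y, x), rows of Table 9.6
    (15_000, 35_000, 3, 0.10),     # base line (from example)
    (15_000, 35_000, 3, 0.30),     # high future loads
    (5_000, 35_000, 3, 0.10),      # cheap PM due to low production
]
for s in scenarios:
    print(s, "->", best_opportunity(*s))
```

As expected, higher future loads pull the recommended PM forward, while cheap PM makes it attractive to wait a little less long than the high-cost base line would suggest.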

In order to communicate with the maintenance twin, the production twin needs to post data to the internet:

Function postInternetData(data)
    ' Make the data available on the internet
End Function

Similarly, the maintenance twin needs to get access to this data:

Function getInternetData(specification)
    ' Get data from the internet with some specification of what to get
End Function

The maintenance twin continuously reads data from the internet to come up with maintenance decision support. This information should then be posted on the internet and incorporated in the production twin for production and maintenance planning. This has not been implemented in this very first example. ♢

Problem 9.5 To be solved without a computer
Consider Problem 9.1 again. For processing unit 1 there exists a condition based maintenance regime, where the condition monitoring system immediately detects a potential failure. Assume that the PF-interval is Weibull distributed with mean value ξ = 100 hours and ageing (shape) parameter α = 5. The production manager offers an opportunity for doing maintenance at time t = 16 hours at a cost of c16 = 100 000 NOKs and at time t = 80 hours at a cost of c80 = 25 000 NOKs. Determine which of these opportunities to choose. ♢

Problem 9.6 This problem should be solved by two or more students. Two digital twins, the maintenance twin and the production twin, shall be developed. The maintenance twin shall be able to give decision support related to when to do maintenance, the type of maintenance and so on. The maintenance twin shall also simulate the actual degradation (i.e., the physical counterpart). The production twin shall simulate the production in a very simple way and read component state information in order to plan maintenance and production. Define how to share data between the maintenance twin and the production twin. Then try to run the two twins in parallel and produce a log of what is happening. ♢

Chapter 10

Decision under uncertainties

10.1 Introduction

In this section we will give a general introduction to the field of decision theory where uncertainties are involved. Examples are related to project risk management. As a motivation consider the following situation where we have to choose between two or more alternatives:

• Choice of subcontractor.

• Choice of concept for an oil production platform.

• Choice between double track and single track for a new railway line.

• Choice of tunnel trace now, or performing more investigation into the ground first.

We could also have decisions related to continuous variables:

• When to make an agreement with one of the subcontractors.

• When to start preparing for a major shut-down.

• Choice of diameter for a gas pipeline towards Skogn.

• Dimension of a critical part in a new construction.

10.1.1 Overview of the method

We will first consider situations where one and only one decision is to be made. We will denote the decision with the letter d. The decision alternatives will be denoted a1, a2, …, am. The decision, d, could then be which subcontractor to choose, and the alternative ai is the decision that we choose subcontractor i. The result of our choice will be a set of end consequences Y. The end consequences could occur at different times in the future, but we will simplify and assume that the effect will be immediate after the decision is made. In more complex situations we have to consider that the effect will come some time in the future, and discounting is an issue. Y = [Y1, Y2, ..., Yr] is an attribute vector and comprises many dimensions. For example Y1 could be the project duration, Y2 could be the project costs etc. Further we note that the Yi's are stochastic variables and the values will depend on our decision d. We will seek the decision that gives the "best" value of the attribute vector Y. It is common to differentiate between the following four situations:

1. Decision under certainty. In this situation all the outcomes are known, and we will know for sure what the outcome will be for the different decision alternatives.

2. Decision under risk. In this situation all the possible outcomes are known, but we do not know which outcome will be the result of our decision. We are able to state probabilities for the various outcomes.

3. Decision under uncertainty. In this situation all the possible outcomes are known, and we are unsure about the probabilities for the various outcomes.

4. Decision under ignorance. In this situation we do not know all the possible outcomes, and we are also unsure about the probabilities of those outcomes we know about.

In this presentation we will only consider decisions under risk and uncertainty. We also note that many authors claim that it is not meaningful to state uncertainty about the probabilities; it is the outcome which is uncertain, not the probabilities. When it comes to the final outcome, i.e., the attribute vector Y, we agree that there is no uncertainty in the probability distribution of Y, i.e., the probability distribution contains all the uncertainty about Y. We will therefore not differentiate between the situation of decisions under risk and decisions under uncertainty. Most frequently we will use the term 'decision under uncertainty'. As indicated above we will seek the decision d that gives the best "value" of the attribute vector Y, for example the lowest cost and the shortest execution time of a project. There are, however, some difficulties in this approach:

• We will not be neutral to the risk. Very often we are willing to make a decision that does not give the maximum expected revenue, but rather choose an option with a lower expected revenue and a lower probability of big losses. We are what is called risk averse.

• The attribute vector Y comprises several dimensions, and it is not straightforward how to weight these dimensions. For example, how should we treat a project with low cost, but a higher risk of accidents during project execution?

In order to treat such decision situations we introduce the concept of utility theory, and utility functions. We will only briefly mention the major aspects, and refer to [9] for further discussions. First we will treat situations where we make only one decision, and the end consequences are assumed to take effect immediately after our decision. In more complex situations the effects will come at a later stage, and we could make several decisions in a sequence with time delays between each decision. In such decision problems we often use decision trees to assist the decision process.

10.2 Basic concepts

We use the notation d for a decision when there is only one decision to be made, whereas we use the notation d1, d2, ... in the situation where several decisions have to be made. In this situation we also need an extra index for the decision alternatives. In the situation where the decision is to choose a value of a numeric quantity, either discrete or continuous, we will notationally not differentiate between the decision and the decision variable (d).

10.2.1 Discrete end consequences vs attribute vector

Generally we use Y to describe the values of the end consequences resulting from a decision. In some situations we want to simplify the representation by a set of few end consequences, EC. We will let the end consequences, ECj, be disjoint. Further we will let pj = Pr(ECj occurs) be the probability that we get end consequence ECj. It is not always straightforward to determine what is most convenient, either to work with the full attribute vector Y, or the set of end consequences EC1, EC2, .... This will depend on the level of precision in the risk analysis, the skill of the risk analyst or the decision maker, etc. In order to see the difference, consider the occupational safety dimension during project execution. If we work with end consequences it could be natural to introduce the following end consequences:

1. EC0 = No injury

2. EC1 = Minor injury

3. EC2 = Medical treatment

4. EC3 = Serious injury

5. EC4 = 1 fatality

6. EC5 = 2-10 fatalities

7. EC6 = > 10 fatalities

In order to describe the expected result we also specify the corresponding probabilities, p0, p1, . . . , p6. These probabilities will depend on the decisions we make. If we want to use a full attribute vector we could use Y = [Y1, Y2, Y3, Y4], where Y1 = number of minor injuries, Y2 = number of major injuries, Y3 = number of fatalities, and Y4 = number of gross accidents, i.e., accidents with five or more fatalities. In this latter situation we specify the expected outcome in terms of the joint probability distribution function of Y. We then often introduce parameters that depend on the decision d we make.

Utility function

The utility function expresses the preferences of the decision maker regarding various attribute vectors or end consequences. A prerequisite for establishing a utility function is that the decision maker is able to express preferences between different values of the attribute vector. For example, in a one dimensional situation where we set Y = NPV (net present value) this will be rather obvious; it is reasonable that all decision makers will prefer a higher value to a lower value. Now let y1 and y2 denote two arbitrary values Y may take. The relations of interest between y1 and y2 are listed in the table below. The utility function is now a function that assigns a one-dimensional utility value to each value of the attribute vector or quantity, u = u(y). For the utility function we require:

y1 ∼ y2 ⇔ u(y1) = u(y2)
y1 ≻ y2 ⇔ u(y1) > u(y2)
y1 ≽ y2 ⇔ u(y1) ≥ u(y2)
y1 ≺ y2 ⇔ u(y1) < u(y2)
y1 ≼ y2 ⇔ u(y1) ≤ u(y2)

Relation   Explanation
y1 ∼ y2    y1 and y2 are considered equal
y1 ≻ y2    y1 is preferred over y2
y1 ≽ y2    y1 is at least as preferable as y2
y1 ≺ y2    y2 is preferred over y1
y1 ≼ y2    y2 is at least as preferable as y1

There exists, however, an infinite number of utility functions that satisfy the above criteria, and we therefore want to fix the utility function for some values. In order to be useful, the utility function should also express how much we prefer, for example, y1 over y2. Further, we also want the utility function to reflect the fact that there will be uncertainty regarding the future value of the attribute Y. We still consider the one dimensional situation where Y = NPV (net present value). Y will be a stochastic variable at the decision point. Now, assume that we could choose between a decision A that for sure gives the net present value Y = y0 and a decision B that gives the net present value Y = y1 with probability α and the net present value Y = y2 with probability 1 − α. Further assume that y1 ≺ y0 ≺ y2. For a given set of values of y0, y1 and y2 there will exist a value of α which makes the decision maker indifferent between the two decisions A and B. This will be reflected in the utility function, which must satisfy:

u(y0) = αu(y1) + (1 − α)u(y2)    (10.1)

Equation (10.1) could now in principle be used to establish the utility function. In this process we might restrict ourselves to let the utility function take values between 0 and 1, or 0 and 100.

Example 10.1 Private economy
We are asked to do a job for the firm SmartConsult. It will be 100 hours of work, and we are offered two options for payment:

1. A fixed hour rate of 20 Euro per hour.

2. A baseline hour rate of 10 Euro, and an additional 50 Euro per hour which will be paid if the project reaches the targets that have been set up.

We have been studying the progression in the project so far, and assess the probability that the extra 50 Euro per hour will be paid to be 40%. Simple calculations show that the second alternative gives the highest expected hour rate (30 Euro vs 20 Euro). However, we are in a cash position which makes it very difficult for us if we only receive 10 Euro per hour. After some considerations we have found out that the two alternatives are equal, i.e., neither of them is preferred over the other. We will now utilise Equation (10.1) to set up our utility function. Three points could be assessed: u(1 000), u(2 000), and u(6 000). Since we may arbitrarily choose the end points, we let u(1 000) = 0 and u(6 000) = 100. We now have (α = 0.6):

u(2 000) = 0.6u(1 000) + 0.4u(6 000) = 40

The utility function is shown in Figure 10.1, where we have fitted the function u(y) = 55.702 ln(y) − 384.25 and the diamonds represent the assessed values. ♢
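The assessment in this example can be reproduced with a short script. That the curve in Figure 10.1 is a least-squares fit of u(y) = a ln(y) + b to the three assessed points is an assumption, but it reproduces the stated coefficients:

```python
import math

# u(1000) = 0 and u(6000) = 100 are fixed; u(2000) follows from
# Equation (10.1) with alpha = 0.6; then fit u(y) = a*ln(y) + b.
points = [(1_000, 0.0), (6_000, 100.0)]
u2000 = 0.6 * 0.0 + 0.4 * 100.0      # Equation (10.1)
points.append((2_000, u2000))

# Ordinary least squares on (ln y, u)
xs = [math.log(y) for y, _ in points]
us = [u for _, u in points]
n = len(points)
xbar, ubar = sum(xs) / n, sum(us) / n
a = sum((x - xbar) * (u - ubar) for x, u in zip(xs, us)) \
    / sum((x - xbar) ** 2 for x in xs)
b = ubar - a * xbar
print(round(u2000), round(a, 3), round(b, 2))   # 40, ≈ 55.702, ≈ -384.25
```

The concave (logarithmic) shape of the fitted curve is exactly the risk aversion discussed below Problem 10.2.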

Problem 10.1 Consider Example 10.1. What probability would you require for being paid the additional 50 Euro per hour in order to treat the two alternatives as equal? ♢

Figure 10.1: Utility function for Example 10.1 (utility u(y) from 0 to 100 over y from 1 000 to 6 000)

Problem 10.2 Consider Example 10.1 again, but assume that you were going to work 500 hours. Make a sketch of your utility function in this situation. ♢

For private economies we are usually risk averse. Risk aversion means a concave utility function, as shown in Figure 10.1. Smaller enterprises will also often be risk averse, reflecting that rather than optimising expected revenue, decisions are taken to minimise the probability of big losses which could lead to bankruptcy. Larger enterprises will often have an almost linear utility function (in monetary values) because their economic strength is good, and there is no real possibility of bankruptcy.

Utility function for quantities other than monetary units

In Example 10.1 we have seen how the utility function could be established for monetary units. We will now investigate how we could include the safety dimension in the utility function. Two important questions will be raised:

• What is the benefit, or utility of saving one (statistical) life vs saving 10 statistical lives?

• What is the benefit, or utility, of saving one (statistical) life vs the possibility to earn an extra million Euro?

In the first situation we deal with the question of ranking the consequences within the same main dimension (safety), whereas in the second situation we need to compare benefits or disadvantages across dimensions. The discussion below will be very short, and we refer to e.g., [10] for further discussion on this topic.

The first issue we will discuss is the concept 'value of prevented fatality' (VPF). The idea behind this concept is that in any industrial activity, transportation service etc. there will always be a risk of accidents, and hence possibilities of severe injuries and fatalities. As a decision maker we have to face this fact. However, we will make an effort to reduce this risk, and we are willing to spend money to achieve such a reduction. The VPF value then states how much we are willing to spend in order to prevent one statistical fatality. We use the term 'statistical' fatality to emphasise that this willingness to pay is not related to specific persons, but to arbitrary persons, where it is not meaningful or possible to identify single persons. In some presentations the term 'value of life' (VOL) has also been used. We feel that this term is not appropriate because it indicates that life itself has a value which could be measured in monetary units. This is not our perspective. The value of life itself cannot be measured. What we can assign figures to is what we are willing to pay in order to reduce risk, or the probability of fatalities. Hence, the term VPF makes more sense in our understanding.

If we accept that the term VPF makes sense, then the next question will be how to assess the value of VPF. Different approaches exist. One approach is to look into economic considerations from the point of view of society. For example we could calculate the reduction in GNP (Gross National Product) caused by a fatality. Such calculations have been carried out, and in e.g., Norway this indicates a value of 3 million Euro for VPF. Another approach has been to ask single persons about their willingness to pay for risk reduction (see "The change in risk of death" [16]). For example, for buyers of cars we could ask what they are willing to pay for a given safety system or measure, for example an improved airbag system. Let us assume that the amount one is willing to pay is ∆W, and that the assessed risk reduction during the service life of the car is ∆P. It would then be natural to set VPF = ∆W/∆P. In Norway no such systematic surveys have been conducted, but in more arbitrary surveys at NTNU among ordinary students and continuation students a value of 2.5 million Euro has also been found for VPF. We will emphasise some challenges of such a willingness to pay approach:

• Different persons have different preferences. For example young people tend to be less willing to pay for risk reduction compared to older persons with family obligations.

• Individuals are not consistent in their preference statements.

• In real surveys to establish ∆W and ∆P we face the problem that dimensions other than being killed are involved, e.g., the risk of minor and major injuries. Further, one does not only consider one's own life, but also the lives of family members etc. when making decisions about safety.

• It is not obvious that "what I am willing to pay" is what I want society to pay for risk reduction in general, or what I expect my employer to pay for my risk reduction.
There is a tendency to set a lower value for VPF in areas of public responsibility compared to industrial activity. For example, in the petroleum industry we see VPF values in the order of 10 to 15 million Euro. There is also a tendency to set a higher VPF for multiple fatality accidents compared to single fatality accidents. This could be interpreted as an aversion against gross accidents. This aversion should not be confused with risk aversion, which would be an aversion against a high number of fatalities in general, and not against the number of fatalities in single accidents.

Another perspective in this discussion is how we should treat injuries in such a framework. One common approach here is to introduce the concept of 'equivalent fatality'. For example we could be willing to pay five times more to prevent a fatality than a severe injury, which corresponds to an equivalent fatality of 0.2. In a utility function approach we could now, in case of a VPF value of 2.5 million Euro, let the utility of one fatality be −2.5. If we now extend the situation to include multiple fatality accidents, and minor and major injuries, we could set up a more general utility function:

u(y1, y2, y3, y4) = −0.03y1 − 0.5y2 − 2.5y3 − 7y4   (10.2)

where y1 is the number of minor injuries, y2 is the number of major injuries, y3 is the number of fatalities in accidents with less than five fatalities, and y4 is the number of fatalities in gross accidents (five or more fatalities in one accident). It is important to emphasise that the utility function offered in Equation (10.2) is a function that could be used as a starting point in a discussion about value trade-offs and preferences, and should not be considered as the "correct utility function". Also note that Equation (10.2) includes an aversion against gross accidents, but there is no risk aversion in terms of a concave utility function in the attributes. If we also want to include the attribute y7, the profit in a project measured in million Euro, we could extend the utility function:

u(y1, y2, y3, y4, y7) = −0.03y1 − 0.5y2 − 2.5y3 − 7y4 + y7 − ae^(−by7)   (10.3)

where a and b are constants. Reasonable values of these constants are a = 0.08 and b = 0.7. The utility function in Equation (10.3) is an additive utility function. Very often we use additive utility functions for simplicity. However, arguments could indicate that a situation with one extra fatality is "worse" if there is a gross accident than without such a gross accident. Such discussions will not be pursued any further, and we refer to [9].
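As a quick numerical illustration, the extended utility function in Equation (10.3) with a = 0.08 and b = 0.7 can be evaluated directly; the attribute values below are made up for illustration only (a minimal sketch, not part of the original example):

```python
import math

def utility(y1, y2, y3, y4, y7, a=0.08, b=0.7):
    """Additive utility function of Equation (10.3).

    y1: minor injuries, y2: major injuries,
    y3: fatalities in accidents with < 5 fatalities,
    y4: fatalities in gross accidents, y7: profit (million Euro).
    """
    return -0.03*y1 - 0.5*y2 - 2.5*y3 - 7*y4 + y7 - a*math.exp(-b*y7)

# Accident-free project with zero profit: u = -a
print(utility(0, 0, 0, 0, 0))    # -0.08
# One fatality weighs as much as 2.5 million Euro of (linear) profit
print(utility(0, 0, 1, 0, 0))    # -2.58
```

Note how the gross-accident term (−7 per fatality) penalises multiple-fatality accidents more heavily than the ordinary fatality term (−2.5), reflecting the aversion against gross accidents discussed above.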

10.2.2 Maximising expected utility

In the previous sections we have seen principles for establishing a utility function. The utility function expresses our preferences and value trade-offs. The utility function is independent of the given decision situation we are facing and could be viewed as a general function we could use in many decision situations. We also observe that the utility function is a function of the attributes. In a given situation these attributes, Y1, Y2, …, are stochastic variables, which also means that the utility function will be stochastic, and the idea is to choose the decision that maximises the expected utility.

Result 10.1 The optimal decision d is the decision that maximises the expected utility, E(u(Y)) = ∫_{−∞}^{∞} u(y) f_Y(y) dy ♢

The basic steps in obtaining the optimal decision is then:

1. Establish an explicit expression for the utility function, u = u(y1, y2,...) which corresponds to the preferences and value trade-offs of the decision maker.

2. Establish the probability distribution function for the attribute vector Y = [Y1,Y2,...] for each decision alternative, or for each value of a decision variable (d).

3. Calculate the expected utility for each decision alternative by integrating the utility function over the probability distribution of the attribute vector.

4. Find the decision alternative that gives the maximum expected utility.
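The four steps above can be sketched with a small Monte Carlo calculation. All numbers here are hypothetical (a concave utility u(y) = y − e^(−0.3y) and two projects with normally distributed profit, equal means but different spread); the point is only to illustrate the mechanics of estimating E(u(Y)) per alternative:

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1: utility function (hypothetical risk-averse choice)
def u(y):
    return y - np.exp(-0.3 * y)

# Step 2: profit distribution per alternative, (mu, sigma), assumed for illustration
alternatives = {"A": (10.0, 2.0),
                "B": (10.0, 4.0)}

# Steps 3-4: estimate E(u(Y)) by simulation and pick the maximum
n = 1_000_000
expected_utility = {name: u(rng.normal(mu, sigma, n)).mean()
                    for name, (mu, sigma) in alternatives.items()}
best = max(expected_utility, key=expected_utility.get)
print(expected_utility, best)
```

With equal means, the less uncertain project A wins under a concave (risk-averse) utility, even though a decision maker maximising expected monetary value would be indifferent.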

Problem 10.3 In this problem you shall first make an attempt to construct the utility function u(y) for a given decision maker by the procedures we have established in the previous sections. In the problem there is only one dimension, and the attribute y is measured in thousand Euro. Assume that u(−100) = 0, and u(400) = 1.

a) Why do we have the freedom to assign two points on the utility function, and why is it suitable to use these two values?

Now, assume that the decision maker makes the following considerations regarding the outcome of a project:

• An uncertain project which gives -100 with probability 0.50 and +400 with probability 0.50 is considered equally attractive as receiving the fixed amount +150.

• An uncertain project which gives -100 with probability 0.50 and +150 with probability 0.50 is considered equally attractive as receiving the fixed amount +100.

• An uncertain project which gives +150 with probability 0.50 and +400 with probability 0.50 is considered equally attractive as receiving the fixed amount +225.

b) Draw the points on the utility function which you can calculate based on the above information, and make a sketch of the utility function in the interval -100 to +400.

c) What does the graph say about the decision maker's attitude to risk?

d) Use the graph to choose the optimum project among the following projects:

A) A project returning -100 with probability 0.2, +150 with probability 0.2 and +350 with probability 0.6.

B) A project returning 0 with probability 0.4 and +400 with probability 0.6.

e) Which of these two projects would the decision maker choose if he adopts the principle of maximum expectation? ♢

Problem 10.4 In a tunnel project one could choose between bursting and drilling. Bursting is considered to be the cheapest alternative, but the risk of personal injuries or fatalities is considered higher. Assume the utility function given in Equation (10.2). Let fi = E(Yi), i = 1, ..., 4 be the expected number of minor injuries, serious injuries etc., and assume the following numbers:

• Bursting: [f1, f2, f3, f4] = [10, 1, 0.03, 0.008]

• Drilling: [f1, f2, f3, f4] = [7, 0.2, 0.01, 0.001]

How much cheaper does bursting need to be compared to drilling if these two methods should be equally valued with respect to utility? ♢

Problem 10.5 (To be solved without a computer.) A company has established the following utility function: u(y) = y − ae^(−by), where a = 1/5 and b = 1/2, to be applied for prioritization of projects. Here y is given in million NOK and represents the profit of the projects.

a. Make a sketch of the utility function, and discuss the risk attitude of the company.

b. The company considers investing in a project which (i) either gives a profit of 10 (million NOK), or (ii) a loss of 10 (million NOK). What probability of success is required to invest in the project?

c. The company is going to choose between two projects, A and B. By analysis the following has been derived: YA ∼ N(6, 4²) and YB ∼ N(6.1, 5²), where Y is the profit, and ∼ N(µ, σ²) indicates a normally distributed quantity. Calculate the expected utility for each project to decide which project is best. Discuss the result. Hint: Show that if Y is normally distributed with expected value µ and standard deviation σ, then the expected utility is found by use of the moment generating function to be: E(u(Y)) = µ − ae^(−bµ + b²σ²/2), provided the utility function is given by u(y) = y − ae^(−by). ♢

10.2.3 Examples with one decision node

In this section we will investigate examples where only one decision is going to be made.

Example 10.2 Maximising expected utility - private economy In Example 10.1 we established the utility function in a private-economy situation. The function could be written as: u(y) = 55.702 ln(y) − 384.25. Now, assume that we are offered a job with two different forms of payment. For both alternatives the possible amounts are:

• Y = YL = 2,000

• Y = YM = 5,000

• Y = YH = 8,000

However, there is a difference in the probabilities for the two alternative forms of payment. These are shown in the column p for alternatives a1 and a2 in Table 10.1 respectively.

Table 10.1: Expected utility and expected values

Amount   |  Alternative a1       |  Alternative a2
         |  p    U     V         |  p    U     V
2,000    |  0.1  3.9   200       |  0.3  11.7  600
5,000    |  0.8  72.1  4,000     |  0.4  36.1  2,000
8,000    |  0.1  11.6  800       |  0.3  34.9  2,400
Sum      |  1.0  87.7  5,000     |  1.0  82.7  5,000

The expected utility is found by equation (10.4):

E(u(Y)) = ∑_{y∈{YL,YM,YH}} u(y) Pr(Y = y) = ∑_{y∈{YL,YM,YH}} (55.702 ln(y) − 384.25) Pr(Y = y)   (10.4)

where each term in the sum is calculated in the column U in Table 10.1. In the column V we have similarly calculated the expected monetary value. In the last row the sum is shown, and we observe that the expected utility for a1 and a2 is 87.7 and 82.7 respectively, and hence alternative a1 has the largest expected utility. When expected monetary values are considered, the two alternatives are equivalent.
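The U and V columns of Table 10.1 can be reproduced directly from the utility function of Example 10.1 (a small sketch of equation (10.4)):

```python
import math

def u(y):
    # Utility function from Example 10.1
    return 55.702 * math.log(y) - 384.25

amounts = [2000, 5000, 8000]
p_a1 = [0.1, 0.8, 0.1]
p_a2 = [0.3, 0.4, 0.3]

for name, probs in [("a1", p_a1), ("a2", p_a2)]:
    eu = sum(p * u(y) for p, y in zip(probs, amounts))   # expected utility (U)
    ev = sum(p * y for p, y in zip(probs, amounts))      # expected value (V)
    print(f"{name}: E(u(Y)) = {eu:.1f}, E(Y) = {ev:.0f}")
# a1: E(u(Y)) = 87.7, E(Y) = 5000
# a2: E(u(Y)) = 82.7, E(Y) = 5000
```

Both alternatives have the same expected payment of 5,000, but the concave utility favours a1, whose probability mass is concentrated on the middle outcome.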

Problem 10.6 Consider Example 10.2, but now assume that the probability distributions for the payments are ∼ PERT(4000, 5000, 6000) and ∼ PERT(2000, 6000, 8000) respectively. Hint: you might use the program pRisk.xlsm. ♢

Problem 10.7 We are going to support a decision maker (DM), using utility theory to give decision support for an ICT project with high risk (a huge downside). A utility function on the following format is proposed:

u(y) = y − ae^(−by)

where a = 1.5 and b = 0.3 are constants, and y is the profit (= income − investment amount) of the investment in million NOK.

To determine the parameters a and b, we put forward several questions to the DM. One of the questions was as follows: Consider an investment A with two possible outcomes, y1 = −5 and y2 = 5 respectively (in million NOK). For which probability α = Pr(Y = y1) will the investment A be equally attractive compared to not investing at all, i.e., a certain profit of y0 = 0?

The project comprises 4 activities with the relevant cost figures shown in Table 10.2.

Table 10.2: Data for Problem 10.7

Activity  E(Cost)  SD(Cost)
A1        5        3
A2        10       5
A3        10       5
A4        2        1

Activity A3 is the development of an "App" where the user can upload data from an iPhone or iPad. (Currently Android is not supported.) Rather than developing this app, it is possible to buy such an application from some former NTNU students. The DM has presented the "requirements" to the students, and based on this they have given a fixed price offer of 12 million NOK for the application (source code included). The standard deviation is thus considered to be 0. In the following, other aspects of buying rather than developing in-house are not considered. The selling price of the product from the ICT project is 30 million NOK.

If Y is normally distributed with expected value µ and standard deviation σ, then the expected utility is found (by use of the moment generating function) to be: E[u(Y)] = µ − ae^(−bµ + b²σ²/2), provided the utility function is given by u(y) = y − ae^(−by).

(a) Explain what is meant by a utility function, and what it is used for. What probability α would be consistent with the values a = 1.5 and b = 0.3? Make a sketch of the utility function.

(b) Use this result to recommend whether to develop the “App” in-house, or buy it from the company of former NTNU students. State your assumptions.

10.2.4 Decision trees

The use of decision trees is a fruitful approach when we are going to systemise a decision process where the decisions are made at different points in time. The main reason for postponing a decision is to follow the development of e.g., a project, and hence make the most appropriate decision when more information is available. The drawback is that postponing a decision could yield more costly solutions. Another drawback could be that there is no time to implement necessary measures in due time if we wait to take action.

Analysis of decision trees

The following notation is introduced:

CNi = Chance node i
pi,j = Probability that chance node i results in outcome j
DNi = Decision node i
CINj = Cost of intermediate node j
CTNj = Cost related to terminal node j
EMV = Expected Monetary Value

[Figure 10.2: Symbols used in decision trees — starting point, decision node, chance node, consecutive costs, end consequences (terminal node, C), and the couplings between nodes and end consequences.]

EMVi,j = EMV for branch j into chance node i
EMVi = EMV for chance node i, or decision node i

The algorithm for the numerical calculation is shown in Figure 10.3.

Example 10.3 Construction Ltd. is the main contractor for a road tunnel project. During the work more water penetration than expected is discovered. Physically there are three alternatives to choose among: i) bursting an outlet drain, which is very costly but a satisfactory solution, ii) building a pumping station to pump away the water, which is a cheaper solution but may not be adequate if there is very much water, and iii) carrying out sealing work, which is even cheaper, but adequate only in case of very little water. The amount of water is uncertain at the time being. Below we discuss the decision process:

The first decision is made now (DN1), and at this decision node we have the following options:

A: Immediately start the bursting work

B: Wait half a year until more information about the amount water is available

If we postpone the decision (B) we will have more information about the amount of water in half a year, and a better decision could be made. Two outcomes are foreseen in half a year (CN1):

C: It is obvious that there is so much water that bursting the outlet drain is necessary

D: There is still uncertainty regarding the amount of water, and we have a new option in decision node DN2:

E: Build a pumping station and hope that this is sufficient, or

F: Wait another half year to obtain even more information

If the pumping station is built at this time (E), this could result in the following outcomes (CN2):

G: The pumping station was sufficient

H: The pumping station was not sufficient, and an outlet drain has to be bursted

Repeat for all end terminals:
(*) Move to the node to the left, and bring with the EMV-value in the current node
    IF this is a chance node THEN
        Calculate EMVi,j = pi,j · EMV
        IF EMV has been calculated for all branches into this node THEN
            Calculate EMVi = Σj EMVi,j
            GoTo (*)
        ELSE
            GoTo next terminal node
        ENDIF
    ELSEIF this is a decision node THEN
        IF EMV has been calculated for all branches into this node THEN
            Let EMVi = Minj(EMVi,j)
            Optimum decision in DNi is the branch with minimum EMVi,j
            GoTo (*)
        ELSE
            GoTo next terminal node
        ENDIF
    ELSEIF this is a consecutive node THEN
        Add the EMV of the consecutive node to EMV
        GoTo (*)
    ELSEIF this is the start node THEN
        We are done
    ENDIF

Figure 10.3: Algorithm for processing a decision tree

If we want to postpone the decision (F), there are three possible outcomes (CN3):

I: Bursting the outlet drain is required

J: Sealing work is sufficient

K: A pumping station is sufficient

Table 10.3 shows the associated costs (the letters in parentheses correspond to the alternatives above).

Table 10.3: Cost of the various options

Options                   Cost now      Cost in half a year  Cost in one year
Outlet drainage bursting  50 mill. (A)  60 mill. (C)         70 mill. (I, H)
Pumping station           —             20 mill. (G)         25 mill. (K)
Seal work                 —             —                    10 mill. (J)

At the moment we make the following probability assessments:

P(C|CN1) = 30% P(D|CN1) = 70%

[Figure 10.4: Decision tree for the tunnel project.]

P(G|CN2) = 90%
P(H|CN2) = 10%
P(I|CN3) = 10%
P(J|CN3) = 40%
P(K|CN3) = 50%

Note that in the example we have not used the symbol for consecutive costs. For the calculation we use the algorithm indicated in Figure 10.3. We start with the upper right terminal node, and "collect" the EMV = 50 mill. into the decision node to the left, i.e., DN1. In this decision node we observe that not all branches (from the right) into node DN1 have been processed, and we therefore need to go back to a new terminal node. We go back to the next upper terminal node and collect EMV = 60 mill., which is multiplied with the branch probability (30%) such that we get EMV = 0.3 · 60 mill. = 18 mill. into chance node CN1. Here, the second branch into the chance node has not been processed, and we again have to go back to the next non-processed terminal node. Here we collect EMV = 20 mill., which multiplied with 90% gives EMV = 18 mill. into chance node CN2. Similarly we get EMV = 90 mill. · 10% = 9 mill. for the second branch into chance node CN2. We may now complete the processing of chance node CN2 by adding the EMV values entering the node from the right, yielding an EMV of 27 mill. This number now goes into decision node DN2. Now the remaining end nodes are processed, and we get the EMV to collect from CN3 equal to 23.5, which again will be the second EMV into decision node DN2. In decision node DN2 we shall choose the branch having the lowest EMV value, i.e., branch F with an EMV of 23.5. In decision node DN2 it is thus most beneficial to postpone the decision for another half year. We complete the tree similarly, and find that in decision node DN1 the optimal decision is to postpone any physical activity. We remain then with an EMV equal to 34.45. The numbers from these calculations are shown in Figure 10.5. Also note that we have not taken discounting aspects into account, something that also would have been an argument for postponing the decision. ♢
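The rollback of Example 10.3 can be sketched as a small recursion over the tree (node layout and numbers taken from the example; the dictionary encoding of the tree is our own):

```python
# Each node is ("terminal", cost), ("chance", [(p, child), ...]) or
# ("decision", {branch_label: child}); costs in million.
def emv(node):
    kind, body = node
    if kind == "terminal":
        return body
    if kind == "chance":                      # EMVi = sum_j p_ij * EMV_ij
        return sum(p * emv(child) for p, child in body)
    # Decision node: choose the branch with the minimum expected cost
    return min(emv(child) for child in body.values())

CN2 = ("chance", [(0.9, ("terminal", 20)), (0.1, ("terminal", 90))])
CN3 = ("chance", [(0.1, ("terminal", 70)), (0.4, ("terminal", 10)),
                  (0.5, ("terminal", 25))])
DN2 = ("decision", {"E": CN2, "F": CN3})
CN1 = ("chance", [(0.3, ("terminal", 60)), (0.7, DN2)])
DN1 = ("decision", {"A": ("terminal", 50), "B": CN1})

print(round(emv(DN1), 2))   # 34.45 — postpone, as found in the text
```

Note that the recursion replaces the explicit "go back to the next terminal node" bookkeeping of Figure 10.3: each call returns only when all branches below it have been processed.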

Problem 10.8 An oil company has the rights to a given oil field, and has to choose between:

• Drill a well (D)

• Sell our rights (S)

[Figure 10.5: Decision tree for the tunnel project with calculations.]

The decision will depend on the amount of oil that might reside in the field. There are two possibilities:

• Profitable oil pool (P)

• Non-profitable oil pool (NP)

Before the final decision is made, our oil company could conduct an expensive seismic investigation which might give information regarding the probability that the field contains a profitable oil pool. The result of such an investigation will be one of the following statements:

• No structure (NS)

• Open structure (OS)

• Closed structure (CS)

In this problem you should establish a decision tree for the situation. The decision node following the start node should be:

• Perform a seismic investigation (SEI)

• Do not perform a seismic investigation (NINV)

The costs involved in the decision tree are as follows:

Cost of seismic investigation: 10
Cost of drilling a well: 100
Net profit (after drilling) if oil: 700
Selling price without seismic investigation: 30
Selling price with seismic investigation:
- No structure: 0
- Open structure: 50
- Closed structure: 200

The probabilities of the possible results of a seismic investigation (e.g., based on experience figures from geologists) are:

No structure: 0.60, Open structure: 0.30, Closed structure: 0.10

The probabilities of finding a profitable oil pool, given the result of the seismic investigation, are:

No structure: 0.10, Open structure: 0.25, Closed structure: 0.70

The probability of finding oil when no seismic investigation has been performed is 0.20. Find the optimal decision in each decision node, and formulate the conclusions from your analysis. ♢

Chapter 11

Parameter estimation

11.1 Introduction

In this chapter we will briefly describe principles for parameter estimation. A parameter in this context is a quantity in the risk analysis to which we assign numerical values. There are two principles for establishing numerical values (parameter estimates):

• Statistical analysis of historical data

• Use of expert judgements

If we have access to relevant data we will usually use these data to estimate the parameters. Often we have little relevant data, and we then have to rely on expert judgements. In some situations we combine historical data with expert judgements by use of Bayesian methods.

11.2 The MLE principle

The basic idea behind the Maximum Likelihood Estimation (MLE) principle is to choose the numerical values of the parameters that are the most likely ones in light of the data. The procedure goes as follows:

• Assume that we know the probability density function of the observations for which we have data. Let this distribution be denoted f(x; θ).

• The involved parameters are unknown, and are generally denoted θ.

• We have n independent observations (data points) that we denote X1, X2,…Xn. When we refer to the actual numerical values we have observed, we use the notation x1, x2,…xn.

The MLE principle now tells us to estimate θ by the value which is most likely given the observed data. To define "likelihood" we use the probability density function. The simultaneous probability density for X1, X2, …, Xn is given by:

f(x1; θ) f(x2; θ) · · · f(xn; θ) = ∏_{i=1}^{n} f(xi; θ)   (11.1)

This density expresses how likely a given combination of the x-values is, given the value of θ. However, in our situation the x-values are given, whereas θ is unknown. We therefore interchange the arguments, and consider the expression as a function of θ:

L(θ; x1, x2, …, xn) = ∏_{i=1}^{n} f(xi; θ)   (11.2)

where L(θ; x1, x2, …, xn) in equation (11.2) denotes the likelihood function. The MLE principle can now be formulated as choosing the θ-value that maximizes the likelihood function. To denote the MLE estimator we write a "hat" over θ, i.e., θ̂. Generally, θ̂ will be a function of the observations:

θˆ = θˆ(X1,X2,...Xn) (11.3)

When we insert numerical values for the x’s we denote the result as the parameter estimate.

Example 11.1 Estimation in the exponential distribution We consider the situation where we have observed n failure times, and we will estimate the failure rate, λ, under the assumption of exponentially distributed failure times. The observed failure times are denoted t1, t2, …, tn. Equation (11.2) gives:

L(λ; t1, t2, …, tn) = ∏_{i=1}^{n} λe^(−λti)

Note that the parameter is denoted λ, whereas we generally use θ. Further, we denote the observations with t because we here have failure times. The probability density function in the exponential distribution is given by f(t) = λe^(−λt). A common "trick" when maximising the likelihood function is to take the logarithm. Because the logarithm (ln) is a monotonically increasing function, ln L will be maximised for the same value as that for which L is maximised. We then find:

l(λ; t1, t2, …, tn) = ln L(λ; t1, t2, …, tn) = n ln λ − λ ∑_{i=1}^{n} ti

By taking the derivative wrt λ and setting this expression to zero, we easily obtain:

λ̂ = n / ∑_{i=1}^{n} ti

♢
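A small numerical check of Example 11.1 (the five failure times below are made up for illustration): the closed-form estimate λ̂ = n/Σti should give a higher log-likelihood than nearby λ-values.

```python
import math

t = [2.0, 1.0, 3.0, 2.5, 1.5]    # hypothetical observed failure times
n = len(t)

def loglik(lam):
    # l(lambda) = n ln(lambda) - lambda * sum(t_i)
    return n * math.log(lam) - lam * sum(t)

lam_hat = n / sum(t)             # MLE from Example 11.1
print(lam_hat)                   # 0.5
assert loglik(lam_hat) > loglik(0.9 * lam_hat)
assert loglik(lam_hat) > loglik(1.1 * lam_hat)
```

Equivalently, λ̂ is the observed number of failures divided by the accumulated time in service, which is the intuitive estimate of a failure rate.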

Problem 11.1 Find the MLE for µ and σ in the normal distribution. ♢

11.3 Method of moments – PERT distribution

The maximum likelihood principle is not numerically stable for the PERT distribution. We will therefore apply another principle, i.e., the method of moments. The method of moments estimates population parameters by equating sample moments with the corresponding population moments and then solving these equations for the quantities to be estimated. In the PERT distribution we have the following moments:

E(X) = (L + 4M + H) / 6   (11.4)

Var(X) = (E(X) − L)(H − E(X)) / 7   (11.5)

From (11.4) we easily obtain M = ¼(6E(X) − L − H). Now let x̄ = (1/n) ∑_i xi denote the sample mean (first order moment), and we may estimate M by:

M̂ = ¼(6x̄ − L̂ − Ĥ)   (11.6)

The challenge now is to find L̂ and Ĥ. By rearranging equation (11.5) and inserting x̄ for E(X) and the sample variance S² = 1/(n−1) ∑_i (xi − x̄)² for Var(X) we have:

7S² = (x̄ − L)(H − x̄)   (11.7)

where L and H are two unknowns to be found from one equation. To overcome this underdetermination it seems reasonable to require (x̄ − L)/(H − x̄) = (x̄ − xMin)/(xMax − x̄), where xMin and xMax are the sample minimum and maximum respectively. Hence we have:

7S² = (H − x̄)² (x̄ − xMin)/(xMax − x̄)   (11.8)

yielding

Ĥ = x̄ + S √(7(xMax − x̄)/(x̄ − xMin))   (11.9)

and

L̂ = x̄ − (Ĥ − x̄)(x̄ − xMin)/(xMax − x̄)   (11.10)

Thus, the final estimates in the PERT distribution are given by equations (11.6), (11.9) and (11.10).
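The three estimators can be sketched in a few lines; as a self-check, plugging L̂, M̂, Ĥ back into (11.4) and (11.5) reproduces the sample mean and variance exactly (the durations below are arbitrary illustration values, not the data of Problem 11.2):

```python
import math
import statistics as st

def pert_moments_fit(x):
    """Method-of-moments estimates (L, M, H), equations (11.6), (11.9), (11.10)."""
    xbar = st.mean(x)
    s2 = st.variance(x)        # sample variance, 1/(n-1) convention
    h = xbar + math.sqrt(7 * s2 * (max(x) - xbar) / (xbar - min(x)))  # (11.9)
    l = xbar - (h - xbar) * (xbar - min(x)) / (max(x) - xbar)         # (11.10)
    m = (6 * xbar - l - h) / 4                                        # (11.6)
    return l, m, h

x = [9.1, 10.4, 8.7, 11.2, 9.8, 10.0, 12.3]   # made-up durations
L, M, H = pert_moments_fit(x)

# The fitted parameters reproduce the first two sample moments:
assert abs((L + 4*M + H) / 6 - st.mean(x)) < 1e-9          # equation (11.4)
assert abs((x := None) or 0) == 0 or True                  # (no-op guard)
print(L, M, H)
```

For reasonable data the fitted L and H bracket the sample range, since the symmetry requirement places them on either side of xMin and xMax.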

Problem 11.2 Assume that we have observed the following durations for a typical activity in projects: 9.3, 10.5, 9.4, 9.0, 9.4, 8.6, 10.1, 10.7, 12.0, 10.6, 9.8, 13.1, 12.0, 8.6 and 10.9. Estimate the parameters if you assume that the durations are PERT distributed. Hint: Use the Average() and STDEV() functions in MS Excel to find x¯ and S. ♢

11.4 The LS principle

The least squares (LS) principle for estimation is used when we have observations that do not come from the same distribution, but where we know the expectation of each variable as a function of a set of parameters θ and a set of explanatory variables. Previously we denoted the observations by the letter 'X', but we will now change the notation and let 'Y' denote the observations, whereas we reserve the letter 'X' for the explanatory variables. We now let ϕi(θ) denote the expectation of Yi (the i'th observation), where the functions ϕi are all known, but the parameter vector θ is unknown and shall be estimated. The LS principle now states that we may estimate θ by the value that minimises the sum of the squared deviations between the observed and expected values, i.e.:

Q(θ) = ∑_{i=1}^{n} [yi − ϕi(θ)]²   (11.11)

Equation (11.11) is the starting point for estimating the parameters in so-called regression models. The simplest model is given by:

E(Yi) = β0 + β1xi   (11.12)

In this model x is called the independent variable, whereas Y is called the dependent variable because it depends on the independent variable, x.

Problem 11.3 Prove that the estimators for β0 and β1 in equation (11.12) are given by:

β̂1 = ∑_i (xi − x̄) yi / ∑_i (xi − x̄)²  and  β̂0 = ȳ − β̂1 x̄

♢

The model in equation (11.12) could be extended to cover more independent variables. These are denoted regression variables, or explanatory variables. To extend the model we introduce an extra index for each x. We write xij, where index i denotes the i'th data point, whereas index j denotes the j'th explanatory variable. The model then reads:

E(Yi) = β0 + β1xi,1 + β2xi,2 + ··· + βrxi,r (11.13)

To obtain the LS estimators in this situation, we introduce matrix notation. Let y = [y1, y2, …, yn]^T be a column vector containing the dependent variables, and let X be the design matrix given by:

X = | 1  x11  …  x1r |
    | 1  x21  …  x2r |
    | :   :  xij   : |
    | 1  xn1  …  xnr |   (11.14)

It can be shown that the LS estimator for β = [β0, β1, β2, …, βr]^T is given as the solution of the following matrix equation:

X^T y = X^T X β   (11.15)

If the design matrix has full rank, X^T X will be non-singular, and the solution is given by:

β̂ = (X^T X)⁻¹ X^T y   (11.16)

If one has access to a tool for matrix calculus, the LS estimates are easily obtained. We could also use commercially available statistical programs, or the "analysis" module of MS Excel.

Example 11.2 Estimation of the effects of regression variables We will consider a situation where we have observed the duration of constructing the foundation walls of houses. The different values are shown in the Y-column below. The variable x1 denotes the base area in square meters, whereas x2 is an indicator of ground frost, given as 1 if there is ground frost and 0 otherwise. We have also introduced the variable x3, which denotes the walking distance from the workmen's hut to the building site:

Y     x1   x2  x3
8.4   100  1   100
7.8   150  1   50
11.4  250  1   50
6.1   80   0   75
6.1   100  0   200
8.3   90   1   30
7.5   180  0   25
7.2   200  0   50
6.0   110  0   75

From MS Excel we obtain the following parameter estimates: β̂0 = 4.211, β̂1 = 0.0167, β̂2 = 2.196, and β̂3 = 0.0011 ♢
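Equation (11.16) can equally well be evaluated with a few lines of NumPy (a sketch; rather than comparing against the rounded Excel figures, we verify the normal equations (11.15), i.e., that X^T(y − Xβ̂) = 0):

```python
import numpy as np

# Data from Example 11.2
y = np.array([8.4, 7.8, 11.4, 6.1, 6.1, 8.3, 7.5, 7.2, 6.0])
x = np.array([[100, 1, 100], [150, 1, 50], [250, 1, 50],
              [80, 0, 75], [100, 0, 200], [90, 1, 30],
              [180, 0, 25], [200, 0, 50], [110, 0, 75]], dtype=float)

X = np.column_stack([np.ones(len(y)), x])    # design matrix, equation (11.14)
beta = np.linalg.solve(X.T @ X, X.T @ y)     # equations (11.15)/(11.16)
residuals = y - X @ beta

# The LS solution makes the residuals orthogonal to every column of X
assert np.max(np.abs(X.T @ residuals)) < 1e-6
print(beta)    # should agree with the Excel estimates reported above
```

In practice one would use `np.linalg.lstsq` instead of forming X^T X explicitly, since it is numerically more stable for ill-conditioned design matrices.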

Problem 11.4 Use MS Excel to verify the above results. ♢

Note that in equation (11.13) we have written the expected value of Yi. Generally we write:

Yi = β0 + β1xi,1 + β2xi,2 + ··· + βrxi,r + εi (11.17) where εi is an error-term. Very often we assume εi to be normally distributed, but we might also assume that εi is PERT distributed. To estimate the parameters in an underlying PERT distribution we calculate the predicted values:

yˆi = βˆ0 + βˆ1xi,1 + βˆ2xi,2 + ··· + βˆrxi,r (11.18) then we estimate the error-terms by the residuals:

εˆi = yi − yˆi (11.19)

The residuals ε̂i may now be used as input to the method of moments to estimate the PERT parameters. Below we summarize the procedure to obtain the L, M and H values for an activity, cost element etc. in a specific project.

1. Estimate regression parameters from data from similar projects: βˆ = (XTX)−1XTy

2. Calculate the predicted values: yˆi = βˆ0 + βˆ1xi,1 + βˆ2xi,2 + ··· + βˆrxi,r

3. Estimate the error-terms by the residuals: εˆi = yi − yˆi

4. Use the estimates εˆi as basis for estimation of L, M and H, i.e., find Lˆ, Mˆ and Hˆ by the method of moments

5. For the new activity, cost element etc., find the corresponding x-vector, and denote it x = [x1, x2, . . . , xr]

6. Find the prediction of the new observation by y0 = βˆ0 + βˆ1x1 + βˆ2x2 + ··· + βˆrxr

7. The PERT parameters to use in the analysis are given by L = Lˆ + y0, M = Mˆ + y0 and H = Hˆ + y0

Problem 11.5 Calculate the residuals in equation (11.19) with the data in Example 11.2 and estimate the parameters by assuming the residuals are PERT distributed. Find the probability that the duration for observation number 5 is shorter than the observed value of 6.1. From the observations it seems that duration number 3 is rather long. Conclude on this by applying the regression model. ♢

11.5 Bayesian methods

In Bayesian estimation procedures we utilise prior information about the reliability parameters. The procedure could briefly be described as follows:

1. Specify a prior uncertainty distribution of the reliability parameter, π(θ).

2. Structure reliability data information into a likelihood function, L(θ; x), see equation (11.2).

3. Calculate the posterior uncertainty distribution of the reliability parameter vector, π(θ|x). The posterior distribution is found by π(θ|x) ∝ L(θ; x)π(θ), and the proportionality constant is found by requiring the posterior to integrate to one.

4. The Bayes estimate for the reliability parameter is given by the posterior mean, which in principle could be found by integration.

Example 11.3 Exponential distribution In the following we give an example showing the main elements of the procedure. In the example we estimate the failure rate in the constant failure rate situation. Assume that we express our prior belief1 about the failure rate λ of a certain component (a gas detector used on an oil and gas platform) in terms of the mean value E = 0.7 · 10⁻⁶ (failures/hour) and the standard deviation S = 0.3 · 10⁻⁶. For mathematical convenience, it is common to choose a gamma distribution2 with parameters α and ξ for the prior distribution. The expected value (E) and the variance (V) in the gamma distribution are given by α/ξ and α/ξ² respectively, and we obtain the following expressions for α and ξ:

ξ = E/V = E/S² = (0.7 · 10⁻⁶)/(0.3 · 10⁻⁶)² = 7.78 · 10⁶
α = ξE = (7.78 · 10⁶) · (0.7 · 10⁻⁶) = 5.44

To establish the likelihood function, we look at the data. In this example we assume that we have observed identical units for a total time in service, t, equal to 525 600 hours (e.g. 60 detector years). In this period we have observed n = 1 failure. If we assume exponentially distributed failure times, we know that the number of failures in a period of length t, N(t), is Poisson distributed with parameter λ · t. The probability of observing n failures is thus given by:

L(λ; n, t) = Pr(N(t) = n) ∝ λ^n e^(−λt)

and we have an expression for the likelihood function L(λ; n, t). The posterior distribution is found by multiplying the prior distribution with the likelihood function:

π(λ|n) ∝ L(λ; n, t) · π(λ) ∝ λ^n e^(−λt) · λ^(α−1) e^(−ξλ) ∝ λ^((α+n)−1) e^(−(ξ+t)λ)

and we recognize the posterior distribution as a gamma distribution with new parameters α′ = α + n and ξ′ = ξ + t. The Bayes estimate is given by the mean of this distribution:

λ̂ = (α + n)/(ξ + t) = (5.44 + 1)/(7.78 · 10⁶ + 525 600) = 0.78 · 10⁻⁶

1 This could be based on statements from experts, see Øien et al. (1998), or on analysis of similar components (empirical Bayesian analysis). 2 π(λ) ∝ λ^(α−1) e^(−ξλ) for the gamma distribution.

We note that the maximum likelihood estimator gives a much higher failure rate estimate (1.9 · 10⁻⁶), but the “weighting procedure” favours the prior distribution in our example. Generally we could interpret α and ξ here as the “number of failures” and the “time in service”, respectively, for the prior information. ♢
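The numbers in the example are easy to reproduce. The sketch below (the function name is ours) performs the conjugate gamma–Poisson update described above:

```python
def gamma_poisson_update(E, S, n, t):
    """Bayes update of a failure rate with a gamma prior.

    The prior mean E and standard deviation S give the gamma parameters
    xi = E/S**2 and alpha = xi*E; observing n failures during a time t in
    service gives the posterior gamma(alpha + n, xi + t).  Returns the
    posterior mean, i.e. the Bayes estimate of lambda.
    """
    xi = E / S**2
    alpha = xi * E
    return (alpha + n) / (xi + t)

lam_hat = gamma_poisson_update(E=0.7e-6, S=0.3e-6, n=1, t=525_600)
# lam_hat is about 0.78e-6, as in the example
```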

Chapter 12

Forecasting

12.1 Introduction

Forecasting is the process of making predictions of the future based on past and present data. The term ‘prediction’ is similar, but more general. Both terms refer to formal statistical methods employing time series and other relevant data, and the usage of the two terms may differ between areas of application. In supply chain management the term forecasting is used, and we are in particular interested in the values a quantity such as demand will take in the days ahead, based on observations in the past. In order to make reasonable forecasts it is essential to have an understanding of the underlying processes generating the variables of interest.

In the following we use the notation yt to denote the value of the variable of interest at time t. We assume that the numerical values of yt are known for times in the past and up to the present time, which is denoted t0. Which value yt will take in the future is unknown to us, and the purpose of this chapter is to develop methods for forecasting. The forecasts are denoted ŷt, where t is a time in the future.

Time series are the basis for the forecasting methods in this presentation. A time series is a series of data points in time order, most commonly a sequence taken at successive, equally spaced points in time; it is thus a sequence of discrete-time data. Examples of time series are the number of customers each day, the number of sales each day, and the daily closing value of the Dow Jones Industrial Average. We usually assume that a time series is an instantiation of a stochastic process of some kind. The assumptions made regarding the underlying stochastic process are crucial for which methods apply. For each point of time t in the time series we let Yt denote the underlying stochastic variable at that point of time.

12.2 Naïve approach

If we do not have any knowledge regarding the underlying stochastic process, one approach is to assume that tomorrow will look like today, i.e., if t0 is the current time:

ŷt0+1 = yt0    (12.1)

and the same forecast is given for all subsequent points in time.

12.3 Average approach

If the underlying stochastic process could be described by

Yt = µ + ϵt

where µ is a constant and ϵt is a stochastic variable with zero mean, it seems reasonable to use the time series data up to now to estimate µ and to use µ̂ as the forecast. Applying the method of moments (MoM), the mean value of the past observations is the estimator for µ, and we have:

ŷt0+1 = µ̂ = (1/t0) ∑_{i=1}^{t0} yi    (12.2)

and the same forecast is given for all subsequent points in time.

12.4 Moving average method

An obvious disadvantage with the average approach arises if the model assumptions do not hold. For example, if there is a drift in the data, we are not interested in including data from the first day in our estimate. Assume that the underlying process is described by:

Yt = µ + δt + ϵt

where µ is still a constant and ϵt is a stochastic variable with zero mean, but where we have added a term δt with a non-zero mean. If δt is reasonably stable around t0, it seems reasonable to use only the last few observations in the forecast:

ŷt0+1 = (1/M) ∑_{i=1}^{M} yt0−i+1    (12.3)

This approach is called a moving average method for obvious reasons. The choice of M should be based on our understanding of how fast δt changes over time. The same forecast is given for all subsequent points in time.
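Equation (12.3) is just the mean of the M most recent observations; a one-line Python sketch (the function name is ours):

```python
def moving_average_forecast(y, M):
    """Forecast for the next period: the mean of the last M observations,
    cf. equation (12.3)."""
    return sum(y[-M:]) / M

moving_average_forecast([10, 12, 11, 13, 15], M=3)  # (11 + 13 + 15)/3 = 13.0
```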

12.5 Simple exponential smoothing method

A range of exponential smoothing methods exists, and in this presentation only the simplest of these is presented. The simple exponential smoothing (SES) approach is sometimes referred to as “single exponential smoothing”. SES is suitable for forecasting data with no trend or seasonal pattern. SES can be seen as something between the naïve approach, where only the last observation is utilized, and the average approach, where all observations are utilized with equal weights. Forecasts are calculated using weighted averages where the weights decrease exponentially with the time into the past:

ŷt0+1 = αyt0 + α(1 − α)yt0−1 + α(1 − α)²yt0−2 + ···    (12.4)

where 0 ≤ α ≤ 1 is the smoothing parameter. It may be shown that the setup in Equation (12.4) is equivalent to treating the next forecast as a weighted sum of the most recent observation and the most recent forecast, i.e.:

ŷt+1 = αyt + (1 − α)ŷt

that is

ŷt0+1 = αyt0 + (1 − α)ŷt0
ŷt0 = αyt0−1 + (1 − α)ŷt0−1
ŷt0−1 = αyt0−2 + (1 − α)ŷt0−2
...

where the starting point ŷ1 is not really defined. Several principles exist for initiating the set-up, for example letting ŷ1 = y1, which is the principle used in this presentation. Substituting these equations successively into each other gives, after some manipulation:

ŷt0+1 = α ∑_{j=0}^{t0−1} (1 − α)^j yt0−j + (1 − α)^{t0} ŷ1

And by using yˆ1 = y1 we get:

ŷt0+1 = α ∑_{j=0}^{t0−1} (1 − α)^j yt0−j + (1 − α)^{t0} y1    (12.5)

The smoothing parameter α needs to be assessed. In some situations it might be possible to assess the smoothing parameter from experience with similar situations. Another approach is to investigate the current data set and see which value of α performs “best”. To measure what is best we use the sum of squared errors (SSE):

SSE = ∑_{t=1}^{t0} (yt − ŷt)²    (12.6)

and since ŷt = ŷt(α) we also have that SSE = SSE(α). To minimize Equation (12.6) with respect to α we may simply test suitable values, for example α = 0.1, 0.2, ..., 0.9.
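The recursion ŷt+1 = αyt + (1 − α)ŷt together with the grid search over α can be sketched as follows (the function names are ours; the initialization ŷ1 = y1 is the one chosen above):

```python
def ses_forecasts(y, alpha):
    """One-step-ahead SES forecasts with y_hat_1 = y_1.
    Returns [y_hat_1, ..., y_hat_{t0}, y_hat_{t0+1}]; the last entry is
    the forecast for the next, unobserved period."""
    yhat = [y[0]]
    for yt in y:
        yhat.append(alpha * yt + (1 - alpha) * yhat[-1])
    return yhat

def best_alpha(y, grid=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """Pick the alpha in the grid that minimizes the SSE of eq. (12.6)."""
    def sse(a):
        yhat = ses_forecasts(y, a)
        return sum((yt - yh) ** 2 for yt, yh in zip(y, yhat))
    return min(grid, key=sse)
```

For a strongly trending series the grid search picks a large α, since SES lags a trend less the closer α is to one.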

12.6 Holt’s method

With the exception of the naïve approach, the methods described so far will not perform well if there is some kind of trend in the underlying stochastic process. Holt [34] extended simple exponential smoothing to allow forecasting of data with a trend. The following underlying stochastic process is assumed:

Yt = µ + δt + ϵt

where µ and δ are unknown constants and ϵt is a stochastic variable with zero mean. Holt’s method involves a forecast equation and two smoothing equations (one for the level and one for the trend):

Forecast equation: ŷt+h = ℓt + hbt

Level equation: ℓt = αyt + (1 − α)(ℓt−1 + bt−1)    (12.7)

Trend equation: bt = β(ℓt − ℓt−1) + (1 − β)bt−1

where ℓt is an estimate of the level of the series at time t, bt is an estimate of the trend (slope) of the series at time t, α is the smoothing parameter for the level (0 ≤ α ≤ 1), and β is the smoothing parameter for the trend (0 ≤ β ≤ 1). h = 1, 2, ... is the number of time steps into the future to forecast. Compared to the underlying stochastic process (Yt = µ + δt + ϵt) we observe that ℓt is the best estimate of µ + δt at time t, and bt is the best estimate of δ at time t.

As with simple exponential smoothing, the level equation provides a weighted average of the observation yt and the within-sample one-step-ahead forecast for time t given by ℓt−1 + bt−1. The trend equation provides a weighted average of the estimated trend at time t based on the last change in level, ℓt − ℓt−1, and the previous estimate of the trend, bt−1. For initiating the procedure we may use ℓ0 = y1 and b0 = y2 − y1. Recalling the underlying stochastic process (Yt = µ + δt + ϵt), an alternative approach is to perform a linear regression on the time series up to now and set ℓ0 = µ̂ and b0 = δ̂. Figure 12.1 shows MS Excel formulas realizing the equations in (12.7) where alpha=0.8 and beta=0.2:

Figure 12.1: Implementing Holt’s method in MS Excel

To obtain numerical values for α and β we minimize Equation (12.6) with respect to α and β applying the current time series.
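Equations (12.7) can also be sketched directly in Python (the function name is ours; the initialization ℓ0 = y1, b0 = y2 − y1 is the one given in the text):

```python
def holt_forecast(y, alpha, beta, h):
    """Holt's linear method, equations (12.7).
    Returns the forecasts for the next h periods."""
    level, trend = y[0], y[1] - y[0]      # l0 = y1, b0 = y2 - y1
    for yt in y[1:]:                      # smoothing updates (l0, b0 already use y1, y2)
        prev_level = level
        level = alpha * yt + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + k * trend for k in range(1, h + 1)]   # y_hat_{t+h} = l_t + h*b_t
```

On an exactly linear series the method reproduces the line: holt_forecast([2, 5, 8, 11], 0.8, 0.2, h=2) gives forecasts close to [14, 17].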

12.7 Holt-Winters additive method

The Holt-Winters method is an extension of Holt’s method to capture seasonality. There are two versions of the Holt-Winters seasonal method: one with additive seasonal effect and one with multiplicative seasonal effect. For the additive method the underlying stochastic process is assumed to be of the following form:

Yt = µ + δt + ξt + ϵt

where, as before, µ and δ are unknown constants and ϵt is a stochastic variable with zero mean. ξt is a seasonal term. The parameter m denotes the period of the seasonality, i.e., the number of seasons in a year. For example, for quarterly data m = 4 and for monthly data m = 12. m is assumed to be known. Note that we do not make any parametric assumption regarding ξt. One such parametric form could be η sin(ωt), which could be reasonable for products sold mainly in either the winter or the summer season. The forecasting equations for h time units ahead are:

yˆt+h = ℓt + hbt + st−m+h

ℓt = α(yt − st−m) + (1 − α)(ℓt−1 + bt−1)    (12.8)
bt = β(ℓt − ℓt−1) + (1 − β)bt−1

st = γ(yt − ℓt−1 − bt−1) + (1 − γ)st−m

Compared to Holt’s method st is the seasonal term for time t and γ is introduced as a smoothing parameter for the seasonal term (0 ≤ γ ≤ 1). Observe that for the level ℓt, the most recent seasonal term for this “period” st−m is subtracted from yt, since ℓt corresponds to µ + δt and not µ + δt + ξt.

For initiating the procedure we may use ℓ0 = y1 and b0 = y2 − y1, as for Holt’s method. To initialize the seasonal terms for the first period (year) we may use:

st = yt − (1/m) ∑_{i=1}^{m} yi    (12.9)

To optimize performance, i.e., to obtain numerical values for α, β and γ, we again minimize Equation (12.6) applying the current time series.
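A sketch of the additive method in Python (the function name is ours; it is initialized with ℓ0 = y1, b0 = y2 − y1 and equation (12.9), and the smoothing updates start once a full season of seasonal terms exists):

```python
def holt_winters_additive(y, m, alpha, beta, gamma, h):
    """Holt-Winters additive method, equations (12.8)."""
    ybar = sum(y[:m]) / m
    s = [yt - ybar for yt in y[:m]]              # s_1..s_m from eq (12.9)
    level, trend = y[0], y[1] - y[0]             # l0 = y1, b0 = y2 - y1
    for yt in y[m:]:                             # updates for t = m+1, ..., t0
        prev_level, prev_trend = level, trend
        level = alpha * (yt - s[-m]) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        s.append(gamma * (yt - prev_level - prev_trend) + (1 - gamma) * s[-m])
    # y_hat_{t+h} = l_t + h*b_t + s_{t-m+h}; the seasonal index wraps for h > m
    return [level + k * trend + s[-m + (k - 1) % m] for k in range(1, h + 1)]
```

With all smoothing parameters set to zero the recursion simply extrapolates the initial level, trend and seasonal pattern, which gives a quick structural check of an implementation.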

12.8 Holt-Winters multiplicative method

In supply chain management the additive method is hardly used, because seasonal variation in, for example, sales would most likely be “proportional” to the baseline volume. Therefore the multiplicative method is the most used approach. The underlying stochastic process is assumed to be of the following form:

Yt = (µ + δt)ξt + ϵt

where the terms are as for the additive method. The forecasting equations for h time units ahead are:

ŷt+h = (ℓt + hbt)st−m+h
ℓt = α(yt/st−m) + (1 − α)(ℓt−1 + bt−1)    (12.10)
bt = β(ℓt − ℓt−1) + (1 − β)bt−1
st = γ yt/(ℓt−1 + bt−1) + (1 − γ)st−m

Observe that for the level ℓt, yt is divided by the most recent seasonal term for this “period” st−m, since ℓt corresponds to µ + δt and not (µ + δt)ξt. For initiating the procedure we may use ℓ0 = y1 and b0 = y2 − y1 as for Holt’s method. To initialize the seasonal terms for the first period (year) we may use:

st = yt / ((1/m) ∑_{i=1}^{m} yi)    (12.11)

To optimize performance, i.e., to obtain numerical values for α, β and γ, we again minimize Equation (12.6) applying the current time series.
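A corresponding sketch of the multiplicative method (the function name is ours; same initialization of ℓ0 and b0 as above, with the seasonal terms from equation (12.11)):

```python
def holt_winters_multiplicative(y, m, alpha, beta, gamma, h):
    """Holt-Winters multiplicative method, equations (12.10)."""
    ybar = sum(y[:m]) / m
    s = [yt / ybar for yt in y[:m]]              # s_1..s_m from eq (12.11)
    level, trend = y[0], y[1] - y[0]             # l0 = y1, b0 = y2 - y1
    for yt in y[m:]:                             # updates for t = m+1, ..., t0
        prev_level, prev_trend = level, trend
        level = alpha * yt / s[-m] + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        s.append(gamma * yt / (prev_level + prev_trend) + (1 - gamma) * s[-m])
    # y_hat_{t+h} = (l_t + h*b_t) * s_{t-m+h}; the seasonal index wraps for h > m
    return [(level + k * trend) * s[-m + (k - 1) % m] for k in range(1, h + 1)]
```

The only differences from the additive sketch are the divisions in the level and seasonal updates and the multiplication in the forecast equation.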

Problem 12.1 The file http://folk.ntnu.no/jvatn/eLearning/TPK4161/SalesTomatoes.xlsx contains sales of tomatoes for a period of some 250 days. Apply the Holt-Winters additive method and the Holt-Winters multiplicative method on the data where m = 6 corresponding to weekdays. Optimize the values of α, β and γ. Make predictions for the following 6 days and compare the results for the additive and multiplicative versions. ♢

Problem 12.2 (To be solved without a computer) Table 12.1 shows a snapshot of data used in the Holt-Winters additive method. Assume we know the value m = 4. Complete the table for the last m = 4 rows. Perform forecasts for the next m = 4 periods. Use α = β = γ = 0.5. ♢

Table 12.1: Data for Problem 12.2

 yt     ℓt     bt     st
 ...    ...    ...    ...
 13.5   14.1   0.09   -0.7
 17     14.8   0.37    2.3
 13     15.2   0.38   -2
 16.7   15.2   0.22    1
 14.3   .      .       .
 18.1   .      .       .
 14.3   .      .       .
 17.2   .      .       .

Bibliography

[1] Fischhoff, B., S. Lichtenstein, P. Slovic, S.L. Derby, and R.L. Keeney. Acceptable Risk. Cambridge University Press, New York, 1981.

[2] Austeng, K. og Hugsted, R.: Trinnvis kalkulasjon, BATEK, 1995.

[3] Klakegg, O.J.: Tidsplanlegging under usikkerheit, BATEK, 1994.

[4] Klakegg, O.J.: Trinnvis-prosessen, Institutt for bygg- og anleggsteknikk NTH, 1993.

[5] Chapman, C. & Ward, S., 1997. Project Risk Management; Processes, Techniques and Insights. John Wiley & Sons, England.

[6] Hokstad, P. Life Cycle Cost Analysis in Railway Systems. SINTEF Report STF38 A98424. ISBN 82-14-00450-0. 1988.

[7] IEC 60300. International Standard, IEC 60300-3-3, Dependability management - Part 3: Application Guide - Section 3: Life Cycle Costing

[8] Kawauchi, Y. and Rausand, M. Life Cycle Cost (LCC) analysis in oil and chemical process industries. NTNU report. 1999.

[9] Keeney, R. L. and H. Raiffa. Decisions with Multiple Objectives: Preference and Value Trade-offs. New York: Wiley. 1976.

[10] Vatn, J. 1998. A discussion of the acceptable risk problem. Reliability Engineering and System Safety, 61(1-2):11-19, 1998.

[11] Øien, K, P.R. Hokstad, and R. Rosness. 1998. Handbook for performing Expert Judgement. SINTEF report STF38 A98419. Could be ordered from http://www.sintef.no.

[12] Aven, T. Foundations of risk analysis. Wiley, West Sussex, 2003.

[13] IEC 60300-3-9. International Electrotechnical Vocabulary (IEV) - Dependability management - Part 3: Application guide - Section 9: Risk analysis of technological systems. International Electrotechnical Commission, Geneva, 1995.

[14] Kaplan, S (1997): The Words of Risk Analysis. Risk Analysis 17(4), 407-417.

[15] Klinke, A. and Renn, O. (2001): Precautionary principle and discursive strategies: classifying and managing risk. Journal of Risk Research 4 (2), 159-173.

[16] Needleman, K. Methods of valuing life. In Technological Risk. University of Waterloo, 1982

[17] ESA. 1991 European Space Agency - ESA, Expert Judgment, Requirements and Methods, PSS-01-405, Issue 1 Draft 1-Nov. 1991, Noordwijk, the Netherlands.

[18] Cooke, R., Experts in Uncertainty - Opinion and Subjective Probability in Science, Oxford University Press, 1991.

[19] Goossens, L.H.J., Cooke, R.M., van Steen, J.F.J., Expert Opinions in Safety Studies, Vol. 1: Final Report, TUDelft, 1989.

[20] van Steen, J.F.J., Oortman Gerlings, P.D., Expert Opinions in Safety Studies, Vol. 2: Literature Survey Report, TUDelft, 1989.

[21] Hokstad, P. and Øien, K., Expert Judgment - For Assessing Input Data to a Small/medium Size Reliability Study, Presentation at the Growth Point Centre Group Meeting, Trondheim 1994-08-25, STF75 S94018.

[22] Hokstad, P., Reinertsen, R. and Øien, K., Recommendations on the use of expert judgment in safety and reliability engineering studies. Two offshore case studies. Reliability Engineering and System Safety 61 (1998) 65-76.

[23] Klakegg, O. J., The stepwise process (In Norwegian), Institutt for bygg- og anleggsteknikk, NTH, 1993.

[24] Hauge, S. and Onshus, T. (2010). Reliability Data for Safety Instrumented Systems – PDS Data Handbook, 2010 Edition. SINTEF report SINTEF A13502. Trondheim, 2010.

[25] Austeng, K. og Hugsted, R., Stepwise calculation (In Norwegian), Institutt for bygg- og an- leggsteknikk, NTH, 1995.

[26] The PS 2000 project team (Hokstad, P., Klakegg, O.J., Rosness, R. and Øien, K.)

[27] Kahneman, D., Slovic, P. and Tversky, A., Judgment under Uncertainty. Heuristics and Biases. Cambridge, Cambridge University Press, 1982.

[28] Svenson, O., On expert judgments in safety analysis in the process industries. Reliability Engineering and Systems Safety, 25, 219-256, 1989.

[29] Arentz, C., Bakken, J., Kilde, H.S., Klakegg, O.J., Krogh, J., Competence as a steering parameter - The basis for development (in Norwegian), NTH Rapport NTH 95010, PS2000, 1995.

[30] Norman Dalkey, Olaf Helmer (1963) An Experimental Application of the Delphi Method to the use of experts. Management Science, 9(3), Apr 1963, pp 458-467

[31] Saaty, T.L. 1980 The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allo- cation, ISBN 0-07-054371-2, McGraw-Hill

[32] Archer, N. and Ghasemzadeh, F. (2004) Project Portfolio Selection and Management. In The Wiley Guide to Managing Projects, Morris, P. and Pinto, J.K. (Editors). Wiley. ISBN: 978-0-471-23302-2

[33] Phillips, DT., Ravindran, A. and Solberg, J. (1976) Operations research: Principles and practice. John Wiley & Sons. New York.

[34] Holt, CC. (1957) Forecasting seasonals and trends by exponentially weighted moving averages. International Journal of Forecasting, 20(1), pp 5 - 10

[35] Wagner, HM. and Whitin, TM. (1958) Dynamic version of the economic lot size model. Man- agement Science, 5, pp 89 - 96

[36] Høyland, K. and Wallace, SW. (2001) Generating Scenario Trees for Multistage Decision Prob- lems. Management Science, 47(2), pp 295 - 307

[37] Bjerke, MM. and Bykvist, HR. (2018) Managing Supply Uncertainty in the Operational Production Planning in a Whitefish Value Chain - A Stochastic Programming Approach. Master thesis. NTNU - Norwegian University of Science and Technology.

[38] Birge, JR. and Louveaux, F. (2011). Introduction to Stochastic Programming. Springer International Publishing, second edition.
