<<

Universita` degli Studi di Perugia

Facolta` di Scienze Matematiche, Fisiche e Naturali

Corso di Laurea Triennale in Informatica

The Lambert W on matrices

Candidato Relatore MassimilianoFasi BrunoIannazzo

Contents

Preface iii

1 The 1 1.1 Definitions...... 1 1.2 Branches...... 2 1.3 Seriesexpansions ...... 10 1.3.1 and the Lagrange Inversion Theorem. . . 10 1.3.2 Asymptoticexpansions...... 13

2 Lambert W function for scalar values 15 2.1 Iterativeroot-findingmethods...... 16 2.1.1 Newton’smethod...... 17 2.1.2 Halley’smethod ...... 18 2.1.3 K¨onig’s family of iterative methods ...... 20 2.2 Computing W ...... 22 2.2.1 Choiceoftheinitialvalue ...... 23 2.2.2 Iteration...... 26

3 Lambert W function for matrices 29 3.1 Iterativeroot-findingmethods...... 29 3.1.1 Newton’smethod...... 31 3.2 Computing W ...... 34 3.2.1 Computing W (A)trougheigenvectors ...... 34 3.2.2 Computing W (A) trough an iterative method . . . . . 36

A Complex numbers 45 A.1 Definitionandrepresentations...... 45

B Functions of matrices 47 B.1 Definitions...... 47

i ii CONTENTS

C Source code 51 C.1 mixW(, ) ...... 51 C.2 blockW(, , ) ...... 52 C.3 matW(, ) ...... 53 Preface

Main aim of the present work was learning something about a not- so-widely known special function, that we will formally call Lambert W function. This function has many useful applications, although its presence goes sometimes unrecognised, in and in physics as well, and we found some of them very curious and amusing. One of the strangest situation in which it comes out is in writing in a simpler form the function .. z. h(z)= zz whenever it makes sense, that has been proven to be equal, wherever it converges, to the more elegant form

W ( log(z)) − . log(z) −

The most interesting aspect of our function is probably that there does not exists an explicit expression for it, but its inverse has an easy and elegant definition, within very good regularity properties.

The fact makes the question challenging and, then, quite fun. And we had a lot of fun indeed, studying the scalar case, creating and drawing fractals, looking at them and performing a wide variety of numerical tests. We ended the first part of our work developing a new algorithm, slightly faster than the one we studied, that is implemented by the function lambertw(b,a) of Octave.

We did not get a complete satisfaction, since we did not manage in proving that our algorithm and the one we started from were conver- gent, as we – and many other people – suspected, but we got happy

iii iv CONTENTS enough when we found that experimental results confirmed our hy- pothesis.

Since we were having too much fun, we turned our attention to the matrix case, more enveloped and then more suitable to rack our brains. We had indeed hard times, fighting against unstable and non- convergent algorithms. Nevertheless, at a certain point things changed, and we found a stable and convergent, even though slow, algorithm for computing the Lambert W function of matrix argument. We showed also that the currently used matrix algorithm had some defects, that the one we proposed did not have, even though is still less accurate in some cases.

At the end of our work we felt a little bit more familiar with W , so that now among us we call it, very informally, the Lambert. Chapter 1

The Lambert W function

1.1 Definitions

The Lambert W function is a multivalued complex function defined, for each x C, as the solution of the ∈ W (x)eW (x) = x, x C, (1.1) ∈ or, in a certain sense, as the inverse of the function f : C C, defined by x xex. → 7→ It should be noted that Equation (1.1), assuming x = 0, can be rewritten as 6

W (x) + log(W (x)) = log(x), x C (1.2) ∈ where we consider log(x) multivalued and fix the value of log(W (x)) by cutting the z-plane as we will see later. Note that (1.2) is the defining equation of the so-called Wright ω function [GLM99]. We will use the letter W for this function, following E. M. Wright usage (e.g. [Wri59]), that became a standard after the publication of [CGH+96], and will call it the Lambert W function because it is the of a special case of Euler’s version of Lambert’s series solution of the trinomial equation x = q + xm, where q and m is a positive . The relationships between that serie∈s expansion and our function have been deeply investigated by Corless et al. in [CGH+96].

1 2 CHAPTER 1. THE LAMBERT W FUNCTION

1.2 Branches

Since the Lambert W function is multivalued, choosing a convention for naming branches and branch points is mandatory, hence we will present the notation we will use in this work, recalling some basic concepts about multivalued functions, branches and branch points. Let us consider a function f : C C. We can create two planes, a z-plane for the domain space and a→w-plane for the range one. Thus we can view f(z) as a mapping from the z-plane to the w-plane, and in order to understand how such mapping works, we analyse how various geometric in the z-plane are mapped in the w-plane by w = f(z). To get a deeper insight, let us consider one of the simplest non- trivial complex functions, the p-th root function defined by

w = zp, p> 0. (1.3)

From the polar decomposition of a we get that

w = zp = zpepiθ (1.4) and then

w = z p, (1.5) | | | | arg(w)= p arg(z). (1.6)

Equation (1.5) shows that the circle z = ρ0 in the z-plane is mapped to the circle w = ρp, while equation| | (1.6) that a ray arg(z) = θ | | 0 0 issuing fom the origin in the z-plane is mapped to a ray arg(w)= pθ0 in the w-plane. In other words, as in the z-plane z moves in the positive direction at constant angular velocity around the circle of a radius of ρ, w moves in the w-plane around the circle of a radius of ρp, in the same direction but at p times the angular velocity (see Figure 1.1). Note that as in the z-plane z traverses the ray from 0 to at a constant speed, instead, w traverses the image ray in the same∞ direction but at an increasing speed. The positive real axis in the z-plane, that is a ray with angle 0, is mapped to the positive axis of the w-plane by the usual rule x xp. Turning back to the problem of finding an for7→ w = zp, we want to remark that every point w = 0 is hit by exactly p distinct 6 1.2. BRANCHES 3

(a) z-plane (b) w-plane

Figure 1.1: The mapping of two vectors under w = zp for p = 2. values of z, the p p-th roots of z, then, in order to define an inverse function, we must restrict the domain in the z-plane so that each value w is hit by only one value of z: there are several ways of doing this, so we will proceed somewhat arbitrarily. Note that as rays sweep out an open sector of 2π/p of the z-plane, with the angle increasing from π/p to π/p, the image rays sweep out the entire w-plane, except for the− negative real axis, with the angle of the rays increasing from π to π. Thus we can draw a branch cut in the w-plane along the negative− real axis, from to 0, and define in that range an inverse function we will call the −∞ of the p-th root function, and whose value is the unique p-th root lying in the aforementioned sector. The function we have just described is not the only continuous inverse function of w = zp that we can define, since for each sector Sk of the circle swept out by θ, where π π (2k 1) <θ< (2k + 1), k N, k

First of all, we put w = W (z) and z = wew, and then specify the boundary curves that partition the w-plane and their mapping to the z-plane. If we put w = ξ + iη, (1.8) z = x + iy, (1.9) by equating (1.9) and z = wew, we get x =eξ(ξ cos η η sin η), (1.10) − y =eξ(η cos η + ξ sin η). (1.11) Note that in that case the w-plane maps onto the z-plane, while for the p-th root it was the z-plane that mapped onto the w-plane. That change of notation is because of the standard usage, and is not our arbitrary choice. Now let us impose the z-plane branch cuts for W to be similar to that for the w-plane of the p-th root function (and, more properly, of the ), by finding where the axis z C Im(z) = 0, Re(z) < 0 is mapped by the W function. If we put{ y∈= 0| in (1.11), we get } 0=eξ(η cos η + ξ sin η) 0= η cos η + ξ sin η ⇔ that holds when cos η η =0 or ξ = η . (1.12) − sin η We can draw in the w-plane the curves that verify the second equa- tion in (1.12), obtaining the plot in Figure 1.2, where regions have been numbered conveniently. We will define the k-th branch of W as the one that takes values in region k, and will denote it by Wk. The curves that partition the w-plane are the inverse images of the negative real axis of the z-plane under the map w wew and can be described anlitically: 7→ the that separates the values of the principal branch, namely W0 (Figures 1.3 and 1.4), from the values of W−1 (Figures 1.5 and 1.7) and W1 (Figures 1.5 and 1.6), is w C, w = η cot η + iη : π<η<π,η =0 1 , (1.13) { ∈ − − 6 }∪{− } the curve separating W−1 and W1 is ( , 1], and the curves sepa- rating the remaining branches of W are−∞ − w C, w = η cot η + iη : 2kπ<η< 2(k + 1)π, η =0, k Z { ∈ − − 6 ∈(1.14)} 1.2. BRANCHES 5

Branch k = 2

Branch k = 1

Principal Branch k = 0

Branch k = −1

Branch k = −2

Figure 1.2: The ranges of the branches of W (z). A number is given to each branch.

Now, to show that each one of those regions maps bijectively onto the z-plane, we will use Theorem 1.2.1, a special case of the theorem that holds when domain and range of the mapping have the same dimension. We need to give one more definition before enunciating it.

Definition 1.2.1 (Jacobian Matrix). Let f : Rn Rn be contin- I ⊂ → uously differentiable, and let f1,...,fn be such that

f1(x) . f(x)= . , x Rn. (1.15)   ∈ fn(x)   6 CHAPTER 1. THE LAMBERT W FUNCTION

ℑ(w)

ℜ(w)

Figure 1.3: The branch cut for W0(z). Both the heavy solid and dashed lines are the images of the same kind of lines in Figure 1.4, and give indica- tion about closure, as specified in Figure 1.4.

Then the Jacobian matrix of f in x is defined as the matrix ∈I ∂f1 ... ∂f1 ∂x1 ∂xn . . . Cn×n Jf (x)= . .. . (1.16)   ∈ ∂fn ... ∂fn  ∂x1 ∂xn  Theorem 1.2.1 (Inverse Function Theorem) . Let Ω Rn be an open set, let f : Ω Rn be a continuously differentiable function,⊂ and let S = f(Ω) = →y Rn : y = f(x),x Ω . If, for some point a Ω, it { ∈ ∈ } ∈ holds that det(Jf (a)) = 0, then there is a uniquely defined function g and two open sets X 6 Ω and Y S such that ⊂ ⊂ 1. f : X Y is bijective; → 2. g is continuously differentiable on Y and g(f(x)) = x for each x X ∈ 1.2. BRANCHES 7

π ℑ(w)

ℜ(w) −2 2 4

−π

Figure 1.4: The range of W0(z). The heavy solid line indicates closure, the heavy dashed one that points on the edge do not belong to the region. The tight dashed line to the right of the imaginary axis is the image of the imaginary axis in Figure 1.3, the dotted one is the image of the circle in Figure 1.3.

Let us calculate the Jacobian determinant of the transformation defined by (1.10) and (1.11), considered as a mapping from R2 to R2.

The Jacobian matrix is

∂x ∂x ∂ξ ∂η Jf = ∂y ∂y = ∂ξ ∂η ! (1.17) ξ cos η η sin η + cos η ξ sin η sin η η cos η eξ , η cos η −+ ξ sin η + sin η ξ<− cos η− η sin−η ξ cos η  − −  8 CHAPTER 1. THE LAMBERT W FUNCTION

ℑ(w)

ℜ(w)

Figure 1.5: The branch cut for W (z), k = 0. The heavy solid and dashed k 6 lines have the same meaning they had in Figure 1.4. thus the Jacobian determinant of the transformation is 2ξ 2 det(Jf ) =e ((ξ cos η η sin η + cos η) +(η cos η + ξ sin η + sin η)) =e2ξ (((ξ + 1)cos− η η sin η)2 + ((ξ + 1)sin η + η cos η)2) =e2ξ (ξ + 1)2(cos2 η−+ sin2 η)+ η2(cos2 η + sin2 η) =e2ξ((ξ + 1)2 + η2).  (1.18) Now we are ready to prove that the Jacobian is nonzero everywhere except at a point that will be the :

det(J )=0 e2ξ((ξ + 1)2 + η2)=0 f ⇔ (ξ + 1)2 + η2 =0 ⇔ ξ =1,η =0. (1.19) ⇔ Equation (1.19) shows that the choices we made for the branch cut and for the region of the w-plane are consistent, since we have just 1.2. BRANCHES 9

ℜ(w) −6 −4 −2 2

ℑ(w)

Figure 1.6: The range of W1(z). The heavy lines have the usual meaning. The light solid line and the tight dashed one are the images of the positive real axis and of the imaginary one respectively. The light dotted, dashed and dot-dashed lines are the images of circles and semicircles in Figure 1.5.

proven that there exists a between each region of the w-plane and the z-plane minus ( , 1). −∞ −

As shown in Figure 1.2, W0 is the only branch that contains any part of the positive real axis, and is called the principal branch of the Lambert W function. The real values of the negative real axis lower or equal than 1 are in the range of W−1, and that’s the reason why these two branches− are called the real branches of W (see Figure 1.8).

More informations about closure and branch points can be found in the captions of Figures 1.3, 1.4, 1.5, 1.7 and 1.6. 10 CHAPTER 1. THE LAMBERT W FUNCTION

ℑ(w)

ℜ(w) −6 −4 −2 2

−3π

Figure 1.7: The range of W−1(z). All the lines have the same meaning as in Figure 1.7.

1.3 Series expansions

1.3.1 Taylor series and the Lagrange Inversion Theorem. The simplest Taylor series for W is the series expansions about the origin, that we will derive now, using the Lagrange Inversion Theorem. Even thought it is not widely known, it represents a powerful method for finding all terms in a reverted series, so we shall enunciate it. Since there are many different forms of this theorem for special cases, we will use the following version, that provides a formula for computing the reversion series of the Lambert W function in a neigh- borhood of 0.

Theorem 1.3.1 (Lagrange Inversion Theorem). Let C be an open neighborhood of 0 and let f : C be such that fUis ⊂ analytic with f(0) = 0. Then there is an openU → neighborhood of 0 and an analytic 6 V 1.3. SERIES EXPANSIONS 11

1

−1 1 2 3

−1

−2

−3 W 0 W 1 −4

Figure 1.8: The two real branches of W (x). function g : C that satisfies V → g(x)= xf(g(x)), x . (1.20) ∈V

Furthermore, the theorem provides a direct formula for computing the coefficients of the reversion series. Indeed it holds that

κ(n 1,f n) κ(n,g)= − for n N (1.21) n ∈ where we denote by κ(n,f) the coefficient of xn in the power series expansion of f(x) at 0.

If we put f(y)=e−y in (1.20), it becomes g(x)= xe−g(x) and then x = g(x)eg(x), that means g is a determination of the Lambert function. 12 CHAPTER 1. THE LAMBERT W FUNCTION

In order to apply Theorem (1.3.1) we need to calculate κ(n,g). We get ∞ ∞ ( y)ν ( 1)ν f(y)= e−y = − = − yν, ν! ν! ν=0 ν=0 X∞ X∞ ( yn)ν ( n)ν f n(y)= e−yn = − = − yν, ν! ν! ν=0 ν=0 X X ( n)ν κ(ν,f n)= − , ν! κ(n 1,f n) ( n)n−1 κ(n,g)= − = − . (1.22) n n! Now, using (1.22), we can write down the series expansion of a determination of W about 0. Indeed, ∞ ∞ ( n)n−1zn W (z)= κ(n,g)zn = − (1.23) n! n=1 n=1 X X and since W (0) = 0, we find that W (z)= W0(z). Let us find the of (1.23). Applying the we get

n+1 (n+1)n n+1 an+1 ( 1) (n+1)n! z lim = lim − − →∞ →∞ nn 1 n an n ( 1)n zn n! − − ( 1)(n + 1)n 1z = lim − n→∞ nn−1

n−1 1 = lim 1+ z n→∞ − n   n−1 1 1 1+ n = lim 1+ 1 z n→∞ − n 1+   n 1 n 1 = lim 1+ z n→∞ − n 1+ 1   n

= ez . (1.24) |− | Thus, the series (1.23) converges for z such that ez < 1, that is z 1/e, and then we can conclude that the radius|− of convergence| of | |≤ −1 the Taylor series expansion of W0(z) around 0 is e . 1.3. SERIES EXPANSIONS 13

1.3.2 Asymptotic expansions The discussion about the asymptotic expansions is quite pedantic and, in our opinion, not very useful according to the purpose of that work. Nevertheless, we will use these expansions later to determine a good initial guess for some iterative methods, thus we will briefly1 give these series. The asymptotic expansions of the Lambert W function at both 0 and coincide and the common value is, correcting a typographical error∞ in equation (4.20) of [CGH+96],

Wk(z) = log z +2πik log(log z +2πik) ∞ ∞ − m −j−m + cjm log (log z +2πik)(log z +2πik) . (1.25) j=0 m=1 X X Here log z +2πik is the k-th branch of the complex logarithm and

1 k + m c = ( 1)k (1.26) km m! − k +1   k + m where is a Stirling cycle number (see [JCHK95] for further k +1 details about them). If we put p = log z +2πik, we get the shorter form

∞ ∞ W (z)= p log(p)+ c (logm p)(p)−k−m. (1.27) k − jm j=0 m=1 X X We will use the first two terms of that formulation of the series expansion in the next chapter. Another asymptotic series expansion we will use later is the one about the branch point:

∞ W (z)= µ pℓ , k = 1, 0, 1 (1.28) k ℓ k − Xℓ=0 1See [CGH+96] for a detailed proof and [dB81] for further explanations about the asymptotic expansions. 14 CHAPTER 1. THE LAMBERT W FUNCTION where µℓ can be computed up to any order using the recurrence

k 1 µ − α − α µ − µ = − k 2 + k 2 k k 1 , µ = 1, µ =1 (1.29) k k +1 2 4 − 2 − k +1 0 − 1   α = k 1µ µ − , α =2, α = 1 (1.30) k − j k+1 j 0 1 − j=2 X that converges for p < √2 . | | | | For W0 we will put p0 = 2(ez + 1), while for W−1(z) with z such that Im(z) 0 and W (z) with z such that Im(z) < 0 we will use ≥ 1 p p− = p = 2(ez + 1). 1 1 − p Chapter 2

Lambert W function for scalar values

In this chapter we will present some ways to determine the values of W for both real and complex arguments. Since we will use various kind of iterative methods, we will start recalling some basic concepts about fixed-point iterations, and will continue describing the K¨onig family 1 of rational iterations, to which Newton’s and Halley’s methods belong. In order to be more precise about what we are going to talk about, let us clarify that we are only interested in iterative root-finding algo- rithms, standard methods for calculating the value of implicitly defined functions. We will call root of a function f, those values α such that f(α) = 0. The multiplicity of a root α is defined to be ν(α) = n N if f C(r+1)( α ) and if there exists ℓ such that ℓ< , ℓ = 0 and∈ ∈ { } ∞ 6 f(x) lim | | = ℓ. (2.1) x→α x a n | − | In such case it holds that f (i)(x)=0, for o i r, (2.2) ≤ ≤ f (r+1)(x) =0. (2.3) 6 The convergence speed is one of the crucial points when choosing an algorithm for solving a certain problem. Thus, we will recall, to end this introduction, the definition of order and rate of convergence. 1This family of iterations is known also as Householder’s method, since the American mathematician was one of the first to study it in [Hou06]. We will follow early Buff and Henriksen use and call it K¨oenig family, as they did in [BH03].

15 16 CHAPTER 2. LAMBERT W FUNCTION FOR SCALAR VALUES

Definition 2.0.1 (Rate and order of convergence). Let xi i∈N be a that converges to α such that, for all i N, it{ holds} that x = α. If there exists p 1 such that ∈ i 6 ≥

xi+1 α 0 <γ< 1 if p =1 lim | − | = γ with γ = (2.4) i→∞ x α p γ > 0 if p> 1 | i − |  then we will define p and γ to be respectively the order and rate of convergence of the sequence2.

2.1 Iterative root-finding methods

An iterative method constructs a sequence x ∈N whose form is { n}n x = g(x ) k+1 k (2.5) x = η  0 where g : C is a and η . The pointsz ˜ U ∈such→ that U z ˜ = g(˜z), to which the iteration∈ converges, U are defined∈ to U be the fixed points of the iteration (2.5). If the function g(z) is differentiable up to a sufficiently high order, then the order of convergence of g toz ˜ depends on the of g atz ˜. It must be stressed that, when these methods converge, the conver- gence to the required fixed point is local. Thus finding a good initial value for x0, is a quite challenging problem in developing algorithms for approximating the solution of an equation. Besides, if g has more than one root (as the Lambert W function has) we might wish to understand which initial values converge to which root.

2This is a second level division. Note that for p forced to be 1, the sequence is said to converge • superlinearly if γ = 0, • linearly if 0 < γ < 1, • sublinearly if γ = 1, but since the most interesting methods are those that belong to the first class, the order of convergence has been introduced to make easier to classify those interesting algorithms, and decide which ones have the higher order of convergence among them all. 2.1. ITERATIVE ROOT-FINDING METHODS 17

2.1.1 Newton’s method Since there exist some little differences whether we are working with real or complex numbers, we will start presenting how the method works with real valued functions of real variable. Let R be a closed , f : R such that f C2( ) and f(α) =I 0 ⊂ for α . Suppose that x I → is an approximation∈ I of α. ∈ I 0 ∈ I Using the Taylor about x0 , for h such that x0 + h , we get ∈I ∈I ′ 2 f(x0 + h)= f(x0)+ hf (x0)+ O(h ). (2.6)

We are looking for a value h such that x0 + h is a better approxi- mation of α. In order to find h we put f(x0 + h) = 0, and using the ′ ′ fact that f(x + h) f(x0)+ hf (x0), assuming that f (x0) = 0, we can write ≈ 6 f(x0) h = ′ (2.7) −f (x0) and then f(x0) x0 + h = x0 ′ . (2.8) − f (x0) Equation (2.8) gives us a formula to get a better approximation of α, if we already have a good one (where “good” could mean that our initial value x0 belongs to the region of local convergence of the method to α). Using this idea, Newton’s iteration is defined as

f(xk) xk+1 = xk ′ , − f (xk) (2.9) x =α, ˜  0 whereα ˜ is an initial guess of the root, but which need not be necessarily near the root to have convergence. To get a deeper insight, we will now enunciate a couple of theorems useful to better understand the potential of the that method. Theorem 2.1.1 (The order of convergence of Newton’s method). Let f C1([a,b]) and α [a,b] be such that f(α)=0, and let us suppose that∈ f ′(x) =0 for all∈x [a,b] α . Then 6 ∈ \ { } 1. if ν(α)=1, the method is locally convergent, and its order of convergence is at least 2. Inter alia, it is exactly 2 when f C2([a,b]) and f ′′(α) =0; ∈ 6 18 CHAPTER 2. LAMBERT W FUNCTION FOR SCALAR VALUES

2. if moreover f Cr([a,b]) and ν(α) = r 2, then the method converges linearly.∈ ≥ Theorem 2.1.2 (Convergence of Newton’s method in an interval). Let f : R R and let α be such that f(α)=0. Let us suppose that I ⊂ → f C2(S), where S =[α,α + ρ] or S =[α ρ,α] with ρ> 0, • ∈ − f(x)f ′′(x) > 0 for all x S α , • ∈ \ { } f ′(x) =0 for all x S α . • 6 ∈ \ { } Then for x S α it holds that lim →∞ x = α. 0 ∈ \ { } i { i} As we said before, Newton’s method can be applied to find the roots of complex functions. Let indeed f : Ω C, where Ω C, and let α C be a root of f, that is f(α) = 0, we→ define basin of⊂ attraction of α ∈the region in which every point can be chosen as initial value for a sequence converging to α. The boundary of any basin is the so-called Julia set, that, for many complex functions, is a fractal. It is worth pointing out that it may happen that a region of the may also not belong to any of those basins of attraction, as showed in Figure 2.1 for the Lambert W function.

2.1.2 Halley’s method Let R be a closed interval, f : R such that f C3( ), α such I ⊂ I → ∈ I that f(α) = 0 and x0 an approximation of α. Using the Taylor expansion about x ∈, for I h such that x + h , we get 0 ∈I 0 ∈I h2 f(x + h)= f(x )+ hf ′(x )+ f ′′(x )+ O(h3). (2.10) 0 0 0 2 0

We are looking for a value h such that x0 + h is a better approxi- mation of α. We could be tempted to try to find h, as we did in the previous case, applying the quadratic formula to (2.10), and getting a quite hard-to-manage formula, investigated by Gordon and Eschen in [GVE90]. We will follow a different direction which will lead to Halley’s method. Now, putting x0 + h = x, that means h = x x0, we can rewrite (2.10) as − (x x )2 f(x) f(x )+(x x )f ′(x )+ − 0 f ′′(x ) (2.11) ≈ 0 − 0 0 2 0 2.1. ITERATIVE ROOT-FINDING METHODS 19

(a) a =1 (b) a = i

(c) a =0.1 (d) a =0.1i

Figure 2.1: Basins of attraction for the Newton’s method. where we can put f(x) = 0, since we are interested to find an approxi- mation of a root of f better than x0. So (2.11) becomes

f(x0) 0= x x0 + − (2.12) − ′ (x x0) ′′ f (x0)+ 2 f (x0) and then f(x0) x = x0 − , (2.13) − ′ (x x0) ′′ f (x0)+ 2 f (x0) where substituting h from (2.7) we get

f(x0) x = x0 , (2.14) ′ f(x0) ′′ − f (x0) ′ f (x0) − 2f (x0) 20 CHAPTER 2. LAMBERT W FUNCTION FOR SCALAR VALUES and simplifying a little ′ 2f(x0)f (x0) x = x0 , (2.15) − 2[f ′(x )]2 f(x )f ′′(x ) 0 − 0 0 that can be considered a good (since rational for ) formula. The iteration derived from this formula is called Halley’s iteration. For sake of symmetry with (2.9) we shall write

′ 2f(xk)f (xk) xk+1 = xk ′ 2− ′′ − 2[f (xk)] f(xk)f (xk) (2.16) ( x0 =α ˜ whereα ˜ is a suitable initial value. Note that if we set 2f(x)f ′(x) h(x)= x (2.17) − 2[f ′(x)]2 f(x)f ′′(x) − by a straightforward computation, assuming that the function f is sufficiently differentiable, we get that h′(α)= h′′(α) = 0, where α is a root of f of multiplicity 1, that let us classify this method as a third order one. As the previous one, this method can be generalized to complex valued functions of complex variable, and as shown in Figure 2.2, seems to have better basins of attraction.

2.1.3 K¨onig’s family of iterative methods Those rational methods, that we found directly, belong to a more gen- eral family of rational iterations, that we will describe below. Let3 σ N, f : C C be a function such that f Cσ( ), α such∈ that f(α)I = ⊂ 0. Then→ the K¨onig method of order σ∈ appliedI ∈I to the function f, denoted by Kf,σ(z), is the following recurrence

1/f(z)σ−2 z = z +(σ 1) − , k+1 k − 1/f(z)σ 1 (2.18) ( x0 =α, ˜ whereα ˜ is the initial value. It can be also proven that, if ν(α) = 1, then the order of convergence of the method is exactly σ. For some general result on that family of rational iterations see [Hou70, BH03].

3Householder’s method can be obtained by substituting σ with d +1 2.1. ITERATIVE ROOT-FINDING METHODS 21

(a) a =1 (b) a = i

(c) a =0.1 (d) a =0.1i

Figure 2.2: Basins of attraction for Halley’s method.

Putting σ = 2 in (2.18), we get (2.9), while putting σ = 3 we get (2.16). Methods of higher order are seldom used, since they require to calculate higher order derivatives of f (and 1/f), and a alrger number of arithmetic operations, in the general case. Nevertheless, we are in a quite lucky case, having an explicit formula to calculate the n-th of f(x)= xex a, namely, −

∂nf =(x + n)ex, n N,n> 0, (2.19) ∂xn ∈ hence we will test a fourth-order method too. 22 CHAPTER 2. LAMBERT W FUNCTION FOR SCALAR VALUES

σ Iterative step x xke k −a 2 xk+1 = xk x − (xk+1)e k xk − 3 x = x 2(xke a) k+1 k x x (xk+2) − 2(xk+1)e k −(xke k −a) (xk+1) 2 2(ex+exx) x x 3 − 2e +e x  (−a+exx)3 (−a+exx)2  4 xk+1 = xk + x x 3 x x x x x x − 6(e +e x) + 6(e +e x)(2e +e ) − 3e +e x (−a+exx)4 (−a+exx)3 (−a+exx)2

Table 2.1: Methods of orders two through four in the K¨onig’s family

2.2 Computing W

The very first algorithm for computing all branches of W was probably developed by E M. Wright in [Wri59], an impressing though heuristic set of rules and ideas to hand calculating (by drawing too) a good approximation of Wk(z). We did not apply much of his ideas because he used different branch cuts, and the method he developed has not a big numerical interest. A fast and machine-ready algorithm for solving the equation wew = x when both x and w are real and x> 0, is given in [FSC73], with an implementation in Fortran 70. Here Fritsch et al. gave two different methods, a fourth-order and a third-order one, and choice to focus their attentions of finding an initial guess very close to the root, in order to achieve 6 figure accuracy after just one application of the first method, and machine precision accuracy after two applications of the second one. In [CGH+96], Corless et al. tested the performances of three differ- ent methods, namely

Newton’s method • Halley’s method • An extended version of the fourth-order method developed in • [FSC73], on a DEC (Alpha) 3000/800S, and concluded that Halley’s method, in the formulation of (2.13), is the fastest one among them for a wide set 2.2. COMPUTING W 23

of arguments and branches. The article ends up with a few details on how the initial guess is determined in . On 1998 Nicol N. Schraudolph released [Scha] an Octave implemen- tation of W that became very popular since fast and well working. He used Halley’s method, as presented by Corless, and based the strat- egy for the initial guess on the same paper, tuning the algorithm and making it simpler and more efficient than that described by Corless [Schb]. We based much of our work on his algorithm for choosing the initial guess, hence we will discuss it in a quite detailed way.

2.2.1 Choice of the initial value Have a look to the following listing, that belongs to the source file of Octave’s function lambertw(b,a), that is released under the terms of the GNU General Public License version 2 or later, where the argument of W is stored in z and the number of branch in b.

1 %% series expansion about 1/e % − % p = (1 2 abs(b)). sqrt(2 e z + 2); % w = (11/72)− ∗ p; ∗ ∗ ∗ % w = (w 1/3).∗ p; − ∗ 6 % w = (w + 1). p 1 % ∗ − % first order version suffices: % − w = (1 2 abs(b)). sqrt(2 e z + 2) 1; − ∗ ∗ ∗ ∗ − 11 %% asymptotic expansion at 0 and Inf % v = log(z + ˜(z b)) + 2 pi I b; v = v log(v +| ˜v); ∗ ∗ ∗ − 16 %% choose strategy for initial guess % c = abs(z + 1/e); c = (c > 1.45 1.1 abs(b)); − ∗ 21 c = c (b. imag(z) > 0) (˜imag(z)&(b == 1)); w = (1| ∗c). w + c. v; | − ∗ ∗ At line 10 series expansion (1.28) is used. The syntax is quite involved, for the sake of the brevity, but the value of w at the end of 24 CHAPTER 2. LAMBERT W FUNCTION FOR SCALAR VALUES the execution of the statement can be predicted as follows:

if we are interested in calculating W , and then b = 0, w will • 0 contain the first two terms of (1.28), where p = p0 = 2(ez + 1);

when calculating W− or W , the first two terms ofp the same • 1 1 equation will be stored in w, but the substitution p = p−1 = p1 = 2(ez +1) will be used; − otherwise,p w will hold a meaningless value, since this expansion • will not be used.

At lines 15 – 16 the asymptotic expansion (1.27) is applied. Also in this case a trick that need some explanation is used. Indeed Wk(z) is not defined when z = 0 and k = 0, in fact W0(0) = 0, but for z = b =0 the first expansion is used. Thus6 we would like our code to return some kind of NaN value. And in fact at the end of line 16 if z = b = 0 the variable w contains 0, otherwise contains -Inf, that will become Nan after just one iteration of the method. But, in our opinion, the most important contribution of Schrau- dolph is at lines 15 – 16, where the algorithm chooses which series shall be used for given values of z and b. Since analysing it line-by-line would we quite hard to do and could easily become incomprehensible, we will discuss the final value of w as b and z change. When b > 1, the series expansion about is used, while if b =0a series about| | e−1 is chosen when z +e−1 <∞1.45, otherwise the other one is preferred− 4. For b = 1, a series| expansion| about e−1 is used when z +e−1 < 0.34,| and| only if Im(z) < 0, when computing− W , or | | 1 Im(z) 0, when computing W−1. Proving≥ that those approximations and that strategy return a cor- rect initial value for the iterative methods is a quite challenging prob- lem, that we tried to solve, though without success. Besides, in order to better understand and possibly tune a little bit the algorithm, we made a few more experiments, to determine which other radii could be used to separate the region in which we approximate with (1.28) from that in which we prefer (1.27), and what their performances could be.

4 Note that we can equivalently imagine |z − z0| < ρ, where z, z0 ∈ C and ρ ∈ R, ρ > 0 as the set of z that are inside a circle of radius ρ and center z0. We will refer to this situation when will talk about radius, suggesting that the center we are interested in is −e−1. 2.2. COMPUTING W 25

Tests were performed as follows: for each radius r we wanted to test, we considered a certain number5 of points on the circumference of radius r centered in c = e−1, and an equal number of points at 0 − distance r+eps*2.5 from c0. Then we calculated the value of W−1,W0, 6 and W1 with both Octave’s lambertw function and same code mod- ified in order to use r as regions bounding radius and no more than ten iterative steps, and compared the results (maximum number of iterations, residual of wew z, difference between the two values). −

Branch KW,2 KW,3 Schraudolph 0 – 12.270 13.330 -1 10.860 10.510 11.370 1 13.360 14.330 15.650 -2 10.580 10.230 11.020 2 10.560 10.220 11.080 -3 9.318 10.230 11.090 3 9.396 10.230 11.080 -4 9.319 10.230 11.090 4 9.299 10.240 11.110 -5 9.332 10.250 11.120 5 9.302 10.240 11.110 -10 9.329 10.250 11.120 10 9.315 10.240 11.100 -100 8.095 8.466 9.098 100 8.060 8.436 9.09 0

Table 2.2: Execution time, expressed in seconds, of our Octave implemen- tation of the algorithms, on an arrays of 2 million complex elements.

For each one of the interesting branches, we found four values, namely

1st min – the minimum radius using which the iterative method • converges in less than eleven steps;

2nd min – the minimum radius using which the iterative method • converges in as many steps as the Schraudolph’s one would do;

5from 63 to 628319 6Schraudolph’s implementation is now part of the Octave’s specfun package 26 CHAPTER 2. LAMBERT W FUNCTION FOR SCALAR VALUES

Branch 1st min 2nd min old value 1rd max 2nd max 1 0.037 0.216 0.340 0.344 0.367 − 0 1.368 1.444 1.450 1.480 2.043 1 0.039 0.216 0.340 0.344 0.367

Table 2.3: Values of radius for the circles separating the region in which series expansion about e−1 is chosen from that in which expansion about − is preferred. ∞

1st max – the maximum radius using which the iterative method • converges in as many steps as the Schraudolph’s one would do; 2nd max – the maximum radius using which the iterative method • converges in less than eleven steps; that are resumed in Table 2.3. Since the situation did not seem to be clear enough, we tried to understand the situation using a more systematic set of experiments, considering for each branch k, the following sets

= a C : z W (a), z = ψ(a) , (2.20) Vk { ∈ i → k 0 } = a C : z W (a), z = ϕ(a) , (2.21) Uk { ∈ i → k 0 } where ψ(a) and ϕ(a) are the first two terms of the asymptotic series expansion of w exp(w) a about and about e−1, respectively. The resulting plot are shown− in Figures∞ 2.3. − Since we did not find any significant performance improvement changing that parameter, we decided to maintain the choices Schrau- dolph made, widely used in practice and thus more tested than ours experimental evidences.

2.2.2 Iteration We implemented and tested the performances of four different iterative methods, namely Schraudolph’s algorithm and our implementations of KW,2, KW,3 and KW,4, using the initial value as described above. We immediately noticed that KW,4 was too much slower than the others, since even tough it needs less steps to converge, it needed to do too many operations at each step, and we found it not interesting. 2.2. COMPUTING W 27

The different performances of the other methods was on other hand quite similar, and, to understand how things stood, we performed many tests, resulting in Table 2.2. We noted that Schraudolph’s iteration was generally the slowest, KW,3 was the fastest for lower determinations and KW,2 the best for higher ones, so we proposed an algorithm, that combines the merits of both algorithms we implemented so far, using the best iteration depending on the number of the branch. The algorithm we proposed is only a first hint, since we expect that for big values of the lowest determination Newton’s method would be the best choice, and for small values of higher branches Halley’s method would fit better. Some heuristics for answering to that question are under development, and it could be matter for a future work on the scalar Lambert W function, within another algorithm under studying, intended for calculating every branch of a real argument involving real arithmetic only. 28 CHAPTER 2. LAMBERT W FUNCTION FOR SCALAR VALUES

(a) V0 (b) Vk, k 6=0

(c) U0 (d) Vk, k 6=0

(e) U1 (f) U2

Figure 2.3: Region of convergence of the two asymptotic series expansion at e−1 and in the complex plane, ranging from 10 to 10 and from − ∞ − 10i to 10i. The white light lines are the axes of the complex plane, the − − small black cross near 0 is e 1. − Chapter 3

Lambert W function for matrices

In that last chapter we will present some aspects of computing W (A), where A Cn×n, introducing some theoretical results and a few al- gorithmic∈ ideas. We will also discuss how W (A) has been numerically calculated so far, underlining on the one hand its issues and limitations and showing on the other that a matrix iterative method can achieve a greater accuracy in the worst case. We advice the reader to have a look to Appendix B before starting this chapter, if one has no idea of what a function of a matrix is. We will follow more or less the same structure as the previous chap- ter, even though in this case things are more complicated, partly be- cause the matrix case is in general odder and a little bit more difficult than the scalar one, and partly because this time we did not start our study from a working implementation of an iterative method, and the problem we faced was, in some aspects, new.

3.1 Iterative root-finding methods

Since defining an iterative method needs the concept of derivative of a function at a point, we will start a definition of Fr´echet derivative, suitable for operators.

Definition 3.1.1. Let F : Cn×n Cn×n be a matrix function. Then → 29 30 CHAPTER 3. LAMBERT W FUNCTION FOR MATRICES the Fr´echet derivative of D at A Cn×n is a linear mapping ∈ DF : Cn×n Cn×n A → H DF [H] (3.1) 7→ A such that for all H Cn×n, it holds that ∈ F (A + H) F (A) DF [H]= o( H ). (3.2) − − A || || Note that the Fr´echet derivative is a linear operator, and thus can be seen as a matrix in a suitable basis of Cn×n. Indeed when we write DFA [H] we intend to stress that we are applying the linear opera- tor DFA to the matrix H, that is an operator itself, but is seen here as a point of the linear space of the matrices. Since, as a matrix, n2×n2 DFA C the operation we actually perform is a standard matrix ∈ 1 multiplication of DFA and and H, written in terms of the vec basis. The chain rule holds true for that kind of derivative, and that give us an effective rule for calculating the derivative of a compound func- tion. Let us observe also that what we said in the previous chapter about the order and rate of convergence is still true when we are working with matrices, by simply substituting C with Cn×n, and the modulus by a suitable matrix norm. 1The linear space Cn×n has many bases, we uses here the vec one because it allows us to “multiply” an n2 × n2 matrix for an n × n one, getting another n × n matrix as result. Let us clarify the situation with an example that involves C2×2 and a matrix B ∈ C4×4. The vec basis U of C2×2 is 1 0 0 0 0 1 0 0 U = , , , (3.3) 0 0 1 0 0 0 0 1

T and then A =(aij ), i,j =1, 2 can be written in the vec basis as a11 a21 a12 a22 . If B =(bij ), i,j =1,... 4, then 

b11 b12 b13 b14 a11 b11a11 + b12a21 + b13a12 + b14a22 b21 b22 b23 b24 a21 b21a11 + b22a21 + b23a12 + b24a22 C = = (3.4) b b b b a b a + b a + b a + b a  31 32 33 34  12  31 11 32 21 33 12 34 22 b41 b42 b43 b44 a22 b41a11 + b42a21 + b43a12 + b44a22       . Now we can reshape C and get

b a + b a + b a + b a b a + b a + b a + b a × B[A] = 11 11 12 21 13 12 14 22 31 11 32 21 33 12 34 22 ∈ C2 2 b21a11 + b22a21 + b23a12 + b24a22 b41a11 + b42a21 + b43a12 + b44a22 (3.5) 3.1. ITERATIVE ROOT-FINDING METHODS 31

Thus in this chapter we will find of matrices Xn n∈N in the form { } X = G(X ) k+1 k (3.6) X = Ξ  0 where G : Cn×n Cn×n is a function of matrix and and Ξ Cn×n. → ∈ The sequences Xn n∈N we are interested in have the nice property that { } n×n lim Xi = X C (3.7) i→∞{ } ∈ where X = G(X) is a fixed point of thee matrix iteration (3.6).

3.1.1e Newton’se method Let F : Cn×n Cn×n be the function we want to find the root I ∈ → of, let A be such that F (A) = O and X0 be an initial approximation∈ I of A. Let us now consider the first∈ two I terms of the Taylor series of F about X0, we have that

F (X +H)= F (X )+DF [H]+O(H2) F (X )+DF [H] , (3.8) 0 0 X0 ≈ 0 X0 where we wish to determine H Cn×n such that X + H is an ∈ 0 ∈ I approximation of A better than X0. Then we put in (3.8) F (X0 +H)= 0, getting

0= F (X0)+ DFX0 [H] , (3.9) where H is unknown. If DFX0 is not singular, we can revert it and write

0= F (X0)+ DFX0 [H] DFX0 [H]= F (X0) ⇔ −1− H = DF [F (X0)] . (3.10) ⇔ − X0 Now that we have a direct formula for calculating H, we can define Newton’s iteration as

−1 Xk+1 = Xk DF [F (X0)] , − X0 (3.11) X = A,  0 where if A is a good approximatione of A, where “good” has the usual meaning, then for each k, Xk+1 may be an approximation of A better than Xk. e 32 CHAPTER 3. LAMBERT W FUNCTION FOR MATRICES

From now on, we will refer to (3.11) as the true Newton’s method. That implies that at least another homonym method exists, and that this one is, in any sense, false2. We are now to present it. If f C2( ) and f ′(x) = 0 for all x , then we can apply the ∈ J 6 ∈ J Newton’s method to find any root αk , and the iteration will be in the form ∈J f(xk) xk+1 = xk ′ (3.12) − f (xk) for an initial value x0. To get the simplified method we have to modify (3.12) as follows X = X (f ′(X ))−1f(X ) (3.13) k+1 k − k k ′ where both f and f are functions of matrices, and for each k, Xk Cn×n. ∈ To better understand the situation let us write the simplified New- ton’s method’s iterative step for W . For scalar values the Newton’s method for computing W (a) where a C is ∈ x exk a x exk a x = x k − x = x k − (3.14) k+1 k − xex +ex ⇔ k+1 k − (x + 1)ex then for a matrix A Cn×n the simplified Newton’s method corre- sponds to the formula∈ X = X (X + I)−1 exp(X )−1(X exp(X ) A) (3.15) k+1 k − k k k k − where we denote by exp(Xk) the matrix exponential. Those two methods have, in general, a different behaviour for the same initial value X0. The simplified one could diverge when the other one converges, and the first could be stable when the second one is not. Nevertheless, for the Lambert W function the following theorem holds. Theorem 3.1.1 (Equivalence between true and simplified Newton’s methods). Let A Cn×n, let X = Y Cn×n be the initial value, and ∈ 0 0 ∈ let Xn n∈N and Yn n∈N be the two sequences of true and simplified Newton’s{ } method,{ respectively.} If both the methods can be applied and X (A)3, then for all k 0 it holds that X ,Y (A), and that 0 ∈ P ≥ k k ∈ P Xk = Yk. 2We will call it more politely the simplified one. 3 X0 is a polynomial of A 3.1. ITERATIVE ROOT-FINDING METHODS 33

Proof. We will prove the theorem by induction. If k = 0 we have noth- ing to prove, since both X0 = Y0 and X0,Y0 (A) are hypotheses. Let us now suppose that X = Y and X ∈P (A), and prove that k k k ∈P Xk+1 = Yk+1 and Xk+1 (A). We can rewrite the two sequences as follows ∈ P

Xk+1 = Xk + Hk, (3.16)

Yk+1 = Yk + Hk, (3.17) where, recalling (3.9) and (3.15), b −1 Hk = DF [F (Xk)] (3.18) − Xk H = (X + I)−1 exp(X )−1(X exp(X ) A) (3.19) k − k k k k − Cn×n Cn×n we denoteb by F : the function of matrices F (X) = X exp(X) A. → − Let us now calculate the Fr´echet derivative of F at Xk in the direc- tion of the increment Hk: DF [H ]= X d exp [H ]+ H exp(X ) Xk bk k Xk k k k = Xk exp(Xk)Hk + Hk exp(Xk) b b b = Hk(Xk + I)exp(Xk) b b = X exp(X )+ A − k k = bF (X ) (3.20) − k when we used the fact that if Hk commutes with Xk then d expXk [Hk]= exp(Xk)Hk (see for instance [Ian11]). Then it holds that

−1 DFX [Hk]= F (Xk) Hk = DF [F (Xk)] = Hk (3.21) k − ⇔ − Xk and we haveb that b X = Y ; • k+1 k+1 H (A), since H (A); • k ∈P k ∈P X ,Y ( ) since sums of polynomials of A. • k+1 k+1 ∈P A b 34 CHAPTER 3. LAMBERT W FUNCTION FOR MATRICES

3.2 Computing W

The main aim of this section is to analyze the currently implemented method for calculating W (A), when A Cn×n, and to show that in some cases the result can be more accurate,∈ without needing any extra precision, using an iterative method.

3.2.1 Computing W (A) trough eigenvectors That simple method uses the following simpler form of the definition B.1.2 to easily calulate f(A) when A is diagonalizable. Definition 3.2.1 (Function of a diagonalizable matrix). Let A Cn×n be diagonalizable, that is, there exist M,D Cn×n, where D is diagonal∈ and M is invertible, such that A = MDM∈−1. Then, if f is defined on the spectrum of A,

f(d11) − f(d22) − f(A)= Mf(D)M 1 = M   M 1. ...  f(d )   nn    (3.22) Since we know numerical methods for computing all the eigenvalues and eigenvectors of a general matrix n n, tough not accurately if the problem is ill-conditioned, and the Lambert× W function of a scalar value, definition (3.22) leads to a direct method for computing W (A). The first limitation the method has, as the reader can easily under- stand, is that it can be applied only when the argument A is diago- nalizable, that is a quite strong condition that many matrices do not respect. Let us consider, for the sake of example, the matrix

1 0 × A = C2 2 (3.23) 1 1 ∈   that is the simplest example of non-diagonalizable matrix. Since it has two eigenvalues λ1 = λ2 = 1, but only one linearly independent eigenvector, namely t(0, 1)T , there not exists any invertible matrix M −1 such that A = MDM , where D is the diagonal matrix d11 = λ1 =1 and d22 = λ2 = 1. 3.2. COMPUTING W 35

To introduce the second limitation we have to define what do we mean by eigenvalues’ conditioning. Let A Cn×n be a diagonalizable matrix, and let δA be a per- turbation of∈ A. We know that there exist M,D Cn×n, where M is invertible and D is diagonal, such that A = MDM∈ −1. We want to calculate the value of the diagonal matrix δD such that M −1(A + δA)M = D + δD. (3.24) We have that δD = X−1δAM (3.25) but we are only interested in a of the magnitude of δD, so let us consider its norm δD = M −1δAM M −1 δA M = κ(M) A (3.26) || || || || ≤ || || || || || || || || where κ(A)= A−1 A is the conditioning number of A and is a matrix norm||4. || || || ||·|| Since D is a diagonal matrix similar to A, we can conclude on the one hand that δD represents a measure of the sensitivity of the eigenvalues to small|| || numerical perturbations, and, on the other, that this value is strictly related to the conditioning of the matrix of eigenvectors M. Turning back to the specific problem at hand, that of computing W (A) using (3.22), let us investigate what happens when κ(M) is big- ger than the inverse of the of machine precision and the matrix of the eigenvectors is thus ill-conditioned, and eigenvalues can- not be determined with more accuracy than the square root of the machine precision. Then when we compute W (D) we cannot be accurate because half of all significative figures of at least one of the elements of D (one of the eigenvalues) are wrong, and, computing W (A) = MW (D)M −1 propagates this error ratio to the whole resultant matrix. We will clarify the problem with an example. Let

3 1 2 2 0 1 1 Cn×n A = 2 2 1 . (3.27)  1 1 2 ∈ 2 − 2 4   Note that κ(A) ≥ n if we use the Frobenius norm || · ||F , κ(A) ≥ 1 if we use || · ||1, ||·||2 or ||·||∞ 36 CHAPTER 3. LAMBERT W FUNCTION FOR MATRICES

We can calculate its eigenvalues as roots of the characteristic polyno- mial p(λ) = det(A λI), that are λ1 = 2 and λ2 = λ3 = 1. We can − 5 now calculate the value of W0(D) with Octave :

0.852605502013726 0 0 0 0.567143290409784 0 .  0 00.567143290409784  (3.28) Let us calculate the value of W0(D) where D is calculated with Octave’s eig function. Its elements are e e 2.000000000000000 0 0 0 1.000000012529664 0  0 00.999999987470336  (3.29) and since we know what the correct values of D are, we can write D = D + δD where 4 10−16 0 0 e δD = − ×0 1.25296635 10−8 0 .  0 0 × 1.2529663 10−8 − ×  (3.30) Note that the first eigenvalue has an accuracy of the order of the ma- chine precision, while the second one and the third one have an accuracy of the order of square root the machine precision. Now we can calculate f(D)

e 0.852605502013725 0 0 0 0.567143294944222 0  0 00.567143285875345  (3.31) that has, as we could expect, the same lack of accuracy as the elements of D.

3.2.2e Computing W (A) trough an iterative method As we already said, we tried to develop an iterative method for com- puting W (A), applying what we learned looking at the scalar case. We

516 significant figures 3.2. COMPUTING W 37

100 100

10-2 10-2

10-4 10-4

10-6 10-6

10-8 10-8

10-10 10-10

10-12 10-12

10-14 10-14

10-16 10-16

10-18 10-18

10-20 10-20 0 5 10 15 20 0 5 10 15 20

100 100

10-2 10-2

10-4 10-4

10-6 10-6

10-8 10-8

10-10 10-10

10-12 10-12

10-14 10-14

10-16 10-16 0 5 10 15 20 0 5 10 15 20

Figure 3.1: That figure is referred to a well-conditioned matrix (κ(A) 10) ∼ having all eigenvalues in the region in which the scalar algorithm would use the asymptotic expansion at . ∞ recall that the convergence of a matrix iteration applied to a matrix A is guaranteed by the convergence of the scalar iteration applied to each of the eigenvalues of A, in view of Theorem 2.4 of [Ian09]. From here on in we will denote the argument by A Cn×n, and its eigenvalues by ∈ λ1,...,λn. Our very first trial was an implementation of the simplified Halley’s method, namely

Xk+1 = Xk (Xk exp(Xk) A) − − −1 1 − exp(X )(X + I) (X +2I)(X + I) 1 , (3.32) × k k − 2 k k   and 3 2 Xk+1 = Xk exp(Xk)+ 2I +4Xk + Xk A 2 −1 exp(Xk) 2I +2Xp + Xp + (2I + Xk)A , (3.33) ×   38 CHAPTER 3. LAMBERT W FUNCTION FOR MATRICES

100 100

10-2 10-2 10-4 10-4 10-6 10-6 10-8

10-8 10-10

10-12 10-10 10-14 10-12 10-16 10-14 10-18

10-16 10-20 0 5 10 15 20 0 5 10 15 20

1030 100

25 10 10-2

1020 10-4

1015 10-6 1010 10-8 105

10-10 100

-12 10-5 10

10-10 10-14 0 5 10 15 20 0 5 10 15 20

Figure 3.2: Here an ill-conditioned matrix (κ(A) 105) is used, but all ∼ the eigenvalues are still in the same region. Note that the symmetric version of the Newton method is the only stable one. where the initial value was obtained using the series expansion about e−1 and rewritten as matrix power series. That method worked −only when∞ the argument was very well-conditioned (κ(A) smaller than twenty), and moreover, it often converged to a mixed determination, i.e. we got a matrix W such that the norm of the residual W exp(W ) A − was small, that had eigenvalues µ1 = Wk1 (λ1),...,µn = Wkn (λn) where in general ki could be different from kj for i = j. For ill-conditioned matrices the method6 was initially convergent, since we could see that the norm of the residual rapidly decreased, but after a few iterations it started to linearly increase. Then we tried also the simplified Newton method, namely X =(X2 + A exp( X))(X + I)−1 (3.34) k+1 k − that had a little bit slower convergence speed, but a little bit more stability. In order to make it much more stable, we tried to increase 3.2. COMPUTING W 39

104 104

102 102

100 100

10-2 10-2

10-4 10-4

10-6 10-6

10-8 10-8

10-10 10-10

10-12 10-12

10-14 10-14

0 5 10 15 20 0 5 10 15 20

104 104

102 102

100 100

10-2 10-2

10-4 10-4

10-6 10-6

10-8 10-8

10-10 10-10

10-12 10-12

10-14 10-14

0 5 10 15 20 0 5 10 15 20

Figure 3.3: Here we have a well-conditioned matrix, but with eigenvalues belonging to both the convergence region. Note that even though all the methods initially converge to a value X that verifies X exp(X) = A, the determination is mixed. the symmetry of the formula, getting X =(X2 + A1/2 exp( X)A1/2)(X + I)−1 (3.35) k+1 k − and then X =(X + I)−1/2(X2 + A1/2 exp( X)A1/2)(X + I)−1/2. (3.36) k+1 k − The method got better, and gave a correct result for matrices not so well-conditioned, so, even tough it still had some stability problems, we tried to do the same with iterations (3.32) and (3.33), obtaining very long formulations that improved the quite weak stability of the initial methods. Since we got an improvement, but the issue had not been completely solved yet, we tried to modify the method in a more radical and com- putationally expensive way, using a matrix projection, a technique that 40 CHAPTER 3. LAMBERT W FUNCTION FOR MATRICES

102 100

100 10-2

10-2 10-4

10-4 10-6

10-6 10-8

10-8 10-10

10-10 10-12

10-12 10-14 0 5 10 15 20 0 5 10 15 20

10140 100

120 10 10-2

10100 10-4

1080 10-6 1060 10-8 1040

10-10 1020

-12 100 10

10-20 10-14 0 5 10 15 20 0 5 10 15 20

Figure 3.4: The worst case, an ill-conditioned matrix with mixed eigen- values. All the methods are very unstable, and initially converge to a mixed determination. let us find, for a given matrix X, the matrix X such that X (A) ∈ C and, for all X∗ (A), it holds that X X X∗ X . To do that we laid our∈ hands C on projection|| matrices,− ||e ≤that || we− aree|| going to formally define. e Definition 3.2.2 (Projection matrix). A matrix Π Cn×n is called a projection matrix if ∈ Π = Π∗ and Π2 = Π. If u C the vector Πu is called the projection of u. ∈ Note that in our case the projection matrix had size n2, and we applied it to an n n matrix, written in the vec basis, reshaped at the end. × 3.2. COMPUTING W 41

Since we wanted to project a matrix onto the subspace (A), we needed to find the basis of that subspace. C Using the fact that vec(AXB)=(BT A)vec(X) we wrote ⊗ vec(AX)= vec(AXI)=(I A)vec(X) (3.37) ⊗ vec(XA)= vec(IXA)=(AT I)vec(X). (3.38) ⊗ We were looking for X such that AX XA = 0, that holds when also vec(AX XA) = 0. We substituted− (3.37) and (3.38), getting − vec(AX XA)=(I A AT I)vec(X) (3.39) − ⊗ − ⊗ where L = I A AT I is a linear application such that ⊗ − ⊗ (A) = ker(L). (3.40) C We wrote an algorithm that, using Octave’s svd function, finds a basis of eigenvectors of L with a Matlab-like method. We considered the matrix K Cn2×m made up by the m last eigenvectors of the single value decomposition∈ of L, where m is the multiplicity of the eigenvalue 0 in L, and the projector matrix P = KK∗ Cn×n. Thenwe ∈ projected once for every step of the method the matrix Xk+1 obtained onto the space of commutators, getting an algorithm that proved itself to be stable, when the iterative step (one of the least compact possible formulations)

X = X3 exp(X )+ 2I +4X +2 A k+1 k ∗ k k 2 −1 exp(Xk) 2I +2X k + Xk + (2I + Xk) A (3.41) × is used.   Solved that first issue, we concentrated on the problem of getting the correct determination of W . Indeed with the algorithm described so far three different situation were possible:

some eigenvalues converged, others diverged; • all the eigenvalues converged, some to the right determination of • W , some to another one (usually the conjugate of the right one);

all the eigenvalues converged to their right determination. • 42 CHAPTER 3. LAMBERT W FUNCTION FOR MATRICES

We made some tests, and concluded that the third case occurred when the number of the branch we wanted to calculate was not equal to 1, 0 or 1, or when, working on those “problematic” branches, all the− eigenvalues of the matrix belonged to the same region of the two described in Chapter 2. We had to find a method to make our algorithm converge to the right determination in those bad cases too, and our choice fell on Schur- Parlett’s way [DH03], and we developed the following algorithm. Since we had a stable method for computing W of a matrix having all the eigenvalues in the same region, we considered a Schur form6 of the matrix, where the eigenvalues on the diagonal were divided into two blocks, an head one, containing all those eigenvalues that could be calculated by an iteration starting from the asymptotic expansion about e−1, and a tail one, that contained all the eigenvalues that can be computed− starting from the asymptotic series expansion about . ∞ To divide them we used the function [U S] = schur (A, OPT) of Octave, with OPT=’d’, that indicates that all eigenvalues λi such that λ 1 should be moved to the leading block of S. | i|≤ Since we were not interested in λi such that λi 1, but in those −1 | | ≤ such that λi + z0 R, with z0 = e and R = 1.45 or R = 0.34 depending| on the| branch, ≤ we considered− the following matrix

(A + z I) A = 0 (3.43) R e that has eigenvalues in a ball of radius 1 centered in 0 if and only if A has eigenvalues in a ball of radius R centered in z0. Then we considered the ordered Schur decomposition

Q∗AQ = T (3.44)

6Let A ∈ Cn×n. Then there exist ae unitarye e matrixe Q,T ∈ Cn×n, where Q is unitary and T is upper triangular, such that

∗ Q AQ = T (3.42)

where tii = λi for each i =1,...,n and λi is an eigenvalue of A. Furthermore, Q can be chosen such that the eigenvalues λi appear in any order along the diagonal. 3.2. COMPUTING W 43 that substituted in (3.43) let us write

T = Q∗AQ (A + z I) = Q∗ 0 Q e e e e R Q∗AQ Q∗Q = e + z e R 0 R Qe∗AeQe z Ie e = + 0 R R e e e and then

z I Q∗AQ T 0 = TR z I = Q∗AQ − R R ⇔ − 0 e e e Q∗AQ = TR z I e ⇔ e e− e0e T = TR z0I. (3.45) ⇔ e e e −e Calculating the ordered decompositione of A and then rescaling the result back using (3.45), we got the Schur triangulation of A. Let us name the blocks of the resulting matrix as follows

We named the blocks of the resulting matrix as follows:
$$T = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}. \tag{3.46}$$
For the branches $k$ such that $|k| < 2$ we now knew how to correctly calculate $Z_1 = W_k(A_{11})$ and $Z_2 = W_k(A_{22})$, and since $W(A)$, like every function of $A$, is a polynomial in $A$ and therefore commutes with $A$, we had that

$$\begin{pmatrix} Z_1 & X \\ 0 & Z_2 \end{pmatrix} \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix} \begin{pmatrix} Z_1 & X \\ 0 & Z_2 \end{pmatrix}, \tag{3.47}$$
where $X$ is the unknown of the matrix equation (3.47). We determined $X$ by equating the top-right blocks of the two sides,

$$Z_1 A_{12} + X A_{22} = A_{11} X + A_{12} Z_2, \tag{3.48}$$
and rearranging to get the Sylvester equation

$$A_{11} X - X A_{22} = Z_1 A_{12} - A_{12} Z_2, \tag{3.49}$$
which we then solved using Octave's lyap function.
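Since lyap (provided by Octave's control package) solves the Sylvester equation $AX + XB + C = 0$, the call has to be arranged with the signs of (3.49) in mind; a minimal sketch, consistent with the matW listing in Appendix C, is the following (here A11, A22, A12, Z1 and Z2 are the blocks of (3.46) and the two diagonal-block values).

    C = A12*Z2 - Z1*A12;        % chosen so that (3.49) reads A11*X - X*A22 = -C
    X = lyap(A11, -A22, C);     % solves A11*X + X*(-A22) + C = 0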

Once we found $X$, we put the pieces together, getting the final result
$$W(T) = \begin{pmatrix} Z_1 & X \\ 0 & Z_2 \end{pmatrix}, \tag{3.50}$$
from which $W(A)$ is recovered through the similarity $W(A) = \widetilde{Q}\, W(T)\, \widetilde{Q}^*$. The method described here is only a first idea, which we implemented for $W_0$ in a crude way. Many optimizations are still needed, since the work done so far seems to be weak in particular cases, e.g. when the matrix argument has eigenvalues that differ only slightly. We are also working on a clustering algorithm, based on the same ideas, from which we expect better results in those particular conditions.

Remark on figures. In Figures 3.1, 3.3, 3.2 and 3.4, the top row reports the performance of two different implementations of Newton's method, the bottom row that of two different implementations of Halley's method. In both cases the standard implementation is on the left, the symmetric one on the right. The vertical axis uses a logarithmic scale and represents the norm of the residual $W \exp(W) - A$ (red line) and of the difference $W - \widetilde{W}$ (green line), where $W$ is determined using our algorithm and $\widetilde{W}$ using the function-of-a-matrix definition recalled below.

Appendix A

Complex numbers

A.1 Definition and representations

A complex number is an expression of the form $z = a + ib$, where $a, b \in \mathbb{R}$ are called the real part and the imaginary part of $z$, respectively. We will denote these by

$$x = \operatorname{Re}(z), \qquad y = \operatorname{Im}(z), \tag{A.1}$$
and the complex plane, i.e. the set of complex numbers, by $\mathbb{C}$. Since there exists a one-to-one correspondence between complex numbers and points of $\mathbb{R}^2$, namely $a + ib \mapsto (a, b)$, any point $(x, y) \in \mathbb{C}$ such that $(x, y) \neq (0, 0)$ can be described by polar coordinates $\rho$ and $\theta$, where $\rho = \sqrt{x^2 + y^2}$ and $\theta$ is the angle subtended by $(x, y)$ and the $x$-axis. The Cartesian coordinates can be recovered from the polar ones by

$$\begin{cases} a = \rho \cos\theta, \\ b = \rho \sin\theta, \end{cases} \tag{A.2}$$
which leads us to write, using the polar coordinates in complex notation,

$$z = a + ib = \rho(\cos\theta + i\sin\theta). \tag{A.3}$$

We define the modulus and the argument of $z$ to be the values of $\rho$ and $\theta$, respectively, and we will denote them by


$$\rho = |z|, \qquad \theta = \arg z. \tag{A.4}$$
Thus $\arg z$ is a multivalued function, defined for $z \neq 0$, and we will denote by $\operatorname{Arg} z$ its principal value, specified quite arbitrarily to be the value that satisfies $-\pi < \theta \leq \pi$. Another notation we will often use is the polar representation, which comes from substituting the Euler formula

$$e^{i\theta} = \cos\theta + i\sin\theta \tag{A.5}$$
into (A.3), and looks like

$$z = \rho e^{i\theta}, \qquad \rho = |z|, \quad \theta = \arg z. \tag{A.6}$$
Finally, the complex conjugate of a complex number $z = a + ib$ is defined to be $\bar{z} = a - ib$ or, equivalently, the number $\bar{z} \in \mathbb{C}$ such that
$$\begin{cases} \operatorname{Re}(\bar{z}) = \operatorname{Re}(z), \\ \operatorname{Im}(\bar{z}) = -\operatorname{Im}(z). \end{cases} \tag{A.7}$$
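These notions map directly onto Octave built-ins; the following quick illustration (ours, with an arbitrary value of z) is all that is needed in practice.

    z = 3 + 4i;
    rho   = abs(z);        % modulus, here 5
    theta = arg(z);        % principal argument
    zbar  = conj(z);       % complex conjugate, 3 - 4i
    rho*exp(1i*theta)      % recovers z, cf. (A.6)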

Appendix B

Functions of matrices

This appendix can be seen as a set of prerequisites for Chapter 3, and was placed here because we did not want to break the symmetry between the discussions of the scalar and of the matrix algorithms, since we took the notion of scalar function for granted.

B.1 Definitions

Let us point out that we are interested in defining a function of a matrix as a generalization of a scalar function $f : \mathbb{C} \to \mathbb{C}$ to the matrix case $f : \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$, which we will denote from here on by $f(A)$, with $A \in \mathbb{C}^{n\times n}$. We will give two definitions of $f(A)$, the first of which needs the notion of Jordan canonical form, which we briefly recall.

Theorem B.1.1 (Jordan canonical form). Let $A \in \mathbb{C}^{n\times n}$. Then there exist a non-singular matrix $Z \in \mathbb{C}^{n\times n}$ and a block diagonal matrix $J \in \mathbb{C}^{n\times n}$ such that $A = ZJZ^{-1}$ and
$$Z^{-1}AZ = J = \begin{pmatrix} J_1 & 0 & \cdots & 0 \\ 0 & J_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_k \end{pmatrix}, \tag{B.1}$$
where each $J_h$ is of the type

$$J_h(\lambda_i) = \begin{pmatrix} \lambda_i & 1 & 0 & \cdots & 0 \\ 0 & \lambda_i & 1 & \cdots & 0 \\ 0 & 0 & \lambda_i & \ddots & 0 \\ \vdots & \vdots & \vdots & \ddots & 1 \\ 0 & 0 & 0 & \cdots & \lambda_i \end{pmatrix} \in \mathbb{C}^{\mu_h \times \mu_h} \tag{B.2}$$
and

$$\sum_{t=1}^{k} \mu_t = n. \tag{B.3}$$
The matrix $J$ is called the Jordan matrix of $A$ and it is unique up to the ordering of the blocks $J_i$, while the transforming matrix $Z$ is not unique and depends on $J$.
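For example (an illustration we add here), a matrix with a double eigenvalue $2$ belonging to a single $2 \times 2$ block and a simple eigenvalue $3$ has Jordan matrix
$$J = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix},$$
so that $k = 2$, $\mu_1 = 2$ and $\mu_2 = 1$.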

Matrices (B.1) and (B.2) are called Jordan matrix and Jordan block, respectively. Moreover, for any eigenvalue $\lambda_i$ of $A$, we denote by $\nu_i$ the order of the largest Jordan block in which $\lambda_i$ appears; $\nu_i$ is called the index of $\lambda_i$. Note that the sum of the dimensions of the blocks that contain the same eigenvalue $\lambda_i$ is the algebraic multiplicity of $\lambda_i$, while the number of such blocks is its geometric multiplicity.

Definition B.1.1 (Function defined on the spectrum of A).¹ The function $f$ is said to be defined on the spectrum of $A$ if the values

$$f^{(j)}(\lambda_i), \qquad j = 0, \ldots, \nu_i - 1, \quad i = 1, \ldots, s, \tag{B.4}$$
exist, where $\lambda_1, \ldots, \lambda_s$ are the distinct eigenvalues of $A$. These are called the values of the function $f$ on the spectrum of $A$.

Now we are ready to give the first definition of f(A).

Definition B.1.2 (Matrix function via Jordan canonical form). Let $f$ be defined on the spectrum of $A \in \mathbb{C}^{n\times n}$ and let (B.1) be the Jordan canonical form of $A$. Then we define

$$f(A) = Z f(J) Z^{-1}, \tag{B.5}$$

¹This definition can be applied to any matrix. Nevertheless, if $A \in \mathbb{C}^{n\times n}$ is diagonalizable, we can simply say that $f : D \to \mathbb{C}$ is defined on the spectrum of $A$ if $\sigma(A) \subset D$.

where
$$f(J) = \begin{pmatrix} f(J_1) & 0 & \cdots & 0 \\ 0 & f(J_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(J_k) \end{pmatrix} \tag{B.6}$$
and

$$f(J_h(\lambda_i)) = \begin{pmatrix} f(\lambda_i) & f'(\lambda_i) & \cdots & \dfrac{f^{(\mu_h - 1)}(\lambda_i)}{(\mu_h - 1)!} \\ 0 & f(\lambda_i) & \ddots & \vdots \\ \vdots & & \ddots & f'(\lambda_i) \\ 0 & \cdots & 0 & f(\lambda_i) \end{pmatrix} \in \mathbb{C}^{\mu_h \times \mu_h}. \tag{B.7}$$
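As a concrete illustration (added here for convenience), take a single $2 \times 2$ Jordan block and $f = \exp$; formula (B.7) gives
$$\exp\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix} = \begin{pmatrix} e^{\lambda} & e^{\lambda} \\ 0 & e^{\lambda} \end{pmatrix},$$
which can be verified directly from the power series of the exponential, since the block is $\lambda I + N$ with $N^2 = 0$.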

The second definition we are interested in uses the concepts of polynomial interpolation and of polynomials evaluated at matrix arguments, which we introduce now.

Definition B.1.3 (Matrix polynomial). Let $A \in \mathbb{C}^{n\times n}$; we call a matrix polynomial an expression of the form

$$P(A) = \sum_{i=0}^{n} a_i A^i, \tag{B.8}$$
where $A^i$ is defined by the following inductive law:

$$\begin{cases} A^0 = I, \\ A^i = A^{i-1} A. \end{cases} \tag{B.9}$$

Definition B.1.4 (Minimal polynomial). We call the minimal polynomial of $A \in \mathbb{C}^{n\times n}$ the monic polynomial $\psi$ of lowest degree such that $\psi(A) = 0$.

It can be proven that, if $A \neq 0$, such a polynomial

• exists,
• is unique,

• divides any other polynomial $p$ such that $p(A) = 0$.

Considering (B.1) and (B.2), we can write

$$\psi(t) = \prod_{i=1}^{s} (t - \lambda_i)^{\nu_i}, \tag{B.10}$$
where $s$ is the number of distinct eigenvalues of $A$, $\lambda_i$ is the $i$-th distinct eigenvalue and $\nu_i$ is the index of $\lambda_i$. It follows immediately that $\psi(\lambda_i) = 0$ for $i = 1, \ldots, s$. A consequence of this is the theorem below.

Theorem B.1.2 (Equivalence of matrix polynomials). Let $A \in \mathbb{C}^{n\times n}$ and let $p, q$ be polynomials. Then $p(A) = q(A)$ if and only if, for each $i = 1, \ldots, s$ and $j = 0, \ldots, \nu_i - 1$, it holds that

$$p^{(j)}(\lambda_i) = q^{(j)}(\lambda_i). \tag{B.11}$$

Proof. Let us first suppose that $p$ and $q$ satisfy $p(A) = q(A)$; we want to prove that they take the same values on the spectrum of $A$. Let $d = p - q$. Since $d(A) = 0$, $d$ is divisible by the minimal polynomial $\psi$, and thus $d$ takes only the value zero on the spectrum of $A$; that is, $p$ and $q$ take the same values on the spectrum of $A$. On the other hand, let us suppose that $p$ and $q$ take the same values on the spectrum of $A$. Then $d = p - q$ is zero on the spectrum of $A$, and so from (B.10) it must be divisible by $\psi$. Thus $d = \psi r$ for some polynomial $r$, and since $\psi(A) = 0$, $d(A) = 0$ too.

We should stress that, for each polynomial $p$, the matrix $p(A)$ is determined by the values of $p$ on the spectrum of $A$. The following definition of $f(A)$ generalizes this property to an arbitrary function, by interpolating it on the spectrum of $A$ with a polynomial.

Definition B.1.5 (Matrix function via Hermite interpolation). Let $A \in \mathbb{C}^{n\times n}$, let $f$ be defined on the spectrum of $A$, let $\lambda_1, \ldots, \lambda_s$ be the distinct eigenvalues of $A$ and $\nu_i$ the index of the $i$-th eigenvalue. Then $f(A) = p(A)$, where $p$ is the Hermite interpolating polynomial that satisfies the interpolating conditions
$$f^{(j)}(\lambda_i) = p^{(j)}(\lambda_i), \qquad j = 0, \ldots, \nu_i - 1, \quad i = 1, \ldots, s. \tag{B.12}$$
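When $A$ is diagonalizable with distinct eigenvalues, the Hermite conditions reduce to plain Lagrange interpolation on the eigenvalues. The following small Octave check (our own illustration, with an arbitrarily chosen matrix) compares this definition with expm for $f = \exp$.

    A = [4 1; 2 3];                            % eigenvalues 5 and 2, distinct
    lam = eig(A);
    c = polyfit(lam, exp(lam), numel(lam)-1);  % interpolating polynomial of exp on the spectrum
    F = polyvalm(c, A);                        % evaluate the polynomial at the matrix argument
    norm(F - expm(A))                          % small up to rounding error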

Appendix C

Source code

C.1 mixW(b, z)

function [w, n] = mixW(b,z)

  if (nargin == 1)
    z = b;       % single argument: treat it as z (our fix; the original listing only resets b)
    b = 0;       % default to the principal branch
  elseif (nargin != 2)
    warning('usage: mixW(b, z)');
    return;
  end

  %% series expansion about -1/e
  tmp = 2*(e*z + 1);
  w = (1 - 2*abs(b)).*sqrt(tmp) - 1;

  %% asymptotic expansion at 0 and Inf
  v = log(z + ~(z | b)) + 2*pi*I*b;
  v = v - log(v + ~v);

  %% choose strategy for initial guess
  c = abs(z + 1/e);
  c = (c > 1.45 - 1.1*abs(b));
  c = c | (b.*imag(z) > 0) | (~imag(z) & (b == 1));
  w = (1 - c).*w + c.*v;

  % choose iteration strategy
  h = (abs(b) < 3 & b != 1);


  %% Halley iteration
  %
  maxit = 9;
  n = 0;

  if (h)
    do
      n++;
      if (w != -1)
        p = exp(w);
        t = w.*p - z;
        w1 = w + 1;
        t = t./(p.*w1 - (0.5.*t.*(w1 + 1)./w1));
      else
        t = 0;
      end;
      w = w - t;
    %until (abs(t) < (2.48*eps)*(1.0 + abs(w)) || n >= maxit)
    until (abs(real(t)) < (2.48*eps)*(1.0 + abs(real(w))) && ...
           abs(imag(t)) < (2.48*eps)*(1.0 + abs(imag(w))) || n >= maxit)
  else
    % Newton iteration
    do
      n++;
      oldw = w;
      if (w != -1)
        p = exp(w);
        t = w.*p - z;
        t = t./((w + 1).*p);
      else
        t = 0;
      end;
      w = w - t;
    until (abs(real(t)) < (2.48*eps)*(1.0 + abs(real(w))) && ...
           abs(imag(t)) < (2.48*eps)*(1.0 + abs(imag(w))) || n >= maxit)
  end;

  if (n > maxit)
    warning('iteration limit reached, result of W may be inaccurate');
  end
end
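A quick usage check (ours, not part of the original listing): on the principal branch at z = 1 the function should return the Omega constant, $W_0(1) \approx 0.567143$.

    [w, n] = mixW(0, 1);   % principal branch, z = 1
    disp(w)                % approximately 0.567143, since w*exp(w) = 1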

C.2 blockW(A, b, V)

function W = blockW(A, b, V)

  maxit = 100;
  n = size(A,1);
  I = eye(n);

  % calculating projection matrix
  M = kron(eye(n), A) - kron(A.', eye(n));
  [U ZZ VV] = svd(M);
  m = sum(diag(ZZ) < 1e-14);
  K = VV(:, n^2-m+1:n^2);
  P = K*K';

  ct = 0;
  Xp = V;

  for k = 2:maxit

    % Halley's step
    ct = ct + 1;
    eX = expm(Xp);
    num = Xp^3*eX + (2*eye(n) + 4*Xp + Xp^2)*A;
    den = eX*(2*eye(n) + 2*Xp + Xp^2) + (2*eye(n) + Xp)*A;
    Xp = num/den;

    % projection of Xp onto C(A)
    Xp = reshape(P*(Xp(:)), n, n);

  end

  W = Xp;
  return;

end

C.3 matW(A, b)

function W = matW(A, b)

  z0 = 1/e;      % shift: eigenvalues near the branch point -1/e are mapped into the unit disc, cf. (3.43)
  r = 1.45;
  n = size(A,1);

  if (b == 0)

    % translation and scaling of A
    At = (A + z0*eye(n))/r;

    [Qt Rt] = schur(At, 'd');
    n1 = sum(abs(diag(Rt)) < 1);
    R = Rt*r - z0*eye(n);
    R11 = R(1:n1, 1:n1);
    R12 = R(1:n1, n1+1:n);
    R22 = R(n1+1:n, n1+1:n);

    % W of R(1,1) using the expansion at -1/e
    V1 = (1 - 2*abs(b))*sqrtm(2*e*R11 + 2*eye(n1)) - eye(n1);
    W11 = blockW(R11, b, V1);

    % W of R(2,2) using the expansion at 0 and Inf
    n2 = n - n1;
    V2 = logm(R22) + 2*pi*i*b*eye(n2);
    V2 = V2 - logm(V2);

    W22 = blockW(R22, b, V2);

    % W of R(1,2) through the Sylvester equation (3.49)
    C = R12*W22 - W11*R12;
    X = lyap(R11, -R22, C);

    Wt = [W11 X; zeros(n-n1, n-n2) W22];
    W = Qt*Wt*Qt';

  elseif (abs(b) != 1)

    V = logm(A) + 2*pi*i*b*eye(n);
    V = V - logm(V);
    W = blockW(A, b, V);

else

    disp('Branches 1 and -1 not implemented yet.');

  end

end;
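A hypothetical usage sketch (ours; it assumes that mixW.m, blockW.m and matW.m are on the path and that the control package, which provides lyap, is loaded):

    pkg load control
    A = [1 0.5; 0 2];
    W = matW(A, 0);           % principal matrix branch
    norm(W*expm(W) - A)       % should be small if the iteration converged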

Bibliography

[BCDJ08] Manuel Bronstein, Robert M. Corless, James H. Davenport, and D. J. Jeffrey, Algebraic properties of the Lambert W function from a result of Rosenlicht and of Liouville, Integral Transforms Spec. Funct. 19 (2008), no. 9-10, 709–712. MR 2454730

[BH03] Xavier Buff and Christian Henriksen, On König's root-finding algorithms, Nonlinearity 16 (2003), no. 3, 989–1015. MR 1975793

[CDHJ07] Robert M. Corless, Hui Ding, Nicholas J. Higham, and David J. Jeffrey, The solution of S exp(S) = A is not always the Lambert W function of A, ISSAC 2007, ACM, New York, 2007, pp. 116–121. MR 2396192

[CGH+96] R. M. Corless, G. H. Gonnet, D. E. G. Hare, D. J. Jeffrey, and D. E. Knuth, On the Lambert W function, Adv. Comput. Math. 5 (1996), no. 4, 329–359. MR 1414285

[CJK97] Robert M. Corless, David J. Jeffrey, and Donald E. Knuth, A sequence of series for the Lambert W function, Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation (Kihei, HI), ACM, New York, 1997, pp. 197–204. MR 1809988

[dB81] N. G. de Bruijn, Asymptotic methods in analysis, third ed., Dover Publications, New York, 1981. MR 671583

[DH03] Philip I. Davies and Nicholas J. Higham, A Schur-Parlett algorithm for computing matrix functions, SIAM J. Matrix Anal. Appl. 25 (2003), no. 2, 464–485. MR 2047429


[FSC73] F. N. Fritsch, R. E. Shafer, and W. P. Crowley, Solution of the transcendental equation w e^w = x, Comm. ACM 16 (1973), no. 2, 123–124.

[GLM99] Rudolf Gorenflo, Yuri Luchko, and Francesco Mainardi, Analytical properties and applications of the Wright function, Fract. Calc. Appl. Anal. 2 (1999), no. 4, 383–414. MR 1752379

[GVE90] Sheldon P. Gordon and Ellis R. Von Eschen, A parabolic extension of Newton's method, Internat. J. Math. Ed. Sci. Tech. 21 (1990), no. 4, 519–525. MR 1065909

[Hig08] Nicholas J. Higham, Functions of Matrices: Theory and Computation, SIAM, Philadelphia, PA, 2008. MR 2396439

[Hou70] A. S. Householder, The Numerical Treatment of a Single Nonlinear Equation, McGraw-Hill, New York, 1970. MR 0388759

[Hou06] Alston S. Householder, Principles of Numerical Analysis, corrected ed., Dover Publications, Mineola, NY, 2006. MR 2450153

[Ian09] Bruno Iannazzo, A family of rational iterations and its application to the computation of the matrix pth root, SIAM J. Matrix Anal. Appl. 30 (2008/09), no. 4, 1445–1462. MR 2486848

[Ian11] Bruno Iannazzo, The derivative of the exponential and logarithm of a matrix, 2011.

[JCHK95] David J. Jeffrey, Robert M. Corless, David E. G. Hare, and Donald E. Knuth, Sur l'inversion de y^α e^y au moyen des nombres de Stirling associés, C. R. Acad. Sci. Paris Sér. I Math. 320 (1995), no. 12, 1449–1452. MR 1340051

[JHC96] D. J. Jeffrey, D. E. G. Hare, and Robert M. Corless, Unwinding the branches of the Lambert W function, Math. Sci. 21 (1996), no. 1, 1–7. MR 1390696

[Scha] Nicol N. Schraudolph, Matlab implementation of the Lambert W function, ftp://ftp.idsia.ch/pub/nic/W.m.

[Schb] Nicol N. Schraudolph, Personal communication.

[Sti87] Eberhard Stickel, On the Fréchet derivative of matrix functions, Linear Algebra Appl. 91 (1987), 83–88. MR 888481

[Wri59] E. M. Wright, Solution of the equation z e^z = a, Proc. Roy. Soc. Edinburgh Sect. A 65 (1959), 193–203.