BIT Numer Math (2009) 49: 3–19 DOI 10.1007/s10543-009-0213-4

Using dual techniques to derive componentwise and mixed condition numbers for a linear function of a linear least squares solution

Marc Baboulin · Serge Gratton

Received: 22 July 2008 / Accepted: 21 January 2009 / Published online: 12 February 2009 © Springer Science + Business Media B.V. 2009

Abstract We prove results for adjoint operators and product norms in the framework of Euclidean spaces. We show how these results can be used to derive condition numbers, especially when perturbations on data are measured componentwise relative to the original data. We apply this technique to obtain formulas for componentwise and mixed condition numbers for a linear function of a linear least squares solution. These expressions are closed formulas when perturbations of the solution are measured using a componentwise norm or the infinity norm, and we obtain an upper bound for the Euclidean norm.

Keywords Dual norm · Adjoint operator · Componentwise perturbations · Condition number · Linear least squares

Mathematics Subject Classification (2000) 65F35

1 Introduction

The condition number of a problem measures the effect on the solution of small changes in the data, and this quantity depends on the norms chosen for measuring perturbations on data and solution.

Communicated by Lars Eldén.

M. Baboulin
Departamento de Matemática, Universidade de Coimbra, 3001-454 Coimbra, Portugal
e-mail: [email protected]

M. Baboulin University of Tennessee, Knoxville, TN 37996-3450, USA

S. Gratton
Centre National d'Etudes Spatiales and CERFACS, 31057 Toulouse Cédex 1, France
e-mail: [email protected]

A commonly used approach consists of measuring these perturbations globally using classical norms (e.g. $\|\cdot\|_p$, $p = 1, 2, \infty$, or $\|\cdot\|_F$), resulting in so-called normwise condition numbers. But as mentioned in [20], using norms for measuring matrix perturbations has several drawbacks. First, it does not give information about how the perturbation is distributed among the data. Second, as pointed out in [7, p. 31], errors can be overestimated by a normwise sensitivity analysis when the problem is badly scaled. In a componentwise analysis, perturbations are measured using metrics that take into account the structure of the matrix, like sparsity or scaling. With such metrics, we can expect to minimize the amplification of perturbations, resulting in a minimal condition number. Componentwise metrics are well suited for that purpose because the perturbations on data are measured relatively to a given tolerance for each component of the data. For instance, if E is the matrix whose nonnegative entries are the individual tolerances for errors in the components of A, then the componentwise relative error bound ω for a perturbation ΔA will be such that $|\Delta A| \le \omega E$ (here partial ordering and absolute values must be understood component by component). A common choice for E is $E = |A|$.

Componentwise error analyses that provide us with exact expressions or bounds for componentwise condition numbers can be found for example in [5, 8, 19, 26, 28–30] for linear systems and in [4, 6, 9, 19, 23] for linear least squares. In particular, componentwise backward errors are commonly used as stopping criteria in iterative refinement for solving linear systems (see e.g. [3]) or linear least squares (see e.g. [10]).

For the full rank linear least squares problem (LLSP), generalizing [16], [2] gives exact formulas for the conditioning of a linear transformation of the LLSP solution when the perturbations of data and solution are measured normwise. Our objective in this paper is to obtain similar quantities when perturbations on data are, contrary to [2], measured componentwise and the perturbations on the solution are measured either componentwise or normwise, resulting respectively in componentwise and mixed condition numbers.

In [17], a technique is presented to compute or estimate condition numbers using adjoint formulas. The results are presented in Banach spaces and make use of the corresponding duality results in order to derive normwise condition numbers. In our paper, we show that these dual techniques are easy and helpful when presented in the framework of Euclidean spaces. In particular, as also mentioned in [17], they enable us to derive condition numbers by maximizing a linear function over a space of smaller dimension than the data space.

We show in this paper that dual techniques can be used to derive condition numbers when perturbations on the data are measured componentwise, and we apply this method to the LLSP. We propose exact formulas for the conditioning of $L^Tx$, a linear function of the LLSP solution, when perturbations on data are measured componentwise and perturbations on the solution are measured either componentwise or normwise.
Studying the condition number of $L^Tx$ is relevant for instance when there is a disparity between the conditioning of the solution components or when the computation of the least squares solution involves auxiliary variables without physical significance. The situations of common interest correspond to the cases where L is the identity matrix (condition number of an LLSP solution), a canonical vector (condition number of one solution component), or a projector, when we are interested in the sensitivity of a subset of the solution components. The conditioning of a nonlinear function of an LLSP solution can also be obtained by replacing, in the condition number expression, $L^T$ by the Jacobian matrix at the solution. When L is the identity matrix and when perturbations on the solution are measured using the infinity norm or a componentwise norm, we obtain the exact formula given in [9]. By considering the special case where the residual is equal to zero, we obtain componentwise and mixed condition numbers for $L^Tx$ where x is the solution of a consistent linear system. When L is the identity matrix, these quantities recover the expressions known from [21, p. 123] and [18].

2 Deriving condition numbers using dual techniques

2.1 Preliminary results on dual norms and adjoint operators

We consider a linear mapping $J: E \to G$ where the Euclidean spaces E and G are equipped respectively with the norms $\|\cdot\|_E$, $\|\cdot\|_G$ and the scalar products $\langle\cdot,\cdot\rangle_E$ and $\langle\cdot,\cdot\rangle_G$. Note that the norms $\|\cdot\|_E$ and $\|\cdot\|_G$ may not be, and in general won't be, the particular norms induced by the scalar products $\langle\cdot,\cdot\rangle_E$ and $\langle\cdot,\cdot\rangle_G$.

Definition 1 The adjoint operator of $J$, $J^*: G \to E$, is defined by

$$\langle y, Jx\rangle_G = \langle J^*y, x\rangle_E, \quad (x,y) \in E\times G.$$

We also define the dual norm $\|\cdot\|_{E*}$ of $\|\cdot\|_E$ by

$$\|x\|_{E*} = \max_{u\neq 0}\frac{\langle x,u\rangle_E}{\|u\|_E},$$

and define similarly the dual norm $\|\cdot\|_{G*}$.

For the common vector norms, the dual norms with respect to the canonical scalar product in $\mathbb{R}^n$ are well known and are given by

$$\|\cdot\|_{1*} = \|\cdot\|_\infty, \qquad \|\cdot\|_{\infty*} = \|\cdot\|_1 \qquad\text{and}\qquad \|\cdot\|_{2*} = \|\cdot\|_2.$$

For the matrix norms in $\mathbb{R}^{m\times n}$ with respect to the scalar product $\langle A,B\rangle = \mathrm{trace}(A^TB)$, we have $\|A\|_{2*} = \|\sigma(A)\|_1$ (Lemma 3.5 of [31, p. 78]), where $\sigma(A)$ is the vector containing the singular values of $A$, and since $\mathrm{trace}(A^TA) = \|A\|_F^2$ we also have $\|A\|_{F*} = \|A\|_F$. For the linear applications mapping $E$ to $G$, we denote by $\|\cdot\|_{E,G}$ the operator norm induced by the norms $\|\cdot\|_E$ and $\|\cdot\|_G$. Likewise, the norm $\|\cdot\|_{G*,E*}$ is the operator norm for linear applications mapping $G$ to $E$ and induced by the dual norms $\|\cdot\|_{G*}$ and $\|\cdot\|_{E*}$. Then we have the following theorem.

Theorem 1 $\|J\|_{E,G} = \|J^*\|_{G*,E*}$.

Proof

$$\begin{aligned}
\|J\|_{E,G} &= \max_{x\in E}\frac{\|Jx\|_G}{\|x\|_E}\\
&= \max_{x\in E,\,u\in G}\frac{\langle Jx,u\rangle_G}{\|u\|_{G*}\|x\|_E} \qquad\text{(we use the ``duality theorem'' [22, p. 287])}\\
&= \max_{u\in G}\frac{1}{\|u\|_{G*}}\max_{x\in E}\frac{\langle x,J^*u\rangle_E}{\|x\|_E}\\
&= \max_{u\in G}\frac{\|J^*u\|_{E*}}{\|u\|_{G*}}\\
&= \|J^*\|_{G*,E*}. \qquad\square
\end{aligned}$$
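To illustrate Theorem 1 in a finite-dimensional setting, here is a minimal Matlab check (our own illustration, with hypothetical random data): take the canonical scalar products and equip both $E = \mathbb{R}^6$ and $G = \mathbb{R}^4$ with the infinity norm, so that $J^*$ is the transpose and the dual norms are 1-norms; Theorem 1 then reduces to the classical identity between the infinity norm of a matrix and the 1-norm of its transpose.

    J = randn(4,6);       % a linear map from E = R^6 to G = R^4
    % ||J||_{E,G} is the matrix infinity norm (max row sum) and
    % ||J*||_{G*,E*} = ||J'||_{1,1} is the matrix 1-norm of J' (max column sum)
    assert(abs(norm(J,inf) - norm(J',1)) < 1e-12);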

As mentioned in [17], it may be interesting to compute $\|J^*\|_{G*,E*}$ (instead of $\|J\|_{E,G}$) when $G$ is a Euclidean space of lower dimension than $E$, because it implies a maximization over a space of smaller dimension. We now consider a product space $E = E_1\times\cdots\times E_p$ where each Euclidean space $E_i$ is equipped with the norm $\|\cdot\|_{E_i}$ and the scalar product $\langle\cdot,\cdot\rangle_{E_i}$. In $E$, we consider the following scalar product

$$\langle(u_1,\dots,u_p),(v_1,\dots,v_p)\rangle = \langle u_1,v_1\rangle_{E_1} + \cdots + \langle u_p,v_p\rangle_{E_p},$$

and the product norm

$$\|(u_1,\dots,u_p)\|_\nu = \nu(\|u_1\|_{E_1},\dots,\|u_p\|_{E_p}),$$

where $\nu$ is an absolute norm on $\mathbb{R}^p$ (i.e. such that $\nu(|x|) = \nu(x)$ for all $x\in\mathbb{R}^p$, [24, p. 367]). We denote by $\nu_*$ the dual of $\nu$ with respect to the canonical inner product of $\mathbb{R}^p$, and we are interested in determining the dual $\|\cdot\|_{\nu*}$ of the product norm $\|\cdot\|_\nu$ with respect to the scalar product of $E$. Then we have the following result.

Theorem 2 The dual of the product norm can be expressed as

$$\|(u_1,\dots,u_p)\|_{\nu*} = \nu_*(\|u_1\|_{E_1*},\dots,\|u_p\|_{E_p*}).$$

Proof From $\|u_i\|_{E_i*} = \max_{v_i\neq 0}\frac{\langle u_i,v_i\rangle_{E_i}}{\|v_i\|_{E_i}}$, we have $\langle u_i,v_i\rangle_{E_i} \le \|u_i\|_{E_i*}\|v_i\|_{E_i}$ for all $v_i\in E_i$, and then

$$\begin{aligned}
\|(u_1,\dots,u_p)\|_{\nu*} &= \max_{\|(v_1,\dots,v_p)\|_\nu = 1}\sum_{i=1}^p\langle u_i,v_i\rangle_{E_i}\\
&\le \max_{\|(v_1,\dots,v_p)\|_\nu = 1}\sum_{i=1}^p\|u_i\|_{E_i*}\|v_i\|_{E_i}\\
&= \max_{\nu(\|v_1\|_{E_1},\dots,\|v_p\|_{E_p}) = 1}\begin{pmatrix}\|u_1\|_{E_1*}\\ \vdots\\ \|u_p\|_{E_p*}\end{pmatrix}^T\begin{pmatrix}\|v_1\|_{E_1}\\ \vdots\\ \|v_p\|_{E_p}\end{pmatrix}\\
&= \nu_*(\|u_1\|_{E_1*},\dots,\|u_p\|_{E_p*}).
\end{aligned}$$

So we have shown that $\nu_*(\|u_1\|_{E_1*},\dots,\|u_p\|_{E_p*})$ is an upper bound for the dual of the product norm. Now let $w_1,\dots,w_p$ be nonzero vectors such that $\langle u_i,w_i\rangle_{E_i} = \|u_i\|_{E_i*}\|w_i\|_{E_i}$ for all $i$ (i.e. choose $w_i$ attaining the maximum in the definition of the dual norm $\|u_i\|_{E_i*}$). Then

$$\nu_*(\|u_1\|_{E_1*},\dots,\|u_p\|_{E_p*}) = \max_{\nu(\alpha_1\|w_1\|_{E_1},\dots,\alpha_p\|w_p\|_{E_p}) = 1}\begin{pmatrix}\|u_1\|_{E_1*}\\ \vdots\\ \|u_p\|_{E_p*}\end{pmatrix}^T\begin{pmatrix}\alpha_1\|w_1\|_{E_1}\\ \vdots\\ \alpha_p\|w_p\|_{E_p}\end{pmatrix}$$

is attained for a particular $(\alpha_1,\dots,\alpha_p)$ such that

$$\nu_*(\|u_1\|_{E_1*},\dots,\|u_p\|_{E_p*}) = \sum_{i=1}^p\alpha_i\langle u_i,w_i\rangle_{E_i},$$

with $\nu(\alpha_1\|w_1\|_{E_1},\dots,\alpha_p\|w_p\|_{E_p}) = 1$.

Using the fact that ν is an absolute norm, we get

$$\|(\alpha_1w_1,\dots,\alpha_pw_p)\|_\nu = \nu(\|\alpha_1w_1\|_{E_1},\dots,\|\alpha_pw_p\|_{E_p}) = \nu(|\alpha_1|\|w_1\|_{E_1},\dots,|\alpha_p|\|w_p\|_{E_p}) = \nu(\alpha_1\|w_1\|_{E_1},\dots,\alpha_p\|w_p\|_{E_p}) = 1.$$

Thus

$$\|(u_1,\dots,u_p)\|_{\nu*} = \max_{\|(v_1,\dots,v_p)\|_\nu = 1}\sum_{i=1}^p\langle u_i,v_i\rangle_{E_i} \ge \sum_{i=1}^p\langle u_i,\alpha_iw_i\rangle_{E_i} = \nu_*(\|u_1\|_{E_1*},\dots,\|u_p\|_{E_p*}),$$

which concludes the proof. $\square$
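As a sanity check of Theorem 2, the following Matlab sketch (our own illustration) samples the dual product norm for $E = \mathbb{R}^2\times\mathbb{R}^3$, each $E_i$ equipped with the 2-norm and $\nu = \|\cdot\|_\infty$, so that $\nu_* = \|\cdot\|_1$ and the dual product norm of $(u_1,u_2)$ should equal $\|u_1\|_2 + \|u_2\|_2$.

    rng(0);
    u1 = randn(2,1); u2 = randn(3,1);
    best = 0;
    for t = 1:1e5
        v1 = randn(2,1); v2 = randn(3,1);
        s = max(norm(v1), norm(v2));               % ||(v1,v2)||_nu
        best = max(best, (u1'*v1 + u2'*v2)/s);     % <u,v>_E / ||(v1,v2)||_nu
    end
    % best approaches (from below) norm(u1) + norm(u2), as predicted by Theorem 2
    [best, norm(u1) + norm(u2)]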

2.2 Application to condition numbers

We represent here a given problem as a mapping $g$ defined by $x = g(y)$, where $x\in G$ is the solution of the problem corresponding to the data $y\in E$. The data space $E$ and the solution space $G$ are Euclidean spaces equipped respectively with the norms $\|\cdot\|_E$ and $\|\cdot\|_G$. Then the condition number of the problem is a measure of the sensitivity of the mapping $g$ to perturbations. Following [27], if $g$ is Fréchet-differentiable in a neighborhood of $y$, the absolute condition number of $g$ at the point $y\in E$ is the quantity $K(y)$ defined by

$$K(y) = \|g'(y)\|_{E,G},$$

where $\|\cdot\|_{E,G}$ is the operator norm induced by the norms $\|\cdot\|_E$ and $\|\cdot\|_G$. Then we also have

$$K(y) = \max_{\|z\|_E = 1}\|g'(y).z\|_G. \tag{2.1}$$

If $g(y)$ is nonzero, we can define the relative condition number of $g$ at $y\in E$ as

$$K^{(rel)}(y) = K(y)\|y\|_E/\|g(y)\|_G. \tag{2.2}$$

The expression of $K(y)$ corresponds to the operator norm of the linear operator $g'(y)$. Then, using Theorem 1 and with the same notations as in Sect. 2.1, $K(y)$ can be expressed as

$$K(y) = \max_{\|\Delta y\|_E = 1}\|g'(y).\Delta y\|_G = \max_{\|x\|_{G*} = 1}\|g'(y)^*.x\|_{E*}. \tag{2.3}$$

We can summarize the method for deriving condition numbers using dual techniques as follows:

1. choose the norms $\|\cdot\|_E$ and $\|\cdot\|_G$ respectively on the data and the solution spaces and determine their dual norms,
2. determine the derivative $g'(y)$ of the mapping that represents our problem,
3. determine the adjoint operator $g'(y)^*$ of the linear operator $g'(y)$,
4. compute $K(y) = \max_{\|x\|_{G*} = 1}\|g'(y)^*.x\|_{E*}$.

Let us now consider the case where we use a componentwise metric on a data space $E = \mathbb{R}^n$. For a given $y\in\mathbb{R}^n$, we denote by $E_Y$ the subset of all elements $\Delta y\in\mathbb{R}^n$ with $\Delta y_i = 0$ whenever $y_i = 0$. Then, in a componentwise perturbation analysis, we measure the perturbation $\Delta y\in E_Y$ of $y$ using the norm

$$\|\Delta y\|_c = \min\{\omega,\ |\Delta y_i| \le \omega|y_i|,\ i = 1,\dots,n\}. \tag{2.4}$$

$\|\cdot\|_c$ is called the componentwise relative norm with respect to $y$. As mentioned in [12], we can extend the definition in (2.4) to the case where $\Delta y_i \neq 0$ while $y_i = 0$ by adopting the convention $\|\Delta y\|_c = \infty$ for those $\Delta y$. Let us determine the dual norm of the componentwise norm. First we observe that, for any $\Delta y\in E_Y$, we have

$$\|\Delta y\|_c = \max\left\{\frac{|\Delta y_i|}{|y_i|},\ y_i\neq 0\right\} = \left\|\left(\frac{|\Delta y_i|}{|y_i|}\right)_{y_i\neq 0}\right\|_\infty.$$

Then we can apply Theorem 2 by considering the product space $E = \mathbb{R}^n$, with each $E_i = \mathbb{R}$, $\nu = \|\cdot\|_\infty$ and $\|\Delta y_i\|_{E_i} = \frac{|\Delta y_i|}{|y_i|}$ if $y_i\neq 0$. We have

$$\|\Delta y_i\|_{E_i*} = \max_{z\neq 0}\frac{|\Delta y_i.z|}{\|z\|_{E_i}} = \max_{z\neq 0}\frac{|\Delta y_i.z|}{|z|/|y_i|} = |\Delta y_i||y_i|,$$

and also $\|\cdot\|_{\infty*} = \|\cdot\|_1$. Then we get

$$\|\Delta y\|_{c*} = \|(|\Delta y_1||y_1|,\dots,|\Delta y_n||y_n|)\|_1. \tag{2.5}$$
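As a tiny illustration of (2.4) and (2.5), here is a Matlab fragment (our own example, for a vector $y$ with no zero component):

    y  = [2; -1; 4];  dy = [1e-3; 2e-3; -4e-4];
    nc     = max(abs(dy)./abs(y));      % ||dy||_c: smallest w with |dy_i| <= w|y_i|
    ncdual = norm(abs(dy).*abs(y), 1);  % ||dy||_{c*} from (2.5)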

If there are zero components in $y$, we observe that, due to the condition $\|\Delta y\|_E = 1$ in (2.3), the definition of $K(y)$ is the same whether $\Delta y$ is in $E_Y$ or not. Indeed, if $\Delta y\notin E_Y$, then with the convention given previously we have $\|\Delta y\|_c = \infty$ and the perturbation $\Delta y$ is not taken into account in the computation of $K(y)$. As a result, the zero components of $y$ need not be explicitly excluded from the data. Using (2.3), $K(y)$ will be obtained as

$$K(y) = \max_{\|x\|_{G*} = 1}\|g'(y)^*.x\|_{c*}, \tag{2.6}$$

where $\|\cdot\|_{c*}$ is expressed by (2.5). Note that the norm on the solution space $G$ has not been chosen yet. Following the terminology given in [13] and also used in [9], $K(y)$ is referred to as a componentwise (resp. mixed) condition number when $\|\cdot\|_G$ is componentwise (resp. normwise). In Sect. 3, we consider different norms for the solution space, but the norm on the data space is always componentwise.

3 Componentwise and mixed condition numbers for a linear function of an LLSP solution

3.1 Least squares conditioning

We consider the linear least squares problem $\min_{x\in\mathbb{R}^n}\|Ax - b\|_2$, where $b\in\mathbb{R}^m$ and $A\in\mathbb{R}^{m\times n}$ is a matrix of full column rank $n$. Then the unique solution $x$ is expressed by $x = (A^TA)^{-1}A^Tb = A^\dagger b$, where $A^\dagger$ denotes the pseudo-inverse of $A$. In the remainder, the matrix $I$ is the identity matrix and $e_i$ may denote the $i$th canonical vector of $\mathbb{R}^m$ or $\mathbb{R}^n$. We study here the sensitivity of a linear function of the LLSP solution to perturbations in the data $A$ and $b$, which corresponds to the function

$$g(A,b) = L^T(A^TA)^{-1}A^Tb = L^TA^\dagger b,$$

where $L$ is an $n\times k$ matrix, with $k\le n$. In the most common case, $L$ is the identity matrix (conditioning of the solution), but $L$ can also be, for instance, a canonical vector of $\mathbb{R}^n$ if we are interested in the conditioning of one component of $x$. In the sequel, we suppose that $L$ is not numerically perturbed.

Since $A$ has full rank $n$, $g$ is continuously F-differentiable in a neighborhood of $(A,b)$ and we denote by $J = g'(A,b)$ its derivative. Let $B\in\mathbb{R}^{m\times n}$ and $c\in\mathbb{R}^m$. Using the chain rule for derivatives of composed mappings, we get

$$g'(A,b).(B,c) = L^T(A^TA)^{-1}B^T(b - A(A^TA)^{-1}A^Tb) - L^T(A^TA)^{-1}A^TB(A^TA)^{-1}A^Tb + L^TA^\dagger c,$$

i.e.

$$J(B,c) = g'(A,b).(B,c) = L^T(A^TA)^{-1}B^Tr - L^TA^\dagger Bx + L^TA^\dagger c,$$

where $r = b - Ax$ is the residual vector.
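This derivative can be checked numerically by a first-order finite difference; the following Matlab sketch (hypothetical sizes and random data) compares $J(B,c)$ with $(g(A + tB,\ b + tc) - g(A,b))/t$ for a small $t$.

    m = 6; n = 4; k = 2;
    A = randn(m,n); b = randn(m,1); L = randn(n,k);
    B = randn(m,n); c = randn(m,1); t = 1e-7;
    x = A\b; r = b - A*x;                          % LLSP solution and residual
    Jbc = L'*((A'*A)\(B'*r)) - L'*pinv(A)*B*x + L'*pinv(A)*c;
    g   = @(A,b) L'*(A\b);                         % g(A,b) = L^T x(A,b)
    fd  = (g(A + t*B, b + t*c) - g(A,b))/t;        % agrees with Jbc up to O(t)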

Remark 1 If we define $x(A,b) = A^\dagger b$, the case where $g(A,b) = h(x(A,b))$, with $h$ a differentiable nonlinear function mapping $\mathbb{R}^n$ to $\mathbb{R}^k$, is also covered, because we have $g'(A,b).(B,c) = h'(A^\dagger b).(x'(A,b).(B,c))$, and $L^T$ would correspond to the Jacobian matrix $h'(A^\dagger b)$.

We can find in [2] closed formulas, bounds and statistical estimates for the conditioning of $g(A,b)$ when perturbations on the data are measured normwise using the weighted norm

$$\left(\alpha^2\|A\|_F^2 + \beta^2\|b\|_2^2\right)^{\frac{1}{2}},\qquad \alpha,\beta > 0.$$

Here we are interested in the case where perturbations of $A$ and $b$ are measured componentwise.

3.2 Choice of norms

We consider the following norms and scalar products:

– for any vector $u$, $\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_\infty$ are the vector norms corresponding to the classical definitions $\|u\|_1 = \sum_i|u_i|$, $\|u\|_2 = (\sum_iu_i^2)^{\frac{1}{2}}$ and $\|u\|_\infty = \max_i|u_i|$.
– on the solution space $\mathbb{R}^k$, we use the scalar product $\langle x,y\rangle = x^Ty$, whereas the norm $\|\cdot\|$ can be $\|\cdot\|_2$, $\|\cdot\|_\infty$ or a componentwise norm with respect to the solution $L^Tx$, and its dual norm is denoted by $\|\cdot\|_*$.
– on the data space $\mathbb{R}^{m\times n}\times\mathbb{R}^m$, we use the scalar product $\langle(A,b),(B,c)\rangle = \mathrm{trace}(A^TB) + b^Tc$ and the componentwise relative norm (as given e.g. in [21, p. 122]): $\|(\Delta A,\Delta b)\|_c = \min\{\omega,\ |\Delta A|\le\omega|A|,\ |\Delta b|\le\omega|b|\}$, where absolute values and inequalities between matrices or vectors are understood to hold componentwise.

3.3 Determination of the adjoint operator

The following proposition gives us the expression of the adjoint operator of $J$.

Proposition 1 The adjoint of $J$, the Fréchet derivative of a linear function of the full rank least squares solution,

$$J: \mathbb{R}^{m\times n}\times\mathbb{R}^m \longrightarrow \mathbb{R}^k,\qquad (B,c) \longmapsto L^T(A^TA)^{-1}B^Tr - L^TA^\dagger Bx + L^TA^\dagger c = J_1B + J_2c, \tag{3.1}$$

is

$$J^*: \mathbb{R}^k \longrightarrow \mathbb{R}^{m\times n}\times\mathbb{R}^m,\qquad u \longmapsto (ru^TL^T(A^TA)^{-1} - A^{\dagger T}Lux^T,\ A^{\dagger T}Lu). \tag{3.2}$$

Proof Using (3.1), we obtain for the first part of the adjoint of the derivative $J$: for all $u\in\mathbb{R}^k$, we have

$$\begin{aligned}
\langle u,J_1B\rangle &= u^T(L^T(A^TA)^{-1}B^Tr - L^TA^\dagger Bx)\\
&= \mathrm{trace}(L^T(A^TA)^{-1}B^Tru^T) - \mathrm{trace}(L^TA^\dagger Bxu^T)\\
&= \mathrm{trace}(ru^TL^T(A^TA)^{-1}B^T) - \mathrm{trace}(xu^TL^TA^\dagger B)\\
&= \mathrm{trace}((ru^TL^T(A^TA)^{-1})^TB) - \mathrm{trace}(xu^TL^TA^\dagger B)\\
&= \mathrm{trace}(((ru^TL^T(A^TA)^{-1})^T - xu^TL^TA^\dagger)B)\\
&= \langle ru^TL^T(A^TA)^{-1} - A^{\dagger T}Lux^T,\ B\rangle\\
&= \langle J_1^*u,\ B\rangle.
\end{aligned}$$

For the second part of the adjoint of the derivative $J$, we have, for all $u\in\mathbb{R}^k$,

$$\langle u,J_2c\rangle = u^TL^TA^\dagger c = \langle A^{\dagger T}Lu,\ c\rangle = \langle J_2^*u,\ c\rangle. \qquad\square$$
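Proposition 1 can also be verified numerically: with the scalar products of Sect. 3.2, the identity $\langle u, J(B,c)\rangle = \langle J^*(u), (B,c)\rangle$ must hold for any $u$, $B$ and $c$. A Matlab sketch with random data:

    m = 6; n = 4; k = 2;
    A = randn(m,n); b = randn(m,1); L = randn(n,k);
    B = randn(m,n); c = randn(m,1); u = randn(k,1);
    x = A\b; r = b - A*x; Adag = pinv(A); G = (A'*A)\eye(n);  % G = (A'A)^{-1}
    Jbc = L'*G*B'*r - L'*Adag*B*x + L'*Adag*c;                % J(B,c)
    J1u = r*u'*L'*G - Adag'*L*u*x';                           % matrix part of J*(u)
    J2u = Adag'*L*u;                                          % vector part of J*(u)
    assert(abs(u'*Jbc - (trace(J1u'*B) + J2u'*c)) < 1e-8);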

As already mentioned in Sect. 2.1, the advantage of working with the adjoint $J^*$ here is that the operator norm computation involved in the condition number definition implies a maximization over a space of dimension $k$, instead of a maximization over a space of dimension $mn + m$ for $J$. Indeed, using (2.6), the condition number of $L^Tx$ is given by

$$K(L,A,b) = \max_{\|(\Delta A,\Delta b)\|_c = 1}\|J(\Delta A,\Delta b)\| = \max_{\|u\|_* = 1}\|J^*(u)\|_{c*}.$$

3.4 Expression of the condition number

The following theorem provides us with an explicit formula for $K(L,A,b)$ thanks to the use of $J^*$. In the remainder, vec is the operator that stacks the columns of a matrix into a long vector, $\otimes$ denotes the Kronecker product of two matrices [15], and for any $m$-by-$n$ matrix $Y = (y_{ij})$, $D_Y$ denotes the diagonal matrix $\mathrm{diag}(\mathrm{vec}(Y)) = \mathrm{diag}(y_{11},\dots,y_{m1},\dots,y_{1n},\dots,y_{mn})$.

Theorem 3 The condition number of $L^Tx$, linear function of the full rank least squares solution, is expressed by

$$K(L,A,b) = \|[VD_A,\ WD_b]^T\|_{*,1},$$

where

$$V = (L^T(A^TA)^{-1})\otimes r^T - x^T\otimes(L^TA^\dagger),\qquad W = L^TA^\dagger,$$

and $\|\cdot\|_{*,1}$ is the matrix norm subordinate to the vector norms $\|\cdot\|_*$ and $\|\cdot\|_1$ defined in Sect. 3.2.

Proof If $(\Delta a_{ij})$ and $(\Delta b_i)$ are the entries of $\Delta A$ and $\Delta b$ then, using (2.5), we have

$$\|(\Delta A,\Delta b)\|_{c*} = \sum_{i,j}|\Delta a_{ij}||a_{ij}| + \sum_i|\Delta b_i||b_i|.$$

Then, using Proposition 1, we get

$$\begin{aligned}
\|J^*(u)\|_{c*} &= \sum_{j=1}^n\sum_{i=1}^m|a_{ij}||(ru^TL^T(A^TA)^{-1} - A^{\dagger T}Lux^T)_{ij}| + \sum_{i=1}^m|b_i||(A^{\dagger T}Lu)_i|\\
&= \sum_{j=1}^n\sum_{i=1}^m|a_{ij}||(r_ie_j^T(A^TA)^{-1} - x_je_i^TA^{\dagger T})Lu| + \sum_{i=1}^m|b_i||e_i^TA^{\dagger T}Lu|\\
&= \sum_{j=1}^n\sum_{i=1}^m|v_{ij}^Ta_{ij}u| + \sum_{i=1}^m|w_i^Tb_iu|,
\end{aligned}$$

where $v_{ij} = r_iL^T(A^TA)^{-1}e_j - x_jL^TA^\dagger e_i = (L^T(A^TA)^{-1}e_jr^T - x_jL^TA^\dagger)e_i$ and $w_i = L^TA^\dagger e_i$.

Note that, in the column vectors $v_{ij}$ and $w_i$, the vectors $e_j$ and $e_i$ are canonical vectors from different spaces (respectively $\mathbb{R}^n$ and $\mathbb{R}^m$). The quantities $v_{ij}^Ta_{ij}u$ and $w_i^Tb_iu$ are scalars, and $\|J^*(u)\|_{c*}$ can be interpreted as the 1-norm of a vector:

$$\|J^*(u)\|_{c*} = \|(v_{11}a_{11},\dots,v_{m1}a_{m1},\dots,v_{1n}a_{1n},\dots,v_{mn}a_{mn},w_1b_1,\dots,w_mb_m)^Tu\|_1 = \|[VD_A,\ WD_b]^Tu\|_1,$$

where $V$ is the $k$-by-$mn$ matrix whose columns are the $v_{ij}$ (ordered with $i$ varying first, consistently with the vec ordering) and $W$ is the $k$-by-$m$ matrix whose columns are the $w_i$. $V$ can be expressed as

$$V = \left[L^T(A^TA)^{-1}e_1r^T - x_1L^TA^\dagger,\ \dots,\ L^T(A^TA)^{-1}e_nr^T - x_nL^TA^\dagger\right] = (L^T(A^TA)^{-1})\otimes r^T - x^T\otimes(L^TA^\dagger),$$

and we also have $W = (L^TA^\dagger e_1,\dots,L^TA^\dagger e_m) = L^TA^\dagger$.

Finally we get

$$\|J^*(u)\|_{c*} = \|[VD_A,\ WD_b]^Tu\|_1,$$

and

$$K(L,A,b) = \max_{\|u\|_* = 1}\|[VD_A,\ WD_b]^Tu\|_1 = \|[VD_A,\ WD_b]^T\|_{*,1}. \qquad\square$$
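For small problems, the matrix $[VD_A,\ WD_b]^T$ of Theorem 3 can be assembled explicitly, as in the following Matlab sketch (random data; the choice of $\|\cdot\|_*$ then determines which operator norm to take):

    m = 6; n = 4; k = 2;
    A = randn(m,n); b = randn(m,1); L = randn(n,k);
    x = A\b; r = b - A*x; Adag = pinv(A);
    V = kron(L'*((A'*A)\eye(n)), r') - kron(x', L'*Adag);  % k-by-(m*n)
    W = L'*Adag;                                           % k-by-m
    M = [V*diag(A(:)), W*diag(b)]';                        % (m*n+m)-by-k
    % K(L,A,b) = ||M||_{*,1}; e.g. for ||.||_* = ||.||_1 (Sect. 3.5.1 below),
    % this is norm(M,1), the maximum absolute column sum of M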

Depending on the norm chosen for the solution space $\mathbb{R}^k$, we can have different expressions for $K(L,A,b)$. In the following section, we apply Theorem 3 to obtain expressions of $K(L,A,b)$ for some commonly used norms. Using the terminology given in Sect. 2.2, $K(L,A,b)$ will be referred to as a mixed (resp. componentwise) condition number if the perturbations of the solution are measured normwise (resp. componentwise).

3.5 Condition number expressions for some norms on the solution space

3.5.1 Use of the infinity norm on the solution space

If $\|\cdot\| = \|\cdot\|_\infty$, then $\|\cdot\|_* = \|\cdot\|_1$ and we have

$$K_\infty(L,A,b) = \|[VD_A,\ WD_b]^T\|_1 = \|[VD_A,\ WD_b]\|_\infty.$$

Then, with the notations used in the proof of Theorem 3, we get

$$\begin{aligned}
K_\infty(L,A,b) &= \|(v_{11}a_{11},\dots,v_{m1}a_{m1},\dots,v_{1n}a_{1n},\dots,v_{mn}a_{mn},w_1b_1,\dots,w_mb_m)\|_\infty\\
&= \||v_{11}a_{11}| + \cdots + |v_{m1}a_{m1}| + \cdots + |v_{1n}a_{1n}| + \cdots + |v_{mn}a_{mn}| + |w_1b_1| + \cdots + |w_mb_m|\|_\infty\\
&= \||V|\,\mathrm{vec}(|A|) + |W||b|\|_\infty,
\end{aligned}$$

and thus

$$K_\infty(L,A,b) = \left\||(L^T(A^TA)^{-1})\otimes r^T - x^T\otimes(L^TA^\dagger)|\,\mathrm{vec}(|A|) + |L^TA^\dagger||b|\right\|_\infty. \tag{3.3}$$

Remark 2 For small problems, Matlab has a routine kron that enables us to compute $K_\infty(L,A,b)$ using the syntax:

    Kinf = norm(abs(kron(C,r')-kron(x',L'*pinv(A)))*vec(abs(A)) ...
                + abs(L'*pinv(A))*abs(b), inf)

with vec = inline('A(:)','A') and where C stands for L'*inv(A'*A), so that kron(C,r') matches the first Kronecker term of $V$ in Theorem 3.

We observe that $K_\infty(L,A,b)$ can also be written

$$K_\infty(L,A,b) = \left\|\sum_{j=1}^n[|v_{1j}|,\dots,|v_{mj}|]\,|A(:,j)| + |L^TA^\dagger||b|\right\|_\infty,$$

and since $v_{ij} = L^T(A^TA)^{-1}(e_jr^T - x_jA^T)e_i$, we get $[|v_{1j}|,\dots,|v_{mj}|] = |L^T(A^TA)^{-1}(e_jr^T - x_jA^T)|$. Then we have

$$K_\infty(L,A,b) = \left\|\sum_{j=1}^n|L^T(A^TA)^{-1}(e_jr^T - x_jA^T)|\,|A(:,j)| + |L^TA^\dagger||b|\right\|_\infty. \tag{3.4}$$

Equation (3.4) has the advantage of avoiding the formation of the Kronecker products and therefore requires less memory. Moreover, if the LLSP is solved using a direct method, the $R$ factor of the QR decomposition of $A$ (or equivalently, in exact arithmetic, the Cholesky factor of $A^TA$) might be available, and we have $(A^TA)^{-1} = R^{-1}R^{-T}$. Then the computation of $(A^TA)^{-1}(e_jr^T - x_jA^T)$ can be performed by successively solving two triangular systems with multiple right-hand sides, and (3.4) can be easily implemented using LAPACK [1] and Level 3 BLAS [11] routines.
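A possible Matlab transcription of (3.4) along these lines (our sketch, with random data) reuses the triangular factor $R$, so that applying $(A^TA)^{-1}$ amounts to two triangular solves:

    m = 6; n = 4; k = 2;
    A = randn(m,n); b = randn(m,1); L = randn(n,k);
    x = A\b; r = b - A*x;
    [Q,R] = qr(A,0);                         % R is the n-by-n triangular factor
    s = abs(L'*pinv(A))*abs(b);              % |L^T A^dag| |b|
    for j = 1:n
        ej = zeros(n,1); ej(j) = 1;
        Mj = L'*(R\(R'\(ej*r' - x(j)*A')));  % L^T (A^T A)^{-1} (e_j r^T - x_j A^T)
        s  = s + abs(Mj)*abs(A(:,j));
    end
    Kinf = norm(s, inf);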

When $L$ is the identity matrix, the expression given in (3.3) is the same as the one established in [9] (using the norm $\|\cdot\| = \frac{\|\cdot\|_\infty}{\|x\|_\infty}$ on the solution space). Note that bounds of $K_\infty(I,A,b)$ are also given in [7, p. 34] and in [21, p. 384]. When $m = n$ (case of a linear system $Ax = b$), we have $r = 0$ and $A^\dagger = A^{-1}$, and we obtain, using the formulas $|A\otimes B| = |A|\otimes|B|$ and $\mathrm{vec}(AYB) = (B^T\otimes A)\mathrm{vec}(Y)$:

$$K_\infty(L,A,b) = \left\||x^T\otimes(L^TA^{-1})|\,\mathrm{vec}(|A|) + |L^TA^{-1}||b|\right\|_\infty = \left\|\mathrm{vec}(|L^TA^{-1}||A||x|) + |L^TA^{-1}||b|\right\|_\infty.$$

But $|L^TA^{-1}||A||x|$ is a vector, hence equal to its own vec, and we obtain

$$K_\infty(L,A,b) = \left\||L^TA^{-1}|(|A||x| + |b|)\right\|_\infty, \tag{3.5}$$

which generalizes the condition number given in [21, p. 123] to the case where $L$ is not the identity (using the norm $\|\cdot\| = \frac{\|\cdot\|_\infty}{\|x\|_\infty}$ on the solution space).
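In Matlab, (3.5) is a one-liner; a small sketch on a random consistent system:

    n = 4; A = randn(n); x = randn(n,1); b = A*x; L = eye(n);
    Kinf = norm(abs(L'/A)*(abs(A)*abs(x) + abs(b)), inf);  % |L^T A^{-1}|(|A||x|+|b|)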

3.5.2 Use of the 2-norm on the solution space

If $\|\cdot\| = \|\cdot\|_2$ then $\|\cdot\|_* = \|\cdot\|_2$ and, using Theorem 3, the mixed condition number of $L^Tx$ is

$$K_2(L,A,b) = \|[VD_A,\ WD_b]^T\|_{2,1},$$

where $\|\cdot\|_{2,1}$ is the matrix norm subordinate to the vector norms $\|\cdot\|_2$ and $\|\cdot\|_1$.

As mentioned in [14, p. 56], we have for any matrix $B$, $\|B\|_{2,1} = \max_{\|u\|_2 = 1}\|Bu\|_1 = \|Bu\|_1$ for some $u\in\mathbb{R}^k$ having unit 2-norm. Since $\|u\|_1 \le \sqrt{k}\,\|u\|_2$, we get

$$\|B\|_{2,1} = \|Bu\|_1 \le \|u\|_1\|B\|_1 \le \sqrt{k}\,\|B\|_1.$$

Applying this inequality to $B = [VD_A,\ WD_b]^T$, we obtain

$$K_2(L,A,b) \le \sqrt{k}\,\|[VD_A,\ WD_b]^T\|_1,$$

and then we have the following upper bound for the mixed condition number of $L^Tx$:

$$K_2(L,A,b) \le \sqrt{k}\,K_\infty(L,A,b), \tag{3.6}$$

and this upper bound can be computed using (3.3) or (3.4).

3.5.3 Use of a componentwise norm on the solution space

We consider here a componentwise norm on the solution space defined by

$$\|u\| = \min\{\omega,\ |u_i| \le \omega|(L^Tx)_i|,\ i = 1,\dots,k\}.$$

Then, following (2.5), the dual norm is expressed by

$$\|u\|_* = \|(|(L^Tx)_1||u_1|,\dots,|(L^Tx)_k||u_k|)\|_1.$$

With the convention mentioned in Sect. 2.2, we consider perturbations $u$ in the solution space such that $u_i = 0$ whenever $(L^Tx)_i = 0$. Let $D_{L^Tx} = \mathrm{diag}(\alpha_1,\dots,\alpha_k)$ be the $k$-by-$k$ diagonal matrix such that $\alpha_i = (L^Tx)_i$ if $(L^Tx)_i\neq 0$ and $\alpha_i = 1$ otherwise. Then, if we apply Theorem 3 and perform the change of variable $\bar u = D_{L^Tx}u$, the componentwise condition number of $L^Tx$ is

$$K_c(L,A,b) = \max_{\|\bar u\|_1 = 1}\left\|[VD_A,\ WD_b]^TD_{L^Tx}^{-1}\bar u\right\|_1 = \left\|D_{L^Tx}^{-1}[VD_A,\ WD_b]\right\|_\infty,$$

which can be computed using the following variant of (3.4):

$$K_c(L,A,b) = \left\|\sum_{j=1}^n|D_{L^Tx}^{-1}L^T(A^TA)^{-1}(e_jr^T - x_jA^T)|\,|A(:,j)| + |D_{L^Tx}^{-1}L^TA^\dagger||b|\right\|_\infty.$$

Using a derivation similar to that of (3.3), this expression simplifies to

$$K_c(L,A,b) = \left\||D_{L^Tx}^{-1}|(|V|\,\mathrm{vec}(|A|) + |W||b|)\right\|_\infty, \tag{3.7}$$

and when $m = n$ (case of a linear system $Ax = b$), we obtain

$$K_c(L,A,b) = \left\||D_{L^Tx}^{-1}||L^TA^{-1}|(|A||x| + |b|)\right\|_\infty, \tag{3.8}$$

which is the condition number given in [18] when $L = I$.
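A Matlab sketch of (3.7) (random data; for simplicity it assumes that $L^Tx$ has no zero component, so that $D_{L^Tx}$ is directly invertible as written):

    m = 6; n = 4; k = 2;
    A = randn(m,n); b = randn(m,1); L = randn(n,k);
    x = A\b; r = b - A*x; Adag = pinv(A);
    V  = kron(L'*((A'*A)\eye(n)), r') - kron(x', L'*Adag);
    W  = L'*Adag;
    Kc = norm(abs(diag(L'*x))\(abs(V)*abs(A(:)) + abs(W)*abs(b)), inf);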

3.5.4 Numerical example

Let us consider the following example:

$$A = \begin{pmatrix}1 & 1 & \varepsilon^2/2\\ \varepsilon & 0 & \varepsilon^2/2\\ 0 & \varepsilon & \varepsilon^2/2\\ \varepsilon^2 & \varepsilon^2 & 1\end{pmatrix},\qquad x = \begin{pmatrix}\varepsilon\\ \varepsilon\\ 1/\varepsilon\end{pmatrix}\qquad\text{and}\qquad b = Ax + 10^{-5}v,$$

where $v = \left(-\varepsilon + \varepsilon^4,\ 1 - \frac{\varepsilon^4}{2},\ 1 - \frac{\varepsilon^4}{2},\ -\varepsilon^2 + \frac{\varepsilon^3}{2}\right)^T$ ($v\in\mathrm{Ker}(A^T)$). The vector $x$ denotes the exact solution of the LLSP $\min_{x\in\mathbb{R}^3}\|Ax - b\|_2$ and $\tilde x$ is the computed solution with Matlab (augmented system method) using a machine precision of $2.22\times 10^{-16}$. Note that $A$ is inspired from a Läuchli [25] matrix in which we introduce a term $\varepsilon^2$ to create a weak coupling between the variables $(x_1,x_2)$ and $x_3$. In our experiments, we choose $\varepsilon = 10^{-7}$ and we obtain the following relative errors on the components of $x$:

$$\frac{|x_1 - \tilde x_1|}{|x_1|} = 2.1\times 10^{-7},\qquad \frac{|x_2 - \tilde x_2|}{|x_2|} = 3.1\times 10^{-8}\qquad\text{and}\qquad \frac{|x_3 - \tilde x_3|}{|x_3|} = 1.9\times 10^{-16}.$$

Due to the discrepancy between the errors in the solution components, the condition number of the global solution will not give a good idea of the sensitivity of the problem, and it is relevant to consider separately the condition numbers of $L_1^Tx$ and $L_2^Tx$, where

$$L_1 = \begin{pmatrix}1 & 0\\ 0 & 1\\ 0 & 0\end{pmatrix}\qquad\text{and}\qquad L_2 = (0,\ 0,\ 1)^T.$$

We report in Table 1 the mixed and componentwise relative condition numbers, as defined in (2.2), for $x$, $(x_1,x_2)$ and $x_3$, corresponding respectively to the cases $L = I$, $L = L_1$ and $L = L_2$. We consider a random perturbation $(\Delta A,\Delta b)$ of small componentwise norm (here $\|(\Delta A,\Delta b)\|_c = 10^{-10}$). The condition numbers are computed using the formulas

$$K_\infty^{(rel)} = \left\||V|\,\mathrm{vec}(|A|) + |W||b|\right\|_\infty/\|L^Tx\|_\infty,\qquad K_c^{(rel)} = \left\||D_{L^Tx}^{-1}|(|V|\,\mathrm{vec}(|A|) + |W||b|)\right\|_\infty,$$

where the matrices $V$ and $W$ are defined in Theorem 3 and the matrix $D_{L^Tx}^{-1}$ is defined in Sect. 3.5.3.

Table 1  Mixed and componentwise condition numbers for $L^Tx$

              Mixed                                                     Componentwise
              $K_\infty^{(rel)}$   $\bar K_\infty^{(rel)}$   $err_\infty^{(rel)}$     $K_c^{(rel)}$   $\bar K_c^{(rel)}$   $err_c^{(rel)}$
$L = I$       2.0             1.67            1.9 × 10⁻¹⁶     3.0 × 10⁹       2.8 × 10⁸        2.1 × 10⁻⁷
$L = L_1$     3.0 × 10⁹       2.8 × 10⁸       2.1 × 10⁻⁷      3.0 × 10⁹       2.8 × 10⁸        2.1 × 10⁻⁷
$L = L_2$     2.0             1.67            1.9 × 10⁻¹⁶     2.0             1.67             1.9 × 10⁻¹⁶

We also report the first-order approximations of $K_\infty^{(rel)}$ and $K_c^{(rel)}$, denoted respectively by $\bar K_\infty^{(rel)}$ and $\bar K_c^{(rel)}$. These quantities correspond to the ratio between the relative errors on the solution and on the data for this perturbation, and are given by

$$\bar K_\infty^{(rel)} = \frac{\|L^Tx(A + \Delta A,\ b + \Delta b) - L^Tx(A,b)\|_\infty}{\|L^Tx(A,b)\|_\infty}\cdot\frac{\|(A,b)\|_c}{\|(\Delta A,\Delta b)\|_c},$$

$$\bar K_c^{(rel)} = \frac{\|L^Tx(A + \Delta A,\ b + \Delta b) - L^Tx(A,b)\|}{\|L^Tx(A,b)\|}\cdot\frac{\|(A,b)\|_c}{\|(\Delta A,\Delta b)\|_c}.$$

We also give the relative forward errors on $L^Tx$, defined by

$$err_\infty^{(rel)} = \frac{\|L^T\tilde x - L^Tx\|_\infty}{\|L^Tx\|_\infty}\qquad\text{and}\qquad err_c^{(rel)} = \frac{\|L^T\tilde x - L^Tx\|}{\|L^Tx\|}.$$

We observe in Table 1 that the exact condition numbers, which correspond to the worst case in error amplification at first order, are slightly greater than their approximations (the perturbations are here small enough). As expected from perturbation analysis, large condition numbers correspond to large forward errors. Moreover, we notice that in all cases the forward error has the same order of magnitude as the product of the exact condition number by the machine precision. This indicates that, in our example, the augmented system approach behaves like a backward stable method when using mixed and componentwise error analysis.
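The example can be reproduced with a few lines of Matlab (a sketch based on our transcription of $A$, $x$ and $v$ above, here for $L = I$; in double precision the computed values are only accurate to a few digits, since $A^TA$ is very ill conditioned):

    e = 1e-7;
    A = [1 1 e^2/2; e 0 e^2/2; 0 e e^2/2; e^2 e^2 1];
    x = [e; e; 1/e];
    v = [-e + e^4; 1 - e^4/2; 1 - e^4/2; -e^2 + e^3/2];   % v in Ker(A')
    b = A*x + 1e-5*v;  r = 1e-5*v;  L = eye(3);           % r = b - A*x exactly
    V = kron(L'*((A'*A)\eye(3)), r') - kron(x', L'*pinv(A));
    W = L'*pinv(A);
    s = abs(V)*abs(A(:)) + abs(W)*abs(b);
    Kinf_rel = norm(s, inf)/norm(L'*x, inf);              % about 2.0
    Kc_rel   = norm(abs(diag(L'*x))\s, inf);              % about 3.0e9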

4 Conclusion

We proved that working with dual norms and adjoint operators enables us to derive condition numbers, and we applied this to the case where perturbations on data are measured componentwise. By using this method, we obtained formulas for the conditioning of a linear function of the linear least squares solution, for which we provided an exact expression when the perturbations of the solution are measured using the infinity norm or a componentwise norm, and an upper bound when using the Euclidean norm. We also gave the corresponding expressions for the special case of consistent linear systems.

Acknowledgement We are grateful to Jack Dongarra (Innovative Computing Laboratory, University of Tennessee) for hosting the first author of this paper in his research group.

References

1. Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Sorensen, D.: LAPACK Users' Guide, 3rd edn. SIAM, Philadelphia (1999)
2. Arioli, M., Baboulin, M., Gratton, S.: A partial condition number for linear least-squares problems. SIAM J. Matrix Anal. Appl. 29(2), 413–433 (2007)
3. Arioli, M., Demmel, J.W., Duff, I.S.: Solving sparse linear systems with sparse backward error. SIAM J. Matrix Anal. Appl. 10(2), 165–190 (1989)
4. Arioli, M., Duff, I.S., de Rijk, P.P.M.: On the augmented system approach to sparse least squares problems. Numer. Math. 55, 667–684 (1989)
5. Bauer, F.L.: Genauigkeitsfragen bei der Lösung linearer Gleichungssysteme. Z. Angew. Math. Mech. 46(7), 409–421 (1966)
6. Björck, Å.: Component-wise perturbation analysis and error bounds for linear least squares solutions. BIT Numer. Math. 31, 238–244 (1991)
7. Björck, Å.: Numerical Methods for Least Squares Problems. SIAM, Philadelphia (1996)
8. Chandrasekaran, S., Ipsen, I.C.F.: On the sensitivity of solution components in linear systems of equations. SIAM J. Matrix Anal. Appl. 16(1), 93–112 (1995)
9. Cucker, F., Diao, H., Wei, Y.: On mixed and componentwise condition numbers for Moore–Penrose inverse and linear least squares problems. Math. Comput. 76(258), 947–963 (2007)
10. Demmel, J., Hida, Y., Li, X.S., Riedy, E.J.: Extra-precise iterative refinement for overdetermined least squares problems. Technical Report EECS-2007-77, UC Berkeley (2007). Also LAPACK Working Note 188
11. Dongarra, J., Du Croz, J., Duff, I., Hammarling, S.: A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16, 1–17 (1990)
12. Geurts, A.J.: A contribution to the theory of condition. Numer. Math. 39, 85–96 (1982)
13. Gohberg, I., Koltracht, I.: Mixed, componentwise, and structured condition numbers. SIAM J. Matrix Anal. Appl. 14(3), 688–704 (1993)
14. Golub, G.H., van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
15. Graham, A.: Kronecker Products and Matrix Calculus with Application. Wiley, New York (1981)
16. Gratton, S.: On the condition number of linear least squares problems in a weighted Frobenius norm. BIT Numer. Math. 36(3), 523–530 (1996)
17. Grcar, J.F.: Adjoint formulas for condition numbers applied to linear and indefinite least squares. Technical Report LBNL-55221, Lawrence Berkeley National Laboratory (2004)
18. Higham, D.J., Higham, N.J.: Componentwise perturbation theory for linear systems with multiple right-hand sides. Linear Algebra Appl. 174, 111–129 (1992)
19. Higham, N.J.: Computing error bounds for regression problems. In: Brown, P.J., Fuller, W.A. (eds.) Statistical Analysis of Measurement Error Models and Applications. Contemporary Mathematics, vol. 112, pp. 195–208. American Mathematical Society, Providence (1990)
20. Higham, N.J.: A survey of componentwise perturbation theory in numerical linear algebra. In: Gautschi, W. (ed.) Mathematics of Computation 1943–1993: A Half Century of Computational Mathematics. Proceedings of Symposia in Applied Mathematics, vol. 48, pp. 49–77. American Mathematical Society, Providence (1994)
21. Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. SIAM, Philadelphia (2002)
22. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1985)
23. Kenney, C.S., Laub, A.J., Reese, M.S.: Statistical condition estimation for linear least squares. SIAM J. Matrix Anal. Appl. 19(4), 906–923 (1998)
24. Lancaster, P., Tismenetsky, M.: The Theory of Matrices, with Applications, 2nd edn. Academic Press, San Diego (1985)
25. Läuchli, P.: Jordan-Elimination und Ausgleichung nach kleinsten Quadraten. Numer. Math. 3, 226–240 (1961)
26. Oettli, W., Prager, W.: Compatibility of approximate solution of linear equations with given error bounds for coefficients and right-hand sides. Numer. Math. 6, 405–409 (1964)
27. Rice, J.: A theory of condition. SIAM J. Numer. Anal. 3, 287–310 (1966)
28. Rohn, J.: New condition numbers for matrices and linear systems. Computing 41(1–2), 167–169 (1989)
29. Rump, S.M.: Structured perturbations Part II: componentwise distances. SIAM J. Matrix Anal. Appl. 25(1), 31–56 (2003)
30. Skeel, R.D.: Scaling for numerical stability in Gaussian elimination. J. Assoc. Comput. Mach. 26(3), 494–526 (1979)
31. Stewart, G.W., Sun, J.: Matrix Perturbation Theory. Academic Press, San Diego (1991)