Using Dual Techniques to Derive Componentwise and Mixed Condition Numbers for a Linear Function of a Linear Least Squares Solution
BIT Numer Math (2009) 49: 3–19
DOI 10.1007/s10543-009-0213-4

Using dual techniques to derive componentwise and mixed condition numbers for a linear function of a linear least squares solution

Marc Baboulin · Serge Gratton

Received: 22 July 2008 / Accepted: 21 January 2009 / Published online: 12 February 2009
© Springer Science + Business Media B.V. 2009

Abstract  We prove duality results for adjoint operators and product norms in the framework of Euclidean spaces. We show how these results can be used to derive condition numbers, especially when perturbations on data are measured componentwise relatively to the original data. We apply this technique to obtain formulas for componentwise and mixed condition numbers for a linear function of a linear least squares solution. These expressions are closed when perturbations of the solution are measured using a componentwise norm or the infinity norm, and we get an upper bound for the Euclidean norm.

Keywords  Dual norm · Adjoint operator · Componentwise perturbations · Condition number · Linear least squares

Mathematics Subject Classification (2000)  65F35

Communicated by Lars Eldén.
M. Baboulin, Departamento de Matemática, Universidade de Coimbra, 3001-454 Coimbra, Portugal; e-mail: [email protected]
M. Baboulin, University of Tennessee, Knoxville, TN 37996-3450, USA
S. Gratton, Centre National d'Etudes Spatiales and CERFACS, 31057 Toulouse Cédex 1, France; e-mail: [email protected]

1 Introduction

The condition number of a problem measures the effect on the solution of small changes in the data, and this quantity depends on the norms chosen for measuring perturbations on data and solution. A commonly used approach consists of measuring these perturbations globally using classical norms (e.g. $\|\cdot\|_p$, $p = 1, 2, \infty$, or $\|\cdot\|_F$), resulting in so-called normwise condition numbers. But as mentioned in [20], using norms for measuring matrix perturbations has several drawbacks.
First, it does not give information about how the perturbation is distributed among the data. Second, as pointed out in [7, p. 31], errors can be overestimated by a normwise sensitivity analysis when the problem is badly scaled. In the componentwise analysis, perturbations are measured using metrics that take into account the structure of the matrix, such as sparsity or scaling. With such metrics, we could expect to minimize the amplification of perturbations, resulting in a minimal condition number. Componentwise metrics are well suited for that because the perturbations on data are measured relatively to a given tolerance for each component of the data. For instance, if $E$ is the matrix whose nonnegative entries are the individual tolerances for errors in the components of $A$, then the componentwise relative error bound $\omega$ for a perturbation $\Delta A$ will be such that $|\Delta A| \leq \omega E$ (here partial ordering and absolute values must be understood component by component). A common choice for $E$ is $E = |A|$.

Componentwise error analyses that provide us with exact expressions or bounds for componentwise condition numbers can be found for example in [5, 8, 19, 26, 28–30] for linear systems and in [4, 6, 9, 19, 23] for linear least squares. In particular, componentwise backward errors are commonly used as stopping criteria in iterative refinement for solving linear systems (see e.g. [3]) or linear least squares (see e.g. [10]).

For the full rank linear least squares problem (LLSP), generalizing [16], [2] gives exact formulas for the conditioning of a linear transformation of the LLSP solution when the perturbations of data and solution are measured normwise. Our objective in this paper is to obtain similar quantities when perturbations on data are, contrary to [2], measured componentwise and the perturbations on the solution are measured either componentwise or normwise, resulting respectively in componentwise and mixed condition numbers.
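The componentwise error measure described above can be made concrete with a short sketch. In the following Python fragment (the function name and demo data are made up for illustration), we compute the smallest $\omega$ satisfying $|\Delta A| \leq \omega E$ entrywise, for the common choice $E = |A|$:

```python
import numpy as np

def componentwise_error(dA, E):
    """Smallest omega such that |dA| <= omega * E holds entrywise."""
    dA = np.abs(np.asarray(dA, dtype=float))
    E = np.asarray(E, dtype=float)
    # Where E is zero, the bound forces the corresponding entry of dA
    # to be zero as well (omega = inf otherwise).
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(E > 0, dA / E, np.where(dA > 0, np.inf, 0.0))
    return float(ratio.max())

A = np.array([[1.0, 1e-8], [2.0, 3.0]])
dA = 1e-10 * np.abs(A)                      # perturbation scaled like |A|
omega = componentwise_error(dA, np.abs(A))  # E = |A|, the common choice
print(omega)                                # close to 1e-10
```

Note that a tiny absolute perturbation of the small entry $A_{12}$ would produce a large $\omega$, whereas it would be invisible to a normwise measure; this is exactly the scaling sensitivity the componentwise metric is designed to capture.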
In [17], a technique is presented to compute or estimate condition numbers using adjoint formulas. The results are presented in Banach spaces and make use of the corresponding duality results in order to derive normwise condition numbers. In our paper, we show that these dual techniques are easy and helpful when presented in the framework of Euclidean spaces. In particular, as also mentioned in [17], they enable us to derive condition numbers by maximizing a linear function over a space of smaller dimension than the data space.

We show in this paper that dual techniques can be used to derive condition numbers when perturbations on the data are measured componentwise, and we apply this method to the LLSP. We propose exact formulas for the conditioning of $L^T x$, a linear function of the LLSP solution, when perturbations on data are measured componentwise and perturbations on the solution are measured either componentwise or normwise. Studying the condition number of $L^T x$ is relevant, for instance, when there is a disparity between the conditioning of the solution components or when the computation of the least squares solution involves auxiliary variables without physical significance. The situations of common interest correspond to the cases where $L$ is the identity matrix (condition number of an LLSP solution), a canonical vector (condition number of one solution component), or a projector, when we are interested in the sensitivity of a subset of the solution components. The conditioning of a nonlinear function of an LLSP solution can also be obtained by replacing $L^T$ in the condition number expression by the Jacobian matrix at the solution. When $L$ is the identity matrix and when perturbations on the solution are measured using the infinity norm or a componentwise norm, we obtain the exact formula given in [9].
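The three common choices of $L$ can be illustrated with a small sketch. In the Python fragment below (the matrix, right-hand side, and variable names are arbitrary demo data, not from the paper), each choice of $L$ extracts a different view of the LLSP solution:

```python
import numpy as np

# Demo data: a small full-rank least squares problem min ||Ax - b||_2.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
x, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# The three common choices of L discussed above:
L_identity  = np.eye(3)           # L = I: the whole solution x
L_canonical = np.eye(3)[:, [1]]   # L = e_2: a single component x_2
L_projector = np.eye(3)[:, :2]    # L a projector: a subset of components

print(L_identity.T @ x)    # all of x
print(L_canonical.T @ x)   # just x[1]
print(L_projector.T @ x)   # x[0] and x[1]
```

In each case the quantity whose sensitivity is studied is $L^T x$; only the choice of $L$ changes, so a single condition number formula covers all three situations.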
By considering the special case where the residual is equal to zero, we obtain componentwise and mixed condition numbers for $L^T x$ where $x$ is the solution of a consistent linear system. When $L$ is the identity matrix, these quantities recover the expressions known from [21, p. 123] and [18].

2 Deriving condition numbers using dual techniques

2.1 Preliminary results on dual norms and adjoint operators

We consider a linear mapping $J : E \to G$ where the Euclidean spaces $E$ and $G$ are equipped respectively with the norms $\|\cdot\|_E$, $\|\cdot\|_G$ and the scalar products $\langle \cdot,\cdot \rangle_E$ and $\langle \cdot,\cdot \rangle_G$. Note that the norms $\|\cdot\|_E$ and $\|\cdot\|_G$ may not be, and in general won't be, the particular norms induced by the scalar products $\langle \cdot,\cdot \rangle_E$ and $\langle \cdot,\cdot \rangle_G$.

Definition 1  The adjoint operator of $J$, $J^* : G \to E$, is defined by
$$\langle y, Jx \rangle_G = \langle J^* y, x \rangle_E,$$
where $(x, y) \in E \times G$. We also define the dual norm $\|\cdot\|_{E^*}$ of $\|\cdot\|_E$ by
$$\|x\|_{E^*} = \max_{u \neq 0} \frac{\langle x, u \rangle_E}{\|u\|_E},$$
and define similarly the dual norm $\|\cdot\|_{G^*}$.

For the common vector norms, the dual norms with respect to the canonical scalar product in $\mathbb{R}^n$ are well known and are given by $\|\cdot\|_{1^*} = \|\cdot\|_\infty$, $\|\cdot\|_{\infty^*} = \|\cdot\|_1$, and $\|\cdot\|_{2^*} = \|\cdot\|_2$. For the matrix norms in $\mathbb{R}^{m \times n}$ with respect to the scalar product $\langle A, B \rangle = \mathrm{trace}(A^T B)$, we have $\|A\|_{2^*} = \|\sigma(A)\|_1$ (Lemma 3.5 of [31, p. 78]), where $\sigma(A)$ is the vector containing the singular values of $A$, and since $\mathrm{trace}(A^T A) = \|A\|_F^2$ we also have that $\|A\|_{F^*} = \|A\|_F$.

For the linear applications mapping $E$ to $G$, we denote by $\|\cdot\|_{E,G}$ the operator norm induced by the norms $\|\cdot\|_E$ and $\|\cdot\|_G$. Likewise, the norm $\|\cdot\|_{G^*,E^*}$ is the operator norm for linear applications mapping $G$ to $E$ and induced by the dual norms $\|\cdot\|_{G^*}$ and $\|\cdot\|_{E^*}$. Then we have the following theorem.

Theorem 1
$$\|J\|_{E,G} = \|J^*\|_{G^*,E^*}.$$

Proof
$$\begin{aligned}
\|J\|_{E,G} &= \max_{x \in E} \frac{\|Jx\|_G}{\|x\|_E} \\
&= \max_{x \in E,\, u \in G} \frac{\langle Jx, u \rangle_G}{\|u\|_{G^*} \|x\|_E} \quad \text{(we use the ``duality theorem'' [22, p. 287])} \\
&= \max_{u \in G} \frac{1}{\|u\|_{G^*}} \max_{x \in E} \frac{\langle x, J^* u \rangle_E}{\|x\|_E} \\
&= \max_{u \in G} \frac{\|J^* u\|_{E^*}}{\|u\|_{G^*}} \\
&= \|J^*\|_{G^*,E^*}.
\end{aligned}$$
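The dual-norm and adjoint facts quoted above are easy to check numerically. The following Python sketch (demo data only, using the canonical inner products) verifies the adjoint relation and evaluates the matrix dual norms:

```python
import numpy as np

# Numerical sanity check of the dual-norm facts, canonical inner products.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
y = rng.standard_normal(4)

# The adjoint of the map x -> Ax is A^T:  <y, Ax> = <A^T y, x>.
assert np.isclose(y @ (A @ x), (A.T @ y) @ x)

# Dual of ||.||_1 is ||.||_inf: the maximum of <x, u>/||u||_1 is attained
# at a signed canonical vector, so it equals max_i |x_i|.
assert np.isclose(max(abs(xi) for xi in x), np.linalg.norm(x, np.inf))

# Matrix dual norms with respect to <A, B> = trace(A^T B):
nuclear = np.linalg.svd(A, compute_uv=False).sum()  # ||A||_{2*} = ||sigma(A)||_1
fro = np.linalg.norm(A, "fro")                      # ||A||_{F*} = ||A||_F
# Cauchy-Schwarz equality at B = A: <A, A>/||A||_F = ||A||_F.
assert np.isclose(np.trace(A.T @ A) / fro, fro)
print(nuclear, fro)
```

Note that $\|A\|_{2^*} \geq \|A\|_F$ always holds, since $\sum_i \sigma_i \geq (\sum_i \sigma_i^2)^{1/2}$.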
As mentioned in [17], it may be interesting to compute $\|J^*\|_{G^*,E^*}$ (instead of $\|J\|_{E,G}$) when $G^*$ is a Euclidean space of lower dimension than $E$, because it implies a maximization over a space of smaller dimension.

We now consider a product space $E = E_1 \times \cdots \times E_p$ where each Euclidean space $E_i$ is equipped with the norm $\|\cdot\|_{E_i}$ and the scalar product $\langle \cdot,\cdot \rangle_{E_i}$. In $E$, we consider the following scalar product
$$\langle (u_1,\ldots,u_p), (v_1,\ldots,v_p) \rangle = \langle u_1, v_1 \rangle_{E_1} + \cdots + \langle u_p, v_p \rangle_{E_p},$$
and the product norm
$$\|(u_1,\ldots,u_p)\|_\nu = \nu(\|u_1\|_{E_1}, \ldots, \|u_p\|_{E_p}),$$
where $\nu$ is an absolute norm on $\mathbb{R}^p$ (i.e. such that $\nu(|x|) = \nu(x)$ for all $x \in \mathbb{R}^p$, [24, p. 367]). We denote by $\nu^*$ the dual of $\nu$ with respect to the canonical inner product of $\mathbb{R}^p$, and we are interested in determining the dual $\|\cdot\|_{\nu^*}$ of the product norm $\|\cdot\|_\nu$ with respect to the scalar product of $E$. Then we have the following result.

Theorem 2  The dual of the product norm can be expressed as
$$\|(u_1,\ldots,u_p)\|_{\nu^*} = \nu^*(\|u_1\|_{E_1^*}, \ldots, \|u_p\|_{E_p^*}).$$

Proof  From $\|u_i\|_{E_i^*} = \max_{v_i \neq 0} \frac{\langle u_i, v_i \rangle_{E_i}}{\|v_i\|_{E_i}}$, we have, for all $v_i \in E_i$, $\langle u_i, v_i \rangle_{E_i} \leq \|u_i\|_{E_i^*} \|v_i\|_{E_i}$, and then
$$\begin{aligned}
\|(u_1,\ldots,u_p)\|_{\nu^*} &= \max_{\|(v_1,\ldots,v_p)\|_\nu = 1} \sum_{i=1}^p \langle u_i, v_i \rangle_{E_i} \\
&\leq \max_{\|(v_1,\ldots,v_p)\|_\nu = 1} \sum_{i=1}^p \|u_i\|_{E_i^*} \|v_i\|_{E_i} \\
&= \max_{\nu(\|v_1\|_{E_1},\ldots,\|v_p\|_{E_p}) = 1} \begin{pmatrix} \|u_1\|_{E_1^*} \\ \vdots \\ \|u_p\|_{E_p^*} \end{pmatrix}^T \begin{pmatrix} \|v_1\|_{E_1} \\ \vdots \\ \|v_p\|_{E_p} \end{pmatrix} \\
&= \nu^*(\|u_1\|_{E_1^*}, \ldots, \|u_p\|_{E_p^*}).
\end{aligned}$$
So we have shown that $\nu^*(\|u_1\|_{E_1^*}, \ldots, \|u_p\|_{E_p^*})$ is an upper bound for the dual of the product norm.
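Theorem 2 can also be illustrated numerically. In the sketch below (an assumed setup chosen for the demo, not taken from the paper), each block $E_i$ carries the Euclidean norm, which is self-dual, and $\nu = \|\cdot\|_\infty$ on $\mathbb{R}^2$, whose dual is $\nu^* = \|\cdot\|_1$; the theorem then predicts that the dual product norm of $(u_1, u_2)$ is $\|u_1\|_2 + \|u_2\|_2$:

```python
import numpy as np

# Assumed setup: E = E_1 x E_2 with the (self-dual) Euclidean norm on each
# block and nu = ||.||_inf on R^2, so nu* = ||.||_1.  Theorem 2 predicts
# ||(u1, u2)||_{nu*} = ||u1||_2 + ||u2||_2.
rng = np.random.default_rng(2)
u1, u2 = rng.standard_normal(3), rng.standard_normal(4)
predicted = np.linalg.norm(u1) + np.linalg.norm(u2)

# The maximizer normalizes each block: then ||(v1, v2)||_nu = 1 and the
# inner product attains the bound from the proof.
v1 = u1 / np.linalg.norm(u1)
v2 = u2 / np.linalg.norm(u2)
assert np.isclose(max(np.linalg.norm(v1), np.linalg.norm(v2)), 1.0)
assert np.isclose(u1 @ v1 + u2 @ v2, predicted)

# Random feasible directions never exceed the predicted dual norm.
for _ in range(1000):
    w1, w2 = rng.standard_normal(3), rng.standard_normal(4)
    scale = max(np.linalg.norm(w1), np.linalg.norm(w2))
    assert (u1 @ w1 + u2 @ w2) / scale <= predicted + 1e-12
print(predicted)
```

Choosing $\nu = \|\cdot\|_\infty$ here makes the equality case easy to exhibit: normalizing each block independently is feasible for the max-norm constraint, which is what turns the inequality in the proof into an equality.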