Computing Revealing QR Factorizations of

Dense Matrices

Christian H Bischof

Mathematics and Computer Science Division Argonne National Lab oratory S

Cass Ave Argonne IL bischofmcsanlgov

and

Gregorio QuintanaOrt

Departamento de InformaticaUniversidad Jaime I Campus Penyeta Roja Castellon

Spain gquintaninfujies

We develop algorithms and implementations for computing rankrevealing QR RRQR factor

izations of dense matrices First we develop an ecient blo ck algorithm for approximating an

RRQR factorization employing a windowed version of the commonly used Golub pivoting strategy

safeguarded by incremental condition estimation Second we develop eciently implementable

variants of guaranteed reliable RRQR algorithms for triangular matrices originally suggested by

Chandrasekaran and Ipsen and by Pan and Tang We suggest algorithmic improvements with re

sp ect to condition estimation termination criteria and Givens up dating By combining the blo ck

algorithm with one of the triangular p ostpro cessing steps we arrive at an ecient and reliable

algorithm for computing an RRQR factorization of a dense matrix Exp erimental results on IBM

RS and SGI R platforms show that this approach p erforms up to three times faster

than the less reliable QR factorization with column pivoting as it is currently implemented in LA

PACK and comes within of the p erformance of the LAPACK blo ck algorithm for computing

a QR factorization without any column exchanges Thus we exp ect this routine to b e useful in

many circumstances where numerical rank deciency cannot b e ruled out but currently has b een

ignored b ecause of the computational cost of dealing with it

Categories and Sub ject Descriptors G Numerical Analysis Numerical

G Mathematical Software

Additional Key Words and Phrases rankrevealing orthogonal factorization numerical rank blo ck

algorithm QR factorization leastsquares systems

This work was supp orted by the Applied and Computational Mathematics Program Advanced

Research Pro jects Agency under contracts DME and P

Bischof was also supp orted by the Mathematical Information and Computational Sciences Divi

sion subprogram of the Oce of Computational and Technology Research U S Department of

Energy under Contract WEng

Quintana also received supp ort through the Europ ean ESPRIT Pro ject GEPPCOM and

the Spanish Research Agency CICYT under grant TICC During part of this work

Quintana was a research fellow of the Spanish Ministry of Education and Science of the Valencian

Government at the Universidad Politecnicade Valencia and a visiting scientist at the Mathematics and Computer Science Division at Argonne National Lab oratory

2 

INTRODUCTION

We briey summarize the prop erties of a rankrevealing QR RRQR factorization

Let A b e an m n matrix wlog m n with singular values

n

and dene the numerical rank r of A with resp ect to a threshold as follows

r r

Also let A have a QR factorization of the form

R R

AP QR Q

R

where P is a p ermutation matrix Q has orthonormal columns R is upp er triangu

lar and R is of order r Further let A denote the twonorm condition number

of a matrix A We then say that is an RRQR factorization of A if the following

prop erties are satised

R and kR k R

r max r

Whenever there is a welldetermined gap in the singularvalue sp ectrum b etween

r

and and hence the numerical rank r is well dened the RRQR factorization

r

reveals the numerical rank of A by having a wellconditioned leading submatrix

R and a trailing submatrix R of small norm We also note that the matrix

R R

T

P

I

which can b e easily computed from is usually a go o d approximation of the

nullvectors and a few steps of subspace iteration suce to compute nullvectors

that are correct to working precision Chan and Hansen

The RRQR factorization is a valuable to ol in b ecause

it provides accurate information ab out rank and numerical nullspace Its main

use arises in the solution of rankdecient leastsquares problems for example

in geo desy Golub et al computeraided design Grandine nonlinear

leastsquares problems More the solution of integral equations Elden and

Schreiber and the calculation of splines Grandine Other applications

arise in b eam forming Bischof and Shro sp ectral estimation Hsieh et al

and regularization Hansen Hansen et al Walden

Stewart suggested another alternative to the singular value decomp osition

a complete orthogonal decomp osition called URV decomp osition This factorization

decomp oses

R R

T

V A U

R

where U and V are orthogonal and b oth kR k and kR k are of the order

r

In particular compared with RRQR factorizations URV decomp ositions employ a

general orthogonal matrix V instead of the p ermutation matrix P URV decomp o

sitions are more exp ensive to compute but they are well suited for nullspace up dat

ing RRQR factorizations on the other hand are more suited for the leastsquares

 3

setting since one need not store the orthogonal matrix V the other orthogonal

matrix is usually applied to the righthand side on the y Of course RRQR

factorizations can b e used to compute an initial URV decomp osition where U Q

and V P

We briey review the history of RRQR algorithms From the interlacing theorem

for singular values Golub and Loan Corollary we have

R k k A and Rk n k n A

min k max k

Hence to satisfy condition we need to pursue two tasks

Task Find a p ermutation P that maximizes R

min

Task Find a p ermutation P that minimizes R

max

Golub suggested what is commonly called the QR factorization with column

pivoting Given a set of already selected columns this algorithm chooses as the

next pivot column the one that is farthest away in the Euclidean norm from the

subspace spanned by the columns already chosen Golub and Loan p

P This intuitive strategy addresses task

While this greedy algorithm is known to fail on the socalled Kahan matri

ces Golub and Loan p Example it works well in practice and

forms the basis of the LINPACK Dongarra et al and LAPACK Anderson

et al a Anderson et al b implementations Recently QuintanaOrtSun

and Bischof developed an implementation of the Golub algorithm that allows

half of the work to b e p erformed with BLAS kernels Bischof also had developed

restrictedpivoting variants of the Golub strategy to enable the use of BLAS type

kernels Bischof for almost all of the work and to reduce communication cost

on distributedmemory machines Bischof

One approach to task is based in essence on the following fact which is proved

in Chan and Hansen

W

nn np

Lemma For any R IR and any W IR with a nonsingular

W

pp

W IR we have

kRn p n n p nk kRW k kW k

This means that if we can determine a matrix W with p linearly indep endent

columns all of which lie approximately in the nullspace of R ie kRW k is small

and if W is well conditioned such that W kW k is not large we

min

are guaranteed that the elements of the b ottom right p p blo ck of R will b e small

Algorithms based on computing wellconditioned nullspace bases for A include

these by Golub Klema and Stewart Chan and Foster Other

algorithms addressing task are these by Stewart and Gragg and Stewart

Algorithms addressing task include those of Chan and Hansen

and Golub Klema and Stewart In fact the latter achieves b oth task and

task and therefore reveals the rank but it is to o exp ensive in comparison with

the others

Bischof and Hansen combined a restrictedpivoting strategy with Chans algo rithm Chan to arrive at an algorithm for sparse matrices Bischof and Hansen

4 

and also developed a blo ck variant of Chans algorithm Bischof and Hansen

A Fortran implementation of Chans algorithm was provided by Reichel

and Gragg

Chans algorithm Chan guaranteed

i

p

R i i

min i

ni

nn i

and

p

ni

nn i Ri n i n

i max i

That is as long as the rank of the matrix is close to n the algorithm is guaranteed

to pro duce reliable b ounds but reliability may decrease with the rank of the matrix

Hong and Pan then showed that there exists a p ermutation matrix P such

that for the triangular factor R partitioned as in we have

jjR jj Ap r n

r

and

R A

min r

p r n

where p and p are loworder p olynomials in n and r versus an exp onential factor

in Chans algorithm

Chandrasekaran and Ipsen were the rst to develop RRQR algorithms that

satisfy and Their pap er also reviews and provides a common framework

for the previously devised strategies In particular they introduce the socalled

unication principle which says that running a task algorithm on the rows of the

inverse of the matrix yields a task algorithm They suggest hybrid algorithms that

alternate b etween task and task steps to rene the separation of the singular

values of R

Pan and Tang and Gu and Eisenstat presented dierent classes of

algorithms for achieving and addressing the p ossibility of nontermination

of the algorithms b ecause of oatingp oint inaccuracies

The goal of our work was to develop an ecient and reliable RRQR algorithm

and implementation suitable for inclusion in a numerical library such as LAPACK

Sp ecically we wished to develop an implementation that was b oth reliable and

close in p erformance to the QR factorization without any pivoting Such an imple

mentation would provide algorithm developers with an ecient to ol for addressing

p otential numerical rank deciency by minimizing the computational p enalty for

addressing potential rank deciency Our strategy involves the following ingredients

an ecient blo ck algorithm for computing an approximate RRQR factorization

based on the work by Bischof and

ecient implementations of RRQR algorithms well suited for triangular matri

ces based on the work by Chandrasekaran and Ipsen and Pan and Tang

These algorithms seemed b etter suited for triangular matrices than those

suggested by Gu and Eisenstat We nd that

 5

P I

foreach i f ng do r es ka ik end do

i

for i to minm n do

Let i pv t n b e such that r es is maximal

pv t

P i P pv t a i a pv t r es r es

pv t

i

u ai m i g enhhai m i

i

ai m i n apphhu ai m i n

i

foreach j fi ng do

p

r es ai j r es

j

j

end foreach

end for

Fig The QR Factorization Algorithm with Traditional Column Pivoting

in most cases the approximate RRQR factorization computed by the blo ck algo

rithm is very close to the desired RRQR factorization requiring little p ostpro

cessing and

the almost entirely BLAS prepro cessing algorithm p erforms considerably faster

than the QR factorization with column pivoting and close to the p erformance of

the QR factorization without pivoting

The pap er is structured as follows In the next section we review the blo ck

algorithm for computing an approximate RRQR factorization based on a restricted

pivoting approach In Section we describ e our mo dications to Chandrasekaran

and Ipsens HybridI I I algorithm and Pan and Tangs Algorithm Section

presents our exp erimental results on IBM RS and SGI R platforms In

Section we summarize our results

A BLOCK QR FACTORIZATION WITH RESTRICTED PIVOTING

In this section we describ e a blo ck QR factorization algorithm that employs a

restricted pivoting strategy to approximately compute an RRQR factorization em

ploying the ideas describ ed by Bischof

We compute Q by a sequence of Householder matrices

T

H H u I uu kuk

For any given vector x we can choose a vector u so that H ux e where e is

the rst canonical unit vector and j j kxk see for example Golub and Loan

p The application of a Householder matrix B H uA involves a

T T

matrixvector pro duct z A u and a rankone up date B A uz

Figure describ es the Golub Householder QR factorization algorithm with tra

ditional column pivoting Golub for computing the QR decomp osition of an

m n matrix A The primitive op eration u y g enhhx computes u such that

y H ux is a multiple of e while the primitive op eration B apphhu A

overwrites B with H uA

After step i is completed the values r es j i n are the length of the

j

pro jections of the j th column of the currently p ermuted AP onto the orthogonal

complement of the subspace spanned by the rst i columns of AP The values r es

j

can b e up dated easily and do not have to b e recomputed at every step although

6 

X

XX

e

X

XX BLOCK

e

d

X

XX

e

o

UP X

pivot XX

e

n

X

XX

e

e DATE

e

e

X

XX

e

Fig Restricting Pivoting for a Blo ck Algorithm

roundo errors may make it necessary to recompute r es kai m j k j

j

i n p erio dically Dongarra et al p we suppressed this detail

in line of Figure

The bulk of the computational work in this algorithm is p erformed in the apphh ker

nel which relies on matrixvector op erations However on to days cachebased

architectures ranging from workstations to sup ercomputers matrixmatrix op

erations p erform much b etter Matrixmatrix op erations are exploited by using

socalled blo ck algorithms whose toplevel unit of computation is matrix blo cks

instead of vectors Such algorithms play a central role for example in the LA

PACK implementations Anderson et al a Anderson et al b LAPACK

employs the socalled compact WY representation of pro ducts of Householder ma

trices Schreiber and Van Loan which expresses the pro duct

Q H H

nb

of a series of m m Householder matrices as

T

Q I YTY

where Y is an m nb matrix and T is an nb nb upp er triangular matrix Stable

implementations for generating Householder vectors as well as forming and applying

compact WY factors are provided in LAPACK

To arrive at a blo ck QR factorization algorithm we would like to avoid up dating

part of A until several Householder transformations have b een computed This

strategy is imp ossible with traditional pivoting since we must up date r es b efore

j

we can choose the next pivot column While we can mo dify the traditional approach

to do half of the work using blo ck transformations this is the b est we can do these

issues are discussed in detail in QuintanaOrtet al Therefore we instead

limit the scop e of pivoting as suggested in Bischof Thus we do not have

to up date the remaining columns until we have computed enough Householder

transformations to make a blo ck up date worthwhile

The idea is graphically depicted in Figure At a given stage we are done with

the columns to the left of the pivot window We then try to select the next pivot

columns exclusively from the columns in the pivot window not touching the part of

A to the right of the pivot window Only when we have combined the Householder

vectors dened by the next batch of pivot columns into a compact WY factor do

we apply this blo ck up date to the columns on the right

Since the leading blo ck of R is supp osed to approximate the large singular values

 7

of A we must b e able to guard against pivot columns that are close to the span

of columns already selected That is given the upp er triangular matrix R dened

i

v

T

by the rst i columns of Q AP and a new column determined by the new

candidate pivot column we must determine whether

R v

i

R

i

has a condition number that is larger than a threshold which denes what we

consider a rankdecient matrix

We approximate

1

3

max kR k k k R b R n

max i max i

k i

which is easy to compute To cheaply estimate R we employ incremental

min i

condition estimation ICE Bischof Bischof and Tang Given a go o d

T

x d kdk estimate b R kxk dened by a large norm solution x to R

min i

i

v

and a new column incremental condition estimation with only k ops

computes s and c s c such that

sx

R b R k k

min i min i

c

A stable implementation of ICE based on the formulation in Bischof and Tang

is provided by the LAPACK routine xLAIC ICE is an order of magnitude

cheaper than other condition estimators see for example Higham More

over it is considerably more reliable than simply using j j as an estimate for

R see for example Bischof We also dene

min i

b R

max i

bR

i

b R

min i

The restricted blo ck pivoting algorithm pro ceeds in four phases

Phase Pivoting of largest column into rst position This phase is motivated

by the fact that the norm of the largest column of A is usually a go o d estimate for

A

Phase Block QR factorization with restricted pivoting Given a desired blo ck

size nb and a window size w s w s nb we try to generate nb Householder transfor

mations by applying the Golub pivoting strategy only to the columns in the pivot

window using ICE to assess the impact of a column selection on the condition

number When the pivot column chosen from the pivot window would lead to a

leading triangular factor whose condition number exceeds we mark al l remaining

columns in the pivot window k say as rejected pivot them to the end of the

matrix generate a blo ck transformation of width not more than nb apply it to

the remainder of the matrix and then rep osition the pivot window to encompass

Here as in the sequel we use the convention that the prex x generically refers to the appropriate one of the four dierent precision instantiations SLAIC DLAIC CLAIC or ZLAIC

8 

the next w s notyetrejected columns When all columns have b een either accepted

as part of the leading triangular factor or rejected at some stage of the algorithm

this phase stops

Assuming we have included r columns in the leading triangular factor we have

R r r at this p oint computed an r r upp er triangular matrix R

r

2

that satises

bR

r

2

That is r is our estimate of the numerical rank with resp ect to the threshold at

this p oint

In our exp eriments we chose

nb

ng w s nb maxf

This choice tries to ensure a suitable pivot window and lo osens up a bit as the

matrix size increases A pivot window that is to o large will reduce p erformance

b ecause of the overhead in generating blo ck orthogonal transformations and the

larger number of unblocked op erations On the other hand a pivot window that is

to o small will reduce the pivoting exibility and thus increase the likelihoo d that

the restricted pivoting strategy will fail to pro duce a go o d approximate RRQR

factorization In our exp eriments the choice of w had only a small impact not

more than on overall p erformance and negligible impact on the numerical

b ehavior

Phase Traditional pivoting strategy among rejected columns Since phase

rejects all remaining columns in the pivot window when the pivot candidate is

rejected a column may have b een pivoted to the end that should not have b een

rejected Hence we now continue with the traditional Golub pivoting strategy on

the remaining n r columns up dating as an estimate of the condition number

This phase ends at column r say where

bR

r

3

and the inclusion of the next pivot column would have pushed the condition number

b eyond the threshold We do not exp ect many columns if any to b e selected in

this phase It is mainly intended as a cheap safeguard against p ossible failure of

the initial restrictedpivoting strategy

Phase Block QR factorization without pivoting on remaining columns The

columns not yet factored columns r n are with great probability linearly

dep endent on the previous ones since they have b een rejected in b oth phase

and phase Hence it is unlikely that any kind of column exchanges among the

remaining columns would change our rank estimate and the standard BLAS

blo ck QR factorization as implemented in the LAPACK routine xGEQRF is the

fastest way to complete the triangularization

After the completion of phase we have computed a QR factorization that

satises

bR

r 3

 9

and for any column y in R r n we have

R

r

3

b y

This result suggests that this QR factorization is a go o d approximation to an RRQR

factorization and r is a go o d estimate of the rank

However this QR factorization does not guarantee to reveal the numerical rank

correctly Thus we back up this algorithm with the guaranteed reliable RRQR

implementations introduced in the next two sections

POSTPROCESSING ALGORITHMS FOR AN APPROXIMATE RRQR FACTOR

IZATION

In Chandrasekaran and Ipsen introduced a unied framework for

RRQR algorithms and developed an algorithm guaranteed to satisfy and

and thus to prop erly reveal the rank Their algorithm assumes that the initial ma

trix is triangular and thus is well suited as a p ostpro cessing step to the algorithm

presented in the preceding section Shortly thereafter Pan and Tang intro

duced another guaranteed reliable RRQR algorithm for triangular matrices In the

following subsections we describ e our improvements and implementations of these

algorithms

The RRQR Algorithm by Pan and Tang

We implement a variant of what Pan and Tang call Algorithm Pseu

do co de for our algorithm is shown in Figure It assumes as input an upp er

R

triangular matrix R i j i j denotes a right cyclic p ermutation that ex

changes columns i and j in other words i i j j j i whereas

L

i j denotes a left cyclic p ermutation that exchanges columns i and j in

ij

other words j i i i j j In the algorithm triuA denotes

the upp er triangular factor R in a QR factorization A QR of A As can b e seen

from Figure we use this notation as shorthand for retriangularizations of R after

column exchanges

p

k the algorithm is Given a value for k and a socalled f factor f

guaranteed to halt and pro duce a triangular factorization that satises

f

p

A R

k min

k n k

p

k n k

R A

max k

f

Our implementation incorp orates the following features

Incremental condition estimation is used to arrive at estimates for smallest

singular values and vectors Thus line and v line of Figure can b e

computed inexp ensively from u line The use of ICE signicantly reduces

implementation cost

The QR factorization up date line must b e p erformed only when the iftest

line is false Thus we delay it if p ossible

10 

Algorithm PTMfk

i k accepted col I

u left singular vector corresp onding to R k k

min

col n k do while accepted

R R

R triuR

k i k i

Compute R k k

min

if f jRk k j then

accepted col accepted col

else

v right singular vector corresp onding to

Find index q q k such that jv j max jv j

q

j j

L L

R triuR accepted col

q k q k

u left singular vector corresp onding to R k k

min

end if

if i n then i k else i i end if

end while

Fig Variant of PanTang RRQR Algorithm

For the algorithm to terminate all columns need to b e checked and no new

p ermutations must o ccur In Pan and Tangs algorithm rechecking of columns

after a p ermutation always starts at column k We instead b egin checking

at the column right after the one that just caused a p ermutation Thus we

rst concentrate on the columns that have not just b een worked over

The left cyclic shift p ermutes the triangular matrix into an upp er Hessenberg

form which is then retriangularized with Givens rotations Applying Givens

rotations to rows of R in the obvious fashion as done for example in Rei

chel and Gragg is exp ensive in terms of data movement b ecause of the

columnoriented nature of Fortran data layout Thus we apply Givens rotations

in an aggregated fashion up dating matrix strips R j b j b j b of

width b with al l previously computed Givens rotations

Similarly the right cyclic shift introduces a spike in column j which is elim

inated with Givens rotations in a b ottomup fashion To aggregate Givens

rotations we rst compute all rotations only touching the spike and the di

agonal of R and then apply all of them one blo ck column at a time In our

exp eriments we choose the width b of the matrix strips to b e the same as the

blo cksize nb of the prepro cessing

Compared with a straightforward implementation of Pan and Tangs Algorithm

improvements through on average decreased runtime by a factor of ve on

matrices on an Alliant FX When retriangularizations were frequent

improvement had the most noticeable impact resulting in a twofold to fourfold

p erformance gain on matrices of order and on an IBM RS

Pan and Tang introduced the f factor to prevent cycling of the algorithm The

higher f is the tighter are the b ounds in and and the b etter the approx

imations to the k and k st singular values of R However if f is to o large it

introduces more column exchanges and therefore more iterations and b ecause of

p

k roundo errors it might present convergence problems We used f

 11

Algorithm HybridI I Isffk

I

rep eat

GolubIsffk

GolubIsffk

ChanI Isffk

ChanI Isffk

until none of the four subalgorithms mo died the column ordering

Fig Variant of ChandrasekaranIpsen HybridI I I algorithm

Algorithm GolubIsffk

Find smallest index j k j n such that

kRk j j k max kRk i ik

k in

if f kRk j j k j Rk k j then

R R

R triuR

k j k j

end if

Fig ffactor Variant of GolubI Algorithm

in our work

The RRQR Algorithm by Chandrasekaran and Ipsen

Chandrasekaran and Ipsen introduced algorithms that achieve b ounds and

with f We implemented a variant of the socalled HybridI I I algorithm

pseudo co de for which is shown in Figures

Compared with the original HybridI I I algorithm our implementation incorp o

rates the following features

We employ the ChanI I strategy an O n algorithm instead of the socalled

StewartII strategy an O n algorithm b ecause of the need for the inversion of

R k k that Ipsen and Chandrasekaran employed in their exp eriments

The original HybridI I I algorithm contained two sublo ops with the rst one

lo oping over GolubIk and ChanI Ik till convergence the second one lo oping

over GolubIk and ChanI Ik We present a dierent lo op ordering in

our variant one that is simpler and seems to enhance convergence On matrices

that required considerable p ostpro cessing the new lo op ordering required ab out

fewer steps for matrices one step b eing a call to GolubI or

ChanI I than Chandrasekaran and Ipsens original algorithm In addition the

Algorithm ChanI Isfk

v right singular vector corresp onding to R k k

min

Find largest index j j k such that jv j max jv j

j i

ik

if f jv j jv j then

j

k

L L

R triuR

jk jk

end if

Fig ffactor Variant of ChanI I Algorithm

12 

new ordering sp eeds detection of convergence as shown b elow

As in our implementation of the PanTang algorithm we use ICE for estimating

singular values and vectors and the ecient aggregated Givens scheme for

the retriangularizations

We employ a generalization of the f factor technique to guarantee termination

in the presence of rounding errors The pivoting metho d assigns to every col

umn a weight namely kRk i ik in GolubIk and v in ChanI Ik where

i

v is the right singular vector corresp onding to the smallest singular value of

R k k To ensure termination Chandrasekaran and Ipsen suggested piv

oting a column only when its weight exceeded that of the current column by

at least n where is the computer precision they did not analyze the im

pact of this change on the b ounds obtained by the algorithm In contrast we

use a multiplicative tolerance factor f like that of Pan and Tang the analysis

in QuintanaOrtand QuintanaOrt proves that our algorithm achieves

the b ounds

f

p

R A and

min k

k n k

p

k n k

A R

k max

f

These b ounds are identical to and except that an f instead of an

f enters into the equation and that now f We used f in our

implementation

We claimed b efore that the new lo op ordering can avoid unnecessary steps when

the algorithm is ab out to terminate To illustrate we apply Chandrasekaran and

Ipsens original ordering to a matrix that almost reveals the rank

GolubIk Final p ermutation o ccurs here

Now the rank is revealed

ChanI Ik

GolubIk Another iteration of inner klo op

since p ermutation o ccurred

ChanI Ik

GolubIk Inner lo op for k

ChanI Ik

GolubIk Another iteration of the main lo op

since p ermutation o ccurred in last pass

ChanI Ik

GolubIk

ChanI Ik Termination In contrast the HybridI I Isf algorithm terminates in four steps

 13

Algorithm RRQRfk

rep eat

call HybridI I Isffk or PTMfk

R k k

R k k

if and then

rank k stop

else if and then

k k

else if and then

k k

end if

Fig Algorithm for Computing RankRevealing QR Factorization

GolubIsfk Final p ermutation

GolubIsfk

ChanI Isfk

ChanI Isfk Termination

Determining the Numerical Rank

As Stewart p ointed out b oth the ChandrasekaranIpsen and PanTang al

gorithms as well as our versions of those algorithms do not reveal the rank of

a matrix p er se Rather given an integer k they compute tight estimates for

A R k k and A Rk n k n

k min k max

To obtain the numerical rank with resp ect to a given threshold given an initial

estimate for the rank as provided for example by the algorithm describ ed in Sec

tion we employ the algorithm shown in Figure In our actual implementation

and are computed in HybridI I Isf or PTM

EXPERIMENTAL RESULTS

We rep ort in this section exp erimental results with the doubleprecision imple

mentations of the algorithms presented in the preceding section We consider the

following co des

DGEQPF The implementation of the QR factorization with column pivoting

provided in LAPACK

DGEQPB An implementation of the windowed QR factorization scheme de

scrib ed in Section

DGEQPX DGEQPB followed by an implementation of the variant of the Chan

drasekaranIpsen algorithm describ ed in Subsections and

DGEQPY DGEQPB followed by an implementation of the variant of the

PanTang algorithm describ ed in Subsections and

DGEQRF The blo ck QR factorization without any pivoting provided in LA

PACK

In the implementation of our algorithms we make heavy use of available LA

PACK infrastructure The co de used in our exp eriments including test and timing

14 

drivers and test matrix generators is available as rrqrtargz in pubprism on

ftpsuperorg

We tested matrices of size and on an IBM RS Mo del

and SGI R In each case we employed the vendorsupplied BLAS in the

ESSL and SGIMATH libraries resp ectively

Numerical Reliability

We employed dierent matrix types to test the algorithms with various singular

value distributions and numerical rank ranging from to full rank Details of the

test matrix generation are b eyond the scop e of this pap er and we give only a brief

synopsis here For details the reader is referred to the co de

Test matrices through were designed to exercise column pivoting Matrix

was designed to test the b ehavior of the condition estimation in the presence

of clusters for the smallest singular value For the other cases we employed the

LAPACK matrix generator xLATMS which generates random symmetric matrices by

multiplying a diagonal matrix with prescrib ed singular values by random orthogonal

matrices from the left and right For the break distribution all singular values are

except for one In the arithmetic and geometric distributions they decay from

to a sp ecied smallest singular value in an arithmetic and geometric fashion

resp ectively In the reversed distributions the order of the diagonal entries was

reversed For test cases though we used xLATMS to generate a matrix of

n

with smallest singular value e and then interspersed random order

linear combinations of these fullrank columns to pad the matrix to order n For

test cases through we used xLATMS to generate matrices of order n with the

smallest singular value b eing e We b elieve this set to b e representative of

matrices that can b e encountered in practice

We rep ort in this section on results for matrices of size n noting that

identical qualitative b ehavior was observed for smaller matrix sizes We decided

to rep ort on the largest matrix sizes b ecause the p ossibility for failure in general

increases with the number of numerical steps involved Numerical results obtained

on the three platforms agreed to machine precision For this case we list in Table

the numerical rank r with resp ect to a condition threshold of the largest

singular value the r th singular value the r st singular value

max r r

and the smallest singular value for our test cases

min

Figures and display the ratio

r

bR

r

where bR as dened in is the computed estimate of the condition number of

R after DGEQPB Figure and DGEQPX and DGEQPY Figure Thus

is the ratio b etween the ideal condition number and the estimate of the condition

number of the leading triangular factor identied in the RRQR factorization If this

ratio is close to and b is a go o d condition estimate our RRQR factorizations

do a go o d job of capturing the large singular values of A Since the pivoting

strategy and hence the numerical b ehavior of DGEQPB is p otentially aected by

the blo ck size chosen Figures and contain seven panels each of which shows

the results obtained with the test matrices and a blo ck size ranging from to

 15

Table Test Matrix Types r rank for n

Description r

max r r min

minmn

Matrix with rank e e e e

A min m n has full rank

e e e e

RA RA min m n

Full rank e e e e

A small in norm

e e e e

A n of full rank

A small in norm

e e e e

RA RA

smallest sing values clustered e e e e

Break distribution e e e e

Reversed break distribution e e e e

Geometric distributio n e e e e

Reversed geometric distributio n e e e e

Arithmetic distribution e e e e

Reversed arithmetic distribution e e e e

Break distribution e e e e

Reversed break distribution e e e e

Geometric distributio n e e e e

Reversed geometric distributio n e e e e

Arithmetic distribution e e e e

Reversed arithmetic distribution e e e e

shown in the top of each panel

We see that except for matrix type in Figure the blo ck size do es not play

much of a rule numerically although close insp ection reveals subtle variations in

b oth Figure and With blo ck size DGEQPB just b ecomes the standard Golub

pivoting strategy Thus the rst panel in Figure corrob orates the exp erimentally

robust b ehavior of this algorithm We also see that except for matrix type the

restricted pivoting strategy employed in DGEQPB do es not have much impact on

numerical b ehavior For matrix type however it p erforms much worse Matrix

n

indep endent columns and generating the leading is constructed by generating

1

n

4

as random linear combinations of those columns scaled by where is the

machine precision Thus the restricted pivoting strategy in its myopic view of the

matrix gets stuck so to sp eak in these columns

The p ostpro cessing of these approximate RRQR factorizations on the other

hand remedies p otential shortcomings in the prepro cessing step As can b e seen

from Figure the inaccurate factorization of matrix is corrected while the other

in essence correct factorizations get improved only slightly Except for small

variations DGEQPX and DGEQPY deliver identical results

We also computed the exact condition number of the leading triangular subma

trices identied in the triangularizations by DGEQPB DGEQPX and DGEQPY

and compared it with our condition estimate Figure shows the ratio of the ex act condition number to the estimated condition number of the leading triangular

16 

1 10 1 5 8 12 16 20 24

0 10

-1 10

-2 10

-3 10

-4 10 Optimal cond_no. / Estimated

-5 10

-6 10 20 40 60 80 100 120

Tests

Fig Ratio b etween Optimal and Estimated Condition Number for DGEQPB

1 10

1 5 8 12 16 20 24

0 10

-1 10

-2 10 Optimal cond_no. / Estimated -- QPX .. QPY

-3 10 20 40 60 80 100 120

Tests

Fig Ratio b etween Optimal and Estimated Condition Number for DGEQPX solid line and DGEQPY dashed

 17

1 10

1 5 8 12 16 20 24

0 10

-1 10

-- QPB

-. QPX Exact cond_of_R11 / Estimated ... QPY

-2 10 20 40 60 80 100 120

Tests

Fig Ratio b etween Exact and Estimated Condition Number of Leading Triangular Factor

for DGEQPB dashed DGEQPX dasheddotted and DGEQPY dotted

factor We observe excellent agreement within an order of magnitude in all cases

Hence the spikes for test matrices and in Figures and are not due

to errors in our estimators Rather they show that all algorithms have diculties

when confronted with dense clusters of singular values We also note that in this

context the notion of rank is numerically ill dened since there is no sensible place

to draw the line The rank derived via the SVD is in b oth cases and our

algorithms deliver estimates b etween and with minimal changes in the

condition number of their corresp onding leading triangular factors

In summary these results show that DGEQPX and DGEQPY are reliable al

gorithms for revealing numerical rank They pro duce RRQR factorizations whose

leading triangular factors accurately capture the desired part of the sp ectrum of A

and thus reliable and numerically sensible rank estimates Thus the RRQR fac

torization takes advantage of the eciency and simplicity of the QR factorization

yet it pro duces information that is almost as reliable as that computed by means

of the more exp ensive singular value decomp osition

Computing Performance

In this section we rep ort on the p erformance of the LAPACK co des DGEQPF and

DGEQRF as well as the new DGEQPB DGEQPX and DGEQPY co des For these

co des as well as all others presented in this section the Mop rate was obtained by

dividing the number of op erations required for the unblocked version of DGEQRF

by the runtime This normalized Mop rate readily allows for timing comparisons

We rep ort on matrix sizes and using blo ck sizes nb of

and

Figures and show the Mop p erformance averaged over the matrix

18 

n = 100 n = 250 50 65

60 45

55

40 50

35 45

40

Performance (in Mflops) 30 Performance (in Mflops)

35

25 30

20 25 0 10 20 30 0 10 20 30 Block size Block size

n = 500 n = 1000 75 75

70 70

65 65

60 60

55 55 50

50

Performance (in Mflops) 45 Performance (in Mflops)

45 40

35 40

30 35 0 10 20 30 0 10 20 30

Block size Block size

Fig Performance versus Blo ck Size on IBM RS DGEQPF DGEQRF

DGEQPB DGEQPX x DGEQPY

types versus blo ck size on the IBM and SGI platforms The dotted line denotes

the p erformance of DGEQPF the solid one that of DGEQRF and the dashed one that

of DGEQPB the and symbols indicate DGEQPX and DGEQPY resp ectively

On all three machines the p erformance of the two new algorithms for computing

RRQR is robust with resp ect to variations in the blo ck size The two new blo ck

algorithms for computing RRQR factorization are except for small matrices on the

SGI faster than LAPACKs DGEQPF for all matrix sizes We note that the SGI

has a data cache of MB while the IBM platform has only a KB data cache

Thus matrices up to order t into the SGI cache but matrices of order do

not Therefore for matrices of size or less we observe limited b enets from the

b etter inherent data lo cality of the BLAS implementation on this computer These

results also show that DGEQPX and DGEQPY exhibit comparable p erformance Figures through oer a closer lo ok at the p erformance of the various test

 19

n = 100 n = 250 100 160

150 90

140 80 130

70 120

60 110

Performance (in Mflops) Performance (in Mflops) 100 50 90

40 80

30 70 0 10 20 30 0 10 20 30 Block size Block size

n = 500 n = 1000 190 180

180 160 170

160 140

150 120

140

100 130 Performance (in Mflops) Performance (in Mflops) 120 80

110 60 100

90 40 0 10 20 30 0 10 20 30

Block size Block size

Fig Performance versus Blo ck Size on SGI R DGEQPF DGEQRF DGE

QPB DGEQPX x DGEQPY

20 

60

50

40

30 Performance (in Mflops)

20

10

0 2 4 6 8 10 12 14 16 18

Matrix type

Fig Performance versus Matrix Type on an IBM RS for n and nb

DGEQPF DGEQRF DGEQPB DGEQPX x DGEQPY

matrices We chose nb and n as a representative example Similar

b ehavior was observed in the other cases

We see that on the IBM platforms Figure the p erformance of DGEQRF

and DGEQPF do es not dep end on the matrix type We also see that except for

matrix types and the p ostpro cessing of the initial approximate RRQR

factorization takes very little time with DGEQPX and DGEQPY p erforming sim

ilarly For matrix type considerable work is required to improve the initial QR

factorization For matrix types and the p erformances of DGEQPX and DGE

QPY dier noticeably on the IBM platform but there is no clear winner We also

note that matrix type is suitable for DGEQPB since the indep endent columns

are up front and thus are revealed quickly and the rest of the matrix is factored

with DGEQRF

The SGI platform Figure oers a dierent picture The p erformance of all

algorithms shows more dep endence on the matrix type and DGEQPB p erforms

worse on matrix type than on all others Nonetheless except for matrix

DGEQPX and DGEQPY do not require much p ostpro cessing eort

The pictures for other matrix sizes are similar The cost for DGEQPX and

DGEQPY decreases as the matrix size increases except for matrix type where it

increases as exp ected We also note that Figures and would have lo oked even

more favorable for our algorithm had we omitted matrix or chosen the median

instead of the average p erformance

Figure shows the p ercentage of the actual amount of ops sp ent in monitoring

the rank in DGEQPB and in p ostpro cessing the initial QR factorization for dierent

matrix sizes on the IBM RS We show only matrix types through since

the b ehavior of matrix type is rather dierent in this sp ecial case roughly

 21

140

120

100

80

Performance (in Mflops) 60

40

20 0 2 4 6 8 10 12 14 16 18

Matrix type

Fig Performance versus Matrix Type on an SGI R for n and nb DGEQPF

DGEQRF DGEQPB DGEQPX x DGEQPY

DGEQPX DGEQPY

10 10

9 9

8 8

7 7

6 6

5 5

4 4 % in flops of pivoting % in flops of pivoting

3 3

2 2

1 1

0 0 5 10 15 5 10 15

Matrix type Matrix type

Fig Cost of Pivoting in of ops versus Matrix Types of Algorithms DGEQPX and DGEQPY

on an IBM RS for Matrix Sizes x and o

of the overall ops is exp ended in the p ostpro cessing Note that the actual

p erformance p enalty due to these op erations is while small still considerably higher

than the op count suggests This is not surprising given the relatively negrained

nature of the condition estimation and p ostpro cessing op erations

One may wonder whether the use of DGEQRF to compute the initial QR factor ization would lead to b etter results since DGEQRF is the fastest QR factorization

22 

algorithm This is not the case since DGEQRF do es not provide any rank pre

ordering and thus p erformance gains from DGEQRF are annihilated in the p ost

pro cessing steps For example for matrices of order on an IBM RS

the average Mop rate excluding matrix was with a standard deviation of

The p ercentage of ops sp ent in p ostpro cessing in these cases was on average

with a standard deviation of For matrix we are lucky since the

matrix is of low rank and all indep endent columns are at the front of the matrix

Thus we sp end only in p ostpro cessing obtaining a p erformance of Mops

overall In all other cases though considerable eort is exp ended in the p ostpro

cessing phase leading to overall disapp ointing p erformance These results show

that the preordering done by DGEQPB is essential for the eciency of the overall

algorithm

CONCLUSIONS

In this pap er we presented rankrevealing QR RRQR factorization algorithms

that combine an initial QR factorization employing a restricted pivoting scheme

with p ostpro cessing steps based on variants of algorithms suggested by Chan

drasekaran and Ipsen and Pan and Tang

The restrictedpivoting strategy results in an initial QR factorization that is

almost entirely based on BLAS kernels yet still achieves at a go o d approximation

of an RRQR factorization most of the time To guarantee the reliability of the

initial RRQR factorization and improve it if need b e we improved an algorithm

suggested by Pan and Tang relying heavily on incremental condition estimation and

blo cked Givens rotation up dates for computational eciency As an alternative

we implemented a version of an algorithm by Chandrasekaran and Ipsen which

among other improvements uses the f factor technique suggested by Pan and Tang

to avoid cycling in the presence of roundo errors

Numerical exp eriments on eighteen dierent matrix types with matrices ranging

in size from to on IBM RS and SGI R platforms show that this

approach pro duces reliable rank estimates while outp erforming the less reliable

QR factorization with column pivoting the currently most common approach for

computing an RRQR factorization of a dense matrix

ACKNOWLEDGMENTS

We thank Xiaobai Sun Peter Tang and Enrique S QuintanaOrtfor stimulating

discussions on the sub ject

References

Anderson E Bai Z Bischof C Demmel J Dongarra J DuCroz J Greenbaum

A Hammarling S McKenney A Ostrouchov S and Sorensen D a

LAPACK Users Guide SIAM Philadelphia

Anderson E Bai Z Bischof C Demmel J Dongarra J DuCroz J Greenbaum

A Hammarling S McKenney A Ostrouchov S and Sorensen D b

LAPACK Users Guide Release SIAM Philadelphia

Bischof C H A blo ck QR factorization algorithm using restricted pivoting In

Proceedings SUPERCOMPUTING Baltimore Md pp ACM Press

Bischof C H Incremental condition estimation SIAM Journal on Matrix Analysis

and Applications

 23

Bischof C H A parallel QR factorization algorithm with controlled lo cal pivoting

SIAM Journal on Scientic and Statistical Computing

Bischof C H and Hansen P C Structurepreserving and rankrevealing QR

factorizations SIAM Journal on Scientic and Statistical Computing November

Bischof C H and Hansen P C A blo ck algorithm for computing rankrevealing

QR factorizations Numerical Algorithms

Bischof C H and Shroff G M On up dating signal subspaces IEEE Transac

tions on Signal Processing

Bischof C H and Tang P T P Robust incremental condition estimation

Preprint MCSP Mathematics and Computer Science Division Argonne National

Lab oratory

Chan T F Rank revealing QR factorizations Linear Algebra and Its Applica

tions

Chan T F and Hansen P C Some applications of the rank revealing QR factor

ization SIAM Journal on Scientic and Statistical Computing

Chan T F and Hansen P C Lowrank revealing QR factorizations Numerical

Linear Algebra and Applications

Chandrasekaran S and Ipsen I On rankrevealing QR factorizations SIAM

Journal on Matrix Analysis and Applications

Dongarra J J Bunch J R Moler C B and Stewart G W LINPACK

Users Guide SIAM Philadelphia

Elden L and Schreiber R An application of systolic arrays to linear discrete

illp osed problems SIAM Journal on Scientic and Statistical Computing

Foster L V Rank and null space calculations using matrix decomp osition without

column interchanges Linear Algebra and Its Applications

Golub G H Numerical metho ds for solving linear least squares problems Nu

merische Mathematik

Golub G H Klema V and Stewart G W Rank degeneracy and least squares

problems Technical Rep ort TR University of Maryland Dept of Computer Science

Golub G H and Loan C F V Matrix Computations The Johns Hopkins Uni

versity Press Baltimore

Golub G H and Loan C F V Matrix Computations nd ed The Johns

Hopkins University Press Baltimore

Golub G H Manneback P and Toint P L A comparison b etween some

direct and iterative metho ds for certain large scale geo detic leastsquares problem SIAM

Journal on Scientic and Statistical Computing

Gragg W B and Stewart G W A stable variant of the secant metho d for

solving nonlinear equations SIAM Journal on Numerical Analysis

Grandine T A An iterative metho d for computing multivariate C piecewise

p olynomial interpolants Computer Aided Geometric Design

Grandine T A Rank decient interpolation and optimal design An example

Technical Rep ort SCATR February Bo eing Computer Services Engineering and

Scientic Services Division

Gu M and Eisenstat S A stable and ecient algorithm for the rankone mo d

ication of the symmmetric eigenproblem Technical Rep ort YALEUDCSRR Yale

University Department of Computer Science

Hansen P C Truncated SVD solutions to discrete illp osed problems with ill

determined numerical rank SIAM Journal on Matrix Analysis and Applications

Hansen P C Seki i T and Shibahashi H The mo died truncated SVDmetho d

for regularization in general form SIAM Journal on Scientic Computing

Higham N J Ecient algorithms for computing the condition number of a tridi

agonal matrix SIAM Journal on Scientic and Statistical Computing

24 

Hong Y P and Pan CT The rank revealing QR decomp osition and SVD

Mathematics of Computation

Hsieh S F Liu K J R and Yao K Comparisons of truncated QR and SVD

metho ds for AR sp ectral estimations In R J Vaccaro Ed SVD and Signal Processing

II Amsterdam pp Elsevier Science Publishers

More J The LevenbergMarquardt algorithm Implementation and theory In G A

Watson Ed Proceedings of the Dundee Conference on Numerical Analysis Berlin

Springer Verlag

Pan CT and Tang P T P Bounds on singular values revealed by QR factor

izaton Technical Rep ort MCSP Mathematics and Computer Science Division

Argonne National Lab oratory

QuintanaOrt G and QuintanaOrt E S Guaranteeing termination of Chan

drasekaran Ipsens algorithm for computing rankrevealing QR factorizations Preprint

MCSP Mathematics and Computer Science Division Argonne National Lab ora

tory

QuintanaOrt G Sun X and Bischof C H A BLAS version of the QR

factorization with column pivoting Preprint MCSP Mathematics and Computer

Science Division Argonne National Lab oratory

Reichel L and Gragg W Fortran subroutines for up dating the QR factorization

ACM Transactions on Mathematical Software

Schreiber R and Van Loan C A storage ecient WY representation for pro d

ucts of Householder transformations SIAM Journal on Scientic and Statistical Comput

ing

Stewart G W Rank degeneracy SIAM Journal on Scientic and Statistical Com

puting

Stewart G W An up dating algorithm for subspace tracking Technical Rep ort

CSTR University of Maryland Department of Computer Science

Stewart G W Determining rank in the presence of error Technical Rep ort CS

TR Dept of Computer Science University of Maryland

Walden B Using a fast signal pro cessor to solve the inverse kinematic problem with

sp ecial emphasis on the singularity problem Ph D thesis LinkopingUniversity Dept of Mathematics