A Sparsity-aware QR Decomposition Algorithm for Efficient Cooperative Localization
Ke Zhou and Stergios I. Roumeliotis {kezhou|stergios}@cs.umn.edu
Department of Computer Science & Engineering University of Minnesota
Multiple Autonomous Robotic Systems Labroratory Technical Report Number -2011-004 September 2011
Dept. of Computer Science & Engineering University of Minnesota 4-192 EE/CS Building 200 Union St. S.E. Minneapolis, MN 55455 Tel: (612) 625-2217 Fax: (612) 625-0572 A Sparsity-aware QR Decomposition Algorithm for Efficient Cooperative Localization
Ke Zhou∗ and Stergios I. Roumeliotis† ∗Dept. of Electrical and Computer Engineering, †Dept. of Computer Science and Engineering University of Minnesota, Minneapolis, MN 55455 Email: {kezhou|stergios}@cs.umn.edu
Abstract—This paper focuses on reducing the computational However, since CL involves joint-state estimation, the pro- complexity of the extended Kalman filter (EKF)-based multi- cessing of exteroceptive measurements (i.e., relative robot-to- robot cooperative localization (CL) by taking advantage of the robot measurements) requires the robots to communicate with sparse structure of the measurement Jacobian matrix H. In contrast to the standard EKF update, whose time complexity each other and update the covariance matrix corresponding is up to O(N 4) (N is the number of robots in a team), we to all pose estimates in a centralized fashion. Using the introduce a Modified Householder QR algorithm which fully extended Kalman filter (EKF) framework, the time complexity exploits the sparse structure of the matrix H, and prove that the for processing each exteroceptive measurement is O(N 2) (N overall time complexity of the EKF update, based on our QR being the number of robots) [7]. Since the maximum possible factorization scheme, reduces to O(N 3). Additionally, the space complexity of the proposed Modified Householder QR algorithm number of exteroceptive measurements is (N − 1)N at every is O(N 2). Finally, we validate the Modified Householder QR time step, the overall time complexity for updating the state algorithm through extensive simulations and experiments, and and covariance, in the worst case, becomes O(N 4) per time demonstrate its superior performance both in accuracy and in step. As N grows, this high (time) complexity may prohibit CPU runtime, as compared to the current state-of-the-art QR real-time performance. decomposition algorithm. In this paper, we investigate the computational complexity (i.e., time complexity and space complexity) of the centralized EKF-based CL algorithm, considering the most challenging I.INTRODUCTION situation where the total number of relative measurements per Multi-robot teams (sensor networks) have recently attracted time step is (N −1)N. The main contributions of this work are the following: We show that the time complexity of state and significant interest in the research community because of their 3 robustness, versatility, speed, and potential applications, such covariance updates can be reduced to O(N ) by employing as environmental monitoring [1], surveillance [2], human- the Information filter (see Section III-C). To further improve robot interaction [3], defense applications [4], as well as target the numerical stability of the Information filter, we present an tracking [5], [6]. EKF update scheme based on the QR factorization (also called QR decomposition) of the measurement Jacobian H (see Regardless of the applications, every robot in the team must Section III-D). While the Standard Householder QR algorithm be able to accurately localize itself in an unknown environ- has time complexity O(N 4) and space complexity O(N 3) ment to ensure successful execution of its tasks. While each for decomposing (or factorizing) H, we develop the Modified robot can independently estimate its own pose (position and Householder QR algorithm (see Section V), which exploits orientation) by integrating its linear and rotational velocities, the sparse structure of H and reduces the time complexity e.g., from wheel encoder [7], the uncertainty of the robot and space complexity of QR decomposition to 3 and pose estimates generated using this technique (dead-reckoning) O(N ) O(N 2), respectively. As a result, the overall computational increases fast, and eventually renders these estimates useless. complexity of the EKF-based CL is reduced by an order of Although one can overcome this limitation by equipping every magnitude (from O(N 4) to O(N 3) for time complexity and robot with absolute positioning sensors such as the Global O(N 3) to O(N 2) for space complexity) per update step. Positioning System (GPS), GPS signals are unreliable in urban Following a brief review of related work in Section II, environments and unavailable in space and underwater. On we present the formulation of the EKF-based CL, as well the other hand, by performing cooperative localization (CL), as an EKF update scheme based on the QR factorization where communicating robots use relative measurements (dis- of H in Section III. In Section IV, we briefly describe the tance, bearing, and orientation) to jointly estimate the robots’ QR decomposition based on the Standard Householder QR poses, it has been shown that the accuracy and performance algorithm, and show its time complexity and space complexity of the robot pose estimates can significantly improve [8], even of factorizing H are O(N 4) and O(N 3), respectively. We then in the absence of GPS. develop the Modified Householder QR algorithm in Section V, As shown in [7], in CL, each robot can process its own which exploits the sparse structure of H and achieves QR proprioceptive measurements independently and distributively. decomposition with time complexity O(N 3) and space com- plexity O(N 2). Simulation and real-world experimental results This work was supported by the University of Minnesota through the Digital Technology Center (DTC), the National Science Foundation (IIS-0643680, are presented in Sections VI and VII, respectively, followed IIS-0811946, IIS-0835637), and AFOSR (FA9550-10-1-0567). by the conclusions and future directions in Section VIII. II. LITERATURE REVIEW The main drawback of the previous gradient-based iterative Multi-robot cooperative localization (CL) has received con- algorithms (see [13], [16], [17]) is that the total number of siderable attention in the literature (e.g., [9], [10], [11], [12]). iterations required for convergence has not been addressed, as Various system architectures, such as centralized [7], [13] well as no guarantees of convergence to global optimum, due or distributed [14], [15], [16], have been proposed for CL. to the existence of multiple local minima of the resulting non- These system architectures use various estimation algorithms, convex optimization problem arising from CL. Furthermore, such as the EKF [7], the maximum likelihood estimator the above batch-mode estimators can be implemented only (MLE) [13], the maximum a posteriori estimator (MAP) [16], when all data become available, and are unable to provide on- and particle filters [10]. In this paper, we focus our discussion line estimations when new data arrives. This motivates us to on centralized or distributed approaches that fuse data in batch consider EKF-based estimators in CL. mode (i.e., MLE or MAP, see Section II-A) or sequentially B. EKF-based Estimator (i.e., EKF, see Section II-B). The latter case is the main focus of our work. In what follows, we denote as K the total number In [7], Roumeliotis and Bekey showed that within the EKF of time steps addressed in batch mode, N the number of framework, each robot can independently propagate its own robots in the team, and consider the worst-case computational state and covariance in a fully distributed fashion. However, complexity where the total number of relative measurements during the update stage, the observing robot needs to broadcast per time step is (N − 1)N. its relative measurements to the rest of the robots in the team, and update the covariance matrix corresponding to all pose estimates in a centralized fashion. In this approach, the time A. MLE or MAP-based Estimator complexity for processing each exteroceptive measurement Howard et al. [13] presented a centralized MLE-based is O(N 2). Therefore, in the worst case, the overall time algorithm to address the CL problem. The solution of the complexity for sequentially updating state and covariance underlying non-convex optimization problem, formulated by becomes O(N 4) per time step. To reduce the cost of EKF- maximizing the conditional probability density function (pdf), based CL, several approximate algorithms have been proposed. is computed iteratively by the conjugate gradient (CG) algo- Panzieri et al. [18] presented a fully decentralized algorithm rithm with time complexity per CG iteration of O(KN 2). based on the Interlaced EKF [19], where each robot only Contrary to [13], where all measurements are transmitted processes relative measurements taken by itself to update its and processed at a fusion center, the authors [15] proposed own pose estimate, while treating the other robots’ poses as to solve the MLE in a distributed fashion. The original non- deterministic parameters. The time complexity per robot and convex optimization problem is decomposed into N sub- per time step is O(N) (hence the overall time complexity problems, one for each robot. In particular, the ith robot (i.e., per time step is O(N 2)) when relative measurements are robot-i) minimizes only a part of the cost function that involves processed sequentially. Similar to [18], Karam et al. [20] terms corresponding to its own proprioceptive measurements, proposed a distributed EKF-based method for CL. However, as well as the relative measurements between robot-i and other contrary to [18], here the robots are restricted to exchange robots. The optimization process in this case approximates their state estimates with others within communication range. other robots’ poses as constants. However, there exists no The time complexity per robot and per time step is O(N 2), proof that the proposed approach guarantees convergence to resulting in the overall time complexity O(N 3) per time step. even one of the local minima. Additionally, the computational Martinelli [21] developed a distributed approach for CL that complexity of the algorithm is not addressed in the paper. uses hierarchical EKF filters to estimate the robots’ poses. In Similar to [13], Dellaert et al. [17] addressed MAP-based this case, N robots in a team are divided into L groups, each centralized CL by incorporating prior information about the consisting of M robots (N = LM). Every group contains initial poses of robots. Contrary to [13], the resulting non- a group leader who processes all the relative measurements convex optimization problem is solved iteratively by the between any two robots in that group and only updates the Levenberg-Marquardt (LM) algorithm. The solution of the pose estimates of the robots belonging to its group. A team system of linear equations at each iteration of the LM method leader is in charge of processing all observations between any is determined through a sparse QR solver. Although some two robots from different groups and only updates the pose insights were provided in the selection of initial estimates, estimates of the L leaders. The time complexity per time step the algorithm does not guarantee convergence to the global for each group leader and the team leader are O(M 4) and optimum. Furthermore, the paper does not provide information O(N(N − M)L2), respectively. about the worst-case time complexity of the sparse QR solver. The main drawback of the above approximate algorithms Recently, Nerurkar et al. [16] introduced a fully distributed (see [18], [20], [21]) is that in order to reduce the computa- CG algorithm to address the MAP-based CL. Similar to [17], tional complexity of EKF-based CL, these approaches ignore the resulting non-convex optimization problem is solved it- cross-correlations amongst robots, which often leads to overly eratively using the LM algorithm. In contrast to [13], the optimistic and inconsistent estimates. In this paper, we present incremental solution at each iteration of the LM method an algorithm that reduces the time complexity of EKF-based is distributively computed by the CG algorithm. The time CL to O(N 3), without introducing any approximations, but, complexity per robot and per iteration of the LM method is instead, by taking advantage of the specific sparse structure of of O((KN)2). the measurement Jacobian matrix. III. PROBLEM FORMULATION measuring robot-j (1 ≤ i = j ≤ N) are given by Consider a group of N mobile robots performing CL in 2-D zi,j = hi,j (xi , xj )+ ni,j , (2) by processing relative distance and bearing measurements. In k k k k k i,j i j i,j i,j T this paper, we study the case of global localization, i.e., the where hk (xk, xk) = [dk θk ] , with pose (position and orientation) of each robot is described with di,j = pj − pi , respect to a fixed (global) frame of reference. k k k 2 The pose of the ith robot (or robot-i) at time-step k is yj − yi θi,j = arctan k k − ϕi denoted as xi = [(pi )T ϕi ]T, where pi = [xi yi ]T and ϕi k j i k k k k k k k k xk − xk represent the position and orientation of robot-i at time-step k with respect to the global frame of reference, respectively. denoting the true distance and bearing from robot-i observing i,j i,j i,j T The state vector for the robot team at time-step k is defined robot-j at time-step k, respectively. nk = [nk,d nk,θ] is 1 T 2 T N T T 3N zero-mean white Gaussian measurement noise with covariance as xk = [(x ) (x ) ... (x ) ] ∈ R . We assume that k k k i,j i,j each robot is equipped with proprioceptive sensors (e.g., wheel Cn . Without loss of generality, we assume Cn = I2 (1 ≤ encoders), which measure its linear and rotational velocities, as i = j ≤ N) throughout the rest of paper. i,j well as exteroceptive sensors (e.g., laser scanners), which can The measurement-error equation for zk , obtained by lin- detect, identify, and measure the relative distance and bearing earizing (2) around the current best state estimate xˆk|k−1, is to other robots. i,j i,j i,j i j ˜zk|k−1 = zk − hk (xˆk|k−1, xˆk|k−1) (3) i,j i i,j j i,j i,j i,j A. State Propagation ≈ Ψk x˜k|k−1 +Υk x˜k|k−1 +nk = Hk x˜k|k−1 +nk , The discrete-time state propagation equation for robot-i where from time-step k − 1 to k is pj pi T (ˆk|k−1−ˆk|k−1) − j 0 p pi 2 i i i i i i,j i,j ˆk|k−1−ˆk|k−1 x = f (x , u , w ) , i =1,...,N, Ψ ∇ i h j T T k k−1 k−1 k−1 k−1 k = x k = p pi J , (4) k (ˆk|k−1−ˆk|k−1) i i i T − j i 2 −1 u pˆ 1−pˆ 1 2 where the control input k−1 = [vk−1 ωk−1] , consisting of k|k− k|k− i j T pˆ pˆi the linear velocity measurement vk−1 and rotational velocity ( k|k−1− k|k−1) i j 0 pˆ pˆi 2 measurement ωk−1 recorded by the proprioceptive sensors, i,j i,j k|k−1− k|k−1 Υ ∇ j h j T T k = x k = p pi J (5) is corrupted by zero-mean, white Gaussian process noise k (ˆk|k−1−ˆk|k−1) i i i T i j i 2 0 w with covariance C . pˆ 1−pˆ 1 2 k−1 = [wk−1,v wk−1,ω] w k|k− k|k− In this work, we employ the extended Kalman filter (EKF) i 0 −1 for recursively estimating the robot’s pose xk, i =1,...,N. are of dimensions 2×3, and J = . Furthermore, note 1 1 0 Thus the estimate of robot-i’s pose is propagated by : i,j that the measurement Jacobian matrix Hk (of dimensions i i i i xˆk|k−1 = fk−1(xˆk−1|k−1, uk−1, 0) , i =1,...,N, 2 × 3N) has a sparse structure (without loss of generality, we assume i < j): where xˆℓ|j is the state estimate at time-step ℓ, after measure- i,j i,j i,j ments up to time-step j have been processed. Hk = 02×3(i−1) Ψk 02×3(j−i−1) Υk 02×3(N−j) . The covariance matrix corresponding to the state estimate In fact, there are at most 9 nonzero elements in Hi,j (5 from xˆk|k−1 is propagated as k Ψi,j and 4 from Υi,j ). T T k k Pk|k−1 = Φk−1Pk−1|k−1Φk−1 + Gk−1CwGk−1 , (1) In this paper, we consider the worst-case scenario where the 1 N i sensing range of the exteroceptive sensors is sufficiently large where Φk−1 = diag(Φk−1,..., Φk−1) with Φk−1 = i 1 N so that each robot can detect, identify, and measure the relative ∇ i f , and Gk−1 = diag(G ,..., G ) with xk−1 k−1 k−1 k−1 i i distance and bearing to the remaining N − 1 robots at every G = ∇wi f , i =1,...,N. The overall process noise k−1 k−1 k−1 time step. Thus, the total number of relative measurements 1 N covariance Cw = diag(Cw,..., Cw ). per time step is (N − 1)N, which corresponds to the most Note that due to the block diagonal structures of Φk−1 and challenging case in terms of computational complexity. For Gk−1, the overall computational complexity (both time and the general case when the number of measurements is less 2 space complexity) of implementing (1) is O(N ) [7]. than (N −1)N due to the limited sensing range of each robot, our proposed method (see Section V) can be readily applied B. Measurement Model without modifications. At time-step k, the relative distance and bearing obser- The measurement-error equation for the robot team, ob- i,j vations recorded by the exteroceptive sensors from robot-i tained by stacking all the measurement residuals z˜k|k−1 [see (3)] into a column vector, is 1In the remainder of the paper, the “hat” symbol ˆ is used to denote the estimated value of a quantity, while the “tilde” symbol ˜ is used to signify the z˜k|k−1 ≈ Hkx˜k|k−1 + nk , error between the actual value of a quantity and its estimate. The relationship between a variable x and its estimate xˆ, is x˜ = x − xˆ. Additionally, 0m×n 1,2 T i,j T N,N−1 T T where z˜k|k−1 = [(˜zk|k−1) ... (˜zk|k−1) ... (˜zk|k−1 ) ] and In represent the m×n zero matrix and n×n identity matrix, respectively. 1n denotes the n dimensional (column) vector with all elements being 1, and is the measurement residual error vector of dimension e 1,2 T i,j T N,N−1 T T j is the unit (column) vector with a 1 in the jth coordinate and 0’s elsewhere. 2(N − 1)N, and nk = [(nk ) ... (nk ) ... (nk ) ] is zero-mean, white Gaussian measurement noise with covari- Unfortunately, the time complexity and space complexity 6 ance Cn = I2(N−1)N . of the state and covariance updates using (7)-(8) are O(N ) The overall measurement Jacobian matrix Hk = and O(N 4), respectively, due to the inversion and storage H1,2 T Hi,j T HN,N−1 T T, whose dimensions are [( k ) ... ( k ) ... ( k ) ] of a dense matrix Sk whose dimensions are 2(N − 1)N × 2(N − 1)N × 3N, has the following sparse structure: 2(N − 1)N. Therefore, the real-time implementation of (7)- Ψ1,2 Υ1,2 (8) becomes prohibitive as the number of robots increases. k k 2 1 2 1 Υ , Ψ , It is possible to avoid the inversion and storage of Sk k k . by processing (N − 1)N measurements sequentially. In this . 1 1 case, since the time complexity for processing every relative Ψ ,N Υ ,N i,j k k 2 1 1 measurement z is O(N ) [7], the total time complexity per ΥN, ΨN, k k k time step becomes O(N 4). In addition, space complexity can Ψ2,3 Υ2,3 k k 2 Υ3,2 Ψ3,2 be reduced to O(N ) [7]. However, due to the nonlinearity of Hk = k k . (6) . the measurement model [see (2)], the robot pose estimates . . obtained by sequential updates are less accurate than the Ψ2,N Υ2,N k k estimates obtained by concurrent updates using (7)-(8). N,2 N,2 Υ Ψ k k Another alternative approach for updating the state and . . . . . . covariance is to employ the Information filter [22, Chapter −1 −1 ΨN ,N ΥN ,N 5, Section 5.6] (for Cn = I2(N−1)N ): k k N,N−1 N,N−1 Υ Ψ −1 k k −1 T Pk|k = Pk|k−1 + Hk Hk , (9) Remark 1: We highlight two important properties of Hk. T Firstly, due to its sparse structure [see (6)], each row of Hk xˆk|k = xˆk|k−1 + Pk|kHk ˜zk|k−1 . (10) has at most 5 nonzero elements [see (4)-(5)]. Per column, there It is worth noting that computing HTH and P HTz˜ exist at most 4(N − 1) non-zeros, e.g., the nonzero elements k k k|k k k|k−1 requires basic computational steps only O(N 2), due to the of the first column of H originate from the first columns of k sparse structure of H (See Appendix A). Hence, the most the matrices Ψ1,j and Υj,1, j =2,...,N, where each column k k k computationally demanding operation of (9)-(10) involves the of Ψ1,j (or Υj,1) has at most 2 nonzero elements [see (4)- k k inversions of two matrices (i.e., P and P−1 +HTH ), (5)]. Therefore nnz(H ) ∼ O(N 2), where nnz(H) denotes k|k−1 k|k−1 k k k whose dimensions are linear in N. Thus, the total time the number of nonzero elements of H, and the storage of complexity of the state and covariance updates using (9)-(10) H can be efficiently achieved using O(N 2) memory cells. k is O(N 3), a significant complexity reduction as compared to Secondly, since all the exteroceptive measurements are relative the standard EKF updates using (7)-(8). Additionally, since observations between each pair of robots, and absolute pose both P and H occupy O(N 2) memory cells (because measurements (such as GPS) are unavailable, H is rank k|k−1 k k of nnz(H ) ∼ O(N 2) and we only need to store non-zeros of deficiency. In particular, we have the following proposition. k H ), the space complexity of updating the state and covariance Proposition 1: rank(H ) ≤ 3N − 2, hence, H is not full k k k using (9)-(10) is reduced from O(N 4) to O(N 2). (column) rank. Unfortunately, (9) often suffers from numerical instability, Proof: To show rank(Hk) ≤ 3N − 2, we first construct T 2 due to κ(H Hk)= κ (Hk) (κ(H) is the condition number of I2 k a 3N × 2 matrix Ξ = 1N ⊗ , where ⊗ denotes H), which renders the state and covariance updates using (9)- 01×2 the Kronecker tensor product. Next, for every sub-matrix (10) numerically less robust [23]. i,j Hk , 1 ≤ i = j ≤ N, it is straightforward to verify that T T (pˆj −pˆi ) (pˆj −pˆi ) k|k−1 k|k−1 k|k−1 k|k−1 D. State and Covariance Updates through QR Factorization − j j p pi 2 p pi 2 i,j ˆk|k−1−ˆk|k−1 ˆk|k−1−ˆk|k−1 H Ξ j T T j T T 0 k = p pi J + p pi J = . (ˆk|k−1−ˆk|k−1) (ˆk|k−1−ˆk|k−1) An effective strategy to overcome the numerical instability − j i 2 j i 2 pˆ 1−pˆ 1 2 pˆ 1−pˆ 1 2 of the Information filter [see (9)-(10)] is to apply thin QR k|k− k|k− k|k− k|k− factorization with column pivoting on Hk [23], i.e., Hence, HkΞ=02(N−1)N×2. Since rank(Ξ)=2, we conclude T rank(Hk) ≤ 3N − 2, and thus Hk is not full rank. H Q R Π (11) k = Hk Hk Hk ,
where QHk is a 2(N − 1)N × 3N matrix with orthonormal C. State and Covariance Updates T columns (i.e., Q QH = I3N ), and RH is an upper Once all the relative measurements zi,j , Hk k k k 1 ≤ i = j ≤ N, triangular matrix with the absolute value of its diagonal become available, the state estimate and its covariance are elements arranged in decreasing order. Furthermore, ΠHk is a updated as (column) permutation matrix generated from column pivoting T T techniques, satisfying ΠH Π = Π ΠH = I . Note xˆk|k = xˆk|k−1 + Kkz˜k|k−1, (7) k Hk Hk k 3N T that in order to ensure orthogonality of QHk , it is necessary Pk|k = Pk|k−1 − KkSkKk , (8) to invoke column pivoting on Hk [24, Section 5.4.1], due to T −1 the fact that H is rank deficient (see Proposition 1). where Kk = Pk|k−1Hk Sk is the Kalman gain, Sk = k T HkPk|k−1Hk +Cn is the measurement residual covariance. Substituting (11) into (9), we obtain the EKF update equa- tions based on QR factorization Householder QR, has time complexity O(N 4) and space 3 −1 complexity O(N ). For clarity, the time-step index k is P P−1 Π RT R ΠT (12) k|k = k|k−1 + Hk Hk Hk Hk dropped from (11) throughout the rest of the paper, with T −1 T mH = 2(N − 1)N and nH = 3N denoting the number of = P − P ΠH R Σ RH Π P , k|k−1 k|k−1 k Hk k k Hk k|k−1 H T rows and columns of , respectively. xˆk|k = xˆk|k−1 + Pk|kHk ˜zk|k−1 , (13) where Σ R ΠT P Π RT I , and the last A. Householder Reflection k = Hk Hk k|k−1 Hk Hk + 3N equality in (12) is established by the matrix inversion lemma. We first introduce an orthogonal and symmetric House- T 2 Note that in contrast to Sk [see (8)], whose dimensions are holder (reflection) matrix Q = I − βvv , where β =2/ v 2, 2(N −1)N ×2(N −1)N, the matrix Σk in (12) has dimensions and the nonzero vector v is called a Householder vector. Any matrix H, multiplied by Q, can be computed as only 3N ×3N. Thus, assuming both RHk and ΠHk are given, the total time complexity and space complexity of the state and T QH = (I − βvvT)H = H − v βHTv . (14) covariance updates using (12)-(13) are O(N 3) and O(N 2), respectively, the same order of computational complexity as It is worth emphasizing that Householder updates [see (14)] of using (9)-(10). More importantly, κ(RHk ) = κ(Hk) [24]. “never entail the explicit formation of the Householder matrix Hence the numerical stability of (12)-(13) is improved as Q. Instead, an Householder update involves a matrix-vector compared to (9)-(10). multiplication ζ = βHTv and an outer product update H − Therefore, in order to ensure that the time complexity vζT” to reduce computational complexity [24, Section 5.1.4]. and space complexity for the state and covariance updates Now suppose for a given matrix H, our objective is to seek through QR factorization [see (12)-(13)] remain O(N 3) and a Householder matrix Q (or corresponding Householder vector O(N 2), we conclude that the maximum number of arithmetic v), such that the first column of QH has zeros below its first operations and memory usage to implement (11) should be component. In other words, letting u denote the first column O(N 3) and O(N 2), respectively. of H, we select a vector v and a scalar α, such that In what follows, we focus on the Householder QR algo- Qu = u − β(vTu)v = re . (15) rithm, one of the most widely adopted numerical techniques 1 for QR factorization [24, Section 5.2.1]. We analyze the The solution of (15) is provided by [24, Section 5.1.2] computational complexity of QR decomposition of H using k 2 −1 the Householder QR algorithm (see Section IV), and develop a r = −sign(u1) u 2 , β = u 2 + |u1| u 2 , (16) modified version of Householder QR that exploits the sparsity v = u + sign(u1) u 2e1 , (17) of H and reduces the overall computational complexity by k where u is the first element of u. For notational convenience, an order of magnitude (see Section V). 1 we use u−1 to denote the vector obtained by removing the first T T element of u, i.e., u = [u1 u−1] . IV. STANDARD HOUSEHOLDER QR Remark 2: A key observation of (17) is that the vectors v As mentioned in Section III-D, a numerically robust and and u differ only by their first elements (v1 = u1 −r). In other efficient QR decomposition algorithm with time complexity words, v−1 = u−1. This property plays a pivotal role in the and space complexity at most O(N 3) and O(N 2), respec- development of the Modified Householder QR in Section V. tively, is a prerequisite for successfully implementing the state and covariance updates through (12)-(13). Several methods B. Description of the Standard Householder QR exist for performing QR factorization, such as the Cholesky In the previous section (see Section IV-A), we provide a decomposition (CHO), the modified Gram-Schmidt process framework to generate a Householder matrix Q such that the (MGS), the Givens rotations (GIV), or the Householder trans- product QH eliminates all except the first element of the first formations (also called Householder reflections) [24, Sec- column of H. The Standard Householder QR algorithm (with tion 5.2]. Due to its simplicity and numerical stability, we column pivoting) [24, Algorithm 5.4.1] extends the above adopt the column pivoted QR factorization algorithm utilizing strategy by applying a sequence of Householder matrices Householder transformations, which is termed as Standard (multiplying H from left by Q), as well as a sequence of Householder QR in this paper.2 In what follows, we present a column permutation matrices (multiplying H from right by brief overview of Householder reflections (see Section IV-A), Π), to gradually transform H into an upper triangular form as well as the essence of the Standard Householder QR RH, whose diagonal elements ri,i, i =1,...,nH, satisfying algorithm (see Section IV-B). Additionally, the computational |ri,i| ≥ |rj,j | for 1 ≤ i < j ≤ nH. complexity analysis conducted in Sections IV-C reveals that More specifically, suppose that after (ℓ − 1) Householder T the QR decomposition of Hk, when applying the Standard matrices {Qi = ImH − βivivi ,i = 1,...,ℓ − 1} have left- multiplied H, as well as (ℓ − 1) column permutation matrices 2We have conducted computational complexity analysis when employing {Πi,i =1,...,ℓ − 1} have right-multiplied H, the resulting CHO, MGS, and GIV. More specifically, it is shown in Appendix B that the (ℓ−1) time complexity and space complexity of CHO are O(N 3) and O(N 2), matrix H takes the following block form respectively, due to the sparse structure of Hk. However, CHO is not HTH (ℓ−1) (ℓ−1) applicable since it requires k k to be positive definite. On the other (ℓ−1) H1,1 H1,2 4 3 H =Qℓ−1 Q1HΠ1 Πℓ−1 = , (18) hand, both MGS and GIV require O(N ) arithmetic operations and O(N ) 0 H(ℓ−1) memory cells (see Appendix B). 2,2 (ℓ−1) where H1,1 is an upper triangular matrix of dimensions (ℓ− In summary, we have briefly described column pivoting 1) × (ℓ−1) with the absolute value of its diagonal elements and Householder update at the ℓth iteration of the Standard arranged in decreasing order. Householder QR algorithm. The overall algorithm terminates At the ℓth iteration, the Standard Householder QR algo- when c∗ < ǫ, where ǫ is a small positive scalar chosen rithm firstly performs column pivoting (i.e., right-multiplying to take into account of machine roundoff errors (ideally the (ℓ−1) ∗ H with Πℓ), followed by an Householder update by left- algorithm terminates when c = 0). Denoting the overall (ℓ−1) multiplying H Πℓ with Qℓ. number of iterations as l, the output RH is selected as the (l) Specifically, for column pivoting, we compute the squared first nH rows of H (Note that the explicit expression of each (ℓ−1) individual Q , as well as Q Q Q , is norm of every column of H2,2 , denoted as cℓ,...,cnH , and ℓ, ℓ = 1,...,l H = 1 l ∗ ∗ seek ℓ = argmaxj {cj, j = ℓ,...,nH} and c = cℓ∗ . Note not computed, since the covariance update [see (12)] does not that cℓ,...,cnH can be updated recursively by a formula (see involve QH. In addition, we never entail the explicit formation Algorithm 1, Line 11) discovered by Businger and Golub [25], of Πℓ, ℓ = 1,...,l, as well as ΠH = Π1 Πl. Instead, which significantly reduces the computational complexity [24, ΠH is represented and updated by a permutation vector
Section 5.4.1]. The permutation matrix Πℓ is generated by [π1 ... πnH ], i.e., ΠH = [eπ1 ... eπnH ]. For completeness, exchanging canonical vectors eℓ and eℓ∗ in InH . Furthermore, the flow chart of the Standard Householder QR is summarized (ℓ−1) H Πℓ is equivalent to performing column permutations by in Algorithm 1. swapping the ℓth column and ℓ∗th column of H(ℓ−1). Without loss of generality, in what follows, we assume Πℓ = InH . Algorithm 1 Standard Householder QR T (0) For Householder update, we seek Qℓ = ImH −βℓvℓvℓ such Require: H = H of dimensions mH × nH (mH ≥ nH). (ℓ) T (ℓ−1) that the first ℓ columns of H = (ImH − βℓvℓvℓ )H Πℓ Ensure: RH and ΠH of dimensions nH × nH. (ℓ−1) are upper triangular. Since H1,1 is upper triangular, the 1: for i =1 to nH, do Householder matrix Qℓ only needs to zero all but the 1st 2: Initialize πi = i, and compute ci, the squared norm of u u H(ℓ−1) (0) (0) component of , where is the first column of 2,2 , while the ith column of H2,2 = H . leaving the previous ℓ−1 columns of H(ℓ−1) intact [24, Section 3: end for T T ∗ 5.2.1]. This is achieved by selecting vℓ = [01×(ℓ−1) v ] , with 4: Set ℓ =1, determine ℓ = argmaxj {cj, j = ℓ,...,nH} ∗ and c =cℓ∗ . v = u + sign(u )(c∗)1/2 e , (19) 1 1 5: while c∗ >ǫ do ∗ (ℓ−1) ∗ 2 ∗ ∗ 1/2 −1 6: Swap the ℓth and ℓ th columns of H , exchange cℓ where c = u 2 and βℓ = (c + |u1|(c ) ) . Note that ∗ ∗ ∗ c = c ∗ is available after the column pivoting stage and needs and cℓ , and then interchange πℓ with πℓ . ℓ (ℓ−1) ∗ T not to be recomputed through c = u u. 7: Calculate βℓ, v from u (u is the first column of H2,2 ) (ℓ) (ℓ−1) ∗ Now let us examine the structure of H = QℓH Πℓ, using (19), and rℓ from c using (21). T after the ℓth iteration of the Standard Householder QR. For 8: Compute δ and s = [s1 ... snH−ℓ] [see (22)]. (ℓ) (ℓ−1) (ℓ−1) 9: Update the sub-matrix H [see (23)]. clarity, we partition the matrices H1,2 and H2,2 as follows: 2,2 (ℓ−1) (ℓ−1) (ℓ−1) 10: for j = ℓ +1 to n , do H = [t H ], where t is the first column of H , and H 1,2 1,2 1,2 11: 2 (ℓ−1) (ℓ−1) Modify cj ⇐ cj − sj−ℓ. H1,2 consists of its remaining columns; similarly H2,2 = 12: end for T u ¯s (ℓ−1) ∗ ∗ (ℓ−1) 1 H 13: Seek ℓ = argmaxj {cj, j = ℓ +1,...,nH} and c = u H2,2 = (ℓ−1) , where 2,2 consists of all u H c ∗ , and then ℓ ⇐ ℓ +1. −1 2,2 ℓ (ℓ−1) T 14: end while but the first column of H2,2 , the row vector ¯s represents (ℓ−1) (ℓ−1) 15: return Π =[e 1 ... e ], and R selected as the first H H sT H π πnH H the first row of 2,2 , and 2,2 is obtained by removing ¯ (l) (ℓ−1) nH rows of H , l being the total number of iterations. from H2,2 . Using this notation, we establish Proposition 2. Proposition 2: H(ℓ) has the following block structure
(ℓ) (ℓ) (ℓ) H1,1 H1,2 H = Qℓ Q1HΠ1 Πℓ = (ℓ) , (20) C. Complexity Analysis of the Standard Householder QR 0 H2,2 We now analyze the computational complexity of factor- (ℓ−1) (ℓ) H1,1 t izing H using the Standard Householder QR algorithm. The where H1,1 = is an upper triangular matrix of 0 rℓ main result is that the time complexity and space complexity, T when employing the Standard Householder QR algorithm, are dimensions ℓ × ℓ, and H(ℓ) = (ℓ−1) T , with 1,2 H1,2 s O(N 4) and O(N 3), respectively. As explained later on, this high computational cost is due to the fact that the sparse struc- r = −sign( u )(c∗) 1/2 , (21) ℓ 1 ture of H is destroyed by the Householder transformations. s = ¯s − v1δ , (22) To proceed, we notice that the computational complexity of (ℓ) (ℓ−1) T the Standard Householder QR algorithm is contributed from H2,2 = H2,2 − u−1δ . (23) two sources, (1) column pivoting in association with Πℓ and (ℓ−1) T where δ = βℓ H2,2 v. (2) Householder update in association with Qℓ,ℓ =1,...,nH. Proof: The proof is shown in Appendix C. Specifically, we decompose Algorithm 1 into two categories, where Lines 1-4, 6, 10-13 corresponds to column pivoting, From Proposition 3, the overall time complexity associated nΛ while Householder update consists of Lines 7-9. We examine with Householder update is in the order of ℓ=1(nΛ −ℓ)τℓ−1. the overhead associated with each individual source separately. In what follows, we firstly seek the expressions of τℓ−1,ℓ = Thus, the total complexity of the algorithm is dominated by 1,...,nΛ, to determine the time complexity associated with the higher overhead from these two sources. Householder update (see Section IV-C1), followed by exam- By exploiting the sparse structure of H, we can show the ining the space complexity in Section IV-C2. time complexity and space complexity associated with column 1) Time Complexity: In this section, we focus on analyzing pivoting in the Standard Householder QR are O(N 2) and the number of arithmetic operations for Householder update O(N), respectively. More details are provided in Appendix D. per iteration (see Algorithm 1, Lines 7-9). Additionally, we Remark 3: As will become clear in following sections, the examine the evolution of the sparsity pattern of Λ. computational overhead associated with column pivoting is at Initially, Λ(0) = Λ has a sparse structure [see (24)]. More least one order of magnitude less than that of Householder specifically, each column of Λ has τ¯0 = τ0 = N −1 non-zeros, update. Furthermore, in practice, column pivoting can be and nnz(Λ) = (N − 1)N. efficiently implemented using pointers, without physically • At the 1st iteration (0) (0) swapping data in memory. In what follows, without loss Since u, the first column of Λ2,2 = Λ , satisfies nnz(u)= of generality, we assume Πℓ = InH ,ℓ = 1,...,l, hence τ0 = N − 1, based on Proposition 3, the time complexity of 2 ΠH = InH and [π1 ... πnH ] = [1 ... nH]. Householder update at the 1st iteration is O((N − 1) ). In what follows, we address the computational complexity Now let us examine the structure of Λ(1), after the 1st associated with Householder update in the Standard House- Householder matrix Q1 (generated by the Householder vector holder QR (see Lines 7-9 in Algorithm 1). For the purpose of v) has been applied to Λ(0). Due to the nonzero elements of presentation, we introduce a matrix Λ of dimensions mΛ×nΛ, u (i.e., ψ1,j , j = 2,...,N), the first N − 1 elements of all (1) with mΛ = (N − 1)N/2 = mH/4 and nΛ = N = nH/3 except for the first column of Λ become nonzero [see (25)]. [see (24)]. Specifically, Λ is constructed by firstly removing Therefore the Householder update does not preserve the orig- i,j (0) (1) (0) from H the measurement Jacobian matrices H for all inal sparsity of Λ , and the resulting matrix Λ = Q1Λ 1 ≤ j