A Sparsity-aware QR Decomposition Algorithm for Efficient Cooperative Localization

Ke Zhou and Stergios I. Roumeliotis {kezhou|stergios}@cs.umn.edu

Department of Computer Science & Engineering University of Minnesota

Multiple Autonomous Robotic Systems Laboratory, Technical Report Number 2011-004, September 2011

Dept. of Computer Science & Engineering, University of Minnesota, 4-192 EE/CS Building, 200 Union St. S.E., Minneapolis, MN 55455. Tel: (612) 625-2217, Fax: (612) 625-0572

Ke Zhou* and Stergios I. Roumeliotis†
*Dept. of Electrical and Computer Engineering, †Dept. of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455. Email: {kezhou|stergios}@cs.umn.edu

Abstract—This paper focuses on reducing the computational complexity of the extended Kalman filter (EKF)-based multi-robot cooperative localization (CL) by taking advantage of the sparse structure of the measurement Jacobian H. In contrast to the standard EKF update, whose time complexity is up to O(N^4) (N is the number of robots in a team), we introduce a Modified Householder QR algorithm which fully exploits the sparse structure of the matrix H, and prove that the overall time complexity of the EKF update, based on our QR factorization scheme, reduces to O(N^3). Additionally, the space complexity of the proposed Modified Householder QR algorithm is O(N^2). Finally, we validate the Modified Householder QR algorithm through extensive simulations and experiments, and demonstrate its superior performance both in accuracy and in CPU runtime, as compared to the current state-of-the-art QR decomposition algorithm.

(This work was supported by the University of Minnesota through the Digital Technology Center (DTC), the National Science Foundation (IIS-0643680, IIS-0811946, IIS-0835637), and AFOSR (FA9550-10-1-0567).)

I. INTRODUCTION

Multi-robot teams (sensor networks) have recently attracted significant interest in the research community because of their robustness, versatility, speed, and potential applications, such as environmental monitoring [1], surveillance [2], human-robot interaction [3], defense applications [4], as well as target tracking [5], [6].

Regardless of the application, every robot in the team must be able to accurately localize itself in an unknown environment to ensure successful execution of its tasks. While each robot can independently estimate its own pose (position and orientation) by integrating its linear and rotational velocities, e.g., from wheel encoders [7], the uncertainty of the pose estimates generated by this technique (dead reckoning) increases quickly, eventually rendering these estimates useless. Although one can overcome this limitation by equipping every robot with absolute positioning sensors such as the Global Positioning System (GPS), GPS signals are unreliable in urban environments and unavailable in space and underwater. On the other hand, by performing cooperative localization (CL), where communicating robots use relative measurements (distance, bearing, and orientation) to jointly estimate the robots' poses, it has been shown that the accuracy and performance of the robot pose estimates can significantly improve [8], even in the absence of GPS.

As shown in [7], in CL, each robot can process its own proprioceptive measurements independently and distributively. However, since CL involves joint-state estimation, the processing of exteroceptive measurements (i.e., relative robot-to-robot measurements) requires the robots to communicate with each other and update the covariance matrix corresponding to all pose estimates in a centralized fashion. Within the extended Kalman filter (EKF) framework, the time complexity for processing each exteroceptive measurement is O(N^2) (N being the number of robots) [7]. Since the maximum possible number of exteroceptive measurements is (N − 1)N at every time step, the overall time complexity for updating the state and covariance, in the worst case, becomes O(N^4) per time step. As N grows, this high (time) complexity may prohibit real-time performance.

In this paper, we investigate the computational complexity (i.e., time complexity and space complexity) of the centralized EKF-based CL algorithm, considering the most challenging situation where the total number of relative measurements per time step is (N − 1)N. The main contributions of this work are the following: We show that the time complexity of the state and covariance updates can be reduced to O(N^3) by employing the Information filter (see Section III-C). To further improve the numerical stability of the Information filter, we present an EKF update scheme based on the QR factorization (also called QR decomposition) of the measurement Jacobian H (see Section III-D). While the Standard Householder QR algorithm has time complexity O(N^4) and space complexity O(N^3) for decomposing (or factorizing) H, we develop the Modified Householder QR algorithm (see Section V), which exploits the sparse structure of H and reduces the time complexity and space complexity of the QR decomposition to O(N^3) and O(N^2), respectively. As a result, the overall computational complexity of the EKF-based CL is reduced by an order of magnitude (from O(N^4) to O(N^3) in time, and from O(N^3) to O(N^2) in space) per update step.

Following a brief review of related work in Section II, we present the formulation of the EKF-based CL, as well as an EKF update scheme based on the QR factorization of H, in Section III. In Section IV, we briefly describe the QR decomposition based on the Standard Householder QR algorithm, and show that its time complexity and space complexity for factorizing H are O(N^4) and O(N^3), respectively. We then develop the Modified Householder QR algorithm in Section V, which exploits the sparse structure of H and achieves the QR decomposition with time complexity O(N^3) and space complexity O(N^2). Simulation and real-world experimental results are presented in Sections VI and VII, respectively, followed by the conclusions and future directions in Section VIII.
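The O(N^4)-to-O(N^3) reduction previewed above hinges on the algebraic equivalence between the standard EKF covariance update, which inverts the 2(N−1)N × 2(N−1)N residual covariance, and the information-form update, which only inverts matrices whose dimensions are linear in N. A minimal numerical sanity check of this equivalence (an illustrative sketch, not from the paper; random matrices of the stated dimensions stand in for the actual CL quantities, and the measurement noise covariance is taken as the identity, as assumed later in the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4                       # number of robots
n = 3 * N                   # state dimension (x, y, phi per robot)
m = 2 * (N - 1) * N         # worst case: all pairwise distance/bearing pairs

A = rng.standard_normal((n, n))
P = A @ A.T + n * np.eye(n)          # prior covariance (positive definite)
H = rng.standard_normal((m, n))      # stand-in for the measurement Jacobian

# standard EKF update: forms and inverts the m x m matrix S = H P H^T + I
S = H @ P @ H.T + np.eye(m)
K = P @ H.T @ np.linalg.inv(S)
P_std = P - K @ S @ K.T

# information-form update: only n x n inversions
P_info = np.linalg.inv(np.linalg.inv(P) + H.T @ H)

print(np.allclose(P_std, P_info))    # True
```

The two updated covariances agree (matrix inversion lemma); the difference is purely in the dimensions of the matrices that must be formed, stored, and inverted.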
II. LITERATURE REVIEW

Multi-robot cooperative localization (CL) has received considerable attention in the literature (e.g., [9], [10], [11], [12]). Various system architectures, such as centralized [7], [13] or distributed [14], [15], [16], have been proposed for CL. These system architectures use various estimation algorithms, such as the EKF [7], the maximum likelihood estimator (MLE) [13], the maximum a posteriori estimator (MAP) [16], and particle filters [10]. In this paper, we focus our discussion on centralized or distributed approaches that fuse data in batch mode (i.e., MLE or MAP, see Section II-A) or sequentially (i.e., EKF, see Section II-B). The latter case is the main focus of our work. In what follows, we denote by K the total number of time steps addressed in batch mode and by N the number of robots in the team, and consider the worst-case computational complexity, where the total number of relative measurements per time step is (N − 1)N.

A. MLE or MAP-based Estimators

Howard et al. [13] presented a centralized MLE-based algorithm to address the CL problem. The solution of the underlying non-convex optimization problem, formulated by maximizing the conditional probability density function (pdf), is computed iteratively by the conjugate gradient (CG) algorithm, with time complexity per CG iteration of O(KN^2).

Contrary to [13], where all measurements are transmitted and processed at a fusion center, the authors of [15] proposed to solve the MLE in a distributed fashion. The original non-convex optimization problem is decomposed into N sub-problems, one for each robot. In particular, the ith robot (i.e., robot-i) minimizes only the part of the cost function that involves terms corresponding to its own proprioceptive measurements, as well as the relative measurements between robot-i and other robots. The optimization process in this case approximates the other robots' poses as constants. However, there exists no proof that the proposed approach guarantees convergence to even one of the local minima. Additionally, the computational complexity of the algorithm is not addressed in the paper.

Similar to [13], Dellaert et al. [17] addressed MAP-based centralized CL by incorporating prior information about the initial poses of the robots. Contrary to [13], the resulting non-convex optimization problem is solved iteratively by the Levenberg-Marquardt (LM) algorithm. The solution of the system of linear equations at each iteration of the LM method is determined through a sparse QR solver. Although some insights were provided into the selection of initial estimates, the algorithm does not guarantee convergence to the global optimum. Furthermore, the paper does not provide information about the worst-case time complexity of the sparse QR solver.

Recently, Nerurkar et al. [16] introduced a fully distributed CG algorithm to address MAP-based CL. Similar to [17], the resulting non-convex optimization problem is solved iteratively using the LM algorithm. In contrast to [13], the incremental solution at each iteration of the LM method is distributively computed by the CG algorithm. The time complexity per robot and per iteration of the LM method is of O((KN)^2).

The main drawback of the previous gradient-based iterative algorithms (see [13], [16], [17]) is that the total number of iterations required for convergence has not been addressed, and there are no guarantees of convergence to the global optimum, due to the existence of multiple local minima of the non-convex optimization problem arising from CL. Furthermore, the above batch-mode estimators can be implemented only when all data become available, and are unable to provide on-line estimates when new data arrive. This motivates us to consider EKF-based estimators for CL.

B. EKF-based Estimators

In [7], Roumeliotis and Bekey showed that, within the EKF framework, each robot can independently propagate its own state and covariance in a fully distributed fashion. However, during the update stage, the observing robot needs to broadcast its relative measurements to the rest of the robots in the team, and the covariance matrix corresponding to all pose estimates must be updated in a centralized fashion. In this approach, the time complexity for processing each exteroceptive measurement is O(N^2). Therefore, in the worst case, the overall time complexity for sequentially updating the state and covariance becomes O(N^4) per time step.

To reduce the cost of EKF-based CL, several approximate algorithms have been proposed. Panzieri et al. [18] presented a fully decentralized algorithm based on the Interlaced EKF [19], where each robot only processes relative measurements taken by itself to update its own pose estimate, while treating the other robots' poses as deterministic parameters. The time complexity per robot and per time step is O(N) (hence the overall time complexity per time step is O(N^2)) when relative measurements are processed sequentially. Similar to [18], Karam et al. [20] proposed a distributed EKF-based method for CL. However, contrary to [18], here the robots are restricted to exchanging their state estimates with others within communication range. The time complexity per robot and per time step is O(N^2), resulting in an overall time complexity of O(N^3) per time step. Martinelli [21] developed a distributed approach for CL that uses hierarchical EKF filters to estimate the robots' poses. In this case, the N robots in a team are divided into L groups, each consisting of M robots (N = LM). Every group contains a group leader, who processes all the relative measurements between any two robots in that group and only updates the pose estimates of the robots belonging to its group. A team leader is in charge of processing all observations between any two robots from different groups and only updates the pose estimates of the L leaders. The time complexities per time step for each group leader and for the team leader are O(M^4) and O(N(N − M)L^2), respectively.

The main drawback of the above approximate algorithms (see [18], [20], [21]) is that, in order to reduce the computational complexity of EKF-based CL, these approaches ignore cross-correlations amongst the robots, which often leads to overly optimistic and inconsistent estimates. In this paper, we present an algorithm that reduces the time complexity of EKF-based CL to O(N^3) without introducing any approximations, but, instead, by taking advantage of the specific sparse structure of the measurement Jacobian matrix.

III. PROBLEM FORMULATION

Consider a group of N mobile robots performing CL in 2-D by processing relative distance and bearing measurements. In this paper, we study the case of global localization, i.e., the pose (position and orientation) of each robot is described with respect to a fixed (global) frame of reference.

The pose of the ith robot (or robot-i) at time-step k is denoted as x^i_k = [(p^i_k)^T φ^i_k]^T, where p^i_k = [x^i_k y^i_k]^T and φ^i_k represent the position and orientation of robot-i at time-step k with respect to the global frame of reference, respectively. The state vector for the robot team at time-step k is defined as x_k = [(x^1_k)^T (x^2_k)^T ... (x^N_k)^T]^T ∈ R^{3N}. We assume that each robot is equipped with proprioceptive sensors (e.g., wheel encoders), which measure its linear and rotational velocities, as well as exteroceptive sensors (e.g., laser scanners), which can detect, identify, and measure the relative distance and bearing to other robots.

A. State Propagation

The discrete-time state propagation equation for robot-i from time-step k−1 to k is

    x^i_k = f^i_{k−1}(x^i_{k−1}, u^i_{k−1}, w^i_{k−1}),   i = 1, ..., N,

where the control input u^i_{k−1} = [v^i_{k−1} ω^i_{k−1}]^T, consisting of the linear velocity measurement v^i_{k−1} and the rotational velocity measurement ω^i_{k−1} recorded by the proprioceptive sensors, is corrupted by zero-mean, white Gaussian process noise w^i_{k−1} = [w^i_{k−1,v} w^i_{k−1,ω}]^T with covariance C^i_w.

In this work, we employ the extended Kalman filter (EKF) for recursively estimating the robots' poses x^i_k, i = 1, ..., N. Thus, the estimate of robot-i's pose is propagated by¹

    x̂^i_{k|k−1} = f^i_{k−1}(x̂^i_{k−1|k−1}, u^i_{k−1}, 0),   i = 1, ..., N,

where x̂_{ℓ|j} is the state estimate at time-step ℓ, after measurements up to time-step j have been processed. The covariance matrix corresponding to the state estimate x̂_{k|k−1} is propagated as

    P_{k|k−1} = Φ_{k−1} P_{k−1|k−1} Φ^T_{k−1} + G_{k−1} C_w G^T_{k−1},   (1)

where Φ_{k−1} = diag(Φ^1_{k−1}, ..., Φ^N_{k−1}) with Φ^i_{k−1} = ∇_{x^i_{k−1}} f^i_{k−1}, G_{k−1} = diag(G^1_{k−1}, ..., G^N_{k−1}) with G^i_{k−1} = ∇_{w^i_{k−1}} f^i_{k−1}, i = 1, ..., N, and the overall process noise covariance is C_w = diag(C^1_w, ..., C^N_w). Note that, due to the block-diagonal structures of Φ_{k−1} and G_{k−1}, the overall computational complexity (both time and space) of implementing (1) is O(N^2) [7].

¹ In the remainder of the paper, the "hat" symbol ^ is used to denote the estimated value of a quantity, while the "tilde" symbol ~ is used to signify the error between the actual value of a quantity and its estimate; the relationship between a variable x and its estimate x̂ is x̃ = x − x̂. Additionally, 0_{m×n} and I_n represent the m×n zero matrix and the n×n identity matrix, respectively, 1_n denotes the n-dimensional (column) vector with all elements equal to 1, and e_j is the unit (column) vector with a 1 in the jth coordinate and 0's elsewhere.

B. Measurement Model

At time-step k, the relative distance and bearing observations recorded by the exteroceptive sensors of robot-i measuring robot-j (1 ≤ i ≠ j ≤ N) are given by

    z^{i,j}_k = h^{i,j}_k(x^i_k, x^j_k) + n^{i,j}_k,   (2)

where h^{i,j}_k(x^i_k, x^j_k) = [d^{i,j}_k  θ^{i,j}_k]^T, with

    d^{i,j}_k = ||p^j_k − p^i_k||_2,
    θ^{i,j}_k = arctan((y^j_k − y^i_k)/(x^j_k − x^i_k)) − φ^i_k

denoting the true distance and bearing from robot-i observing robot-j at time-step k, respectively. n^{i,j}_k = [n^{i,j}_{k,d} n^{i,j}_{k,θ}]^T is zero-mean white Gaussian measurement noise with covariance C^{i,j}_n. Without loss of generality, we assume C^{i,j}_n = I_2 (1 ≤ i ≠ j ≤ N) throughout the rest of the paper.

The measurement-error equation for z^{i,j}_k, obtained by linearizing (2) around the current best state estimate x̂_{k|k−1}, is

    z̃^{i,j}_{k|k−1} = z^{i,j}_k − h^{i,j}_k(x̂^i_{k|k−1}, x̂^j_{k|k−1}) ≈ Ψ^{i,j}_k x̃^i_{k|k−1} + Υ^{i,j}_k x̃^j_{k|k−1} + n^{i,j}_k = H^{i,j}_k x̃_{k|k−1} + n^{i,j}_k,   (3)

where, with Δp̂ := p̂^j_{k|k−1} − p̂^i_{k|k−1},

    Ψ^{i,j}_k = ∇_{x^i_k} h^{i,j}_k = [ −Δp̂^T/||Δp̂||_2        0
                                        −Δp̂^T J^T/||Δp̂||_2^2   −1 ],   (4)

    Υ^{i,j}_k = ∇_{x^j_k} h^{i,j}_k = [ Δp̂^T/||Δp̂||_2        0
                                        Δp̂^T J^T/||Δp̂||_2^2   0 ]    (5)

are of dimensions 2×3, and J = [0 −1; 1 0]. Furthermore, note that the measurement Jacobian matrix H^{i,j}_k (of dimensions 2×3N) has a sparse structure (without loss of generality, we assume i < j):

    H^{i,j}_k = [0_{2×3(i−1)}  Ψ^{i,j}_k  0_{2×3(j−i−1)}  Υ^{i,j}_k  0_{2×3(N−j)}].

In fact, there are at most 9 nonzero elements in H^{i,j}_k (5 from Ψ^{i,j}_k and 4 from Υ^{i,j}_k).

In this paper, we consider the worst-case scenario, where the sensing range of the exteroceptive sensors is sufficiently large so that each robot can detect, identify, and measure the relative distance and bearing to the remaining N − 1 robots at every time step. Thus, the total number of relative measurements per time step is (N − 1)N, which corresponds to the most challenging case in terms of computational complexity. For the general case, when the number of measurements is less than (N − 1)N due to the limited sensing range of each robot, our proposed method (see Section V) can be readily applied without modifications.

The measurement-error equation for the robot team, obtained by stacking all the measurement residuals z̃^{i,j}_{k|k−1} [see (3)] into a column vector, is

    z̃_{k|k−1} ≈ H_k x̃_{k|k−1} + n_k.
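The Jacobian blocks (4)-(5) that populate H_k can be verified numerically (an illustrative sketch, not from the paper: the poses, the helper names `blocks` and `numdiff`, and the tolerances are assumptions of this check). The analytic blocks are compared against a central-difference Jacobian of the measurement function (2), and their nonzero counts confirm the at-most-9-nonzeros observation:

```python
import numpy as np

J = np.array([[0.0, -1.0], [1.0, 0.0]])

def h(xi, xj):
    # relative distance and bearing from robot-i to robot-j, as in (2)
    d = xj[:2] - xi[:2]
    return np.array([np.linalg.norm(d), np.arctan2(d[1], d[0]) - xi[2]])

def blocks(xi, xj):
    # analytic 2x3 blocks Psi (w.r.t. x_i) and Ups (w.r.t. x_j), as in (4)-(5)
    d = xj[:2] - xi[:2]
    rho2 = d @ d
    rho = np.sqrt(rho2)
    Psi = np.zeros((2, 3))
    Ups = np.zeros((2, 3))
    Psi[0, :2], Psi[1, :2], Psi[1, 2] = -d / rho, -(J @ d) / rho2, -1.0
    Ups[0, :2], Ups[1, :2] = d / rho, (J @ d) / rho2
    return Psi, Ups

def numdiff(f, x, eps=1e-6):
    # central-difference Jacobian (helper for the check only)
    cols = []
    for k in range(len(x)):
        dx = np.zeros(len(x)); dx[k] = eps
        cols.append((f(x + dx) - f(x - dx)) / (2 * eps))
    return np.column_stack(cols)

xi = np.array([1.0, 2.0, 0.3])     # arbitrary poses [x, y, phi]
xj = np.array([4.0, -1.0, 1.1])
Psi, Ups = blocks(xi, xj)
assert np.allclose(Psi, numdiff(lambda x: h(x, xj), xi), atol=1e-6)
assert np.allclose(Ups, numdiff(lambda x: h(xi, x), xj), atol=1e-6)
# at most 9 nonzeros per measurement: 5 from Psi and 4 from Ups
assert np.count_nonzero(Psi) == 5 and np.count_nonzero(Ups) == 4
```

Note also that the 2×2 position parts of Psi and Ups are negatives of each other, which is the structural fact exploited by the rank argument later in this section.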
(˜zk|k−1 ) ] and In represent the m×n zero matrix and n×n identity matrix, respectively. 1n denotes the n dimensional (column) vector with all elements being 1, and is the measurement residual error vector of dimension e 1,2 T i,j T N,N−1 T T j is the unit (column) vector with a 1 in the jth coordinate and 0’s elsewhere. 2(N − 1)N, and nk = [(nk ) ... (nk ) ... (nk ) ] is zero-mean, white Gaussian measurement noise with covari- Unfortunately, the time complexity and space complexity 6 ance Cn = I2(N−1)N . of the state and covariance updates using (7)-(8) are O(N ) The overall measurement Jacobian matrix Hk = and O(N 4), respectively, due to the inversion and storage H1,2 T Hi,j T HN,N−1 T T, whose dimensions are [( k ) ... ( k ) ... ( k ) ] of a dense matrix Sk whose dimensions are 2(N − 1)N × 2(N − 1)N × 3N, has the following sparse structure: 2(N − 1)N. Therefore, the real-time implementation of (7)- Ψ1,2 Υ1,2 (8) becomes prohibitive as the number of robots increases. k k 2 1 2 1  Υ , Ψ ,  It is possible to avoid the inversion and storage of Sk k k  .  by processing (N − 1)N measurements sequentially. In this  .   1 1  case, since the time complexity for processing every relative Ψ ,N Υ ,N i,j  k k  2  1 1  measurement z is O(N ) [7], the total time complexity per ΥN, ΨN,  k  k k  time step becomes O(N 4). In addition, space complexity can  Ψ2,3 Υ2,3   k k  2  Υ3,2 Ψ3,2  be reduced to O(N ) [7]. However, due to the nonlinearity of Hk =  k k  . (6)  .  the measurement model [see (2)], the robot pose estimates  .   .  obtained by sequential updates are less accurate than the  Ψ2,N Υ2,N   k k  estimates obtained by concurrent updates using (7)-(8).  N,2 N,2   Υ Ψ   k k  Another alternative approach for updating the state and  . . .   . . . 
 covariance is to employ the Information filter [22, Chapter  −1 −1  ΨN ,N ΥN ,N 5, Section 5.6] (for Cn = I2(N−1)N ):  k k   N,N−1 N,N−1   Υ Ψ  −1 k k −1 T Pk|k = Pk|k−1 + Hk Hk , (9) Remark 1: We highlight two important properties of Hk. T Firstly, due to its sparse structure [see (6)], each row of Hk xˆk|k = xˆk|k−1 + Pk|kHk ˜zk|k−1 . (10) has at most 5 nonzero elements [see (4)-(5)]. Per column, there It is worth noting that computing HTH and P HTz˜ exist at most 4(N − 1) non-zeros, e.g., the nonzero elements k k k|k k k|k−1 requires basic computational steps only O(N 2), due to the of the first column of H originate from the first columns of k sparse structure of H (See Appendix A). Hence, the most the matrices Ψ1,j and Υj,1, j =2,...,N, where each column k k k computationally demanding operation of (9)-(10) involves the of Ψ1,j (or Υj,1) has at most 2 nonzero elements [see (4)- k k inversions of two matrices (i.e., P and P−1 +HTH ), (5)]. Therefore nnz(H ) ∼ O(N 2), where nnz(H) denotes k|k−1 k|k−1 k k k whose dimensions are linear in N. Thus, the total time the number of nonzero elements of H, and the storage of complexity of the state and covariance updates using (9)-(10) H can be efficiently achieved using O(N 2) memory cells. k is O(N 3), a significant complexity reduction as compared to Secondly, since all the exteroceptive measurements are relative the standard EKF updates using (7)-(8). Additionally, since observations between each pair of robots, and absolute pose both P and H occupy O(N 2) memory cells (because measurements (such as GPS) are unavailable, H is rank k|k−1 k k of nnz(H ) ∼ O(N 2) and we only need to store non-zeros of deficiency. In particular, we have the following proposition. k H ), the space complexity of updating the state and covariance Proposition 1: rank(H ) ≤ 3N − 2, hence, H is not full k k k using (9)-(10) is reduced from O(N 4) to O(N 2). (column) rank. 
Unfortunately, (9) often suffers from numerical instability, Proof: To show rank(Hk) ≤ 3N − 2, we first construct T 2 due to κ(H Hk)= κ (Hk) (κ(H) is the condition number of I2 k a 3N × 2 matrix Ξ = 1N ⊗ , where ⊗ denotes H), which renders the state and covariance updates using (9)- 01×2 the Kronecker tensor product. Next, for every sub-matrix (10) numerically less robust [23]. i,j Hk , 1 ≤ i = j ≤ N, it is straightforward to verify that T T (pˆj −pˆi ) (pˆj −pˆi ) k|k−1 k|k−1 k|k−1 k|k−1 D. State and Covariance Updates through QR Factorization − j j p pi 2 p pi 2 i,j ˆk|k−1−ˆk|k−1 ˆk|k−1−ˆk|k−1 H Ξ j T T j T T 0 k = p pi J + p pi J = .  (ˆk|k−1−ˆk|k−1)   (ˆk|k−1−ˆk|k−1)  An effective strategy to overcome the numerical instability − j i 2 j i 2 pˆ 1−pˆ 12 pˆ 1−pˆ 12 of the Information filter [see (9)-(10)] is to apply thin QR  k|k− k|k−   k|k− k|k−      factorization with column pivoting on Hk [23], i.e., Hence, HkΞ=02(N−1)N×2. Since rank(Ξ)=2, we conclude T rank(Hk) ≤ 3N − 2, and thus Hk is not full rank. H Q R Π (11) k = Hk Hk Hk ,

where QHk is a 2(N − 1)N × 3N matrix with orthonormal C. State and Covariance Updates T columns (i.e., Q QH = I3N ), and RH is an upper Once all the relative measurements zi,j , Hk k k k 1 ≤ i = j ≤ N, with the absolute value of its diagonal become available, the state estimate and its covariance are elements arranged in decreasing order. Furthermore, ΠHk is a updated as (column) permutation matrix generated from column pivoting T T techniques, satisfying ΠH Π = Π ΠH = I . Note xˆk|k = xˆk|k−1 + Kkz˜k|k−1, (7) k Hk Hk k 3N T that in order to ensure orthogonality of QHk , it is necessary Pk|k = Pk|k−1 − KkSkKk , (8) to invoke column pivoting on Hk [24, Section 5.4.1], due to T −1 the fact that H is rank deficient (see Proposition 1). where Kk = Pk|k−1Hk Sk is the Kalman gain, Sk = k T HkPk|k−1Hk +Cn is the measurement residual covariance. Substituting (11) into (9), we obtain the EKF update equa- tions based on QR factorization Householder QR, has time complexity O(N 4) and space 3 −1 complexity O(N ). For clarity, the time-step index k is P P−1 Π RT R ΠT (12) k|k = k|k−1 + Hk Hk Hk Hk dropped from (11) throughout the rest of the paper, with T −1 T mH = 2(N − 1)N and nH = 3N denoting the number of = P − P ΠH R Σ RH Π P , k|k−1 k|k−1 k Hk k k Hk k|k−1 H T rows and columns of , respectively. xˆk|k = xˆk|k−1 + Pk|kHk ˜zk|k−1 , (13) where Σ R ΠT P Π RT I , and the last A. Householder Reflection k = Hk Hk k|k−1 Hk Hk + 3N equality in (12) is established by the matrix inversion lemma. We first introduce an orthogonal and symmetric House- T 2 Note that in contrast to Sk [see (8)], whose dimensions are holder (reflection) matrix Q = I − βvv , where β =2/v2, 2(N −1)N ×2(N −1)N, the matrix Σk in (12) has dimensions and the nonzero vector v is called a Householder vector. Any matrix H, multiplied by Q, can be computed as only 3N ×3N. Thus, assuming both RHk and ΠHk are given, the total time complexity and space complexity of the state and T QH = (I − βvvT)H = H − v βHTv . 
(14) covariance updates using (12)-(13) are O(N 3) and O(N 2), respectively, the same order of computational complexity as It is worth emphasizing that Householder updates [see (14)] of using (9)-(10). More importantly, κ(RHk ) = κ(Hk) [24]. “never entail the explicit formation of the Householder matrix Hence the numerical stability of (12)-(13) is improved as Q. Instead, an Householder update involves a matrix-vector compared to (9)-(10). multiplication ζ = βHTv and an outer product update H − Therefore, in order to ensure that the time complexity vζT” to reduce computational complexity [24, Section 5.1.4]. and space complexity for the state and covariance updates Now suppose for a given matrix H, our objective is to seek through QR factorization [see (12)-(13)] remain O(N 3) and a Householder matrix Q (or corresponding Householder vector O(N 2), we conclude that the maximum number of arithmetic v), such that the first column of QH has zeros below its first operations and memory usage to implement (11) should be component. In other words, letting u denote the first column O(N 3) and O(N 2), respectively. of H, we select a vector v and a scalar α, such that In what follows, we focus on the Householder QR algo- Qu = u − β(vTu)v = re . (15) rithm, one of the most widely adopted numerical techniques 1 for QR factorization [24, Section 5.2.1]. We analyze the The solution of (15) is provided by [24, Section 5.1.2] computational complexity of QR decomposition of H using k 2 −1 the Householder QR algorithm (see Section IV), and develop a r = −sign(u1)u2 , β = u2 + |u1|u2 , (16) modified version of Householder QR that exploits the sparsity v = u + sign(u1)u2e1 , (17) of H and reduces the overall computational complexity by k where u is the first element of u. For notational convenience, an order of magnitude (see Section V). 1 we use u−1 to denote the vector obtained by removing the first T T element of u, i.e., u = [u1 u−1] . IV. 
STANDARD HOUSEHOLDER QR Remark 2: A key observation of (17) is that the vectors v As mentioned in Section III-D, a numerically robust and and u differ only by their first elements (v1 = u1 −r). In other efficient QR decomposition algorithm with time complexity words, v−1 = u−1. This property plays a pivotal role in the and space complexity at most O(N 3) and O(N 2), respec- development of the Modified Householder QR in Section V. tively, is a prerequisite for successfully implementing the state and covariance updates through (12)-(13). Several methods B. Description of the Standard Householder QR exist for performing QR factorization, such as the Cholesky In the previous section (see Section IV-A), we provide a decomposition (CHO), the modified Gram-Schmidt process framework to generate a Householder matrix Q such that the (MGS), the Givens rotations (GIV), or the Householder trans- product QH eliminates all except the first element of the first formations (also called Householder reflections) [24, Sec- column of H. The Standard Householder QR algorithm (with tion 5.2]. Due to its simplicity and numerical stability, we column pivoting) [24, Algorithm 5.4.1] extends the above adopt the column pivoted QR factorization algorithm utilizing strategy by applying a sequence of Householder matrices Householder transformations, which is termed as Standard (multiplying H from left by Q), as well as a sequence of Householder QR in this paper.2 In what follows, we present a column permutation matrices (multiplying H from right by brief overview of Householder reflections (see Section IV-A), Π), to gradually transform H into an upper triangular form as well as the essence of the Standard Householder QR RH, whose diagonal elements ri,i, i =1,...,nH, satisfying algorithm (see Section IV-B). Additionally, the computational |ri,i| ≥ |rj,j | for 1 ≤ i < j ≤ nH. 
complexity analysis conducted in Sections IV-C reveals that More specifically, suppose that after (ℓ − 1) Householder T the QR decomposition of Hk, when applying the Standard matrices {Qi = ImH − βivivi ,i = 1,...,ℓ − 1} have left- multiplied H, as well as (ℓ − 1) column permutation matrices 2We have conducted computational complexity analysis when employing {Πi,i =1,...,ℓ − 1} have right-multiplied H, the resulting CHO, MGS, and GIV. More specifically, it is shown in Appendix B that the (ℓ−1) time complexity and space complexity of CHO are O(N 3) and O(N 2), matrix H takes the following block form respectively, due to the sparse structure of Hk. However, CHO is not HTH (ℓ−1) (ℓ−1) applicable since it requires k k to be positive definite. On the other (ℓ−1) H1,1 H1,2 4 3 H =Qℓ−1 Q1HΠ1 Πℓ−1 = , (18) hand, both MGS and GIV require O(N ) arithmetic operations and O(N ) 0 H(ℓ−1) memory cells (see Appendix B). 2,2 (ℓ−1) where H1,1 is an upper triangular matrix of dimensions (ℓ− In summary, we have briefly described column pivoting 1) × (ℓ−1) with the absolute value of its diagonal elements and Householder update at the ℓth iteration of the Standard arranged in decreasing order. Householder QR algorithm. The overall algorithm terminates At the ℓth iteration, the Standard Householder QR algo- when c∗ < ǫ, where ǫ is a small positive scalar chosen rithm firstly performs column pivoting (i.e., right-multiplying to take into account of machine roundoff errors (ideally the (ℓ−1) ∗ H with Πℓ), followed by an Householder update by left- algorithm terminates when c = 0). Denoting the overall (ℓ−1) multiplying H Πℓ with Qℓ. number of iterations as l, the output RH is selected as the (l) Specifically, for column pivoting, we compute the squared first nH rows of H (Note that the explicit expression of each (ℓ−1) individual Q , as well as Q Q Q , is norm of every column of H2,2 , denoted as cℓ,...,cnH , and ℓ, ℓ = 1,...,l H = 1 l ∗ ∗ seek ℓ = argmaxj {cj, j = ℓ,...,nH} and c = cℓ∗ . 
The matrix Q_H itself is not computed, since the covariance update [see (12)] does not involve Q_H. In addition, we never require the explicit formation of the permutation matrices Π_ℓ, ℓ = 1,...,l, or of their product Π_H = Π_1 ··· Π_l. Instead, Π_H is represented and updated by a permutation vector [π_1 ... π_{n_H}], i.e., Π_H = [e_{π_1} ... e_{π_{n_H}}]. For completeness, the flow of the Standard Householder QR is summarized in Algorithm 1.

Note that c_ℓ,...,c_{n_H} can be updated recursively by a formula (see Algorithm 1, Line 11) discovered by Businger and Golub [25], which significantly reduces the computational complexity [24, Section 5.4.1]. The permutation matrix Π_ℓ is generated by exchanging the canonical vectors e_ℓ and e_{ℓ*} in I_{n_H}. Furthermore, H^{(ℓ-1)} Π_ℓ is equivalent to a column permutation swapping the ℓth and ℓ*th columns of H^{(ℓ-1)}. Without loss of generality, in what follows, we assume Π_ℓ = I_{n_H}.

For the Householder update, we seek Q_ℓ = I_{m_H} − β_ℓ v_ℓ v_ℓ^T such that the first ℓ columns of H^{(ℓ)} = (I_{m_H} − β_ℓ v_ℓ v_ℓ^T) H^{(ℓ-1)} Π_ℓ are upper triangular. Since H_{1,1}^{(ℓ-1)} is upper triangular, the Householder matrix Q_ℓ only needs to zero all but the 1st component of u, where u is the first column of H_{2,2}^{(ℓ-1)}, while leaving the first ℓ−1 columns of H^{(ℓ-1)} intact [24, Section 5.2.1]. This is achieved by selecting v_ℓ = [0_{1×(ℓ−1)} v^T]^T, with

  v = u + sign(u_1) (c*)^{1/2} e_1 ,                                  (19)

where c* = ||u||^2 and β_ℓ = (c* + |u_1| (c*)^{1/2})^{-1}. Note that c* = c_{ℓ*} is available after the column-pivoting stage and need not be recomputed through c* = u^T u.

Algorithm 1 Standard Householder QR
Require: H^{(0)} = H of dimensions m_H × n_H (m_H ≥ n_H).
Ensure: R_H and Π_H of dimensions n_H × n_H.
 1: for i = 1 to n_H do
 2:   Initialize π_i = i, and compute c_i, the squared norm of the ith column of H_{2,2}^{(0)} = H^{(0)}.
 3: end for
 4: Set ℓ = 1, determine ℓ* = argmax_j {c_j, j = ℓ,...,n_H} and c* = c_{ℓ*}.
 5: while c* > ε do
 6:   Swap the ℓth and ℓ*th columns of H^{(ℓ-1)}, exchange c_ℓ and c_{ℓ*}, and then interchange π_ℓ with π_{ℓ*}.
 7:   Calculate β_ℓ and v from u (u is the first column of H_{2,2}^{(ℓ-1)}) using (19), and r_ℓ from c* using (21).
 8:   Compute δ and s = [s_1 ... s_{n_H−ℓ}]^T [see (22)].
 9:   Update the sub-matrix H_{2,2}^{(ℓ)} [see (23)].
10:   for j = ℓ+1 to n_H do
11:     Modify c_j ⇐ c_j − s_{j−ℓ}^2.
12:   end for
13:   Seek ℓ* = argmax_j {c_j, j = ℓ+1,...,n_H} and c* = c_{ℓ*}, and then ℓ ⇐ ℓ+1.
14: end while
15: return Π_H = [e_{π_1} ... e_{π_{n_H}}], and R_H selected as the first n_H rows of H^{(l)}, l being the total number of iterations.

Now let us examine the structure of H^{(ℓ)} = Q_ℓ H^{(ℓ-1)} Π_ℓ after the ℓth iteration of the Standard Householder QR. For clarity, we partition the matrices H_{1,2}^{(ℓ-1)} and H_{2,2}^{(ℓ-1)} as follows: H_{1,2}^{(ℓ-1)} = [t  H̄_{1,2}^{(ℓ-1)}], where t is the first column of H_{1,2}^{(ℓ-1)}, and H̄_{1,2}^{(ℓ-1)} consists of its remaining columns; similarly, H_{2,2}^{(ℓ-1)} = [u  H̄_{2,2}^{(ℓ-1)}], where H̄_{2,2}^{(ℓ-1)} consists of all but the first column of H_{2,2}^{(ℓ-1)}, the row vector s̄^T represents the first row of H̄_{2,2}^{(ℓ-1)}, and H̃_{2,2}^{(ℓ-1)} is obtained by removing s̄^T from H̄_{2,2}^{(ℓ-1)}. Using this notation, we establish Proposition 2.
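The reflector construction in (19) and (21), and the Businger–Golub norm downdate in Line 11 of Algorithm 1, can be checked numerically. The following sketch is our illustration (the paper's implementation was in MATLAB), using a small random test matrix:

```python
import numpy as np

# One Householder step: form v, beta, r from the pivot column per
# (19)/(21), apply the reflector, and check that the squared column
# norms c_j can be downdated as in Line 11 of Algorithm 1.
rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4))
c = (A * A).sum(axis=0)                   # squared column norms c_j

u = A[:, 0].copy()
c_star = c[0]                             # c* = ||u||^2, reused from pivoting
sgn = 1.0 if u[0] >= 0 else -1.0
v = u.copy()
v[0] += sgn * np.sqrt(c_star)             # (19): v = u + sign(u_1) sqrt(c*) e_1
beta = 1.0 / (c_star + abs(u[0]) * np.sqrt(c_star))
r = -sgn * np.sqrt(c_star)                # (21): new diagonal entry r_1

A1 = A - beta * np.outer(v, v @ A)        # apply Q_1 = I - beta v v^T
print(np.allclose(A1[:, 0], r * np.eye(6)[0]))                 # column 1 -> r e_1
s = A1[0, 1:]                             # new first row of the trailing columns
print(np.allclose(c[1:] - s**2, (A1[1:, 1:]**2).sum(axis=0)))  # Line 11 downdate
```

Note that β here equals 2/(v^T v), so the expression in (19) avoids forming v^T v explicitly by reusing c*.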

Proposition 2: H^{(ℓ)} has the following block structure:

  H^{(ℓ)} = Q_ℓ ··· Q_1 H Π_1 ··· Π_ℓ = [ H_{1,1}^{(ℓ)}  H_{1,2}^{(ℓ)} ; 0  H_{2,2}^{(ℓ)} ] ,     (20)

where H_{1,1}^{(ℓ)} = [ H_{1,1}^{(ℓ-1)}  t ; 0  r_ℓ ] is an upper triangular matrix of dimensions ℓ × ℓ, and H_{1,2}^{(ℓ)} = [ H̄_{1,2}^{(ℓ-1)} ; s^T ], with

  r_ℓ = −sign(u_1) (c*)^{1/2} ,                                       (21)
  s = s̄ − v_1 δ ,                                                     (22)
  H_{2,2}^{(ℓ)} = H̃_{2,2}^{(ℓ-1)} − u_{−1} δ^T ,                      (23)

where δ = β_ℓ H̄_{2,2}^{(ℓ-1)T} v.
Proof: The proof is shown in Appendix C.

C. Complexity Analysis of the Standard Householder QR

We now analyze the computational complexity of factorizing H using the Standard Householder QR algorithm. The main result is that the time complexity and space complexity, when employing the Standard Householder QR algorithm, are O(N^4) and O(N^3), respectively. As explained later on, this high computational cost is due to the fact that the sparse structure of H is destroyed by the Householder transformations.

To proceed, we notice that the computational complexity of the Standard Householder QR algorithm stems from two sources: (1) column pivoting, in association with Π_ℓ, and (2) the Householder update, in association with Q_ℓ, ℓ = 1,...,n_H. Specifically, we decompose Algorithm 1 into two categories, where Lines 1-4, 6, and 10-13 correspond to column pivoting, while the Householder update consists of Lines 7-9. We examine the overhead associated with each individual source separately. The total complexity of the algorithm is dominated by the higher overhead of these two sources.

By exploiting the sparse structure of H, we can show that the time and space complexity associated with column pivoting in the Standard Householder QR are O(N^2) and O(N), respectively. More details are provided in Appendix D.

Remark 3: As will become clear in the following sections, the computational overhead associated with column pivoting is at least one order of magnitude less than that of the Householder update. Furthermore, in practice, column pivoting can be efficiently implemented using pointers, without physically swapping data in memory. In what follows, without loss of generality, we assume Π_ℓ = I_{n_H}, ℓ = 1,...,l, hence Π_H = I_{n_H} and [π_1 ... π_{n_H}] = [1 ... n_H].

Next, we address the computational complexity associated with the Householder update in the Standard Householder QR (see Lines 7-9 in Algorithm 1). For the purpose of presentation, we introduce a matrix Λ of dimensions m_Λ × n_Λ, with m_Λ = (N−1)N/2 = m_H/4 and n_Λ = N = n_H/3 [see (24)]. Specifically, Λ is constructed by firstly removing from H the measurement Jacobian matrices H_{i,j} for all 1 ≤ j …

From Proposition 3, the overall time complexity associated with the Householder update is in the order of Σ_{ℓ=1}^{n_Λ} (n_Λ − ℓ) τ_{ℓ−1}. In what follows, we firstly seek the expressions of τ_{ℓ−1}, ℓ = 1,...,n_Λ, to determine the time complexity associated with the Householder update (see Section IV-C1), followed by examining the space complexity in Section IV-C2.

1) Time Complexity: In this section, we focus on analyzing the number of arithmetic operations for the Householder update per iteration (see Algorithm 1, Lines 7-9). Additionally, we examine the evolution of the sparsity pattern of Λ.

Initially, Λ^{(0)} = Λ has a sparse structure [see (24)]. More specifically, each column of Λ has τ̄_0 = τ_0 = N−1 non-zeros, and nnz(Λ) = (N−1)N.

• At the 1st iteration: Since u, the first column of Λ_{2,2}^{(0)} = Λ^{(0)}, satisfies nnz(u) = τ_0 = N−1, based on Proposition 3, the time complexity of the Householder update at the 1st iteration is O((N−1)^2). Now let us examine the structure of Λ^{(1)} after the 1st Householder matrix Q_1 (generated by the Householder vector v) has been applied to Λ^{(0)}. Due to the nonzero elements of u (i.e., ψ_{1,j}, j = 2,...,N), the first N−1 elements of all except the first column of Λ^{(1)} become nonzero [see (25)]. Therefore the Householder update does not preserve the original sparsity of Λ^{(0)}, and the resulting matrix Λ^{(1)} = Q_1 Λ^{(0)} …
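The fill-in phenomenon just described is easy to reproduce on a toy example. In this sketch (ours; a small banded 0/1 test matrix stands in for Λ), a single Householder update creates nonzeros in every trailing column whose support overlaps that of u:

```python
import numpy as np

def apply_householder(A):
    """Apply one Householder reflector that zeroes A[1:, 0]."""
    u = A[:, 0].copy()
    v = u.copy()
    v[0] += np.sign(u[0]) * np.linalg.norm(u)
    beta = 2.0 / (v @ v)
    return A - beta * np.outer(v, v @ A)

# Each column has 3 nonzeros, shifted down by one column to column.
A = np.zeros((6, 4))
A[0:3, 0] = 1.0
A[1:4, 1] = 1.0
A[2:5, 2] = 1.0
A[3:6, 3] = 1.0

A1 = apply_householder(A)
print(np.count_nonzero(A, axis=0))                  # [3 3 3 3]
print(np.count_nonzero(np.round(A1, 12), axis=0))   # [1 4 5 3]: fill-in in cols 2-3
```

Columns 2 and 3 (which share support rows with u) gain nonzeros, while column 4 (disjoint support) is untouched — the mechanism behind the loss of sparsity analyzed above.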

Summing over all iterations, the overall time complexity associated with the Householder update is

  Σ_{ℓ=1}^{N} (N − ℓ) ( Σ_{i=1}^{ℓ} (N − i) − ℓ + 1 ) ∼ O(N^4) .

2) Space Complexity: To analyze the space complexity, we focus on the storage of Λ^{(ℓ)}, ℓ = 1,...,N, which is the most dominant source in terms of memory-cell usage. Since nnz(Λ^{(ℓ)}) = (N − ℓ) Σ_{i=1}^{ℓ+1} (N − i) + Σ_{i=1}^{ℓ} i, we conclude that the space complexity of the QR decomposition of Λ using the Standard Householder QR algorithm is in the order of

  max_{1≤ℓ≤N} nnz(Λ^{(ℓ)}) = max_{1≤ℓ≤N} [ (N − ℓ) Σ_{i=1}^{ℓ+1} (N − i) + Σ_{i=1}^{ℓ} i ] ∼ O(N^3) .

3) Summary of Complexity Analysis: In summary, we have shown that factorizing H by employing the Standard Householder QR has time complexity O(N^4) and space complexity O(N^3). Therefore, we conclude that the overall time and space complexity for the state and covariance updates through Standard Householder QR factorization become O(N^4) and O(N^3), respectively, an order of magnitude increase over that of using (9)-(10). This motivates us to develop the Modified Householder QR algorithm, which is based on the Standard Householder QR.

As mentioned in Section IV-B, the ℓth iteration of the Householder QR is decomposed into two separate stages, column pivoting and Householder update. The Modified Householder QR algorithm performs column pivoting identically to the Standard Householder QR; the difference lies in the implementation of the Householder update. Similar to the discussion in Section IV-B, due to the minimal computational overhead associated with column pivoting (see Remark 3), in the following description of the Modified Householder QR we assume Π_ℓ = I_{n_H}, ℓ = 1,...,n_H, and focus on the modifications of the implementation of the Householder update.

From (18), we have (under the assumption Π_ℓ = I_{n_H}, ∀ℓ)

  h_j^{(ℓ-1)} = (I_{m_H} − β_{ℓ−1} v_{ℓ−1} v_{ℓ−1}^T) ··· (I_{m_H} − β_1 v_1 v_1^T) h_j ,   j = ℓ,...,n_H .     (27)

Hence h_j^{(ℓ-1)} (j = ℓ,...,n_H) is a linear combination of h_j and the Householder vectors {v_i, i = 1,...,ℓ−1}. Additionally, a crucial property of the Householder transformation (see Remark 2) is (v_i)_{−i} = (h_i^{(i-1)})_{−i}, i = 1,...,ℓ−1, where (v)_{−i} denotes the vector obtained by removing the first i components of v. Accordingly, (v_i)_{−(ℓ−1)} = (h_i^{(i-1)})_{−(ℓ−1)} for 1 ≤ i ≤ ℓ−1. Hence (h_j^{(ℓ-1)})_{−(ℓ−1)} (j = ℓ,...,n_H) can be expressed as a linear combination of (h_j)_{−(ℓ−1)} and {(h_i^{(i-1)})_{−(ℓ−1)}, i = 1,...,ℓ−1}. Furthermore, notice that every vector h_i^{(i-1)}, i = 2,...,ℓ−1, is itself a linear combination of h_i and the Householder vectors {v_η, η = 1,...,i−1}; recalling (v_η)_{−(ℓ−1)} = (h_η^{(η-1)})_{−(ℓ−1)} for 1 ≤ η ≤ i−1, (h_i^{(i-1)})_{−(ℓ−1)} (i = 2,...,ℓ−1) can be expressed as a linear combination of (h_i)_{−(ℓ−1)} and {(h_η^{(η-1)})_{−(ℓ−1)}, η = 1,...,i−1}. In addition, h_1^{(0)} = h_1 by definition (thus, (h_1^{(0)})_{−(ℓ−1)} = (h_1)_{−(ℓ−1)}). Therefore, we conclude by recursion that each vector (h_j^{(ℓ-1)})_{−(ℓ−1)}, j = ℓ,...,n_H, can be expressed as a linear combination of (h_j)_{−(ℓ−1)} and {(h_i)_{−(ℓ−1)}, i = 1,...,ℓ−1}:

  (h_j^{(ℓ-1)})_{−(ℓ−1)} = (h_j)_{−(ℓ−1)} + Σ_{i=1}^{ℓ−1} (h_i)_{−(ℓ−1)} γ_{i,j}^{(ℓ-1)} ,     (28)

where the coefficients γ_{i,j}^{(ℓ-1)}, i = 1,...,ℓ−1, j = ℓ,...,n_H, need to be sought at each iteration. In what follows, we will provide a formula [see (39)] that updates γ_{i,j}^{(ℓ-1)} recursively.

Notice that the matrix H_{2,2}^{(ℓ-1)} [see (18)] comprises all the column vectors (h_j^{(ℓ-1)})_{−(ℓ−1)}, j = ℓ,...,n_H. Hence, (28) can be summarized in the compact matrix form

  H_{2,2}^{(ℓ-1)} = H_{−(ℓ−1)} [ Γ^{(ℓ-1)} ; I_{n_H−ℓ+1} ] ,     (29)

where H_{−(ℓ−1)} = [(h_1)_{−(ℓ−1)} ... (h_{n_H})_{−(ℓ−1)}] is the sub-matrix of H resulting from removing its first ℓ−1 rows. The (ℓ−1) × (n_H−ℓ+1) matrix Γ^{(ℓ-1)} = [γ_{i,j}^{(ℓ-1)}] is termed the coefficient matrix. To facilitate the presentation of the ensuing derivations, Γ^{(ℓ-1)} is written as [γ_ℓ^{(ℓ-1)}  Γ̄^{(ℓ-1)}], where γ_ℓ^{(ℓ-1)} is the first column of Γ^{(ℓ-1)}.

In contrast to the original vector (h_j)_{−(ℓ−1)}, j = ℓ,...,n_H, which has at most O(N) non-zero elements (see Remark 1), by following the analysis conducted on Λ in Section IV-C it can be shown that the vector (h_j^{(ℓ-1)})_{−(ℓ−1)}, j = ℓ,...,n_H, obtained after the (ℓ−1)th iteration of the Householder QR is processed, has nnz((h_j^{(ℓ-1)})_{−(ℓ−1)}) ∼ O(ℓN − ℓ^2), which results in O(N^4) time complexity and O(N^3) space complexity when employing the Standard Householder QR algorithm (see Section IV-C). In order to preserve the original sparsity of H, and at the same time reduce the memory usage, our modification to the Standard Householder QR is that the explicit form of H_{2,2}^{(ℓ-1)} [see (29)] (or, equivalently, the explicit form of the vectors (h_j^{(ℓ-1)})_{−(ℓ−1)}, j = ℓ,...,n_H [see (28)]) is not calculated. Instead, H_{2,2}^{(ℓ-1)} is stored and represented implicitly by the coefficient matrix Γ^{(ℓ-1)} [see (29)].

Next, we address two key issues in the Modified Householder QR: firstly, computing the matrices H_{1,1}^{(ℓ)}, H_{1,2}^{(ℓ)} in (20); secondly, deriving the recursive rule for obtaining Γ^{(ℓ)} from Γ^{(ℓ-1)}. Equivalently, we seek the expression of Γ^{(ℓ)} such that

  H_{2,2}^{(ℓ)} = H_{−ℓ} [ Γ^{(ℓ)} ; I_{n_H−ℓ} ] ,     (30)

where H_{−ℓ} = [(h_1)_{−ℓ} ... (h_{n_H})_{−ℓ}] is the sub-matrix of H resulting from removing its first ℓ rows.

To proceed, we first note that the vector u = (h_ℓ^{(ℓ-1)})_{−(ℓ−1)}, i.e., the first column of H_{2,2}^{(ℓ-1)}, plays an important role in generating the Householder vector and the subsequent process. Hence, we explicitly compute u using (29):

  u = (h_ℓ^{(ℓ-1)})_{−(ℓ−1)} = (h_ℓ)_{−(ℓ−1)} + H^1_{−(ℓ−1)} γ_ℓ^{(ℓ-1)} ,     (31)

where H^1_{−(ℓ−1)} = [(h_1)_{−(ℓ−1)} ... (h_{ℓ−1})_{−(ℓ−1)}] consists of the first ℓ−1 columns of H_{−(ℓ−1)}. Note that we explicitly compute only the vector u = (h_ℓ^{(ℓ-1)})_{−(ℓ−1)}, the first column of H_{2,2}^{(ℓ-1)}, while the remaining columns (h_j^{(ℓ-1)})_{−(ℓ−1)}, j = ℓ+1,...,n_H, are not explicitly computed; instead, they are represented implicitly by Γ^{(ℓ-1)}.

Once the explicit form of u, computed by (31), is known, and c* is retrieved from column pivoting, the Householder vector v_ℓ (or, equivalently, v), as well as β_ℓ and r_ℓ, are readily available through (19) and (21). Notice that computing v from u only requires updating the first element of u [see Remark 2].

Now we are ready to present the recursive formulas for computing H_{1,1}^{(ℓ)}, H_{1,2}^{(ℓ)}, and Γ^{(ℓ)}.

1) Computing H_{1,1}^{(ℓ)}: Notice that the terms H_{1,1}^{(ℓ-1)} and t are already available after the (ℓ−1)th iteration of the Householder QR. Hence the only unknown in H_{1,1}^{(ℓ)} [see (20)] is the scalar r_ℓ, which can be calculated from c* and u_1 in a fixed number of arithmetic operations [see (21)].

2) Computing H_{1,2}^{(ℓ)}: Since H_{1,2}^{(ℓ-1)} is known after the (ℓ−1)th iteration of the Householder QR, updating H_{1,2}^{(ℓ)} is equivalent to calculating the vector s from (22), which requires s̄ and δ = β_ℓ H̄_{2,2}^{(ℓ-1)T} v. Remember that we do not have the explicit forms of s̄ and H̄_{2,2}^{(ℓ-1)}. However, using the fact that s̄^T corresponds to the first row of H̄_{2,2}^{(ℓ-1)}, and based on (29), we can rewrite H̄_{2,2}^{(ℓ-1)} and s̄ as follows:

  H̄_{2,2}^{(ℓ-1)} = H^2_{−(ℓ−1)} + H^1_{−(ℓ−1)} Γ̄^{(ℓ-1)} ,     (32)
  s̄ = ρ_2 + Γ̄^{(ℓ-1)T} ρ_1 ,                                    (33)

where H^2_{−(ℓ−1)} = [(h_{ℓ+1})_{−(ℓ−1)} ... (h_{n_H})_{−(ℓ−1)}] comprises the last n_H−ℓ columns of H_{−(ℓ−1)}, and ρ_1^T and ρ_2^T are the first rows of H^1_{−(ℓ−1)} and H^2_{−(ℓ−1)}, respectively. From (32), we compute the vector

  δ = β_ℓ H̄_{2,2}^{(ℓ-1)T} v = δ_2 + Γ̄^{(ℓ-1)T} δ_1 ,     (34)

where δ_1 = β_ℓ H^{1 T}_{−(ℓ−1)} v and δ_2 = β_ℓ H^{2 T}_{−(ℓ−1)} v. Substituting (33) and (34) in (22), we arrive at the following update equation for s (or, equivalently, H_{1,2}^{(ℓ)}):

  s = ρ_2 − v_1 δ_2 + Γ̄^{(ℓ-1)T} (ρ_1 − v_1 δ_1) .     (35)

3) Computing Γ^{(ℓ)}: To determine Γ^{(ℓ)}, we begin with (30) and (23). Recall that we do not have the explicit expression of H̃_{2,2}^{(ℓ-1)}. However, since H̃_{2,2}^{(ℓ-1)} corresponds to H̄_{2,2}^{(ℓ-1)} with the first row removed, we obtain [from (32)]

  H̃_{2,2}^{(ℓ-1)} = H^2_{−ℓ} + H^1_{−ℓ} Γ̄^{(ℓ-1)} ,     (36)

where H^1_{−ℓ} = [(h_1)_{−ℓ} ... (h_{ℓ−1})_{−ℓ}] and H^2_{−ℓ} = [(h_{ℓ+1})_{−ℓ} ... (h_{n_H})_{−ℓ}]. Next we exploit the property v_{−1} = u_{−1} [see Remark 2]. In particular, based on (31), we have

  v_{−1} = u_{−1} = (h_ℓ)_{−ℓ} + H^1_{−ℓ} γ_ℓ^{(ℓ-1)} .     (37)

Finally, we substitute (36), (37), and (34) into (23), and notice that H_{−ℓ} = [H^1_{−ℓ}  (h_ℓ)_{−ℓ}  H^2_{−ℓ}], to arrive at

  H_{2,2}^{(ℓ)} = H^1_{−ℓ} (Γ̄^{(ℓ-1)} − γ_ℓ^{(ℓ-1)} δ^T) − (h_ℓ)_{−ℓ} δ^T + H^2_{−ℓ}
              = H_{−ℓ} [ Γ̄^{(ℓ-1)} − γ_ℓ^{(ℓ-1)} δ^T ; −δ^T ; I_{n_H−ℓ} ] .     (38)

Comparing (38) and (30), we immediately obtain the expression of the ℓ × (n_H−ℓ) matrix Γ^{(ℓ)}:

  Γ^{(ℓ)} = [ Γ̄^{(ℓ-1)} − γ_ℓ^{(ℓ-1)} δ^T ; −δ^T ] .     (39)

Note that the upper part of Γ^{(ℓ)} is a rank-one modification of the existing matrix Γ^{(ℓ-1)}, which has a relatively low computational cost. Furthermore, (39) affirms that every vector (h_j^{(ℓ)})_{−ℓ}, j = ℓ+1,...,n_H, is a linear combination of (h_j)_{−ℓ} and {(h_i)_{−ℓ}, i = 1,...,ℓ}.

In summary, we outline the algorithmic flow of the Modified Householder QR in Algorithm 2. Note that the column pivoting implemented by the Modified Householder QR is identical to that of the Standard Householder QR algorithm.

Algorithm 2 Modified Householder QR
Require: H^{(0)} = H of dimensions m_H × n_H (m_H ≥ n_H).
Ensure: R_H and Π_H of dimensions n_H × n_H.
 1: for i = 1 to n_H do
 2:   Initialize π_i = i, and compute c_i, the squared norm of the ith column of H_{2,2}^{(0)} = H^{(0)}. Additionally, initialize Γ^{(0)} as an empty matrix.
 3: end for
 4: Set ℓ = 1, determine ℓ* = argmax_j {c_j, j = ℓ,...,n_H} and c* = c_{ℓ*}.
 5: while c* > ε do
 6:   Swap the ℓth and ℓ*th columns of H^{(ℓ-1)}, as well as the 1st and (ℓ*−ℓ+1)th columns of Γ^{(ℓ-1)}; exchange c_ℓ and c_{ℓ*}, and then interchange π_ℓ with π_{ℓ*}.
 7:   Calculate u from (31).
 8:   Compute v, β_ℓ, r_ℓ and update the sub-matrix H_{1,1}^{(ℓ)}.
 9:   Determine δ_1, δ_2, and δ using (34).
10:   Calculate s = [s_1 ... s_{n_H−ℓ}]^T from (35) and update the sub-matrix H_{1,2}^{(ℓ)}.
11:   Update Γ^{(ℓ)} based on (39).
12:   for j = ℓ+1 to n_H do
13:     Modify c_j ⇐ c_j − s_{j−ℓ}^2.
14:   end for
15:   Seek ℓ* = argmax_j {c_j, j = ℓ+1,...,n_H} and c* = c_{ℓ*}, and then ℓ ⇐ ℓ+1.
16: end while
17: return Π_H = [e_{π_1} ... e_{π_{n_H}}], and R_H selected as the first n_H rows of H^{(l)}, l being the total number of iterations.

B. Computational Complexity Analysis

In Section V-A, we provided a general description of the Modified Householder QR algorithm, which is applicable to the decomposition of an arbitrary matrix. In this section, we apply the Modified Householder QR to perform QR factorization on H [see (6)], and analyze the overall computational complexity by taking into account the sparse structure of H.

Since the Modified Householder QR differs from the Standard Householder QR algorithm only in the implementation of the Householder update, while column pivoting is performed identically by both algorithms, we conclude that the computational overhead associated with column pivoting in the Modified Householder QR has time complexity O(N^2) and space complexity O(N). Therefore, in what follows, we analyze the computational complexity associated with the Householder update in the Modified Householder QR, which corresponds to Lines 7-11 in Algorithm 2. We start with the time complexity, followed by the space complexity.

1) Time Complexity: In this section, we briefly analyze the time complexity associated with the Householder update when applying the Modified Householder QR algorithm to decompose H. We claim that the worst-case time complexity is O(N^3). To prove it, we will identify the number of flops for every line between Lines 7-11 in Algorithm 2, and show that the total number of arithmetic operations required per iteration of the Modified Householder QR is bounded above by O(N^2). Since the maximum number of iterations is n_H = 3N, the overall time complexity associated with the Householder update is O(N^3).

• Computational cost of Line 7: Recall that nnz(h_j) ∼ O(N), j = 1,...,3N [see Remark 1], hence nnz((h_i)_{−(ℓ−1)}) ∼ O(N), i = 1,...,ℓ. Thus, computing u from (31) requires O(ℓN) operations. Therefore, the cost of performing Line 7 is bounded above by O(N^2).

• Computational cost of Line 8: Once u becomes available from Line 7, and c* = u^T u is retrieved from column pivoting, determining the scalars β_ℓ and r_ℓ requires a constant number of arithmetic operations. Furthermore, updating v from u only requires modifying the 1st element of u, which can be achieved in O(1) time. In summary, the cost of performing Line 8 is O(1).

• Computational cost of Line 9: Since each column of H^1_{−(ℓ−1)} and H^2_{−(ℓ−1)} has O(N) non-zeros, which is attained by preserving the original sparse structure of H, computing δ_1 and δ_2 has a cost of O(N^2), regardless of the structure of v. Additionally, calculating δ from δ_1 and δ_2 has a cost of O(ℓN), since (34) involves an (n_H−ℓ) × (ℓ−1) matrix multiplied by an (ℓ−1) × 1 vector and a vector-vector addition of dimension n_H−ℓ. In summary, the cost of performing Line 9 is O(N^2).

• Computational cost of Line 10: Since (35) involves an (n_H−ℓ) × (ℓ−1) matrix multiplied by an (ℓ−1) × 1 vector and a vector-vector addition of dimension n_H−ℓ, the overall cost of performing Line 10 is bounded above by O(N^2).

• Computational cost of Line 11: In (39), the vector δ [see (34)] is available from Line 9 and does not need to be recomputed. Hence, we only need to focus on the upper part of (39), which is a rank-one update of the existing matrix Γ^{(ℓ-1)}. Since the vectors γ_ℓ^{(ℓ-1)} and δ are of dimensions ℓ−1 and n_H−ℓ, respectively, we conclude that the overall cost of performing Line 11 is O(ℓN), which is bounded above by O(N^2).

In summary, we have shown that the number of arithmetic operations required per iteration of the Householder update in the Modified Householder QR algorithm is bounded above by O(N^2). Therefore the worst-case time complexity of applying the Modified Householder QR algorithm on H is O(N^3).

2) Space Complexity: To address the space complexity, we focus on the memory requirements for H, u, and Γ^{(ℓ)}, ℓ = 1,...,n_H, which are the most dominant sources in terms of memory-cell usage. Recall from Remark 1 that nnz(H) ∼ O(N^2). In addition, notice that the dimension of u is at most m_H ∼ O(N^2). On the other hand, since the dimensions of Γ^{(ℓ)} are ℓ × (n_H−ℓ), we have max{nnz(Γ^{(ℓ)}), ℓ = 1,...,n_H} ∼ O(N^2). Therefore, we conclude that the space complexity of the QR decomposition of H using the Modified Householder QR algorithm is in the order of O(N^2).

3) Summary of Complexity Analysis: In summary, we have shown that factorizing H by employing the Modified Householder QR algorithm has time complexity O(N^3) and space complexity O(N^2). Therefore, we conclude that the overall time and space complexity for the state and covariance updates through the Modified Householder QR factorization are O(N^3) and O(N^2), respectively, one order of magnitude lower than that of using the Standard Householder QR. Additionally, it achieves the same computational complexity as using (9)-(10).

Furthermore, we would like to point out that the Modified Householder QR algorithm is applicable to any sparse matrix H with a sparsity pattern other than that of the measurement Jacobian matrix [see (6)]. In particular, suppose that the dimensions of H are m × n (m ≥ n), and denote τ = max(nnz(h_1),...,nnz(h_n)). Following the analysis conducted in Sections V-B1 and V-B2, it can be shown that the space complexity of the Modified Householder QR algorithm is O(max{min{τn, m}, n^2}), and its associated time complexity is O(max{τn^2, n^3}), in contrast to O(mn^2) for the Standard Householder QR algorithm [24].

TABLE I
CPU RUNTIME (SEC)

  N      SS-QR            MH-QR
  6      7.5378 × 10^-5   4.9683 × 10^-5
  11     2.0238 × 10^-4   2.1658 × 10^-4
  16     6.3311 × 10^-4   6.4813 × 10^-4
  21     1.3985 × 10^-3   1.4737 × 10^-3
  26     2.8506 × 10^-3   2.8315 × 10^-3
  51     2.8017 × 10^-2   2.2363 × 10^-2
  76     1.1370 × 10^-1   7.5767 × 10^-2
  101    3.2935 × 10^-1   1.8164 × 10^-1
  151    1.4685 × 10^0    6.3158 × 10^-1
  201    4.3048 × 10^0    1.5436 × 10^0
  251    9.9297 × 10^0    3.5646 × 10^0
  301    1.9804 × 10^1    6.7462 × 10^0
  351    3.6083 × 10^1    1.1572 × 10^1
  401    5.7972 × 10^1    1.8523 × 10^1
  451    1.9673 × 10^2    2.7400 × 10^1
  501    1.5869 × 10^3    3.8437 × 10^1
  551    N/A              5.3501 × 10^1
  601    N/A              7.0800 × 10^1
  651    N/A              9.3076 × 10^1
  701    N/A              1.1808 × 10^2
  751    N/A              1.4926 × 10^2
  801    N/A              1.8626 × 10^2
  851    N/A              2.2381 × 10^2
  901    N/A              2.6653 × 10^2
  951    N/A              3.1863 × 10^2
  1001   N/A              3.7570 × 10^2
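To make the recursion concrete, the following sketch implements (31), (34), (35), and (39) with dense arrays and no column pivoting. It is our numerical check of the algebra only (the paper's implementation was in MATLAB), and it deliberately omits the sparsity bookkeeping that yields the O(N^3)/O(N^2) bounds above:

```python
import numpy as np

def modified_householder_qr_R(H):
    """R factor of H via the implicit-coefficient recursion (31), (34),
    (35), (39). Dense, unpivoted sketch: assumes H has full column rank."""
    H = np.asarray(H, dtype=float)
    m, n = H.shape
    R = np.zeros((n, n))
    Gamma = np.zeros((0, n))              # Gamma^(0): empty coefficient matrix
    for l in range(1, n + 1):
        gamma_l = Gamma[:, 0]             # first column gamma_l^(l-1)
        Gbar = Gamma[:, 1:]               # remaining columns
        H1 = H[l - 1:, :l - 1]            # (h_1)_{-(l-1)} ... (h_{l-1})_{-(l-1)}
        H2 = H[l - 1:, l:]                # (h_{l+1})_{-(l-1)} ... (h_n)_{-(l-1)}
        u = H[l - 1:, l - 1] + H1 @ gamma_l             # (31)
        c = u @ u
        sgn = 1.0 if u[0] >= 0 else -1.0
        v = u.copy()
        v[0] += sgn * np.sqrt(c)                        # (19)
        beta = 1.0 / (c + abs(u[0]) * np.sqrt(c))
        R[l - 1, l - 1] = -sgn * np.sqrt(c)             # (21)
        d1 = beta * (H1.T @ v)
        d2 = beta * (H2.T @ v)
        # (35): the next row of R is built from original columns of H only
        R[l - 1, l:] = H[l - 1, l:] - v[0] * d2 + Gbar.T @ (H[l - 1, :l - 1] - v[0] * d1)
        delta = d2 + Gbar.T @ d1                        # (34)
        # (39): rank-one update of Gbar, then append -delta^T as the last row
        Gamma = np.vstack([Gbar - np.outer(gamma_l, delta), -delta[None, :]])
    return R

rng = np.random.default_rng(1)
H = rng.standard_normal((12, 5))
R1 = modified_householder_qr_R(H)
R2 = np.linalg.qr(H, mode='r')
print(np.allclose(np.abs(R1), np.abs(R2)))   # rows agree up to sign
```

Because the trailing block H_{2,2} is never formed, each iteration touches only original columns of H and the small matrix Γ — precisely the structure the sparsity savings of the Modified Householder QR rely on.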
VI. SIMULATION RESULTS

In the previous section, we have shown that the worst-case time and space complexity of the Modified Householder QR algorithm, when applied to the sparse measurement Jacobian matrix H in CL [see (6)], are O(N^3) and O(N^2), respectively. In order to corroborate our theoretical analysis, we have evaluated the running time required by the Modified Householder QR algorithm for a team of N robots performing CL. Specifically, we randomly generated the robots' poses and assumed that each robot is able to detect, identify, and measure both relative distance and bearing to the remaining N−1 robots. Hence, H has dimensions 2(N−1)N × 3N.

We have examined the scalability of our algorithm by varying N from 6 to 1001, and for every value of N we have conducted 120 simulations. We count the CPU running time for a complete QR decomposition [see (11)] when employing the Modified Householder QR algorithm (MH-QR). The average running times are summarized in Table I, as well as in Figure 1. Furthermore, we compared our results with the CPU running time when employing SuiteSparseQR (SS-QR) [27], [28], the current state-of-the-art QR decomposition package for sparse matrices, which is an implementation of the multifrontal sparse QR factorization algorithm. In addition, SuiteSparseQR has been integrated into MATLAB (version 7.9 and later), and can be invoked through the MATLAB built-in function "qr" [29]. All simulations were run under MATLAB 7.12 on a Linux (kernel 2.6.32) desktop computer with a 2.66 GHz Intel Core-i5 quad-core CPU and 4 GB of RAM.

The results presented in Table I and Figure 1 illustrate that when the number of robots is small (N ≤ 26), both SS-QR and MH-QR achieve indistinguishable performance, with SS-QR slightly faster than MH-QR. However, as N increases (N ≥ 26), MH-QR significantly outperforms SS-QR. Additionally, we were unable to perform QR decomposition using SS-QR when N ≥ 501, due to memory shortage. On the other hand, MH-QR is applicable even when the number of robots increases to 1001, and it successfully performs QR factorization on H, whose dimensions are 2 million by 3 thousand, in about 375 seconds. Additionally, after performing linear regression on the logarithm of the CPU running time and the number of robots N, we have determined that the logarithm of the average time required by each of these algorithms is

  log(t_SS-QR) = 3.6067 × log(N) − 17.2798 ,
  log(t_MH-QR) = 3.1802 × log(N) − 16.1481 .     (40)

Both Figure 1 and (40) confirm that the time complexity of MH-QR is cubic in N. Finally, we should note that the main reason the linear coefficient in (40) (i.e., 3.1802) deviates from its ideal value of 3 is that, as N increases, a significant portion of CPU resources is devoted to memory accessing, reading, and writing. One of our future research directions is to optimize the implementation of MH-QR and improve its performance in terms of CPU running time.

Furthermore, we have examined the accuracy of MH-QR. In particular, we computed the Frobenius norm of H^T H − R_H^T R_H, which is 0 in the ideal case. We have compared the Frobenius norms of SS-QR and MH-QR, averaged over 120 simulations for each N. As evident from Table II and Figure 2, MH-QR attains higher numerical accuracy than SS-QR, which is attributed to the column pivoting implemented in MH-QR.

TABLE II
||H^T H − R_H^T R_H||_F

  N      SS-QR             MH-QR
  6      1.1529 × 10^-12   7.0749 × 10^-13
  11     7.1060 × 10^-12   3.1729 × 10^-12
  16     2.0568 × 10^-11   8.4163 × 10^-12
  21     1.0719 × 10^-10   4.0289 × 10^-11
  26     1.7801 × 10^-10   5.8771 × 10^-11
  51     1.3050 × 10^-9    2.9588 × 10^-10
  76     4.9221 × 10^-9    7.9515 × 10^-10
  101    1.9087 × 10^-8    2.6235 × 10^-9
  151    1.1631 × 10^-7    1.2418 × 10^-8
  201    6.1207 × 10^-7    4.7511 × 10^-8
  251    8.0832 × 10^-7    6.0513 × 10^-8
  301    1.0625 × 10^-6    4.2499 × 10^-8
  351    3.7326 × 10^-6    9.3947 × 10^-8
  401    9.3274 × 10^-6    3.0049 × 10^-7
  451    2.6734 × 10^-5    6.4118 × 10^-7
  501    1.7036 × 10^-5    3.7327 × 10^-7
  551    N/A               5.2642 × 10^-7
  601    N/A               1.4607 × 10^-6
  651    N/A               9.4386 × 10^-7
  701    N/A               2.4464 × 10^-6
  751    N/A               4.3013 × 10^-6
  801    N/A               1.0321 × 10^-6
  851    N/A               2.9335 × 10^-6
  901    N/A               4.2254 × 10^-6
  951    N/A               4.8307 × 10^-6
  1001   N/A               5.1291 × 10^-6

Fig. 1. [Monte Carlo simulations] Average CPU runtime of QR decomposition of H in 120 trials. Comparison between SuiteSparseQR (SS-QR) and Modified Householder QR (MH-QR).

Fig. 2. [Monte Carlo simulations] Average Frobenius norm of H^T H − R^T R in 120 trials. Comparison between SuiteSparseQR (SS-QR) and Modified Householder QR (MH-QR).

VII. EXPERIMENTAL RESULTS

We further validate the performance of our proposed Modified Householder QR algorithm through experiments. Our experimental setup is shown in Figure 3(a), where a team of four Pioneer II robots is deployed in a rectangular region of size approximately 4.5 m × 2.5 m. A calibrated overhead camera is employed to provide ground truth for evaluating the estimator's performance. For this purpose, rectangular boards with specific patterns (see Figure 3(a)) are mounted on top of the Pioneers, and the pose (position and orientation) of each robot, with respect to a global frame of reference, is computed from the captured images. These measurements are corrupted by zero-mean noise with standard deviation of 0.0087 rad for orientation and 0.01 m along each axis for position. The real trajectories of the four robots are shown in Figure 3(b). For the sake of clarity, only partial trajectories, corresponding to the first 200 time steps, are plotted in Figure 3(b).

Fig. 3. [Experimental setup] (a) Four Pioneer robots perform CL in a rectangular region of size approximately 4.5 m × 2.5 m, each with a pattern board attached on its top. (b) Real trajectories of the four robots. For clarity, only partial trajectories, corresponding to the first 200 time steps, are plotted. The robots' starting points are marked by ◦.

In the experiment, each robot moves at an approximately constant speed of 0.1 m/sec, while avoiding collisions with the other robots and with the boundary of the rectangular arena. The robots record linear and rotational velocities at a frequency of 1 Hz. These odometry measurements are corrupted by zero-mean process noise. Due to manufacturing imperfections, the accuracy of the odometry measurements is not identical for all four robots. In particular, the noise in the rotational velocity measurements follows a Gaussian distribution with standard deviation of 0.0239 rad/sec, 0.095 rad/sec, 0.0121 rad/sec, and 0.094 rad/sec for robots 1, 2, 3, and 4, respectively. Similarly, the linear velocity measurement noise is well modeled as Gaussian with standard deviation of 0.0037 m/sec, 0.0063 m/sec, 0.0027 m/sec, and 0.0035 m/sec for robots 1, 2, 3, and 4, respectively. The experimental data from the robots' sensors is collected and processed off-line.

TABLE III
RMS ERROR IN THE ROBOTS' POSE ESTIMATES

       Position Err. RMS (m)   Orient. Err. RMS (rad)
  R1   0.2085                  0.0801
  R2   0.2250                  0.0775
  R3   0.2255                  0.0811
  R4   0.2224                  0.0795

We consider the scenario where each robot can measure both relative distance and bearing to the remaining three robots, resulting in a measurement Jacobian matrix H of dimensions 24 × 12 at every time step. These relative observations are generated synthetically by adding noise to the relative distance and bearing calculated from the robots' pose estimates obtained using the overhead camera. In our experiment, the standard deviations of the relative distance and bearing measurement noise are set to σ_d = 0.05 m and σ_θ = 0.035 rad, respectively.
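The dimensions quoted above (2(N−1)N × 3N in general, hence 24 × 12 for N = 4) and the 4(N−1) per-column fill of Remark 1 follow directly from the block structure of H. A small sketch (ours; the 2 × 3 Jacobian blocks of (4)-(5) are represented only by their sparsity pattern):

```python
import numpy as np

def cl_jacobian_pattern(N):
    """0/1 sparsity pattern of the CL measurement Jacobian H: each ordered
    robot pair (i, j), i != j, contributes a 2-row block (distance +
    bearing) that involves only the poses of robots i and j."""
    Hpat = np.zeros((2 * N * (N - 1), 3 * N), dtype=int)
    r = 0
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            Hpat[r:r + 2, 3 * i:3 * i + 3] = 1   # observer pose block
            Hpat[r:r + 2, 3 * j:3 * j + 3] = 1   # observed pose block
            r += 2
    return Hpat

N = 4
P = cl_jacobian_pattern(N)
col_nnz = P.sum(axis=0)
print(P.shape, col_nnz.max(), 4 * (N - 1))   # (24, 12) 12 12
```

Each pose column is touched twice per pairing (as observer and as observed), with 2 rows per measurement, giving exactly 4(N−1) nonzeros per column.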
update using QR factorization, when implemented using the The duration of the experiment is 1000 sec (i.e., 1000 time Modified Householder QR algorithm, is of O(N 3) in time steps). At every time step, the state estimate and its associated complexity and O(N 2) in space complexity, i.e., at least one covariance are updated based upon (12) and (13), with QR order of magnitude reduction as compared to the standard decomposition of H [see (11)] implemented by using the EKF update process. Simulation results demonstrate that for Modified Householder QR algorithm. large number of robots, the Modified Householder QR algo- Figures 4(a)–4(d) depict the time evolutions of the robots’ rithm attains higher accuracy and significantly outperforms pose estimation errors and corresponding 3σ-bounds along x, SuiteSparseQR, the current state-of-the-art QR decomposition y, and φ-axis, respectively. As evident from Figures 4(a)–4(d), algorithm of sparse matrices, in terms of CPU runtime. Fur- the EKF based on the Modified Householder QR algorithm thermore, we performed experiments using a team of four performs well in a real-world CL experiment. Furthermore, mobile robots that demonstrate the applicability of the EKF the EKF produces consistent pose estimates for all four robots, update using the Modified Householder QR to real systems. i.e., the real robots’ poses is within the 3σ bound centered at In our future work, we plan to extend our current approach the robots’ estimated pose. and apply it to CL in 3-D. 
Finally, we intend to investigate Finally, we plot the real and estimated trajectories of the distributed and decentralized implementations of the Modified robots in Figures 5(a)–5(d), while Table III shows the averaged Householder QR algorithm to ensure that the overall com- root-mean-square (RMS) errors of the robots’ position and putational load is evenly shared among every robot in the Combined Error in x (m) Combined Error in x (m) 5 5 State Error State Error 3σ − bound 3σ − bound 0 0 −x (m) −x (m) hat hat x x −5 −5 0 200 400 600 800 1000 0 200 400 600 800 1000 time (s) time (s) Combined Error in y (m) Combined Error in y (m) 5 5 State Error State Error 0 3σ − bound 0 3σ − bound −y (m) −y (m) hat hat y −5 y −5 0 200 400 600 800 1000 0 200 400 600 800 1000 time (s) time (s) Combined Error in φ (rad) Combined Error in φ (rad) 5 5 State Error State Error σ σ (rad) 3 − bound (rad) 3 − bound

Fig. 4. [Experimental results] Pose estimation errors and corresponding 3σ-bounds for (a) Robot 1, (b) Robot 2, (c) Robot 3, and (d) Robot 4.

APPENDIX A
COMPUTATIONAL COMPLEXITY OF INFORMATION FILTER

We show that, due to the sparse structure of H_k [see (6)], computing H_k^T H_k and P_{k|k} H_k^T \tilde{z}_{k|k-1} requires O(N^2) basic steps. To proceed, let us first partition H_k = [H_{k_1} ... H_{k_N}], where H_{k_i} is the 2(N-1)N × 3 sub-matrix consisting of the (3i-2)th, (3i-1)th, and 3ith columns of H_k, for i = 1, ..., N. Following this partition, H_k^T H_k = [H_{k_i}^T H_{k_j}]_{i,j}, where the H_{k_i}^T H_{k_j} (i, j = 1, ..., N) are 3 × 3 block matrices. Next, we observe that the sparse structure of H_k implies that (for 1 ≤ i ≠ j ≤ N)

H_{k_i}^T H_{k_i} = \sum_{j=1, j \neq i}^{N} \left[ (\Psi_k^{i,j})^T \Psi_k^{i,j} + (\Upsilon_k^{j,i})^T \Upsilon_k^{j,i} \right],  (41)

H_{k_i}^T H_{k_j} = (\Psi_k^{i,j})^T \Upsilon_k^{i,j} + (\Upsilon_k^{j,i})^T \Psi_k^{j,i}.  (42)

Since the right-hand side (RHS) of (41)-(42) involves multiplications and additions of matrices (\Psi_k^{i,j}, \Upsilon_k^{i,j}, \Psi_k^{j,i}, \Upsilon_k^{j,i}) of constant dimensions 2 × 3 [see (4)-(5)], the time complexities for calculating each term of (41) and (42) are O(N) and O(1), respectively. Therefore, the overall time complexity for evaluating H_k^T H_k is quadratic in N.

Similarly, it can be shown that calculating P_{k|k} H_k^T \tilde{z}_{k|k-1} has time complexity O(N^2). To proceed, we first compute the 3N-dimensional vector \tilde{y} = H_k^T \tilde{z}_{k|k-1}. Notice that the jth element of \tilde{y}, \tilde{y}_j, is the inner product of the vector \tilde{z}_{k|k-1} with the jth column of H_k. Recall that each column of H_k has at most 4(N-1) nonzero elements (see Remark 1); therefore, computing \tilde{y}_j has time complexity O(N), even though \tilde{z}_{k|k-1} is a 2(N-1)N-dimensional vector. Thus, the time complexity of calculating \tilde{y} is O(N^2). Next, we compute P_{k|k} \tilde{y}, the product of a 3N × 3N matrix with a 3N-dimensional vector, which takes O(N^2) basic operations. Hence, the overall time complexity of computing P_{k|k} H_k^T \tilde{z}_{k|k-1} is again quadratic in N.

APPENDIX B
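The blockwise structure in (41)-(42) is easy to verify numerically. In the following Python sketch, the 2 × 3 blocks Ψ^{i,j} and Υ^{i,j} are random stand-ins for the actual measurement-Jacobian blocks of (4)-(5); we assemble a stacked Jacobian H with one 2-row block per ordered pair of robots, accumulate H^T H blockwise as in (41)-(42), and compare against the dense product:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5  # number of robots (kept small for the check)

# One relative measurement per ordered pair (i, j), i != j: its 2-row
# block of H holds Psi[i,j] at robot i's columns and Ups[i,j] at robot j's.
# (Random 2x3 blocks stand in for the Jacobian blocks of (4)-(5).)
Psi = {(i, j): rng.standard_normal((2, 3)) for i in range(N) for j in range(N) if i != j}
Ups = {(i, j): rng.standard_normal((2, 3)) for i in range(N) for j in range(N) if i != j}

H = np.zeros((2 * N * (N - 1), 3 * N))
for r, (i, j) in enumerate(sorted(Psi)):
    H[2 * r:2 * r + 2, 3 * i:3 * i + 3] = Psi[(i, j)]
    H[2 * r:2 * r + 2, 3 * j:3 * j + 3] = Ups[(i, j)]

# Blockwise accumulation of H^T H following (41)-(42)
G = np.zeros((3 * N, 3 * N))
for i in range(N):
    # diagonal block (41): every measurement involving robot i contributes
    G[3 * i:3 * i + 3, 3 * i:3 * i + 3] = sum(
        Psi[(i, j)].T @ Psi[(i, j)] + Ups[(j, i)].T @ Ups[(j, i)]
        for j in range(N) if j != i)
    for j in range(N):
        if j != i:
            # off-diagonal block (42): only measurements (i,j) and (j,i) contribute
            G[3 * i:3 * i + 3, 3 * j:3 * j + 3] = (
                Psi[(i, j)].T @ Ups[(i, j)] + Ups[(j, i)].T @ Psi[(j, i)])

assert np.allclose(G, H.T @ H)
```

Each diagonal block sums N − 1 constant-size products and each of the N² − N off-diagonal blocks needs O(1) work, which is exactly the O(N²) count claimed above.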
COMPUTATIONAL COMPLEXITY OF ALTERNATIVE QR DECOMPOSITION ALGORITHMS

In this appendix, we analyze the computational complexity of the QR decomposition of Λ [see (24)] using (1) the Cholesky decomposition, (2) the modified Gram-Schmidt process, and (3) the Givens rotations [24, Section 5.2].

Fig. 5. [Experimental results] Real and estimated trajectories of (a) Robot 1, (b) Robot 2, (c) Robot 3, and (d) Robot 4, when employing the EKF based on the Modified Householder QR algorithm.

Similar to the discussion in Section IV-C, we assume \Pi_\ell = I_{n_\Lambda}, \ell = 1, ..., n_\Lambda, for simplicity. Furthermore, we suppose the QR decomposition of Λ takes the form Λ = Q_Λ R_Λ, where Q_Λ is orthonormal and R_Λ is upper triangular.

A. Cholesky Decomposition [24, Section 4.2.3]

Since the columns of Q_Λ are orthonormal, we have Λ^T Λ = R_Λ^T R_Λ. Therefore, we can apply a Cholesky decomposition of Λ^T Λ to compute the upper triangular matrix R_Λ. Analogous to the analysis shown in Section III-C, the process of evaluating Λ^T Λ is only quadratic in N due to the sparse structure of Λ. In addition, since the Cholesky decomposition of a square matrix of dimension n requires O(n^3) operations [24, Section 4.2.3], we conclude that the time complexity of the QR factorization of Λ using the Cholesky decomposition is O(N^3) (note that Q_Λ is not computed when using the Cholesky decomposition). On the other hand, since nnz(Λ) ∼ O(N^2) and both matrices Λ^T Λ and R_Λ are of dimensions N × N, the space complexity is O(N^2).

Although the QR factorization of Λ using the Cholesky decomposition achieves the desired computational complexity, it has two drawbacks. Firstly, the matrix product Λ^T Λ squares the condition number of Λ, i.e., κ(Λ^T Λ) = κ^2(Λ), which can introduce numerical instability into the Cholesky decomposition algorithm. Secondly, to use the Cholesky decomposition, the matrix product Λ^T Λ is required to be strictly positive definite. Unfortunately, in our case H is rank-deficient, rendering H^T H positive semi-definite. Hence, the Cholesky decomposition is not applicable for the QR factorization of H addressed in this paper.

B. Modified Gram-Schmidt [24, Section 5.2.8]

For clarity, we use λ_i and q_i to denote the ith columns of Λ and Q_Λ, respectively, and r_{i,j} to denote the (i,j)th element of R_Λ.
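As a numerical aside on the Cholesky route of Section A above, both the identity Λ^T Λ = R_Λ^T R_Λ and the condition-number caveat κ(Λ^T Λ) = κ^2(Λ) can be checked in a few lines of NumPy; the sketch below uses a random full-column-rank matrix A as a stand-in for Λ:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 4))  # full-column-rank stand-in for Lambda

# Upper-triangular R from the Cholesky factor of A^T A  (A^T A = R^T R)
R_chol = np.linalg.cholesky(A.T @ A).T

# R from a standard QR factorization; normalize signs so diag(R) > 0,
# which makes the triangular factor unique and comparable
_, R_qr = np.linalg.qr(A)
R_qr = np.diag(np.sign(np.diag(R_qr))) @ R_qr

assert np.allclose(R_chol, R_qr)

# The caveat: forming A^T A squares the condition number
assert np.isclose(np.linalg.cond(A.T @ A), np.linalg.cond(A) ** 2)
```

This is why the normal-equations route, while cheap, breaks down exactly in the rank-deficient case that arises for H here: the Cholesky factorization of a semi-definite product fails.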
For the purpose of complexity analysis, we outline the algorithmic flow chart of the Modified Gram-Schmidt process in Algorithm 3.

Algorithm 3 Modified Gram-Schmidt [24, Algorithm 5.2.5]
Require: Λ of dimensions m_Λ × n_Λ (m_Λ ≥ n_Λ).
Ensure: R_Λ of dimensions n_Λ × n_Λ.
1: for ℓ = 1 to n_Λ do
2:   Set q̄ = λ_ℓ.
3:   for j = 1 to ℓ − 1 do
4:     Compute r_{j,ℓ} = q_j^T q̄.
5:     Update q̄ ⇐ q̄ − r_{j,ℓ} q_j.
6:   end for
7:   Calculate r_{ℓ,ℓ} = ‖q̄‖_2.
8:   if r_{ℓ,ℓ} = 0 then
9:     STOP.
10:  else
11:    q_ℓ = q̄ / r_{ℓ,ℓ}.
12:  end if
13: end for
14: return R_Λ.

From Algorithm 3, it is clear that the vector q_ℓ is a linear combination of λ_ℓ and {q_j, j = 1, ..., ℓ − 1}. Since every q_j, j = 2, ..., ℓ − 1, can in turn be expressed as a linear combination of λ_j and {q_i, i = 1, ..., j − 1}, and q_1 is a scalar multiple of λ_1, we conclude by recursion that q_ℓ is a linear combination of {λ_j, j = 1, ..., ℓ}. Exploiting the structure of λ_j (j = 1, ..., ℓ), we have nnz(q_ℓ) = \sum_{j=1}^{ℓ} (N − j) for ℓ = 1, ..., N. Hence, updating q̄ at the jth iteration of the inner for-loop (see Algorithm 3, Line 5), which is the dominant computational step, requires approximately \sum_{i=1}^{j} (N − i) basic steps. Therefore, the overall time complexity of the QR decomposition of Λ using the Modified Gram-Schmidt algorithm is in the order of

\sum_{\ell=1}^{N} \sum_{j=1}^{\ell-1} \sum_{i=1}^{j} (N − i) ∼ O(N^4).

Furthermore, for the space complexity, we focus on the storage of {q_ℓ, ℓ = 1, ..., n_Λ}, which is the most demanding source …

C. Givens QR

For a 2-dimensional vector x = [x_1 x_2]^T with x_2 ≠ 0, by selecting φ = arctan(−x_2/x_1), it is straightforward to verify that G_φ^T x = [x̄ 0]^T, where G_φ is the corresponding 2 × 2 Givens rotation and x̄ = ±(x_1^2 + x_2^2)^{1/2} [24, Section 5.1.8]. In other words, left-multiplying x by G_φ^T eliminates x_2. The essence of the Givens QR is to successively invoke Givens rotations to transform a matrix into upper triangular form by eliminating the non-zeros below its diagonal [24, Algorithm 5.2.2].

The time complexity of the Givens QR process for decomposing an arbitrary matrix Λ is in the order of m_Λ n_Λ^2, where m_Λ and n_Λ are the numbers of rows and columns of Λ, respectively [24, Section 5.2.3]. For the Λ addressed here [see (24)], substituting m_Λ = (N − 1)N/2 and n_Λ = N results in a time complexity of O(N^4). However, this does not take into account the structure of Λ, and it is natural to conjecture that the overall time complexity of the Givens QR can be reduced by exploiting the sparsity of Λ. Unfortunately, this conjecture is false.
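Algorithm 3 translates almost line for line into NumPy. The sketch below is our illustration (the function name is ours, and it also returns Q_Λ, which Algorithm 3 itself does not output):

```python
import numpy as np

def modified_gram_schmidt(A):
    """Modified Gram-Schmidt QR (cf. Algorithm 3): A = Q @ R, with Q having
    orthonormal columns and R upper triangular; stops on rank deficiency."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for l in range(n):
        q = A[:, l].copy()
        for j in range(l):           # project out previously computed columns
            R[j, l] = Q[:, j] @ q
            q -= R[j, l] * Q[:, j]
        R[l, l] = np.linalg.norm(q)
        if R[l, l] == 0:             # rank-deficient input: Algorithm 3 STOPs here
            raise ValueError("rank-deficient input")
        Q[:, l] = q / R[l, l]
    return Q, R

# Sanity check on a small random matrix
A = np.random.default_rng(2).standard_normal((6, 3))
Q, R = modified_gram_schmidt(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(3))
assert np.allclose(R, np.triu(R))
```

Note that the inner loop updates q̄ in place against each earlier q_j (Line 5), which is exactly the step whose cost drives the O(N^4) bound derived above.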
In what follows, we examine both the time complexity and the space complexity of applying the Givens QR algorithm to decompose Λ in (24). For clarity, we use λ_{i,j} to denote the (i,j)th element of Λ. Furthermore, we partition Λ by rows into Λ = [Λ_1^T ... Λ_{N−1}^T]^T, where the rows of the sub-matrix Λ_i correspond to the (J_{i−1} + 1)th through J_i-th rows of Λ, with J_0 = 0 and J_i = J_{i−1} + (N − i) = \sum_{j=1}^{i} (N − j), for i = 1, ..., N − 1. To facilitate the discussion, we provide the pseudo-code of the Givens QR in Algorithm 4. It is worth noting that each Givens rotation affects only two rows of Λ (see Algorithm 4, Lines 5-9), whereas the processing cost of Line 7 is constant (4 multiplications and 2 additions).

Algorithm 4 Givens QR [24, Algorithm 5.2.2]
Require: Λ of dimensions m_Λ × n_Λ (m_Λ ≥ n_Λ).
Ensure: R_Λ of dimensions n_Λ × n_Λ.
1: for ℓ = 1 to n_Λ do
2:   Set i_0 = ℓ, and seek all indices {i_1, ..., i_{κ_ℓ}} such that …

… (Line 1 in Algorithm 1), and need not be recomputed. From (19), (21), and Remark 2, it is clear that the calculation of v, β_ℓ, and r_ℓ in Line 7 requires a constant number of basic steps. Furthermore, u and v share exactly the same sparsity pattern; thus nnz(v) = τ_{ℓ−1}.

Secondly, we examine the number of basic steps for computing the matrix-vector product δ = β_ℓ (Λ_{2,2}^{(ℓ−1)})^T v, whose jth element δ_j is the inner product of β_ℓ v with the jth column of Λ_{2,2}^{(ℓ−1)}, j = 1, ..., n_Λ − ℓ. Since nnz(v) = τ_{ℓ−1}, computing δ_j only requires τ_{ℓ−1} scalar multiplications and τ_{ℓ−1} − 1 scalar additions. Therefore, computing δ involves O((n_Λ − ℓ) τ_{ℓ−1}) operations. Once δ is computed, updating s = [s_1 ... s_{n_Λ−ℓ}]^T is trivial, with n_Λ − ℓ basic steps [see (22)]. Thus, the overall time complexity of Line 8 in Algorithm 1 is O((n_Λ − ℓ) τ_{ℓ−1}).

Thirdly, we note that (23) is a rank-one modification, where Λ_{2,2}^{(ℓ)} is obtained by subtracting the outer product u_{−1} δ^T from Λ_{2,2}^{(ℓ−1)}. This rank-one update has a cost of (n_Λ − ℓ)(τ_{ℓ−1} − 1). In particular, we only need to modify those rows of Λ_{2,2}^{(ℓ−1)} whose indices correspond to the linear indices of the nonzero elements of u_{−1}.

In summary, the time complexity of the Householder update at the ℓth iteration, dominated by the computation of δ and the update of Λ_{2,2}^{(ℓ)}, is O((n_Λ − ℓ) τ_{ℓ−1}).

[13] …, "… robot teams using maximum likelihood estimation," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Lausanne, Switzerland, Sep. 30 – Oct. 5, 2002, pp. 434–459.
[14] K. Y. K. Leung, T. D. Barfoot, and H. H. T. Liu, "Decentralized localization of sparsely-communicating robot networks: A centralized-equivalent approach," IEEE Transactions on Robotics, vol. 26, no. 1, pp. 62–77, Feb. 2010.
[15] A. Howard, M. J. Matarić, and G. S. Sukhatme, "Localization for mobile robot teams: A distributed MLE approach," in Experimental Robotics VIII, ser. Springer Tracts in Advanced Robotics, B. Siciliano and P. Dario, Eds. Berlin, Germany: Springer-Verlag, 2003, vol. 5, pp. 146–155.
[16] E. D. Nerurkar, S. I. Roumeliotis, and A. Martinelli, "Distributed maximum a posteriori estimation for multi-robot cooperative localization," in Proceedings of the IEEE International Conference on Robotics and Automation, Kobe, Japan, May 12–17, 2009, pp. 1402–1409.
[17] F. Dellaert, F. Alegre, and E. B. Martinson, "Intrinsic localization and mapping with 2 applications: Diffusion mapping and Marco Polo localization," in Proceedings of the IEEE International Conference on Robotics and Automation, Taipei, Taiwan, Sep. 14–19, 2003, pp. 2344–2349.
[18] S. Panzieri, F. Pascucci, and R. Setola, "Multirobot localisation using Interlaced Extended Kalman Filter," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, Oct. 9–15, 2006, pp. 2816–2821.
[19] L. Glielmo, R. Setola, and F. Vasca, "An interlaced extended Kalman filter," IEEE Transactions on Automatic Control, vol. 44, no. 8, pp. 1546–1549, Aug. 1999.
[20] N. Karam, F. Chausse, R. Aufrere, and R. Chapuis, "Localization of a group of communicating vehicles by state exchange," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, Oct. 9–15, 2006, pp. 519–524.
[21] A. Martinelli, "Improving the precision on multi robot localization by using a series of filters hierarchically distributed," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, CA, Oct. 29 – Nov. 2, 2007, pp. 1053–1058.
[22] P. S. Maybeck, Stochastic Models, Estimation and Control, Volume 1. New York, NY: Academic Press, 1979.
[23] D. S. Bayard and P. B. Brugarolas, "On-board vision-based spacecraft estimation algorithm for small body exploration," IEEE Transactions on Aerospace and Electronic Systems, vol. 44, no. 1, pp. 243–260, Jan. 2008.
[24] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD: The Johns Hopkins University Press, 1996.
[25] P. A. Businger and G. H. Golub, "Linear least squares solutions by Householder transformations," Numerische Mathematik, vol. 7, pp. 269–276, 1965.
[26] L. Kaufman, "Application of dense Householder transformations to a sparse matrix algorithm," ACM Transactions on Mathematical Software, vol. 5, no. 4, pp. 442–450, Dec. 1979.
[27] T. A. Davis, "Algorithm 915: Multifrontal multithreaded rank-revealing sparse QR factorization," ACM Transactions on Mathematical Software, vol. 38, no. 1, Nov. 2011.
[28] T. A. Davis, Direct Methods for Sparse Linear Systems. Philadelphia, PA: Society for Industrial and Applied Mathematics, 2006.
[29] MATLAB version 7.12, The MathWorks, Inc., Natick, MA, 2011, http://www.mathworks.com/.
[30] E. D. Nerurkar and S. I. Roumeliotis, "Asynchronous multi-centralized cooperative localization," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, Oct. 18–22, 2010, pp. 4352–4359.