arXiv:1912.06278v1 [cs.IT] 13 Dec 2019

On Low-complexity Lattice Reduction Algorithms for Large-scale MIMO Detection: the Blessing of Sequential Reduction

Shanxiang Lyu, Jinming Wen, Jian Weng and Cong Ling

This work was supported in part by the National Natural Science Foundation of China under Grants 61902149 and 61932010, in part by the Major Program of Guangdong Basic and Applied Research under Grant 2019B030302008, in part by the Natural Science Foundation of Guangdong Province under Grants 2019B010136003 and 2019B010137005, in part by the Fundamental Research Funds for the Central Universities under Grant 21618329, and in part by the National Key R&D Plan of China under Grants 2017YFB0802203 and 2018YFB1003701. This paper was presented in part at the 8th International Conference on Wireless Communications and Signal Processing (WCSP), Yangzhou, China, Oct. 2016. S. Lyu, J. Wen and J. Weng are with the College of Cyber Security, Jinan University, Guangzhou 510632, China. S. Lyu is also with the State Key Laboratory of Cryptology, P.O. Box 5159, Beijing 100878, China. C. Ling is with the Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, United Kingdom.

Abstract—Lattice reduction is a popular preprocessing strategy in multiple-input multiple-output (MIMO) detection. In a quest for developing a low-complexity reduction algorithm for large-scale problems, this paper investigates a new framework called sequential reduction (SR), which aims to reduce the lengths of all basis vectors. The theoretical performance upper bounds of the strongest reduction in SR are given when the lattice dimension is no larger than 4. The proposed new framework enables the implementation of a hash-based low-complexity lattice reduction algorithm, which becomes especially tempting when applied to large-scale MIMO detection. Simulation results show that, compared to other reduction algorithms, the hash-based SR algorithm exhibits the lowest complexity while maintaining comparable error performance.

Index Terms—lattice reduction, MIMO, large-scale, hash-based.

I. INTRODUCTION

The number of antennas has been scaled up to tens or hundreds in multiple-input multiple-output (MIMO) communication systems to fulfill the system performance requirements needed by next generation communication systems [1]. A critical challenge that comes with very large arrays is to design computationally efficient detectors. Though the well-known maximum likelihood detector (MLD) provides reliable and optimal error performance, it suffers from exponential complexity that grows with the number of transmit antennas [2]. In the past two decades, lattice-reduction-aided suboptimal detection techniques have been well investigated [3]–[5]; their complexity does not depend on the constellation size or the instantaneous noise realizations, yet they collect the same diversity as the MLD [6]–[8]. Although conventional lattice reduction algorithms suffice for small-scale MIMO systems, there is still an avenue to pursue a more practical low-complexity reduction algorithm for large-scale MIMO systems. Moreover, an efficient reduction algorithm may also find its applications to cryptanalysis [9] and large-scale image processing problems [10].

The principle of designing a reduction algorithm varies depending on the desired properties of the reduced basis: to make all the basis vectors short, or to make the condition number of the basis small. There are several popular types of lattice reduction strategies, such as the Minkowski reduction [11], the Korkin–Zolotareff (KZ) reduction [12], the Gauss reduction, the Lenstra–Lenstra–Lovász (LLL) reduction [13], Seysen reduction [14], etc. They yield reduced bases with shorter or more orthogonal vectors, and provide a trade-off between the quality of the reduced basis and the computational effort required for finding it. In essence, the reduction process aims to transform an input basis into another one with a better property, and the algorithm involves finding a unimodular matrix through elementary operations noted as swapping, reflection, and translation. These operations vary for distinct algorithms.

Much work has been done to advance conventional reduction algorithms. Regarding KZ, refs. [15]–[17] give some practical implementations and improve the performance bounds. As for blockwise KZ, its expected basis properties are also given in [18]. Researchers have also been constructing faster implementations and analyzing the variants of LLL with great effort. For instance, the size reduction step in LLL is optimized in [16], [19], [20], the implementation of LLL is simplified in [19], [21]–[23], and versions with a fixed complexity or a given order of swaps are given in [22], [24]. In contrast, Seysen reduction [25], [26] has few follow-up studies, in part because of the fact that Seysen reduction has unsatisfactory performance in high dimensions.

While LLL and blockwise KZ are still the default choices in cryptography to reduce a basis of hundreds of dimensions, the element-based lattice reduction (ELR) proposed in [27], which preprocesses a large basis in large MIMO with even lower complexity than LLL, has become more attractive. Later, ELR has been generalized to ELR+ [28] for small-scale problems, but the theoretical characterization of ELR and ELR+ has not been given a rigorous treatment, even for small dimensions. It is noteworthy that ELR and ELR+ have totally different structures with LLL variants, so few analytical skills can be inherited from the LLL/KZ literature [12], [13], and whether ELR and ELR+ can be tuned to arrive at stronger reduction is unclear, which makes the performance analysis of the new algorithms more

complicated. vector space spanned by vectors in B . B x and x Γ π Γ ( ) πB⊥Γ ( ) In this work, we investigate a general form of ELR and denote the projection of x onto span(BΓ) and the orthogonal + ELR which we refer to as sequential reduction (SR). We complement of span(BΓ), respectively. x denotes rounding derive the objective function from a MIMO detection task, x to the nearest integer, x denotes getting⌊ ⌉ the absolute value and present the general form of an SR algorithm which can of x, and x denote the| Euclidean| norm of vector x. N and Z solve/approximate the smallest basis problem. Unlike KZ or respectivelyk k denotes the set of natural numbers and integers. Minkowski reduction, SR reduces the basis vectors by using The set of n n integer matrices with 1 is × ± sub-lattices so as to avoid a basis expansion process. The denoted by GLn(Z). strongest algorithm in SR tries to minimize the length of basis vectors with the aid of a closest vector problem (CVP) oracle. II. PRELIMINARIES We show bounds on the basis lengths and orthogonal defects for small dimensions. After that, the feasibility of applying A. Lattices SR to reduce a large dimensional basis is analyzed, and we An n-dimensional lattice Λ is a discrete additive subgroup actually construct a hash-based algorithm for this task. Our in the real field Rn. Similarly to the fact that any finite- simulation results then show the plausibility of using SR in dimensional vector space has a basis, a lattice has a basis. large-scale MIMO systems. To consider a square matrix for simplicity, a lattice generated n n Preliminary results of this work have been partly presented by basis B = [b1, ..., bn] R × is defined as in a conference paper [29]. Compared with [29], this work ∈ contains the following new contributions: Λ(B)= v v = c b ; c Z . The performance bounds on small dimensional bases  | i i i ∈  • i [n] are rigorously analyzed (Theorems 2 and 3). Unlike the  X∈  results in [29] that rely on an assumption about covering The dual lattice of Λ is defined as Λ† = n radius, these bounds hold for all input bases. u R u, v Z, v Λ . One basis of Λ† is given by Comparisons with other types of strong&weak reduction { ∈ | h i ∈ ∀ ∈ } • B−⊤. are made (Section III-B, Section IV-D), including η- The Gram-Schmidt (GS) basis of B, referred to as B∗, is Greedy reduction, KZ and its variants, Minkowski reduc- i 1 found by using bi∗ = π b∗,...,b∗ (bi)= bi j−=1 µi,j bj∗, 1 i−1 − tion, LLL and its variants, and Seysen reduction. { 2 } where µi,j = bi, b∗ / b∗ . A Hash-based SR algorithm is constructed (Section IV). h j i || j || P • The ith successive minimum of an n dimensional lattice More specifically, the nearest neighbor search problem is Λ(B) is the smallest real positive number r such that Λ approximately solved with the aid of hashing, and not contains i linearly independent vectors of length at most r: through a brute-force search. The theoretical studies are supported with more simula- λ (B) = inf r dim(span((Λ (0, r))) i , • i tion results (Section V), these include the comparisons { | ∩B ≥ } in which (t, r) denotes a ball centered at t with radius r. with major lattice-reduction-aided MIMO detection al- B gorithms, and the BER performance tested for various The orthogonality defect (OD) can alternatively quantify the channels . goodness of a basis The types of bases feasible for using SR-Hash is dis- n • i=1 bi cussed (Appendix A). We numerically show that the dual η(B)= || || . 
(1) det(BT B) of large-scale Gaussian random bases have dense pairwise Q| | angles. From Hadamard’s inequality,p we know that η(B) 1. As ≥ It is worth mentioning that SR is emerging as a new building the of a given basis is fixed, the parameter is block in lattice-reduction-aided MIMO detection. Thus, the proportional to the product of the lengths of the basis vectors. proposed SR variants may also benefit list sphere decoding A necessary condition for reaching the smallest orthogonality [30] and Klein’s sampling algorithm [31]. defect is to have a short basis length defined as l(B) = The rest of this paper is organized as follows. Backgrounds maxi bi . k k about lattices and lattice reduction in MIMO are reviewed in The εCVP problem is, given a vector y Rn and a lattice ∈ Section II. The SR framework is subsequently introduced in Λ(B), find a vector v Λ(B) such that: ∈ Section III. The low-complexity version of SR based on hash- y v 2 ε y w 2 , w Λ(B). ing is given in Section IV. After that, Section V presents the k − k ≤ k − k ∀ ∈ simulation results. Conclusions and possible future research An algorithm that solves an εCVP problem is referred to as an are presented in the last section. εCVP oracle. We write v = εCVP(y, B) or v = CVP(y, B) Notation: Matrices and column vectors are denoted by if ε =1. uppercase and lowercase boldface letters. The ith column and (j, i)th entry of B are respectively denoted as bi and b . I and 0 respectively denote the n n identity matrix B. Lattice-reduction-aided MIMO detection i,j n n × and n 1 zero vector, and the operation ( )⊤denotes matrix We considered an uplink multiuser large MIMO system, transposition.× [n] denotes the set 1,... ,n ·. For a set Γ, B in which n single-antenna users send data to a base station { } Γ T denotes the columns of B indexed by Γ. span(BΓ) denotes the with nR antennas, and both nT ,nR are in the order of tens 3 or hundreds. A received complex-valued signal vector at the Choosing (y, B˜ ) as the SIC [32] detector, then the pairwise E base station is written as: error probability Pe based on (5) becomes

P = 1 Pr(w (B¯ ∗)) e − ∈P yc = Bcxc + wc, (2) n ¯ ¯ 2 = 1 Pr( w⊤bi∗ < bi∗ /2) nR nT − | | where B C × denotes a channel matrix perfectly i=1 c ∈ Y known at the base station, x CnT refers to a signal n ¯ c bi∗ vector with entries drawn from∈ a QAM constellation, and = 1 erf − 2√2σ w CnR denotes a zero-mean additive noise vector with i=1 c Yn   entries∈ independently and identically following the complex 1 1 erf (6) normal distribution 0, σ2 . ≤ − 2√2σ d CN w i=1 i To simplify the analysis we will focus on representations in Y  k k  ¯ the real field, so (2) is transformed to an equivalent real value where the last inequality comes from bi∗ = πb¯ b¯ (b¯ ) πb¯ b¯ b¯ b¯ (b¯ ) = 1/ d , system with 1,..., i−1 i 1,..., i−1, i,..., n i i ≥ k k y Bx¯ w and di is the ith vector in the dual basis of Λ† . = + , (3) From (6), it becomes clear that the upper bound on Pe where is mainly controlled by the lengths of vectors in the dual (B ) (B ) basis, i.e., d1 ,..., dn . Based on this observation, we can B¯ = ℜ c −ℑ c , (4) k k k k (Bc) (Bc) solve/approximate the following problem in the dual lattice  ℑ ℜ  to attain better error rate performance for the above lattice- and y = (y)⊤, (y)⊤ ⊤, x = (x)⊤, (x)⊤ ⊤, w = reduction-aided SIC detector. ℜ ℑ ℜ ℑ (w)⊤, (w)⊤ ⊤ are all real and imaginary compositions. Definition 1 (SBP). The smallest basis problem (SBP) is, ℜ ℑ    Here the noise variance of w becomes σ2 = σ2 /2. given a lattice Λ, find the basis with the smallest orthogonality   w Lattice reduction is essentially multiplying a given basis defect. with a unimodular matrix U GL (Z) to get a reduced n To address SBP, a designed reduction algorithm should basis B˜ , BU¯ . For a lattice-reduction-aided∈ detector, we first make all basis vectors as short as possible. Moreover, since rewrite Eq. (3) as: the basis dimension is in the order of tens or hundreds in large 1 MIMO, we need a low-complexity lattice reduction algorithm y = B˜ (U− x)+ w. that reduces the basis with satisfactory performance. To make the unimodular transform compact for the QAM constellation, we need to scale and shift signal vector x to III. SEQUENTIALREDUCTIONFRAMEWORK get x (x + 1n 1)/2, so that the constraint on x become The fundamental principle of sequential reduction is to ← × n 1 a consecutive integer set Ξ . Let y (y + BU˜ − 1n 1)/2, reduce a basis vector by using all other vectors that span then the inferred signal vector is given← by: × a sublattice. In the new method, given an input basis B1, we sequentially solve si = εCVP(bi, B[n] i) with [n] i = \ \ xˆ =2 Ξn (U Zn ( (y, B˜ ))) 1n 1, (5) 1, ..., n i. For each si, we test whether the residue distance Q Q E − × { }\ 2 2 2 is shorter: bi si < τ bi , where τ (0, 1] is a where (y, B˜ ) denotes a low-complexity detector that could parameter to|| control− || the complexity.|| || If this holds,∈ we update E be zero-forcing (ZF) or successive-interference-cancellation bi by bi bi si. Here both si = 0 and the si that (SIC), and Q ( ) denotes a quantization function with respect to makes b ←s =− b are declared as ineffective attempts. · i i i its subscript. Given certain information about the signal vector, A thresholdk − parameterk k mk is set to count these useless trials. the detectors can be implemented under an minimum-mean- The algorithm terminates if m>n, which means no more square-error (MMSE) principle. The MMSE-based ZF/SIC vectors can be further reduced. 
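For concreteness, the complex-to-real transformation in (3)–(4) can be carried out as in the following minimal NumPy sketch. The function name is ours, and the QAM shifting/scaling and the MMSE extension described in the text are omitted.

```python
import numpy as np

def complex_to_real(Bc, yc):
    """Real-valued decomposition (3)-(4): stack real and imaginary parts so that
    the complex model y_c = B_c x_c + w_c becomes y = B_bar x + w over the reals."""
    B_bar = np.block([[Bc.real, -Bc.imag],
                      [Bc.imag,  Bc.real]])
    y = np.concatenate([yc.real, yc.imag])
    return B_bar, y
```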
The general form of sequential detectors are similarly given by extending the size of the reduction is summarized in Algorithm 1. ¯ ¯ 2 system: y [y⊤, 01 n]⊤, B [B⊤,σ/σsIn]⊤, with σs An SR algorithm maintains a lattice basis due to the referring to← the variance× of a signal← symbol. following reason. In round m, suppose k [n] i ckbk is a ∈ \ valid reduction on bi, then the lattice basis updating process m m P m becomes B BT , with Tk,k = 1 k [n], Tk,i = ← ∀ ∈ m C. The objective in lattice reduction ck k [n] i and all other entries are zeros. Since T − ∀ ∈ \ m Hereby we explain the design criteria of lattice reduction is an integer matrix with determinant 1, T is unimodular, used in MIMO detection. For a set of linearly independent and the composition of the transform matrices from different rounds maintains a unimodular matrix. vectors B¯ = [b¯1,..., b¯n], we define its fundamental paral- lelepiped as If an exact CVP oracle is chosen in line 4 of Algorithm 1, we call the algorithm SR-CVP. By choosing other approximate n 1Unless otherwise specified, B is chosen from the dual of a channel matrix. (B¯ )= cib¯i 1/2 ci 1/2 . P | − ≤ ≤ 2 Choosing τ > 1 may make the algorithm diverge. (i=1 ) X 4
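The reduction loop just described (and formalized as Algorithm 1 below) can be sketched as follows. This is a minimal NumPy sketch under our own naming, assuming a user-supplied routine cvp_solver(target, basis) that returns an exact or approximate closest point of the lattice spanned by the given columns.

```python
import numpy as np

def sequential_reduction(B, cvp_solver, tau=1.0):
    """General SR loop: try to shorten each b_i by a point of the sublattice
    spanned by the other columns; stop after n consecutive ineffective attempts."""
    B = np.array(B, dtype=float)
    n = B.shape[1]
    i, m = 0, 1
    while m <= n:
        i = i % n                                  # column index (0-based)
        others = np.delete(B, i, axis=1)           # the basis without column i
        s = cvp_solver(B[:, i], others)            # s lies in the sublattice
        if np.linalg.norm(B[:, i] - s) ** 2 < tau * np.linalg.norm(B[:, i]) ** 2:
            B[:, i] = B[:, i] - s                  # a unimodular column update
            m = 1
        else:
            m += 1
        i += 1
    return B
```

Plugging in an exact CVP solver gives SR-CVP, while the pairwise subroutine of (17) gives SR-Pair and, with a bucketed candidate search, SR-Hash.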

Algorithm 1: The general form of an SR algorithm. theorem that consists of upper bounds for the basis length and the orthogonality defect. Input: lattice basis B = [b1,..., bn], complexity threshold τ; Theorem 2. For any dimension n 4, an SR-CVP reduced Output: reduced lattice basis B. basis satisfies: ≤ 1 i =0, m =1; 2 while m n do 4n ≤ l(B) λn(B), (8) 3 i (i mod n)+1; ⊲ The column index; ≤ 5 n ← r − 4 si = εCVP(bi, B[n] i); ⊲ Exact/approximated CVP n/2 n \ 4 λn(B) solvers; η(B) n . (9) ≤ 5 n λ1 (B) 5 if b s 2 < τ b 2 then  −  || i − i|| || i|| 6 b b s ; Proof. We first show upper bounds for terms 2 and 3 in (7), i ← i − i 7 m =1; respectively. By constructing a sublattice Λ(B′) from vectors with lengths λ (B), ..., λ (B) in Λ(B), the covering radius 8 else 1 n satisfies 9 m m +1; ← B x B ρ( ) = maxx dist( , Λ( ))

max dist(x, Λ(B′)) ≤ x CVP solvers, we can obtain other variants that have lower n complexity. As shown in Fig. 1, SR encompasses SR-CVP, SR- 1/2 λ2(B), Pair and SR-Hash. The red box in the figure denotes SR, whose ≤ v i ui=1 SR-CVP, SR-Pair and SR-Hash algorithms feature decreasing uX t complexity. Their analogies in the conventional KZ framework where the last inequality is obtained after applying Babai’s are shown in the black box. nearest plane algorithm [33]. Since the residue distance of CVP is upper bounded by the covering radius of the sublattice Λ(B[n] i), we have for term 3 that \ 2 2 πB n i (bi) si ρ (B[n] i) SR: SR-CVPSR-Pair SR-Hash [ ]\ − ≤ \ 1 2 λj (B[n] i) ≤ 4 \ LLL LLL-variants j=i KZ: KZ X6 1 b 2 . (10) ≤ 4 k j k j=i X6 Complexity decreasing Regarding term 2, we use QR decomposition to get B[n] i, bi = QR, from which we obtain Fig. 1: SR contains algorithms with different performance- \ πB⊥ (bi) = rn,n . Now w.l.o.g. assume that the complexity trade-offs. [n]\i  | | successive minima λ1(B), ..., λn(B) come from vectors , , v 1 B[n] i, bi c1, ..., vn B[n] i, bi cn. To produce n linearly independent\ vectors, there\ exists at least one     A. Basis properties of SR-CVP vector denoted as ck whose nth entry ck,n is nonzero. 2 2 2 Then we have Rck = λk(B) λn(B). Together with We need to understand the performance limits of SR by first k k ≤ 2 2 2 2 n 1 2 analyzing SR-CVP. Hereby we set τ = 1 in the analysis for Rck = c r + − v πB⊥ (bi) , it arrives k k k,n n,n j=1 n,j ≥ [n]\i brevity. When no more attempts using si = CVP(bi, B[n] i) at can further reduce the basis in the algorithm, for all s \ P i ∈ Λ B[n] i we have 2 \ b 2 B (11) πB⊥[n]\i ( i) λn( ). b 2 b s 2 ≤ i i i k k ≤ || − || By substituting (10) and (11) to (7) for all basis vectors, we term 1 2 have = πB⊥ (bi)+ πB (bi) si 2 2 | {z } [n]\i [n]\i 2 1 || − || b1 λn(B)+ 4 j=1 bj , 2 2 k k ≤ 6 k k B . = πB⊥ (bi) + π [n]\i (bi) si , (7) . [n]\i − . P  2 2 term 3  2 1 term 2 bn λn(B)+ 4 j=n bj . k k ≤ 6 k k where the second equality is due to| Pythagoras’{z theorem.} Note  | {z } The sum of these n inequalities yieldsP that SR-CVP provides the tightest constraint on term 3, and n n approximations of CVP oracles also distinguish themselves 2 n 1 2 b nλ2 (B)+ − b . on the same term. Based on (7), we can prove the following k ik ≤ n 4 k ik i=1 i=1 X X 5

If n 4, we have Clearly the above theorem is non-trivial when n 4, and ≤ ≤ n this will come in handy when attacking a counter example in 4n b 2 λ2 (B). (12) subsection IV-D. k ik ≤ 5 n n i=1 X − Based on Eq. (12), the longest vector in the basis can be trivially bounded as B. Discussions 4n 1) Comparison with η-Greedy reduction [34, Fig.5] (also l(B) λ (B). ≤ 5 n n noted as ELR+-SLV in [28]). Rather than applying CVP r − To analyze the orthogonal defect, we apply the arithmetic for all vectors, η-Greedy reduction only performs CVP mean-geometric mean inequality on (12) to get for the longest basis vector . According to its definition [34], it is only a special case of SR-CVP and all SR-CVP n n n/2 1 4 n/2 reduced basis must be greedy-reduced. For example, b b 2 λn(B). || i|| ≤ n k ik ≤ 5 n n consider the following basis i=1 i=1 !  −  Y X (13) 20001 Clearly the volume of the lattice is lower bounded by λn (B) 1 02001 for n 4, so along with (13) we obtain (9).   ≤ 00201  00021    If we alternatively set τ < 1, then upon termination of SR  0 00 0 ε  2 2   we have bi 1/τ bi si . Along with the techniques   || || ≤ || − || used in Theorem 2, we obtain with parameter ε (0, 1). The shortest vector ∈ 00 0 0 2ε ⊤ cannot be reached by greedy 4n ± l(B) λn(B). (14) reduction. Specifically, by using 1 b as the query ≤ 4τ n +1   × 5 r − point, η-Greedy cannot find 2b5 b1 b2 b3 b4 Since we have to ensure that the denominator 4τ n +1 is − − − − and 2b5 + b1 + b2 + b3 + b4. In contrast, SR-CVP larger than 0, we claim that inequality (14) holds if n<− 4τ +1. − additionally considers the cases of using b1, b2, b3, and If the CVP oracle is replaced by another suboptimal solver n b4 as query points. A shortest vector v = i=1 cibi referred to as εCVP, then when bounding term 3 we have with at least one coefficient c = 1 must be contained k ± 2 2 in the SR-CVP reduced basis. P πB n i (bi) si ερ (B[n] i). [ ]\ − ≤ \ 2) Comparison with KZ and its variants [16], [35]. Recall Similarly to the above, it yields that a basis B is called KZ reduced if it satisfies the size reduction conditions, and πB⊥ (bi) is the shortest 4n [i−1] l(B) λ (B), (15) n vector of the projected lattice πB⊥ ([bi,..., bn]) for ≤ 4 εn + ε [i−1] r − 1 i n [35]. For a KZ reduced basis, it satisfies in which n< 4/ε +1. ≤ ≤ √i+3 [35] bi 2 λi(B), 1 i n. Though boosted Let θi be the angle between bi and the subspace KZ [16]k k can ≤ solve the length≤ increasing≤ issue caused B , and define , . Such a maxi- span( [n] i) θmax maxi θi by size reduction, tuning π (b ) to be the shortest \ B⊥[i−1] i mum angle between basis vectors and subspaces can also be vector in the projected lattice can still make the basis bounded, as shown in the following theorem. longer. On the contrary, this issue is totally avoided in 2 Theorem 3. An SR-CVP reduced basis satisfies cos θmax SR-CVP. n 1 ≤ 3) Comparison with Minkowski reduction. Recall that a −4 . lattice basis B is called Minkowski reduced if for any Proof: Based on (7) and (10) we have integers c1, ..., cn such that ci, ..., cn are altogether 2 2 2 1 2 coprime, it has b1c1 + + bncn bi for 1 B bi πB⊥ (bi) π [n]\i (bi) si bj . k ··· k≥k k ≤ k k − [n]\i ≤ − ≤ 4 k k i n [15]. For a Minkowski reduced basis, it satisfies j=i ≤ (i 4)/2 X6 (16) [15] bi max 1, (5/4) − λi(B), 1 i n. 
2 k k ≤ ≤ ≤ 2 2 2 Whereas Minkowski reduction is optimal as it reaches It then follows from bi cos θi = bi πB⊥ (bi)  k k k k − [n]\i all the successive minima when n 4, our results in that ≤ Theorem 2 only show the SR-CVP reduced basis is 2 2 1 2 b cos2 θ b cos2 θ b . not far from the optimal one. Here we argue that SR- k ik max ≤k ik i ≤ 4 k j k j=i CVP has simpler structure. While Minkowski reduction X6 requires solving integer least squares problems with Similarly to the techniques used in proving Theorem 2, we GCD constraints and delicate basis expansion, SR-CVP sum (16) for to get i =1,...,n only involves unconditional CVP solvers and its basis n 1 expansion process is trivial. Moreover, the SR-CVP cos2 θ − . max ≤ 4 algorithm can be approximately implemented by its many low-complexity siblings in the SR family. 6
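Theorems 2 and 3 are stated in terms of the basis length l(B) and the orthogonality defect η(B) of (1). When experimenting with small bases, both quantities can be checked numerically; the following is a minimal NumPy sketch with our own function names.

```python
import numpy as np

def orthogonality_defect(B):
    """Orthogonality defect (1): product of the column norms over sqrt(|det(B^T B)|)."""
    return np.prod(np.linalg.norm(B, axis=0)) / np.sqrt(abs(np.linalg.det(B.T @ B)))

def basis_length(B):
    """l(B) = max_i ||b_i||, the quantity bounded in Theorem 2."""
    return np.linalg.norm(B, axis=0).max()
```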

C. Complexity of SR and SR-CVP B. Angular LSH We argue that even when the threshold parameter τ = LSH roughly works as follows: first all N candidates 1, the decrease from bi to bi si can be finitely are dispatched to different buckets with labels, then when counted because a lattice|| || is discrete.|| − Therefore|| we define searching the nearest neighbor of a query point q, we can bi si n 2 ǫ = supb s || − || which satisfies ǫ< 1. As b is alternatively do this only for N ′ candidates that have the same i, i bi i=1 i || ||n k k no smaller than λ2(B) while this metric keeps decreas- label with q, where N ′ N. There are label functions f i=1 i P ≪ ing for every n iterations, the number of calls to the CVP which map an n-dimensional vector v to a low-dimensional P B 2 v k kF sketch of . For certain distance function D, vectors which oracle is not larger than n log( n λ2(B) ) with τ 1, where Pi=1 i ≤ are nearby in the sense of D have a high probability of having B 2 F denotes the Frobenius norm of the input basis, and the the same sketch, while vectors which are far away have a low klogkfunction is over min 1/τ, 1/ǫ . Therefore we conclude { } probability of having the same image under f. that the number of iterations in SR is polynomial. To reach this property, we introduce the definition of an Regarding SR-CVP, since the reduction in each round is LSH family . quite strong, we can use the following heuristic implemen- F tation to minimize the number of iterations: first reduce the Definition 5. A family = f : Rn N of hash functions F { → } longest vector (similarly to η-Greedy), then reduce other basis is said to be (r1, r2,p1,p2)-sensitive for a similarity measure Rn vectors in descending order with n 1 rounds of CVP. D if for any u, v , we have i) If D(u, v) r1, − ∈ ≤ Our simulation results show that this version of SR-CVP then Prf (f(u) = f(v)) p1; ii)If D(u, v) r2, then ∈F ≥ ≥ is competitive with Minkowski reduction and boosted KZ Prf (f(u)= f(v)) p2. ∈F ≤ reduction. For the sake of constructing a hash family with p 1 1 ≈ While we can employ a state-of-the-art implementation and p2 0, normally one first constructs p1 p2 and ≈ ≈ for CVP, its complexity for a random basis is exponential then uses the so called AND- and OR-compositions to turn it [36]–[38]. In the next section, we will focus on approximate into an (r1, r2,p1′ ,p2′ )-sensitive hash family ′ with p1′ >p1 F versions of CVP. and p2′ < p2, thereby amplifying the gap between p1 and p2. Specifically, by combining k AND-compositions and t OR-compositions, we can turn an (r1, r2,p1,p2)-sensitive t t IV. HASH-BASED APPROXIMATION: SR-HASH hash family into an (r , r , 1 1 pk , 1 1 pk )- F 1 2 − − 1 − − 2 sensitive hash family ′. As long as p1 >p2, we can always A. The nearest neighbor problem in SR-Pair F  k t  find values of k and t such that 1 1 p1 1 and t − − → When the εCVP subroutine is not implemented with an 1 1 pk 0. − − 2 →  exact CVP algorithm but rather a pairwise cancellation with Note that if given a hash family which is (r , r ,p ,p )-  1 2 1 2 the following form: sensitive with p p , then weH can use to distinguish 1 ≫ 2 F between vectors which are at most r1 away from v, and b = arg min b(j) , j = 1,...,N i, (17) vectors which are at least r away from v with non-negligible i || i || { }\ 2 (j) probability, by only looking at their hash values. Although bi = bi bi, bj / bj , bj bj , − ⌊h i h i⌉ large values of k and t can amplify the gap between p1 and p2, we refer to the whole algorithm as SR-Pair. 
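A minimal sketch of the pairwise cancellation in (17), assuming NumPy, is given below; it can be plugged into the SR loop sketched earlier as the εCVP subroutine, which yields SR-Pair.

```python
import numpy as np

def srpair_cvp(target, others):
    """Approximate CVP(b_i, basis without b_i) as in (17): keep the single
    one-vector size reduction that shortens the target the most."""
    best = np.zeros_like(target)
    best_norm = np.linalg.norm(target)
    for j in range(others.shape[1]):
        bj = others[:, j]
        c = np.rint(target @ bj / (bj @ bj))       # rounded projection coefficient
        norm_j = np.linalg.norm(target - c * bj)
        if norm_j < best_norm:
            best, best_norm = c * bj, norm_j
    return best                                    # s_i, so that b_i <- b_i - s_i
```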
This algorithm large parameters come at the cost of having to compute many coincides with the element-based reduction in [27]. Although hashes and having to store many hash tables in memory. To this sub-routine only has a complexity in the order of O(nN), minimize the overall , we need the following reaching another variant with lower complexity is possible. lemma that shows how to balance k and t. In practice, we can Recall the nearest neighbor problem in the field of large further tune k and t to have the best performance. dimensional data processing is: given a list of n-dimensional Lemma 6 ([41], [42]). Suppose there exists an (r1, r2,p1,p2)- Rn vectors L = v1, v2,..., vN , preprocess L in such sensitive family . For a list L of size N, let a way that, when{ later given} a target∈ vector q / L, one can F ∈ 1 efficiently find an element v L which is almost the closest log p1− log N ρ ∈ ρ = 1 , k = 1 , t = O(N ). to q. Since Eq. (17) exactly defines a search for the nearest log p2− log p2− neighbor of bi among the vectors in B, then it motivates us to reduce this complexity to O(n log N) based on locality- Then given a query point q, with high probability we can either sensitive-hashing (LSH) [39], [40]. find an element v L such that D(q, v) r2, or conclude that with high probability,∈ no element v ≤L with D(q, v) > Remark 4. If we choose SIC as the εCVP subroutine, then ∈ r1 exist, with the following costs: i) Time for preprocessing along with LLL preprocessing we have [33] the list: O(kN 1+ρ); ii) Space complexity of the preprocessed data: O(N 1+ρ); iii) Time for answering a query: O(N ρ). √ n 1 bi πB[n]\i (bi) si 2(2/ 3) − ρ(B) (18) − − ≤ In the sequel, we examine the implementation of LSH based for such an SR-SIC algorithm. However, the computation on angular hashing. Angular hashing means generating random complexity of this algorithm is still too high as it requires hyperplanes h1,..., hk, such that the whole space is sliced the pre-processing by LLL. into 2k regions. After that, to find the nearest neighbor of q, 7 one only compares q to points in the same region . Here we Algorithm 2: The SR-Hash algorithm. introduce the angular distance similarity function R Input: original lattice basis B = [b1,..., bn], complexity threshold τ, LSH parameters t, k. u⊤v D(u, v) = arccos . Output: Reduced basis [b1,..., bn] of Λ. u v t k kk k 1 Initialize t empty hash tables (Tl)l=1, each has k random With this measure two vectors are nearby if their common hash functions f ,...f ; l,1 l,k ∈F angle is small. Its corresponding hash family is defined by 2 for i =1 n do ··· t 3 Add bi to all hash tables (Tl) , with hash values n 1 if a⊤v 0; l=1 = fa : a R , a =1 ,fa (v)= ≥ t (fl (bi)) and vectors in the same bucket noted as F { ∈ k k } (0 if a⊤v < 0. l=1 Tl (fl (bi)); Intuitively, the space that is orthogonal to a defines a hy- 4 i =0, m =1; perplane, and fa maps the two regions separated by this 5 while m n do hyperplane to different bits. In particular, for any two an- ≤ 6 θ1 θ2 i (i mod n)+1; ⊲ The column index; gles θ1 < θ2, the family is (θ1,θ2, 1 π , 1 π )- ← t F − − 7 Obtain the set of candidates C = l=1Tl (fl ( bi)); sensitive. Further with k AND- and t OR- compositions, we ∪ 2± t t k k 8 cl = arg mincl C bi bi, cl / cl, cl cl ; θ1 θ2 have (θ1,θ2, 1 1 1 , 1 1 1 )- ∈ k − ⌊h i h i⌉ k − − − π − − − π 9 si = bi, cl / cl, cl cl; sensitive hash family. 
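The bucketing step of the angular hash family above (lines 1–3 of Algorithm 2) can be sketched as follows, assuming NumPy. The random directions need not be normalized because only the sign of aᵀv matters, and all names here are ours.

```python
import numpy as np

def angular_hash_key(v, hyperplanes):
    """k-bit AND-composition: bit j is 1 iff a_j^T v >= 0."""
    return tuple((hyperplanes @ v >= 0).astype(int))

def build_tables(B, t, k, rng):
    """Hash every basis vector into t tables, each with k random hyperplanes."""
    dim, n = B.shape
    planes = [rng.standard_normal((k, dim)) for _ in range(t)]
    tables = [{} for _ in range(t)]
    for i in range(n):
        for l in range(t):
            tables[l].setdefault(angular_hash_key(B[:, i], planes[l]), []).append(i)
    return planes, tables
```

A query b_i is then compared only against the union of the buckets matching ±b_i over the t tables (the OR-composition), which is the candidate set C in Algorithm 2.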
⌊h 2i h i⌉ 2       10 if bi si < τ bi then To illustrate LSH and in particular the angular LSH method || − || || || 11 Remove bi from all hash tables; described above, Fig. 2 shows how hyperplane hashing might 12 bi bi si; work in a 2-D setting. In the figure, we have a list of ← − 8 13 Add bi to all hash tables; candidates: L = b1,..., b8 , and we use k =2 hyperplanes 14 m =1; for t =2 hash tables.{ Each table} stores the hash keys (labels) 15 else along with elements being placed in buckets, where elements 16 m m +1; having the same keys will be placed in the same buckets. ← In the two tables, the AND-compostions of 11 respectively correspond to b1, b2, b3 and b1. Based on OR-composition, the nearest neighbor of b1 is found inside b2, b3 . { } D. Discussions

b b 1 0 b8 Labels Buckets b1 1) Comparison with SR-CVP. Here we emphasize that SR- f1 b7 b 1 1 b1 b2 b3 b Pair/SR-Hash is only a weak approximation for SR-CVP, b2 f2 1 0 b8 b and these low complexity algorithms may have quite in- b3 b 0 1 b4 1 b b ferior performance. Consider the counter example given 0 b5 b6 0 0 b5 b6 b7 b4 for ELR [27] (the same as SR-Pair). Clearly SR-Pair/SR- Hash is unable to reduce a basis whose Gram matrix is b b8 Labels Buckets b1 b f1 b7 1 1 1 b1 1 0.5 ν 0.5 ν b2 b b − − 0 1 0 b2 b3 b4 G = 0.5 ν 1 0.5+ ν b b3 1 f2 0 1  − −  b b8 0.5 ν 0.5+ ν 1 b 0 b b6 0 0 − − b4 b5 b5 b6 b7   with ν 0. Under spherical coordinate system of Fig. 2: Demonstration of LSH. (r, ̺, ϕ) →with r = 1, ̺ = π/3, and ϕ = π/2 ν, the lattice basis A corresponded to G (up to a unitary− C. LSH-Based Reduction transform) is given by Now we show how to incorporate LSH into the sequen- sin ϕ cos(π/3) sin ϕ cos(π/3) 1 tial reduction algorithm. The pseudo-codes of SR-Hash are A = sin ϕ sin(π/3)− sin ϕ sin(π/3) 0 . presented in Algorithm 2. It first hashes all vectors in lines  cos ϕ cos ϕ 0  2-3. Then inside the loop of the element-based reduction, −  (19) the search for finding the nearest neighbor of bi is within t This basis has an angle θi <ν 0 between any ai and C = l=1Tl (fl ( bi)). Every time when a shorter bi is → A ∪ ± span(A[3] i), and η(A)= if ν 0. If is reduced found, its hash labels and positions in buckets are updated \ ∞ → by using SR-CVP, we have θmax π/4 according to in lines 11 and 13. Theorem 3, so A is not a stable≥ basis for SR-CVP. In summary, SR-Hash can be seen as a generalization of Moreover, the actual reduced basis has the following the naive brute-force search inside SR-Pair for finding nearest form: neighbors, as k =0, t =1 corresponds to checking all other basis vectors for nearby vectors, while increasing both k and 2 sin ϕ cos(π/3) 1 sin ϕ cos(π/3) 1 − − t leads to fewer comparisons but a higher cost of computing A˜ = 0 sin ϕ sin(π/3) 0 . hash keys and checking buckets.  2cos ϕ cos ϕ 0  −   8

Its OD is 4cos2 ϕ + sin2 ϕ 2 sin ϕ +1 η(A˜ )= − ; 6 -Greedy 4.5 -Greedy p √3 sin ϕ cos ϕ 5 SR-CVP SR-CVP 4 4 when given ϕ = π/2 10− , we have bKZ bKZ ˜ − 3.5 η(A) ϕ=π/2 10−4 = 1.1547. Therefore, the proposed 4 Minkowski Minkowski low-complexity| − SR algorithms are only feasible for 3 bases whose input vectors are dense in some directions. 3 Our simulation results and Appendix A will show that 2.5 the dual lattice basis in MIMO detection is one example 2 of this. 2 2) Comparison with LLL and its variants [19], [22], [24]. Orthogonality defect Orthogonality defect Note that the worst case complexity of LLL for bases 1.5 in the real field is unbounded [43], and the variants that control the order of swaps or a selective imple- mentation of size reduction cannot remove this curse. On the contrary, SR variants with a polynomial time 2 4 6 8 10 2 4 6 8 10 εCVP routine can enjoy the overall polynomial-time Dimension Dimension complexity. Regarding performance bounds, LLL and (a) Primal bases. (b) Dual bases. its variants (the maintains the Siegel condition and size reduction condition) often have bounds of the form Fig. 3: The orthogonal defects of different types of strong n 1 reduction. l (B) 2 − λn (B), while SR-Pair and SR-Hash are heuristic.≤ 3) Comparison with Seysen reduction [14]. Rather than entries from (0, 1), respectively. The figure shows that minimizing the orthogonality defect of a basis, a metric N called Seysen’s measure can reflect whether both the pri- ODs of SR-CVP, bKZ and Minkowski reduced bases are n 2 2 almost indistinguishable. η-Greedy has the worst performance mal and dual bases are short: i=1 bi di . Seek- ing for the global minimum of this metrick k k is extremelyk as expected, because it is not designed to minimized all hard; when referring to Seysen’sP algorithm [25], it is the basis vectors. Since Minkowski reduction is the state-of-the- n 2 2 art algorithm for generating the shortest basis in practice, our one that finds a local minimum of i=1 bi di without any theoretical performance guarantee.k k k Sim-k results show that SR-CVP practically reaches optimality as ilarly to SR-Pair, Seysen’s algorithmP performs basis well. updates in a pair-wise manner: Fig. 4 plots the averaged number of CVP runs in Fig. 3 when using η-Greedy and SR-CVP. It is known that both bj = bj + ci,j bi,i = j, Minkowski and bKZ cost around n oracles for the shortest 6 vector problem (SVP) or CVP. Fig. 4 reflects that SR-CVP 1 di,dj bi,bj with ci,j = h 2i h 2i . Due to the ad- 2 di bi actually needs fewer than n rounds of CVP, and is only slightly ⌊ k k − k k ⌉ ditional inner product calculation in the dual basis, more complicated than η-Greedy. Seysen’s algorithm is more complicated than SR-Pair, and it does not support the hash-based implementa- tion. Moreover, in large ( 35) dimensions Seysen’s B. SR-Hash vs. SR-Pair and LLL variants algorithm often halts at a≥ local minimum [14, P.375]. In this subsection, we study the complexity/performance Since the error rate performance is only controlled by tradeoffs of different types of weak lattice reduction. The the length of the dual basis, our empirical results also modulation is set as 16 QAM, and the results are obtained show that Seysen’s algorithm is not competitive for large from 1 104 Monte Carlo runs. We denote the zero-forcing dimensions. detector× by “ZF”, the successive interference cancellation detector by “SIC”, and lattice-reduction-aided detectors with V. 
SIMULATION RESULTS prefixes: “LLL-SIC/ZF” [32], “bLLL-SIC/ZF” [16], “SR-Pair- A. Performance of SR-CVP SIC/ZF” (this paper), “SR-Hash-SIC/ZF” (this paper), and Hereby we employ the OD’s to compare SR-CVP with “Seysen-SIC/ZF” [26]. Here comparisons are made for major other strong lattice reduction algorithms, including the boosted lattice-reduction-aided methods in large-scale MIMO systems, Korkin-Zolotarev reduction noted as “bKZ”, the Minkowski because they represent pre-processing based methods that may reduction noted as “Minkowski”, and the η-Greedy reduction attain the diversity order of ML detection [6], [44]. [34, Fig.5] noted as “η-Greedy”. Results are averaged over 1) i.i.d. channels: Assume that each entry of the chan- 1 104 Monte-Carlo runs, and SR-CVP is implemented by nel matrix is chosen from a standard normal distribution the× heuristic version in subsection III-C. (0, 1). Fig. 5 plots the bit error rate (BER) performance CN Fig. 3 plots dimension versus OD for distinct algorithms of different uncoded MIMO detectors in a real domain 2nT for the primal and dual of a Gaussian random matrix with 2n = 60 60 MIMO system. Here the linear detectors are× R × 9


implemented with the MMSE criterion. Parameters in LSH are chosen as t = ⌊n^0.585⌉ = 11 and k = ⌊log n⌉ = 6.

Fig. 4: The number of effective CVP runs in η-Greedy and SR-CVP. (a) Primal bases. (b) Dual bases.

Fig. 5: The BER performance of different detectors in large MIMO. (a) Lattice-reduction-aided SIC detectors. (b) Lattice-reduction-aided ZF detectors.

In the high SNR region of Fig. 5-(a), we observe that, in addition to the well-known fact that ZF, SIC and Seysen-SIC fail to achieve the full diversity order, b-LLL-SIC, SR-Pair-SIC and SR-Hash-SIC all attain approximately a 1 dB gain over LLL-SIC. As for Fig. 5-(b), the variants of SR both outperform conventional and boosted LLL algorithms. Both sub-figures indicate that SR-Hash gets very close to SR-Pair.

The complexity of implementing the lattice reduction algorithms is plotted in Fig. 6, where sub-figure (a) is for the effective channel matrix under the MMSE criterion, and sub-figure (b) under the ZF criterion. Considering the difficulty in analyzing the number of floating-point operations for hash operations, here we measure the complexity by the number of vector comparisons. This equals the number of iterations times the size of the basis for SR-Pair, the number of vectors in the same buckets for SR-Hash, and the size of vectors for doing size reductions for both LLL and bLLL. From Fig. 6-(a), we observe that the LLL variants are not affected by the SNR in the MMSE matrix, while Seysen, SR-Pair and SR-Hash gradually increase with the rise of SNR. This shows the complexity of Seysen, SR-Pair and SR-Hash is dependent on the quality of the input bases. Regarding the stationary lines in Fig. 6-(b), the numbers of comparisons of Seysen, SR-Pair and SR-Hash reflect the asymptotic values of their counterparts in Fig. 6-(a). Both subfigures reveal that the hash method helps to reduce the complexity of SR-Pair significantly. A natural question that arises here is whether the complexity dependency of SR-Pair&SR-Hash on input bases may lead to inferior performance at low SNR. To address this question, we plot the SNR versus OD relations of different reduction algorithms in Fig. 7. We observe from the figure that even at low SNR, SR-Pair&SR-Hash featuring low complexity still outperform Seysen and LLL in terms of OD.

2) Correlated Channels: Results in the last example were obtained for i.i.d. frequency-flat Rayleigh fading channels. The performance of MIMO systems in realistic radio environments however sometimes depends on spatial correlation. Therefore, we investigated the effect of channel correlation on the performance of the new reduction algorithms. Based on [45], the spatially correlated channel is modeled as

B_c = Ψ B_{c,f},

where Ψ ∈ R^{n_R × n_R} is the correlation matrix defined by

Ψ = \begin{bmatrix} 1 & ρ & \cdots & ρ^{n_R-1} \\ ρ & 1 & \cdots & ρ^{n_R-2} \\ \vdots & \vdots & \ddots & \vdots \\ ρ^{n_R-1} & ρ^{n_R-2} & \cdots & 1 \end{bmatrix},

and ρ refers to the spatial correlation coefficient. With the same chosen parameters in the algorithm as those for i.i.d. channels, Fig. 8 demonstrates the BER performances against SNR in correlated channels respectively with ρ = 0.1 and ρ = 0.3. It reveals that, as ρ increases, the SR aided detectors suffer from more severe performance degradation than the LLL aided methods, although the BER gaps between SR variants and LLL variants are very small.
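The exponential correlation model above can be generated as in the following NumPy sketch. We assume here that B_{c,f} is the i.i.d. CN(0, 1) channel of the previous example and that Ψ left-multiplies it directly; the exact construction and normalization follow [45], so treat this as illustrative.

```python
import numpy as np

def correlation_matrix(n_r, rho):
    """Exponential correlation model: Psi[i, j] = rho ** |i - j|."""
    idx = np.arange(n_r)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def correlated_channel(n_r, n_t, rho, rng):
    """Spatially correlated channel B_c = Psi @ B_cf, with i.i.d. CN(0,1) entries in B_cf."""
    B_cf = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    return correlation_matrix(n_r, rho) @ B_cf
```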


Fig. 6: The complexity of different lattice reduction algorithms in i.i.d. channels. (a) MMSE criterion. (b) ZF criterion.

Fig. 8: The BER performance of lattice-reduction-aided SIC detectors in correlated channels with ρ = 0.1 and ρ = 0.3. (a) ρ = 0.1. (b) ρ = 0.3.

Fig. 7: The ODs of MMSE matrices reduced by different algorithms.

This is not unexpected because we do have examples showing SR-Pair/SR-Hash cannot reduce certain matrices (e.g., the matrix in (19)). Lastly, as plotted in Fig. 9, the complexity of SR-Pair/SR-Hash is still much lower than those of LLL variants and Seysen, and the proposed SR-Hash has much lower complexity than SR-Pair.

VI. CONCLUSIONS AND FUTURE WORK

To summarize, we have unveiled a new lattice reduction family called sequential reduction, which enjoys a polynomial number of iterations. Theoretical bounds on basis lengths and orthogonality defects are derived under the premise that an exact CVP subroutine has been invoked. Though we only manage to prove these results for small dimensions, they still provide insights on understanding the performance of such a class of algorithms. Within the SR framework, the SR-Hash method can serve as an effective subprogram, and simulation results show that the complexity-performance trade-off outperforms those of SR-Pair and LLL variants in large MIMO detection.

We believe that the studies we initiated here only scratch the tip of the iceberg about the new lattice reduction family. Many important questions remain to be answered. Research on the interactions and combinations of SR with other techniques such as floating-point arithmetic [46], [47], randomized detection algorithms [31], [48], success probability analysis [49], [50], and numerous other topics is now being pursued.

APPENDIX A
ON THE TYPE OF BASES FEASIBLE FOR SR-PAIR & SR-HASH

We first argue that a large dimensional Gaussian random basis is always SR-CVP reduced, and thus being SR-Pair and SR-Hash reduced. The inability to change such bases is however not a problem because these bases are close to being orthogonal.
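As a quick numerical illustration of this claim, and of the pairwise-angle statistics discussed below and in Fig. 10, the pairwise angles of a basis and of its dual basis B^{-T} can be estimated as in the following sketch. The names and the single random draw are ours; the figures in the paper average over many realizations.

```python
import numpy as np

def pairwise_angles(B):
    """theta_{i,j} = arccos(|<b_i, b_j>| / (||b_i|| ||b_j||)) for all i != j."""
    G = B.T @ B
    norms = np.sqrt(np.diag(G))
    cosines = np.abs(G) / np.outer(norms, norms)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return angles[~np.eye(B.shape[1], dtype=bool)]

rng = np.random.default_rng(0)
B = rng.standard_normal((60, 60))               # Gaussian random (primal) basis
dual = np.linalg.inv(B).T                       # a basis of the dual lattice
print(np.mean(pairwise_angles(B) < np.pi / 3))  # fraction of pairs with angle below pi/3,
print(np.mean(pairwise_angles(dual) < np.pi / 3))  # i.e., pairs that can trigger SR-Pair reductions
```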

\lim_{n\to\infty} \Pr\left( \left\| b_1 + \sum_{i=2}^{n} b_i a_i \right\|^2 \le \| b_1 \|^2 \right) \le \lim_{n\to\infty} \left( \frac{1}{1 + 2\beta(1-2\beta)\sum_{i=2}^{n} a_i^2} \right)^{n/2} < \varepsilon.

Next, we investigate the reduction on the dual of a Gaussian 104 104 random basis, which arises in our detection problem. For an input basis B we define Number of comparisons Number of comparisons b , b θ = arccos |h i j i| , 1 i = j n. i,j b b ≤ 6 ≤  i j  3 3 k kk k 10 10 The following lemma says that the SR-Pair method can 12 14 16 18 20 22 24 15 20 25 provide a Gauss-reduced basis for all pairs of vectors with SNR/dB SNR/dB pairwise angles θ > π/3. (a) ρ = 0.1. (b) ρ = 0.3. i,j Fig. 9: The complexity of lattice reduction algorithms in Lemma 8. For an SR-Pair reduced basis, we have θi,j > π/3 for all i = j. correlated channels with ρ =0.1 and ρ =0.3. 6 Proof. If a lattice basis B is non-reducible by SR-Pair, we have bi, bj / bj , bj = 0 i = j. Therefore the lemma Proposition 7. For a Gaussian random basis whose entries follows⌊h from i h i⌉ ∀ 6 follow the distribution (0, 1), the probability that it is not N 1 bi 1 min( bi , bj ) SR-CVP reduced goes to zero as n . cos θi,j < k k < k k k k 1/2. → ∞ 2 bj 2 max( bi , bj ) ≤ Proof: We need to show that for all choices of coefficients k k k k k k Z ai′ s in with at least one nonzero ai, the probability Here we argue that the pairwise angles are dense in the dual n 2 of a Gaussian random basis. Fig. 10-(a) plots the histogram Pr b + b a b 2  1 i i ≤k 1k  of such random matrices. It shows that so a large number of i=2 X vectors satisfy θ < π/3, and these vectors will trigger the   i,j n reduction in SR-Pair/SR-Hash. On the contrary, as predicted vanishes as the problem size n increases. Since i=2 biai is an isotropic Gaussian random vector with covariance by Proposition 7, Fig. 10-(b) shows that the primal basis will n n n P E ⊤ 2 not be reduced by SR-Pair/SR-Hash. Bases with dense angles ( i=2 biai) ( i=2 biai) = i=2 ai In, then for also feature large orthogonality defects. In Fig. 11, we plot the any β > 0,  P P P  OD versus dimension n relations respectively for the dual and n 2 primal Gaussian random matrices. The figure shows the dual 2 n1.5 Pr b1 + biai b1 bases approximately have a growth rate of O(20 ), while  ≤k k  i=2 X that of the primal basis is extremely small. The above confirms n 2 2  β b1+ biai b1 that the objective lattice bases in MIMO detection are easily E e−  Pi=2 −k k  , ≤ k k reducible by tuning the pairwise angles.   dxdv = (2π)n REFERENCES Z n 2 In 2 a βIn v [1] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta, 1 [v⊤,x⊤] i=2 i   − 2 2 n a2βI 1+2β n a2 I x O. Edfors, and F. Tufvesson, “Scaling up MIMO: opportunities and e  i=2 i n pP i=2 i n   challenges with very large arrays,” IEEE Signal Process. Mag., vol. 30, no. 1, pp. 40–60, 2013. 1/2 pP n P2  − [2] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, “Closest point search in In 2 ai βIn = det i=2 lattices,” IEEE Trans. Inf. Theory, vol. 48, no. 8, pp. 2201–2214, 2002. 2 n a2βI 1+2β n a2 I  i=2 i n pP i=2 i n  [3] H. Yao and G. W. Wornell, “Lattice-reduction-aided detectors for n/2 MIMO communication systems,” in Proc. Global Telecommunications pP1 P  Conference (GLOBECOM), Taipei, Taiwan, pp. 424–428, 2002. = n 2 . (20) 1+2β (1 2β) i=2 ai [4] C. Windpassinger, R. F. H. Fischer, and J. B. Huber, “Lattice-reduction-  −  aided broadcast precoding,” IEEE Trans. Commun., vol. 52, no. 12, pp. By optimizing over Pβ in the denominator, we have 1 + 2057–2060, 2004. 2β (1 2β) n a2 1+ 1 n a2. This means we can [5] Z. Wang, Y. Huang, and S. 
Lyu, “Lattice-reduction-aided gibbs i=2 i 4 i=2 i algorithm for lattice gaussian sampling: Convergence enhancement and use β−= 1 to reach≤ the tightest bound for inequality (20). 4 P P decoding optimization,” IEEE Trans. Signal Processing, vol. 67, no. 16, Therefore, for any ε> 0, we have pp. 4342–4356, 2019. 12

basis,” Combinatorica, vol. 13, no. 3, pp. 363–376, sep 1993. [15] W. Zhang, S. Qiao, and Y. Wei, “HKZ and minkowski reduction 104 14000 4.5 algorithms for lattice-reduction-aided MIMO detection,” IEEE Trans. Signal Process., vol. 60, no. 11, pp. 5963–5976, 2012. 4 [16] S. Lyu and C. Ling, “Boosted KZ and LLL algorithms,” IEEE Trans. 12000 Signal Process., vol. 65, no. 18, pp. 4784–4796, Sep. 2017. 3.5 [17] J. Wen and X. Chang, “On the KZ reduction,” IEEE Trans. Information 10000 Theory, pp. 1–1, 2018. 3 [18] Y. Chen and P. Q. Nguyen, “BKZ 2.0: Better lattice security estimates,” in Proc. ASIACRYPT, Seoul, South Korea, 2011, pp. 1–20, 2011. 8000 2.5 [19] C. Ling, W. H. Mow, and N. Howgrave-Graham, “Reduced and fixed- complexity variants of the LLL algorithm for communications,” IEEE Trans. Commun., vol. 61, no. 3, pp. 1040–1050, 2013. 6000 2 [20] W. Zhang, S. Qiao, and Y. Wei, “A diagonal lattice reduction algorithm 1.5 for MIMO detection,” IEEE Signal Process. Lett., vol. 19, no. 5, pp. 4000 311–314, 2012. 1 [21] H. Vetter, V. Ponnampalam, M. Sandell, and P. A. Hoeher, “Fixed complexity LLL algorithm,” IEEE Trans. Signal Process., vol. 57, no. 4, 2000 0.5 pp. 1634–1637, 2009. [22] Q. Wen, Q. Zhou, and X. Ma, “An enhanced fixed-complexity LLL algorithm for MIMO detection,” in Proc. Global Telecommunications 0 0 /3 0 1 /3 1 1.2 1.4 1.6 Conference (GLOBECOM), Austin, TX, USA, pp. 3231–3236, 2014. [23] Q. Wen and X. Ma, “Efficient greedy LLL algorithms for lattice decoding,” IEEE Trans. Wirel. Commun., vol. 15, no. 5, pp. 3560–3572, (a) Dual bases. (b) Primal bases. 2016. [24] J. Wen and X. Chang, “Gfclll: A greedy selection-based approach for Fig. 10: The histogram of the pairwise angles for the dual and fixed-complexity LLL reduction,” IEEE Commun. Lett., vol. 21, no. 9, primal of 60 60 Gaussian random bases. pp. 1965–1968, 2017. × [25] D. Seethaler, G. Matz, and F. Hlawatsch, “Low-complexity MIMO data detection using seysen’s lattice reduction algorithm,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Honolulu, Hawaii, 10250 USA, pp. 53–56, 2007. [26] W. Zhang, X. Ma, and A. Swami, “Designing low-complexity detectors based on seysen’s algorithm,” IEEE Trans. Wireless Communications, 10200 vol. 9, no. 10, pp. 3301–3311, 2010. [27] Q. Zhou and X. Ma, “Element-based lattice reduction algorithms for large MIMO detection,” IEEE J. Sel. Areas Commun., vol. 31, no. 2, 150 10 pp. 274–286, 2013. [28] ——, “Improved element-based lattice reduction algorithms for wireless 100 communications,” IEEE Trans. Wirel. Commun., vol. 12, no. 9, pp. 10 4414–4421, 2013. [29] S. Lyu and C. Ling, “Sequential lattice reduction,” in Proc. 8th Int. Conf. Wirel. Comm. & Signal Process. (WCSP), Yangzhou, China, pp. 1050 1–5, 2016. [30] B. M. Hochwald and S. Ten Brink, “Achieving near-capacity on a 100 multiple-antenna channel,” IEEE Trans. Commun., vol. 51, no. 3, pp. 0 10 20 30 40 50 60 389–399, 2003. [31] S. Liu, C. Ling, and D. Stehlé, “Decoding by sampling: A randomized lattice algorithm for bounded distance decoding,” IEEE Trans. Inf. Fig. 11: The ODs of square Gaussian random bases. Theory, vol. 57, no. 9, pp. 5933–5945, 2011. [32] C. Ling, “On the proximity factors of lattice reduction-aided decoding,” IEEE Trans. Signal Process., vol. 59, no. 6, pp. 2795–2808, 2011. [33] L. Babai, “On lovász’ lattice reduction and the nearest lattice point [6] M. Taherzadeh, A. Mobasher, and A. K. Khandani, “LLL reduction problem,” Combinatorica, vol. 6, no. 1, pp. 1–13, 1986. 
achieves the receive diversity in MIMO decoding,” IEEE Trans. [34] P. Q. Nguyen and D. Stehlé, “Low-dimensional lattice basis reduction Information Theory, vol. 53, no. 12, pp. 4801–4805, 2007. revisited,” ACM Trans. Algorithms, vol. 5, no. 4, pp. 46:1–46:48, 2009. [7] ——, “Communication over MIMO broadcast channels using lattice- [35] J. C. Lagarias, H. W. Lenstra, and C.-P. Schnorr, “Korkin-Zolotarev basis reduction,” IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4567– bases and successive minima of a lattice and its reciprocal lattice,” 4582, 2007. Combinatorica, vol. 10, no. 4, pp. 333–348, 1990. [8] X. Ma and W. Zhang, “Performance analysis for MIMO systems [36] Y. Aono and P. Q. Nguyen, “Random sampling revisited: Lattice with lattice-reduction aided linear equalization,” IEEE Trans. Commun., enumeration with discrete pruning,” in Proc. Advances in Cryptology - vol. 56, no. 2, pp. 309–318, 2008. EUROCRYPT, Paris, France, 2017, pp. 65–102, 2017. [9] P. Q. Nguyen and B. Vallée, Eds., The LLL Algorithm. Springer Berlin [37] Z. Wang and C. Ling, “On the geometric ergodicity of metropolis- Heidelberg, 2010. hastings algorithms for lattice gaussian sampling,” IEEE Trans. [10] R. Neelamani, R. L. de Queiroz, and R. G. Baraniuk, “Compression Information Theory, vol. 64, no. 2, pp. 738–751, 2018. color space estimation of JPEG images using lattice basis reduction,” in [38] J. Jaldén and B. Ottersten, “On the complexity of sphere decoding in Proc. IEEE Int. Conf. Image Process. (ICIP), Thessaloniki, Greece, pp. digital communications,” IEEE Trans. Signal Process., vol. 53, no. 4, 890–893, 2001. pp. 1474–1484, 2005. [11] A. Korkinge and G. Zolotareff, “Sur les formes quadratiques positives,” [39] A. Gionis, P. Indyk, and R. Motwani, “Similarity search in high Math. Ann., vol. 11, no. 2, pp. 242–292, 1877. dimensions via hashing,” in Proc. 25th International Conference on [12] D. Micciancio and S. Goldwasser, Complexity of Lattice Problems. Very Large Data Bases, VLDB’99, Edinburgh, Scotland, UK, pp. Boston, MA: Springer, 2002. 518–529, 1999. [13] A. K. Lenstra, H. W. Lenstra, and L. Lovász, “Factoring polynomials [40] A. Andoni and P. Indyk, “Near-optimal hashing algorithms for approxi- with rational coefficients,” Mathematische Annalen, vol. 261, no. 4, pp. mate nearest neighbor in high dimensions,” in Proc. 47th Annual IEEE 515–534, 1982. Symposium on Foundations of Computer Science (FOCS), Berkeley, [14] M. Seysen, “Simultaneous reduction of a lattice basis and its reciprocal California, USA, pp. 459–468, 2006. 13

[41] P. Indyk and R. Motwani, “Approximate nearest neighbors: Towards Jian Weng received the B.S. and M.S. degrees in computer science from removing the curse of dimensionality,” in Proc. 30th ACM Symposium South China University of Technology, Guangzhou, China, in 2001 and 2004, on the Theory of Computing (STOC), Dallas, Texas, USA, pp. 604–613, respectively, and the Ph.D. degree in computer science from Shanghai Jiao 1998. Tong University, Shanghai, China, in 2008. He is currently a Professor and [42] T. Laarhoven, “Sieving for shortest vectors in lattices using angular the Executive Dean with the College of Information Science and Technology, locality-sensitive hashing,” in Proc. Advances in Cryptology - CRYPTO, Jinan University, Guangzhou, China. He has authored/coauthored 80 papers Santa Barbara, CA, USA, 2015, pp. 3–22, 2015. in international conferences and journals, such as CRYPTO, EUROCRYPT, [43] J. Jaldén, D. Seethaler, and G. Matz, “Worst- and average-case complex- ASIACRYPT, TCC, PKC, CT-RSA, IEEE TPAMI, IEEE TDSC, etc. His ity of LLL lattice reduction in MIMO wireless systems,” in Proc. IEEE research areas include public key cryptography, cloud security, blockchain, etc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Caesars Palace, Prof. Weng was a recipient of the Young Scientists Fund of the National Nat- Las Vegas, Nevada, USA, pp. 2685–2688, 2008. ural Science Foundation of China in 2018, and the Cryptography Innovation [44] D. Wübben, D. Seethaler, J. Jaldén, and G. Matz, “Lattice reduction,” Award from Chinese Association for Cryptologic Research (CACR) in 2015. IEEE Signal Process. Mag., vol. 28, no. 3, pp. 70–91, 2011. He served as the General Co-Chair for SecureComm 2016, TPC Co-Chairs [45] J. Choi, S. R. Kim, and I. Choi, “Statistical eigen-beamforming with for RFIDsec13 Asia, and ISPEC 2011, and program committee members for selection diversity for spatially correlated OFDM downlink,” IEEE more than 40 international cryptography and information security conferences. Trans. Vehicular Technology, vol. 56, no. 5, pp. 2931–2940, 2007. He also serves as an Associate Editor for IEEE TRANSACTIONS ON [46] P. Q. Nguyen and D. Stehlé, “Floating-point LLL revisited,” in Proc. VEHICULAR TECHNOLOGY. Advances in Cryptology - EUROCRYPT, Aarhus, Denmark, 2005, pp. 215–233, 2005. [47] X. Chang, D. Stehlé, and G. Villard, “Perturbation analysis of the QR factor R in the context of LLL lattice basis reduction,” Math. Comput., vol. 81, no. 279, pp. 1487–1511, 2012. [48] P. Xie, H. Xiang, and Y. Wei, “Randomized algorithms for total least squares problems,” Numerical Lin. Alg. with Applic., vol. 26, no. 1, 2019. [49] J. Wen and X. Chang, “Success probability of the babai estimators for box-constrained integer linear models,” IEEE Trans. Information Theory, vol. 63, no. 1, pp. 631–648, 2017. [50] X. Chang, J. Wen, and X. Xie, “Effects of the LLL reduction on the success probability of the babai point and on the complexity of sphere decoding,” IEEE Trans. Information Theory, vol. 59, no. 8, pp. 4915–4926, 2013.

Cong Ling (S’99-A’01-M’04) received the B.S. and M.S. degrees in electrical Shanxiang Lyu received the B.S. and M.S. degrees in electronic and infor- engineering from the Nanjing Institute of Communications Engineering, Nan- mation engineering from South China University of Technology, Guangzhou, jing, China, in 1995 and 1997, respectively, and the Ph.D. degree in electrical China, in 2011 and 2014, respectively, and the Ph.D. degree from the Electrical engineering from the Nanyang Technological University, Singapore, in 2005. and Electronic Engineering Department, Imperial College London, in 2018. He had been on the faculties of the Nanjing Institute of Communications He is currently a lecturer with the College of Cyber Security, Jinan University. Engineering and King’s College. He is currently a Reader (Associate Pro- His research interests are in lattice theory, algebraic number theory, and their fessor) with the Electrical and Electronic Engineering Department, Imperial applications. College London. His research interests are coding, information theory, and security, with a focus on lattices. Dr. Ling has served as an Associate Editor for the IEEE TRANSACTIONS ON COMMUNICATIONS and the IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY.

Jinming Wen received his Bachelor degree in Information and Computing Science from Jilin Institute of Chemical Technology, Jilin, China, in 2008, his M.Sc. degree in Pure Mathematics from the Mathematics Institute of Jilin University, Jilin, China, in 2010, and his Ph.D degree in Applied Mathematics from McGill University, Montreal, Canada, in 2015. He was a postdoctoral research fellow at Laboratoire LIP (from March 2015 to August 2016), University of Alberta (from September 2016 to August 2017) and University of Toronto (from September 2017 to August 2018). He has been a full professor in Jinan University, Guangzhou since September 2018. His research interests are in the areas of lattice reduction and sparse recovery. He has published around 50 papers in top journals (including Applied and Computational Harmonic Analysis, IEEE Transactions on Information Theory/Signal Processing/Wireless Communications) and conferences. He is an Associate Editor of IEEE Access.