Localizing Unsynchronized Sensors with Unknown Sources Dalia El Badawy, Viktor Larsson, Marc Pollefeys, and Ivan Dokmanic´

Home , Invertible matrix

Abstract—We propose a method for sensor array self- on whether or not the devices are synchronized. However, localization using a set of sources at unknown locations. The the most general and practically appealing setup is when sources produce signals whose times of arrival are registered the sources are at arbitrary, unknown locations. This enables at the sensors. We look at the general case where neither the emission times of the sources nor the reference time frames one to use signals of opportunity such as speech, transient of the receivers are known. Unlike previous work, our method sounds, or radio signals [10]. But it also means that receivers directly recovers the array geometry, instead of first estimating are no longer synchronized with the sources. To complicate the timing information. The key component is a new loss things further, receivers themselves do not have to be mutually function which is insensitive to the unknown timings. We cast synchronized. While in many audio applications, microphones the problem as a minimization of a non-convex functional of the Euclidean distance matrix of microphones and sources subject to are connected to a common interface, we are also surrounded certain non-convex constraints. After convexification, we obtain by ad hoc networks of smartphones and voice-based assistants. a semidefinite relaxation which gives an approximate solution; Thus, not only do we measure times of arrival as opposed to subsequent refinement on the proposed loss via the Levenberg- absolute times of flight, but those registered times of arrival Marquardt scheme gives the final locations. Our method achieves further depend on the unknown time reference frame of each state-of-the-art performance in terms of reconstruction accuracy, speed, and the ability to work with a small number of sources and receiver. In this paper, we introduce an algorithm for jointly receivers. It can also handle missing measurements and exploit localizing a set of receivers and a set of sources from measured prior geometric and temporal knowledge, for example if either TOAs in the general case when all nodes are not synchronized: the receiver offsets or the emission times are known, or if the sources go off at unknown times and the reference time frame array contains compact subarrays with known geometry. of each receiver can be different and unknown.

I.INTRODUCTION A. Related Work In MIMO radar [1], ultrasound imaging [2], acoustic imag- Among the earliest work on self-localization are the papers ing [3], underwater acoustics [4], and room acoustics [5], [6], of Rockah and Schultheiss [11], [12] and Weiss and Fried- a collection of sources emit signals that are then captured lander [13] from the 1980s. Plinge et al. [14] provide an in- by the receivers. In these applications, we often need an depth overview of microphone array localization techniques. accurate knowledge of the geometry of the source and receiver A systematization of existing approaches to self-localization positions to proceed with common array processing algorithms is also given by Wang et al. [15]. In the following, we review such as beamforming and source localization. Knowing the some of the major points, related to localization in different sensor array geometry similarly enables the reconstruction setups. of physical fields in environmental monitoring using ad hoc In some cases, the pairwise distance between all the nodes sensor networks [7], [8]. Further, knowing device locations can be estimated. This happens for example when the nodes enables a host of location-based services in the context of the can both send and receive [16] or by measuring the diffuse Internet of Things [9]. The recurring quintessential problem is noise coherence [17]. Localization then amounts to multidi- arXiv:2102.03565v1 [eess.SP] 6 Feb 2021 thus to efficiently estimate the geometry of a set of nodes. mensional scaling (MDS) [18], [19], [20], [21]. We study estimating the geometry of the nodes from the A more common situation in audio applications is that the times of arrival (TOAs) of the source signals. The setup is nodes can only receive or only send. The “sending” nodes need illustrated in Figure 1 where receivers register TOAs of source not be real devices; they can be any acoustic events or signals events. Some events can be (near-)collocated with the nodes, of opportunity. We can distinguish the case when the sources such is the case with transceivers. If the sources are anchors and the receivers are synchronized so we can estimate the with known positions, locating the nodes becomes an exercise source–receiver distances, or the various cases when sources, in geometry: intersecting spheres or hyperboloids depending receivers, or both sources and receivers are not synchronized. a) Known Source Emission Times and Receiver Offsets: In line with the philosophy of reproducible research, all code and data If the emission times happen to be known and the nodes and to reproduce the results of this paper are available at http://github.com/ swing-research/xtdoa. events all have a common time reference (that is, they are syn- Work done while D. El Badawy was a student at EPFL, Switzerland. chronized)1, then the TOAs correspond to times of flight and V. Larsson is with the Department of Computer Science, ETH Zurich. directly give the distances between the nodes and the events. M. Pollefeys is with the Department of Computer Science, ETH Zurich and Microsoft Mixed Reality and AI Lab Zurich. Given these distances, the joint localization problem reduces I. Dokmanic´ is with the Department of Mathematics and Computer Science, University of Basel. 1Additionally, the internal delays of the receivers are known. 2

3 1

2 3

Fig. 1. Setup. Source (people or loudspeakers) events are captured by microphones. The times of arrival tmk at each microphone depend on the source emission times τk as well as the microphone’s own time of reference. Some entries might be missing. to multidimensional unfolding [22], [23]. Some methods are squares [32] to reconstruct the times.2 based on direct optimization of the maximum likelihood (ML) c) Unknown Emission Times and Receiver Offsets: The criterion [13], but these often fail due to the non-convexity closest to our work is the method of Wang et al. [15] who treat of the objective and the existence of bad local minima [15]. the most general case with unknown source emission times and Other methods are based on Euclidean distance matrices and receiver offsets. Same as the methods that address the case semidefinite programming (SDP) [24]. of one unknown set of offsets [32], Wang et al. proceed in Crocco et al. [23] proceed by constructing a certain low- two stages. The timing information is estimated via structured rank matrix of differences of squared TOAs from which the least-squares (Gauss–Newton), noting that with the correct positions can be recovered by solving a low-dimensional non- timing estimates a certain matrix computed from the measured convex optimization problem. The low-rank matrix of differ- T(D)OA measurements must become low rank. This timing ences of squared TOAs [23] is related to the (translated) source estimation procedure may require numerous random restarts. T positions S and receiver positions R as H ≈ R¯ S¯ , where Once the timings are estimated, they proceed as Crocco et al. the latter is the matrix of inner products between centered [23] to recover the geometry. receiver points and centered source points. Since we have that d) Convex Relaxations: In the context of sensor network T H = R¯ QQ−1S¯ for any invertible matrix Q, finding the localization (with fixed anchor nodes), Biswas et al. [33], correct relative geometry amounts to identifying the right Q [34] proposed to use SDP relaxations to handle the non- and the difference vector between the center of R and the convexity introduced by the square root in the distance center of S. In our approach, we avoid this step by working measurements. Similar relaxations have since also been used with the full point set X = [R; S] and the corresponding for source node localization from TDOA measurements (see Gram matrix, instead of splitting X into R and S. The Gram e.g. Yang et al. [35], Vaghefi and Beuhrer [36]) and mixed matrix and its constraints then automatically glue together the TDOA/FDOA measurements (Wang et al. [37]). Yang et al. sources and the receivers (see Section III-A). [35] also proposed a SOCP relaxation; however, it requires that the target node lies within the convex hull of the anchors. b) When One Set of Times is Known: A more realistic Jiang et al. [38] propose to use Truncated Nuclear Norm scenario is when the source emission times are unknown and Regularization (TNNR) to solve the TDOA self-calibration different but the microphones are synchronized. The reverse problem. Their optimization scheme consists of solving a case is also relevant: when the receivers are not synchronized sequence of convex surrogate problems based on the (convex) but the source emission times are known. This is the case if nuclear norm. We propose SDP relaxations for the full self- we use a distributed sensor array. calibration problem, where both senders are receivers are The two unknowns (geometry and one set of times) can unknown and not synchronized. be estimated by directly optimizing an objective, for example e) Minimal Estimation Problems: A series of work using majorization [25]. Again, this often fails due to the non- considers minimal problems in TOA, where the goal is to convexity of the problem and cannot work with near-minimal estimate the unknowns from as few measurements as possible. configurations. Following up on their work on minimal problems for TOA measurements [39], Kuang et al. [40] propose a stratified An alternative approach is to estimate the unknowns sequen- approach for estimating the unknown time offsets from TDOA tially, first recovering the unknown times, and then using them measurements. Once the offsets are recovered, the solvers to recover the geometry [26], [27]. Early work of Pollefeys and from their previous work [39] can be applied to recover the Nister [28] exploits the low rank of a certain matrix of squared sender and receiver positions. Zhayida et al. [41] (and later TOA differences. Their work is a near-field generalization of the work of Thrun [29], which was also adapted to work 2Heusdens and Gaubitch in fact address the case when the microphone with missing measurements [30], [31]. Heusdens and Gaubitch delays are unknown and the source is periodic; this is equivalent to only propose a more robust scheme based on structured total-least- having unknown microphone delays; see Section IV-C. 3

Farmani et al. [42]) considered the minimal solutions to the emit pulses whose times of arrival (TOA) can be measured special case of dual microphone setups. Burgess et al. [43] at the receivers.3 Since we do not assume synchronization proposed solutions for settings where either the sender or the between the sources and the receivers, the absolute emission receivers lie in a lower dimensional space. Batstone et al. [44] time τk of the kth source is unknown. Since the sources are not consider the case of constant offset TDOA self-calibration necessarily synchronized among themselves, the differences (i.e. transmissions have a known period but unknown offset); τk − τk0 are also unknown. Similarly, since we do not assume this is also a stratified approach which first solves for the synchronization among receivers (nor knowing their internal unknown time offset [40]. We show that our approach can delays), the temporal frames of reference of each receiver are also succeed in near-minimal configurations. shifted by an unknown σm with respect to some reference clock. The times of arrival of the kth source signal at the mth B. Contributions receiver thus become −1 Our work is motivated by the fact that the two-stage tmk = v k~rm − ~skk + σm + τk, (1) procedure is suboptimal. The (valid) reason to adopt sequential estimation are poor local minima in the loss function which where for simplicity we let both σm and τk have arbitrary involves both the times and the positions. From a statistical sign. We assume the transmission speed v to be known and point of view, however, joint estimation is preferred. Further, w.l.o.g. let v = 1 for the remainder of the paper. timing estimation can fail or require many random restarts; Note that if we instead use the time-difference-of-arrival the same holds for the non-convex optimization to recover the (TDOA), then with the first sensor as the reference, the TDOA T relative configurations from R¯ S¯ . The two-stage procedure is makes it challenging to exploit prior geometric information t˜mk = tmk − t1k such us known distances or distance bounds. = k~r − ~s k + (σ − σ ) − k~r − ~s k In light of related work and the above discussions, we m k m 1 1 k summarize our contributions as follows: = k~rm − ~skk +σ ˜m +τ ˜k. • What we are usually interested in are the sensor po- And so we can still think of the TDOA t˜mk as a TOA but sitions. We thus formulate a new self-localization loss with modified emission times and offsets. which is invariant to offsets and delays. The proposed loss uses non-squared distances as opposed to the usual A. Minimal Number of Sources and Receivers squared; we prove that minimizing this objective yields the maximum likelihood estimate of the positions under Common sensor localization scenarios are either in the Gaussian noise. It enables us to recover the geometry horizontal plane (d = 2) or in 3D space (d = 3). The number without worrying about the times and without random of the degrees of freedom in the positions of sources and restarts. receivers is then d(M + K). However, since rigid motions • We formulate a non-convex semidefinite optimization of the entire setup cannot be distinguished from TOA data, problem in terms of this new loss. We work with the we can choose the associated d(d + 1)/2 degrees of freedom d full point set X = [R; S] and the corresponding Gram freely (d translational and 2 rotational). Each source and matrix. The Gram matrix and its constraints intrinsically receiver come with an additional unknown time, which gives glue together the sources and the receivers and thus another M +K degrees of freedom. However, we can choose a obviate the need for the non-convex optimization step global time offset arbitrarily, which is another gauge invariance of [23] (see Section III-A); and subtracts one degree of freedom. The total number of the • Although our method does not require synchronization, it degrees of freedom is thus can leverage synchronization among the receivers or the #(DOF) = (d + 1)(M + K − d ) − 1 sources if it is fully or partially present [32], [45], [46]. It 2 can further leverage geometric side information such as As measurements we get MK TOAs. In order to get a when the receivers contain subarrays with known relative dimension-zero solution set (a solution set that is a finite geometry or some sources and receivers are collocated. or countable set of points, but not necessarily unique), this Finally, it can handle missing measurements. number should at least match the number of the degrees of The proposed method achieves state-of-the art results, not freedom. Solving for the minimal number of sources as a only in terms of localization accuracy, runtime, or exploiting function of the number of receivers, we obtain the following side information, but also in the ability to solve near-minimal relation (d + 1)M − d(d + 1)/2 − 1 problems with few nodes. Finally, we note that as suggested K ≥ . by Ono [25], our method can also be interpreted as a virtual M − (d + 1) synchronization method. For d = 3 this gives K ≥ (4M − 7)/(M − 4) implying that the smallest number of receivers that can be localized is II.PROBLEM STATEMENT M = 5 for which at least K = 13 sources are required to

We wish to localize a set of M receivers with unknown 3 d Note that these measurements can be obtained without physically emitting positions ~r1, . . . , ~rM ∈ R using a set of K sources with un- pulses. In audio, for example, one often emits long chirps which is followed d known positions ~s1, . . . ,~sK ∈ R . We assume that the sources by pulse compression. 4 get a dimension-zero solution. Some other minimal cases are: Every matrix in the argument of span correspond to adding (M = 7,K = 7), (M = 13,K = 5). a constant offset to either a source or a receiver. As a As we demonstrate with real and numerical experiments in consequence, any nullspace vector can be written as a linear Section VI, our algorithms can operate in the vicinity of this combination of source / receiver offsets which proves the minimal regime, where previous methods either fail or perform proposition. poorly. What Proposition 1 says is that J M ∆J K is sufficient to determine the positions in the sense that the only locations III.ALOSS INSENSITIVETO OFFSETS AND DELAYS such that this matches measurements are the true ones (if the Instead of proceeding sequentially by first estimating the original problem is solvable). unknown times as in the case of the state-of-the-art methods We can then propose the following timing-invariant objec- [32], [15], we note that the primary task is usually position tive for localization estimation rather than timing estimation. Besides, when the M 1 2 positions are known, estimating the timings {σm}m=1 and min kJ M (∆(R, S) − T)J K k , (5) K R,S 2 F {τk}k=1 boils down to a simple algebraic exercise (see Section III-D). We thus devise a data fidelity metric which is insensi- where k · kF denotes the Frobenius norm. This objective is tive to the unknown timings, yet is only minimized with the justified by the following direct corollary of Proposition 1. correct positions. Corollary 1. If the unsynchronized localization problem has a We begin by writing (1) in matrix form as unique solution (up to a rigid transformation), then vanishing T T T = ∆ + ~σ1K + 1M ~τ , (2) loss in (5) implies correct localization. T where ∆ = (δmk) and δmk = k~rm − ~skk, ~σ = [σ1, . . . , σM ] Further, the loss (5) is in fact equivalent to the ML loss T and ~τ = [τ1, . . . , τK ] . Now we make the key observation: given the offsets ~σ and ~τ. Namely, assuming i.i.d. Gaussian multiplying (2) on both sides by a matrix which has an all- noise on the TOA measurements, we have ones vector in the nullspace will annihilate the two terms that f (∆, ~σ, ~τ) = k∆ + ~σ1T + 1 ~τ T − Tk2 , depend on the unknown times. ML K M F Given an integer L, let J L be a geometric centering matrix and the loss in (5) is simply of size L, 4 1 T f(∆) = min fML(∆, ~σ, ~τ). J L := I L − L 1L1L. (3) ~σ,~τ ~ T ~T To see this, note that It is easy to verify that J M 1M = 0M and 1K J K = 0K so that the matrix T T ∇~σfML = 2(∆ + ~σ1K + 1M ~τ − T)1K = 0 (6) 1 P = J M TJ K = J M ∆J K (4) ? T =⇒ ~σ = − T (∆ + 1M ~τ − T)1K . (7) 1 1K does not depend on the unknown times. In fact, we can say K more. Inserting into the original cost, we get ? T 1 T 2 Proposition 1. We have J M T 1J K = J M T 2J K if and only fML(∆, ~σ , ~τ) = k(∆ + 1M ~τ − T )(IK − 1K 1K ) kF . 0 0 0 T 0T K if there exist ~σ and ~τ such that T 1 = T 2 + ~σ 1K + 1M ~τ . | {z } JK In particular if T 1 = ∆. Solving for ~τ ? similarly yields f (∆, ~σ?, ~τ ?) = f(∆). Remark: If ∆ happened to be a Euclidean (squared) distance ML 2 matrix (EDM) of a point set X , for example D = (dij), dij = 1 A. Characterization in Terms of the Full Gram Matrix k~xi − ~xjk, then − 2 J N DJ N would equal the Gram matrix of the (centered) X . No similar statement can be made here As described in Section I-A, a number of state-of-the-art simply because ∆ holds non-squared distances, and because methods based on low-rank matrix decompositions recover the T it only corresponds to one off-diagonal block of D. matrix H = R¯ S¯ , with R¯ and S¯ being some translated (centered) versions of R and S [23], [32], [15]. Matrix Proof. The nullspace of the linear operator factorizations such as singular value decomposition can only M×K M×K ¯ ¯ J : R → R determine R and S up to a multiplication by an arbitrary J invertible matrix Q, since for any such Q we have that A 7→ J M AJ K T H = Rb Sb, with Rb = QT R, Sb = Q−1S. Finding the right is spanned by Q (which can be assumed to be upper-triangular which fixes the rotational gauge freedom) and the (one) translation vector N (J ) = span{1 ~e>,..., 1 ~eT ,~e 1T , . . .~e 1T }. M 1 M K 1 K M K is then achieved via direct non-convex optimization, though Note that not all of those MK matrices are linearly inde- over a lower dimensional space than the original problem. pendent, but the argument does not hinge on independence. We sidestep this problem by writing the measurements in terms of the full point set 4We often indicate matrix and vector sizes in subscripts. While making the d×N notation a bit clunkier, it helps keep track of dimensions. X := [R, S] ∈ R , 5

2 where N = M+K, and always working with the entire X . We We can equivalently write the constraint bmk = L(G)mn denote the corresponding full Gram matrix by G = X T X ∈ as N×N . The off-diagonal blocks of the Gram matrix G encode R L(G) b the relative geometry between R and S—by recovering G, L := mk mk ~0 (11) mk b 1 we know that relative geometry. mk The matrix of squared distances (the EDM) between the bmk ≥ 0 (12) column vectors in X can be written as a linear function of rank(Lmk) = 1. (13) 2 N the Gram matrix. Letting D = (k~xn − ~xn0 k )n,n0=1, one has All the non-convexity is now lumped into the rank con- T D = D(G) := diag(G)1N − 2G + 1N diag(G), (8) straints, rank(G) = d and rank(Lmk) = 1 for 1 ≤ m, k ≤ M,K. The final step is to relax all of the rank constraints to T with diag(G) = [g11, . . . , gNN ] . This means that the matrix get of distances ∆ between sources and receivers can be written 2 as an entrywise square root of a linear function of G, since min kJ M (B − T)J K k (14a) G,B F •2 def ∆ = L(G) = S row D(G) S col, (9) subject to G 0, (14b) G1 = ~0, (14c) where S := [I , 0 ] and S := [0 , I ]T are N row M M×K col K×M K ~ the appropriate row and column selection matrices, and ( · )•2 Lmk 0, for all 1 ≤ m, n ≤ M,K, (14d) denotes an entrywise square. In view of Proposition 1, this B ≥ 0. (14e) leads us to the first reformulation of the original problem: 2 p• We can finally reconstruct the point set from the re- min J M ( L(G) − T)J K (10a) G F covered G using singular value decomposition. We have G 0, > subject to (10b) G = U ΛV , where Λ = diag(σ1, . . . , σN ) with G1N = ~0, (10c) the singular values σi assumed to be sorted in decreas- ing order. We reconstruct the point set using Xˆ = rank(G) = d. (10d) > [diag(σ1, . . . , σd),~0d×(N−d)]V . If the rank of the estimated The constraint (10c) resolves the translation ambiguity; it is G is truly d, as it ideally should be since it describes a d- equivalent to the point set being centered around the origin dimensional point set, then the trailing singular values would 1 P be zero anyway. N n ~xn = 0. Note that now the receivers and sources are being translated together, unlike in earlier work where they were split. C. Refinement by Levenberg-Marquardt Further, the estimated locations are directly read out from a factorization of G, without the need for stitching. In general, solving the above semidefinite program gives point configurations which are good coarse approximations to the true geometry, although they are not good enough for B. Semidefinite Relaxation most applications. The reason can be tracked down to the The formulation (10a) remains nonconvex. Thus, direct fact that after relaxing the rank constraints, the estimated approaches such as randomly-initialized first-order methods matrices have higher rank than desired. Still, the positive (e.g., projected gradient descent) are likely to get stuck in a semidefinite constraints constrain geometry sufficiently to get local minimum. We thus propose to instead solve a convex decent estimates. One strategy would be to employ various relaxation of the problem. rank minimization strategies. Informally, we tried this and it There are two sources of nonconvexity: the entrywise square works decently but it is very slow. root in the objective, and the rank constraint on the Gram ma- Instead, we propose to refine the output of the SDR using p• trix. We begin by replacing the entrywise square root L(G) the Levenberg-Marquardt (LM) algorithm. It is fast, accurate, by a new matrix variable B, and adding the constraint that the and it works better. We also note that simple gradient descent entrywise square of B must equal L(G). is much slower. Concretely, we write the objective as To derive the LM updates, we need to compute the Jacobian

2 of the loss in order to linearize it. The loss min kJ M (B − T)J K kF , G,B 2 J (R, S) = kJ M (∆(R, S) − T)J K kF and ask that B •2 = L(G) to obtain an equivalent formulation. We now proceed by reformulating the added constraint as can be written as a rank constraint on some semideﬁnite matrix (or matrices). J (R, S) = kf(R, S)k2 , Since the entrywise square does not mix the entries of B, 2 we can add such constraints on a per-entry basis. This is where f(R, S) = A~δ(R, S) − ~t, computationally more efﬁcient (and empirically works better) T ~ than working with the entire vectorization of B at once. A = J K ⊗ J M , t = A vec(T) 6

~ d×(M+K) M×K and δ = vec(∆). Since ∆ : R → R , the with ⊗ denoting the Kronecker product, V := 1K ⊗ I M , M×K×d×(M+K) Jacobian is a tensor in R . We can compute W := I K ⊗ 1M . From here, estimating the times vector  amounts to solving an overdetermined system of linear equa- ~rm − ~sk  ` = m, tions. Note that the system matrix has a nullspace of dimension k~r − ~s k  m k one (spanned by α1) due to the global offset ambiguity. To [D∆] = ~rm − ~sk m,k,:,` − ` = M + k, eliminate the offset ambiguity we arbitrarily set σ = 0, and  k~r − ~s k 1  m k let V 0 be V with the first column removed. The least-squares 0 otherwise. estimate of the unknown timings is then given as with the understanding that the first M slices in the last index " 0# ~σ † of D correspond to the receivers and the last K to the sources. b = V 0 W ~e, (15) To make things easier to compute with, we rewrite T as a ~τb function of a single vector 0 T † with ~σb = [σ2,..., σM ] and ( · ) denoting the Moore- vec(R) b b ~x = ~x(R, S) = , Penrose pseudoinverse. vec(S) and rearrange the output into a vector so that ~δ(~x) = IV. VARIANTS vec(∆(R, S)). The Jacobian D~δ is obtained from DT by In many cases, we have some prior information about the collapsing the first two dimensions and the last two dimensions distances or offset times. The proposed method makes it so that D~δ ∈ RMK×d(M+K). Finally the Jacobian of f is straightforward to leverage this prior knowledge. In the follow- computed as ing, we explain how by describing several typical scenarios. Df = A D~δ, and we have an affine approximation of f as (with some abuse A. Localization of Subarrays of notation) Suppose that some distances are known. It is often the case that the set of receivers consists of several compact subarrays f(~x; ~x0) = f(~x) + Df(~x)(~x − ~x0). with known geometry (voice-based assistants, smart phones). This leads to the LM update, These compact subarrays are then distributed in the space of 2 2 interest at unknown positions and with random orientations, (k+1) (k) (k) ~x = arg min f(~z; ~x ) + λ ~z − ~x together with discrete microphones. ~z Since the EDM is a linear function of the Gram matrix, which is solved by this prior knowledge corresponds to simple linear constraints ~x(k+1) = ~x(k) − (Df(~xk)T Df(~xk) + λI )−1Df(~xk)T f(~x(k)). on G. We thus augment the original program with a set of known inter-sensor distances for microphone pairs The full localization algorithm is summarized in Algorithm 1. M = (m1, m2): m1 < m2,

distance dm1m2 between ~rm1 and ~rm2 known , Algorithm 1 Unsynchronized Sensor Localization. Then we add the following set of constraints to our optimiza- Require: TOA T ∈ M×K R tion: Ensure: Positions of receivers R ∈ Rd×M and sources S ∈ d×K D(G) = d2 , (m , m ) ∈ M, m 6= m . R m1m2 m1m2 for all 1 2 1 2 Initialize R and S using SDR (14). (16) Update R and S using LM. If the set of receivers is partitioned as

R = [R1, R2,..., RJ ],

D. Estimating Offsets and Delays with the distances within the jth subarray Rj known, these constraints correspond to knowing the diagonal subblocks of Once an estimate ∆b of ∆ is computed, we can also compute the upper-left block of D. the unknown offsets ~τ and delays ~σ. We start by noting from We can again proceed with an LM reﬁnement step. In this (2) that T T case, we consider the following objective T − ∆ = ~σ1K + 1M ~τ , min kf(θ)k2 (17a) where the right hand side is linear in (~σ, ~τ). We exploit it by θ vectorizing both sides: subject to g(θ) = 0, (17b) (17c) ~e := vec(T − ∆) = 1K ⊗ ~σ + τ ⊗ 1M = V ~σ + W ~τ where g : Rd(M+K) 7→ R|M| is the known-distances residual ~σ deﬁned as = VW , ~τ g (X ) = (X , X ) − d2 m1m2 edm m1m2 m1m2 7

2 with edm(X , X )m1m2 = k~xm1 − ~xm2 k being the entries C. Sources with Known (Constant) Offset of the EDM of the point set X . We solve (17) using the As a last example of prior information that is easily handled, Augmented Lagrangian algorithm [47]. The corresponding we mention the practically relevant case where the source is update is given in Appendix A. for example a smartphone emitting periodic pulses. More gen- In some cases the distances may not be known exactly, but erally, the source is emitting signals at known (not necessarily we might have access to good bounds. We once more leverage regular) intervals. The phone is not synchronized with the the linearity of D in the Gramian by adding the following receivers, so the emitting times are strictly speaking unknown, linear constraints to the program, but only up to one unknown offset—the start time of the 2 ¯2 emissions. This case has been studied in [32], [44]. d ≤ D(G)m m ≤ d . (18) m1m2 1 2 m1m2 We show how to incorporate this using our proposed d¯ d method. We have that where m1m2 and m1,m2 are upper and lower bounds on the distance dm ,m . 1 2 τk = τo + δk, Finally, we note that priors might be available not only for inter-receiver distances but also for inter-source distances with δk is the known offset of the kth emission with respect and the distances between the sources and the receivers (we to the unknown time τo. When the emissions are periodic, we often have a coarse idea about how far the source events are have δk = k · δ, but for simplicity we keep the notation more from the microphones). These can be added to the constraints general. completely analogously. Note that in general, the vectors ~σ and ~τ can never be recovered uniquely, unless one of the times is known. The reason for this is that the global time reference can be decided B. When One Set of Times is Known arbitrarily (we used this fact when counting the degrees of When one set of timings is known (either the source emis- freedom in Section II-A). sion times or the receiver offsets), it would still be possible It then follows that the case of sources with known offsets is to use the general formulation with the derived loss (10a) mathematically equivalent to the case when only the receivers designed for the case when both sets of times are unknown (2). offsets are unknown. Concretely, first write Though that would require more measurements than necessary ~τ = τ 1 + ~δ, given such prior knowledge as we show next. o Both cases (unknown emission times or unknown offsets) where ~δ is a known vectors. Then note that are captured by the simplified measurement model T T T ~> ~σ1K + 1M ~τ = (~σ − τo1)1K + 1δ . T T = ∆ + 1M ~τ , (19) Since 1~δ> is known it can be absorbed in the measurements; up to a transposition as necessary (exchanging the role of the one unknown offset τo can then be absorbed in receiver sources and receivers). From here, it follows that the influence offsets. of the timings can be eliminated simply by one left multiplication by J M so that D. Missing Measurements 0 P = J M T = J M ∆ One upshot of our formulation is that it allows localization even when some sensor–receiver measurements are unavail- does not depend on ~τ. We can thus replace the loss (10a) by able. We denote the set of missing entries by 2 th kJ M (B − T)kF . (20) M = (m, k): distance between m receiver and kth source is missing . To see how this reduces the minimal required number of measurements (sources or sensors), recall the degrees-of- We then replace the objective (14a) with freedom count from Section II-A. Concretely, 2 X > d+1 min J M (B − T + αmk~em~ek )J K , (21) #(DOF) = d(M + K − ) + M G,B,{αmk} 2 (m,k)∈M F gives where ~en denotes the canonical basis, and the coefficients αmk (d + 1)(2M − d) account for the missing entries. K ≥ .   2(M − d) vec(R) The LM refinement then proceeds with θ = vec(S) ∈ 4M−6 For d = 3, this becomes K ≥ M−3 , so that the minimal vec(α) number of receivers is now M = 4 and the minimal cases Rd(M+K)+|M|. The Jacobian now includes the derivative with for the dimension-zero solution are M = 4,K = 10, M = respect to α as well 5,K = 7, M = 6,K = 6, and M = 9,K = 5. Another way ( to see this is by noting that the operator A 7→ J M A has a 1 (m, k) ∈ M, i = m, j = k, [Dαf]mk,ij = (22) smaller nullspace than A 7→ J M AJ K . 0 otherwise. 8

V. NUMERICAL SIMULATIONS to the required number of iterations and did not improve the In this section, we evaluate our approach on synthetic data. results in the case M = K = 7. We implemented our algorithms in Matlab and used the CVX The localization results comparing our approach (SDR+LM) package for specifying convex programs [48], [49] with the to the two-stage baseline are shown in Figure 2. As can be SeDuMi solver. For reverberation simulations, we use the seen, our timing-invariant approach outperforms the two-stage Pyroomacoustics [50] package in Python. In the following, method, especially in the near-minimal configurations. As for we describe the setup and present a series of results for the runtime, we show in Figure 3 a comparison of the average different scenarios showcasing the accuracy and flexibility of runtime over 20 runs as we increase the number of sensors and −5 our method. receivers for a fixed noise standard deviation σ = 10 . The runtime of our approach is consistently less than 5 seconds, whereas the runtime for the two-stage approach increases A. Setup significantly with the number of sensors. Thus, our timing- We consider an equal number of sensors and sources invariant approach avoids both having timing-estimation errors M = K in the range of 7 to 12. For each pair, we generated that propagate to the position estimation as well as the need 200 configurations of points randomly distributed in a volume for multiple random initializations that increase the runtime. of dimensions 10 m × 10 m × 3 m. The time offsets are We now evaluate the case when the microphones are syn- uniformly generated in the range [−1, 1] s. The TOAs were chronized but the source emission times are still unknown. finally corrupted by zero-mean random Gaussian noise with The results are shown in Figure 4 along with the results of standard deviation σ ∈ {10−3, 10−4, 10−5, 10−6,, 0} s or the fully unsynchronized case. As can be seen, the localization equivalently σ ∈ {34.3, 3.4, 0.3, 0.03, 0} cm. The parameters performance with partial synchronization is better, especially are summarized in Table I. for (M = 7,K = 7) and (M = 8,K = 8). Next, we test the case when the inter-microphone distances TABLE I are known, but the whole setup is still fully unsynchronized. PARAMETERSFORTHESETUPANDALGORITHMS. The localization errors are shown in Figure 4. As would be Dimension d = 3 expected, having side information significantly improves the Volume 10 m × 10 m × 3 m localization performance. M sensors 7 to 12 For the last experiment, we attempt localization when some K sources 7 to 12 Time offset [-1,1] s TOA data is missing. We test a range of missing entries Speed of sound 343 m/s from 4% to 20% of the total. Figure 5 shows the results for Solver SeDuMi M = K = 12 where we also compare the performance to LM iterations At most 1000 the complete data case. We can see that we are still able to localize when few entries are missing. However, as we have found some intrinsically bad configurations that cannot B. Evaluation be accurately localized with complete data, it is expected that The reconstructed points are first optimally aligned using the problem would be exacerbated when some TOA entries Procrustes analysis [51]. The localization error between the are further missing. true [R, S] and reconstructed [Rˆ , Sˆ ] points is then In summary, our approach (SDR+LM) outperforms the two- stage baseline (Wang et al [15]) in terms of localization M K P ˆ P ˆ error and runtime. Additional side information such as partial ~rm − ~rm + ~sk − ~sk m=1 synchronization or knowledge of some distances can be easily E = k=1 . (23) rs M + K accommodated in our formulation, and as expected improve We present the results using box plots depicting the median, the localization results compared to the fully unsynchronized first quartile, and third quartile. The whiskers correspond to case. 1.5 times the inter-quartile range. Values over or below this are shown as outliers. D. Reverberation Results To facilitate comparison, we will also present the median We now evaluate the localization performance of our ap- and the 95% confidence interval. In all figures, we clip the proach in the presence of reverberation. errors to a minimum of 10−3 which we assume is sufficient 10 × 10 × 3 M = for most applications. We consider a room of dimensions . The K = 12 sensors and sources are randomly placed in the room. The sources are random Gaussian noise and do not overlap C. Results such that the segmentation at each microphone is known. The As a baseline algorithm, we choose to compare to the two- SNR is set to 15 dB. stage approach of Wang et al. [15]. We consider two aspects in We vary the reverberation time by adjusting the wall our comparison: localization error and runtime. Note that we absorption coefficients and maximum order for the image- only report the results of Wang et al.’s approach without the source method. For each reverberation time, we simulate 20 further third-stage refinement using Ono et al’s [25] algorithm. realizations. While the desired reverberation times are 0 s, We had observed that the refinement takes a long time due 0.1 s, 0.2 s, 0.3 s, 0.4 s, and 0.5 s, the actual reverberation 9

Fig. 2. Localization results for an unsynchronized setup comparing our approach (SDR+LM) to the two-stage baseline (Wang et al [15]). Results are shown at different noise levels and for M = K in the range of 7 to 12.

TABLE II PARAMETERS FOR THE REVERBERATION SIMULATION.

Room dimensions 10 m × 10 m × 3 m M=K sensors and sources 12 Time offset [0,2] s Source duration 0.5 s Speed of sound 343 m/s Sampling rate 48000 Hz Reverberation time 0 to 0.5 s SNR 15 dB

this affects the localization errors as shown in Figure 6b. Our timing-invariant approach still outperforms the two-stage Fig. 3. Average runtime of our approach (SDR+LM) compared to the two- baseline. stage baseline (Wang et al [15]). Results are for M = K in the range of 7 to 12. The noise standard deviation is fixed σ = 10−5. The runtime of our XPERIMENTS approach is consistently less than 5 seconds. VI.E In this section, we evaluate our approach on real data [44] recorded in an office with most of the furniture removed. times obtained in simulation are slightly different but close. A loudspeaker played a chirp from 65 positions. We have The parameters are summarized in Table II. access to the ground truth positions for 12 microphones The TDOA are estimated using the method of Yamaoka et measured using a laser, the 12 × 65 TOA matrix, and a mask ˆ al [52]. The error between the true tmk and estimated tmk indicating the positions of outlier TOA entries. While the TOA TDOA is calculated as was extracted from the recordings knowing the chirp, our M K algorithms also work for TDOA matrices which could have P P ˆ |tmk − tmk| been extracted without any assumption on the source as shown E = m=1 k=1 . (24) t MK in Section V-D. We then use the estimated TDOA as input to our localization algorithm as well as the two-stage baseline of Wang et al [15]. A. Setup The results are shown in Figure 6. In Figure 6a, we show Figure 7 shows the TOA matrix, outlier positions, and the the median TDOA estimation error. As expected, the errors true microphone positions. A total of 23 loudspeakers provide are larger as the reverberation time increases. Subsequently, clean data without outliers. Thus, for the first experiments, 10

Fig. 4. Median localization errors for an unsynchronized setup in different scenarios. Results are shown at different noise levels and for M = K in the range of 7 to 12. In the case without any synchronization, our approach (SDR+LM) outperforms the two-stage baseline (Wang et al [15]). Side information such as partial synchronization or knowledge of some distances improves the results.

Fig. 5. (Missing Data.) Median localization errors for M = K = 12 in the case of missing TOA entries at different levels of noise. Between 4% and 20% of entries are missing. The whole setup is unsynchronized. The results are plotted against those without any missing data. 11

(a)

(b)

Fig. 6. (Reverberation.) Results as the reverberation time increases. (a) Median TDOA estimation error. (b) Median localization error. Our timing- invariant approach outperforms the two-stage baseline. (b) we will use the 12 × 23 subset of TOAs. Then, for the full Fig. 7. (Real Data.) Data from a real experiment carried out in an ofﬁce 12 × 65 TOAs, we will use our missing data approach to environment. (a) True positions of 12 microphones. (b) Mask showing outliers in the TOA data. handle the outliers. We report results over 200 runs where we randomly choose a subset of the loudspeakers to localize the 12 microphones. Since we only have the ground truth for the microphone positions, we calculate the localization error between the true R and reconstructed Rˆ microphone positions as M P ˆ ~rm − ~rm E = m=1 . (25) r M

B. Results We first test our approach on subsets of the 12 × 23 TOA matrix that is outlier-free. We also compare to the Wang et al. [15] two-stage baseline. In Figure 8(a), we show the errors for localizing all 12 microphones using a varying number of loudspeakers from 6 (the minimal case for M = 12) to 11. We also show the average errors in Figure 8(b). Once again, our approach significantly outperforms the two-stage baseline. Also similar to what we observed in the numerical simulations, Fig. 8. (Real Data.) Average localization errors for localizing 12 microphones while using more loudspeakers improves the average error, for using 6 to 11 sources. We compare our approach (SDR+LM) to a two-stage approach (Wang). The input TOA matrix is a subset of the 12×23 outlier-free each K, there are a number of large errors corresponding to TOA matrix. intrinsically difficult configurations that have attractive non- global minima. However, the minimum error is consistent for all K and is less than 5 cm. 12

APPENDIX A LOCALIZATION OF SUBARRAYS The LM refinement proceeds by iteratively minimizing the augmented Lagrangian over θ kf(θ)k2 + g(θ)>~z + µ kg(θ)k2 , (26) where ~z is the Lagrange multiplier and µ > 0. Minimizing (26) is equivalent to minimizing [47] kf(θ)k2 + µ kg(θ) + ~z/(2µ)k2 . (27) Since (27) is nonlinear in θ, the LM algorithm is once again Fig. 9. (Real Data.) Average localization errors for localizing 12 microphones used in which the update is using 6 to 21 sources. We compare using subsets of the available 12×65 TOA matrix (Full Data) to using subsets of the 12 × 23 outlier-free TOA matrix θ(i+1) = θ(i)− (Outlier-Free). For the former, we use the missing data approach where we −1 treat outliers as missing entries. Df(θ(i))>Df(θ(i)) + µDg(θ(i))>Dg(θ(i)) + λ(i)I Df(θi)>f(θ(i)) + µDg(θ(i))>(g + ~z/(2µ) Next, we use subsets of the entire 12×65 TOA that contains |M|×d(M+K) outliers but assume knowing where the outliers are. We can where Dg ∈ R is the Jacobian defined as thus use the missing data approach described in Section IV-D.  ~xi − 2~xj p = i, Figure 9 shows the errors for localizing the 12 microphones  Dgij,p = −2~x + ~x p = j, (28) using 6 to 21 sources. On one hand, the average error is larger i j  compared to the outlier-free case. On the other hand, the best 0 otherwise. case scenarios that result in minimum errors are comparable to the outlier-free case. The smallest localization error is 4 cm REFERENCES and corresponds to K = 16 loudspeakers. [1] G. San Antonio, D. R. Fuhrmann, and F. C. Robey, “MIMO radar ambiguity functions,” IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 1, pp. 167–177, 2007. [2] R. Parhizkar, A. Karbasi, S. Oh, and M. Vetterli, “Calibration using matrix completion with application to ultrasound tomography,” IEEE transactions on signal processing, vol. 61, no. 20, pp. 4923–4933, 2013. VII.CONCLUSION [3] D. El Badawy, “Good Vibrations and Unknown Excitations,” Ph.D. dissertation, Ecole´ polytechnique fed´ erale´ de Lausanne, 2020. [4] B. G. Ferguson, “Improved time-delay estimates of underwater acoustic We formulated a timing-invariant objective that allows us signals using beamforming and prefiltering techniques,” IEEE Journal of Oceanic Engineering, vol. 14, no. 3, pp. 238–244, 1989. to localize sensors and sources in an unsynchronized setting. [5] R. Scheibler, D. Di Carlos, A. Deleforge, and I. Dokmanic,´ “Separake: Based on this objective, we proposed an SDP relaxation using Source separation with a little help from echoes,” in 2018 IEEE the full Gram matrix of the point set and then a subsequent International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018, pp. 6897–6901. refinement step using the LM algorithm. We have thus elimi- [6] H. Pan, R. Scheibler, E. Bezzam, I. Dokmanic,´ and M. Vetterli, “FRIDA: nated the need for having to first estimate the unknown timings FRI-based DOA estimation for arbitrary array layouts,” in 2017 IEEE as well as avoided the multiple initializations required when International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017, pp. 3186–3190. solving non-convex problems. We also proposed an approach [7] M. Martinez-Camara, I. Dokmanic,´ J. Ranieri, R. Scheibler, M. Vetterli, to handle missing data and showed how to seamlessly in- and A. Stohl, “The fukushima inverse problem,” in 2013 IEEE Interna- corporate additional information such as knowledge of some tional Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 4330–4334. distances or partial synchronization. [8] S. Simoni, S. Padoan, D. F. Nadeau, M. Diebold, A. Porporato, G. Bar- Using numerical simulations, we demonstrated the feasibil- renetxea, F. Ingelrest, M. Vetterli, and M. Parlange, “Hydrologic re- sponse of an alpine watershed: Application of a meteorological wireless ity of the approach in different scenarios and in near-minimal sensor network to understand streamflow generation,” Water Resources configurations. We compared our approach to a two-stage Research, vol. 47, no. 10, 2011. state-of-the-art method [15]. Not only did our approach out- [9] M. Z. Win, F. Meyer, Z. Liu, W. Dai, S. Bartoletti, and A. Conti, “Efficient multisensor localization for the internet of things: Exploring a perform the two-stage algorithm in terms of localization error, new class of scalable localization algorithms,” IEEE Signal Processing but also in terms of runtime. We also tested our algorithms on Magazine, vol. 35, no. 5, pp. 153–167, 2018. real times of arrival measured in an office environment. We [10] H. Simkovits, A. J. Weiss, and A. Amar, “Navigation by inertial device and signals of opportunity,” Signal Processing, vol. 131, pp. 280–287, were able to localize the 12 microphones to within 0.04 m 2017. average error. [11] Y. Rockah and P. Schultheiss, “Array shape calibration using sources in unknown locations–Part I: Far-field sources,” IEEE Transactions on Future work could focus on dealing with outliers which Acoustics, Speech, and Signal Processing, vol. 35, no. 3, pp. 286–299, arise in the presence of multipath, for example. The method 1987. could be similar to our missing data approach except that [12] ——, “Array shape calibration using sources in unknown locations–Part II: Near-field sources and estimator implementation,” IEEE Transactions the positions of the outliers are not known and need to be on Acoustics, Speech, and Signal Processing, vol. 35, no. 6, pp. 724– determined. 735, 1987. 13

[13] A. J. Weiss and B. Friedlander, “Array shape calibration using sources in [36] R. M. Vaghefi and R. M. Buehrer, “Asynchronous time-of-arrival-based unknown locations-a maximum likelihood approach,” IEEE Transactions source localization,” in International Conference on Acoustics, Speech on acoustics, speech, and signal processing, vol. 37, no. 12, pp. 1958– and Signal Processing (ICASSP). IEEE, 2013, pp. 4086–4090. 1966, 1989. [37] G. Wang, Y. Li, and N. Ansari, “A semidefinite relaxation method [14] A. Plinge, F. Jacob, R. Haeb-Umbach, and G. A. Fink, “Acoustic micro- for source localization using TDOA and FDOA measurements,” IEEE phone geometry calibration: An overview and experimental evaluation of Transactions on Vehicular Technology, vol. 62, no. 2, pp. 853–862, 2012. state-of-the-art algorithms,” IEEE Signal Processing Magazine, vol. 33, [38] F. Jiang, Y. Kuang et al., “Time delay estimation for TDOA self- no. 4, pp. 14–29, 2016. calibration using truncated nuclear norm regularization,” in International [15] L. Wang, T.-K. Hon, J. D. Reiss, and A. Cavallaro, “Self-localization of Conference on Acoustics, Speech and Signal Processing (ICASSP). ad-hoc arrays using time difference of arrivals,” IEEE Transactions on IEEE, 2013, pp. 3885–3889. Signal Processing, vol. 64, no. 4, pp. 1018–1033, 2015. [39] Y. Kuang, S. Burgess, A. Torstensson, and K. Astr˚ om,¨ “A complete [16] C. Peng, G. Shen, Y. Zhang, Y. Li, and K. Tan, “Beepbeep: a high characterization and solution to the microphone position self-calibration accuracy acoustic ranging system using cots mobile devices,” in Pro- problem,” in International Conference on Acoustics, Speech and Signal ceedings of the 5th international conference on Embedded networked Processing (ICASSP). IEEE, 2013, pp. 3875–3879. sensor systems. ACM, 2007, pp. 1–14. [40] Y. Kuang and K. Astr˚ om,¨ “Stratified sensor network self-calibration [17] C. Vanwynsberghe, “Reseaux´ a` grand nombre de microphones: applica- from TDOA measurements,” in European Signal Processing Conference bilite´ et mise en œuvre,” Ph.D. dissertation, Paris 6, 2016. (EUSIPCO). IEEE, 2013, pp. 1–5. [18] W. S. Torgerson, “Multidimensional scaling: I. Theory and method,” [41] S. Zhayida, S. Burgess, Y. Kuang, and K. Astr˚ om,¨ “TOA-based self- Psychometrika, vol. 17, no. 4, pp. 401–419, 1952. calibration of dual-microphone array,” Journal of Selected Topics in [19] J. B. Kruskal and M. Wish, Multidimensional scaling. Sage, 1978, Signal Processing, vol. 9, no. 5, pp. 791–801, 2015. no. 11. [42] M. Farmani, R. Heusdens, M. S. Pedersen, and J. Jensen, “Tdoa- [20] I. Dokmanic,´ R. Parhizkar, J. Ranieri, and M. Vetterli, “Euclidean based self-calibration of dual-microphone arrays,” in European Signal distance matrices: essential theory, algorithms, and applications,” IEEE Processing Conference (EUSIPCO). IEEE, 2016, pp. 617–621. ˚ Signal Processing Magazine, vol. 32, no. 6, pp. 12–30, 2015. [43] S. Burgess, Y. Kuang, and K. Astrom,¨ “TOA sensor network self- [21] C. Vanwynsberghe, P. Challande, J. Marchal, R. Marchiano, and calibration for receiver and transmitter spaces with difference in dimen- F. Ollivier, “A robust and passive method for geometric calibration sion,” Signal Processing, vol. 107, pp. 33–42, 2015. of large arrays,” The Journal of the Acoustical Society of America, [44] K. Batstone, G. Flood, T. Beleyur, V. Larsson, H. R. Goerlitz, M. Os- ˚ vol. 139, no. 3, pp. 1252–1263, 2016. [Online]. Available: https: karsson, and K. Astrom,¨ “Robust self-calibration of constant offset time- //doi.org/10.1121/1.4944566 difference-of-arrival,” in International Conference on Acoustics, Speech [22] P. H. Schonemann,¨ “On metric multidimensional unfolding,” Psychome- and Signal Processing (ICASSP). IEEE, 2019, pp. 4410–4414. trika, vol. 35, no. 3, pp. 349–366, 1970. [45] I. Dokmanic,´ L. Daudet, and M. Vetterli, “How to localize ten microphones in one fingersnap,” in EUSIPCO, 2014. [23] M. Crocco, A. Bue, and V. Murino, “A bilinear approach to the posi- [46] ——, “From acoustic room reconstruction to SLAM,” in IEEE ICASSP. tion self-calibration of multiple sensors,” IEEE Trans. Signal Process., IEEE, 2016, pp. 6345–6349. vol. 60, no. 2, pp. 660–673, 2012. [47] S. Boyd and L. Vandenberghe, Introduction to Applied Linear Algebra: [24] I. Dokmanic,´ J. Ranieri, and M. Vetterli, “Relax and unfold: Microphone Vectors, Matrices, and Least Squares. Cambridge University Press, localization with euclidean distance matrices,” in 2015 23rd European 2018. Signal Processing Conference (EUSIPCO), 2015, pp. 265–269. [48] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex [25] N. Ono, H. Kohno, N. Ito, and S. Sagayama, “Blind alignment of programming, version 2.1,” http://cvxr.com/cvx, Mar. 2014. asynchronously recorded signals for distributed microphone array,” in [49] ——, “Graph implementations for nonsmooth convex programs,” in 2009 IEEE Workshop on Applications of Signal Processing to Audio Recent Advances in Learning and Control, ser. Lecture Notes in Control and Acoustics. IEEE, 2009, pp. 161–164. and Information Sciences, V. Blondel, S. Boyd, and H. Kimura, Eds. [26] J. Zhang, R. C. Hendriks, and R. Heusdens, “Structured total least Springer-Verlag Limited, 2008, pp. 95–110, http://stanford.edu/∼boyd/ squares based internal delay estimation for distributed microphone auto- graph dcp.html. localization,” in 2016 IEEE International Workshop on Acoustic Signal [50] R. Scheibler, E. Bezzam, and I. Dokmanic,´ “Pyroomacoustics: A python Enhancement (IWAENC), 2016, pp. 1–5. package for audio room simulations and array processing algorithms,” [27] N. D. Gaubitch, W. B. Kleijn, and R. Heusdens, “Auto-localization in in IEEE International Conference on Acoustics, Speech and Signal ad-hoc microphone arrays,” in IEEE ICASSP. Vancouver: IEEE, 2013, Processing (ICASSP), 2018. pp. 106–110. [51] P. H. Schonemann,¨ “A generalized solution of the orthogonal procrustes [28] M. Pollefeys and D. Nister, “Direct computation of sound and mi- problem,” Psychometrika, vol. 31, no. 1, pp. 1–10, 1966. [Online]. crophone locations from time-difference-of-arrival data,” in 2008 IEEE Available: https://doi.org/10.1007/BF02289451 International Conference on Acoustics, Speech and Signal Processing. [52] K. Yamaoka, R. Scheibler, N. Ono, and Y. Wakabayashi, “Sub-sample IEEE, 2008, pp. 2445–2448. time delay estimation via auxiliary-function-based iterative updates,” in [29] S. Thrun, “Affine structure from sound,” in Advances in Neural Infor- 2019 IEEE Workshop on Applications of Signal Processing to Audio and mation Processing Systems, 2006, pp. 1353–1360. Acoustics (WASPAA), 2019, pp. 130–134. [30] M. Krekovic,´ G. Baechler, I. Dokmanic,´ and M. Vetterli, “Structure from sound with incomplete data,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018, pp. 3539–3543. [31] M. Krekovic,´ “Point, Plane, Set, Match! Winning the Echo Grand SLAM,” Ph.D. dissertation, Ecole´ polytechnique fed´ erale´ de Lausanne, 2020. [32] R. Heusdens and N. Gaubitch, “Time-delay estimation for TOA-based localization of multiple sensors,” in IEEE ICASSP. IEEE, 2014, pp. 609–613. [33] P. Biswas, T.-C. Lian, T.-C. Wang, and Y. Ye, “Semidefinite programming based algorithms for sensor network localization,” ACM Transactions on Sensor Networks (TOSN), vol. 2, no. 2, pp. 188–220, 2006. [34] P. Biswas, T.-C. Liang, K.-C. Toh, Y. Ye, and T.-C. Wang, “Semidefinite programming approaches for sensor network localization with noisy distance measurements,” IEEE transactions on automation science and engineering, vol. 3, no. 4, pp. 360–371, 2006. [35] K. Yang, G. Wang, and Z.-Q. Luo, “Efficient convex relaxation methods for robust target localization by a sensor network using time differences of arrivals,” IEEE transactions on signal processing, vol. 57, no. 7, pp. 2775–2784, 2009.