Density Adaptive Point Set Registration

Felix Järemo Lawin, Martin Danelljan, Fahad Shahbaz Khan, Per-Erik Forssén, Michael Felsberg
Computer Vision Laboratory, Department of Electrical Engineering, Linköping University, Sweden
{felix.jaremo-lawin, martin.danelljan, fahad.khan, per-erik.forssen, michael.felsberg}@liu.se

Abstract

Probabilistic methods for point set registration have demonstrated competitive results in recent years. These techniques estimate a probability distribution model of the point clouds. While such a representation has shown promise, it is highly sensitive to variations in the density of 3D points. This fundamental problem is primarily caused by changes in the sensor location across point sets. We revisit the foundations of the probabilistic registration paradigm. Contrary to previous works, we model the underlying structure of the scene as a latent probability distribution, and thereby induce invariance to point set density changes. Both the probabilistic model of the scene and the registration parameters are inferred by minimizing the Kullback-Leibler divergence in an Expectation Maximization based framework. Our density-adaptive registration successfully handles severe density variations commonly encountered in terrestrial applications. We perform extensive experiments on several challenging real-world Lidar datasets. The results demonstrate that our approach outperforms state-of-the-art probabilistic methods for multi-view registration, without the need of re-sampling.

Figure 1. Two example Lidar scans (top row), with significantly varying density of 3D-points. The state-of-the-art probabilistic method [6] (middle left) only aligns the regions with high density. This is caused by the emphasis on dense regions, as visualized by the Gaussian components in the model (black circles in bottom left). Our method (right) successfully exploits essential information available in sparse regions, resulting in accurate registration.

1. Introduction

3D-point set registration is a fundamental problem in computer vision, with applications in 3D mapping and scene understanding. Generally, the point sets are acquired using a 3D sensor, e.g. a Lidar or an RGBD camera. The task is then to align point sets acquired at different positions, by estimating their relative transformations. Recently, probabilistic registration methods have shown competitive performance in different scenarios, including pairwise [19, 14, 15] and multi-view registration [10, 6].

In this work, we revisit the foundations of the probabilistic registration paradigm, leading to a reformulation of the Expectation Maximization (EM) based approaches [10, 6]. In these approaches, a Maximum Likelihood (ML) formulation is used to simultaneously infer the transformation parameters and a Gaussian mixture model (GMM) of the point distribution. Our formulation instead minimizes the Kullback-Leibler divergence between the mixture model and a latent scene distribution.

Common acquisition sensors, including Lidar and RGBD cameras, do not sample all surfaces in the scene with a uniform density (figure 1, top row). The density of 3D-point observations is highly dependent on (1) the distance to the sensor, (2) the direction of the surface relative to the sensor, and (3) inherent surface properties, such as specularity.

Despite recent advances, state-of-the-art probabilistic methods [19, 10, 6, 15, 14] struggle under varying sampling densities, in particular when the translational part of the transformation is significant. The density variation is problematic for standard ML-based approaches since each 3D-point corresponds to an observation with equal weight. Thus, the registration focuses on regions with high point densities, while neglecting sparse regions.

This negligence is clearly visible in figure 1 (bottom left), where registration has been done using CPPSR [6]. Here the vast majority of Gaussian components (black circles) are located in regions with high point densities. A common consequence of this is inaccurate or failed registrations. Figure 1 (middle right) shows an example registration using our approach. Unlike the existing method [6], our model exploits information available in both dense and sparse regions of the scene, as shown by the distribution of Gaussian components (figure 1, bottom right).

1.1. Contributions

We propose a probabilistic point set registration approach that counters the issues induced by sampling density variations. Our approach directly models the underlying structure of the 3D scene using a novel density-adaptive formulation. The probabilistic scene model and the transformation parameters are jointly inferred by minimizing the Kullback-Leibler (KL) divergence with respect to the latent scene distribution. This is enabled by modeling the acquisition process itself, explicitly taking the density variations into account. To this end, we investigate two alternative strategies for estimating the acquisition density: a model-based and a direct empirical method. Experiments are performed on several challenging Lidar datasets, demonstrating the effectiveness of our approach in difficult scenarios with drastic variations in the sampling density.

2. Related work

The problem of 3D-point set registration is extensively pursued in computer vision. Registration methods can be coarsely categorized into local and global methods. Local methods rely on an initial estimate of the relative transformation, which is then iteratively refined. The typical example of a local method is the Iterative Closest Point (ICP) algorithm. In ICP, registration is performed by iteratively alternating between establishing point correspondences and refining the relative transformation. While the standard ICP [1] benefits from a low computational cost, it is limited by a narrow region of convergence. Several works [23, 21, 4] investigate how to improve the robustness of ICP.

Global methods instead aim at finding the global solution to the registration problem. Many global methods rely on local ICP-based or probabilistic methods and use, e.g., multiple restarts [17], graph optimization [24], or branch-and-bound [3] techniques to search for a globally optimal registration. Another line of research is to use feature descriptors to find point correspondences in a robust estimation framework, such as RANSAC [20]. Zhou et al. [27] also use feature correspondences, but minimize a Geman-McClure robust loss. A drawback of such global methods is the reliance on accurate geometric feature extraction.

Probabilistic registration methods model the distribution of points as a density function. These methods perform alignment either by employing a correlation based approach or by using an EM based optimization framework. In correlation based approaches [25, 15], the point sets are first modeled separately as density functions. The relative transformation between the point sets is then obtained by minimizing a metric or divergence between the densities. These methods lead to nonlinear optimization problems with non-convex constraints. Unlike correlation based methods, the EM based approaches [19, 10] find an ML-estimate of the density model and the transformation parameters.

Most methods implicitly assume a uniform density of the point clouds, which is hardly the case in most applications. The standard approach [22] to alleviate the problems of varying point density is to re-sample the point clouds in a separate preprocessing step. The aim of this strategy is to achieve an approximately uniform distribution of 3D points in the scene. A common method is to construct a voxel grid and take the mean point in each voxel. Comparable uniformity is achieved using the Farthest Point Strategy [8], where points are selected iteratively to maximize the distance to their neighbors. Geometrically Stable Sampling (GSS) [11] also incorporates surface normals in the sample selection process. However, such re-sampling methods have several shortcomings. First, 3D scene information is discarded as observations are grouped together or removed, leading to sparsification of the point cloud. Second, the sampling rate, e.g. the voxel size, needs to be hand picked for each scenario as it depends on the geometry and scale of the point cloud. Third, a suitable trade-off between uniformity and sparsity must be found. Thus, such preprocessing steps are complicated and their efficacy is questionable. In this paper, we instead explicitly model the density variations induced by the sensor.

There exist probabilistic registration methods that tackle the problem of non-uniform sampling density [2, 13]. In [2], a one-class support vector machine is trained for predicting the underlying density of partly occluded point sets. The point sets are then registered by minimizing the L2 distance between the density models. In [16], an extended EM framework for modeling noisy data points is derived, based on minimizing the KL divergence. This framework was later exploited for handling outliers in point set registration [13]. Unlike these methods, we introduce a latent distribution of the scene and explicitly model the point sampling density using either a sensor model or an empirical method.
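For reference, the voxel-grid re-sampling baseline discussed above is simple to state in code. The following is a minimal numpy sketch (function and variable names are ours, not from [22] or any particular library): every point is assigned to a cubic voxel of a hand-picked size, and each occupied voxel is replaced by the mean of its points.

```python
import numpy as np

def voxel_grid_resample(points, voxel_size):
    """Replace all points that fall into the same voxel by their mean point.

    points: (N, 3) array of 3D coordinates.  voxel_size: edge length of the cubic voxels.
    Returns an (M, 3) array with one averaged point per occupied voxel.
    """
    # Integer voxel index of every point.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    # Group points by voxel and accumulate sums and counts per group.
    _, inverse, counts = np.unique(voxel_idx, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]

# Example: a strongly anisotropic cloud reduced to a roughly uniform one.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(10000, 3)) * np.array([5.0, 5.0, 0.1])
print(voxel_grid_resample(cloud, voxel_size=0.5).shape)
```

The voxel size is exactly the scene-dependent parameter that has to be hand picked; the density-adaptive weighting introduced in section 3.4 avoids this preprocessing step altogether.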

3. Method

In this work, we revisit probabilistic point cloud registration, with the aim of alleviating the problem of non-uniform point density. To show the impact of our model, we employ the Joint Registration of Multiple Point Clouds (JRMPC) [10]. Compared to previous probabilistic methods, JRMPC has the advantage of enabling joint registration of multiple input point clouds. Furthermore, this framework was recently extended to use color [6], geometric feature descriptors [5] and incremental joint registration [9]. However, our approach can be applied to a variety of other probabilistic registration approaches. Next, we present an overview of the baseline JRMPC method.

3.1. Probabilistic Point Set Registration

Point set registration is the problem of finding the relative geometric transformations between M different sets of points. We directly consider the general case where M ≥ 2. Each set X_i = {x_{ij}}_{j=1}^{N_i}, i = 1, ..., M, consists of 3D-point observations x_{ij} ∈ R^3 obtained from, e.g., a Lidar scanner or an RGBD camera. We let capital letters X_{ij} denote the associated random variables for each observation. In general, probabilistic methods aim to model the probability densities p_{X_i}(x), for each point set i, using for instance Gaussian Mixture Models (GMMs).

Different from previous approaches, JRMPC derives the densities p_{X_i}(x) from a global probability density model p_V(v|θ), which is defined in a reference coordinate frame given parameters θ. The registration problem can then be formulated as finding the relative transformations from point set X_i to the reference frame. We let φ(·; ω) : R^3 → R^3 be a 3D transformation parametrized by ω ∈ R^D. The goal is then to find the parameters ω_i of the transformation from X_i to the reference frame, such that φ(X_{ij}; ω_i) ∼ p_V. Similarly to previous works [10, 6], we focus on the most common case of a rigid transformation φ(x; ω) = R_ω x + t_ω. In this case, the density model of each point set is obtained as p_{X_i}(x|ω_i, θ) = p_V(φ(x; ω_i)|θ).

The density p_V(v|θ) is composed of a mixture of Gaussian distributions,

    p_V(v \mid \theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(v; \mu_k, \Sigma_k) .   (1)

Here, N(v; μ, Σ) is a Gaussian density with mean μ and covariance Σ. The number of components is denoted by K and π_k is the prior weight of component k. The set of all mixture parameters is thus θ = {π_k, μ_k, Σ_k}_{k=1}^{K}.

Different from previous works, the mixture model parameters θ and the transformation parameters ω_i are inferred jointly in the JRMPC framework, assuming independent observations. This is achieved by maximizing the log-likelihood,

    \mathcal{L}(\Theta; \mathcal{X}_1, \ldots, \mathcal{X}_M) = \sum_{i=1}^{M} \sum_{j=1}^{N_i} \log\big( p_V(\phi(x_{ij}; \omega_i) \mid \theta) \big) .   (2)

Here, we denote the set of all parameters in the model as Θ = {θ, ω_1, ..., ω_M}. Inference is performed with the Expectation Maximization (EM) algorithm, by first introducing a latent variable Z ∈ {1, ..., K} that assigns a 3D-point V to a particular mixture component Z = k. The complete data likelihood is then given by p_{V,Z}(v, k|θ) = p_Z(k|θ) p_{V|Z}(v|k, θ), where p_Z(k|θ) = π_k and p_{V|Z}(v|k, θ) = N(v; μ_k, Σ_k). The original mixture model (1) is recovered by marginalizing the complete data likelihood over the latent variable Z.

The E-step in the EM algorithm involves computing the expected complete-data log likelihood,

    Q(\Theta; \Theta^n) = \sum_{i=1}^{M} \sum_{j=1}^{N_i} \mathbb{E}_{Z \mid x_{ij}, \Theta^n}\big[ \log\big( p_{V,Z}(\phi(x_{ij}; \omega_i), Z \mid \theta) \big) \big] .   (3)

Here, the conditional expectation is taken over the latent variable given the observed point x_{ij} and the current estimate of the model parameters Θ^n. In the M-step, the model parameters are updated as Θ^{n+1} = arg max_Θ Q(Θ; Θ^n). This process is then repeated until convergence.

3.2. Sampling Density Adaptive Model

To tackle the issues caused by non-uniform point densities, we revise the underlying formulation and model assumptions. Instead of modeling the density of 3D-points, we aim to infer a model of the actual 3D-structure of the scene. To this end, we introduce the latent probability distribution of the scene q_V(v). Loosely defined, it is seen as a uniform distribution on the observed surfaces in the scene. Intuitively, q_V(v) encodes all 3D-structure, i.e. walls, ground, objects etc., that is measured by the sensor. Different models of q_V(v) are discussed in section 3.4. Technically, q_V might not be absolutely continuous and is thus regarded as a probability measure. However, we will denote it as a density function to simplify the presentation.

Our goal is to model q_V(v) as a parametrized density function p_V(v|θ). We employ a GMM (1) and minimize the Kullback-Leibler (KL) divergence from p_V to q_V,

    \mathrm{KL}(q_V \,\|\, p_V) = \int \log\Big( \frac{q_V(v)}{p_V(v \mid \theta)} \Big) \, q_V(v) \, dv .   (4)

Utilizing the decomposition of the KL-divergence KL(q_V || p_V) = H(q_V, p_V) − H(q_V) into the cross entropy H(q_V, p_V) and the entropy H(q_V) of q_V, we can equivalently maximize,

    \mathcal{E}(\Theta) = -H(q_V, p_V) = \int \log\big( p_V(v \mid \theta) \big) \, q_V(v) \, dv .   (5)
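Both the ML objective (2) and the cross-entropy objective (5) require evaluating the mixture log density log p_V(·|θ) at transformed points. The following minimal numpy sketch (our own names, full covariances assumed stored as a (K, 3, 3) array; not the implementation of [10]) shows this evaluation and the resulting ML objective (2).

```python
import numpy as np

def log_gmm_density(v, pi, mu, cov):
    """log p_V(v | theta) of the GMM in eq. (1).  v: (N, 3), pi: (K,), mu: (K, 3), cov: (K, 3, 3)."""
    diff = v[:, None, :] - mu[None, :, :]                       # (N, K, 3) residuals v - mu_k
    sol = np.linalg.solve(cov[None], diff[..., None])[..., 0]   # Sigma_k^{-1} (v - mu_k)
    maha = np.einsum('nkd,nkd->nk', diff, sol)                  # squared Mahalanobis distances
    _, logdet = np.linalg.slogdet(cov)                          # log|Sigma_k|, shape (K,)
    log_comp = np.log(pi) - 0.5 * (3 * np.log(2 * np.pi) + logdet) - 0.5 * maha
    m = log_comp.max(axis=1, keepdims=True)                     # log-sum-exp over components
    return m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))

def ml_objective(point_sets, transforms, pi, mu, cov):
    """Eq. (2): sum of log model densities of all rigidly transformed observations."""
    return sum(log_gmm_density(X @ R.T + t, pi, mu, cov).sum()
               for X, (R, t) in zip(point_sets, transforms))

# Tiny demo with a random 3-component model and one already-aligned point set.
rng = np.random.default_rng(0)
mu = rng.normal(size=(3, 3)); cov = np.stack([np.eye(3) * 0.1] * 3); pi = np.full(3, 1 / 3)
X = mu[rng.integers(0, 3, size=200)] + 0.1 * rng.normal(size=(200, 3))
print(ml_objective([X], [(np.eye(3), np.zeros(3))], pi, mu, cov))
```

In the density-adaptive formulation derived next, these per-point log densities are simply re-weighted by the observation weighting function f_i(x_{ij}).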

In (5), the integration is performed in the reference frame of the scene. On the other hand, the 3D points x_{ij} are observed in the coordinate frames of the individual sensors. As in section 3.1, we relate these coordinate frames with the transformations φ(·; ω_i). By applying the change of variables v = φ(x; ω_i), we obtain

    \mathcal{E}(\Theta) = \frac{1}{M} \sum_{i=1}^{M} \int_{\mathbb{R}^3} \log\big( p_V(\phi(x; \omega_i) \mid \theta) \big) \, q_V(\phi(x; \omega_i)) \, |\det(D\phi(x; \omega_i))| \, dx .   (6)

Here, |det(Dφ(x; ω_i))| is the determinant of the Jacobian of the transformation. From now on, we assume rigid transformations, which implies |det(Dφ(x; ω_i))| = 1.

We note that if the x_{ij} are independent samples from q_V(φ(x; ω_i)), the original maximum likelihood formulation (2) is recovered as a Monte Carlo sampling of the objective (6). Therefore, the conventional ML formulation (2) relies on the assumption that the observed points x_{ij} follow the underlying uniform distribution of the scene q_V. However, this assumption completely neglects the effects of the acquisition sensor. Next, we address this problem by explicitly modeling the sampling process.

In our formulation, we consider the points in set X_i to be independent samples x_{ij} ∼ q_{X_i} of a distribution q_{X_i}(x). In addition to the 3D structure q_V of the scene, q_{X_i} can also depend on the position and properties of the sensor, and the inherent properties of the observed surfaces. This enables more realistic models of the sampling process to be employed. By assuming that the distribution q_V is absolutely continuous [7] w.r.t. q_{X_i}, eq. (6) can be written,

    \mathcal{E}(\Theta) = \sum_{i=1}^{M} \int_{\mathbb{R}^3} \log\big( p_V(\phi(x; \omega_i) \mid \theta) \big) \, \frac{q_V(\phi(x; \omega_i))}{q_{X_i}(x)} \, q_{X_i}(x) \, dx .   (7)

Here, we have also ignored the constant factor 1/M. The fraction f_i(x) = q_V(φ(x; ω_i)) / q_{X_i}(x) is known as the Radon-Nikodym derivative [7] of the probability distribution q_V(φ(x; ω_i)) with respect to q_{X_i}(x). Intuitively, f_i(x) is the ratio between the density in the latent scene distribution and the density of points in point cloud X_i. Since it weights the observed 3D-points based on the local density, we term it the observation weighting function. In section 3.4, we introduce two different approximations of f_i(x) to model the sampling process itself.

3.3. Inference

In this section, we describe the inference algorithm used to minimize (7). We show that the EM-based framework used in [10, 6] also generalizes to our model. As in section 3.1, we apply the latent variable Z and the complete-data likelihood p_{V,Z}(v, k|θ). We define the expected complete-data cross entropy as,

    \mathcal{Q}(\Theta, \Theta^n) = \sum_{i=1}^{M} \int_{\mathbb{R}^3} \mathbb{E}_{Z \mid x, \Theta^n}\big[ \log\big( p_{V,Z}(\phi(x; \omega_i), Z \mid \theta) \big) \big] \, f_i(x) \, q_{X_i}(x) \, dx .   (8)

Here, Θ^n is the current estimate of the parameters. The E-step involves evaluating the expectation in (8), taken over the probability distribution of the latent variable,

    p_{Z \mid X_i}(k \mid x, \Theta) = \frac{p_{X_i, Z}(x, k \mid \Theta)}{\sum_{l=1}^{K} p_{X_i, Z}(x, l \mid \Theta)} = \frac{\pi_k \, \mathcal{N}(\phi(x; \omega_i); \mu_k, \Sigma_k)}{\sum_{l=1}^{K} \pi_l \, \mathcal{N}(\phi(x; \omega_i); \mu_l, \Sigma_l)} .   (9)

To maximize (8) in the M-step, we first perform a Monte Carlo sampling of (8). Here we use the assumption that the observations are independent samples drawn as x_{ij} ∼ q_{X_i}. To simplify notation, we define α_{ijk}^n = p_{Z|X_i}(k|x_{ij}, Θ^n). Then (8) is approximated as,

    \mathcal{Q}(\Theta, \Theta^n) \approx Q(\Theta, \Theta^n) = \sum_{i=1}^{M} \frac{1}{N_i} \sum_{j=1}^{N_i} \sum_{k=1}^{K} \alpha_{ijk}^{n} \, f_i(x_{ij}) \, \log\big( p_{V,Z}(\phi(x_{ij}; \omega_i), k \mid \theta) \big) .   (10)

Please refer to the supplementary material for a detailed derivation of the EM procedure.

The key difference of (10) compared to the ML case (3) is the weight factor f_i(x_{ij}). This factor effectively weights each observation x_{ij} based on the local density of 3D points. Since the M-step has a form similar to (3), we can apply the optimization procedure proposed in [10]. Specifically, we employ two conditional maximization steps [18] to optimize over the mixture parameters θ and the transformation parameters ω_i, respectively. Furthermore, our approach can be extended to incorporate color information using the approach proposed in [6].
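The practical difference to the standard EM of [10] is thus confined to the per-point factor f_i(x_{ij}) in (10). The sketch below (our own names; components assumed isotropic, Σ_k = σ_k² I, for brevity) computes the responsibilities of eq. (9) and accumulates the f_i-weighted soft counts and first moments that would feed the conditional maximization steps; the closed-form parameter updates themselves follow [10] and the supplementary material and are omitted here.

```python
import numpy as np

def responsibilities(y, pi, mu, var):
    """alpha_jk = p(Z = k | y_j, Theta) of eq. (9) for isotropic components, y = phi(x; omega_i)."""
    d2 = ((y[:, None, :] - mu[None]) ** 2).sum(-1)                  # (N, K) squared distances
    log_comp = np.log(pi) - 1.5 * np.log(2 * np.pi * var) - 0.5 * d2 / var
    log_comp -= log_comp.max(axis=1, keepdims=True)                 # numerical stabilization
    w = np.exp(log_comp)
    return w / w.sum(axis=1, keepdims=True)

def weighted_soft_stats(point_sets, transforms, obs_weights, pi, mu, var):
    """Accumulate the f_i(x_ij)-weighted statistics entering the M-step of (10)."""
    K = mu.shape[0]
    mass, first_moment = np.zeros(K), np.zeros((K, 3))
    for X, (R, t), f in zip(point_sets, transforms, obs_weights):
        y = X @ R.T + t                                     # phi(x_ij; omega_i)
        a = responsibilities(y, pi, mu, var) * f[:, None]   # alpha_ijk * f_i(x_ij)
        a /= X.shape[0]                                     # the 1 / N_i factor of (10)
        mass += a.sum(axis=0)                               # weighted soft counts per component
        first_moment += a.T @ y                             # weighted sums of transformed points
    return mass, first_moment
```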

3.4. Observation Weights

We present two approaches for modeling the observation weight function f_i(x). The first is based on a sensor model, while the second is an empirical estimation of the density.

3.4.1 Sensor Model Based

Here, we estimate the sampling distribution q_{X_i} by modeling the acquisition sensor itself. For this method we therefore assume that the type of sensor (e.g. Lidar) is known and that each point set X_i consists of a single scan. The latent scene distribution q_V is modeled as a uniform distribution on the observed surfaces S. That is, S is a 2-dimensional manifold consisting of all observable surfaces. Thus, we define q_V(A) = (1/|S|) ∫_{S ∩ A} dS for any measurable set A ⊂ R^3. For simplicity, we use the same notation q_V(A) = P(V ∈ A) for the probability measure q_V of V. We use |S| = ∫_S dS to denote the total area of S.

We model the sampling distribution q_{X_i} based on the properties of a terrestrial Lidar. It can however be extended to other sensor geometries, such as time-of-flight cameras. We can without loss of generality assume that the Lidar is positioned at the origin x = 0 of the sensor-based reference frame in X_i. Further, let S_i = φ_i^{-1}(S) be the scene transformed to the reference frame of the sensor. Here, we use φ_i(x) = φ(x, ω_i) to simplify notation. We note that the density of Lidar rays decreases quadratically with distance. For this purpose, we model the Lidar as a light source emitting uniformly in all directions of its field of view. The sampling probability density at a visible point x ∈ S_i is then proportional to the absorbed intensity, calculated as n̂_x^T x̂ / ‖x‖². Here, n̂_x is the unit normal vector of S_i at x, ‖·‖ is the Euclidean norm and x̂ = x/‖x‖.

The sampling distribution is defined as the probability of observing a point in a subset A ⊂ R^3. It is obtained by integrating the point density over the part of the surface S_i intersecting A,

    q_{X_i}(A) = \int_{S_i \cap A} g_i \, dS_i , \qquad g_i(x) = \begin{cases} a \, \hat{n}_x^{T}\hat{x} / \|x\|^2 , & x \in S_i \cap F_i \\ \varepsilon , & \text{otherwise} \end{cases}   (11)

Here, F_i ⊂ R^3 is the observed subset of the scene, ε is the outlier density and a is a constant such that the probability integrates to 1. Using the properties of q_V, we can rewrite (11) as q_{X_i}(A) = ∫_A g_i d(q_V ∘ φ_i). Here, q_V ∘ φ_i is the composed measure q_V(φ_i(A)). From the properties of the Radon-Nikodym derivative [7], we obtain that f_i = d(q_V ∘ φ_i)/dq_{X_i} = 1/g_i. In practice, surface normal estimates can be noisy, thus promoting the use of a regularized quotient f_i(x) = ‖x‖² / (γ n̂_x^T x̂ + 1 − γ), for some fixed parameter γ ∈ [0, 1]. Note that the calculation of f_i(x) only requires information about the distance ‖x‖ to the sensor and the normal n̂_x of the point cloud at x. For details and derivations, see the supplementary material.

Figure 2. Visualization of the observation weight computed using our sensor based model (left) and empirical method (right). The 3D-points in the densely sampled regions in the vicinity of the Lidar are assigned low weights, while the impact of points in the sparser regions is increased. The two approaches produce visually similar results. The main differences are seen in the transitions from dense to sparser regions.
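A minimal sketch of the regularized sensor-model weight above, f_i(x) = ‖x‖² / (γ n̂_x^T x̂ + 1 − γ), evaluated in the sensor's own coordinate frame. Surface normals are assumed to be estimated beforehand (e.g. from local plane fits); the absolute value of the dot product is our own choice to make the expression insensitive to the sign convention of the normals, and all names are ours.

```python
import numpy as np

def sensor_model_weights(points, normals, gamma=0.9, eps=1e-12):
    """f_i(x) = ||x||^2 / (gamma * |n_x . x_hat| + 1 - gamma), with the Lidar at the origin x = 0.

    points:  (N, 3) coordinates in the sensor frame.
    normals: (N, 3) unit surface normals at the points.
    gamma:   regularization of noisy normals; gamma = 0 keeps only the distance term.
    """
    dist2 = (points ** 2).sum(axis=1)                     # ||x||^2: quadratic falloff of ray density
    xhat = points / np.sqrt(dist2 + eps)[:, None]         # viewing directions x / ||x||
    cos_incidence = np.abs((normals * xhat).sum(axis=1))  # |n_x^T x_hat|: surface orientation term
    return dist2 / (gamma * cos_incidence + 1.0 - gamma)
```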
3.4.2 Empirical Sample Model

As an alternative approach, we propose an empirical model of the sampling density. Unlike the sensor-based model in section 3.4.1, our empirical approach does not require any information about the sensor. It can thus be applied to arbitrary point clouds, without any prior knowledge. We modify the latent scene model q_V from section 3.4.1 to include a 1-dimensional Gaussian distribution in the normal direction of the surface S. This uncertainty in the normal direction models the coarseness or evenness of the surface, which leads to variations orthogonal to the underlying surface. In the local neighborhood of a point v̄ ∈ S, we can then approximate the latent scene distribution as a 1-dimensional Gaussian in the normal direction, q_V(v) ≈ (1/|S|) N(n̂_v̄^T (v − v̄); 0, σ_n̂²(v̄)). This is motivated by a locally planar approximation of the surface S at v̄, where q_V is constant in the tangent directions of S. Here, σ_n̂²(v̄) is the variance in the normal direction.

To estimate the observation weight function f(x) = q_V(φ(x))/q_X(x), we also find a local approximation of the sampling density q_X(x). For simplicity, we drop the point set index i in this section and assume a rigid transformation φ(x) = Rx + t. First, we extract the L nearest neighbors x_1, ..., x_L of the 3D point x in the point cloud. We then find the local mean x̄ = (1/L) Σ_l x_l and covariance C = (1/(L−1)) Σ_l (x_l − x̄)(x_l − x̄)^T. This yields the local sampling density estimate q_X(x) ≈ N(x; x̄, C). Let C = BDB^T be the eigenvalue decomposition of C, with B = (b̂_1, b̂_2, b̂_3) and D = diag(σ_1², σ_2², σ_3²), and eigenvalues sorted in descending order. Since we assume the points to originate from a locally planar region, we deduce that σ_1², σ_2² ≫ σ_3². Furthermore, b̂_3 and σ_3² approximate the normal direction of the surface and the variance in this direction. We utilize this information for estimating the local latent scene distribution, by setting v̄ = φ(x̄), n̂_v̄ = R b̂_3 and σ_n̂²(v̄) = σ_3². We then obtain,

    f(x) = \frac{q_V(\phi(x))}{q_X(x)} \propto \sigma_1 \sigma_2 \, \exp\!\Big( \tfrac{1}{2} (x - \bar{x})^T B \, \mathrm{diag}(\sigma_1^{-2}, \sigma_2^{-2}, 0) \, B^T (x - \bar{x}) \Big) .   (12)

Here, we have omitted proportionality constants independent of the point location x in f(x), since they do not influence the objective (7). A detailed derivation is provided in the supplementary material. In practice, we found f(x) ∝ σ_1 σ_2 to be a sufficiently good approximation, since σ_1^{-2}, σ_2^{-2} ≈ 0 and x̄ ≈ x.
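The empirical weight can be sketched in the same spirit: for every point, take its L nearest neighbors, form the local covariance, and use the product of the two largest standard deviations, following the f(x) ∝ σ₁σ₂ approximation above. The sketch uses scipy's KD-tree only for the neighbor search; the helper name and loop structure are ours.

```python
import numpy as np
from scipy.spatial import cKDTree

def empirical_weights(points, num_neighbors=10):
    """f(x) proportional to sigma_1 * sigma_2 of the local covariance of the L nearest neighbors."""
    tree = cKDTree(points)
    _, nn_idx = tree.query(points, k=num_neighbors)        # neighbor indices, self included
    weights = np.empty(len(points))
    for i, idx in enumerate(nn_idx):
        cov = np.cov(points[idx], rowvar=False)            # local 3x3 covariance C
        eig = np.sort(np.linalg.eigvalsh(cov))             # sigma_3^2 <= sigma_2^2 <= sigma_1^2
        weights[i] = np.sqrt(max(eig[2], 0.0) * max(eig[1], 0.0))  # sigma_1 * sigma_2
    return weights
```

As described in section 4.1, the weights are additionally median filtered over the same L = 10 neighborhoods and clipped at 8 times their mean before they are used in (10).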

Note that the observation weights f_i(x_{ij}) in (10) can be precomputed once for every registration. The added computational cost of the density adaptive registration method is therefore minimal, and in our experiments we only observed an increase in computational time of 2% compared to JRMPC. In figure 2, the observation weights f_i(x_{ij}) are visualized for both the sensor based model (left) and the empirical method (right).

4. Experiments

We integrate our sampling density adaptive model in the probabilistic framework JRMPC [10]. Furthermore, we evaluate our approach when using feature information, by integrating the model in the color based probabilistic method CPPSR [6].

First we perform a synthetic experiment to highlight the impact of sampling density variations on point set registration. Second, we perform quantitative and qualitative evaluations on two challenging Lidar scan datasets: Virtual Photo Sets [26] and the ETH TLS [24]. Further detailed results are presented in the supplementary material.

4.1. Experimental Details

Throughout the experiments we randomly generate ground-truth rotations and translations for all point sets. The point sets are initially transformed using this ground-truth. The resulting point sets are then used as input for all compared registration methods. For efficiency reasons we construct a random subset of 10k points for each scan in all the datasets. The experiments on the point sets from VPS and ETH TLS are conducted in two settings. First, we perform direct registration on the constructed point sets. Second, we evaluate all compared registration methods, except for our density adaptive model, on re-sampled point sets. The registration methods without density adaptation, however, are sensitive to the choice of re-sampling technique and sampling rate. In the supplementary material we provide an exhaustive evaluation of FPS [8], GSS [11] and voxel grid re-sampling at different sampling rates. We then extract the best performing re-sampling settings for each registration method and use them in the comparison as an empirical upper bound on performance.

Method naming: We evaluate two main variants of the density adaptive model. In the subsequent performance plots and tables, we denote our approach using the sensor model based observation weights in section 3.4.1 by DARS, and the approach using the empirical observation weights in section 3.4.2 by DARE.

Parameter settings: We use the same values for all the parameters that are shared between our methods and the two baselines, JRMPC and CPPSR. As in [10], we use a uniform mixture component to model the outliers. In our experiments, we set the outlier ratio to 0.005 and fix the spatial component weights π_k to uniform. In case of pairwise registration, we set the number of spatial components to K = 200. In the joint registration scenario, we set K = 300 for all methods to increase the capacity of the model for larger scenes. We use 50 EM iterations for both the pairwise and joint registration scenarios. In case of color features, we use 64 components as proposed in [6].

In addition to the above mentioned parameters, we use the L = 10 nearest neighbors to estimate σ_1 and σ_2 in section 3.4.2. To regularize the observation weights f_i(x_{ij}) (section 3.4) and remove outlier values, we first perform a median filtering using the same neighborhood size of L = 10 points. We then clip all the observation weights that exceed a certain threshold. We fix this threshold to 8 times the mean value of all observation weights within a point set. In the supplementary material we provide an analysis of these parameters and found our method not to be sensitive to the parameter values. For the sensor model approach (section 3.4.1) we set γ = 0.9. We keep all parameters fixed in all experiments and datasets.

Evaluation Criteria: The evaluation is performed by computing the angular error (i.e. the geodesic distance) between the estimated rotation, R, and the ground-truth rotation, R_gt. This distance is computed via the Frobenius distance d_F(R, R_gt), using the relation d_G(R_1, R_2) = 2 sin^{-1}(d_F(R_1, R_2)/√8), which is derived in [12]. To evaluate the performance in terms of robustness, we report the failure rate as the percentage of registrations with an angular error greater than 4 degrees. Further, we present the accuracy in terms of the mean angular error among inlier registrations. In the supplementary material we also provide the translation error.
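The evaluation protocol above is straightforward to reproduce. The sketch below (our own helper names) draws a random ground-truth rotation about a random axis via Rodrigues' formula and computes the geodesic angular error d_G(R_1, R_2) = 2 sin^{-1}(d_F(R_1, R_2)/√8) of an estimated rotation, from which the failure rate and the recall curves follow directly.

```python
import numpy as np

def random_rotation(max_angle_deg=90.0, rng=None):
    """Rotation by a random angle in [0, max_angle_deg] about a random axis (Rodrigues' formula)."""
    rng = np.random.default_rng() if rng is None else rng
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = np.deg2rad(rng.uniform(0.0, max_angle_deg))
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def angular_error_deg(R_est, R_gt):
    """Geodesic distance d_G = 2 * asin(d_F / sqrt(8)) between two rotations [12], in degrees."""
    d_frob = np.linalg.norm(R_est - R_gt)
    return np.rad2deg(2.0 * np.arcsin(np.clip(d_frob / np.sqrt(8.0), 0.0, 1.0)))

# Failure rate (error > 4 degrees) and mean inlier error over a batch of registrations.
errors = np.array([angular_error_deg(random_rotation(), np.eye(3)) for _ in range(100)])
failure_rate = 100.0 * np.mean(errors > 4.0)
mean_inlier_error = errors[errors <= 4.0].mean() if np.any(errors <= 4.0) else float('nan')
print(f"failure rate: {failure_rate:.1f}%  mean inlier error: {mean_inlier_error:.2f} deg")
```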

4.2. Synthetic Data

We first validate our approach on a synthetic dataset to isolate the impact of sampling density variations on pairwise registration. We construct synthetic point clouds by performing point sampling on a polygon mesh that simulates an indoor 3D scene (see figure 3, left). We first sample uniformly and densely. We then randomly select a virtual sensor location. Finally, we simulate Lidar sampling density variations by randomly removing points according to their distances to the sensor position (see figure 3, right). In total the synthetic dataset contains 500 point set pairs.

Figure 3. The synthetic 3D scene. Left: Rendering of the scene. Right: Top view of a re-sampled point set with varying density.

Figure 4a shows the recall curves, plotting the ratio of registrations with an angular error smaller than a threshold. We report results for the baseline JRMPC and our DARE method. We also report the results when using the ideal sensor sample model to compute the observation weights f_i(x_{ij}), called DAR-ideal. Note that the same sampling function was employed in the construction of the virtual scans. This method therefore corresponds to an upper performance bound of our DARE approach.

The baseline JRMPC model struggles in the presence of sampling density variations, providing inferior registration results with a failure rate of 85%. Note that JRMPC corresponds to setting the observation weights to uniform, f_i(x_{ij}) = 1, in our approach. The proposed DARE significantly improves the registration results by reducing the failure rate from 85% to 2%. Further, the registration performance of DARE closely follows the ideal sampling density model, demonstrating the ability of our approach to adapt to sampling density variations.

Figure 4. Recall curves with respect to the angular error. (a) Results on the synthetic dataset. Our DARE approach closely follows the upper bound, DAR-ideal. (b) Results on the combined VPS and TLS ETH datasets. In all cases, our DARE approach significantly improves over the baseline JRMPC [10].

4.3. Pairwise Registration

We perform pairwise registration experiments on the joint Virtual Photo Set (VPS) [26] and the TLS ETH [24] datasets. The VPS dataset consists of Lidar scans from two separate scenes, each containing four scans. The TLS ETH dataset consists of two separate scenes, with seven and five scans respectively. We randomly select pairs of different scans within each scene, resulting in a total of 3720 point set pairs. The ground-truth for each pair is generated by first randomly selecting a rotation axis. We then rotate one of the point sets with a rotation angle (within 0-90 degrees) around the rotation axis and apply a random translation, drawn from a multivariate Gaussian with standard deviation 1.0 meters in all directions.

Figure 4b shows pairwise registration comparisons in terms of angular error on the joint dataset. We compare the baseline JRMPC [10] with both of our sampling density models: DARE and DARS. We also show the results for DARS without using normals, i.e. setting γ = 0 in section 3.4.1, in the DARS-g0 curve. All three variants of our density adaptive approach significantly improve over the baseline JRMPC [10]. Further, our DARE model provides the best results. It significantly reduces the failure rate from 90.4% to 43.3%, compared to the JRMPC method.

We also compare our empirical density adaptive model with several existing methods in the literature. Table 1 shows the comparison of our approach with the JRMPC [10], ICP¹ [1], and CPD [19] methods. We present numerical values for the methods in terms of average inlier angular error and the failure rate. Additionally, we evaluate the existing methods using re-sampling. In the supplementary material we provide an evaluation of different re-sampling approaches at different sampling rates. For each of the methods JRMPC [10], ICP [1], and CPD [19], we select the best performing re-sampling approach and sampling rate. In practical applications however, such a comprehensive exploration of the re-sampling parameters is not feasible. In this experiment, the selected re-sampling settings serve as empirical upper bounds, denoted by -eub in the method names in table 1.

               Avg. inlier error (°)   Failure rate (%)
  JRMPC        2.44 ± 0.87             90.4
  JRMPC-eub    1.67 ± 0.92             46.0
  ICP          1.73 ± 1.04             62.6
  ICP-eub      1.81 ± 0.99             55.7
  CPD          1.88 ± 1.25             90.0
  CPD-eub      1.30 ± 0.95             40.8
  DARE         1.45 ± 0.89             43.3

Table 1. A comparison of our approach with existing methods in terms of average inlier angular error and failure rate for pairwise registration on the combined VPS and TLS ETH dataset. The methods with the additional -eub in the name are the empirical upper bounds using re-sampling. Our DARE method improves over the baseline JRMPC, regardless of re-sampling settings, both in terms of accuracy and robustness.

From table 1 we conclude that regardless of re-sampling approach, our DARE still outperforms JRMPC, both in terms of robustness and accuracy. The best performing method overall was the empirical upper bound for CPD with re-sampling. However, CPD is specifically designed for pairwise registration, while JRMPC and our approach also generalize to multi-view registration.

¹We use the built-in Matlab implementation of ICP.

4.4. Multi-view Registration

We evaluate our approach in a multi-view setting, by jointly registering all four point sets in the VPS indoor dataset.

Figure 5. Joint registration of the four point sets in the VPS indoor dataset. (a) CPPSR [6] only aligns the high density regions and neglects sparsely sampled 3D-structure. (b) Corresponding registration using our density adaptive model incorporating color information.

               Avg. inlier error (°)   Failure rate (%)
  CPPSR        2.57 ± 0.837            87.4
  CPPSR-eub    1.63 ± 0.807            20.9
  JRMPC        2.38 ± 1.01             92.1
  JRMPC-eub    2.13 ± 0.83             38.6
  DARE-color   1.26 ± 0.61             14.5
  DARE         1.84 ± 0.80             36.0

Table 2. A multi-view registration comparison of our density adaptive model with existing methods in terms of average inlier angular error and failure rate on the VPS indoor dataset. Methods with -eub in the name are empirical upper bounds. Our model provides improved results, both in terms of robustness and accuracy.

Figure 6. A multi-view registration comparison of our density adaptive model and existing methods, in terms of angular error on the VPS indoor dataset. Our model provides a lower failure rate compared to the baseline methods JRMPC and CPPSR, also in comparison to the empirical upper bound.

We follow a similar protocol as in the pairwise registration case (see supplementary material). In addition to the JRMPC, we also compare our color extension with the CPPSR approach of [6]. Table 2 and figure 6 show the multi-view registration results on the VPS indoor dataset. As in the pairwise scenario, the selected re-sampled versions are denoted by -eub in the method name. We use the same re-sampling settings for JRMPC and CPPSR as for JRMPC in the pairwise case. Both JRMPC and CPPSR have a significantly lower accuracy and a higher failure rate compared to our sampling density adaptive models. We further observe that re-sampling improves both JRMPC and CPPSR, however, not to the same extent as our density adaptive approach. Figure 5 shows a qualitative comparison between our color based approach and the CPPSR method [6]. In agreement with the pairwise scenario (see figure 1), CPPSR locks on to the high density regions, while our density adaptive approach successfully registers all scans, producing an accurate reconstruction of the scene. Further, we provide additional results on the VPS outdoor dataset in the supplementary material.

5. Conclusions

We investigate the problem of sampling density variations in probabilistic point set registration. Unlike previous works, we model both the underlying structure of the 3D scene and the acquisition process to obtain robustness to density variations. Further, we jointly infer the scene model and the transformation parameters by minimizing the KL divergence in an EM based framework. Experiments are performed on several challenging Lidar datasets. Our proposed approach successfully handles severe density variations commonly encountered in real-world applications.

Acknowledgements: This work was supported by the EU's Horizon 2020 Programme grant No 644839 (CENTAURO), CENIIT grant (18.14), and the VR grants: EMC2 (2014-6227), starting grant (2016-05543), and LCMM (2014-5928).

References

[1] P. J. Besl and N. D. McKay. A method for registration of 3-d shapes. PAMI, 14(2):239–256, 1992.
[2] D. Campbell and L. Petersson. An adaptive data representation for robust point-set registration and merging. In Proceedings of the IEEE International Conference on Computer Vision, pages 4292–4300, 2015.
[3] D. Campbell and L. Petersson. Gogma: Globally-optimal gaussian mixture alignment. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[4] D. Chetverikov, D. Stepanov, and P. Krsek. Robust euclidean alignment of 3d point sets: the trimmed iterative closest point algorithm. IMAVIS, 23(3):299–309, 2005.
[5] M. Danelljan, G. Meneghetti, F. Shahbaz Khan, and M. Felsberg. Aligning the dissimilar: A probabilistic method for feature-based point set registration. In ICPR, 2016.
[6] M. Danelljan, G. Meneghetti, F. Shahbaz Khan, and M. Felsberg. A probabilistic framework for color-based point set registration. In CVPR, 2016.
[7] R. Durrett. Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2010.
[8] Y. Eldar, M. Lindenbaum, M. Porat, and Y. Y. Zeevi. The farthest point strategy for progressive image sampling. TIP, 6(9):1305–1315, 1997.
[9] G. D. Evangelidis and R. Horaud. Joint alignment of multiple point sets with batch and incremental expectation-maximization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017. In press, Early Access.
[10] G. D. Evangelidis, D. Kounades-Bastian, R. Horaud, and E. Z. Psarakis. A generative model for the joint registration of multiple point sets. In European Conference on Computer Vision, pages 109–122. Springer, 2014.
[11] N. Gelfand, L. Ikemoto, S. Rusinkiewicz, and M. Levoy. Geometrically stable sampling for the icp algorithm. In 3DIM 2003, 2003.
[12] R. Hartley, J. Trumpf, Y. Dai, and H. Li. Rotation averaging. International Journal of Computer Vision, 103(3):267–305, July 2013.
[13] J. Hermans, D. Smeets, D. Vandermeulen, and P. Suetens. Robust point set registration using em-icp with information-theoretically optimal outlier handling. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 2465–2472. IEEE, 2011.
[14] R. Horaud, F. Forbes, M. Yguel, G. Dewaele, and J. Zhang. Rigid and articulated point registration with expectation conditional maximization. PAMI, 33(3):587–602, 2011.
[15] B. Jian and B. C. Vemuri. Robust point set registration using gaussian mixture models. PAMI, 33(8):1633–1645, 2011.
[16] L. J. Latecki, M. Sobel, and R. Lakaemper. New em derived from kullback-leibler divergence. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 267–276. ACM, 2006.
[17] J. P. Luck, C. Q. Little, and W. Hoff. Registration of range data using a hybrid simulated annealing and iterative closest point algorithm. In Proceedings of the 2000 IEEE International Conference on Robotics and Automation, ICRA 2000, April 24-28, 2000, San Francisco, CA, USA, pages 3739–3744, 2000.
[18] X.-L. Meng and D. B. Rubin. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika, 80(2):268–278, 1993.
[19] A. Myronenko and X. Song. Point set registration: Coherent point drift. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12):2262–2275, 2010.
[20] R. Raguram, J.-M. Frahm, and M. Pollefeys. A comparative analysis of ransac techniques leading to adaptive real-time random sample consensus. Computer Vision–ECCV 2008, pages 500–513, 2008.
[21] A. Rangarajan, H. Chui, and F. L. Bookstein. The softassign procrustes matching algorithm. In IPMI, 1997.
[22] R. B. Rusu and S. Cousins. 3d is here: Point cloud library (pcl). In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pages 1–4. IEEE, 2011.
[23] A. Segal, D. Hähnel, and S. Thrun. Generalized-icp. In RSS, 2009.
[24] P. W. Theiler, J. D. Wegner, and K. Schindler. Globally consistent registration of terrestrial laser scans via graph optimization. ISPRS Journal of Photogrammetry and Remote Sensing, 109:126–138, 2015.
[25] Y. Tsin and T. Kanade. A correlation-based approach to robust point set registration. In European Conference on Computer Vision, pages 558–569. Springer, 2004.
[26] J. Unger, A. Gardner, P. Larsson, and F. Banterle. Capturing reality for computer graphics applications. In Siggraph Asia Course, 2015.
[27] Q.-Y. Zhou, J. Park, and V. Koltun. Fast global registration. In European Conference on Computer Vision, pages 766–782. Springer, 2016.
