Virtual Rephotography: Novel View Prediction Error for 3D Reconstruction
MICHAEL WAECHTER, MATE BELJAN, SIMON FUHRMANN, and NILS MOEHRLE, Technische Universität Darmstadt
JOHANNES KOPF, Facebook (part of this work was done while Johannes Kopf was with Microsoft Research)
MICHAEL GOESELE, Technische Universität Darmstadt

The ultimate goal of many image-based modeling systems is to render photo-realistic novel views of a scene without visible artifacts. Existing evaluation metrics and benchmarks focus mainly on the geometric accuracy of the reconstructed model, which is, however, a poor predictor of visual accuracy. Furthermore, geometric accuracy by itself does not allow evaluating systems that either lack a geometric scene representation or utilize coarse proxy geometry, such as light fields and most image-based rendering systems. We propose a unified evaluation approach based on novel view prediction error that is able to analyze the visual quality of any method that can render novel views from input images. One key advantage of this approach is that it does not require ground truth geometry, which dramatically simplifies the creation of test datasets and benchmarks. It also allows us to evaluate the quality of an unknown scene during the acquisition and reconstruction process, which is useful for acquisition planning. We evaluate our approach on a range of methods, including standard geometry-plus-texture pipelines as well as image-based rendering techniques, compare it to existing geometry-based benchmarks, demonstrate its utility for a range of use cases, and present a new virtual rephotography-based benchmark for image-based modeling and rendering systems.

Fig. 1. Castle Ruin with different 3D reconstruction representations: (a) mesh with vertex colors, rephoto error 0.47; (b) mesh with texture, rephoto error 0.29; (c) point cloud splatting, rephoto error 0.51; (d) image-based rendering (IBR), rephoto error 0.62. Geometry-based evaluation methods [Seitz et al. 2006; Strecha et al. 2008; Aanæs et al. 2016] cannot distinguish (a) from (b) as both have the same geometry. While the splatted point cloud (c) could in principle be evaluated with these methods, the IBR solution (d) cannot be evaluated at all.

CCS Concepts: • Computing methodologies → Reconstruction; Image-based rendering

Additional Key Words and Phrases: image-based rendering, image-based modeling, 3D reconstruction, multi-view stereo, novel view prediction error

ACM Reference Format:
Michael Waechter, Mate Beljan, Simon Fuhrmann, Nils Moehrle, Johannes Kopf, and Michael Goesele. 2017. Virtual rephotography: Novel view prediction error for 3D reconstruction. ACM Trans. Graph. 36, 1, Article 8 (January 2017), 11 pages. DOI: http://dx.doi.org/10.1145/2999533

1. INTRODUCTION

Intense research in the computer vision and computer graphics communities has led to a wealth of image-based modeling and rendering systems that take images as input, construct a model of the scene, and then create photo-realistic renderings for novel viewpoints. The computer vision community contributed tools such as structure from motion and multi-view stereo to acquire geometric models that can subsequently be textured. The computer graphics community proposed various geometry- or image-based rendering systems. Some of these, such as the Lumigraph [Gortler et al. 1996], synthesize novel views directly from the input images (plus a rough geometry approximation), producing photo-realistic results without relying on a detailed geometric model of the scene.
Even though remarkable progress has been made in the area of modeling and rendering of real scenes, a wide range of issues remain, especially when dealing with complex datasets under uncontrolled conditions. In order to measure and track the progress of this ongoing research, it is essential to perform objective evaluations.

Existing evaluation efforts focus on systems that acquire mesh models. They compare the reconstructed meshes with ground-truth geometry and evaluate measures such as geometric completeness and accuracy [Seitz et al. 2006; Strecha et al. 2008; Aanæs et al. 2016]. This approach falls short in several regards: First, only scenes with available ground-truth models can be analyzed. Ground-truth models are typically not available for large-scale reconstruction projects outside the lab such as PhotoCity [Tuite et al. 2011], but there is nevertheless a need to evaluate reconstruction quality and identify problematic scene parts. Second, evaluating representations other than meshes is problematic: Point clouds can only be partially evaluated (only reconstruction accuracy can be measured, not completeness), and image-based rendering representations or light fields [Levoy and Hanrahan 1996] cannot be evaluated at all. Third, geometric evaluation fails to measure properties that are complementary to geometric accuracy. While there are a few applications where only geometric accuracy matters (e.g., reverse engineering or 3D printing), most applications that produce renderings for human consumption are arguably more concerned with visual quality. This is, for instance, the main focus in image-based rendering, where the geometric proxy does not necessarily have to be accurate and only the visual accuracy of the resulting renderings matters.

If we consider, e.g., multi-view stereo reconstruction pipelines, for which geometric evaluations such as the common multi-view benchmarks [Seitz et al. 2006; Strecha et al. 2008; Aanæs et al. 2016] were designed, we can easily see that visual accuracy is complementary to geometric accuracy. The two measures are intrinsically related, since errors in the reconstructed geometry tend to be visible in renderings of the model, but they are not identical: The reconstructed model in Figure 2(a) is strongly fragmented while the model in 2(b) is not. Thus, the two renderings exhibit very different visual quality, but their similar geometric error does not reflect this. Contrarily, Figures 2(c) and 2(d) have very different geometric errors despite similar visual quality. Close inspection shows that 2(d) has a higher geometric error because its geometry is too smooth, which the texture hides, at least from the viewpoints of the renderings. In both cases, similarity of geometric error is a poor predictor for similarity of visual quality, clearly demonstrating that the purely distance-based Middlebury evaluation is by design unable to capture certain aspects of 3D reconstructions. Thus, a new methodology that evaluates visual reconstruction quality is needed.

Fig. 2. Four submissions to the Middlebury benchmark to which we added texture [Waechter et al. 2014]: (a) Merrell Confidence [Merrell et al. 2007], geom. error 0.83 mm, completeness 0.88; (b) Generalized-SSD [Calakli et al. 2012], geom. error 0.81 mm, completeness 0.96; (c) Campbell [2008], geom. error 0.48 mm, completeness 0.99; (d) Hongxing [2010], geom. error 0.79 mm, completeness 0.96. (a) and (b) have similar geometric error and different visual quality; (c) and (d) have different geometric error and similar visual quality.
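To make the contrast concrete, the following is a minimal sketch of the kind of purely distance-based evaluation these benchmarks perform, assuming both surfaces are given as dense 3D point samples. The function name, the sampling, the 90th-percentile accuracy statistic, and the 1.25 mm completeness threshold are illustrative choices echoing common benchmark conventions, not the benchmarks' actual implementations.

```python
import numpy as np
from scipy.spatial import cKDTree

def geometric_accuracy_completeness(recon_pts, gt_pts, threshold=0.00125):
    """Distance-based evaluation in the spirit of the multi-view
    benchmarks. recon_pts and gt_pts are (N, 3) point samples of the
    reconstructed and ground-truth surfaces (assumed here in meters,
    so threshold=0.00125 corresponds to 1.25 mm).
    """
    # Accuracy: the distance d such that 90% of the reconstruction
    # lies within d of the ground-truth surface (lower is better).
    dist_to_gt, _ = cKDTree(gt_pts).query(recon_pts)
    accuracy = np.percentile(dist_to_gt, 90)

    # Completeness: the fraction of the ground truth that lies within
    # the threshold of the reconstruction (higher is better).
    dist_to_recon, _ = cKDTree(recon_pts).query(gt_pts)
    completeness = float(np.mean(dist_to_recon < threshold))

    return accuracy, completeness
```

Nothing in this computation depends on the model's appearance: the vertex-colored and textured meshes of Figures 1(a) and 1(b) would receive identical scores, which is exactly the blind spot discussed above.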
Most 3D reconstruction pipelines are modular: Typically, structure from motion (SfM) is followed by dense reconstruction, texturing, and rendering (in image-based rendering the latter two steps are combined). While each of these steps can be evaluated individually, our proposed approach is holistic and evaluates the complete pipeline, including the rendering step, by scoring the visual quality of the final renderings. This is more consistent with the way humans would assess quality: They are not concerned with the quality of intermediate representations (e.g., a 3D mesh model), but instead consider 2D projections, i.e., renderings of the final model from different viewpoints.

Leclerc et al. [2000] pointed out that in the real world, inferences that people make from one viewpoint are consistent with observations from other viewpoints. Consequently, good models of real world scenes must be self-consistent as well. We exploit this self-consistency property as follows: We divide the set of captured images into a training set and an evaluation set. We then reconstruct the training set with an image-based modeling pipeline, render novel views with the camera parameters of the evaluation images, and compare those renderings with the evaluation images using selected image difference metrics (a code sketch of this protocol follows at the end of this section).

This can be seen as a machine learning view on 3D reconstruction: Algorithms infer a model of the world based on training images and make predictions for evaluation images. For systems whose purpose is the production of realistic renderings, our approach is the most natural evaluation scheme because it directly evaluates the output instead of internal models. In fact, our approach encourages future reconstruction techniques to take photo-realistic renderability into account. This line of thought is also advocated by Shan et al.'s Visual Turing Test [2013] and Vanhoey et al.'s appearance-preserving mesh simplification [2015].

Our work draws inspiration from Szeliski [1999], who proposed to use intermediate frames for optical flow evaluation. We extend this into a complete evaluation paradigm which is able to handle a diverse set of image-based modeling and rendering methods.
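The protocol reduces to a short loop. The sketch below assumes hypothetical reconstruct and render callables standing in for an arbitrary image-based modeling and rendering pipeline, and substitutes a plain mean absolute pixel difference for the selected image difference metrics; it illustrates the train/eval split rather than reproducing the paper's implementation.

```python
import numpy as np

def rephoto_error(images, cameras, reconstruct, render, eval_every=10):
    """Novel view prediction error via a train/eval split.

    images:  list of captured photographs (uint8 arrays of equal size)
    cameras: their registered camera parameters
    reconstruct, render: hypothetical stand-ins for any pipeline that
    can build a model from posed images and render it from a camera.
    """
    # Hold out every eval_every-th view; reconstruct from the rest.
    eval_ids = set(range(0, len(images), eval_every))
    train = [(img, cam) for i, (img, cam) in enumerate(zip(images, cameras))
             if i not in eval_ids]
    model = reconstruct(train)  # the model never sees the held-out views

    errors = []
    for i in sorted(eval_ids):
        # Virtual rephoto: render the model from the held-out viewpoint...
        rephoto = render(model, cameras[i])
        # ...and score it against the held-out photograph. A mean absolute
        # pixel difference stands in for the selected image difference metrics.
        diff = np.abs(rephoto.astype(np.float64) - images[i].astype(np.float64))
        errors.append(diff.mean() / 255.0)

    return float(np.mean(errors))
```

A lower score means the reconstruction predicts unseen views better; note that no ground-truth geometry enters the computation anywhere.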