Reconstruction and Recommendation of Realistic 3D Models Using cGANs

DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2018

MÓNICA VILLANUEVA AYLAGAS

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Master in Machine Learning
Date: June 15, 2018
Supervisors: Hedvig Kjellström and Mario Romero Vega
Examiner: Danica Kragic Jensfelt
Swedish title: Rekonstruktion och rekommendation av realistiska 3D-modeller som använder cGANs

Abstract

Three-dimensional modeling is the process of creating a representation of a surface or object in three dimensions via specialized software, where the modeler scans a real-world object into a point cloud, creates a completely new surface or edits the selected representation. This process can be challenging due to factors like the complexity of the 3D creation software or the number of dimensions in play. This work proposes a framework that recommends three types of reconstructions of an incomplete or rough 3D model using Generative Adversarial Networks (GANs). These reconstructions follow the distribution of real data, resemble the user model, and stay close to the dataset while keeping features of the input, respectively. The main advantage of this approach is that the GAN accepts 3D models as input instead of latent vectors, which removes the need to train an extra network to project the model into the latent space. The systems are evaluated both quantitatively and qualitatively. The quantitative measure relies on the Intersection over Union (IoU) metric, while the qualitative evaluation is based on a user study.
Experiments show that it is hard to create a system that generates realistic models following the distribution of the dataset, since users have different opinions on what is realistic. However, similarity between the user input and the reconstruction is well achieved and is, in fact, the feature most valued by modelers.

Sammanfattning (Summary)

Three-dimensional modeling is the process of creating a representation of a surface or an object in three dimensions via specialized software in which the modeler scans a real object into a point cloud, creates an entirely new surface or edits the selected representation. This process can be challenging because of factors such as the complexity of the 3D creation software or the number of dimensions in play. This work proposes a framework that recommends three types of reconstructions of an incomplete or rough 3D model using Generative Adversarial Networks (GANs). These reconstructions follow the distribution of real data, resemble the user model, and stay close to the dataset while keeping features of the input, respectively. The main advantage of this approach is that the GAN accepts 3D models as input instead of latent vectors, which removes the need to train an extra network to project the model into the latent space. The systems are evaluated both quantitatively and qualitatively. The quantitative measure relies on the Intersection over Union (IoU) metric, while the qualitative evaluation is based on a user study. Experiments show that it is hard to create a system that generates realistic models following the distribution of the dataset, since users have different opinions on what is realistic. Similarity between the user input and the reconstruction is well achieved and is, in fact, the feature most valued by modelers.

Contents

1 Introduction
  1.1 Research question
  1.2 Motivation
  1.3 Delimitations
  1.4 Societal, ethical and sustainability aspects
  1.5 Outline of the Master Thesis
2 Background and related work
  2.1 Background
    2.1.1 Generative models
    2.1.2 3D models
  2.2 Related work
    2.2.1 2D
    2.2.2 3D
    2.2.3 User studies
3 Method
  3.1 Data
    3.1.1 Format
    3.1.2 Noise functions
  3.2 GANs
    3.2.1 Network architectures
    3.2.2 Objective function
  3.3 Distance functions
  3.4 Recommendation system
  3.5 Evaluation
    3.5.1 Quantitative: Distance measure
    3.5.2 Qualitative: User study
  3.6 Hardware description
4 Experiments and results
  4.1 Distance functions
  4.2 Noise generalization
  4.3 Discriminator strength
  4.4 Recommendation system
    4.4.1 Realistic system
    4.4.2 Balanced system
    4.4.3 Similar system
    4.4.4 System comparative
  4.5 User models
  4.6 Qualitative evaluation: User study
    4.6.1 Population statistics
    4.6.2 Data preprocessing and analysis
    4.6.3 Realism experiment
    4.6.4 Similarity experiment
    4.6.5 Preference experiment
5 Discussion and conclusions
  5.1 Achievements
  5.2 Future work
Bibliography
A Complete list of noise functions
  A.1 Unstructured noise
  A.2 Structured noise
B Architecture
C User study
  C.1 Realism experiment
  C.2 Similarity experiment
  C.3 Preference experiment
  C.4 Exit survey
D Additional resources

Chapter 1 Introduction

Three-dimensional modeling is the process of creating a representation of a surface or object in three dimensions via specialized software in which the modeler can create and edit the representation. Another way of creating the surfaces is to scan real-world objects into a point cloud. There are multiple 3D computer graphics software packages for creating 3D models, each with its own characteristics, tools and render engines.
Figure 1.1 shows the User Interface (UI) of two modeling packages: Blender, as an example of open source software (GNU GPLv2+ licence), and Autodesk Maya, as an example of commercial software.

Figure 1.1: User interfaces of different 3D modeling software. (a) Blender interface. (b) Maya interface.

Currently, 3D modeling is difficult to master. Not everyone can successfully reproduce what they see, and even fewer can reproduce what they imagine. This can be the result of personal limitations, the complexity of working with multiple dimensions, or the intricacy of the 3D modeling software.

Movies, video games, and even virtual and mixed reality apps surround us, with an increasing need for realistic models. Experienced modelers can benefit from a tool that helps them speed up content creation. Furthermore, with the popularization of 3D printers for everyday use, even beginners would be able to create their own natural-looking models.

The field of Computer Graphics is not the only one benefiting from advances in the generation of 3D models. Many robotic applications use 3D models to solve problems like interacting with objects. The area of medical imaging also employs 3D models for segmentation of cancer or injuries.

The increase in computational power is boosting research in Deep Learning which, in synergy with generative models, is increasing the amount and quality of 3D Computer-Aided Designs (CADs). This data enhancement is, in turn, improving the learning processes and helping achieve better models, adding value to the Machine Learning pipeline.

The aim of this Master Thesis is the design and development of an end-to-end recommendation system for 3D models that uses GANs to generate novel reconstructions from a user input. To the best of the author's knowledge, no recommendation system is included in any 3D modeling software, nor has the idea been researched so far.
The decision to use GANs to solve this problem is supported by the preference for reconstructing novel objects and by the fact that this method is the state of the art in generation, as revealed by the literature study in Section 2. The full motivation behind this work is outlined in Section 1.2.

1.1 Research question

This work addresses the following research question:

What are the benefits and limitations of using conditional Generative Adversarial Nets to reconstruct unpolished voxelized models and recommend plausible alternatives?

The reconstructions are assessed quantitatively using Intersection over Union. The users' perception is evaluated through a user study that uses forced pair-wise comparison to measure both the level of realism and the similarity to the model entered by the user.

1.2 Motivation

The ambition behind this work is the creation of a complete end-to-end system that reconstructs 3D models designed by users; in other words, a system that helps bring a preliminary 3D sketch closer to the final result the user intended when modeling the sketch.

The reconstructions are guided by three different similarity measures, which make the output follow the distribution of natural 3D models, look like the user model, or share features of both natural 3D CADs and the sketch created by the user. This makes it possible to build a system that uses the reconstructions as recommendations for unfinished or crude models, comparable to predictive text on mobile phones.

No previous work has been found that uses end-to-end generation of 3D models to build a recommendation system. The main difference from the closest related work [19] is the lack of an additional network aside from the GAN. As explained in Section 2.2.2, Liu, Yu, and Funkhouser [19] project the 3D model into the manifold to obtain a latent vector that is later used as input for the GAN.
In this Master Thesis, the 3D models are fed directly to the Generative Adversarial Network. Moreover, the system of Liu, Yu, and Funkhouser [19] is not meant as a recommendation system, but as a tool that improves the current 3D model through iterative interaction with the user.
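
The Intersection over Union (IoU) measure named in the research question can be illustrated with a minimal sketch. It assumes that a voxelized model is represented as a set of occupied (x, y, z) cells; this representation, the function name `voxel_iou`, the cube example and the convention for two empty models are illustrative assumptions, not the data format or implementation used in this thesis.

```python
from itertools import product


def voxel_iou(a, b):
    """Intersection over Union of two sets of occupied voxel coordinates."""
    union = a | b
    if not union:
        # Assumption: two empty models are treated as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical example: a solid 4x4x4 cube and a rough user version of it
# that is missing one 4x4 layer of voxels.
target = set(product(range(4), repeat=3))
rough = {(x, y, z) for (x, y, z) in target if z < 3}

print(voxel_iou(rough, target))  # 48 shared voxels / 64 in the union = 0.75
```

A value of 1 means the two models occupy exactly the same voxels, and a value of 0 means they share none, so a reconstruction that fills in the missing layer would raise the score from 0.75 towards 1.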
