To cite this version:
Benjamin Petit, Jean-Denis Lesage, Clément Menier, Jérémie Allard, Jean-Sébastien Franco, Bruno Raffin, Edmond Boyer, François Faure. Multicamera Real-Time 3D Modeling for Telepresence and Remote Collaboration. International Journal of Digital Multimedia Broadcasting, Hindawi, 2010, Advances in 3DTV: Theory and Practice, Article ID 247108, 12 p. doi:10.1155/2010/247108. inria-00436467v2

HAL Id: inria-00436467
https://hal.inria.fr/inria-00436467v2
Submitted on 6 Sep 2010 (v2), last revised 18 Apr 2012 (v3)

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Hindawi Publishing Corporation
International Journal of Digital Multimedia Broadcasting
Volume 2010, Article ID 247108, 12 pages
doi:10.1155/2010/247108

Research Article

Multicamera Real-Time 3D Modeling for Telepresence and Remote Collaboration

Benjamin Petit,1 Jean-Denis Lesage,2 Clément Menier,3 Jérémie Allard,4 Jean-Sébastien Franco,5 Bruno Raffin,6 Edmond Boyer,7 and François Faure7

1 INRIA Grenoble, 655 avenue de l'Europe, 38330 Montbonnot Saint Martin, France
2 Université de Grenoble, LIG, 51 avenue Jean Kuntzmann, 38330 Montbonnot Saint Martin, France
3 4D View Solutions, 655 avenue de l'Europe, 38330 Montbonnot Saint Martin, France
4 INRIA Lille-Nord Europe, LIFL, Parc Scientifique de la Haute Borne, 59650 Villeneuve d'Ascq, France
5 Université Bordeaux, LaBRI, INRIA Sud-Ouest, 351 cours de la Libération, 33405 Talence, France
6 INRIA Grenoble, LIG, 51 avenue Jean Kuntzmann, 38330 Montbonnot Saint Martin, France
7 Université de Grenoble, LJK, INRIA Grenoble, 655 avenue de l'Europe, 38330 Montbonnot Saint Martin, France

Correspondence should be addressed to Benjamin Petit, [email protected]

Received 1 May 2009; Accepted 28 August 2009

Academic Editor: Xenophon Zabulis

Copyright © 2010 Benjamin Petit et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We present a multicamera real-time 3D modeling system that aims at enabling new immersive and interactive environments. This system, called Grimage, retrieves in real time a 3D mesh of the observed scene as well as the associated textures. This information enables a strong visual presence of the user in virtual worlds. The 3D shape information is also used to compute collisions and reaction forces with virtual objects, enforcing the mechanical presence of the user in the virtual world.
The innovation is a fully integrated system with both immersive and interactive capabilities. It embeds a parallel version of the EPVH modeling algorithm inside a distributed vision pipeline. It also adopts the hierarchical component approach of the FlowVR middleware to enforce software modularity and enable distributed executions. Results show high refresh rates and low latencies obtained by taking advantage of the I/O and computing resources of PC clusters. The applications we have developed demonstrate the quality of the visual and mechanical presence with a single platform and with a dual platform that allows telecollaboration.

1. Introduction

Teleimmersion is of central importance for the next generation of live and interactive 3DTV applications. It refers to the ability to embed persons at different locations into a shared virtual environment. In such environments, it is essential to provide users with a credible sense of 3D telepresence and interaction capabilities. Several technologies already offer 3D experiences of real scenes with 3D and sometimes free-viewpoint visualizations, for example, [1–4]. However, live 3D teleimmersion and interaction across remote sites is still a challenging goal. The main reason lies in the difficulty of building and transmitting models that carry enough information for such applications. This covers not only visual or transmission aspects but also the fact that such models need to feed 3D physical simulations, as required for interaction purposes. In this paper, we address these issues and propose a complete framework allowing the full-body presence of distant people in a single collaborative and interactive environment.

The interest of virtual immersive and collaborative environments arises in a large and diverse set of application domains, including interactive 3DTV broadcasting, video gaming, social networking, 3D teleconferencing, collaborative manipulation of CAD models for architectural and industrial processes, remote learning, training, and other collaborative tasks such as civil infrastructure or crisis management. Such environments strongly depend on their ability to build a virtualized representation of the scene of interest, for example, 3D models of users. Most existing systems use 2D representations obtained using mono-camera systems [5–7]. While giving a partially faithful representation of the user, they do not allow for natural interactions, including consistent visualization with occlusions, which require 3D descriptions. Other systems, more suitable for 3D virtual worlds, use avatars, as, for instance, massively multiplayer games analogous to Second Life. However, avatars only carry partial information about users, and although real-time motion capture environments can improve such models and allow for animation, avatars do not yet provide sufficiently realistic representations for teleimmersive purposes.

To improve the sense of presence and realism, models with both photometric and geometric information should be considered. They yield more realistic representations that include user appearances, motions, and even sometimes facial expressions. To obtain such 3D human models, multicamera systems are often considered. In addition to appearance, through photometric information, they can provide a hierarchy of geometric representations from 2D to 3D, including 2D and depth representations, multiple views, and full 3D geometry. 2D and depth representations are viewpoint dependent, and though they enable 3D visualization [8] and, to some extent, free-viewpoint visualization, they are still limited in that respect. Moreover, they are not designed for interactions, which usually require full shape information instead of partial and discrete representations. Multiple-view representations, that is, views from several viewpoints, overcome some of the limitations of 2D and depth representations. In particular, they increase the free-viewpoint capability when used with view interpolation techniques, for example, [3, 9, 10]. However, interpolated view quality rapidly decreases when new viewpoints distant from the original viewpoints are considered. And similarly to 2D and depth representations, only limited interactions can be expected. In contrast, full 3D geometry descriptions allow unconstrained free viewpoints and interactions as they carry more information. They are already used for teleimmersion [2, 4, 11, 12]. Nevertheless, existing 3D human representations in real-time systems often have limitations

being computation tasks. The component hierarchy offers a high level of modularity, simplifying the maintenance and upgrade of the system. The actual degree of parallelism and the mapping of tasks on the nodes of the target architecture are inferred during a preprocessing phase from simple data such as the list of available cameras. The runtime environment transparently takes care of all data transfers between tasks, whether they are on the same node or not. Embedding the EPVH algorithm in a parallel framework makes it possible to reach interactive execution times without sacrificing accuracy. Based on this system, we developed several experiments involving one or two modeling platforms.

In the following, we detail the full pipeline, starting with the acquisition steps in Section 2 and the parallel EPVH algorithm in Section 3, followed by the textured-model rendering and the mechanical interactions in Section 4. A collaborative setup between two 3D-modeling platforms is detailed in Section 6. Section 7 presents a few experiments and the associated performance results, before concluding in Section 8.

2. A Multicamera Acquisition System

To generate real-time 3D content, we first need to acquire 2D information. For that purpose we have built an acquisition space surrounded by a multicamera vision system. This section focuses on the technical characteristics needed to obtain an image stream from multiple cameras and to transform it into suitable information for the 3D-modeling step, that is, calibrated silhouettes.

2.1. Image Acquisition. As described previously, the 3D-modeling method we use is based on images. We thus need to acquire video streams from digital cameras. Today, digital cameras are commodity components, available from low-cost webcams to high-end 3-CCD cameras. Images provided by current webcams proved to be of insufficient quality (low
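The acquisition stage above turns raw camera images into the calibrated silhouettes that feed the 3D-modeling step. The paper does not show extraction code; as a rough sketch of the idea, the simplest approach is background subtraction: compare each frame against a pre-recorded background image and keep the pixels that deviate beyond a threshold. The function name, array layout, and threshold below are our own illustrative choices, not part of the Grimage system, which uses a more robust extraction.

```python
import numpy as np

def extract_silhouette(frame: np.ndarray, background: np.ndarray,
                       threshold: float = 30.0) -> np.ndarray:
    """Return a boolean silhouette mask: True where `frame` differs
    from the pre-recorded `background` by more than `threshold`.

    Both inputs are H x W x 3 uint8 RGB arrays. Real systems use far
    more robust models (statistical backgrounds, chroma keying); this
    is only the simplest per-pixel difference test.
    """
    # Promote to signed ints so the subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    # A pixel is foreground if any color channel deviates strongly.
    return diff.max(axis=2) > threshold

# Toy usage: a uniform gray background with a bright square "user".
background = np.full((8, 8, 3), 100, dtype=np.uint8)
frame = background.copy()
frame[2:6, 2:6] = 220          # the foreground object
mask = extract_silhouette(frame, background)
```

In a multicamera rig like the one described here, one such mask would be computed per camera per frame, then paired with that camera's calibration before being handed to the modeling step.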
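The full 3D geometry discussed in the introduction is the visual hull of the per-camera silhouettes; the paper's EPVH algorithm computes it as an exact polyhedral mesh (detailed in Section 3). Purely to illustrate the underlying silhouette-consistency test, the following sketch carves a set of candidate 3D points instead: a point is kept only if it projects inside every camera's silhouette. The function names, the orthographic toy cameras, and the callable-projector interface are our own simplifications, not the EPVH method.

```python
import numpy as np

def carve_visual_hull(silhouettes, projectors, points):
    """Keep the 3D points whose projection lies inside every silhouette.

    silhouettes: list of boolean H x W masks, one per camera.
    projectors:  list of callables mapping an (N, 3) point array to
                 (N, 2) integer pixel coordinates (row, col).
    points:      (N, 3) array of candidate 3D points (e.g. voxel centers).
    Returns a length-N boolean array: True = inside the visual hull.
    """
    inside = np.ones(len(points), dtype=bool)
    for mask, project in zip(silhouettes, projectors):
        rc = project(points)
        h, w = mask.shape
        # Points projecting outside the image cannot be in the hull.
        valid = ((rc[:, 0] >= 0) & (rc[:, 0] < h) &
                 (rc[:, 1] >= 0) & (rc[:, 1] < w))
        inside &= valid
        inside[valid] &= mask[rc[valid, 0], rc[valid, 1]]
    return inside

# Toy scene: two orthographic cameras sharing one square silhouette.
mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 2:6] = True
cams = [lambda p: p[:, :2].astype(int),   # camera looking down the z axis
        lambda p: p[:, 1:].astype(int)]   # camera looking down the x axis
pts = np.array([[3.0, 3.0, 3.0],          # inside both silhouettes
                [8.0, 3.0, 3.0]])         # outside the first one
hull = carve_visual_hull([mask, mask], cams, pts)
```

Discrete carving like this trades accuracy for simplicity; EPVH instead intersects the silhouette cones exactly, which is what lets the system keep watertight meshes at interactive rates.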