Telepresence System Based on Simulated Holographic Display

Diana-Margarita Córdova-Esparza¹, Juan R. Terven², Hugo Jiménez-Hernández³, Ana Herrera-Navarro¹, Alberto Vázquez-Cervantes³, Juan-M. García-Huerta³

¹ UAQ, Universidad Autónoma de Querétaro   ² AiFi Inc.   ³ CIDESI

Abstract

We present a telepresence system based on a custom-made simulated holographic display that produces a full 3D model of the remote participants using commodity depth sensors. Our display is composed of a video projector and a quadrangular acrylic pyramid that allows the user to experience an omnidirectional visualization of a remote person without the need for head-mounted displays. To obtain a precise representation of the participants, we fuse together multiple views extracted using a deep background subtraction method. Our system represents an attempt to democratize high-fidelity 3D telepresence using off-the-shelf components.

1 Introduction

Telepresence is the process of reproducing visual, auditory, or perceptual information in a remote location. With this technology, business transportation costs, estimated at 1.3 trillion dollars in 2016 in the US alone and predicted to rise to 1.6 trillion by 2020 [1], can be reduced significantly. Arthur C. Clarke's 1974 prediction about computers has become a reality [2]: "They will make it possible to live anywhere we like. Any businessman, any executive, could live almost anywhere on Earth and still do his business through a device like this."

Although this is true thanks to the Internet, current teleconference technology is still far from conveying the actual feeling of physical co-presence. Applications such as Skype, FaceTime, and GoToMeeting are limited in the sense that they provide only a 2D view of the participants displayed on a flat screen. One solution to this limitation is the use of holographic displays.

A holographic display is a technology that reconstructs light wavefronts using the diffraction of coherent light sources [3]. This kind of display can create images with 3D optical effects without the need for additional devices such as glasses or head-mounted displays [4]. However, building true holographic displays is costly and requires specialized hardware. For these reasons, there have been many attempts to create simulated holographic displays [5, 6, 7, 8, 9]. This refers to using more conventional 3D displays that rely on stereoscopic vision and motion parallax reprojection to approximate the visual cues provided inherently by holographic images [3].

In this work, we introduce a real-time, full 3D telepresence system that uses an array of depth and color cameras (RGB-D) and simulated holographic projection. We extend our existing multi-camera system [10] with the ability to precisely segment the foreground objects and project them onto a custom-made fake-holographic display using a commodity projector (Figure 1).
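To make the display concept concrete: in a four-sided Pepper's-ghost-style pyramid, a single projector image is split into four views whose quadrants reflect off the four acrylic faces. The sketch below is our illustration of one possible layout, not the authors' code; the function name, face size, and rotations are assumptions that would need adjusting to the actual pyramid orientation and projector mounting.

```python
# Minimal sketch: compose four views of the remote participant into the
# cross-shaped frame commonly projected onto a four-sided acrylic pyramid.
# Each view is rotated so its reflection appears upright from that side.
import numpy as np
import cv2  # OpenCV, used here only for image rotation

def pyramid_layout(front, right, back, left, size=480):
    """front/right/back/left: size x size BGR views. Returns the 3*size
    square canvas that the overhead projector casts onto the pyramid apex."""
    canvas = np.zeros((3 * size, 3 * size, 3), dtype=np.uint8)
    canvas[2 * size:, size:2 * size] = front                      # bottom
    canvas[:size, size:2 * size] = cv2.rotate(back, cv2.ROTATE_180)
    canvas[size:2 * size, :size] = cv2.rotate(left, cv2.ROTATE_90_CLOCKWISE)
    canvas[size:2 * size, 2 * size:] = cv2.rotate(right, cv2.ROTATE_90_COUNTERCLOCKWISE)
    return canvas
```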
In contrast to traditional telepresence systems, the virtual representation of the remote participant is projected onto an inverted pyramid to simulate a holographic effect. The virtual participant is obtained from four RGB-D sensors, producing a 3D image through data fusion and reconstruction. Our system does not require users to wear any display or tracking equipment, nor does it require a special background. We extract the person or objects of interest using a deep foreground segmentation method that segments them precisely and, at the same time, reduces the amount of data that needs to be transferred to the remote location where it is rendered.
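The specific segmentation network is not reproduced here; as a hedged illustration of deep foreground extraction, the sketch below uses an off-the-shelf semantic segmentation model (torchvision's DeepLabV3) and keeps only the "person" class, which similarly yields a tight foreground mask so that only segmented pixels need to be transmitted.

```python
# Minimal sketch of CNN-based foreground extraction (illustrative model
# choice, not the paper's network): a pretrained DeepLabV3 produces
# per-pixel class scores; we keep the PASCAL VOC "person" class as mask.
import numpy as np
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT").eval()  # torchvision >= 0.13
PERSON = 15  # "person" index in the PASCAL VOC label set

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def person_mask(rgb_image: np.ndarray) -> np.ndarray:
    """rgb_image: HxWx3 uint8 RGB frame. Returns an HxW boolean mask."""
    x = preprocess(rgb_image).unsqueeze(0)   # 1x3xHxW normalized tensor
    with torch.no_grad():
        scores = model(x)["out"][0]          # 21xHxW class scores
    return scores.argmax(0).numpy() == PERSON
```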
Figure 1 shows a schematic overview of our system, which consists of four Kinect V2 cameras placed at a height of 2 meters with a viewpoint change of approximately 90° between them, a light projector, and an acrylic square pyramid as the visualization platform.

Figure 1: Schematic overview of the experimental setup for the proposed system, consisting of four Kinect V2 cameras and a holographic display composed of a commodity light projector and an acrylic square pyramid.

Our main contribution is a novel end-to-end real-time telepresence system capable of rendering people and objects in full 3D on a simulated holographic display.

The remainder of the paper is organized as follows. In section 2, we review related literature. We describe the methodology in section 3. In section 4, we evaluate the performance of the system. We conclude in section 5.

2 Previous work

One of the earliest works in telepresence is the one from Towles et al. [11], which enabled end-to-end 3D telepresence with interaction between participants. They employed an array of cameras and a pointing device to control and manipulate a shared 3D object. The image was displayed on a stereoscopic display with head tracking. They performed 3D reconstruction to generate a point cloud of the users and transmitted it over the Internet at two fps.

More recently, with the availability of consumer depth cameras along with color cameras (RGB-D), there was an exponential emergence of 3D telepresence systems. Notable examples are the work of Maimone et al. [12, 13], with a dynamic telepresence system composed of multiple Kinect sensors; Beck et al. [14], with the introduction of an immersive telepresence system that allows distributed groups of users to meet in a shared virtual 3D world; and Room2Room [15], a life-size telepresence system based on projected augmented reality. This last system is capable of understanding the structure of the environment and projecting the remote participant onto physically plausible locations.

Along with the depth cameras used to sense the environment, head-mounted displays (HMD) have been the preferred choice for AR/VR visualization. Maimone et al. [16] presented a proof-of-concept of a general-purpose telepresence system using optical see-through displays. Xinzhong et al. [17] proposed an immersive telepresence system employing a single RGB-D camera and an HMD. Lee et al. [18, 19] describe a telepresence platform where a user wearing an HMD can interact with remote participants, who can experience the user's emotions through a small holographic display. Finally, Microsoft's Holoportation [20] represents the first high-quality real-time telepresence system on AR/VR devices. They used multi-view active stereo along with sophisticated spatial audio techniques to sense the environment. The quality of immersion is unprecedented; however, the amount of high-end hardware and the high-bandwidth requirements make this system hard to reproduce.

Similar to these works, our system uses multiple RGB-D cameras to sense the environment. However, we display the remote participant on a set of projection screens to provide a 360° simulated holographic effect.

Regarding true holographic telepresence, Blanche et al. [21] developed the first example of this technology, using a photorefractive polymer material to demonstrate a holographic display. They used 16 cameras taking pictures every second, and their system can refresh images every two seconds. More recently, Dreshaj [3] introduced Holosuite, an implementation of end-to-end 3D telepresence operating on two remote PCs via the internet. It can render visual output to the holographic displays Mark II and Mark IV, as well as to commercial 3D displays such as the zSpace [22] with motion parallax reprojection. Holosuite uses RGB-D cameras to sense the environment and allows seamless collaboration between the participants in real time.

In summary, while previous works address many of the challenges in telepresence and the most impressive results require high-end hardware and high bandwidth, our system renders 360° volumetric telepresence on a simulated holographic display made of off-the-shelf components, without the need for obtrusive wearable devices.

3 Methodology

In this section, we describe the procedure followed to implement our telepresence system based on simulated holographic projection. Figure 2 shows the steps followed to acquire, reconstruct, and visualize remote people using our 3D display. We start by describing the camera calibration approach, followed by the foreground extraction method and the data fusion.

Figure 2: Pipeline of the proposed system: image acquisition with the RGB-D camera array, foreground extraction with convolutional neural networks (CNN), point cloud extraction using the intrinsic parameters, point cloud fusion using the extrinsic parameters from the multiple RGB-D camera calibration, and virtual view generation.

3.1 Multiple RGB-D cameras calibration

For calibration, we follow the method from [10], briefly described below. Note that we assume that the cameras are fixed in the scene, so the calibration procedure is done only once.

The first step of camera calibration is image acquisition. Each RGB-D camera is connected to a single computer, and the whole system communicates through a wireless network. However, all the processing is performed on the main computer that receives and stores the calibration data. We use a 1D [...] RGB-D camera (M_1). We wish to solve for R_i and t_i such that

    M_1 = R_i × M_i + t_i        (1)

where R_i and t_i are the rotation and translation applied to each set of points M_i, i ∈ {2, 3, 4}, to align them with the reference M_1.

Once we estimate the rigid transformations that align the cameras with the reference, we apply these transformations to the point clouds PC_i, i ∈ {2, 3, 4}, to align the 3D points from all the cameras into a single coordinate frame. Then we apply Iterative Closest Point (ICP) on each aligned point cloud with the reference to refine the alignment. Using the point cloud alignments, the next step is to gather multiple points for calibration.
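Equation (1) has the standard closed-form least-squares solution via SVD (the Kabsch/Procrustes method), and the ICP refinement can be written as a short loop around it using nearest-neighbor correspondences. The sketch below is our reconstruction under those standard formulations, not the authors' implementation; the function names and iteration count are our own choices.

```python
# Minimal sketch: solve Eq. (1) for R_i, t_i from corresponding 3D points
# (Kabsch method), then refine roughly aligned clouds with a basic ICP loop.
import numpy as np
from scipy.spatial import cKDTree

def rigid_transform(M_i, M_1):
    """M_i, M_1: corresponding N x 3 point sets. Returns R (3x3), t (3,)
    minimizing ||M_1 - (R @ M_i + t)|| in the least-squares sense."""
    c_i, c_1 = M_i.mean(axis=0), M_1.mean(axis=0)
    H = (M_i - c_i).T @ (M_1 - c_1)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, c_1 - R @ c_i

def icp_refine(source, target, iters=30):
    """source (N x 3), target (M x 3): roughly aligned point clouds.
    Returns the source cloud after iterative closest-point refinement."""
    tree = cKDTree(target)
    for _ in range(iters):
        _, idx = tree.query(source)            # nearest target neighbors
        R, t = rigid_transform(source, target[idx])
        source = source @ R.T + t              # apply to row-vector points
    return source
```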