The Convergence of Graphics and Vision
Approaching similar problems from opposite directions, graphics and vision researchers are reaching a fertile middle ground. The goal is to find the best possible tools for the imagination. This overview describes cutting-edge work, some of which will debut at Siggraph 98.

Jed Lengyel
Microsoft Research

At Microsoft Research, the computer vision and graphics groups used to be on opposite sides of the building. Now we have offices along the same hallway, and we see each other every day. This reflects the larger trend in our field as graphics and vision close in on similar problems.

Computer graphics and computer vision are inverse problems. Traditional computer graphics starts with input geometric models and produces image sequences. Traditional computer vision starts with input image sequences and produces geometric models. Lately, there has been a meeting in the middle, and the center (the prize) is to create stunning images in real time.

Vision researchers now work from images backward, just as far backward as necessary to create models that capture a scene without going to full geometric models. Graphics researchers now work with hybrid geometry and image models. These models use images as partial results, reusing them to take advantage of similarities in the image stream. As a graphics researcher, I am most interested in the vision techniques that help create and render compelling scenes as efficiently as possible.

GOALS AND TRENDS

Jim Kajiya, assistant director of Microsoft Research, proposes that the goal for computer graphics is to create the best possible tool for the imagination. Computer graphics today seeks answers to the question, "How do I take the idea in my head and show it to you?" But imagination must be grounded in reality. Vision provides the tools needed to take the real world back into the virtual.

There are several current trends that make this an exciting time for image synthesis:

• The combined graphics and vision approaches have a hybrid vigor, much of which stems from sampled representations. This use of captured scenes (enhanced by vision research) yields richer rendering and modeling methods (for graphics) than methods that synthesize everything from scratch.
• Exploiting temporal and spatial coherence (similarities in images) via the use of layers and other techniques is boosting runtime performance.
• The explosion in PC graphics performance is making powerful computational techniques more practical.

VISION AND GRAPHICS CROSSOVER: IMAGE-BASED RENDERING AND MODELING

What are vision and graphics learning from each other? Both deal with the image streams that result when a real or virtual camera is exposed to the physical or modeled world. Both can benefit from exploiting image stream coherence. Both value accurate knowledge of surface reflectance properties. Both benefit from the decomposition of the image stream into layers.

The overlapping subset of graphics and vision goes by the somewhat unwieldy description of "image-based." In this article, I use graphics to describe the forward problem (image-based rendering) and vision for the inverse problem (image-based modeling).

Inverse problems have been a staple of computer graphics from the beginning. These inverse problems range from the trivial (mapping user input back to model space for interaction) to the extremely difficult (finding the best path through a controller state space to get a desired animation). This article discusses only forward and inverse imaging operations, that is, light transport and projection onto a film plane.
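To make the forward/inverse pairing concrete, here is a minimal sketch of the two imaging operations in Python. It is only an illustration under simple assumptions (a pinhole camera, camera-space coordinates, and function names of my own choosing), not code from any system discussed here: the forward map projects a 3D point onto the film plane, and the inverse map recovers the point from a pixel only once vision supplies the missing depth.

    import numpy as np

    def project(point_3d, focal_length=1.0):
        # Forward imaging (graphics): perspective projection onto the
        # film plane by similar triangles, u = f*x/z and v = f*y/z.
        x, y, z = point_3d
        return np.array([focal_length * x / z, focal_length * y / z])

    def backproject(pixel_2d, depth, focal_length=1.0):
        # Inverse imaging (vision): a pixel alone fixes only a viewing
        # ray; a depth (from stereo, flow, and so on) picks the point.
        u, v = pixel_2d
        return np.array([u * depth / focal_length,
                         v * depth / focal_length,
                         depth])

    point = np.array([0.5, -0.2, 4.0])                   # in front of the camera
    pixel = project(point)                               # the graphics direction
    assert np.allclose(backproject(pixel, 4.0), point)   # the vision direction

The asymmetry in the two signatures is the whole story: project needs nothing extra, while backproject cannot work without the depth information that vision research aims to recover.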
Image-based rendering and modeling has been a fruitful field for the past several years, and Siggraph 98 devotes two sessions to the subject. The papers I discuss later come from those two sessions.

[Figure 1. Graphics and vision techniques, on a spectrum from more image-based (appearance-based, images) to more physical- or geometry-based (physically based, geometric models). Traditional graphics starts on the right with geometric models and moves to the left to make images. Traditional vision starts on the left with images and moves to the right to make geometric models. Lately, there has been a meeting in the middle. The graphics row runs from sprites, image morphing, and the lumigraph/light field, through sprites with depth, layered depth images, view-dependent textures, and texture mapping, to geometric level of detail, radiosity, Monte Carlo ray tracing, global illumination, and triangle scan conversion. The vision row runs from image mosaics and optical flow estimation, through plane-sweep, multiview, and layered stereo, depth map recovery, and texture recovery, to range data merging, silhouettes to volume, curves to 3D mesh, geometry fitting, and illuminance and reflectance estimation.]

Images have been used to increase realism since Ed Catmull and Jim Blinn first described texture mapping from 1974 to 1978. In a 1980 survey, Blinn described the potential of combining vision and graphics techniques (http://research.microsoft.com/~blinn/imagproc.htm). Renewed interest in the shared middle ground arose just a few years ago.

The new research area is split into roughly three overlapping branches. Here I touch on the graphics articles in the field. (A longer bibliography is available at http://computer.org/computer/co1998/html/r7046.htm.) For the vision perspective, see the survey articles by Sing Bing Kang [1] and Zhengyou Zhang [2]. Figure 1 provides a schematic illustration of the spectrum of these research interests.

Image-based rendering

Given a set of images and correspondences between the images, how do you produce an image from a new point of view?

In 1993, Eric Chen and Lance Williams described how to interpolate, or flow, images from one frame to the next instead of having to do full 3D rendering [3]. Chen then introduced QuickTime VR in 1995, showing how an environment map sampled from a real scene could be warped to give a strong sense of presence. Independently, Richard Szeliski described the use of image mosaics for virtual environments [4] in 1996 and, the following year with Harry Shum, described improved techniques for combining multiple images into a single panoramic image.

In 1995, Leonard McMillan and Gary Bishop introduced the graphics community to plenoptic modeling, which uses the space of viewing rays to calculate the appropriate warp to apply to image samples; they also introduced the term "image-based rendering" [5]. In 1996, Steven Seitz and Charles Dyer showed that the proper space for interpolation between two images is the common coordinate system defined by the synthetic camera view [6]. That same year, Steven Gortler and his colleagues at Microsoft Research introduced the lumigraph (http://research.microsoft.com/msrsiggraph), and Marc Levoy and Pat Hanrahan of Stanford University introduced light field rendering (http://www-graphics.stanford.edu/papers/light). Both systems describe a dense sampling of the radiance over a space of viewing rays.

This year, Jonathan Shade and colleagues describe new image-based representations that allow multiple depths per pixel [7], and Paul Rademacher and Gary Bishop describe a representation that combines pixels from multiple cameras into a single image [8].
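The lumigraph and light field systems both turn novel-view rendering into a table lookup over rays. The sketch below shows the core idea they share, the two-plane ray parameterization; it is a simplification of my own (nearest-neighbor lookup, a made-up class, and arbitrary plane bounds), whereas the published systems use quadrilinear interpolation and careful resampling.

    import numpy as np

    class LightField:
        # A dense 4D table of radiance: a ray is named by where it
        # crosses the (s, t) plane at z=0 and the (u, v) plane at z=1,
        # and radiance[s, t, u, v] stores that ray's RGB color.
        def __init__(self, radiance, lo=-1.0, hi=1.0):
            self.radiance = radiance  # shape (S, T, U, V, 3)
            self.lo, self.hi = lo, hi

        def _index(self, coord, size):
            # Map a plane coordinate in [lo, hi] to the nearest sample.
            frac = (coord - self.lo) / (self.hi - self.lo)
            return int(np.clip(round(frac * (size - 1)), 0, size - 1))

        def lookup(self, origin, direction):
            # Intersect the viewing ray with the two planes; assumes
            # the ray is not parallel to them.
            t0 = (0.0 - origin[2]) / direction[2]
            t1 = (1.0 - origin[2]) / direction[2]
            s, t = (origin + t0 * direction)[:2]
            u, v = (origin + t1 * direction)[:2]
            S, T, U, V, _ = self.radiance.shape
            return self.radiance[self._index(s, S), self._index(t, T),
                                 self._index(u, U), self._index(v, V)]

    # Toy usage: an 8x8x8x8 field of random colors, queried for one ray.
    lf = LightField(np.random.rand(8, 8, 8, 8, 3))
    color = lf.lookup(np.array([0.0, 0.0, -2.0]), np.array([0.1, 0.0, 1.0]))

No geometry appears anywhere in the lookup; the price is the dense sampling, which is why acquisition and storage become the hard problems.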
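The multiple-depths-per-pixel idea can likewise be sketched as a data structure. The toy below is my illustration, not the published representation (which adds incremental warping and splatting): each pixel of a single reference view keeps every surface its viewing ray pierces, so a nearby view can be synthesized without the holes that a one-depth-per-pixel warp leaves at disocclusions.

    from dataclasses import dataclass, field

    @dataclass
    class LayeredPixel:
        # Every surface crossed by this pixel's viewing ray, kept
        # sorted front to back as (depth, rgb) pairs.
        samples: list = field(default_factory=list)

        def insert(self, depth, rgb):
            self.samples.append((depth, rgb))
            self.samples.sort(key=lambda s: s[0])

    class LayeredDepthImage:
        # A single reference view that stores several depths per pixel.
        def __init__(self, width, height):
            self.pixels = [[LayeredPixel() for _ in range(width)]
                           for _ in range(height)]

        def front_surface(self, x, y):
            # An ordinary depth image keeps only this first sample; the
            # deeper samples are what fill holes when the view moves.
            return self.pixels[y][x].samples[0]

    ldi = LayeredDepthImage(4, 4)
    ldi.pixels[1][2].insert(5.0, (200, 0, 0))  # far wall
    ldi.pixels[1][2].insert(2.0, (0, 200, 0))  # nearer occluding object
    assert ldi.front_surface(2, 1)[0] == 2.0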
Image-based 3D-rendering acceleration

Given a traditional 3D texture-mapped geometric model, how can you use image caches (also known as sprites) to increase the frame rate or complexity of the model?

In 1992, Steve Molnar and colleagues described hardware to split the rendering of a scene into 3D rendering plus 2D compositing with z [9]. In 1994, Matthew Regan and Ronald Pose built inexpensive hardware to show how image caches with independent update rates exploit the natural coherence in computer graphics scenes [10].

In 1995, Paolo Maciel and Peter Shirley introduced the idea of using image-based "imposters" to replace the underlying geometric models [11]. More recently, Jonathan Shade and colleagues [7] and, independently, Gernot Schaufler [12] describe techniques that use depth information for better reconstruction.

In 1996, Jay Torborg and Jim Kajiya introduced Microsoft's Talisman architecture at Siggraph. (See http://research.microsoft.com/msrsiggraph for electronic versions of MSR Siggraph submissions.) The following year, John Snyder and I showed how to handle dynamic scenes and lighting factorization on Talisman, and this year, we describe how to sort a layered decomposition into sprites without splitting [13].

Image-based modeling

Given an input set of images, what is the most efficient representation that will allow rendering from new points of view?

Paul Debevec, Camillo Taylor, and Jitendra Malik enhanced architectural modeling with simple modeling primitives aligned to images with vision techniques. They added realism …

Another longstanding trend in computer graphics is to pursue coherence wherever it can be found.

… example, has much more coherence than the combined scene. For vision, partitioning a scene into layers permits the independent analysis of each layer. Doing so avoids the difficulties in scene analysis that stem from overlapping layers and occlusion between layers. For graphics, partitioning a scene into layers allows rendering algorithms to take better advantage of spatial and temporal coherence.

3D rendering with sprites

Taking advantage of temporal coherence requires storing the partial results of a given frame for use in a later frame; in other words, trading off memory for computation. The partial results are image samples, …