<<

SPATIAL INPUT/DISPLAY CORRESPONDENCE IN A STEREOSCOPIC COMPUTER GRAPHIC WORK STATION

Christopher Schmandt Architecture Machine Group Massachusetts Institute of Technology

ABSTRACT The fields with the most interest in three-dimensional display are those with a An interactive stereoscopic computer need to analyze complex spatially struc- graphic workspace is described. A conven- tured real world data, gathered by or part tional frame store is used for three- of a process requiring expensive data dimensional display, with left/right eye acquisition hardware (hence the willing- views interlaced in and viewed ness to experiment with relatively expen- through PLZT shutter glasses. The video sive displays). Although a variety of nonitor is seen reflected from a half techniques exist for stereoscopic display, silvered mirror which projects the graphics there has been little work done in inter- into a workspace, into which one can reach acting with 3-D data by 3-D modes: using and manipulate the image directly with a motion parallax to look around objects "magic wand". The wand uses a magnetic [Fisher, Littlefield], peeling away inter- six degree-of-freedom digitizer. In an fering data layers, or constructing addi- alternative configuration, a graphics tional data types in image space itself. tablet was placed within the workspace for input intensive tasks.

CR catagories: 1.3.2 []: Graphics Systems; 1.3.6 [Computer Graphics]: Methodology and Techniques; 1.4.8 [Image Processing]: Scene Analysis- Stereo.

Key Words: Stereoscopic display, half silvered mirror, three-dimensional digi- tization. INTRODUCTION

The last decade has seen a resurgence of interest in three-dimensional imaging, such as , commercial 3-D movies, experimental anaglyphic broad- casts, and lenticular 35mm cameras. Some of this enthusiasm has overflowed into computer graphic applications, most note- ably computer aided design, medical imag- ing, and seismic exploration, but wide- spread appreciation of the utility of stereoscopic displays has yet to happen. This may be in part due to the lack of convincing interactive capabilities of such displays.

Permission to copy without fee all or part of this material is granted Figure i. A cutaway view of the work provided that the copies are not made or distributed for direct station, showing the view through the half commercial advantage, the ACM copyright notice and the title of the silvered mirror of the video monitor and publication and its date appear, and notice is given that copying is by user's hand. permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© ACM 0-89791-109-1/83/007/0253 $00.75

253 In short, the blossoming of conventional two-dimensional interactive computer graphics has yet to penetrate the three- dimensional world. Such graphic inter- activity requires three components: display technology (e.g., digital frame store), input technology (e.g., tablet, mouse) and algorithms or techniques link- ing the two.

This paper will briefly survey existing three-dimensional display and input technologies. It will then focus on a style of interaction characterized by spatial correspondence between the display and input devices; we desire to build a stereoscopic workspace which will allow a user to reach into the midst of a three- dimensional image and interact with it at exactly that point in space which he is Figure 3. The PLZT glasses and electronic touching. controller box. The workstation display is a conventional color frame buffer using television inter- conditions [Stover, Fuchs]. Two CRTs may lace and special opto-electric shutter be used in a head mounted display, using glasses for stereoscopic viewing. The optics to deliver each image to the monitor is viewed through a half-silvered respective eye [Sutherland, Callahan]. mirror, which projects its image into an Two CRTs may be viewed through orthogonal illuminated workspace (figure i). The pairs of polarizers, as is commonly done user views his hand and the graphics, in movie theater projection, although mixed by the mirror, overlapping in three- crosstalk may be a problem and the viewer dimensional space. Input technology has must keep his head fairly erect. Two included a six degree-of-freedom magnetic images may be mixed on a single CRT as a digitizer and a conventional magneto- color anaglyph and viewed through red and strictive tablet. green/blue filter glasses. Finally, two images may be time-multiplexed on the CRT A series of experiments will be discussed. and viewed through electrically controlled Early experiments exercised the work glasses [Roese]. station, using the magnetic digitizer, in various applications; despite early optimism, numerous limitations on inter- The latter technique was used in our work. action were noticed. After discussing Conventional actually these, we will explore alternative appli- displays two fields sequentially; one cations, with more clearly defined inter- field consists of the even scan lines, the action goals, as well as a second work- other of the odd lines. Using an ordinary station configuration which uses two- frame buffer, the right eye view is drawn dimensional tablet input to build 2½-D on even lines, with the left eye view on databases. odd lines (figure 2). This mixed image is viewed through wafers of a lead lanthanum DISPLAY TECHNOLOGY zirconate titanium (PLZT) ceramic, a very fast electrically triggered shutter Stereopsis is the perception of depth from (figure 3). Electronics detect the video the binocular fusion of the slightly off- vertical interval, alternately opening set views of our left and right eyes. and closing each eye's shutter while the systems must somehow cor- corresponding field is being displayed. rectly generate left and right eye views, Our controller and PLZT glasses were and present them separately, avoiding obtained from Honeywell, with additional crosstalk, to the respective eyes. As glasses from Matsushita. this paper focuses on real-time inter- active computer generated display of In terms of the computer graphics algo- information, we will limit this section to rithms, the depth of an object is a func- various forms of computer controllable tion of the distance between the even and video display [Okoshi]. odd line portions of the image (figure 4). Zero disparity makes the graphic appear on A number of techniques for display of the surface of the screen, as we are three-dimensional video have been devel- accustomed to viewing. When the horizontal oped. Points may be displayed on a CRT position of left and right eye portions in synchronization with a vibrating mir- of the object are different, eye conver- ror, which generates a true three- gence will make the object appear behind dimensional image under ordinary viewing or in front of the screen. This relation- ship can be expressed as:

254 interfaced to our laboratory's machines (Perkin-Elmer 3230's). f~ dx = IO (D~) Three mutually orthogonal dipole coils in the radiator generate a rotating magnetic where dx is image disparity, d the apparent field, which induces currents in the distance of the object from the screen, sensor, a plastic cube about 1.5 cm on Ds the viewer's distance from the screen, edge which also contains a similar array and IO the interocular distance (about of receiver coils (figure 5). The sensor 65mm). is small and light, connected to nearby preamplifiers by a thin, flexible cable. As position is sensed magnetically, there screen are no awkward linkages, and a user's body plane does not interfere with the device (although, as will be discussed below, color monitors do). 4 ..... --_ For the purposes of this work station, a sensor has been mounted at the end of a small wooden "magic wand" which also has 4 ...... ;-x.I a momentary contact bush button switch I I placed within the handle. The wand, switch, and cable are black, while a white D~ a spot on the sensor aids visibility and indicates the "active" portion of the wand. Figure 4. Ocular convergence and depth The radiator is mounted out of sight perception. The left eye sees an object behind the work area. In later versions at Xl on the screen, the right eye sees it of the workspace, a conventional magneto- at Xr, with disparity dx. The object will strictive graphics tablet, also painted be perceived to lie at distance d behind black with a white spot on the puck, was the screen. Note that if dx were negative used for input. the object would appear in front of the screen.

INPUT TECHNOLOGY

Digitizing three dimensional data points for system input has been a more difficult problem than stereo display. One approach uses various forms of mechanical linkages (such as fishing line) connected to measuring devices [Roberts, Clark]. Although high accuracy may be obtained, these linkages limit freedom of movement and are somewhat awkward.

Other techniques use infrared light sources, worn on the body, as transmitters and light sensing hardware as recievers tQ track multiple points in three space Figure 5. The Polhemus radiator and wand, [Burton, Woltring]. Although the commer- with the sensor mounted near its end. cial product (Selspot) is quite expensive, recent work indicates lower cost tracking may be possible in simplified configura- THE WORKSPACE tions [Ginsberg]. The main problem with light transmission is obscuration by the The goal in workspace design was to allow user's body. a style of interaction in which spatial correspondence between the particular We chose a magnetic six degree-of-freedom input and output devices could be main- digitizer developed by Polhemus Naviga- tained. We wished to be able to project tional Sciences [Polhemus, Rabb] ; while the stereoscopic image into an unobscured still fairly expensive, it avoids most space which would allow one to reach into problems of previous technologies and is the projection with the wand and modify very accurate under ordinary circumstances. the graphic by touching where it appeared This digitizer returns the position (x,y, to be. Proper illumination would be and z) and attitude (azimuth, elevation, essential to maintain the optical mix of and roll) of a small sensor relative to a the projected image and the user's hand. larger radiator 40 times per second. Radiator and sensor electronics are con- Prior work, both at MIT and elsewhere, nected to a Nova 1200 minicompuer which is used half-silvered mirrors to mix computer

255 graphics with reality. The various head frame buffer, except that x is mirrored). mounted displays attempted to place The z coordinate is used to determine the graphics in known positions in the environ- left/right disparity with which to draw ment with mirrors mounted immediately in the graphic at that point. front of the viewer's eyes [Vickers, Callahan]. Larger mirrors had been used In practice, this correspondence works to overlay graphics onto keyboards and excellently most of the time and the paint tablets [Knowlton, Lippman]. Our task was actually appears to flow from the end of to extend this concept into three-space. the wand, provided it is not moved quickly (see below). One can draw lines, A 19 inch color CRT is mounted at a 45 spirals, knots, etc.; nearer objects degree angle above and in front of the obscure more distant ones, but in turn operator's head, and viewed through the draw behind still closer lines. half-silvered mirror parallel to the floor (figure 6). Thus the reflected graphics appear to lie in the space in front of Depth perception adjuncts the operator and below the mirror. This space below the mirror is painted black Early in this stage we discovered that and illuminated, so he can see his hand in only an experienced viewer could reliably the midst of the image. In some configur- and comfortably see the proper depth in the PLZT video image. Purely binocular ations this lighting was dimmed under convergence is missing many strong depth computer control. cues such as shadows, textures, perspec- tive, and motion parallax. Of course, The following sections will describe several early applications for which this hidden surface obscuration provides an workspace was used, with the wand as input. obvious depth perception aid, but further Despite the optimism of our early reports cues ease perception greatly. [Schmandt], various limitations interfered with fully satisfactory interaction. In all demonstrations, hidden surface After discussing these flaws, and our removal is done in real time during image attempts to remedy them, we will present generation using false color and arithmetic a second generation of applications which frame buffer write operations. Associated matched the capability of this configura- with each of the 64 discrete depth levels tion more closely. More recent work has are several slots in the color table, one explored alternative input schemes, using for each of the eight displayable colors. a graphics tablet. Distance is encoded in the high order bits of color, increasing farther away from the viewer; while the low order bits determine IMAGE CREATION which particular color is displayed. Graphics are added to the screen by writ- 3-D Paint ing the appropriate value (a function of color and distance) into the frame buffer Early work was aimed at developing spa- under a minimum function; the lesser tial correspondence algorithms in the (i.e., closer) of the new value and current context of a 3-D paint program. Menu value gets written into the display. buttons at the front edge of the work station allowed color selection; when the When additional depth cues are provided, user pressed the button in the wand, depth perception is significantly facili- colored paint would appear in space, at tated. The brightness and size of lines the location of the white spot on the end decrease as they are drawn farther away, of the wand (figure 7), dribbling out as, as exaggerated perspective and lighting the wand was moved. If the positions of effects. We noted that although stereo- the graphic and wand were well correlated, scopic depth perception was improved with no cursor would be necessary, as the white these adjuncts, they did not work well spot on the wand is itself a real world enough in isolation to render the conver- "cursor". gence cues redundant; rather convergence, obscuration, luminance and size all worked Several calculations determine the posi- together to give a strong three-dimensional tion of the paint. First, the digitizer's sense. location, in its own coordinate system, must be rotated, translated, and scaled VLSI design station into the screen coordinate system (actually, that of the projected image of These results were encouraging enough to the screen). This transformation matrix move toward a more practical demonstration, is a function of the user's point of view, using the wand to input vertices of a 2½ so it was derived from a calibration step dimensional polygon database. Such data- at the beginning of a session. Next, the bases were later used to generate anima- line of sight from the eye to the dig- tion on a write-once optical videodisc, itizer is extended to intersect the pro- for viewpoint dependent image work jected screen. This gives an (x,y) [Fisher]. For a practical application, we coordinate in screen space (that of the chose to work with the graphic component

256 of a VLSI integrated circuit design sta- sample window when the sensor is nearer tion. the CRT, and hence more noise prone, but remain disappointed with response time. Integrated circuits consist of approxim- ately rectilinear components ordered Depth judgement spatially in a laminar manner, with much information contained in how such layers As already mentioned, exaggerated drop off overlap spatially. This content lends in brightness and premature foreshortening itself well to the 2-½ D environment that were early additions to aid depth percep- is suited to real-time frame buffer work. tion, an indication that purely binocular As we were concerned with only the graphic depth cues may be insufficient. Although component, VLSI designs of varying complex- this enhances perception of the relative ity were selected from textbooks, and distance of pieces of the image, it entered by several students using the work- remains very demanding to properly deter- space and wand. mine the distance of one's hand relative to the image. Accurate distance judgement INTERACTION LIMITATIONS is crucial in an input task in which dis- crete objects may be logically required to After initial enthusiasm over the ability lie on coincident planes, or when relative to accurately track the wand graphically depth between planes conveys needed infor- in three space wore off, we were con- mation. fronted with several aspects of the work- space which detracted from performance in A partial explanation of this difficulty the chosen applications. is that although image components may obscure each other, they are always vis- Magnetic interference able through the user's hand. We attempted, not particularly successfully, The most disturbing aspect of this hard- to alleviate this problem by allowing ware configuration is magnetic interfer- computer control of the workspace lighting. ence with the Polhemus digitizer caused by When the wand was moved farther away, the the color CRT. Magnetic fields from the lights were dimmed to allow nearer, and monitor disturb the digitizer in two ways: hence brighter, components of the image to the normally Cartesian digitizer coordin- obscure the hand; unfortunately, the light ate space is badly warped, and noise is dimmer response time was poor and the introduced. overall effect novel but unconvincing. Drawing difficulties As the magnetic space warping is tempor- ally constant, it was sufficient to der- ive a single mapping from the warped to a Most users of the VLSI design station com- Cartesian three space. This was done by plained of the difficulty experienced in hand mapping magnetic contour lines in the trying to draw planar objects floating in workspace, and approximating them by space because of the lack of arm support circular sections. Such an approach works and limited hand stability. It was nearly reasonably well except at extremes of the impossible to repeatably draw rectilinear active graphical volume. The unwarped objects with proper right angled corners, coordinates are then transformed, in soft- and required great care to insure that all ware, by a transformation matrix that is vertices of a polygon lay on the same z generated before each session by having plane, aggravated by the depth perception difficulty. the user touch a succession of displayed dots. Several improvements were tried. An Digitizer noise is induced by strong mag- invisible grid of variable resolution is netic field components at the 15.75 kHz used to force input points to land on lower horizontal video frequency. Despite the resolution (x,y) coordinates, making input of orthogonal line segments easier. Sub- addition of a hardware notch filter to sequent vertices were automatically forced attenuate this frequency in the preamp- to lie in the plane of the initial vertex, lifier, noise remains excessive with the a usefual aid only if the first vertex is close proximity of monitor and radiator in indeed at the proper depth. Although the workspace. We placed an ordinary these additions are in fact helpful, most aluminum window screen across the face of users find complex input tasks difficult the monitor (it can be seen in some of and tedious. the photographs), which reduces noise to a level such that software filtering could NEW DIRECTIONS generate useful data. From the work with VLSI design graphics, This additional filtering has the unfor- several new directions emerged due to tunate effect of introducing a lag, or recognition of the above constraints. lack of responsiveness, into the system, User response had indicated that although with the obvious loss of interactivity. the spatial correspondence between graphics We partially compensated for this with an and the wand was a simple and intuitive adaptive filter, which uses a larger mapping, the arrangment was unsuitable for

257 tasks highly oriented toward the input of The tablet is used to input data only on data. the plane defined by its surface. Access to the complete visual volume is provided One follow on approach explored the useful- by moving the entire image towards or away ness of the same work station configuration from the viewer, under control of a soft for tasks more oriented toward interaction "slider" selection graphic, until the with or manipulation of previously input or appropriate image slice corresonds to the generated data. The other path focused on tablet plane. The whole image moves in and overcoming input limitations using a out such that one is always working on a graphics tablet instead of the 3-D digit- level currently displayed with zero dis- izer. Both approaches retained the ster- parity (figure ii). escopic video display viewed through the half-silvered mirror. Moving the entire image in the Z direction is done by shifting the horizontal posi- Data manipulation tion of the odd scan line graphics with respect to the even lines. This is done Efficient graphical input requires a stable by two methods. The odd scan lines can and responsive digitizing technology; these be read out of the frame buffer, then requirements were mutually exclusive in written back with the appropriate new x the workspace, with the wand so close to offset. A coherence table indicating the the monitor. Accuracy could be traded off left and rightmost extent of each line is for speed by modifying the software filter, maintained to minimize the data suggesting use of the wand as a target transfer necessary. selection, rather than positioning, device. A faster method utilizes two complete In the case of complex 3-D data, one may frame buffers, one for the left eye view well need to manipulate or highlight the and the other for the right eye view. The data in order to understand it; interactive two images are mixed into one signal in graphics should assist this. One of the the video domain by the same electronics obvious complexities with a three- that detect interlace to control the dimensional database is the variety of glasses; the vertical interval detector ways objects can obscure each other. Tools selects alternate fields from two video are needed to take scenes apart, rearrange sources. With this configuration, the them, or accent local volumes for more image can be moved very quickly simply detailed analysis or separation. by a horizontal pan by one of the frame buffers. A fairly ambitious implementation of these concepts was attempted in a "Model In addition to greatly increasing the Kit" project [Nisselson], which allows the speed with which the image can be shifted rearrangement of 3-D scenes composed of in and out, this technique can allow pieces of small, digitized pictures of faster image generation and manipulation. architectural landmarks, medical images, Instead of breaking the graphics into an or microscopy slides as primitives (figure even and odd line portion, preventing 9). Using the wand in the same work sta- use of conventional internal frame buffer tion, the user can move or remove objects, primitives (such as "line", "rectangle", call up new primitives from the photo- etc.), each frame buffer maintains a graphic database, add text, and save separate complete image. Left and right periodic "snapshots" of work in progress. eye views can be generated in parallel using the processors in the frame stores. Many prior limitations of the wand as an input device are avoided by this approach. Tablet input has been used in a variety As it is used much more often to "acquire" of styles. The original VLSI design one target from many in the scene, highly task lends itself quite well to this accurate positioning is less crucial, and method, allowing easy input of rectilinear response is quickened by tolerating more graphic components while retaining needed noise with less filtering. stereoscopic viewing. Other work will use the tablet to input vertices of 3-D solids, built and edited as wire frame Tablet input models. Once the final design is satis- factory, slower shaded surface algorithms An alternative input technology was needed generate solid model views of these for such tasks as building the polygonal objects for animations. The higher speed, components of VLSI design. We mounted a high resolution, and noise immune tablet conventional graphics tablet, painted data is much more suited to these graph- black to avoid image crosstalk, inside the ically detailed tasks than the wand. workstation at a 45 degree angle to the floor, such that the image of the CRT screen coincided with the plane of the tablet (figure i0). We desired to retain as much spatial correspondence as possible, with an intuitive 3-D interaction.

258 CONCLUSIONS Ginsberg, C.M. and Maxwell, D. Graphical marionette. Proceedings, SIGGRAPH/SIGART A stereoscopic video display lends itself interdisciplinary workshop, ACM. April, well to interactive computer graphics, 1983 pp. 172-179. literally opening up a new dimension of Knowlton, K.C., Computer displays optical- graphical control. Elegant input tech- ly superimposed on input devices. Bell niques require careful thought to provide Syst. Tech. J. 56:3 (1977). the tools of real use, however. Lippman, A. And seeing through your hand. All work described in this paper exploits Proceedings of the SID, 22:2 (1981) pp. the spatial mapping of the graphic display 103-107. into or over the work area, an arrangement Littlefield, R.J., Stereo and motion in which was found to provide intuitive the display of 3-D scattergrams. Proc., access to the third dimension. This was Engineering Society of Detroit Computer without a doubt the most exciting element Graphics Conference, 1982. of the project. Our work certainly sug- gests that the input configuration and Okoshi~ T. Three Dimensional Imaging support software will need to be developed Techniques. Academic Press, N.Y. 1976. around requirements of each individual Nisselson, J. A model kit: a system for application, and indeed a task area may constructing three-dimensional interactive require several configurations, some for graphic models. M.S. thesis, Dept. of input and others for image manipulation. Arch., MIT 1983. Polhemus Navigational Sciences, Inc. P.O. ACKNOWLEDGEMENTS Box A, Essex Junction, VT. As with many projects at the Architecture Rabb, F.H., Blood, E.B., Steiner, T.O., Machine Group, many people participated in and Jones, H.R. Magnetic position and various aspects of this work. Topping the orientation tracking system. IEEE list are Jim Zamiska, who wrote the most Trans. on Aerospace and Electronic Systems difficult early software, and Scott Fisher Sept. 1979. pp 709-718. for the optics and convergence algorithm. Roberts, L.G. The Lincoln wand. MIT Without their work this project would have Lincoln Laboratories Report, Lexington, never been more than an idea. Eric Hulteen MA. 1966. built the workstation. Additional proqram- ming was done by David Mellinger, Jane Nisselson, Charles Simmons, and Carol Yao. Roese, J.A. and McCleary, L.E., Stereo- Photos are by Scott Fisher and Barry Arons; scopic computer graphics for simulation Carol Boemer drew the first illustration. and modeling, Proc. SIGGRAPH 13:2 (1979). Kathy Coppola assisted with production. pp. 41-47. Orlen Rice of Honeywell graciously loaned Schmandt, C.M., Interactive three- the Laboratory our first pair of PLZT dimensional computer space, Proc., SPIE glasses, where it all began. conference on processing and display of three-dimensional data, Aug. 1982 (in This project was funded by Atari, Inc. and publication). the Defense Advanced Research Projects Vickers, D.L. Sorcerer's apprentice: Agency, Cybernetics Technology Division, headmounted display and wand. Ph.D. thesis under contract MDA 903-81-C-0097. University of Utah, Dept. Elec. Eng:, 1974

Woltring, H.J. and Marsolais, E.B. O~t0- REFERENCES electric (Selspot) gait measurement in two- and three- dimensional space, a Burton, R.P. and Sutherland, I.E. Twinkle preliminary report. Bulletin of Prosthe- box: a three-dimensional computer input tics Research 17:2 (fall, 1980). pp. 46-52 device. NCC 1974, pp. 513-520. Callahan, M.A. A 3-D display head set for personalized computer, M.S. Thesis, Dept. of Arch., MIT 1983. Clark, J.H. Designing surfaces in 3-D. Comm. ACM 19,8 (Aug. 1976) pp. 454-460. Fisher, S.S. Viewpoint dependent imaging: an interactive stereoscopic display, M.S. Thesis, Dept. of Arch., MIT 1981 Fuchs, H., Pizer, S.M., Tsai, L.C., Bloomberg, S.H., and Heinz, E. R., Adding a true 3-D display to a system. IEEE Computer Graphics and Appli- cations, Sept. 1982. pp. 73-78.

259 Figure 2. A VLSI design. Note the use of Figure 8. One of the single eye views of interlace to convey disparity along edges the adjacent VLSI component. of vertical components.

Figure 6. The work station. The graphics are reflected from a monitor at the same Figure 9. Moving architectural icons with height as the user's head. the "Model Kit".

Figure 7. Two views through the mirror, showing the overlap of graphics and the hand/wand. Lines that appear double show the disparity between the left/right eye views (presented on alternate video scan lines).

260 Computer Graphics Volume 17, Number 3 July 1983

Figure i0. (Left and above) The tablet workspace configuration.

Figure ii. (Above, and to the left). Use of the tablet to input data. Rubber band outlines are drawn, and the polygon is filled when a color is selected. The white rectangles to the left, above the color menu, select the Z plane on which to work.

261