Three Dimensional Visual Display Systems for Virtual Environments

Michael McKenna
Media Laboratory
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139

David Zeltzer
Media Laboratory
Research Laboratory of Electronics
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139

Abstract

This paper surveys three-dimensional (3D) visual display technology as it relates to real-time, interactive systems, or virtual environment systems. Five major 3D display types are examined: stereoscopic, lenticular, parallax barrier, slice-stacking, and holographic displays. The major characteristics of each display type are examined, i.e., spatial resolution, depth resolution, field of view, viewing zone, bandwidth, etc. In addition, the corresponding parameters of the human visual system are described. The different display systems, as well as the human visual system, are compared in tabular form.

1 Introduction

Our sense of visual "depth" is often taken for granted until we encounter a situation in which various depth cues are missing, as when we view a supposedly "realistically" rendered, "three-dimensional" image on the face of an ordinary CRT. Yet for many tasks, we can show that the presence or absence of 3D depth cues has important effects on human performance (McWhorter, Hodges, & Rodriguez, 1990; Liu, Stark, & Hirose, 1992). A number of three-dimensional display technologies have been developed, however, and for those virtual environment or teleoperator applications in which must be supported, it is important to provide the appropriate three-dimensional display.

This paper is a survey of three-dimensional display techniques, with a description and analysis of five major types of three-dimensional imaging systems: stereoscopic displays, lenticulars, parallax barriers, slice-stacking displays, and holographic displays. These systems are analyzed using a set of criteria to allow quantitative comparisons among example displays, and the analyses are geared toward display systems for virtual environments (VEs). This paper does not cover aspects of VE technology outside of visual displays, except as they relate to the display requirements. For example, there are rendering issues, such as the fidelity of lighting models, which are not directly addressed. Also, the input devices necessary to interact with a VE are not discussed, with the exception of head-tracking, which is required for viewpoint-dependent imaging using stereoscopic displays. There are also a number of other, nonvisual, output devices of use to VEs, such as force-displays and acoustic output, which must be left to other surveys.

It is assumed that the reader is familiar with two-dimensional display technology for computer graphic imagery, such as CRT (cathode ray tube) displays (including calligraphic and raster displays) and the concepts of "frames" and interlaced "fields." Good references on raster CRT displays are Conrac (1985)

Presence, Volume 1, Number 4, Fall 1992. © 1993 The Massachusetts Institute of Technology


Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/pres.1992.1.4.421 by guest on 29 September 2021 422 PRESENCE: VOLUME I, NUMBER 4

and Foley, van Dam, Feiner, and Hughes (1990), which also includes a thorough review of rendering techniques.

It is important to analyze the task to be accomplished in order to match the technology to the problem in a cost-effective manner. For example, some tasks may require high-resolution displays, or wide-angle views, but not stereo viewing. The criteria used in this paper should be able to guide researchers in their selection of an appropriate display system for their display requirements. In addition, it may be possible to develop a general display type, useful to a wide range of VE tasks.

The next section presents the criteria used to examine the 3D display systems, and relates each of the criteria to an aspect of the human visual system. Section 3 provides a functional description and analysis of five three-dimensional display systems. Section 4 gives a quantitative comparison of the different display systems, in a tabular form. Finally, conclusions are presented in Section 5.

2 Criteria for Display Systems

The perception of distance is a complex phenomenon, involving many mechanisms of the eye as well as the brain. There are many cues, or patterns of stimuli, that provide us with information about the depth and shape of objects in the real world, as well as objects presented to us in images. None of the display systems discussed in this paper supports every depth cue, and the various systems have different shortcomings. Certain cues are better suited to certain tasks or imaging requirements, thus certain displays may be more appropriate for certain tasks.

To compare different types of display systems, a set of criteria was developed that, in general, relates an aspect of the human visual system to a corresponding aspect of the display system. A few of the criteria arise from technological limits, rather than a human visual characteristic. Using the criteria, quantitative comparisons of different displays can be made. However, it can be difficult to quantitatively compare different types of display systems, because display attributes can vary widely within a given display system type.

This section reviews the major output characteristics of 3D display systems. We will first discuss general visual cues such as brightness and color, as well as the temporal and spatial resolution characteristics of the human visual system. We will conclude the section with a discussion of depth perception and depth cues. Some of the criteria are more important for certain kinds of display systems than for others. These differences will be discussed as each display system is described.

In Section 4, tables are given that compare the attributes of a set of examples of the different display systems and the human visual system, based on the criteria given below. In this section many of the criteria are accompanied by a short description, which develops its corresponding table entry.

2.1 Visual Cues and Display Attributes

2.1.1 Field of View. The field of view (FOV) of a display measures the angle subtended by the viewing surface from a given observer location. FOV and spatial resolution are related since a change in the FOV of a display (i.e., enlarging the viewing surface) requires either a change in the size or number of pixels.

The human eye has a very wide visual field. The static visual field is the FOV that is instantaneously seen when the eyes are looking straight ahead: over 120° vertically for a single eye, and approximately 180° horizontally for both eyes, with a 120° overlap between the two eyes. Because the FOV is limited by the occluding cheeks, brows, and the nose, however, it has a rather irregular shape. The addition of head, neck, and body movements allows a full 360° of visual coverage; head and eye movements combined can exceed velocities of hundreds of degrees per second (Rolfe & Staples, 1989).

Field of view. The horizontal x vertical static visual field for two eyes is 180° x 120°. Throughout this survey we will consider a typical workstation display to measure 33 x 26 cm. When viewed at a "comfortable viewing distance" of 46 cm, the display will subtend a horizontal x vertical FOV of roughly 40° x 32°.

2.1.2 Spatial Resolution. One of the most common measurements made of 2D displays is spatial resolution. With CRT technology, resolution is typically measured by the number of pixels that can be displayed in the horizontal and vertical directions. However, resolution also measures the size of the pixels: the pitch of a pixel tells how "wide" or "tall" each pixel is, and the angular resolution of a pixel gives the visual angle that the pixel subtends, from a particular viewing location.

The photoreceptors in the human eye are most densely packed in a central area of the retina called the fovea, and acuity falls off sharply outside of this region. Measures of visual acuity, or the spatial resolution of the eye, usually refer to the foveal FOV, which subtends approximately 1-2° of the visual field (Davson, 1980). Our eyes are in constant motion, however, giving the illusion that we perceive the entire visual field at this foveal resolution.

There are many ways to characterize the manner in which the human eye can resolve detail. Some tests measure visual resolution in terms of response to spatial frequencies, such as patterns of light and dark bars or sine wave gratings (Rogowitz, 1983; Cornsweet, 1970). The contrast sensitivity to spatial frequencies of the human visual system peaks around 2-4 cycles per degree, then falls off sharply with increasing frequency. Another important test measures the smallest visual target that the eye can distinguish in terms of the minimum detectable separation of its edges. For normal human subjects, the smallest visual target that can be perceived 50% of the time is approximately 1 min to 30 sec of arc (Rolfe & Staples, 1989; Okoshi, 1986).

Instead of measuring the minimum detectable separation of edges of a target, the vernier test measures displacement. For example, two small needles may be aligned end to end, and subjects are able to detect displacements of the ends of the needles as small as 5-10 sec of arc (Grimson, 1981; Marr, 1982). This ability to detect detail below the size and spacing of the retinal receptors is termed hyperacuity. However, for computations involving human visual spatial resolution, we will use the more conservative value of 30 arc-sec.

Spatial resolution. Taking the resolution of the foveal region of the human eye to be 30 arc-sec, a display that matched this resolution over a 40° x 32° FOV, at a viewing distance of 46 cm, would require a currently unattainable resolution of 4800 x 3840 pixels.

2.1.3 Refresh and Update Rates. To display an apparently stable image, most electronic displays need to repeatedly redraw or refresh the imagery on the display surface many times per second. The refresh rate is the frequency at which a display redraws its imagery. The critical fusion frequency (CFF) is the threshold above which a refreshed image appears steady; displays that refresh below the CFF will appear to flicker. The CFF is strongly dependent on a number of factors, including the brightness of the display, the ambient illumination, and the size and location in the visual field of the stimulus. For most applications in average room light, a refresh rate of 60 Hz will appear flicker-free, and many workstations refresh their graphics displays at 60 Hz or higher. To reduce the cost of the refresh circuitry while still appearing to display flicker-free imagery, consumer TV sets in the United States are interlaced. This means that a field (i.e., half the scan lines, odd or even) is drawn each sixtieth of a second, and a full screen image is thus redrawn at 30 Hz. TV sets in Europe, however, refresh a field at 50 Hz (full frames are thus refreshed at 25 Hz), and may create noticeable flicker, especially in bright conditions, under which the eye is more sensitive to flicker (Conrac, 1985).

The update rate is the frequency at which the computer modifies, or updates, the displayed imagery. The perception of apparent motion is complex and not fully understood, and thresholds for perceiving smooth motion of complex synthetic imagery are difficult to measure (Hochberg, 1986). However, a rule used in the computer graphics industry says that when the update rate drops below about 10-15 Hz, motion will appear discontinuous and become distracting. In interactive settings, insufficient update rates can introduce unacceptable transport delays, such that the display appears to lag noticeably behind the control motions of the viewer. Since the update rate is largely a function of how quickly the computer image generating system can produce frames, rather than an attribute of a display or the human visual system, it will not be discussed further in this survey.
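The field-of-view and spatial-resolution table entries above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, the screen is assumed to be flat and centered on the line of sight, and every pixel is assumed to subtend the same visual angle (a simplification for a flat screen).

```python
import math


def fov_degrees(extent_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by a flat extent centered on the line of sight."""
    return math.degrees(2.0 * math.atan((extent_cm / 2.0) / distance_cm))


def pixels_needed(fov_deg: float, acuity_arcsec: float) -> int:
    """Pixel count across fov_deg if each pixel subtends acuity_arcsec (equal-angle assumption)."""
    return round(fov_deg * 3600.0 / acuity_arcsec)


# Values from the survey: a 33 x 26 cm display viewed at 46 cm.
h_fov = fov_degrees(33.0, 46.0)
v_fov = fov_degrees(26.0, 46.0)
print(f"FOV: {h_fov:.1f} x {v_fov:.1f} degrees")

# Pixels required to match 30 arc-sec acuity over the rounded 40 x 32 degree FOV.
print(pixels_needed(40.0, 30.0), "x", pixels_needed(32.0, 30.0), "pixels")
```

The computed angles come out slightly below the rounded 40° x 32° used in the text, and the pixel counts follow directly from dividing the FOV (in arc-seconds) by the 30 arc-sec acuity value.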


Refresh rate. Although the eye's ability to detect flicker varies, we will take the nominal refresh rate to be 60 Hz, since flicker is seldom detectable at this rate under normal lighting conditions.
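The nominal refresh rate feeds into the information rate discussed in Section 2.1.6, which is the product of resolution, bits per pixel, and refresh rate. The sketch below is a rough illustration only: the 1280 x 1024, 24-bit monitor is a hypothetical example chosen here, not a display described in the survey, and the function name is likewise an assumption. The 4.3 Mbits/sec figure for the eyes is the value quoted in Section 2.1.6.

```python
def information_rate_bps(width_px: int, height_px: int,
                         bits_per_pixel: int, refresh_hz: int) -> int:
    """Bits per second needed to drive a raster display at the given refresh rate."""
    return width_px * height_px * bits_per_pixel * refresh_hz


# Hypothetical "high-resolution monitor" (illustrative values, not from the survey),
# refreshed at the nominal 60 Hz.
display_bps = information_rate_bps(1280, 1024, 24, 60)

# Information rate of the two eyes as quoted in Section 2.1.6.
EYE_BPS = 4.3e6

print(f"display: {display_bps / 1e9:.2f} Gbit/s")
print(f"display / eye ratio: {display_bps / EYE_BPS:.0f}")
```

Under these assumed display parameters the result is on the order of the 2 Gbits/sec quoted for a high-resolution monitor, several hundred times the rate given for the optic nerves.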

2.1.4 Brightness. CRT and other displays are fairly limited in the range of brightness levels they can display. Typical workstation CRT displays exhibit a brightness range of about 2-8 to 20-60 mL, with a contrast ratio of approximately 6:1-16:1 (Eckhardt, 1991; Rogowitz, 1983; Seiter, 1992). Framebuffers often quantize the video output levels to 8 bits, yielding 256 gray-scale intensity levels that can be set within the brightness range. The displayed intensity levels for CRT displays are usually nonlinear with respect to the control signal and framebuffer value, requiring gamma correction (Foley et al., 1990).

The human eye has a tremendous range of sensitivity to light intensity, ranging from a few photons to levels of illumination a trillion times as intense (Rogowitz, 1983). The eye cannot operate at its full range simultaneously, however, but rather adapts to a given ambient lighting condition. Within a particular adaptation, the eye is sensitive to a range of approximately 2 log units of intensity levels, e.g., from 1 to 100 mL. Adapting to an increase in light levels occurs quite rapidly, while adapting to darker conditions can take several minutes.

The ability to detect a visual target at a different luminance from its background depends on the intensity of the background (Davson, 1980; Rogowitz, 1983). Figure 1 shows the results of an "incremental threshold" experiment, in which an observer, adapted to a given background intensity, tries to discriminate a visual target from that background. The test is run over a range of background luminances. In the low photopic (cone-active) range of intensities, from approximately 1 to 1000 mL, the response is roughly linear and flat. Weber's law is active in this region, and states that the minimum detectable increase in luminance will be a nearly constant proportion (in this case, about 2%) of the background luminance. For example, if a target is detectable against a given background, the luminance of the target will have to be increased by 10 times to be detectable against a background 10 times as intense. In the scotopic (rod-active) region of less than 1 mL, however, target/background contrast must be increasingly larger as the background luminance dims.

Figure 1. Minimum increase in intensity needed for a "typical" human observer to discriminate a target from a background intensity. Axis: Log L (mL), from -5 to 4. (Adapted from Rogowitz, 1983.)

The overall brightness of a display thus strongly affects the visual tasks that the display can support. For example, consider the problem of training pilots to find and recognize targets against a cluttered background. If the flight simulator display cannot match the brightness of expected real-world conditions, pilots may not be able to find targets unless target size or contrast is artificially enhanced, which may have undesirable training effects. Brightness also influences visual acuity and color perception (Davson, 1980; Cornsweet, 1970). Since no display can match the range of brightness we experience in daily life, many visual effects simply cannot be portrayed at all given current display technology. The visual requirements of a given task, therefore, must be carefully matched to display capabilities.

Brightness. The "brightness" table entry is intended to compare the brightness efficiency of different display systems. In this survey, we consider a "typical" workstation CRT display to have a maximum brightness of 50 mL, and we normalize brightness to this value. Other displays are rated against this normalized value of unity.

2.1.5 Color. In addition to constraints on displayable brightness, no display can match the range of colors visible to the healthy human eye. According to the tristimulus theory of color perception, we discriminate among hues based on the differential stimulation of the three kinds of cone photoreceptors in the retina, which are sensitive to long ("red"), middle ("yellow-green"), and short ("blue") wavelengths. If we had some means of stimulating the three kinds of cone cells in the same way they are stimulated in a real-world setting, we ought to be able to reproduce those color sensations. Trichromatic color reproduction tries to do this by using three primary colors that are appropriately matched to the sensitivities of the three kinds of cones. The choice of primaries determines the range, or gamut, of colors that a display can generate.

If a subject is shown a test patch of a given colored light, and is asked to match the color by varying the intensity of three primary light sources, we find that subjects can match many, but not all, of the test patches. This is because the sensitivity curves of the three kinds of photoreceptors overlap, such that, for example, a given "green" primary light source may in fact also stimulate the "red" and "blue" photoreceptors. So to match a given color patch, we might find that the required amounts of the "green" and "blue" primaries have also stimulated the red photoreceptors more than necessary. This means that we would require a negative amount of the "red" primary for a good match, which is obviously a physical impossibility.

There are many excellent sources on color perception, for example (Hunt, 1975; Wyszecki, 1982), and we will not go into any detail here. For our purposes, we simply note that displays are inherently limited in the range of colors that they can generate. Monochrome displays use only one primary, and can vary intensities of a single hue. Some displays, such as beam penetration monitors, use two primaries, and can generate a restricted gamut of colors. Such displays are useful for flight simulators that generate only night scenes, since the cone cells in our eyes do not function well or at all as the ambient illumination drops below a few mL.

Color. The "color" table entry is intended to compare the feasibility of color display using the different display systems, and to compare the amount of data per pixel used for display (for information rate computation). Suffice it to say that the human visual system is capable of detecting colors outside the range of any tricolor display system.

2.1.6 Information Rate and Bandwidth. Given a particular resolution, refresh rate, and number of bits per pixel, the information rate and bandwidth of a particular display system can be computed.

The information rate refers to what rate of data (i.e., bits/sec) is needed to drive a display, whereas the bandwidth of a display refers to the maximum rate at which the signal (pixel values) can change, in other words, the highest frequency signal that can be displayed.

Just as visual display systems have an information rate at which data are being presented, the human visual system can be said to have an information rate as well. Based on the number of receptors, and their rate of transmitting neural impulses, the information rate of the eye can be given as 4.3 Mbits/sec, for the two eyes, with a single nerve given as 5 bits/sec (Okoshi, 1976). This is a very low information rate when compared to 2 Gbits/sec for a high-resolution monitor. However, recall that the eyes are "high-resolution" only in the foveal region, which subtends only about 1-2° of the visual field. Also, 4.3 Mbits/sec is the transmission rate to the brain, involving approximately 800,000 nerve links, compared to 100,000,000 rods and 6,500,000 cones (McAllister, Hodges, Robbins, & Noble, 1986). The receptor information is reduced and processed through five layers of nerve cells before leaving the eye (Rogowitz, 1983).

2.1.7 Viewing Zone/Volume Extent. Displays are limited in the range of locations from which they can be viewed; the viewing zone refers to the angular range over which the displayed imagery can be perceived. In addition to viewing zone and field of view ranges, some displays are limited in the nearest and furthest locations they can image, creating a viewing volume.

2.1.8 Number of Views. Most displays are limited in the number of distinct views that they can image. In other words, as a viewer moves from side to side, or, possibly, up and down, he or she will see different views, providing different perspectives onto the 3D scene. The number of different views that can be seen from different locations is limited by the display technology used. In most cases, the more views which are imaged, the greater the bandwidth required.

2.2 Depth Perception and Depth Cues

2.2.1 . Three-dimensional displays can be classified into two major groups, stereoscopic and autostereoscopic. Autostereoscopic displays do not require special viewing aids, such as polarized or a . Depending on the size of their viewing zone or viewing volume, autostereoscopic displays can be seen by multiple viewers. In addition, they often display multiple views, such that viewer motion from side to side (and in some cases, up and down) provides different viewpoints of the 3D scene. Interactive stereoscopic displays can also create multiple views of a 3D scene, but only when the viewer's head motions are tracked. This is explored further in a later section.

2.2.2 Oculomotor Cues. The oculomotor cues are physiological cues based on our ability to sense the tension in the muscles that control eye movement and focus (Goldstein, 1989; Okoshi, 1976).

In accommodation the annular muscles in the eye relax and contract to change the shape of the lens, in order to focus on objects of varying distances from the eye. We can sense the accommodation in each eye alone, and it is thus a monocular depth cue. However, accommodation is more effective when combined with other depth cues, especially convergence. Display systems that generate volumetric images, rather than multiple 2D views, allow the eyes to focus at different depths, providing oculomotor cues to the viewer.

When we fixate on an object, the eyes rotate to center their viewing axes on a particular point in space. This is called convergence. The convergence angle is the angle formed by the two viewing axes.

Accommodation and convergence operate together, as the eyes focus and converge on a particular point in space (Okoshi, 1976). Experiments have shown that when the eyes converge to a certain distance, they automatically accommodate to that distance (unless voluntarily "overridden"). Likewise, when the eyes focus to a certain distance, they converge to that distance, although this is a weaker effect.

Accommodation and convergence are somewhat weak depth cues. Accommodation is said to be effective only at distances less than 2 m (Okoshi, 1979). Convergence is effective up to approximately 10 m (Okoshi, 1976). At close distances, the convergence angles change significantly when the depth changes. At great distances, however, the convergence angles change very little over large distances. The change in convergence angles due to a change in depth has an inverse-square relationship with the distance. Similarly, focus changes very little at great distances.

2.2.3 Binocular Disparity. Due to their displacement from each other, each eye sees a slightly different image of the same scene. The difference in the retinal images that is due to the projection of object points at different depths is known as binocular disparity, a powerful physiological depth cue (Hubel, 1988). Depth perception due to binocular disparity is known as stereopsis.

When the eyes fixate and converge on a point $p$ in three-space, an imaged point is formed on the foveal area of each retina, creating reference points $r_r$ and $r_l$ in each eye. Another point $p'$ will image to different locations $r'_r$ and $r'_l$ on each retina. If distance$(r_r, r'_r)$ = distance$(r_l, r'_l)$, then the two points $p$ and $p'$ will be perceived as being the same distance away in three-space. But if $p$ and $p'$ are at different distances from the viewer, then the projected retinal points $r'_r$ and $r'_l$ will be at different displacements from the respective reference points on the two retinas, i.e., distance$(r_r, r'_r) \neq$ distance$(r_l, r'_l)$, and the two points $p$ and $p'$ are perceived as being at different depths.
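The way convergence weakens with distance can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the survey: it assumes symmetric fixation straight ahead, so that the convergence angle is 2·atan(IOD / 2D), and it uses the 6.5 cm average interocular distance (IOD) that the survey adopts for its disparity calculations. The function names and the 1 cm depth step are hypothetical choices.

```python
import math

IOD_M = 0.065  # average interocular distance used in the survey (6.5 cm)


def convergence_angle_deg(distance_m: float, iod_m: float = IOD_M) -> float:
    """Convergence angle (degrees) for symmetric fixation at distance_m."""
    return math.degrees(2.0 * math.atan(iod_m / (2.0 * distance_m)))


def angle_change_deg(distance_m: float, step_m: float = 0.01) -> float:
    """Decrease in convergence angle when the fixated point moves step_m farther away."""
    return convergence_angle_deg(distance_m) - convergence_angle_deg(distance_m + step_m)


for d in (0.25, 0.5, 1.0, 2.0, 10.0):
    print(f"D = {d:5.2f} m  angle = {convergence_angle_deg(d):7.3f} deg  "
          f"change per cm = {angle_change_deg(d):.5f} deg")
```

Doubling the distance reduces the angular change for the same 1 cm depth step by roughly a factor of four, which is the inverse-square behavior described above.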


Binocular depth resolution. Display systems are limited in the number of different depths that can be imaged; in other words, there is a resolution to the depth. The causes and limits of depth resolution are different for the different display systems.

Binocular disparity can be analyzed through the convergence angles formed from the eyes to different points in space. Points that have unequal convergence angles are perceived at different depths, within our ability to detect the angular difference. Figure 2 depicts such a situation. The disparity, in terms of convergence angles, allows us to compute the difference in depth, when the distance to one of the points is known.

Figure 2. Angular disparity of points at different depths.

$$\beta = \alpha - \theta$$

$$\Delta D = D_\beta - D_\alpha = \frac{IOD}{2 \tan\left(\arctan\left(\frac{IOD}{2 D_\alpha}\right) - \frac{\theta}{2}\right)} - D_\alpha \tag{1}$$

$$\Delta D \approx \frac{D_\alpha^2\, \theta}{IOD} \tag{2}$$

where IOD is the interocular distance, the distance between the center of the eyes' pupils. We use an average IOD value of 6.5 cm.

There is a limit on the minimum detectable difference in retinal disparity that can be perceived. Using vernier-type experiments, the minimum detectable angular disparity between image points on the retina is approximately 10 sec of arc (Okoshi, 1976). Binocular disparity becomes less "accurate" the further the distance, the resolution being inversely proportional to the square of the distance (see Fig. 2). Although disparities may be detectable at values as small as 10 arc-sec, we will use a more conservative value of 30 arc-sec ($1.45 \times 10^{-4}$ radians) for computations involving depth resolution, matching the value of 30 arc-sec we are using for spatial resolution.

Depth resolution. Using 30 arc-sec as the minimum detectable retinal disparity, we can compute the magnitude of the depth resolution, at a given distance. We will examine the depth resolution at a distance of 46 cm (18 in.), a "comfortable" viewing distance, used throughout this survey. Using Eq. (2):

$$\Delta D = \frac{(0.46\ \mathrm{m})^2 \times 1.45 \times 10^{-4}}{0.065\ \mathrm{m}} = 4.72 \times 10^{-4}\ \mathrm{m} \tag{3}$$

or 0.47 mm.

2.2.4 Motion Parallax. Motion parallax is a monocular cue that is generated as the viewpoint of the observer changes. By viewing objects from various directions, and observing how one object moves with respect to another object (at a different distance) we obtain a sense of depth. Motion parallax can be defined as the differential angular velocity of objects at different depths from the observer (Rolfe & Staples, 1989). Close objects move across the visual field more rapidly than far objects, with an inverse relationship between speed and distance (Goldstein, 1989). Motion parallax is known to be a very strong cue for understanding shape and relative depth relations (Proffitt & Kaiser, 1991). Systems that provide simultaneous multiple views, such as lenticular displays, support motion parallax to some degree (dependent on the details of the display). For other display systems, such as CRTs, the viewer's head motion must be tracked to generate new views corresponding to the sampled viewing positions.

2.2.5 Pictorial Depth Cues. Pictorial depth cues are derived from the "planar" retinal image, "assisted by experience and imagination" (Okoshi, 1976). Since they can be perceived equally well by one or both eyes, they are monocular cues. For a VE workstation, these cues are mostly an issue for rendering, i.e., creating the correct visual information to be displayed. However, these cues do influence display system design, to some extent, insofar as they must be displayed correctly. For example, the system should have enough spatial resolution to accurately display texture, and enough luminance and color resolution to provide for shading.
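The depth-resolution figure of about 0.47 mm at 46 cm can be reproduced numerically. The sketch below is an illustration under stated assumptions: the "exact" function assumes symmetric fixation, with the convergence angle to the nearer point taken as 2·atan(IOD / 2D) and the farther point seen under an angle smaller by the disparity threshold. The function names are hypothetical. The small-angle function implements the approximation ΔD ≈ D²θ / IOD used in the worked depth-resolution example.

```python
import math

IOD_M = 0.065                              # interocular distance used in the survey
THETA_RAD = math.radians(30.0 / 3600.0)    # 30 arc-sec disparity threshold


def depth_resolution_approx(distance_m: float, theta: float = THETA_RAD,
                            iod: float = IOD_M) -> float:
    """Small-angle estimate: delta_D ~ D^2 * theta / IOD."""
    return distance_m ** 2 * theta / iod


def depth_resolution_exact(distance_m: float, theta: float = THETA_RAD,
                           iod: float = IOD_M) -> float:
    """Depth step from convergence geometry (symmetric fixation assumed)."""
    alpha = 2.0 * math.atan(iod / (2.0 * distance_m))  # convergence angle to near point
    beta = alpha - theta                               # convergence angle to far point
    if beta <= 0.0:
        return math.inf  # disparity threshold exceeds the whole convergence angle
    return iod / (2.0 * math.tan(beta / 2.0)) - distance_m


for d in (0.46, 1.0, 10.0, 100.0):
    print(f"D = {d:6.2f} m  approx = {depth_resolution_approx(d):.3e} m  "
          f"exact = {depth_resolution_exact(d):.3e} m")
```

At the 46 cm viewing distance the two functions agree to within about one percent. The small-angle form increasingly underestimates the depth step as the distance grows and the convergence angle approaches the disparity threshold.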


Overlap. When one object partially obscures another object, the blocking object is perceived as being closer. This cue does not provide a measure of distance, but rather, it gives a sense of relative spacing in depth (Goldstein, 1989).

Image size. The size of the projected retinal image of a known object provides strong cues about the distance to the object. Size constancy is our tendency to perceive real objects of familiar size as being the same size, regardless of their retinal size. For example, when we observe what appears to be a cat, we tend to regard it as being "cat sized" independent of how much of our field of view it occupies.

Linear perspective. As an object recedes in the distance, its apparent size diminishes, and the magnitude and rate of change give us an indication of relative depth. Moreover, objects that are seen higher in the visual field are usually perceived as being further away (Goldstein, 1989). This is related to perspective, since ground-based objects recede "upward" towards the horizon the further away they are. Objects above the horizon, such as clouds, appear further away the lower they are, again, receding toward the horizon.

Two-dimensional perspective projections are inherently ambiguous, since every point on a projection ray necessarily projects to the same point on the image plane. We rely heavily on motion, both object motion and motion of the eyepoint (which results in motion parallax), to help us make sense of visual scenes (Proffitt & Kaiser, 1991). This is one reason why images presented on head-mounted displays with motion tracking seem so compelling.

Texture gradient. As a textured surface recedes in the distance, the texture appears smaller and more dense. Texture gradient is related to linear perspective and image size, since the differential size of the texture provides the cue to depth. Texture gradients, especially moving textures, are known to provide important cues to object shape and observer motion (Goldstein, 1989). Indeed, without texture cues pilots have difficulty sensing their position and orientation. For this reason, all high-end flight simulators support the display of textures (Yan, 1985).

Aerial perspective. Objects far away in the visual field become bluish and hazy, due to atmospheric scattering. This is typically a cue that operates over very large distances, since a great deal of atmosphere must be traversed to create appreciable scattering. Under extreme conditions, however, haze, fog, and mist can provide depth cues at short distances, as objects "emerge" from the mist. Simulated haze or light attenuation can be employed in rendering to provide depth cues, to simulate "realistic" conditions, such as rendered fog or haze in a flight simulator, or simply to provide enhanced depth cues for 3D volumes (Farrell & Chistidis, 1989).

Shading. The shading of objects and the shadows that they cast provide cues about shape and relative distance (Hochberg, 1986). The variation in reflected light intensity, or shading, of an object informs us about the curvature of the object's surface (Woodham, 1984; Mingolla & Todd, 1986). When we make an assumption about the location of the light source, an object's shadow projected onto the ground or on other objects helps us determine the relative depth of that object.

2.2.6 Viewing Situations and Depth Cues. Different tasks may be accomplished under different visual conditions, such as performing visual searches at near versus far distances. The various depth cues will have different relative importance under these disparate conditions.

At medium to far distances (over 10 m), accommodation and convergence are ineffective, in that relative depths cannot be differentiated. It is still important, however, that the absolute depth indicated by convergence and accommodation match the depth of the scene. For example, in most commercial flight simulators the displayed imagery is typically presented at optical infinity by collimating the light emitted from the displays. (In contrast, military simulators often display imagery on the inside of large domes.) By thus arranging that light rays from the display all arrive at the observers in parallel, images are made to appear at far distances rather than on the near surfaces of the displays.

It is possible, however, to significantly override accommodation and convergence and retain powerful depth perception by using other depth cues (especially binocular disparity). When stereopairs are viewed using a "cross-eyed" or "wall-eyed" stare, accommodation and convergence are in extreme conflict with other depth cues, yet strong depth can be perceived. Nevertheless, viewing under these conditions is quite uncomfortable and very difficult (or impossible) for some viewers to accomplish.

At near distances binocular disparity is a very important depth cue, if not the dominant cue. At great distances, disparity becomes less important, since the disparity due to depth change reduces with the square of the distance. Still, even at moderately large distances (hundreds of meters), disparity operates to some degree.

Merritt (1988) discusses a number of situations in

the noise components present in the two images, binocular separation increases the overall signal-to-noise ratio. Other aspects of the "apparent image quality" are improved when binocular imagery is used, since retinal disparity cues are less dependent on image quality (i.e., resolution, focus, contrast/brightness, etc.) than other depth cues (Merritt, 1988).

A number of benefits arise when two separate image sources are employed to generate binocular stereo (Merritt, 1988). A wider total field of view can be created using two sources when their fields of view only partially overlap (just as the two eyes provide a wider combined field of view than a single eye). Also, two separate image sources provide a redundancy in the case of partial hardware failure.

The surface shading effects of luster, scintillation, and sheen require two differing images. Support for these effects may be important for certain analysis or inspection tasks (Merritt, 1988).
which binocular imaging provides (often-overlooked) For the specific task of remotely controlled off-road advantages over 2D imagery. In complex or unfamiliar driving, binocular disparity is important to enhance the scenes, binocular disparity helps, not only in differentiat- perception of the driving-surface slope (Merritt, 1988). ing between the various relative depths of the objects, This is an example of how binocular disparity generally but also in identifying what the objects actually are. This aids in depth perception when the pictorial cues are too is mostly due to the enhanced figure-ground separation ambiguous. provided by binocular disparity. Indeed, powerful depth Pictorial cues are very useful under most conditions. perception can be generated using binocular disparity With still 2D imagery, the pictorial cues are the only only. This is demonstrated very effectively by random dot cues to depth, yet many depth relationships can be deter- stereograms (Grimson, 1981). mined. All of the other depth cues become less effective Binocular disparity is especially important when view- at great distances, leaving pictorial cues to dominate. In ing complex imagery, with many visual elements. In general, the major pictorial cues (linear perspective, im- such cases it can be very difficult to discern the relative age size) should always be supported. Experiments in depths from the pictorial cues alone. For example, a 2D depth perception have shown that the comprehension of close-up image of densely packed leaves in a tree could 3D information is enhanced when "appropriate" per- be very difficult to perceive in depth. Overlap would spective cues are employed (Kim, Tendick, & Stark, help, but mostly for the closest leaves only. Image size 1987). would not help much, since only fragments of leaves Motion parallax is an important cue with complex would be visible through breaks in the foliage. 
However, imagery—especially when only monocular images are with the addition of binocular disparity, the relative available. Some informal experiments we have per- depths between image features would become rapidly formed show that the addition of motion parallax can apparent. improve the ability of subjects to locate a 3D target in Binocular disparity also improves apparent image space using a 2D display (McKenna, 1992). Although quality, especially useful when low bandwidth or noisy not strictly motion parallax, free observer movement also signals are used. When there is no correlation between enhances the comprehension of a 3D scene by allowing


the viewer to "look around" objects and obscuring surfaces.

At times, it is desirable to observe and study all of the data in a volume set, or all of the surfaces in a 3D scene. For example, MRI and CAT-scan data are volumetric in nature, and a volume rendering transparently displays all of the information. In this case the pictorial cue of overlap is "disabled." In some ways, it becomes more complex to perceive the depth relationships without overlap, but comprehension of the entire space can be enhanced. Volumetric data can also be rendered using surfaces, which correspond to boundaries between regions with different densities. Surfaces can then be rendered opaque (supporting overlap) or transparent.

Aerial perspective is important when "realistic" conditions for long-distance viewing are required (for example, in certain flight simulators). During low-level, high-speed flight, the distance and size to long-range targets can be crucial. The degree of atmospheric scattering, for a particular region, should be accurately represented during training.

It can also be important to simulate fog and haze realistically for the training of certain tasks (e.g., flight simulators used for training personnel for hazardous weather conditions). Fog and haze (or simply attenuation with distance) can also be useful depth cues, especially when transparent volumes are displayed, because overlap is missing as a cue (Farrell & Chistidis, 1989). Slice-stacking displays do not support overlap, so attenuation and haze can be particularly useful cues.

For flight simulators, it has been found important to provide objects of known size, and realistic texturing of the ground surface (Rolfe & Staples, 1989). Objects of familiar size inform the pilot of his or her height, and textured surfaces aid the pilot in determining the contour of the ground surface. Motion parallax is also very important, and at higher "camera" or observer speeds, it becomes increasingly dominant. Stereopsis has not been employed frequently, except for in-flight refuelling simulators that involve near to middle distance depth perception.

3 Three-Dimensional Display Systems

This section covers 3D display technology, especially as it relates to virtual environment research. The main criterion imposed by virtual environment tasks is the ability to visually display a computer generated signal, in real time. Other criteria, such as resolution and color gamuts, are more task-specific. This section examines five major types of 3D display systems (stereoscopic, lenticular, parallax barrier, slice-stacking, and holographic video) and describes their characteristics in terms of the criteria developed in Section 2. Examples are presented for each display type; we have maintained the same bandwidth between the different examples, for comparative purposes. Where possible, the example systems have been based on existing commercial or experimental systems, or derivatives of such systems. This is not intended as an endorsement of the systems, but rather, this is intended to ground the examples to currently available technology, and to demonstrate the ways in which different display types are typically configured. Note that these systems do not represent the current maximum attainable specifications.

3.1 Stereoscopic Display

3.1.1 Description. First, we will examine basic stereographic display technology. A "stereo pair" is generated: one 2D image for each eye, corresponding to the view seen by that eye. These 2D views should be computed using the viewing parameters established by the observer and the display screen, not only for stereoscopic displays, but also for each of the display systems to be discussed. The view computation must take into account such factors as the correct field of view and placement of the eyes relative to the screen (yielding off-axis projections) (Hodges, 1990). Each image should be presented to the eye it is intended for, and so a special viewer or filtering glasses are used. Viewers, such as boom-mount and head-mount displays, isolate each image optically. Glasses are used to filter a common display, so that each eye sees only its image. Because they require viewing aids, stereoscopic displays are by definition not autostereoscopic.
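The off-axis projections mentioned in Section 3.1.1 can be illustrated with a minimal sketch. It assumes a screen-centered coordinate system (x to the right, y up, z the perpendicular distance from the eye to the screen plane) and returns the asymmetric frustum bounds at a near clipping plane. The function names, the near-plane value, and the coordinate convention are hypothetical choices for illustration, not taken from the paper or from Hodges (1990).

```python
def off_axis_frustum(eye, screen_w, screen_h, near):
    """Frustum bounds (left, right, bottom, top) at the near plane.

    `eye` is (x, y, z) relative to the screen center; z is the eye's
    perpendicular distance to the screen plane (assumed convention).
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("eye must be in front of the screen (z > 0)")
    scale = near / ez  # similar triangles: screen plane -> near plane
    left = (-screen_w / 2 - ex) * scale
    right = (screen_w / 2 - ex) * scale
    bottom = (-screen_h / 2 - ey) * scale
    top = (screen_h / 2 - ey) * scale
    return left, right, bottom, top


def stereo_frusta(head, iod, screen_w, screen_h, near):
    """One frustum per eye; the eyes are offset by +/- IOD/2 along x."""
    hx, hy, hz = head
    left_eye = (hx - iod / 2, hy, hz)
    right_eye = (hx + iod / 2, hy, hz)
    return (off_axis_frustum(left_eye, screen_w, screen_h, near),
            off_axis_frustum(right_eye, screen_w, screen_h, near))


# Illustrative values (meters): 33 x 26 cm screen viewed from 46 cm,
# IOD of 6.5 cm; the near-plane distance of 0.1 m is arbitrary.
left_f, right_f = stereo_frusta((0.0, 0.0, 0.46), 0.065, 0.33, 0.26, 0.1)
```

With the head centered, the two frusta come out as mirror images of each other, and each is asymmetric about its own eye axis. That asymmetry is what distinguishes an off-axis projection from simply rotating a symmetric camera toward the screen center.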


Figure 3. A Wheatstone stereoscope.

Stereoscopic viewers employ two image sources, one for each eye. Figure 3 depicts a simple stereoscopic viewer using two CRTs and two mirrors. This display might be considered an autostereoscopic display, since no "viewing aids" are required. However, it is probably more correct to consider the entire display system as one large stereoscopic viewer, since the eye placement is so restricted.

Stereoscopic displays such as those in Figure 3 require the observer to focus on the screen surface. For 3D imagery which is very near the screen plane this is ideal: accommodation and convergence cues are in agreement. For 3D objects significantly displaced from the screen plane, the conflict between accommodation and convergence can be distracting. For example, screen-type flight simulator displays create conflicting depth cues, since the displayed imagery is intended to be perceived at great distances, yet the observer must focus on the closer screen. Infinity optics, which present imagery at an optical focus of infinity, are typically preferred for flight simulators (Rolfe & Staples, 1989). Infinity optics collimate the light emitted from each point in the image, so that they form parallel rays.

A simple stereoscopic viewer employing infinity optics is shown in Figure 4. Lens or mirror systems are often used to enlarge small monitors, and to change their apparent distance. It would be impractical to simply present small monitors directly to the eyes, since it would be difficult and distracting to focus on them at a very short distance. Most such optical systems employ an infinity focus. Some viewers allow the focus to be adjusted, to match a desired accommodation, so that as one eye views the real environment at a given accommodation, the other eye views the display at the same accommodation. In any case, stereoscopic viewers do not support ocular accommodation, to match all of the depths in an image.

Figure 4. A stereoscopic viewer, employing infinity optics.

An example of a stereographic display that uses filtering glasses is shown in Figure 5. A variety of filter types can be used; the simplest system employs colored transparencies, such as red and green, so that each eye sees only one color. The imagery on the display is generated such that each viewpoint is drawn in the color corresponding to the eye which should view it. Such an image is called a color anaglyph (McAllister, Hodges, Robbins, & Noble, 1986). An obvious disadvantage of such a system is that true color imagery cannot be displayed. Also, mismatches between filters and display phosphors, and imperfect filters, lead to image ghosting or cross-talk, as each eye sees a dim "ghost" of the image intended for the other. Its advantage is that almost any standard color display can be used, with lightweight glasses.

Figure 5. An anaglyph display.

Other types of filter glasses are more commonly used with computer graphics. Mechanical or electrooptical (lead lanthanum zirconate titanate, PLZT, or LCD) shutter glasses alternately block each eye's view of the screen. The shutters are synchronized to the display, such that each time a new frame (or field) is displayed, the shutters switch to allow the other eye to see the screen. The display switches the image each frame or field, so that a stereo pair is presented to the visual system, over time, through the shutters. A variation of this method moves the shuttering mechanism to the display itself. An LCD and a linear polarizing filter are placed over the screen surface. The LCD element polarizes the light emitted from the screen, and under an electrical charge, rotates the polarization 90°. Polarizing glasses are worn by the observer, with the polarization axes for the two eyes rotated 90° relative to each other. Like the shutter glasses, the LCD polarization is synchronized to the display, so that a stereo pair is viewed using alternating frames.

3.1.2 Spatial Resolution and Field of View. The resolution and field of view of a stereoscopic display which uses a CRT display are basically those of the CRT itself. Figure 6 shows the FOV and pixel angular resolution of a stereoscopic display, for a typical viewing setup.

3.1.3 Depth Resolution. On a raster CRT display, or any display with a finite spatial resolution, there is an inherent limitation on the number of discrete depth spots that can be imaged, due to a sampling effect from the pixel size. Because there is a finite width to the pixels, there is a minimum depth range which can be imaged using a stereo pair comprised of simply two pixels (see Fig. 7). There is also a limit on the minimum separation of depth points that can be imaged by a stereo pair with finite-sized image elements (see Fig. 8). We will further examine the image space formed by sampling picture elements in Section 3.3, Lenticular Screens (lenticulars can form a more complex image space).
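The depth quantization described in Section 3.1.3 can be sketched numerically using the similar-triangles relation given with Figure 8, d = pitch · D_screen / (IOD − pitch). The generalization to the n-th level (a screen disparity of n pixel pitches) is an assumption made here for illustration and is not stated in the paper; the function names are hypothetical.

```python
def depth_step(pitch, d_screen, iod=0.065):
    """Eq. (8): distance behind the screen to the first quantized depth level.

    All lengths in meters; iod defaults to the 6.5 cm average used in the text.
    """
    return pitch * d_screen / (iod - pitch)


def depth_levels(pitch, d_screen, iod=0.065, count=5):
    """Depths behind the screen for disparities of 1..count pixel pitches.

    Assumption (not from the paper): level n corresponds to an uncrossed
    screen disparity of n * pitch, by the same similar-triangles argument.
    """
    levels = []
    for n in range(1, count + 1):
        disparity = n * pitch
        if disparity >= iod:  # rays no longer converge behind the screen
            break
        levels.append(disparity * d_screen / (iod - disparity))
    return levels


# Values from the paper's example system: 0.025 cm pitch, 46 cm viewing distance.
first = depth_step(0.00025, 0.46)
levels = depth_levels(0.00025, 0.46)
```

Under these assumptions the spacing between successive levels grows with depth, which is consistent with the statement that depth resolution is limited by the sampling effect of the pixel size.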


Figure 6. Field of view and pixel angular resolution of a raster display.

$$\mathrm{FOV} = 2 \arctan\left(\frac{W}{2 D_{screen}}\right) \qquad (4)$$

$$\phi_{pixel} = \frac{pitch}{D_{screen}} \qquad (5)$$

where W is the width of the screen, D_screen is the viewing distance to the screen, and pitch is the horizontal distance between adjacent pixels (e.g., from pixel center to center).

Figure 7. Minimum depth range that can be imaged by a stereo pixel pair, imposed by spatial sampling.

$$\Delta D = 2\, pitch \cdot [\ldots] \qquad (6)$$

[remainder of Eq. (6), involving D_screen and D_object, illegible]

where IOD is the interocular distance, the distance from pupil center to pupil center, for a given observer. We use an average IOD value of 6.5 cm.
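The field-of-view and angular-resolution relations given with Figure 6 can be checked with a short sketch. It assumes Eq. (4) has the form FOV = 2 atan(W / (2 D_screen)) and Eq. (5) the small-angle form φ = pitch / D_screen; both forms reproduce the 40° × 32° and 1.9' × 3.8' figures quoted for the example system later in the paper. The function names are hypothetical.

```python
import math


def field_of_view_deg(width, d_screen):
    """Eq. (4): angle subtended by a screen of the given width, in degrees."""
    return math.degrees(2 * math.atan(width / (2 * d_screen)))


def pixel_angle_arcmin(pitch, d_screen):
    """Eq. (5), small-angle form (assumed): angle per pixel, in arc-minutes."""
    return math.degrees(pitch / d_screen) * 60


# Example system: 33 x 26 cm screen at 46 cm, 1280 x 512 pixels per eye.
h_fov = field_of_view_deg(33.0, 46.0)
v_fov = field_of_view_deg(26.0, 46.0)
h_res = pixel_angle_arcmin(33.0 / 1280, 46.0)
v_res = pixel_angle_arcmin(26.0 / 512, 46.0)
```

Width and distance only need to share a unit (centimeters here), since both relations depend on their ratio.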

3.1.4 Refresh Rate. Shuttering schemes can introduce or aggravate flicker problems, since only every other frame is seen by each eye. NTSC monitors can be used with such systems, using alternating, interlaced fields to generate the stereo pair. NTSC fields refresh at a rate of 60 Hz, but each eye is only "refreshed" at a rate of 30 Hz. Flicker becomes quite noticeable at 30 Hz, and can be distracting. Nonetheless, the imagery is still easily perceived and fused. Monitors of 120 Hz can be used to update each eye at a rate of 60 Hz, for reduced-flicker operation. However, the shuttered 120-Hz monitors can create more flicker than a standard 60-Hz monitor, because very short-persistence phosphors must be used so that the last frame does not remain illuminated when the other eye's shutter opens.

When using a single monitor with the shuttering system, there is a commonly used trade-off between resolution and the refresh rate. To maintain the same bandwidth of a display, the vertical resolution can be cut in half while doubling the refresh rate of the display, which maintains the same refresh rate for a shuttered eye. Alternately, the resolution and refresh rate can remain unchanged, effectively cutting the refresh rate for a shuttered eye in half.

Polarized glasses can also be used with a two monitor system. The light from each monitor is polarized, with a 90° rotation of polarization between the two screens. The screens are optically mixed using a half-silvered mirror. This eliminates the flicker problem if 60-Hz monitors are used, but the viewing zone is restricted.

3.1.5 Brightness. When filtering glasses are used, the brightness to each eye is reduced. Colored filters used for color anaglyphs allow only a narrow band of frequencies to pass, and thus greatly reduce brightness. When a shuttering system is employed, the brightness to each eye is cut in half over time. In addition, the polarized glasses and PLZT or LCD shutter reduce the brightness that reaches the eyes significantly.
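The resolution versus refresh-rate trade-off described in Section 3.1.4 can be made concrete with a small sketch. It assumes that display bandwidth scales with the number of pixels drawn per second; the 1280 × 1024, 60 Hz starting point is an illustrative choice, and the function names are hypothetical.

```python
def pixel_rate(x_res, y_res, refresh_hz):
    """Pixels drawn per second, used here as a stand-in for display bandwidth."""
    return x_res * y_res * refresh_hz


def per_eye_refresh(refresh_hz):
    """With frame-sequential shuttering, each eye sees every other frame."""
    return refresh_hz / 2


# Option 1: halve the vertical resolution and double the refresh rate.
full = pixel_rate(1280, 1024, 60)
halved = pixel_rate(1280, 512, 120)

# Option 2: leave the monitor unchanged and accept half the per-eye refresh.
unchanged_eye_rate = per_eye_refresh(60)   # 30 Hz per eye
doubled_eye_rate = per_eye_refresh(120)    # 60 Hz per eye
```

Under this assumption both configurations draw the same number of pixels per second, so the first option keeps each eye at 60 Hz at the cost of vertical resolution, while the second keeps the resolution and drops each eye to 30 Hz.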


Figure 8. Quantized depth levels, due to sampling the finite-sized pixel elements. Beyond the viewing distance to the screen, D_screen, the distance to the next depth level, d, is determined by the pixel pitch. From similar triangles it can be seen that

$$\frac{pitch}{d} = \frac{IOD}{D_{screen} + d} \qquad (7)$$

and that

$$d = \frac{pitch \cdot D_{screen}}{IOD - pitch} \qquad (8)$$

3.1.6 Color. These stereoscopic systems, with the exception of color anaglyphs, allow for a wide range of colors to be displayed. Typically, an RGB (red, green, and blue) primary CRT or LCD screen is employed.

3.1.7 Information Rate and Bandwidth. The information rate and bandwidth of the stereoscopic systems described above are basically that of a similar 2D display, except that two images are being generated and displayed, potentially doubling the information rate. Often the refresh rate or the vertical resolution of a display is cut in half, to maintain the same information rate. Color anaglyph systems sacrifice color display, utilizing the different color channels for the different image channels. Using a stereoscopic system, the information rate to each eye is only one half that of the entire display.

3.1.8 Viewing Zone Extent. The viewing zone of stereoscopic displays that employ filtering glasses is basically limited to those regions with a clear view of the display screen. The observer may view from any location in front of the display, although the image can be computed correctly only for a single viewing location, at which the viewing parameters established by the viewer's eyes and the display screen match the rendering viewing parameters. In addition, the field of view covered by the screen becomes very small when the viewer moves far off to one side. The viewing zone is very small for an optical system such as that in Figure 4. The eyes must be placed in the correct location to view the imagery.

3.1.9 Number of Views. Stereoscopic displays, by definition, image one stereographic "3D" view, composed of two 2D images. Interactive stereoscopic displays, described in the following section, also image only one 3D view; however, such displays can generate many views of a 3D scene, over time.

3.1.10 Example Stereoscopic Display. We will now examine, specifically, a hypothetical stereoscopic display, based on a "typical" workstation CRT display, with an LCD shutter-glasses system. This system is essentially equivalent to a Silicon Graphics Iris Indigo computer with Elan Graphics and a pair of StereoGraphics Crystal Eyes shutter glasses. The characteristics of this example system, as described below, are tabulated in Section 4, in Tables 1-3.

Spatial resolution. For comparison between the different display systems, a "typical" high-resolution monitor will be defined as


Res = 1280 x 1024 (9) Color. The LCD shutter system does not interfere with color display. Our typical monitor and framebuffer However, our stereoscopic LCD shutter system divides uses RGB primaries, with 8 bits ofresolution per color chan- the vertical resolution into two frames, for shuttering nel (used for information rate computation). purposes. We have divided the vertical resolution, so that depth resolution is maintained, yielding Information rate and bandwidth. For the LCD shutter system: Res = 1280 x 512 (10) Info = XRes • YRes RefRate Bits/Pixel

For a "comfortable distance" of 46 cm (18 in.), • • • viewing = 1280 512 24 120 = 1.9 Gbits/sec (15) and a screen size of 33 x 26 cm, the angular resolution is as given Bw = XRes • YRes RefRate/2

= • • = = 1280 512 cf> 1.9 minutes x 3.8 minutes -» 1.9' x 3.8' (11) 120/2 40 MHz (16) The pixel pitch (distance from the start of one pixel to zone. The LCD shutter allows the the next) is given as Viewing system observer to view from any location in front of the 33 cm 26 cm screen—yielding a viewing zone of approximately 180°. = °025 pitch 1280 -j02¿T cm/pixel (12) 3.2 Interactive Stereoscopic Display and thefield ofview is given as 3.2.1 Description. Autostereoscopic displays FOV = 40° x 32° (13) have two main advantages over simple stereoscopic sys- tems. They do not require viewing aids, and they can provide multiple views of the 3D scene, dependent on Depth resolution. At the viewing distance of 46 cm, the location of the viewer's eyes. A display that presents using Eq. (8), the depth resolution is given as a different image for different observer viewpoints al- 0.00025 0.46 lows the viewer to see more of the scene (to "look = 0.0018 m 0.065 0.00025 (14) around" objects) and to use motion parallax as a depth - cue (Fisher, 1981). or 0.18 cm. Monoscopic and stereoscopic displays can also pro- vide the "viewpoint-dependent" functionality by adding Refresh rate. Our LCD shutter system employs a head-tracking (Fisher, 1981; McKenna et al., 1986). 120-Hz refresh rate, so that each eye is updated at 60 When the head location is known, either through me- Hz. chanical, optical, or magnetic tracking, an appropriate view can be displayed, stereoscopically, of the 3D scene. Brightness. LCD shutter systems, such as the Ste- This requires either real-time image generation or reoGraphics Crystal Eyes, allow approximately 30% of lookup, but such real-time rendering is assumed for a the light through to each eye (McAllister et al., 1986), virtual environment display. 
Tracking the viewer's head however, the screen is seen only half of the time, reduc- and displaying the appropriate images provide for full ing the overall brightness to 15% (when compared to an motion parallax, in the horizontal as well as vertical di- unshuttered, unfiltered "typical" display). rections. In addition, "roll" motions of the head are sup-

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/pres.1992.1.4.421 by guest on 29 September 2021 436 PRESENCE: VOLUME I, NUMBER 4

experimented with for several years (Sutherland, 1968; Vickers, 1970; Rolfe & Staples, 1989; Fisher et al., 1986), and have recently become commercially available (Blanchard et al., 1990; W. Industries, 1991). Head- mounted displays allow a full 360° of head motions, and typically present a wide field of view to each eye (about 90-120°). With the proper optics, head-mounted dis- plays can mix real world imagery with the displayed ste- reoscopic imagery (Rallison & Schicker, 1991). An alternative to wearing the displays on the head is to mount them on a mobile support. The boom- / hidden objects: Q mounted display is a counter-weighted, stereoscopic dis- that allows a full 360° of motion Ul•*t=v visible objects: O play range (MacDow- all, Bolas, Pieper, Fisher & Humphries, 1990). Measurements of the monitor position and orientation Figure 9. An interactive stereoscopic display. A "window" can be accurately made through the articulations of the is formed by the image/screen plane into the 3D space. The boom, whereas, most head-mounted display systems window both behind and in front the "clips" objects of image employ magnetic tracking, which can exhibit significant plane. When objects which are "projecting" through or are problems with noisy signals and time delay. In addition, in front of the display screen are clipped or cut off, stereo boom-mounted displays can employ high-resolution perception can be greatly diminished, since the objects are which are too and/or to use in seemingly "obscured" by the display boundary, which is monitors, bulky heavy " head-mounted Boom-mounted also further away. This is often termed a "window violation. displays. displays have the of to from" to Although the instantaneously visible 3D objects are limited advantage being easy "step away outside of virtual by the field of view of the screen, there is a very large range perform tasks the environment. of viewing locations. 
Distortions introduced by the optical elements can be corrected for during the rendering process. Also, the interocular distance (IOD) (the distance from pupil cen- ported, such that the head may tip from side to side and ter to pupil center) for a given viewer must be accom- the stereo pairs are displayed accordingly. modated for when using a stereoscopic viewer, such as a Viewpoint-dependent display systems are commonly head or boom-mounted display, for binocular imagery used with head-mounted displays, but can also be used to be perceived at the correct depths (Robinett & with large, unmoving displays, such as LCD shutter ste- Rolland, 1992). Interactive stereoscopic displays exhibit reoscopic displays, allowing for a very wide viewing the same general properties as noninteractive stereo- zone, with a unique view at every location. The screen scopic displays, with the exception that motion parallax acts much like a "window" into a 3D scene. The range of is supported. We will focus below on the differences be- visible 3D objects is limited by the field of view of the tween a workstation display and a head/boom-mounted display, as is shown in Figure 9. display. A more common method for generating an interactive stereoscopic display is to move the displays along with 3.2.2 Spatial Resolution and Field of View. the head. Small displays are used with optical compo- Since the view is typically very wide using a head- nents, as in Figure 4. The monitors and optics are fitted mounted or boom-mounted display, the pixels of the into a headset to create a head-mounted display. The display subtend appreciable angles. For example, the W. location of the head is tracked, so that appropriate ren- Industries "Visette" head-mounted display has a 90° dered views can be generated. Such systems have been field of view, spanned by 372 horizontal pixels. There-

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/pres.1992.1.4.421 by guest on 29 September 2021 McKenna and Zeltzer 437

fore, each pixel subtends 14.5 arc-min on average. It is Spatial resolution. For the head/boom-mounted possible, of course, to use higher-resolution displays, display we use the same "typical" monitor resolution: especially when using a boom-mounted display; rapid 1280 X 1024. However, optics are used to generate a advancements are being made in compact commercial larger field of view: 90° per eye. The two images only and industrial displays. partially overlap, so that the horizontal FOV is larger for both eyes together: 3.2.3 Depth Resolution. The interactive stereo- FOV = 135° x 90° (17) scopic display systems exhibit the same type of limit on depth resolution due to the sampling effect from the The angular resolution is given as size. The lower resolution of the head/ pixel angular 90° resolution. = = = boom-mounted displays lowers the depth T^r—r 4.2' 0.0012 radians cj> 1280 (18)

Zone Extent. The head- or 3.2.4 Viewing This is a rather coarse resolution, especially in compari- boom-mounted displays require that the viewer's eyes be son to the stereoscopic display described in the previous placed in a restricted location relative to the optics and section. However, the field of view of the head/boom- as was described for the displays, optical stereoscopic mounted display can be easily modified, using different above. in this the is free to viewer However, case, display optics, resulting in a different pixel resolution. The task move along with the head, so that the effective viewing should define the relative importance of angular resolu- zone is 360°, usually within a radius of several feet of tion versus field-of-view. linear movements. Depth resolution. The depth resolution, at the 3.2.5 Number of Views. the interac- Although viewing distance of 46 cm can be computed from Eq. tive two the stereoscopic displays only image views, (8). However, we first need to compute the pitch, which number of different that can be is viewpoints explored would exist if the infinite-focus screen were at our typical limited the and of the only by accuracy range tracking viewing distance of 46 cm: mechanism. pitch = 0.0012 46 cm - 0.056 cm (19) 3.2.6 Example Interactive Stereoscopic 0.00056 0.46 d ° = ° °°40 m Displays. We now describe two examples of interactive 0.065 0.00056 ™ stereoscopic displays. The first is an interactive version of - the stereoscopic LCD shutter glasses display from the or 0.40 cm. previous subsection. The only difference between the All of the autostereoscopic techniques to be discussed interactive and noninteractive displays is that motion below are capable of displaying simple stereo pairs, with parallax is supported in the interactive case. The second the exception of the slice-stacking techniques. 
However, example is a head/boom-mounted display, which is de- autostereoscopic techniques have the advantage of being scribed in the remainder of this section. able to display, in general, more than two views, so that We will now describe a hypothetical head/boom- as the observer moves from side to side, different views mounted display, which exhibits a fairly high resolution are visible. for such a display, attainable only on high-end commer- cial This is done for systems. comparative purposes—for 3.3 Lenticular Screen example, with the same resolution as a typical worksta- tion display, head-mounted displays typically exhibit a 3.3.1 Description. A lenticular sheet is an array much lower angular pixel resolution, due to their large of cylindrical lenses that can be used to generate an au- FOV. tostereoscopic 3D image by directing different 2D im-

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/pres.1992.1.4.421 by guest on 29 September 2021 438 PRESENCE: VOLUME I, NUMBER 4

Figure 11. Side-lobes. (From Okoshi, 1976.)

ages into viewing subzones. The subzones are imaged out at different angles in front of the lenticular sheet. When an observer places each of his or her eyes in a different viewing zone, each eye sees a different image, thus allowing for binocular disparity.

The cylindrical lenses of a lenticular sheet are generally placed one focal length away from the imaging "back screen" (photographic emulsion, diffusion screen, or CRT phosphor plane) so that a point on the screen is ideally imaged at infinity (i.e., with parallel emerging rays). Figure 10 depicts the projection of CRT pixels by a lenticular sheet into different subzones. The lenticular lenses are oriented vertically. Therefore, the lenses direct different horizontal CRT pixels out at different horizontal angles. The output angle ($\theta$) depends on the lens focal length ($f$) and the displacement ($x$) of the imaging point from the center of the lens, as given in the figure.

Figure 10. A lenticular sheet, used with a CRT display (top view). Each of the four vertical pixel-strips is imaged into a different viewing zone by the cylindrical lenses. $n$ = index of refraction and $f$ = focal length:

$$f = \frac{x}{\theta} \tag{21}$$

(Hamasaki, 1980).

When viewed from a distance, a lenticule will appear to be evenly illuminated by a thin vertical strip on the back screen. As the eye looks across the lenticular sheet, it sees each lenticule illuminated by a thin vertical strip behind it. In the horizontal direction each lenticule acts as one "pixel" (picture element), but in the vertical direction the lenticule retains the resolution of the back screen. By establishing a correspondence between the placement of the vertical strips and the subzones into which they are projected, coherent 2D images are seen from each subzone. Therefore, when the two eyes view from different subzones, an autostereoscopic image is seen when the appropriate 2D imagery is formed in each subzone.

As the viewer moves from side to side, the eyes "scan" across the back screen behind each lenticule, in the opposite direction. When the screen is quantized, as by the pixels on a CRT, the eyes enter different viewing subzones as the viewer moves. As the observer moves from subzone to subzone when viewing an autostereoscopic 3D image, he or she can use the cue of motion parallax and can "look around" obscuring objects.

The entire viewing zone is made up of $N$ subzones, created by $N$ pixels behind each lenticule. Off to each side of the main centered viewing zone are duplicates of the viewing zone, termed "side-lobes" (Okoshi, 1976) (see Fig. 11). The side-lobes are formed as the cylindrical lenses direct the back screen imagery, which lies behind neighboring lenticules. When a viewer moves beyond the "last" subzone in a viewing zone and begins to enter a side-lobe, she or he will see a pseudoscopic 3D image (with the left and right eye views reversed), until she or he has moved fully into the side-lobe. The side-lobes are more subject to optical distortion, since the lenticules are further off-axis.

A number of problems exist with lenticular imaging. First, a very high resolution is needed horizontally to image a large number of views at a high resolution ($N$ views, each with a horizontal resolution of $\mathit{Xres}$, require a horizontal resolution of $N \cdot \mathit{Xres}$). With CRT technology, the pixel size (ultimately, the electron beam spot diameter) limits the upper resolution, and thus the number of views. The bandwidth requirements can also become very large, since $N$ views are displayed. Furthermore, $N$ views must be rendered in real time, with the imagery "sliced" and placed into the vertical strips behind the lenticules.

Another limit on the number of views arises from the imperfect focusing ability of the cylindrical lenses. Lens aberrations and diffraction of the light reduce the directivity of the lenses, so that the focused imagery from the back screen does not emerge with parallel rays, but rather spreads with some angle. This spread limits the number of subzones that can be differentiated from each other. As we shall see below, however, the horizontal resolution of a CRT screen limits the number of zones more severely than imperfect lens directivity.

Figure 12. A lenticular sheet with a diffusing back screen, used with multiple projectors. The projected images are directed back at the same angle, into viewing subzones. (Adapted from Okoshi, 1976.)
Another key issue with lenticular sheet displays is that the back screen imagery must be closely aligned with the slits or lenticules. Otherwise the subzone imagery will not be directed into the appropriate subzone. This is an important issue when using CRT technology, because CRTs are not typically completely flat and linear. Experiments by Hamasaki with lenticular/CRT displays showed that the nonlinearity of the tube introduced interference patterns that reduced the field of view (Hamasaki, 1980). In later experiments, Hamasaki et al. employed a "Braun tube" to accurately register the imagery with the lenticular sheet (Hamasaki, Okada, Utsunomiya, Vematsu, Takeuchi, Kambayashi, & Shimada, 1989). The Braun tube incorporates thin (50-μm) fluorescent stripes at a regular pitch of 1 mm on the inner surface of the display. When the stripes are struck by the electron beam they emit light back into the tube cavity, where it is detected by four sensors. This timing signal allowed Hamasaki et al. to more accurately place the subzone pixels relative to the lenticular sheet.

Lenticular sheets can also be used in conjunction with a diffusing screen and multiple projectors, as depicted in Figure 12. CRT, LCD, laser, or other types of projectors can be used, and high horizontal resolution can be maintained with a large number of views. Such a system would, of course, require a high bandwidth, since there would be $N$ high-resolution projectors.
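
The "slicing" of $N$ rendered views into vertical strips behind the lenticules can be sketched as a simple column interleave. This is an illustrative sketch only, not the procedure used by Hamasaki et al.: the function name `interleave_views`, the nested-list image layout, and the left-to-right strip order are all assumptions. Because the eyes scan the back screen in the direction opposite to their own motion, a real display would likely need the strip order under each lenticule reversed relative to the subzone order.

```python
def interleave_views(views):
    """Interleave N equally sized views column by column (hypothetical helper).

    views[k][row][col] is one pixel of view k. The returned back-screen image
    has N * Xres columns: output column col * N + k holds column col of view k,
    so the N adjacent columns behind one lenticule come from the N views.
    The strip order within a lenticule is an assumption, not taken from the text.
    """
    n_views = len(views)
    n_rows = len(views[0])
    n_cols = len(views[0][0])
    screen = [[None] * (n_views * n_cols) for _ in range(n_rows)]
    for k, view in enumerate(views):
        for r in range(n_rows):
            for c in range(n_cols):
                screen[r][c * n_views + k] = view[r][c]
    return screen


if __name__ == "__main__":
    left = [["L0", "L1"]]    # one row, two columns
    right = [["R0", "R1"]]
    print(interleave_views([left, right]))
```

The horizontal resolution of the interleaved image is $N$ times that of a single view, which is the $N \cdot \mathit{Xres}$ requirement discussed above.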


Integral photography is a method similar to lenticular imaging. Instead of cylindrical lenses, integral photography uses a two-dimensional array of "fly's-eye" lenses (small spherical lenses). Viewpoints are imaged up and down, as well as back and forth, requiring either a subdivision of the display screen imagery in the vertical and horizontal directions, or a 2D array of projectors used with a diffusion screen, similar to the setup shown in Figure 12. The additional viewpoints necessitate a vast increase in the display bandwidth, or, more likely, a lowering of resolution. In addition, the aberrations in these types of lenses are significant, limiting the ultimate spatial and depth resolutions (Okoshi, 1976).

Lenticular CRT displays exhibit some of the same properties as our "typical" 2D display, namely, field of view, refresh rate, and lack of ocular accommodation.

3.3.2 Spatial Resolution. The horizontal resolution as seen from a subzone is the resolution of the lenticular sheet. The horizontal resolution of the CRT back screen ($\mathit{Xres}_{CRT}$) is equal to the number of imaged views ($N$) times the horizontal resolution of each view ($\mathit{Xres}$):

$$\mathit{Xres} = \frac{\mathit{width}}{\mathit{pitch}_l} = \frac{\mathit{Xres}_{CRT}}{N} \tag{22}$$

where $\mathit{width}$ is the width of the screen, and $\mathit{pitch}_l$ is the width of the lenticules. The horizontal angular resolution, $\phi$, follows from the lenticule pitch $\mathit{pitch}_l$ [Eq. (23)].

3.3.3 Depth Resolution. The depth resolution of a lenticular/CRT display system is limited in basically the same way that the stereoscopic systems are. The finite size of the horizontal imaging elements (the pitch of the lenticular lenses, in this case) limits the depth levels that can be resolved, as depicted in Figure 8.

Lens aberrations and diffraction limit the ability of the lenticules to accurately direct the back screen imagery into the viewing subzones. When the subzones overlap due to the spread of the lenticules, the depth resolution is further degraded. Before discussing this topic further, we will analyze the image space created by the lenticular sheet more closely.

Hamasaki analyzes the lattice of points in the image space generated by a lenticular sheet (see Figs. 13, 14, and 15). These points are the quantized 3D locations that can be imaged, due to the finite size of the imaging elements. The $J$ parameter specifies how many subzones are imaged within the distance spanned by one IOD. A higher value of $J$ will make the visual transition between adjacent views more "smooth" and continuous.

The image space represented in the figures can be used to analyze stereoscopic displays as well, using the parameter values $N = 2$, $u = \mathit{IOD}$, $J = 1$, and $\mathit{pitch}_l$ = pixel pitch. Another difference between the lenticular and stereoscopic image spaces is that the viewpoints are fixed in space using lenticulars, but the viewpoints are always located at the eyes using stereoscopic displays. Head-tracking is then required to match the rendered view to the real locations of the eyepoints.

Because the lenticules do not perfectly direct the imagery, the subzones may overlap, degrading the depth resolution. The concept of depth resolution limitations due to spreading is depicted in Figure 16. However, because the image space is already quantized, the depth resolution is not degraded until the blur angle approaches the angle subtended by the viewing subzones. To avoid overlap of the subzones, and the subsequent loss of resolution, the combined angular spread must remain smaller than the angle subtended by a subzone:

$$\sqrt{\alpha_e^2 + \alpha_l^2 + \alpha_d^2} < \theta_{subzone} \tag{29}$$

where $\alpha_e$ is the spread due to the minimum electron beam width ($W_e$) (or pixel width), given as

$$\alpha_e = \frac{W_e}{f} \tag{30}$$

where $f$ is the focal length of the lenticules. The spread due to lens aberrations is given as $\alpha_l$. This angular spread of light due to cylindrical/spherical distortion is plotted in Figure 17. The spread due to diffraction is given as $\alpha_d$, and is computed as follows:


Figure 13. Image space geometry of a lenticular sheet display (top view), drawn for the cases $J = 1$ and $J = 2$, where $\mathit{IOD} = Ju$ and $J$ is an integer. The viewpoints, shown at the bottom of the figure, represent the centers of the viewing subzones. Rays are drawn from the viewpoints, through the centers of each lenticule. A 3D image space point is formed at the intersection of rays that emerge from two viewpoints, which are separated by one IOD. These image space points are the quantized locations that can be differentiated. In the $z$ direction, the quantized depth levels formed by the ray intersections are indicated with a dotted line. These different levels are indexed with the parameter $l$. (Adapted from Hamasaki, 1980.) Notice that the $J = 1$ and $J = 2$ cases represent a rescaling of the image space in the $x$ direction. There are twice as many depth levels indicated with $J = 2$, because the lenticular pitch is one-half that of the $J = 1$ case; the eyes span twice as many lenticules. $N$ = number of viewpoints, $u$ = distance between viewpoints, IOD = interocular distance (interpupillary), $J$ = number of viewpoints per IOD, $\mathit{pitch}_l$ = width of lenticules, $D_{screen}$ = distance from viewpoints to screen, and $l$ = index of quantized depth planes.

$$\alpha_d = 2 \arcsin\left(\frac{\lambda}{\mathit{pitch}_l}\right) \tag{31}$$

where $\lambda$ is the wavelength of light (Halliday & Resnick, 1978). The example lenticular/CRT system described at the end of this section demonstrates that the diffraction and aberration spreads are less significant than the spread due to typical electron beam sizes.

3.3.4 Brightness. Lenticulars have fairly good brightness properties, especially when compared to parallax barriers (see the next section). As each eye views a lenticular lens, it will see only a fraction of the illuminated screen, but the light emitted from that portion of the screen is collected over a greater angle, and focused toward the eye. However, Hamasaki reports brightness and contrast problems with the Braun tube system, reportedly due to inefficiencies in the optical elements and imperfect registration of the subzone imagery (Hamasaki et al., 1989).

Figure 14. Quantized depth levels, due to sampling the finite-sized lenticular elements. From similar triangles it can be seen that

$$\frac{l \cdot \mathit{pitch}_l}{D_{screen} - z(l)} = \frac{\mathit{IOD}}{z(l)} \tag{24}$$

and that

$$z(l) = \frac{\mathit{IOD} \cdot D_{screen}}{\mathit{IOD} + l \cdot \mathit{pitch}_l} \tag{25}$$

Figure 15. Separation between image space points, the pitch $p(l)$, for a given depth level. From similar triangles:

$$\frac{\mathit{pitch}_l}{D_{screen}} = \frac{p(l)}{z(l)} \tag{26}$$

and

$$p(l) = \frac{\mathit{pitch}_l \cdot z(l)}{D_{screen}} \tag{27}$$

Similarly, the vertical resolution is quantized into a lattice. The pitch between image space points in the vertical direction is given as

$$p_y(l) = \frac{\mathit{pitchCRT}_y \cdot z(l)}{D_{screen}} \tag{28}$$

where $\mathit{pitchCRT}_y$ is the vertical pitch of the CRT pixels.

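The lattice relations in Figures 14 and 15 can be evaluated numerically. The sketch below assumes the similar-triangle forms $z(l) = \mathit{IOD} \cdot D_{screen} / (\mathit{IOD} + l \cdot \mathit{pitch}_l)$ and $p(l) = \mathit{pitch}_l \cdot z(l) / D_{screen}$, with $z$ measured from the viewpoints and negative $l$ lying behind the screen. The function names and the example values (1 mm lenticule pitch, 65 mm IOD, 460 mm viewing distance) are illustrative choices, not part of the original analysis.

```python
def depth_level(l, pitch_l, iod, d_screen):
    """Distance from the viewpoints to quantized depth level l (assumed Eq. 25 form).

    All lengths share one unit (millimetres in the example below).
    """
    denom = iod + l * pitch_l
    if denom <= 0:
        # The two rays are parallel or diverging, so no image point is formed.
        raise ValueError("no ray intersection for this depth level")
    return iod * d_screen / denom


def lattice_pitch(l, pitch_l, iod, d_screen):
    """Horizontal spacing of image space points at level l (assumed Eq. 27 form)."""
    return pitch_l * depth_level(l, pitch_l, iod, d_screen) / d_screen


if __name__ == "__main__":
    for level in (-2, -1, 0, 1, 2):
        z = depth_level(level, 1.0, 65.0, 460.0)
        p = lattice_pitch(level, 1.0, 65.0, 460.0)
        print(level, round(z, 2), round(p, 4))
```

With these example values, level $l = 0$ lies on the screen and level $l = -1$ lies about 7.2 mm behind it, so the spacing between adjacent depth levels near the screen is on the order of several millimetres.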
3.3.5 Color. Color can be problematic with lenticular/CRT displays, because the color phosphors for one color "pixel" cannot be distributed horizontally, or they will be imaged at different angles, and, therefore, into different subzones. If the three color phosphors are arranged vertically, and aligned well, color imaging should be possible (Hamasaki et al., 1989). Monitors, such as the "Trinitron" CRT, employ a linear arrangement of color phosphors, although they are normally arranged horizontally (Yoshida, Ohkoshi, & Miyaoka, 1982). Presumably, at least some of these monitors can be operated on their sides, or they can be designed to operate with vertical stripes.

Figure 16. Limitation in depth resolution due to angular spreading (finite directivity) of the parallax barrier or lenticular sheet. (Adapted from Okoshi, 1976.) The depth resolution ($\Delta D$) is given as

$$\Delta D = \frac{2 \theta \, D_{object} \, (D_{screen} - D_{object})}{\mathit{IOD}} \tag{32}$$

Figure 17. Plot of computed angular spread due to cylindrical/spherical lens aberrations. (From Okoshi, 1976.) In this diagram, $\beta = \alpha_l$, and $\phi_0$ = the output angle of the lenticule.

3.3.6 Information Rate and Bandwidth. The bandwidth of a lenticular sheet display is proportional to the number of views and the resolution provided per view:

$$\mathit{InfoRate} = N \cdot \mathit{Xres} \cdot \mathit{Yres} \cdot \mathit{bits/pixel} \cdot \mathit{RefRate} \tag{33}$$

$$\mathit{Bandwidth} = N \cdot \mathit{Xres} \cdot \mathit{Yres} \cdot \mathit{RefRate} / 2 \tag{34}$$

where

$$\mathit{Xres}_{CRT} = N \cdot \mathit{Xres}, \quad \mathit{Yres}_{CRT} = \mathit{Yres} \tag{35}$$

3.3.7 Viewing Zone and Number of Views. The viewing zone is determined from the span of the viewing subzones. For $N$ zones, each $u$ wide, the full viewing zone is $Nu$ wide. The angle spanned by the viewing zone is given as

$$\theta_{view} = 2 \arctan\left(\frac{Nu}{2 D_{screen}}\right) \tag{36}$$

(see Fig. 18). The angle spanned by a single subzone is given as

$$\theta_{subzone} = 2 \arctan\left(\frac{u}{2 D_{screen}}\right) \tag{37}$$

The side lobes begin where the main viewing zone ends, as the lenticules direct imagery behind adjacent lenticules.

The number of views determines how "smoothly" the view will change, as an observer moves from subzone to subzone. The projection of an image space point onto the lenticular sheet will jump or "flip" from one lenticule to another, when the viewer changes subzone. The magnitude of the flip depends on the depth of the image point and the number of views between one IOD ($J$). Figure 19 diagrams the angle subtended by the "flip." Hamasaki has determined, through experiments with still, lenticular images, that the parameter $J$ should be equal to at least 2, and preferably 3, to achieve a "good" autostereoscopic image (Hamasaki, 1980).

3.3.8 Example Lenticular/CRT Display. For our example lenticular/CRT display we will describe a system similar to the one developed by Hamasaki et al. using the Braun tube, which allows accurate registration of the vertical pixel strips with the lenticular elements (Hamasaki et al., 1989). Their system had 8 views, each


Figure 18. Viewing zone extent. The viewing zone width ($Nu$) is depicted at bottom. We can derive the focal length of the lenticules that is required to generate a desired viewing zone. Recall the focusing equation:

$$f = \frac{x}{\theta} \tag{38}$$

Assuming that the back screen is fully utilized, then screen imagery spanning the width of one lenticule is directed out over a net angle $\theta_{view}$:

$$f = \frac{\mathit{pitch}_l}{\theta_{view}} \tag{39}$$

$$= \frac{\mathit{pitch}_l}{2 \arctan\left(\frac{Nu}{2 D_{screen}}\right)} \tag{40}$$

$$\approx \frac{\mathit{pitch}_l \cdot D_{screen}}{Nu - \mathit{pitch}_l} \tag{41}$$

Figure 19. Flip angle. In the illustrated example, in which $J = 2$ (so that $\mathit{IOD} = Ju$), an image space point at depth level $l = -2$ "flips" from one lenticule to a neighboring lenticule. In the $J = 1$ case, the image space point would have jumped over two lenticules, rather than just one. The number of interocular views controls how "smoothly" the view will change. At a viewpoint, this flip, from one lenticule to the next, can be stated in terms of the angle subtended by the distance between the two lenticule centers, from the viewpoint. This is termed the flip angle, $\theta_f$, and it is given as (Hamasaki, 1980)

$$\theta_f(l) = \frac{l \cdot \mathit{pitch}_l}{J \cdot D_{screen}} \tag{42}$$

with a resolution of 256 × 256 pixels. The lenticular sheet had a pitch of approximately 1 mm, and a focal length of 2.25 mm. The pitch of the horizontal pixels was 0.125 mm, with a minimum electron beam spot size of 0.07 mm. The centers of the viewing subzones were separated by approximately 35-40 mm at a viewing distance of 750 mm. We will modify these parameters somewhat in our example system, to be more closely linked to our other examples.

Spatial resolution. Our "typical display" has a spatial resolution of 1280 × 1024. We will double the horizontal resolution and halve the vertical resolution, resulting in a CRT resolution of 2560 × 512, attainable on very high resolution displays. This retains the same bandwidth as our typical system.

The system will provide 8 views. The resolution per view is therefore

$$\mathit{Res} = \frac{\mathit{Xres}_{CRT}}{N} \times \mathit{Yres} = 320 \times 512 \tag{43}$$

The screen dimensions are 33 × 26 cm. The pixel pitches are

$$\mathit{pitch}_{CRT} = \frac{330\ \text{mm}}{2560} \times \frac{260\ \text{mm}}{512} = 0.13\ \text{mm} \times 0.51\ \text{mm} \tag{44}$$

$$\mathit{pitch}_l = \frac{330\ \text{mm}}{320} = 1.03\ \text{mm} \tag{45}$$

We will use the same viewing distance, 46 cm, that was used to analyze the other display systems. Because we have changed the viewing distance from the Hamasaki system described above (75 cm), other viewing parameters will change accordingly. The screen size and FOV remain the same as in our other example systems. The angular resolution, subtended at the eye, is then given as

$$\phi = 7.6' \times 3.8' \tag{46}$$

Depth resolution. Using Eq. (8), we can compute the depth resolution at the view distance of 46 cm:

$$d = \frac{0.001 \cdot 0.46}{0.065 - 0.001} = 0.0072\ \text{m} \tag{47}$$

or 0.72 cm.

We must verify that the depth resolution is not degraded by the lens aberrations and diffraction, using Eq. (29). As with the Hamasaki system, the minimum electron beam spot diameter is given as 0.07 mm. To convert the electron beam size to an angular spread, we need to know the focal length of the lenticules. We also require the viewing zone measurement to determine the spread due to lens aberrations. We will specify a viewing subzone width ($u$) of 32.5 mm, so that $J = 2$ (again, similar to the Hamasaki system). The viewing zone is $Nu$ = 26 cm wide and subtends an angle of

$$\theta_{view} = 2 \arctan\left(\frac{Nu}{2 D_{screen}}\right) = 31.6° \tag{48}$$

Each subzone subtends

$$\theta_{subzone} \approx \frac{u}{D_{screen}} = 4.0° \tag{49}$$

The maximum angle to be directed by the lenticules is therefore approximately 16° (half the viewing zone). Using Okoshi's computed values in Figure 17, at 16° the blur due to spherical aberration is approximately 0.1°.

The focal length of the lenticules can now be computed as well, as in Eqs. (39)-(41):

$$f = \frac{\mathit{pitch}_l}{\theta_{view}} = \frac{1.03\ \text{mm}}{0.55\ \text{rad}} = 1.9\ \text{mm} \tag{50}$$

(Hamasaki's system employed a lenticular sheet with a focal length of 2.25 mm.) The angle subtended by the minimum-sized electron beam is given as

$$\alpha_e = \frac{W_e}{f} = \frac{0.07}{1.9} = 2.1° \tag{51}$$

Now we compute the spread due to diffraction:

$$\alpha_d = 2 \arcsin\left(\frac{\lambda}{\mathit{pitch}_l}\right) = 2 \arcsin\left(\frac{540 \times 10^{-6}}{1.03}\right) = 0.06° \tag{52}$$

It can be easily seen that aberration and diffraction spreads are not on the same order as the electron beam spread and that

$$\sqrt{\alpha_e^2 + \alpha_l^2 + \alpha_d^2} < \theta_{subzone} \tag{53}$$

$$\sqrt{2.1^2 + 0.1^2 + 0.06^2} = 2.1° < 4° \tag{54}$$

is easily satisfied.

3.4 Parallax Barrier

3.4.1 Description. A parallax barrier is a vertical slit plate which is placed in front of a display, simply to block part of the screen from each eye. A parallax barrier acts much like a lenticular screen, except that it uses barriers to obstruct part of the display, rather than lenticules to direct the screen imagery. Figure 20 shows a parallax barrier stereo pair setup. The screen displays two images, each of which is divided into vertical strips. The strips displayed on the screen alternate between the left and right eye images. Each eye can see only the strips intended for it, because of the slit plate. More than two images can be displayed on the screen, to create multiple views from side to side (see Fig. 21). When a CRT monitor is used with a parallax barrier, the horizontal resolution is divided by the number of 2D views provided. Multiple projecting monitors can be used to maintain a higher horizontal resolution with a large number of views (see Fig. 22). Each projector images a different viewpoint, and the barrier and diffusion screen direct the light back to the viewing zones.

Parallax barriers are not commonly used, because they suffer from several drawbacks (Okoshi, 1976). The displayed imagery is often dim, because the barrier blocks most of the light to each eye. Also, with small slit widths, the diffraction of light from the slit gap can become problematic, as the light rays spread. As discussed above, the CRT imagery must be segmented into strips, as with a lenticular sheet display.

When compared to our "typical" 2D display, a parallax barrier display exhibits the same properties of field of view, refresh rate, and lack of ocular accommodation.

Figure 20. A parallax barrier used to present a stereo pair. The slit plate restricts what part of the image plane each eye can see. (From Okoshi, 1976.)

Figure 21. A small section of a parallax panoramagram. Many views are directed from narrow strips on the image plane through the slits to different viewing zones. (From Okoshi, 1976.)

Figure 22. A parallax barrier-projection system. (From Okoshi, 1976.) The slit plate is nonreflective. The display screen functions like a retroreflective screen, in the horizontal direction.

3.4.2 Spatial Resolution. A parallax barrier reduces the visible horizontal resolution of a raster display by the number of 2D views imaged in the same manner as the lenticular sheet display, described earlier [see Eqs. (22)-(23)].

3.4.3 Depth Resolution. In general, the depth resolution of a parallax barrier can be analyzed in the same manner as that of a lenticular sheet display [see Fig. 14, and Eqs. (24)-(25)]. Parallax barriers do not suffer from lens aberrations. However, diffraction is more of a concern. Due to the small slit widths, diffraction


spreading can potentially cause a significant degradation of the depth resolution. The angular spread of the light passing through a slit of width $a$ is given approximately as

$$\theta = 2 \arcsin\left(\frac{\lambda}{a}\right) \tag{55}$$

where $\lambda$ is the wavelength of the light passing through the slit (Halliday & Resnick, 1978). This loss of the directivity of the parallax barrier imposes a limit on the depth resolution (see Fig. 16). When compared to a lenticular sheet display, a parallax barrier exhibits more diffraction because the slit width in a parallax barrier is only a fraction of the lenticular pitch. The slit width, $a$, of a parallax barrier is

$$a = \frac{\mathit{pitch}_{slit}}{N} \tag{56}$$

where $N$ is the number of views, and $\mathit{pitch}_{slit}$ is the distance from the start of one slit to the next.

3.4.4 Brightness. Parallax barriers reduce the light which reaches each eye, by the ratio of the slit width, $a$, to the barrier slit pitch:

$$\mathit{Brightness} = B_0 \, \frac{a}{\mathit{pitch}_{slit}} \tag{57}$$

where $B_0$ is the brightness of an unblocked screen. Diffraction can cause further brightness loss.

3.4.5 Color. The diffraction from the slits will cause different wavelengths to spread by different amounts. If the amount of spreading due to diffraction is significant, color "smearing" will occur. Equation (55) can be used to analyze the degree of spreading for different wavelengths of light.

3.4.6 Information Rate and Bandwidth. As with the lenticular sheet display, the horizontal screen resolution is divided by the number of views, so that the bandwidth has to be significantly increased to maintain a high visible resolution, or some other parameter must be sacrificed (such as vertical resolution or refresh rate) [see Eqs. (33)-(35)].

3.4.7 Viewing Zone Extent. The viewing zone is determined by the parallax barrier geometry, in the same manner as the lenticular displays. For $N$ subviewing zones, each of width $u$, an overall viewing zone of $Nu$ is created, with side-lobes off to the sides of the main viewing zone (see Fig. 18).

3.4.8 Number of Views. Parallax barriers are capable of imaging multiple views. However, each view reduces the horizontal resolution and the brightness of the display. It may not be wise to image very many views, because the width of the barrier strips will become large in comparison to the width of the illuminated slit. When the barriers become too large, the horizontal imagery will no longer appear to be continuous.

3.4.9 Example Parallax Barrier/CRT Display. Our example parallax barrier display system will be specified to closely match the lenticular sheet display, which was, in turn, based on the Hamasaki Braun tube lenticular display, as described in the previous section.

Spatial resolution. We will describe a parallax barrier system, similar to our lenticular display, which images 8 views, from a screen resolution of 2560 × 512:

$$\mathit{Res} = \frac{\mathit{XRes}}{\mathit{NumViews}} \times \mathit{YRes} = 320 \times 512 \tag{58}$$

$$\phi = 7.6' \times 3.8' \tag{59}$$

Depth resolution. The overall limit to depth resolution combines the limitations due to sampling [see Fig. 14 and Eq. (25)] and angular spreading (see Fig. 16), as with the lenticular sheet display. With a barrier slit pitch given as 33 cm/320 = 0.103 cm, the depth resolution at 46 cm is the same as our lenticular display: 0.72 cm, since the pitches are the same. The spread due to diffraction is computed from the width of the slit (33 cm/2560 = 0.13 mm). Using a "green" wavelength:

$$\theta_g = 2 \arcsin\left(\frac{\lambda}{a}\right) = 2 \arcsin\left(\frac{540 \times 10^{-6}}{0.13}\right) = 0.48° \tag{60}$$
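
Equations (55) through (57) are easy to evaluate for other slit geometries. The following sketch assumes the forms $\theta = 2 \arcsin(\lambda / a)$, $a = \mathit{pitch}_{slit} / N$, and $\mathit{Brightness} / B_0 = a / \mathit{pitch}_{slit}$; the function names are hypothetical, and all lengths are taken in millimetres.

```python
import math


def slit_width(slit_pitch_mm, n_views):
    """Slit width a for a barrier imaging n_views views (assumed Eq. 56 form)."""
    return slit_pitch_mm / n_views


def diffraction_spread_deg(wavelength_mm, slit_width_mm):
    """Approximate angular spread in degrees for one slit (assumed Eq. 55 form)."""
    return math.degrees(2.0 * math.asin(wavelength_mm / slit_width_mm))


def brightness_fraction(slit_width_mm, slit_pitch_mm):
    """Fraction of the unblocked brightness B0 reaching the eye (assumed Eq. 57 form)."""
    return slit_width_mm / slit_pitch_mm


if __name__ == "__main__":
    a = 0.13  # slit width in mm, as used in the example above
    print("green spread (deg):", round(diffraction_spread_deg(540e-6, a), 2))
    print("brightness fraction:", brightness_fraction(slit_width(1.03, 8), 1.03))
```

For an 8-view barrier the brightness fraction is 1/8 regardless of the slit pitch, which illustrates why the displayed imagery is often dim.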


Although the spreading due to diffraction is significantly larger than that exhibited by the similar lenticular display (0.06°), the spread is still considerably smaller than the angle defined by the viewing subzone (4.0°), and depth resolution should not be reduced due to diffraction.

Color. Examining the spread of three primaries for an RGB CRT display, using a parallax slit width of 0.13 mm:

$$\theta_r = 2 \arcsin\left(\frac{\lambda_r}{a}\right) = 2 \arcsin\left(\frac{640 \times 10^{-6}}{0.13}\right) = 0.56° \tag{61}$$

$$\theta_g = 2 \arcsin\left(\frac{\lambda_g}{a}\right) = 2 \arcsin\left(\frac{540 \times 10^{-6}}{0.13}\right) = 0.48° \tag{62}$$

$$\theta_b = 2 \arcsin\left(\frac{\lambda_b}{a}\right) = 2 \arcsin\left(\frac{440 \times 10^{-6}}{0.13}\right) = 0.39° \tag{63}$$

reveals that the blue primary will spread approximately 70% as much as the red. This will cause some chromatic separation, although it may be within acceptable limits. From the viewing distance of 46 cm, red will bend approximately 0.45 cm off axis, while blue bends 0.31 cm. The difference of 0.14 cm between the red and blue imagery will likely be noticeable as "color smear."

Viewing zone. The viewing zone and subzones are defined in the same manner as the example lenticular sheet display [see Eqs. (48)-(49)].

3.5 Slice-Stacking

3.5.1 Description. "Slice-stacking" refers to building up a 3D volume by layering 2D images (slices). Slice-stacking displays are also referred to as "multiplanar" displays. Just as a spinning line of LEDs can perceptually create a planar image, a rotating plane of LEDs can create a volumetric image. A similar volume can be scanned using CRT displays and moving mirrors. Rather than using a planar mirror, which would have to move over a large displacement at a high frequency, a variable-focus, or varifocal, mirror can be used (McAllister et al., 1986). A common way to create such a mirror is acoustically. A 30-Hz acoustic signal is commonly used to vibrate a reflective membrane. As the mirror vibrates, its focal length changes, and a reflected monitor will be imaged, over time, in a truncated-pyramid viewing volume (see Fig. 23). The mirror continuously changes its magnification, so that imagery scanned over time (as a CRT operates) will be continually changing in depth (not in discrete "slices"). Calligraphic displays are more appropriate to this display type, since they can scan any area of the screen at a given time, and thus, scan at any 3D location. However, calligraphic displays are not well suited to rendering shaded imagery.

Another method for generating a volumetric image is to illuminate a rotating surface with a random access light source. Some experimental systems have employed a spinning double helix, illuminated by lasers, controlled by acoustooptic scanners (Soltan, Trias, Robinson, & Dahlke, 1992). To illuminate a specific location in the volume, the laser is timed to strike the helical surface, as it passes through that location. However, we will focus the discussion and our example on the CRT/varifocal mirror system.

Slice-stacking methods trace out a luminous volume, such that objects are transparent, and normally obscured objects, further in depth, cannot be hidden. This can be ideal for volumetric data sets and solid modeling problems, but is poorly suited to "photographic" or realistic images with hidden surfaces. The addition of head-tracking would allow hidden surfaces to be approximately removed in the rendering step, for one viewer. Not all surfaces can be correctly rendered, however, because the two eyes view from differing locations; each eye should see some surfaces which are obscured to the other.

3.5.2 Spatial Resolution. The spatial resolution exhibited by a slice-stacking, varifocal mirror display is the same as its underlying 2D display resolution. Due to the continuously changing depth of the image, and the limited bandwidth of the display, a limited amount of information can be displayed at a given depth range. This is discussed further in Section 3.5.4, Depth Resolution, below.


Figure 23. A slice-stacking display, using a variable focus mirror. As the mirror changes its magnification, the reflection of the CRT screen changes in apparent depth.

3.5.3 Field of View. The field of view of a varifocal display is essentially that of its 2D display monitor. It should be noted, however, that varifocal mirrors are currently limited to approximately 20 in., due to acoustic and mirror characteristics, which limits the potential FOV (McAllister et al., 1986).

3.5.4 Depth Resolution. Because the depth of the reflected CRT is continually changing, using a varifocal mirror slice-stacking display, a very fine resolution of depth spots can potentially be imaged. The number of discrete depths that are imaged is limited by the bandwidth of the CRT display and the persistence of the phosphors. Based simply on the bandwidth, the varifocal display can image every "pixel" at a different depth, as in

$$\mathit{DepthSpots} = \frac{\mathit{Bandwidth} \cdot 2}{\mathit{RefreshRate}} \tag{64}$$

More realistically, the persistence of the phosphor will be at least slightly longer than the time it takes to draw the next "pixel," reducing the depth spots to

$$\mathit{DepthSpots} = \frac{\mathit{Bandwidth} \cdot 2}{\mathit{RefreshRate}} \cdot \frac{\mathit{PixelDrawTime}}{\mathit{DecayTime}} = \frac{1}{\mathit{RefreshRate} \cdot \mathit{DecayTime}} \tag{65}$$

3.5.5 Ocular Accommodation. Slice-stacking displays support ocular accommodation, the first display type discussed thus far to do so. Slice-stacking displays actually image points in 3D space, either directly (for example, with a spinning plane of LEDs) or optically (for example, by reflecting off of a varifocal mirror).

3.5.6 Refresh Rate. The refresh rate of varifocal slice-stacking displays is twice the frequency of their vibration. Typically, a 30-Hz acoustic signal drives the mirror, and the 3D image is scanned on both the "inward" and "outward" deformations of the mirror, yielding a 60-Hz refresh rate. This is matched to a 60-Hz CRT display. Note, however, that the imagery must be scanned in opposite order for the "inward" and "outward" passes.

3.5.7 Brightness. Because the depth of the reflected CRT is continually changing with the varifocal mirror display, short-persistence phosphors must be used to prevent smearing of the image in depth. Compared to our "typical" 2D display, the brightness is somewhat reduced, since the phosphors are not illuminated as long.

3.5.8 Color. As just mentioned, short-persistence phosphors are required in varifocal displays to prevent smear as the mirror changes its magnification. Unfortunately, phosphors of short enough duration are currently only available in green (McAllister et al., 1986).

3.5.9 Information Rate and Bandwidth. The bandwidth of slice-stacking displays is highly dependent on how much imagery is to be drawn within a given

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/pres.1992.1.4.421 by guest on 29 September 2021 450 PRESENCE: VOLUME I , NUMBER 4

depth range. It is possible to use a slice-stacking display of a "typical" bandwidth (40 MHz); however, this precludes complex, shaded imagery (too much information per "slice").
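The trade-off between bandwidth and imagery per slice can be illustrated with a minimal sketch based on Eq. (64). The function names and the choice of 100 depth slices are assumptions made only for illustration; they are not taken from the paper.

```python
def points_per_refresh(bandwidth_hz: float, refresh_hz: float) -> float:
    """Eq. (64): points addressable in one refresh, two per bandwidth cycle."""
    return 2.0 * bandwidth_hz / refresh_hz


def points_per_slice(bandwidth_hz: float, refresh_hz: float, n_slices: int) -> float:
    """Evenly divide the per-refresh point budget among n_slices depth slices."""
    return points_per_refresh(bandwidth_hz, refresh_hz) / n_slices


# "Typical" 40-MHz display refreshed at 60 Hz (values from the text).
total = points_per_refresh(40e6, 60.0)
# Hypothetical volume of 100 slices (an illustrative assumption).
per_slice = points_per_slice(40e6, 60.0, 100)

print(f"points per refresh: {total:,.0f}")
print(f"points per slice (100 slices): {per_slice:,.0f}")
```

Under these assumptions each slice receives only about one hundredth of the points available to a full 1280 x 1024 raster, which is consistent with the remark that a 40-MHz display precludes complex, shaded imagery.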

3.5.10 Viewing Zone/Volume Extent. Using a varifocal mirror system, the viewing zone is limited by the position of the display CRT, when the CRT occupies space in front of the mirror, as shown in Figure 24. A beam splitter could be used to move the CRT down below the varifocal mirror, so that the CRT would not obstruct the viewing zone, but this would lower the brightness by at least 75%.

Figure 24. A varifocal mirror, slice-stacking display system. The viewing zone is limited by the placement of the CRT. (Figure labels: mirror, 50 cm wide; Δh = 3.9 mm; viewing distance, 46 cm; CRT screen, 33 cm wide.)

Slice-stacking systems create a viewing volume, with a limit on the near and far points that can be imaged. Varifocal mirrors have a leverage of approximately 85, which means that a movement of distance h of the mirror will create a movement of 85h in the reflected imagery. Because the magnification of the mirror changes, the size of the reflected CRT changes as well, so that a truncated pyramid volume is imaged, rather than a rectangular volume (McAllister et al., 1986).

Other slice-stacking or volumetric displays can provide larger viewing zones. For example, a double-helix laser scanning display can provide a full 360° around the display, and 180° or more vertically.

3.5.11 Number of Views. Because slice-stacking displays create a 3D volumetric image, the number of different views is essentially unlimited (within the viewing zone, of course). Horizontal and vertical parallax are supported.

3.5.12 Example Slice-Stacking Display.

Spatial resolution. Although a calligraphic CRT is employed with our example slice-stacking display, for purposes of comparison, we will give it the same angular resolution: 1.9 arc-min, and same effective "pixel" count: 1280 x 1024.

Depth resolution. Using a varifocal mirror display, and assuming that the phosphor persistence is extremely (and unrealistically) short, we can use Eq. (64) to compute the maximum possible depth spots which can be imaged within its view volume:

\[ \mathrm{DepthSpots} = \frac{2 \cdot \mathrm{Bandwidth}}{\mathrm{RefreshRate}} \approx 1.3 \times 10^{6} \tag{66} \]

However, in the above calculation, we have assumed an unrealistically short phosphor decay time. Using a more realistic decay time, we will recompute the depth resolution. One fast phosphor is the "P-46" green phosphor, which has a decay time under 1 µsec (McAllister et al., 1986). Using Eq. (65), we can compute the number of depth spots imaged within the viewing volume:

\[ \mathrm{DepthSpots} = \frac{1}{\mathrm{RefreshRate} \cdot \mathrm{DecayTime}} = \frac{1}{60 \cdot 1 \times 10^{-6}} = 16{,}667 \tag{67} \]

From the center of the viewing volume, which is given as 33 cm deep, the distance between adjacent "pixels" in depth is given as

\[ \frac{\mathrm{depth}}{\mathrm{DepthSpots}} = \frac{33}{16{,}667} = 0.002\ \mathrm{cm} \tag{68} \]


This resolution is significantly higher than human depth resolution; note, however, that the 16,667 imaged points are all that the system can display.

Viewing zone. Using our "comfortable" viewing distance of 46 cm as the center of the viewing volume, and a 50-cm (20-in.) mirror, a viewing zone of approximately 30° is created on each side (and above and below) the CRT (see Fig. 24).

Viewing volume. We will choose a total depth of 33 cm, to match the CRT width of 33 cm. With a varifocal mirror of leverage 85:

\[ \mathrm{VolumeDepth} = k \cdot h = 85 \cdot 0.39 = 33\ \mathrm{cm} \tag{69} \]

so that the mirror must move ±3.9 mm.

3.6 Computer-Generated Holography

3.6.1 Description. Computer-generated (CG) holograms fall under two main categories, CG stereograms and CG diffraction patterns. CG stereograms are recorded optically, from a set of 2D views of a 3D scene. The final hologram projects each 2D image into a viewing zone, and stereo views can be seen, with horizontal parallax (Benton, 1982). Full-color, high-resolution images have been generated, as well as large, wide field-of-view holograms. This is a non-real-time imaging technique, however; it requires off-line recording. A large amount of information is needed to generate the hologram as well, since every view (typically 100-300) must be generated.

Rather than record a set of 2D views holographically, a true diffraction pattern can be computer generated. When illuminated, the hologram will create a 3D wavefront, imaging 3D objects and light sources in space (Tricoles, 1987; Dallas, 1980). The methods used to compute the diffraction patterns are complex, and computationally expensive. Until recently, CG holograms had to be recorded using plotter or printing techniques, as an off-line process. A new method, however, allows a holographic image to be displayed in real-time, from a fast frame buffer storage (St. Hilaire, Benton, Lucente, Jepsen, Kollin, Yoshikawa, & Underkoffler, 1990; Benton, 1991). Because the holographic "signal" can be scanned in real-time, and potentially broadcast, this system is referred to as "holographic video" by its creators. We will refer to this prototype display as the "MIT system."

The information contained in a hologram with dimensions of 100 by 100 mm, with a viewing angle of 30°, corresponds to approximately 25 Gbytes (25 billion samples), well beyond the range of current technology to update at frame rates (St. Hilaire et al., 1990). Benton et al. address this problem in the MIT system by reducing the information rate in three ways: by eliminating vertical parallax (saving several orders of magnitude), by limiting the viewing zone to approximately 11° (wider angles require higher spatial frequency diffraction patterns), and by limiting the image size.

The diffraction patterns for a frame are computed on a super-computer (Connection Machine II) in under 5 sec, for fairly simple objects composed of luminous points. The hologram is stored in a high-resolution frame buffer (approximately 6 Mbytes per frame) and is transmitted to a high-bandwidth acoustooptical modulator (AOM). The AOM modulates a coherent light source to create the 3D image. Both monochrome and tricolor displays have been demonstrated.

3.6.2 Spatial Resolution. A very high horizontal resolution is needed to generate the diffraction patterns, but the vertical resolution can be set to more "typical" resolutions, ideally matching the resolution of the eye, or the resolution of the data to be displayed. The high horizontal resolution is not the resolution of the displayed holographic image; however, it does determine that resolution. The horizontal resolution of the imaged points is diffraction limited, and is beyond human perceptual limits (St. Hilaire et al., 1990).

3.6.3 Field of View. The FOV of the holographic display is limited not only by the size of the display, but also by the spatial resolution of the diffraction pattern. Light is diffracted from a grating by an angle of

\[ \theta = \operatorname{asin}(f \cdot \lambda) \tag{70} \]


where f is the spatial frequency of the diffraction pattern and λ is the wavelength of the light. θ must be great enough to diffract the light from the outermost edges of the display to the viewing location.

3.6.4 Depth Resolution. The resolution of the depth is limited by diffraction, and is beyond human perceptual capabilities (as is the spatial resolution).

3.6.5 Ocular Accommodation. Because points are actually imaged in space, holographic displays support ocular accommodation. The only other display systems discussed in this survey that support focus are slice-stacking displays.

3.6.6 Refresh Rate. The video holographic system refreshes in a manner analogous to CRT displays, redrawing the imagery at a given frequency. The MIT system refreshes at 36 Hz, reportedly exhibiting little flicker, although the system is typically viewed under dark ambient lighting conditions.

3.6.7 Brightness. The holographic video has good brightness and contrast properties, using only a low-power laser (a few milliwatts). Brightness can easily be increased by substituting a higher power laser.

3.6.8 Color. Both monochromatic and color holographic video displays have been demonstrated. In the MIT prototype system, a trade-off was made between color (RGB channels) and vertical resolution. To use the same information rate, the vertical resolution was divided by three, and two more color channels were added. Because laser primaries can be very pure (essentially one single wavelength), a very large color gamut can be achieved in such a system.

3.6.9 Information Rate and Bandwidth. Compared to other systems, the holographic video display requires a very high bandwidth (for the same field of view and viewing zone). This is due to the fact that a very great number of samples are required to generate the high spatial-frequency diffraction patterns. For a hologram with a viewing angle of θ, the diffraction pattern frequency is given as

\[ f = \frac{\sin\theta}{\lambda} \tag{71} \]

At least twice as many samples are required to accurately represent that frequency. For a horizontal-only parallax hologram of width w, with a vertical resolution of v:

\[ \mathrm{samples} = \frac{2 \cdot w \cdot v \cdot \sin\theta}{\lambda} \tag{72} \]

To create a display similar to our "typical" monitor, in field of view, refresh rate, and vertical resolution, but with much greater spatial and depth resolution:

\[ \mathrm{samples} = \frac{2 \cdot 330\ \mathrm{mm} \cdot 1024 \cdot \sin 40^{\circ}}{620 \times 10^{-6}\ \mathrm{mm}} = 7 \times 10^{8} \tag{73} \]

providing 8 bits/sample and three color channels:

\[ \mathrm{InfoRate} = 7 \times 10^{8} \cdot 8\ \mathrm{bits/sample} \cdot 3 \cdot 60\ \mathrm{Hz} = 1 \times 10^{12} \tag{74} \]

or 1000 Gbits/sec!

3.6.10 Viewing Zone Extent. The viewing zone angle is determined by the frequency of the diffraction pattern, as in Eq. (70). The viewing volume is limited as well. The MIT system's depth range is limited to approximately 100 mm, due to limits in the framebuffer output circuitry. A more recent MIT system has a depth of over 1 m, although the perceptual depth is limited to a few hundred millimeters by astigmatism, the result of using horizontal parallax only (Benton, 1991).

3.6.11 Number of Views. The MIT holographic video system provides many views from side to side (perceptually limited), but no vertical parallax.

3.6.12 Example Holographic Video Display. The example system we will describe is based on the MIT prototype holographic video display.
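As a rough check of the information-rate estimate in Section 3.6.9, the sketch below evaluates Eqs. (71) through (74) for the "typical monitor" parameters quoted there (330 mm width, 1024 lines, 40° angle, 620 nm light). The function and parameter names are illustrative assumptions, not part of the original paper.

```python
import math


def hologram_samples(width_mm: float, vertical_lines: int,
                     view_angle_deg: float, wavelength_mm: float) -> float:
    """Samples per frame for a horizontal-parallax-only hologram, Eq. (72)."""
    # Eq. (71): spatial frequency (cycles/mm) needed for the viewing angle.
    frequency = math.sin(math.radians(view_angle_deg)) / wavelength_mm
    # Two samples per cycle, across the full width, for every line.
    return 2.0 * width_mm * vertical_lines * frequency


def info_rate_bits_per_sec(samples: float, bits_per_sample: int,
                           channels: int, refresh_hz: float) -> float:
    """Eq. (74): samples x bits/sample x color channels x refresh rate."""
    return samples * bits_per_sample * channels * refresh_hz


samples = hologram_samples(330.0, 1024, 40.0, 620e-6)
rate = info_rate_bits_per_sec(samples, 8, 3, 60.0)

print(f"samples per frame: {samples:.2e}")
print(f"information rate:  {rate:.2e} bits/sec")
```

Both results agree with the rounded values in Eqs. (73) and (74), about 7 x 10^8 samples per frame and about 1 x 10^12 bits/sec.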


Spatial resolution. The MIT system has been tested with a display size of 50 x 50 mm, with a frame buffer resolution of

\[ \mathrm{Res} = 32{,}000 \times 192 \tag{75} \]

for the monochrome system, and

\[ \mathrm{Res} = 32{,}000 \times 64 \tag{76} \]

for the color system.

The horizontal resolution of the displayed imagery is beyond human perceptual capabilities. With a viewing distance of 46 cm, the vertical angular resolution is given as

\[ \theta_{\mathrm{vert}} = 1.9' \tag{77} \]

for the monochrome system, and

\[ \theta_{\mathrm{vert}} = 5.8' \tag{78} \]

for the color system.

Field of view. Using a display size of 50 x 50 mm, and a viewing distance of 46 cm:

\[ \mathrm{FOV} = 2 \operatorname{atan}\left(\frac{25\ \mathrm{mm}}{460\ \mathrm{mm}}\right) = 6.2^{\circ} \tag{79} \]

The diffraction pattern can generate angles of:

\[ \theta = \operatorname{asin}\left(\frac{32{,}000}{2 \cdot 50\ \mathrm{mm}} \cdot 620\ \mathrm{nm}\right) = 11.4^{\circ} \tag{80} \]

so that the FOV is well within the range of the diffracted light.

Depth resolution. The depth resolution of the holographic image is beyond human perceptual limits. At a viewing distance of 46 cm, the human depth resolution was previously calculated as 0.47 mm.

Brightness. A measure of the brightness of the MIT holographic video display is not provided. However, brightness is not a problem, since a low-power laser provides "a very bright image" (Benton, 1991), so we will indicate the brightness as 1.

Color. As discussed, both monochrome and color systems have been demonstrated. The color system sacrifices vertical resolution for color, retaining the same overall information rate.

Bandwidth. The 192-line display actually employs 3 channels for transmission of the signal, yielding 37 MHz x 3. The color system also employs 3 channels, each of 64 lines: 37 MHz x 3.

Viewing zone. The viewing zone is limited by the maximum diffraction angle. The viewing zone of the MIT system is given as

\[ \theta = \operatorname{asin}(f \lambda) = \operatorname{asin}(320 \cdot 620 \times 10^{-6}) = 11.4^{\circ} \tag{81} \]

The near and far depth is limited to approximately 100 mm.

4. Quantitative Comparison

4.1 Display System Characteristics

Tables 1 and 2 provide a quantitative comparison of the display characteristics of the different example 3D display systems that were developed throughout Section 3. The example display systems do not represent the maximum currently attainable specifications. They are intended as examples of the ways in which 3D displays are typically configured. The bandwidths of the different systems have been chosen to be identical (the holographic display's bandwidth is slightly smaller) to have a better basis of comparison. In most cases, the display systems could easily be configured with different display characteristics, using the trade-offs discussed in the text. The table entries are based on the criteria from Section 2 and example systems described in Section 3.

4.2 Display Systems and Depth Cues

Table 3 categorizes the ability of the various display systems to present the different perceptual depth cues.


Table 1. Comparison of Three-Dimensional Imaging Systems, Part 1

| System | Spatial resolution (horiz. x vert.) | Angular resolution (arc-min) | Refresh rate | Brightness (per eye) | Color (bits/pixel) | Information rate | Bandwidth |
|---|---|---|---|---|---|---|---|
| Human visual system | 4800 x 3800 | 0.5' | 60 Hz | N/A | N/A | 4.3 Mbits/sec | N/A |
| Stereoscopic (LCD shutter) | 1280 x 512 | 1.9' x 3.8' | 120 Hz, 60 Hz per eye | 0.15 | 8 bits each RGB | 1280 x 512 x 24 x 120 = 1.9 Gbits/sec | 1280 x 512 x 120/2 = 40 MHz |
| Interactive stereoscopic (LCD shutter) | 1280 x 512 | 1.9' x 3.8' | 120 Hz, 60 Hz per eye | 0.15 | 8 bits each RGB | 1280 x 512 x 24 x 120 = 1.9 Gbits/sec | 1280 x 512 x 120/2 = 40 MHz |
| Head-mounted display, boom-mounted display | 1280 x 1024 x 2 | 4.2' | Two 60-Hz monitors | 1 | 8 bits each RGB | 1280 x 1024 x 24 x 60 x 2 = 3.8 Gbits/sec | 1280 x 1024 x 60/2 = 40 MHz x 2 |
| Lenticular barrier with CRT | 320 x 512 (x 8 view zones) | 7.6' x 3.8' | 60 Hz | <1 | 8 bits each RGB | 2560 x 512 x 24 x 60 = 1.9 Gbits/sec | 2560 x 512 x 60/2 = 40 MHz |
| Parallax barrier with CRT | 320 x 512 (x 8 view zones) | 7.6' x 3.8' | 60 Hz | <0.12 | 8 bits each RGB | 2560 x 512 x 24 x 60 = 1.9 Gbits/sec | 2560 x 512 x 60/2 = 40 MHz |
| Slice stacking (varifocal mirror) | Calligraphic (1280 x 1024) | 1.9' | 60 Hz (30-Hz mirror) | <1 (fast phosphor) | 8 bits, green | 1280 x 1024 x 8 x 60 = 600 Mbits/sec | 1280 x 1024 x 60/2 = 40 MHz |
| Holographic video (MIT system) | 32k x 192 mono, 32k x 64 color | Small horiz. x 1.9' vert. (mono), x 5.8' vert. (color) | 36 Hz | 1 | 8 bits/pixel, 3-channel color | 32k x 192 x 8 x 36 = 1.8 Gbits/sec | 32k x 64 x 36/2 = 37 MHz x 3 |
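The information-rate and bandwidth columns of Table 1 follow two simple products: pixels x bits/pixel x refresh rate (x number of displays), and pixels x refresh rate / 2 per transmission channel. The sketch below recomputes a few rows as a consistency check. The dictionary keys, function names, and the reading of "32k" as 32,000 (per Eq. 75) are assumptions for illustration.

```python
def info_rate_bps(h, v, bits_per_pixel, refresh_hz, displays=1):
    """Information rate in bits/sec: pixels x bits x refresh x displays."""
    return h * v * bits_per_pixel * refresh_hz * displays


def bandwidth_hz(h, lines_per_channel, refresh_hz):
    """Per-channel bandwidth in Hz, taking two pixels per cycle as in Table 1."""
    return h * lines_per_channel * refresh_hz / 2.0


# (h, v, bits/pixel, refresh Hz, displays, lines per transmission channel)
EXAMPLES = {
    "stereoscopic": (1280, 512, 24, 120, 1, 512),
    "head_mounted": (1280, 1024, 24, 60, 2, 1024),
    "lenticular": (2560, 512, 24, 60, 1, 512),
    "slice_stacking": (1280, 1024, 8, 60, 1, 1024),
    "holographic_mono": (32000, 192, 8, 36, 1, 64),
}


def summarize():
    """Return {name: (Gbits/sec, MHz per channel)} for each example row."""
    table = {}
    for name, (h, v, bits, hz, displays, lines) in EXAMPLES.items():
        gbits = info_rate_bps(h, v, bits, hz, displays) / 1e9
        mhz = bandwidth_hz(h, lines, hz) / 1e6
        table[name] = (gbits, mhz)
    return table


for name, (gbits, mhz) in summarize().items():
    print(f"{name:18s} {gbits:5.2f} Gbits/sec  {mhz:5.1f} MHz")
```

The CRT-based rows all come out near 39.3 MHz, which the table rounds to 40 MHz, while the holographic channel comes out near 36.9 MHz, which matches the "slightly smaller" bandwidth noted in Section 4.1.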

Because all of the 3D display systems image at least two views, they all support retinal disparity and convergence. The eyes pivot and change their convergence angles as they fixate on different objects at different depths (due to different retinal disparities) in the 3D image.

The pictorial cues are generally supported on all of the display systems. However, slice-stacking displays cannot image several of the pictorial cues: overlap is not supported on slice-stacking systems, because a luminous volume is imaged, which creates "transparent" surfaces. All other display systems, however, are capable of displaying both opaque and transparent surfaces. Overlap can be approximated when using slice-stacking displays with the addition of head tracking, as described previously. Shading and texture gradient are only partially supported with slice-stacking displays, because of the significant limit on the amount of information that can be scanned within a given depth.

Aerial perspective is supported to some degree on all of the display systems. Holographic and slice-stacking systems, however, image only relatively small volumes at close range. Therefore, "true" aerial perspective, which is effective at very great distances, cannot be accurately displayed. However, aerial perspective, considered more generally as "hazy" or "foggy" attenuation, can be displayed on these systems within their volumes. Volume-filling "haze" would be beyond the bandwidth capabilities of most slice-stacking displays (too much information to scan at every depth). Nevertheless, a basic "luminous haze" could be easily incorporated


Table 2. Comparison of Three-Dimensional Imaging Systems, Part 2

| System | Field of view | Viewing zone/volume | Number of views | Depth resolution | Autostereoscopic | Comments |
|---|---|---|---|---|---|---|
| Human visual system | 180° x 120° | N/A | 2 instantaneous, ∞ over time | 0.47 mm | N/A | |
| Stereoscopic (LCD shutter) | 40° x 32° | ~180° | 2 | 1.8 mm | No, glasses | 120 Hz for flicker reduction |
| Interactive stereoscopic (LCD shutter) | 40° x 32° | ~180° | 2 generated, many over time | 1.8 mm | No, glasses | 120 Hz for flicker reduction; requires head tracking |
| Head-mounted display, boom-mounted display | 135° x 90° | 360° | 2 generated, many over time | 4.0 mm | No, optical viewer | Requires head tracking; small, hi-res monitors are expensive |
| Lenticular barrier with CRT | 40° x 32° | 32°, with mult. zones | 8 (more and less views possible) | 7.2 mm | Yes | Requires very precise, linear display |
| Parallax barrier with CRT | 40° x 32° | 32°, with mult. zones | 8 (more and less views possible) | 7.2 mm | Yes | Requires very precise, linear display |
| Slice stacking (varifocal mirror) | 40° x 32° | 30° zone, 33 cm depth | Very high, horiz. and vert. | 0.02 mm | Yes | View volume limited to truncated pyramid; fast phosphor (green only) |
| Holographic video (MIT system) | 6.2° | 11.4°, 100 mm depth | Very high horiz., 1 vertical | Very small, diffraction limited | Yes | Prototype stage |

through a secondary channel that illuminates a 2D plane, scanned into a volume by the varifocal mirror.

Motion parallax can be considered in two different ways. Motion parallax due to the motion of the synthetic camera (as in a flight simulator) is supported on all of the systems (and 2D systems as well). Motion parallax due to observer motion is supported by all of the display types, except for the noninteractive stereoscopic display. However, motion parallax is supported in the horizontal direction only (no vertical parallax) by the parallax barrier, lenticular, and holographic-video display systems. Furthermore, the parallax barrier and lenticular displays exhibit a "coarse" horizontal parallax, since only a limited number of views are imaged.

Table 3. 3D Cues and 3D Imaging Systems (a)

| System | Accommodation | Convergence | Image size | Overlap | Linear perspect. | Texture gradient | Aerial perspect. | Shading | Horiz. parallax | Vertical parallax | Binocular disparity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Stereoscopic (LCD shutter) | ○ | ● | ● | ● | ● | ● | ● | ● | ○ | ○ | ● |
| Interactive stereoscopic (LCD shutter) | ○ | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● |
| Head-mounted display, boom-mounted display | ○ | ● | ● | ● | ● | ● | ● | ● | ● | ● | ● |
| Lenticular barrier with CRT | ○ | ● | ● | ● | ● | ● | ● | ● | ◐ | ○ | ● |
| Parallax barrier with CRT | ○ | ● | ● | ● | ● | ● | ● | ● | ◐ | ○ | ● |
| Slice stacking (varifocal mirror) | ● | ● | ● | ○ | ● | ◐ | ◐ | ◐ | ● | ● | ● |
| Holographic video (MIT system) | ● | ● | ● | ● | ● | ● | ● | ● | ● | ○ | ● |

(a) ○, not supported; ◐, partially supported (see text); ●, supported.

5. Conclusion

Compared to other three-dimensional displays, stereoscopic displays have the advantage of a relatively low bandwidth because they generate only two views, matching the "two view" characteristic of the human visual system. With the addition of viewpoint dependent imaging via headtracking, stereoscopic displays can generate many views of a three-dimensional scene, enhancing depth perception through motion parallax and "look around." Another advantage of stereoscopic displays is that the two views can be generated by dividing the vertical, rather than horizontal, resolution, which does not reduce the depth resolution (unlike parallax barrier and lenticular displays). Their main disadvantages are that they cannot provide focus information, and they require viewing aids, and possibly head-tracking devices, which often exhibit noise, error, and time delay problems.

Parallax barrier and lenticular sheet displays do not exhibit good depth resolution, cannot provide focus, and also reduce the horizontal resolution. They have the advantage of being autostereoscopic, and of supporting multiple views.

Varifocal mirror displays are well suited to applications requiring the display of a small volume of luminous, transparent data. They currently cannot support full color, because only green phosphors are available with a short enough persistence time. They provide for ocular accommodation, and full horizontal and vertical parallax.

Holographic video displays require a significant increase in the information rate for the display, compared to other methods with a comparable viewing zone, except perhaps for a theoretical slice-stacking display, which images a significant amount of information, with many "slices." They provide for ocular accommodation and full horizontal parallax, with a very high horizontal resolution. They are autostereoscopic, allowing multiple viewers to observe the data from their own viewpoints (within the limits of the viewing zone) without head-tracking. The displays provide high contrast, bright, color images. However, they are currently in prototype stages.

References

Benton, S. A. (1982). Survey of holographic stereograms. In J. J. Pearson (Ed.), Proceedings of SPIE 367: Processing and display of three-dimensional data (pp. 15-19). SPIE.

Benton, S. A. (1991). Experiments in holographic video imaging. Proceedings of the SPIE Institute on Holography, Bellingham, WA.


Blanchard, C., Burgess, S., Harvill, Y., Lanier, J., Lasko, A., Oberman, M., & Teitel, M. (1990). Reality built for two: A virtual reality tool. 1990 Symposium on Interactive 3D Graphics, Snowbird, Utah. Special issue of Computer Graphics, 24(2), 35-36.

Conrac Division. (1985). Raster graphics handbook (2nd ed.). New York: Van Nostrand Reinhold.

Cornsweet, T. N. (1970). Visual perception. San Diego, CA: Harcourt, Brace Jovanovich.

Dallas, W. J. (1980). Computer-generated holograms. In B. R. Frieden (Ed.), The computer in optical research (pp. 291-366). Berlin: Springer-Verlag.

Davson, H. (1980). Physiology of the eye (4th ed.). New York: Academic Press.

Eckhardt, R. C. (1991). Solving the monitor mystery. MacWorld, 134-141.

Farrell, E. J., & Chistidis, Z. D. (1989). Visualization of complex data. SPIE Three-Dimensional Visualization and Display Technologies (Los Angeles, CA), 1083, 153-160.

Fisher, S. S. (1981). Viewpoint dependent imaging: An interactive stereoscopic display. Master's Thesis, Massachusetts Institute of Technology, Cambridge, MA.

Fisher, S. S., McGreevy, M., Humphries, J., & Robinett, W. (1986). Virtual environment display system. Proceedings of the 1986 ACM Workshop on Interactive Graphics, Chapel Hill, NC, 77-87.

Foley, J. D., van Dam, A., Feiner, S. K., & Hughes, J. F. (1990). Computer graphics: Principles and practice (2nd ed.). Reading, MA: Addison-Wesley.

Goldstein, E. B. (1989). Chapter 7: "Perceiving Depth and Size" and Chapter 4: "Perceiving Color." In Sensation and perception (3rd ed.). Belmont, CA: Wadsworth Publishing.

Grimson, W. E. L. (1981). From images to surfaces. Cambridge, MA: MIT Press.

Halliday, D., & Resnick, R. (1978). Physics, Part 2 (3rd ed.). New York: John Wiley.

Hamasaki, J. (1980). Autostereoscopic 3-D television experiments. In M. A. Machado & L. M. Narducci (Eds.), American Institute of Physics Conference Proceedings: Optics in Four Dimensions, Number 65 (pp. 531-556).

Hamasaki, J., Okada, M., Utsunomiya, S., Uematsu, S., Takeuchi, O., Kambayashi, K., & Shimada, S. (1989). Recent experiments on an autostereoscopic directly seen on Braun tube by the naked eye. Proceedings of 89' 3dmt, Montreal.

Hochberg, J. (1986). Representation of motion and space in video and cinematic displays. In K. Boff, L. Kaufman, & J. Thomas (Eds.), Handbook of perception and human performance (pp. 22-1-22-64). New York: John Wiley.

Hodges, L. F. (1990). Basic principles of stereographic software development. Stereoscopic Displays and Applications II, Proc. SPIE 145.

Hubel, D. H. (1988). The corpus callosum and stereopsis. Eye, brain, and vision. New York: Scientific American Library.

Hunt, R. W. G. (1975). The reproduction of colour in photography, printing and television. New York: John Wiley.

Kim, W. S., Tendick, F., & Stark, L. W. (1987). Visual enhancements in pick-and-place tasks: Human operators controlling a simulated cylindrical manipulator. IEEE Journal of Robotics and Automation, RA-3(5), 418-425.

Liu, A., Stark, L., & Hirose, M. (1992). Effects of stereo and occlusion on simulated telemanipulation. Proceedings of the 1992 Society for Information Display International Symposium, Boston, MA.

MacDowall, I. E., Bolas, M., Pieper, S., Fisher, S. S., & Humphries, J. (1990). Implementation and integration of a counterbalanced CRT-based stereoscopic display for interactive viewpoint control in virtual environment applications. In J. Merritt (Ed.), Proceedings of SPIE Stereoscopic Displays and Applications, San Jose, CA.

Marr, D. (1982). Vision. San Francisco, CA: W. H. Freeman.

McAllister, D. F., Hodges, L. F., Robbins, W. E., & Noble, L. (1986). Three-dimensional display technology and techniques for computer-generated images. SPIE's 1986 O-E/LASE First Annual Symposium on Optoelectronics and Laser Applications in Science and Engineering, Los Angeles, CA: Tutorial T21.

McKenna, M. (1992). Interactive viewpoint control and three dimensional operations. Proceedings of the 1992 Symposium on Interactive 3D Graphics, Cambridge, MA, 53-56.

McWhorter, S. W., Hodges, L. F., & Rodriguez, W. (1990). Evaluation of 3-D display techniques for engineering design visualization. Proceedings of ASEE Engineering Design Graphics, Tempe, Arizona.

Merritt, J. O. (1988). Often-overlooked advantages of 3-D displays. SPIE-Three Dimensional Imaging and Remote Sensing Imaging (Los Angeles, CA), 902, 46-47.

Mingolla, E., & Todd, J. T. (1986). Perception of solid shape from shading. Biological Cybernetics, 53, 137-151.

Okoshi, T. (1976). Three dimensional imaging systems. New York: Academic Press.

Proffitt, D. R., & Kaiser, M. K. (1991). Perceiving environmental properties from motion information: Minimal conditions. In S. R. Ellis (Ed.), Pictorial communication in virtual and real environments. London: Taylor & Francis.


Rallison, R. D., & Schicker, S. R. (1991). Combat vehicle stereo HMD. In H. M. Assenheim, R. A. Flasck, T. M. Lippen, & J. Bentz (Eds.), Proceedings SPIE Large Screen Projection, Avionic, and Helmet-Mounted Displays, San Jose, CA. In press. SPIE.

Robinett, W., & Rolland, J. P. (1992). A computational model for the stereoscopic optics of a head-mounted display. Presence: Teleoperators and Virtual Environments, 1(1), 45-62.

Rogowitz, B. E. (1983). The human visual system: A guide for the display technologist. Proceedings of the Society for Information Display 24/2.

Rolfe, J. M., & Staples, K. J. (1989). Visual systems in flight simulation. Cambridge, England: Cambridge University Press.

Seiter, C. (1992). 24-Bit monitors: Fast and functional. MacWorld, 124-131.

Soltan, P., Trias, J., Robinson, W., & Dahlke, W. (1992). Laser based 3D system. Proceedings of SPIE, San Jose, CA, February 9-14.

St. Hilaire, P., Benton, S. A., Lucente, M., Jepsen, M. L., Kollin, J., Yoshikawa, H., & Underkoffler, J. (1990). Electronic display system for computational holography. SPIE Proceedings, Bellingham, WA, Vol. 1212 "Practical Holography IV."

Sutherland, I. E. (1968). A head-mounted three-dimensional display. Proceedings of the Fall Joint Computer Conference, 765-776.

Tricoles, G. (1987). Computer generated holograms: an historical review. Applied Optics, 26(20), 4351-4360.

Vickers, D. L. (1970). Head-mounted display terminal. Proceedings of the 1970 IEEE International Computer Group Conference.

W. Industries Ltd. (1991). Product specifications. Leicester LE1 5WD, Great Britain.

Woodham, R. J. (1984). Photometric method for determining shape from shading. In S. Ullman & W. Richards (Eds.), Image understanding 1984. New Jersey: Ablex Publishing Co.

Wyszecki, G. (1982). Color science: Concepts and methods, quantitative data and formulae (2nd ed.). New York: John Wiley & Sons.

Yan, J. K. (1985). Advances in computer-generated imagery for flight simulation. IEEE Computer Graphics & Applications, 5(8), 37-51.

Yoshida, S., Ohkoshi, A., & Miyaoka, S. (1982). The "Trinitron": A new color tube. IEEE Transactions on Consumer Electronics, CE-28(1).
