OSCAM - Optimized Stereoscopic Camera Control for Interactive 3D

Thomas Oskam 1,2    Alexander Hornung 2    Huw Bowles 3,4    Kenny Mitchell 3,4    Markus Gross 1,2

1 ETH Zurich    2 Disney Research Zurich    3 Black Rock Studio    4 Disney Interactive Studios

Figure 1: Two stereoscopic shots of the camera moving towards objects. Our method (OSCAM) keeps a constant target depth range when moving close to the objects. Uncontrolled stereoscopy, in contrast, can cause large disparities and destroy stereoscopic perception. (c) Disney Enterprises, Inc.

Abstract

This paper presents a controller for camera convergence and interaxial separation that specifically addresses challenges in interactive stereoscopic applications like games. In such applications, unpredictable viewer- or object-motion often compromises stereopsis due to excessive binocular disparities. We derive constraints on the camera separation and convergence that enable our controller to automatically adapt to any given viewing situation and 3D scene, providing an exact mapping of the virtual content into a comfortable depth range around the display. Moreover, we introduce an interpolation function that linearizes the transformation of stereoscopic depth over time, minimizing nonlinear visual distortions. We describe how to implement the complete control mechanism on the GPU to achieve running times below 0.2ms for full HD resolution. This provides a practical solution even for demanding real-time applications. Results of a user study show a significant increase of stereoscopic comfort, without compromising perceived realism. Our controller enables 'fail-safe' stereopsis, provides intuitive control to accommodate personal preferences, and allows to properly display stereoscopic content on differently sized output devices.

CR Categories: I.3.3 [Computer Graphics]: Picture/Image Generation—display algorithms, viewing algorithms;

Keywords: stereoscopic 3D, disparity control, real-time graphics, games, interactive 3D

Links: DL PDF

1 Introduction

Stereoscopic content creation, processing, and display has become a pivotal element in movies and entertainment, yet the industry is still confronted with various difficult challenges. Recent research has made substantial progress in some of these areas [Lang et al. 2010; Koppal et al. 2011; Didyk et al. 2011; Heinzle et al. 2011]. Most of these works focus on the classical production pipeline, where the consumer views ready-made content that has been optimized in (post-)production to ensure a comfortable stereoscopic experience. See Tekalp et al. [2011] for an overview.

In interactive applications that create stereoscopic output in real-time, one faces a number of fundamentally different challenges [Gateau and Neuman 2010]. For example, in a first-person game where the player is in control of the view, a simple collision with a wall or another object will result in excessive disparities that cause visual fatigue or destroy stereopsis (see Figure 1). In order to guarantee proper stereoscopy, one needs a controller that adjusts the range of disparities to the viewer's preferences. An example for such a controller is the work of Lang et al. [2010] which, however, has been designed for post-capture disparity range adaptation using complex image-domain warping techniques. In a game environment where the stereoscopic output is created and displayed in real-time, it is advisable to optimize the stereoscopic rendering parameters, i.e., camera convergence and interaxial separation, directly and to avoid computationally expensive solutions.

The problem can be formulated as one of controlling perceived depth. We use the term 'perceived depth' in the geometrical sense, where the distances reconstructed by the viewer are dominated by the observed screen disparities. Even though there are other important cues such as vertical size or focus that influence perceived depth [Backus et al. 1999; Watt et al. 2005], the work of Held and Banks [2008] showed that the geometrical approach is a valid approximation.
The range of perceived depth around the screen that can be viewed comfortably is generally referred to as the comfort zone, and is defined as the range of positive and negative disparities that can be comfortably watched by each individual viewer [Smolic et al. 2011; Shibata et al. 2011]. Therefore, we are looking for an exact mapping of a specific range of distances in the scene into this depth volume around the screen. In the course of this article, we will refer to this volume as the target depth range. While we concentrate on the control of the mapping between the virtual and the real space, there exists prior work on how to derive a comfortable target depth range [Woods et al. 1993; Shibata et al. 2011].

Contributions. The contribution of this paper is a real-time stereoscopic control mechanism for disparity that is able to guarantee an exact mapping of arbitrary content to a comfortable target depth range. We start with a brief summary of the basic geometry in stereoscopic rendering, based on a viewer-centric model and an inverse scene-centric model. We derive constraints on the camera convergence and interaxial separation from these models that provide full control over resulting disparities and, hence, the mapped target depth range of arbitrary scene content. A second contribution of our work is a controlled temporal interpolation of the camera convergence and interaxial separation, which we use to linearize the perceived change in stereoscopic depth to avoid visual artifacts. Finally, we describe how the complete controller is implemented on the GPU with very high performance (approximately 0.2ms per frame at full HD resolution), resulting in minimal added overhead compared to naive stereoscopic rendering.

Applications. Our controller has a variety of benefits for interactive stereoscopic applications. Given the viewing geometry (screen size, viewing distance) we can map any scene content into a specific target depth range. This means that the content created by a producer is guaranteed to create the desired stereoscopic depth effect independent of the actual display device, be it for example a large polarized display or a small Nintendo 3DS. Moreover, an application can be easily adapted to the depth or disparity constraints of a particular device, which helps to reduce ghosting or crosstalk artifacts. Because our method de-couples content production from stereoscopic display and automatically adapts to arbitrary output devices, it has the potential of considerably simplifying production and reducing costs. For the consumer, our controller ensures that the stereoscopic footage is always comfortable to watch. The consumer may even adjust the target depth range intuitively to accommodate personal preferences, such that proper stereopsis without excessive disparities is guaranteed. The results of a user study show that our controller is preferred over naive stereoscopic rendering.
2 Related Work

Production and consumption of stereoscopic 3D has been researched for many years, with applications ranging from cinema [Lipton 1982], scientific visualization [Fröhlich et al. 1999], and television broadcasting [Meesters et al. 2004; Broberg 2011] to medical applications [Chan et al. 2005]. A recent survey over the field is provided in Tekalp et al. [2011]. Interestingly, solutions for interactive applications such as games are rare. In the following we discuss related works on stereo geometry, analysis and correction, camera control in real-time environments, and perception.

Stereo geometry: A detailed derivation of the geometry of binocular vision and stereoscopic imaging is given in Woods et al. [1993]. Their main focus is on the description of various image distortions such as keystoning or depth plane curvature, and they show how perceived depth changes under different viewing conditions. Grinberg et al. [1994] also describe the mapping between scene and perceived depth using different frames-of-reference, and propose a framework based on a minimum set of fundamental parameters to describe a 3D-stereoscopic camera and display system. Held and Banks [2008] derive a very complete geometrical model that maps from the scene over the screen to the perceived depth, including the projection to the retina. They not only parameterize the distance from the screen, but also the yaw, pitch, and roll of the viewer's head as well as a relative position to the screen. They use this model to predict distortions perceived by the viewer. Another summary of the stereo geometry is provided by Zilly et al. [2011]. They discuss constraints on the camera separation but do not take the camera convergence into account. Similar to the previous works they are mainly focused on quantifying depth distortions.

In contrast to these works, our paper provides explicit constraints on both camera convergence and interaxial separation in order to control the mapping of virtual scene content into perceived space. Moreover, none of the previous works has proposed a solution to handle nonlinear visual distortion during temporal interpolation of these parameters.

Stereoscopic content analysis and post-processing: Based on the above works, several methods have been developed for stereoscopic video analysis which estimate image disparities in order to predict and correct visual distortions such as cardboarding, the 'puppet theater effect', and other types of distortions [Masaoka et al. 2006; Kim et al. 2008; Pan et al. 2011]. Koppal et al. [2011] describe a framework for viewer-centric stereoscopic editing. They present a sophisticated interface for previewing and post-processing of live action stereoscopic shots, and support measurements in terms of perceived depth. Nonlinear remapping of disparities based on dense depth maps has been discussed by Wang et al. [2008]. Lang et al. [2010] generalize these concepts to more general nonlinear disparity remapping operators and describe an implementation for stereoscopic editing based on image-domain warping. A detailed state-of-the-art report on tools for stereoscopic content production is provided in [Smolic et al. 2011]. This report details challenges in the context of 3D video capturing and briefly discusses the most common problems in practical stereoscopic production, such as the comfort zone, lens distortions, etc.

The focus of all these methods is on post-production analysis and correction of stereoscopic live-action video. Our work targets real-time stereoscopic rendering, with control over perceived depth for dynamic 3D environments, minimization of nonlinear depth distortions, and high efficiency for demanding real-time applications. Hence, our goals are complementary to these previous works.
Perceptual research on stereoscopy: An excellent overview of various stereoscopic artifacts such as keystone distortion, the puppet theater effect, crosstalk, cardboarding, and the shear effect, and their effects on visual comfort, is given in the work of Meesters et al. [2004]. The works of Backus et al. [1999] and Watt et al. [2005] show that perceived depth not only depends on the amount of disparities seen by the viewer, but also on monocular cues such as vertical size, focus, or perspective. The previously mentioned work by Woods et al. [1993] provides a user study on to what extent different subjects can still fuse various disparity ranges. The results clearly showed that different persons have significantly varying stereoscopic comfort zones, indicating that individual control over stereoscopic depth is desirable. Stelmach et al. [2003] showed in a user study that shift-image convergence changes are generally preferred over toed-in camera setups, and that static, parallel camera setups are often problematic due to excessive disparities for nearby objects. A perceptual model which emphasizes the importance and utility of individual control over disparity and perceived depth is described by Didyk et al. [2011]. Shibata et al. [2011] thoroughly examine the vergence-accommodation conflict and conduct user studies on the comfort of perceived depth for different viewing distances. Their work also provides a way to define a range of disparities that are comfortable to watch by an average viewer.

Results from perceptual experiments and research on stereoscopy clearly indicate a large variation in the physiological capabilities and preferences of different people. These indications motivate the need for tools that allow for a content-, display-, and user-adaptive control of stereoscopic disparity.

Camera control in interactive environments: Finally, there is a large body of work on real-time camera control in virtual 3D environments and games, ranging from intuitive through-the-lens editing interfaces [Gleicher and Witkin 1992] and cinematographic shot composition [He et al. 1996; Bares et al. 1998] to sophisticated camera path planning and minimization of occlusions of points of interest [Oskam et al. 2009]. There exist excellent overviews of this field [Christie et al. 2008; Haigh-Hutchinson 2009], which address the theory of camera control as well as practical solutions for high-performance interactive applications. The work of Jones et al. [2001] also addresses real-time control of camera parameters to adjust the target depth range. However, they assume a parallel camera setup and solve the problem for still images only.


Figure 2: The geometry pipeline of stereoscopic vision. Stage 1: the viewer with eye separation d_e at distance d_v to a screen of width w_s reconstructs a point at depth z in the target space due to the on-screen parallax p. Stage 2: the on-screen parallax in stage 1 is caused by a disparity d on the two camera image planes. The camera renders the scene with focal length f and an image shift h. Stage 3: two cameras, with opposite but equidistant image shifts, converge at distance c_cvg in the scene and are separated by the interaxial distance b. The image disparity d between both cameras corresponds to a point with distance c from the cameras.

3 Basic Geometric Models of Stereoscopy

In this section we briefly revisit the basic geometry of stereoscopic vision relevant to our work. Our description is, in general, based on previous models [Woods et al. 1993; Held and Banks 2008; Zilly et al. 2011]. For the stereoscopic camera parameters, i.e., the convergence and interaxial separation, we follow the same definitions as the existing models. The interaxial separation b is defined as the distance between the positions of the two cameras. The convergence distance c_cvg is defined as the distance between the intersection of the two camera viewing directions and the middle point between the two cameras. Both parameters are schematically shown in Figure 2.3. Note that we converge our cameras using image-shift instead of toeing them in, since image-shift convergence produces fewer artifacts [Woods et al. 1993; Stelmach et al. 2003].

First, we treat the camera convergence and interaxial separation as unknowns. This will enable us later to derive constraints for these parameters to achieve an optimal mapping of depths between scene- and real-world spaces. Second, we define real-world distances in a 3D volume relative to the screen instead of the viewer. This allows for a more intuitive definition and control of the target depth range (see Figure 2.1).
Based on these prerequisites we derive the corresponding viewer-centric model and, as the inverse case, the scene-centric model, both of which are later needed to formulate the constraints for the stereoscopic camera controller and the temporal transformation functions. Figure 2 gives an overview of the geometry pipeline of those two models.

3.1 Viewer-Centric Model

The viewer-centric model describes the reconstructed depth z of a point as a function of the scene distance c, the camera interaxial separation b, and the convergence distance c_cvg. This corresponds to a left-to-right traversal of the pipeline in Figure 2.

We use the viewer-screen configuration from Figure 2.1, where the viewer is looking from the defined distance d_v at the screen of the defined width w_s. d_e is the human interocular separation, usually considered to be 65mm. The viewer reconstructs a point in space relative to the screen due to his eyes converging according to the screen parallax p. This point is perceived at the distance

    z(b, c_cvg, c) = d_v · p(b, c_cvg, c) / (d_e − p(b, c_cvg, c)).    (1)

Note that the reconstructed depth z is positive behind the screen, and negative in front of the screen.

The conversion from the on-screen parallax p to the image disparity d is simply a matter of scaling according to the screen width w_s and the virtual image plane width w_i (see Figures 2.1 and 2.2). This extends Eq. (1) to

    z(b, c_cvg, c) = d_v · d(b, c_cvg, c) / ((w_i / w_s) · d_e − d(b, c_cvg, c)).    (2)

Finally, we can incorporate the scene distance c of the point that is reconstructed at z, using the camera geometry (Figure 2.2) and the scene setup (Figure 2.3). The triangle defined through the scene distance c and half the interaxial distance b is similar to the triangle defined by the camera focal length f and h − d/2, where h is the image shift. Using the intercept theorem and h = f b / (2 c_cvg), we can reformulate Eq. (2) to include the convergence distance c_cvg and the camera interaxial distance b:

    z(b, c_cvg, c) = d_v · (c − c_cvg) / ((w_i · d_e · c · c_cvg) / (w_s · f · b) − (c − c_cvg)).    (3)
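For illustration, the viewer-centric mapping can be written down directly in code. The following C++ sketch implements Eqs. (1)-(3); the struct layout, function names, and unit conventions are our own illustrative assumptions, not part of the original system, and all quantities must use consistent units (e.g., centimeters).

```cpp
// Viewing and camera geometry of Figure 2 (illustrative names).
struct StereoSetup {
    double f;    // camera focal length
    double w_i;  // virtual image plane width
    double w_s;  // physical screen width
    double d_v;  // viewer-to-screen distance
    double d_e;  // interocular separation, commonly ~6.5 cm
};

// Image disparity of a point at scene distance c; follows from the
// image-shift geometry with h = f*b/(2*c_cvg) used to derive Eq. (3).
double imageDisparity(const StereoSetup& s, double b, double c_cvg, double c) {
    return s.f * b * (c - c_cvg) / (c * c_cvg);
}

// Perceived depth z relative to the screen, Eqs. (1)-(3):
// positive behind the screen, negative in front of it.
double perceivedDepth(const StereoSetup& s, double b, double c_cvg, double c) {
    const double d = imageDisparity(s, b, c_cvg, c);
    const double p = (s.w_s / s.w_i) * d;  // on-screen parallax
    return s.d_v * p / (s.d_e - p);        // Eq. (1)
}
```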

3.2 Scene-Centric Model

Inverse to the viewer-centric model, the scene-centric model seeks the scene distance c as a function of the stereoscopic parameters c_cvg and b and a defined depth z in real-world space. This corresponds to a right-to-left traversal of the pipeline in Figure 2.

Given the scene setup in Figure 2.3 and the camera geometry in Figure 2.2, the scene distance c is given as

    c(b, c_cvg, z) = f b / (2h − d(z)) = f b / (f b / c_cvg − d(z)).    (4)

The image disparity d, which depends on the distance z, can be rescaled back to the on-screen parallax p using the image width w_i and the screen width w_s:

    c(b, c_cvg, z) = f b / (f b / c_cvg − (w_i / w_s) · p(z)).    (5)

Finally, we can incorporate the viewer-screen geometry (Figure 2.1) to replace the on-screen parallax p by inverting Eq. (1). Inserting the result in Eq. (5) we get

    c(b, c_cvg, z) = f b / (f b / c_cvg − (w_i / w_s) · d_e z / (d_v + z)).    (6)

With Eq. (3) and Eq. (6) we have related perceived depth and scene distances using the camera convergence and interaxial separation. In the next section we use these equations to derive the constraints on these two parameters used in our controller.
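A minimal sketch of the scene-centric inverse, continuing the illustrative StereoSetup struct from above; as before, the helper names are assumptions made for this example.

```cpp
// On-screen parallax that makes a point appear at depth z
// (inverse of Eq. (1)).
double parallaxForDepth(const StereoSetup& s, double z) {
    return s.d_e * z / (s.d_v + z);
}

// Scene distance c that is perceived at depth z, Eq. (6).
double sceneDistanceForDepth(const StereoSetup& s, double b, double c_cvg, double z) {
    return s.f * b / (s.f * b / c_cvg - (s.w_i / s.w_s) * parallaxForDepth(s, z));
}
```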
4 Dynamic Stereo Control

Our goal is to control the camera convergence and interaxial separation over time such that we can optimally map dynamically changing scene content to a controlled target depth range. We therefore want to find constraints for the stereoscopic parameters that map any scene content to this pre-defined target range. This mapping can be achieved by solving two problems.

First, we derive constraints for the parameters c_cvg and b so that a series of points in the scene [c_1, c_2, ..., c_n] is mapped onto a defined series of points [z_1, z_2, ..., z_n] in the target depth space. We solve for the constraints using a least-squares optimization in Section 4.1. Second, we derive an interpolation function for c_cvg and b that minimizes nonlinear distortions over time. We achieve this by guiding the z_i points with control functions, as described in Section 4.2.

4.1 Constraints on Convergence and Separation

We are given a defined series of depth values [z_1, z_2, ..., z_n], z_i < z_j for i < j, where the screen surface is the reference frame. We want to compute values for the camera convergence c_cvg and separation b such that a corresponding series of scene points [c_1, c_2, ..., c_n], with c_i < c_j for i < j, is perceived as close as possible, in the least-squares sense, to the z_i. This will allow us to map salient objects or the entire visible scene into defined target depth ranges.

First, we use the relation between perceived depth z and image disparity d in Eq. (2) to simplify the problem. The transformation between these two parameters is independent of c_cvg and b and, therefore, we can interchange the depth values z_i with the corresponding disparities d_i. The mapping problem can now be formulated by inserting the disparity values into Eq. (4) and setting the results equal to the scene points. This gives us the following nonlinear system of equations:

    f b c_i − f b c_cvg − c_i d_i c_cvg = 0    for i = 1 ... n,

which can, for example, be solved in a least-squares sense with a Gauss-Newton solver.

Target range control using two constraints. In general, the special case using two real-world depth values [z_1, z_2] and two scene points [c_1, c_2] is of highest importance, e.g., for mapping a given 3D scene into the pre-defined volume around the screen. In this case the above system has one non-trivial solution, and we can analytically determine the constraints for c_cvg and b:

    c_cvg = c_1 c_2 (d_1 − d_2) / (c_1 d_1 − c_2 d_2),    (7)

    b = c_1 c_2 (d_1 − d_2) / (f (c_1 − c_2)).    (8)

The constraints in Eq. (7) and Eq. (8) enable an exact mapping of a specific range of distances in the scene into an arbitrary prescribed depth volume around the screen.

Useful application scenarios are, for example, mapping of the complete visible scene or of a particular salient object into a prescribed depth range. The disparities d_i corresponding to the desired output depth volume are defined via Eq. (2), and the values c_i are set to the current minimum and maximum distances in the scene. Another application of these constraints is to adapt to variable z_i boundaries: as the viewer adjusts the desired perceived depth volume, the renderer adjusts the camera convergence and interaxial separation to produce the desired output depth.
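The two-constraint case admits a direct implementation. The following sketch computes Eqs. (7) and (8), reusing the illustrative StereoSetup from above; the function names are again our own, and the preconditions c_1 < c_2 and z_1 < z_2 are assumed rather than checked.

```cpp
#include <utility>

// Image disparity that corresponds to a desired perceived depth z
// (Eq. (2) inverted).
double disparityForDepth(const StereoSetup& s, double z) {
    return (s.w_i / s.w_s) * s.d_e * z / (s.d_v + z);
}

// Returns {c_cvg, b} such that scene distances c1 and c2 are perceived
// exactly at target depths z1 and z2 around the screen.
std::pair<double, double> constrainParameters(const StereoSetup& s,
                                              double c1, double c2,
                                              double z1, double z2) {
    const double d1 = disparityForDepth(s, z1);
    const double d2 = disparityForDepth(s, z2);
    const double c_cvg = c1 * c2 * (d1 - d2) / (c1 * d1 - c2 * d2); // Eq. (7)
    const double b     = c1 * c2 * (d1 - d2) / (s.f * (c1 - c2));   // Eq. (8)
    return {c_cvg, b};
}
```

Setting c_1 and c_2 to the current minimum and maximum scene distances and z_1, z_2 to the comfort-zone bounds yields the fail-safe mapping described above.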
functions the optimized dashed of our the interpolation that graphs, linear both the In shows 3.3. line Figure in shown are transformation time. over distortions linear less simple significantly a the introduces to mation along Compared linear range. is of depth transformation interpolation target The the of time. boundaries over range depth get depth in change linear desired the achieve boundaries func- interpolation we linear) time, necessarily over (not constant tion arbitrary volume an depth define perceived can the keep to order attetre et ag transforms range depth target the fast velocity predefined a and vari- interpolation interpolation the define able we simultaneously, points individual time point the changes gradually

ntre pc ihrsett h cen(e eto .) In 4.1). Section (see screen the to respect with space target in uncontrolled OSCAM α I t i ( − safnto htdpnso h urn iestep time current the on depends that function a as z t i , .I re ocnrlalitroainfntosfrall for functions interpolation all control to order In 1. ∆ z oprsnbtenOCMaducnrle trocp fmdu ofs aeamto hog ope niomns At environments. complex through motion camera fast to medium of stereoscopy uncontrolled and OSCAM between Comparison t i t − n h vrg egho l h oto curves control the all of length average the and 1 α , α I α z scmue stertobtentedsac o the for distance the between ratio the as computed is lin t i c ( ) − [ cvg ∆ ( z 1 o aho h et values depth the of each for z t 1 t . − t i , , and 1 = v) z z , t i t i z − t 2 hc losu olnaietecag in change the linearize to us allows which , 1 − b , 1 min α ] seFgr .) u ierzdtransfor- linearized our 3.1), Figure (see and c z ( = ) t i cvg akt t position its to back

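A small sketch of the linearized update, under the simplifying assumption that the control-curve length len(I_i) of each depth bound reduces to the remaining distance between its current and goal value; the exact form of the control curves in the original system may differ.

```cpp
#include <algorithm>
#include <cmath>

// Eq. (9): interpolation weight from frame time dt and velocity v,
// normalized by the average remaining travel of the two depth bounds.
// The min() clamp prevents overshooting on large time-steps.
double computeAlpha(double dt, double v,
                    double z1_cur, double z2_cur,
                    double z1_goal, double z2_goal) {
    const double meanLen = 0.5 * (std::abs(z1_goal - z1_cur) +
                                  std::abs(z2_goal - z2_cur));
    if (meanLen <= 0.0) return 1.0;          // already at the target range
    return std::min(1.0, v * dt / meanLen);
}

// Eq. (10): move one target depth bound linearly toward its goal.
double interpolateDepth(double z_cur, double z_goal, double alpha) {
    return (1.0 - alpha) * z_cur + alpha * z_goal;
}
```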
Figure 4: Comparison between OSCAM and uncontrolled stereoscopy for medium to fast camera motion through complex environments. At the beginning of the shot, the uncontrolled camera has the exact same setup as the OSCAM, so that comfortable stereopsis is initially ensured. The uncontrolled camera then fails to preserve a comfortable disparity range, causing excessive disparities and hence inducing eye-strain to the viewer. (c) Disney Enterprises, Inc.

5 Efficient Implementation

The stereoscopic controller algorithm produces a sequence of parameter pairs (c_cvg^t, b^t), one at each frame t. Pseudo code of the parameter update is provided in Algorithm 1.

Algorithm 1 Stereoscopic parameter update
1: procedure UpdateStereoParameter(c_cvg^{t−1}, b^{t−1}, [z_1, z_2], ∆t, v)
2:     [c_0^t, c_1^t] ← getSceneDepthRange()
3:     [z_1^0, z_2^0] ← mapDepth(c_cvg^{t−1}, b^{t−1}, [c_0^t, c_1^t])
4:     α ← computeAlpha(∆t, v, [z_1^0, z_2^0], [z_1, z_2])
5:     [d_0, d_1] ← transform(I_lin, α, [z_1^0, z_2^0], [z_1, z_2])
6:     [c_cvg^t, b^t] ← constrainParameter(c_0^t, c_1^t, d_0, d_1)
7:     return (c_cvg^t, b^t)
8: end procedure

The first step is to acquire the new scene depth range [c_0^t, c_1^t] (line 2); the efficient computation of this range is described below. This range is mapped to the current target depth range [z_1^0, z_2^0] (line 3) using Eq. (3). Given the desired target depth range [z_1, z_2], the interpolation variable α can be determined (line 4) with Eq. (9). Then, the target depth range can be interpolated over time (line 5) using the interpolation function in Eq. (10) and converted to the corresponding disparity values [d_0, d_1] (using Eq. (2)). Finally, using the stereoscopic parameter constraints, Eq. (7) and Eq. (8), the new values for c_cvg^t and b^t for this frame can be computed (line 6).

Efficient depth range acquisition. In a production scenario, budgets are tight and efficiency is a critical factor. To be usable in any real-time application, our algorithm needs to fit into a total frame budget of only a few milliseconds (at 60fps, only 16.7ms). The only stage of non-constant cost in our algorithm is the determination of the minimum and maximum depths in the scene. To obtain these depths efficiently on the GPU, we perform a number of min-max reduction passes of the depth buffer [Greß et al. 2006]. Although the number of these passes is logarithmic in the screen size, modern GPUs are highly optimized for this workload and the passes are computed very cheaply. Indeed, when we timed the passes on a recent Nvidia GPU (the GTX580), the cost was only 0.09ms at 1280x720 and 0.18ms at 1920x1080.
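One possible per-frame realization of Algorithm 1 in C++, tying the previous sketches together. getSceneDepthRange() stands in for the GPU min-max reduction and is assumed to be provided elsewhere; the state struct and names are illustrative, not the original implementation.

```cpp
#include <utility>

std::pair<double, double> getSceneDepthRange(); // placeholder for the GPU reduction

struct ControllerState {
    double c_cvg, b;  // stereo parameters carried over from the last frame
    double z1, z2;    // desired target depth range around the screen
    double v;         // interpolation velocity
};

void updateStereoParameters(const StereoSetup& s, ControllerState& st, double dt) {
    const auto [c0, c1] = getSceneDepthRange();                   // line 2
    const double z1_cur = perceivedDepth(s, st.b, st.c_cvg, c0);  // line 3, Eq. (3)
    const double z2_cur = perceivedDepth(s, st.b, st.c_cvg, c1);
    const double a = computeAlpha(dt, st.v, z1_cur, z2_cur,
                                  st.z1, st.z2);                  // line 4, Eq. (9)
    const double z1_new = interpolateDepth(z1_cur, st.z1, a);     // line 5, Eq. (10)
    const double z2_new = interpolateDepth(z2_cur, st.z2, a);
    const auto [cc, bb] =
        constrainParameters(s, c0, c1, z1_new, z2_new);           // line 6, Eqs. (7),(8)
    st.c_cvg = cc;
    st.b = bb;
}
```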

6 Applications and Experimental Evaluation

In the following, we describe several application scenarios of our stereoscopic camera controller. We furthermore conducted a user study to validate the utility of our method. Please also see the accompanying video for actual renderings of dynamic scenes created with our method.


Figure 5: Comparison between constrained and unconstrained stereoscopic cameras while horizontally moving the camera past a close obstacle. The graphs on the bottom show that the unconstrained stereoscopic parameters cause excessive disparities. Our constrained camera adapts smoothly to the discontinuous depth.

Figure 6: Example of a scene with multiple points mapped onto multiple target depth values. The top row shows the desired mapping: the car should appear in the target range [-7.5, 0.0] cm while the environment should be mapped into [-8.0, 14.0] cm. The bottom row shows how different views are rendered as close as possible to the desired mapping in the least-squares sense using our nonlinear optimization.

6.1 Applications

Adaptive stereoscopy and automatic fail-safe. When moving the camera through a scene, the rendered depth is constantly changing. Improper camera convergence and interaxial separation can cause large disparity values when an obstacle suddenly comes close to the camera. An example is shown in Figure 4, where a player is moving fast through the environment. While our method is able to adapt to vast depth changes, uncontrolled stereoscopy causes excessive disparities and hence induces eye-strain to the viewer.

Another typical situation encountered in interactive, dynamic environments is the sudden change of scene depth. An example is shown in Figure 5, where the camera is horizontally translated across the street. It passes very closely in front of a couple, creating a sudden discontinuous decrease in the minimum scene depth. If this is not handled properly, stereopsis is immediately destroyed. The graphs in Figure 5 show how our method adapts the stereoscopic parameters to prevent excessive disparities from appearing for too long.

Figure 6 shows an example of our parameter optimization, mapping multiple points in the scene onto multiple target ranges. The top graph shows the desired mapping of the car and the environment in real space. The bottom images show stereoscopic renderings of the least-squares solution for the stereoscopic parameters and the mappings of the car and environment.

Changing target screen size and viewing conditions. Stereoscopic imagery that is produced for a certain target screen and viewing distance is optimized to create a particular depth impression. If, however, the content is shown under different viewing conditions, the viewer may experience a completely different depth sensation. Figure 7 shows a comparison of two different views that are optimized for a television screen, a PC monitor, and a Nintendo 3DS. The viewing conditions and target depth ranges for each device can be found in Table 1. The stereoscopic image created for a Nintendo 3DS shown on a large television screen can cause extremely large disparities. Our method is able to adapt content automatically to different output screens, given a pre-defined comfortable target depth range for an average viewer.

Intuitive control. Our method provides intuitive and exact stereoscopic control by allowing the viewer to adapt the perceived borders of the depth range to the respective personal comfort zone. The user can also define high-level goals for the depth, as shown in Figure 8. The viewer may specify to move or scale the target depth image, without worrying about the exact depth values or the stereoscopic camera parameters.

OSCAM also provides an interesting tool for content creators and artists in production. Our controller can be used to intuitively script perceived depth for different scene parts and hence create artistic effects such as emphasized depth for certain objects, stereoscopic flatlands, etc. Moreover, since our method can map any scene content into a pre-defined target depth range without changing any camera parameters except interaxial separation and convergence, we effectively de-couple classical '2D' camera work from stereoscopic camera work. This allows for a stream-lined production pipeline, where both stereoscopic and classical camera artists can work independently.

6.2 User Study

In order to further evaluate the utility of our method we conducted a user study with 31 subjects. The study was aimed at comparing our OSCAM to standard stereoscopic rendering with fixed camera convergence and interaxial separation. All subjects were tested for proper stereoscopic vision using a random dot stereogram test. The goals of this study were twofold. First, we tested whether the subjects prefer the static, uncontrolled stereoscopy or our OSCAM as more comfortable.
Second, we examined if our controller compromises perceived realism due to the involved adaptation of camera convergence and interaxial separation.

To this end we rendered 10 side-by-side comparisons of different scenes using uncontrolled stereoscopic parameters and using the OSCAM controller for pairwise evaluation [David 1963]. The rendered scenes contained typical scenarios encountered in interactive 3D environments, including

• continuous view changes between close-ups and wider scenes
• objects suddenly entering the field of view
• three long sequences where a player explores a complex game environment

We randomized the order of scenes as well as the respective positions on the screen. Each pair was shown three times so that the viewers had sufficient time for comparison. The study was performed on a line-wise polarized 46-inch Miracube display.

Table 1: Content scaling matrix. The entry in row i and column j shows the target depth range (in cm) of the image that is produced for device i and viewed on device j. The viewing conditions for each device (in cm): TV: ws = 100, dv = 300. PC: ws = 50, dv = 65. Nintendo 3DS: ws = 7.6, dv = 35.

          TV                   PC                  3DS
TV        [-51.4, 86.5]        [-6.1, 8.2]         [-0.5, 0.6]
PC        [-65.8, 164.7]       [-8.0, 14.0]        [-0.7, 1.0]
3DS       [-837.1, 3532.2]     [-105.4, 240.9]     [-1.0, 1.5]
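To illustrate how such cross-device entries arise, the following sketch evaluates the perceived depth when content produced for one setup is shown unchanged on another, using the helpers introduced earlier. It assumes the same virtual image width w_i for both setups and is meant only as a conceptual reading of the table, not as the exact procedure used to generate it.

```cpp
// Perceived depth when a point produced to appear at depth z on setup
// 'prod' is viewed on setup 'view' without re-rendering: the image
// disparity is fixed at production time, only the screen re-scales it.
double crossViewedDepth(const StereoSetup& prod, const StereoSetup& view, double z) {
    const double d = disparityForDepth(prod, z);  // disparity baked into the content
    const double p = (view.w_s / view.w_i) * d;   // parallax on the new screen
    return view.d_v * p / (view.d_e - p);         // Eq. (1) on the new setup
}
```

For example, with the TV and 3DS parameters from Table 1, a point produced to appear at -51.4 cm on the TV evaluates to roughly -0.5 cm on the 3DS, matching the corresponding table entry.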
not some the in around violations cm frame 86.5] introduced [-51.4, complete volume the target the mapped a For controller into viewer. stereo system scene the our study, visual to this uncomfortable human in be used the results can borders, This screen confused. get the can at cropped frame so-called is of stereo screen, problem the the by is considered paper been this in not proposed has controller that issue stereoscopic One settings. stereo static controller the our for of 39.3% results the realism, of terms In cases. the of 61.7% 38.3% in preferred was controller our viewing, stereo comfortable you? to realistic more looks watch? one to Which comfortable Q2: more is one Which Q1: questions: two following u otolr codn ooraoemnindgas o every either for answer goals, to to mentioned had dis- above participants identical the the our the were comparison at to disparities to that According resulting respect such controller. the with set our scene were cm each parameters of 86.5] stereoscopic beginning [-51.4, static of The range play. depth target a de- the in perceived be demand to conditions image viewing range. depth sired differing the The for disparities shown. different are vistas in found the be of can device each for ranges depth target and conditions 3DS PC TV otn rdcdfrdfeetsre ie.Teviewing The sizes. screen different for produced Content left < inyEtrrss Inc. Enterprises, Disney c 0. or 01. right o the for iue8: Figure hnotmzn o utpetre ein,adeaut owhat to evaluate and regions, target multiple for optimizing when h iprt piiainfaeok oti n,ol manipu- only end, this to framework, optimization disparity The et h ou srnee uhta tapasdrcl nfotof front in directly appears it that such rendered is torus The Left: egneaditrxa eaain hsalw o nanalytical an for allows This separation. interaxial and vergence etleauto hwdta u otolri rfre vrnaive over preferred rendering. is is stereoscopic controller controller experi- our our Our that showed resolution, applications. less evaluation real-time HD mental times demanding full for running even at With enough even fast frame space. per target 0.2ms the than in render ef- effects to such transformations depth of temporal complex desired linearization more a for for also in but allows blending distortions fects, method nonlinear of Our config- resulting problem depth. viewing the perceived the and and addressed into parameters have devices optimally stereoscopic we arbitrary content Moreover, for scene range urations. visible depth any that target controller rendering any camera We of stereoscopic capable problem. a optimization is for the an constraints between as derived and mapping depth have viewer-centric perceived the a and defined depth have of solu- scene we basis efficient model, the scene-centric and On interactive, a in effective environments. parameters an 3D camera described dynamic stereoscopic have optimizing for we tion paper this In Conclusion 7 our of effect the investigate perception. further viewer’s the to on like the interpolation On would linearized result. we the hand, influence optimization other the behavior of minima temporal local the extent explore further to optimal. want is we interpolation hand, on-the-fly. linearized one the stereoscopy the if the adjusting On yet Finally, clear for not [2011]. is well it However, work al. in- to et parameters inves- seems stereoscopic Heinzle stereoscopic to tuitively the by a of like one interpolation on the temporal would method linearized as We our such implement rig too. 
to camera method possibility cameras, our the real prevent tigate camera with would the working manipulate nothing from only convergence, the we and over as separation control However, without movement. environments interactive method camera our for understand Furthermore, better designed control. to stereoscopic is a conducted such be of to effects com- the need viewer studies increase only can Additional is stereoscopy study fort. adaptive our that However, indicator viewer. first the might a for this beneficial more that even evidence be Our encouraging provides remappings. disparity study a nonlinear experimental in complex solution investigate more to a for like would only techniques practi- we environments, is most real-time the it for is solution but cal this compute, While to space. configuration fast two-dimensional very con- is camera that the solution parameters, stereoscopic basic most two the lates Work Future and Limitations 6.3 realism perceived of terms in over disparities. compromising excessive depth less than perceived be of to scaling seems resulting time the and convergence and its dynamically. be appears and changes can screen viewpoint torus settings such the the The controller while behind our guaranteed With Right: length halved. target is original length screen. perceived its the front of behind in seventh appears half one torus other the of the half and Exactly Middle: plane. screen the -z xmlso xc trocpccnrluigormethod. our using control stereoscopic exact of Examples z -z z -z z Acknowledgements JONES,G.,LEE,D.,HOLLIMAN,N., AND EZRA, D. 2001. Con- trolling perceived depth in stereoscopic images. In Stereoscopic The authors are grateful to Martin Banks, Aljoscha Smolic, Tobias Displays And Virtual Reality Systems VIII, 200–1. Pfaff, Alex Stuard, and the anonymous reviewers for their helpful comments and suggestions as well as Wojciech Jarosz for providing KIM,H.J.,CHOI, J. W., CHAING,A.-J., AND YU, K. Y. 2008. the car model. Reconstruction of stereoscopic imagery for visual comfort. In Stereoscopic Displays and Virtual Reality Systems XIV, SPIE Vol. 6803. References KOPPAL,S.J.,ZITNICK,C.L.,COHEN, M. F., KANG,S.B., BACKUS,B.,BANKS,M.S., VAN EE,R., AND CROWELL,J.A. RESSLER,B., AND COLBURN, A. 2011. A viewer-centric ed- 1999. Horizontal and vertical disparity, eye position, and stereo- itor for 3D movies. IEEE Computer Graphics and Applications scopic slant perception. Vision Research 39, 6, 1143–1170. 31, 1, 20–35.

BARES, W. H., GREGOIRE, J. P., AND LESTER, J. C. 1998. Realtime constraint-based cinematography for complex interactive 3D worlds. In Tenth National Conference on Innovative Applications of Artificial Intelligence, 1101–1106.

BROBERG, D. 2011. Infrastructures for home delivery, interfacing, captioning, and viewing of 3-D content. Proceedings of the IEEE 99, 4 (April), 684–693.

CHAN, H. P., GOODSITT, M. M., HELVIE, M. A., HADJIISKI, L. M., LYDICK, J. T., ROUBIDOUX, M. A., BAILEY, J. E., NEES, A., BLANE, C. E., AND SAHINER, B. 2005. ROC study of the effect of stereoscopic imaging on assessment of breast lesions. Medical Physics 32, 4, 1001–1009.

CHRISTIE, M., OLIVIER, P., AND NORMAND, J.-M. 2008. Camera control in computer graphics. Comput. Graph. Forum 27, 8, 2197–2218.

DAVID, H. A. 1963. The Method of Paired Comparisons. Charles Griffin & Company.

DIDYK, P., RITSCHEL, T., EISEMANN, E., MYSZKOWSKI, K., AND SEIDEL, H.-P. 2011. A perceptual model for disparity. ACM Trans. Graph. 30, 4, 96.

FRÖHLICH, B., BARRASS, S., ZEHNER, B., PLATE, J., AND GÖBEL, M. 1999. Exploring geo-scientific data in virtual environments. In IEEE Visualization, 169–173.

GATEAU, S., AND NEUMAN, R. 2010. Stereoscopy from XY to Z. In SIGGRAPH ASIA Courses.

GLEICHER, M., AND WITKIN, A. 1992. Through-the-lens camera control. SIGGRAPH Comput. Graph. 26 (July), 331–340.

GRESS, A., GUTHE, M., AND KLEIN, R. 2006. GPU-based collision detection for deformable parameterized surfaces. Comput. Graph. Forum 25, 3, 497–506.

GRINBERG, V. S., PODNAR, G., AND SIEGEL, M. 1994. Geometry of binocular imaging. In Stereoscopic Displays and Virtual Reality Systems, vol. 2177, 56–65.

HAIGH-HUTCHINSON, M. 2009. Real-Time Cameras: A Guide for Game Designers and Developers. Morgan Kaufmann.

HE, L., COHEN, M. F., AND SALESIN, D. 1996. The virtual cinematographer: A paradigm for automatic real-time camera control and directing. In SIGGRAPH, 217–224.

HEINZLE, S., GREISEN, P., GALLUP, D., CHEN, C., SANER, D., SMOLIC, A., BURG, A., MATUSIK, W., AND GROSS, M. H. 2011. Computational stereo camera system with programmable control loop. ACM Trans. Graph. 30, 4, 94.

HELD, R. T., AND BANKS, M. S. 2008. Misperceptions in stereoscopic displays: a vision science perspective. In APGV, 23–32.

JONES, G., LEE, D., HOLLIMAN, N., AND EZRA, D. 2001. Controlling perceived depth in stereoscopic images. In Stereoscopic Displays and Virtual Reality Systems VIII, 200–1.

KIM, H. J., CHOI, J. W., CHAING, A.-J., AND YU, K. Y. 2008. Reconstruction of stereoscopic imagery for visual comfort. In Stereoscopic Displays and Virtual Reality Systems XIV, SPIE Vol. 6803.

KOPPAL, S. J., ZITNICK, C. L., COHEN, M. F., KANG, S. B., RESSLER, B., AND COLBURN, A. 2011. A viewer-centric editor for 3D movies. IEEE Computer Graphics and Applications 31, 1, 20–35.

LANG, M., HORNUNG, A., WANG, O., POULAKOS, S., SMOLIC, A., AND GROSS, M. H. 2010. Nonlinear disparity mapping for stereoscopic 3D. ACM Trans. Graph. 29, 4.

LIPTON, L. 1982. Foundations of the Stereoscopic Cinema: A Study in Depth. Van Nostrand Reinhold Inc., U.S.

MASAOKA, K., HANAZATO, A., EMOTO, M., YAMANOUE, H., NOJIRI, Y., AND OKANO, F. 2006. Spatial distortion prediction system for stereoscopic images. Electronic Imaging 15, 1.

MEESTERS, L. M. J., IJSSELSTEIJN, W. A., AND SEUNTIENS, P. J. H. 2004. A survey of perceptual evaluations and requirements of three-dimensional TV. IEEE Trans. Circuits Syst. Video Techn. 14, 3, 381–391.

OSKAM, T., SUMNER, R. W., THÜREY, N., AND GROSS, M. H. 2009. Visibility transition planning for dynamic camera control. In Symposium on Computer Animation, 55–65.

PAN, H., YUAN, C., AND DALY, S. 2011. 3D video disparity scaling for preference and prevention of discomfort. In Stereoscopic Displays and Applications XXII, SPIE Vol. 7863.

SHIBATA, T., KIM, J., HOFFMAN, D. M., AND BANKS, M. S. 2011. The zone of comfort: Predicting visual discomfort with stereo displays. Journal of Vision 11, 8.

SMOLIC, A., KAUFF, P., KNORR, S., HORNUNG, A., KUNTER, M., MÜLLER, M., AND LANG, M. 2011. Three-dimensional video postproduction and processing. Proceedings of the IEEE 99, 4 (April), 607–625.

STELMACH, L. B., TAM, W. J., SPERANZA, F., RENAUD, R., AND MARTIN, T. 2003. Improving the visual comfort of stereoscopic images. In Proc. SPIE 5006, 269.

TEKALP, A. M., SMOLIC, A., VETRO, A., AND ONURAL, L., Eds. 2011. Special Issue on 3-D Media and Displays. Proceedings of the IEEE 99, 4.

WANG, C., AND SAWCHUK, A. A. 2008. Disparity manipulation for stereo images and video. In Stereoscopic Displays and Applications XIX, SPIE Vol. 6803.

WATT, S. J., AKELEY, K., ERNST, M. O., AND BANKS, M. S. 2005. Focus cues affect perceived depth. Journal of Vision 5, 10.

WOODS, A., DOCHERTY, T., AND KOCH, R. 1993. Image distortions in stereoscopic video systems. In Stereoscopic Displays and Applications IV, Proceedings of the SPIE, vol. 1915.

ZILLY, F., KLUGER, J., AND KAUFF, P. 2011. Production rules for stereo acquisition. Proceedings of the IEEE 99, 4 (April), 590–606.