Multiperspective Images From Video

Jiwon Kim Kiera Henning Steve Seitz David Salesin

University of Washington

Abstract

We present a prototype of an interactive authoring tool for rendering images from a video input. The input video is considered as a space-time cube of pixel data, and the user interacts with the tool to create the desired image by cutting a smoothly varying slice through the cube. Such images exhibit interesting temporal and spatial distortions that cannot be seen in normal perspective images, because they sample pixels over varying time and space. This can be useful for non-perspective visualization of a scene, or for creating unusual artistic effects. The tool may also be extended to produce an animated output by interpolating between a set of rendered images.

1 Introduction

A video sequence is a collection of pixel rays emanating from a time-varying set of points in space, captured over a certain lapse of time. Therefore, generating an image by sampling pixels from a video input results in an image composed of pixels from different points in time and space. Let us think of a video sequence as a space-time cube of pixel data, as shown in Figure 1. If we cut a smooth surface through the cube and map it onto a 2D image, the result is an image with strange yet continuous temporal and spatial distortions that cannot be observed in normal perspective images, as shown in Figure 2.

Such images, if carefully generated to obtain the desired effect, can be useful for various purposes. For example, a video sequence that records a static scene with a moving camera can be used to produce an image that contains multiple viewpoints of different objects in the scene. Such a non-perspective image can be an effective visualization of the scene in a way that is impossible with a perspective image; this was a common device in old paintings such as Egyptian murals or works from the pre-Renaissance period. [2] also suggested that images with multiple viewpoints can be used as a new form of storytelling. Moreover, these images may be useful for purely artistic purposes; in fact, the underlying philosophy of multiperspective images is quite similar to that of cubism. Other types of input movies, such as a dynamic scene filmed by a static camera, may provide even more interesting effects.

Our aim is to design an interactive authoring tool for rendering such images from a video input. To create an image, the user first selects groups of pixels from the input frames that s/he wants to have in the final image and drags them over to the desired locations in the final image. The program then automatically calculates a smooth slice through the video cube which interpolates these pixels, and also a continuous mapping that maps pixels on the slice to their desired positions in the final image. Currently, the tool only supports 2D slices across the cube that are made of vertical columns of input frames, as illustrated in Figure 2. Therefore, the user can only select a single column or a set of multiple adjacent columns from input frames. If many consecutive columns are selected from the same input frame, the slice will look nearly perspective around the region where these columns are interpolated. In general, the number of consecutive columns is proportional to the local perspectivity of the corresponding input frame in the output image.
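To make the composition step concrete, the following is a minimal sketch, not the authors' implementation: it assumes the video is held in a numpy array of shape (T, H, W, 3), and the slice is given as a hypothetical slice_curve argument holding one (frame, column) pair per output column.

import numpy as np

def compose_from_slice(video, slice_curve):
    """Build an output image by taking one pixel column per output column.

    video: array of shape (T, H, W, 3); slice_curve: sequence of (t, x) pairs.
    """
    T, H, W, _ = video.shape
    columns = []
    for t, x in slice_curve:
        ti = int(np.clip(round(t), 0, T - 1))  # nearest neighbour for brevity;
        xi = int(np.clip(round(x), 0, W - 1))  # the actual tool interpolates
        columns.append(video[ti, :, xi])       # one H x 3 column of pixels
    return np.stack(columns, axis=1)           # output image, width len(slice_curve)

For example, a slice that advances one frame and one column per output column (a diagonal cut through the x-t plane) already produces the kind of space-time mixing illustrated in Figure 2.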

The composed image can further be edited by adjusting the slice and/or the mapping. The user can change the slice itself by changing the selected columns, and the mapping can be changed by a stretch-and-compress operation. To stretch or compress a certain region of the output image, the user first specifies a region of influence as a range of x coordinates. The operation affects only the specified region, and the rest of the image stays fixed. The user then click-and-drags on a column within that region to stretch and compress either side of the column.

The rest of the paper is organized as follows. After we discuss related work in section 2, we describe the implementation details in section 3. We then demonstrate some example results produced using the tool, and conclude with a discussion of possible improvements and future directions.

2 Related work

[2] provided a major inspiration for our work. It suggests the utility of multiperspective images for storytelling purposes. The desired images are generated from a video input filmed by a slit camera moving along a carefully planned camera path. They also demonstrated a preliminary implementation which allows the user to interactively create such images from a 3D model.

The issue of rendering a multiperspective image from a 3D model has been studied by a number of researchers in the past. [7] addressed the problem of creating a seamless multiperspective panoramic image corresponding to a user-defined camera path. [6] also produced similar images from a 3D model, called multiple-center-of-projection (MCOP) images, but their main interest was using such images for image-based 3D reconstruction rather than for visualization. While the above two used the slit-camera method to produce images, [1] took an object-based approach where a multiperspective image is interactively generated by specifying different viewing angles for different objects in the scene. This work is similar in spirit to ours in that the aim was to provide an interactive multiperspective rendering tool. However, all of the above assumed the existence of a 3D model and the freedom to choose a desired camera path or viewing angles.

The concept of considering video as a space-time cube of data was first introduced by Video Cube ([5]), an interactive tool for taking arbitrary planar slices through the video cube. The distinction of our work from this is that our tool is not limited to planar slices, but allows arbitrary slices that are perpendicular to the x-t plane. Also, Video Cube is more of a viewing tool that does not provide many user controls, whereas our tool is designed for interactive authoring. The same concept was utilized by [4] to produce non-photorealistic video with different artistic styles from a video input. In particular, multiperspective video outputs such as those resembling the style of cubism or photo collage can be generated. However, that work focuses more on simulating different artistic styles than on creating a particular multiperspective image.

Figure 1: A video can be considered as a space-time cube of data, with x-y-t axes.

Figure 2: A slice cut through the video cube generates a 2D image.

3 Implementation

3.1 Slice and mapping computation

Since the slice is limited to be vertical, i.e., perpendicular to the x-t plane, we can conveniently consider it as a 2D planar curve on the x-t plane instead. When the user specifies the columns to be interpolated, they appear as points or straight line segments on the x-t plane. We use an interpolating 2D cubic spline to interpolate these points or line segments. Each of the points is considered a control point to be interpolated, as is the middle point of each line segment. The slope and length of a line segment are used to determine the direction and magnitude of the tangent at its corresponding control point. Since the columns of the same line segment come from the same input frame, the line segment is always vertical, and therefore there are two possible directions; the tool allows the user to choose one of them. The length of the line segment is proportionally related to the magnitude of the tangent: the longer the line segment, the larger the magnitude of the tangent, and thus the more tightly the curve approximates the perspective view of the corresponding columns. We use a rather ad-hoc and empirically defined formula for this correlation. As discussed in section 5, its performance is not satisfactory and we hope to find a better formula. For single points, the tangent is automatically determined by constraining the neighboring portion of the curve to be C² continuous. The points and lines are interpolated in the order of their desired x coordinate in the final image. See Figure 3 for a graphical illustration.
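As a rough illustration of this construction, the sketch below fits the slice curve as a parametric cubic Hermite spline; the function name fit_slice_curve, the control-point representation, and the scale factor k standing in for our empirical length-to-tangent formula are illustrative assumptions rather than the actual implementation.

import numpy as np
from scipy.interpolate import CubicHermiteSpline

def fit_slice_curve(controls, k=1.0):
    """controls: list of dicts with 'x' and 't' (cube coordinates); line-segment
    constraints additionally carry 'seg_length' and 'seg_dir' (+1 or -1)."""
    xs = np.array([c['x'] for c in controls], dtype=float)
    ts = np.array([c['t'] for c in controls], dtype=float)
    s = np.arange(len(controls), dtype=float)   # curve parameter, one unit per control point

    # default tangents: central finite differences (a stand-in for the C2 condition)
    dx = np.gradient(xs, s)
    dt = np.gradient(ts, s)

    # a segment of columns selected from a single frame has constant t on the
    # x-t plane, so its tangent has no t component; its length and the chosen
    # direction set the tangent magnitude and sign
    for i, c in enumerate(controls):
        if 'seg_length' in c:
            dx[i] = c['seg_dir'] * k * c['seg_length']
            dt[i] = 0.0

    return CubicHermiteSpline(s, xs, dx), CubicHermiteSpline(s, ts, dt)

The two returned splines give the cube coordinates x(s) and t(s) of the slice for any parameter value s between 0 and len(controls) - 1.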

Computing the mapping between columns of pixels on the slice and the output image x coordinate is equivalent to determining how to sample pixels along the 2D curve representing the slice. It is also done by interpolating a curve through a set of control points. In this case, the axes are the x coordinate of the output image and the accumulated arc length of the slice curve, instead of x and t. In other words, the resulting curve specifies which point on the slice curve (at a particular arc length) is mapped to a particular column of the output image. Since the arc length is accumulated, the curve has to be a monotonically increasing function of the output image x coordinate. As mentioned later in section 5, however, our current approach does not always guarantee this property.

The mapping curve starts out with the same control points as the slice curve, i.e., the accumulated arc length and desired final-image x coordinate of the points or of the central point of each line segment. Since these are all single points, a simple C² continuous cubic spline is used to interpolate them. Later on, whenever the user performs a stretch-and-compress operation, new control points are introduced at the endpoints of the region of influence and at the clicked position. Since it is undesirable to keep increasing the number of control points indefinitely, a new control point is added only if there is no existing control point within a small proximity. Also, control points within the region of influence are shifted according to the stretch and compress. The new mapping curve is computed using the existing control points (some of which may have changed) plus the newly added control points.
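A minimal sketch of this sampling stage is given below; x_of_s and t_of_s are the slice-curve splines from the previous sketch, the helper names and the dense arc-length tabulation are our own assumptions, and the CubicSpline mapping is, like our actual curve, not forced to be monotone.

import numpy as np
from scipy.interpolate import CubicSpline

def build_mapping(out_x_at_controls, arc_at_controls):
    """Mapping curve: accumulated arc length as a function of output-image x."""
    return CubicSpline(out_x_at_controls, arc_at_controls)  # C2, monotonicity not guaranteed

def render_output(video, x_of_s, t_of_s, s_max, mapping, out_width, n_dense=2000):
    """Sample one pixel column per output column via the arc-length mapping."""
    s_tab = np.linspace(0.0, s_max, n_dense)
    arc_tab = np.concatenate([[0.0], np.cumsum(np.hypot(np.diff(x_of_s(s_tab)),
                                                        np.diff(t_of_s(s_tab))))])
    target_arc = np.clip(mapping(np.arange(out_width)), arc_tab[0], arc_tab[-1])
    s_at_out = np.interp(target_arc, arc_tab, s_tab)  # numerically invert the arc length
    T, H, W, _ = video.shape
    cols = []
    for s in s_at_out:
        xi = int(np.clip(round(float(x_of_s(s))), 0, W - 1))  # nearest neighbour here;
        ti = int(np.clip(round(float(t_of_s(s))), 0, T - 1))  # see section 5 on sampling
        cols.append(video[ti, :, xi])
    return np.stack(cols, axis=1)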

3.2 Animation

The tool supports animation of the slice curve moving through the video cube. The user can set keyframes composed of individual images created by slicing the cube. After setting the desired number of keyframes, the user initiates the creation of the animation. The control points of both the actual video slice curve and the sampling distribution curve are used to interpolate new control points in each intermediate frame between the keyframes. A separate cubic spline is calculated for each control point in each of the two curves. The assumption is made that the number of control points does not vary from keyframe to keyframe.
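The in-betweening step can be sketched as follows, under the stated assumption that every keyframe carries the same number of control points; the array layout and the function name interpolate_keyframes are illustrative.

import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_keyframes(keyframes, frames_between):
    """keyframes: array of shape (K, N, 2) holding N (x, t) control points for
    each of K keyframes.  Returns interpolated control points for every frame."""
    keyframes = np.asarray(keyframes, dtype=float)
    K = keyframes.shape[0]
    key_times = np.arange(K, dtype=float)
    frame_times = np.linspace(0.0, K - 1.0, (K - 1) * (frames_between + 1) + 1)
    # effectively one cubic spline per control point and per coordinate
    return CubicSpline(key_times, keyframes, axis=0)(frame_times)

The same routine is applied to the control points of the mapping curve, and each interpolated control-point set is then turned into an image exactly as in section 3.1.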

Figure 3: A top-down view of the slice displayed as a curve.

Figure 4: (a) Selected input frames. (b) Output image. (c) Slice curve.

Figure 5: (a) Selected input frames. (b) Output image 1. (c) Slice curve 1. (d) Output image 2. (e) Slice curve 2.

Figure 6: (a) Selected input frames. (b) Output image. (c) Slice curve.

Figure 7: (a) Selected input frames. (b) Output image. (c) Slice curve.

4 Results

Using the tool, we produced a variety of images and animations. The input video sequences we used varied most significantly in the type of motion exhibited by the camera. Below are selections from our resulting images.

Many of the resulting pictures have clear ties to several art movements from the past century, most notably Surrealism and Cubism. Artists from both of these movements often used multiple perspectives to describe their subject matter; essentially, both movements sought to more accurately depict the essence of an object. Cubists usually aimed to accomplish this specifically by incorporating multiple perspectives, but tended to do so in a discontinuous fashion. Cubist pieces tended to look like a geometric unwrapping of an object, and because of this, our authoring tool is not quite ideal for creating art in a Cubist style: without the ability to create cusps in the curve, the transition between perspectives is generally too smooth to accomplish this unwrapping in a stylistically similar way. Surrealists, alternately, did not focus specifically on unwrapping perspective. Instead, they more ambiguously defined their goal as trying to show the essence of an object by showing more aspects than just the visually apparent. This often led to perspective shifts or perspective-related inconsistencies in Surrealist works. Since the use of perspective in this movement was less defined and usually much more subtle, our smooth video slicing tool is much better suited to creating pieces in this style.

In Figure 4, the camera moves from the kitchen to the dining room to the living room, which are connected by a common hallway in an L shape. The final image has the effect of straightening out the L shape into a straight line, so that all three rooms can be seen simultaneously in a planar image. The slice curve contains three control points, each corresponding to a roughly central position in each room, and it weakly approximates the perspective view at those points by forming a steep slope around them for a short distance. Such a weak approximation of local perspectivity is preferable to a tight approximation for obtaining a smoothly varying unwrapped view of the entire space, as shown in the final image. A few stretch-and-compress operations were applied to make all three rooms about the same width and to shrink the walls between them.

The examples in Figure 5 were produced from a sequence captured by a camera rotating around the subject in a circle. Figure 5(a) uses an almost linear curve between two views of the subject about 90 degrees apart. It creates a partially unwrapped view of the subject's head, as shown in the result. The control points are all single points, and do not try to approximate a perspective view at any particular point. Points are preferred to line segments in this case since we want the same amount of contribution from every viewing angle.

Figure 5(b) was created by first transposing all the input frames, which allowed us to work on a row-by-row basis instead of column-by-column, and then transposing back the resulting image. Proceeding from top to bottom, it samples rows from input images ranging from a front view of the subject to a back view. It attempts to maintain a certain degree of perspectivity near the eyes and the mouth, as shown by the two steep slopes in the curve.

Figure 6 was produced from a sequence filmed from inside an elevator going up a 520-foot-tall tower. There exists a vertical parallax between the frames, although the amount is rather small because the scene is relatively far away from the camera. The curve assigns a near-the-ground view to both ends of the final image, and as it approaches the center of the image, it samples columns from near-the-top frames. The white rods near the left and right edges of the image come from frames where the view was briefly blocked by an architectural structure outside.

Figure 7 was created from an input video whose camera motion was generally from right to left. The subject, a hibiscus, stays primarily near the center of each frame for most of the sequence. Several perspectives of the flower can be seen in this image. The left side of the image samples frames from the end of the video, progressing toward the middle of the video as the image is crossed. The three control points on the curve correspond to the flower's approximately three clearly shown perspectives.

For Figure 8, the direction of the camera motion in the input video can be considered random. There is no main subject in the video, but it features the large white arches over the Pacific Science Center. The slice used to create this image is self-intersecting and samples certain moments in time more than once. The result is an image in which certain objects repeat themselves. Specifically, both sides of the building in the center are sampled from the same footage of one side of the building, and the arches over the Pacific Science Center appear on both sides of the building.

The images in Figure 9 were produced from the same input video. The video sequence contained just a single occurrence of the Space Needle. For Figure 9(a), the motion of the camera was mostly vertical: the camera pans up the view of the Space Needle and then back down again. Each control point in the curve corresponds roughly to one of the occurrences of the tower in the image. Because the size of the disc at the top changes with perspective in each frame, it appears to be fanning out.

For Figure 9(b), most of the data in the input video is just clear sky. Because of this, the image is relatively simple despite having many control points. The tower in this image corresponds approximately with the second control point from the left in the curve. The nature of the video and the curve distort the edges of the disc, giving it some anthropomorphic qualities.

5 Discussion

It was our aim in designing the user interface to make the underlying curves as transparent as possible to the user, and to let the user interact with the tool only at the image level. Judging from our experience with the tool, however, it seems unavoidable to expose the underlying representation at least to some extent, so that the user has more control over what s/he can achieve. It turns out that having fine control of the curve gives the user more freedom and flexibility to generate results. In addition, even with a more intuitive and abstract user interface, the user still needs to think about the actual slice that would produce a particular desired effect.

Most issues with the current tool seem to be on the technical side. First of all, as mentioned earlier in section 3.1, we need to improve the formula that correlates the length of a line segment (or the number of consecutive columns in the selection) with the local perspectivity. Currently, it is a simple linear equation and works reasonably well for small numbers of consecutive columns, but not for large ones; it may therefore be necessary to look for nonlinear equations. More importantly, even if the curve itself looks reasonable, it can produce undesirable results, especially if two selections are close to each other. As illustrated in Figure 10(a), the part of the curve that connects the two locally perspective regions contains undesired pixels. The user can fix this by adjusting the sampling so that very few pixels are sampled from that region, but it would be desirable if the system could automatically take care of it to some extent. In addition to reducing samples, it could also try to use a different slice; for example, the slice shown in Figure 10(b) may be preferable because it produces a smoother image.

Another problem is the zigzag and ghosting artifacts that are clearly observed in some examples, as shown in Figure 11(a). They are caused by the way real-valued samples on the slice are interpolated. Currently, the real-valued (x, t) points are sampled by a simple bilinear interpolation in x and t. However, if the relative motion between adjacent frames is considered, it clearly makes more sense to interpolate along the optical flow, as shown in Figure 11(b). There, the desired image should be completely white, but if we use naive bilinear interpolation, we get an image that alternates between gray and white, as in Figure 11(c).
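The difference between the two sampling strategies can be sketched as follows; this is an illustration rather than our implementation, grayscale for brevity, and flow_x is a hypothetical per-frame horizontal optical-flow field (from frame t to frame t+1).

import numpy as np

def sample_bilinear(video, y, x, t):
    """Naive bilinear interpolation of a real-valued (x, t) sample; video: (T, H, W)."""
    T, H, W = video.shape
    t0 = int(np.clip(np.floor(t), 0, T - 2)); ft = t - t0
    x0 = int(np.clip(np.floor(x), 0, W - 2)); fx = x - x0
    a = (1 - fx) * video[t0, y, x0] + fx * video[t0, y, x0 + 1]
    b = (1 - fx) * video[t0 + 1, y, x0] + fx * video[t0 + 1, y, x0 + 1]
    return (1 - ft) * a + ft * b            # blends the two frames at a fixed x

def sample_along_flow(video, flow_x, y, x, t):
    """Blend the two frames along the motion trajectory passing through (x, t)."""
    T, H, W = video.shape
    t0 = int(np.clip(np.floor(t), 0, T - 2)); ft = t - t0

    def sample_x(frame, xx):                # linear interpolation along x only
        xx = float(np.clip(xx, 0, W - 1))
        x0 = int(np.clip(np.floor(xx), 0, W - 2)); fx = xx - x0
        return (1 - fx) * video[frame, y, x0] + fx * video[frame, y, x0 + 1]

    u = flow_x[t0][y, int(np.clip(round(x), 0, W - 1))]   # horizontal motion t0 -> t0+1
    return (1 - ft) * sample_x(t0, x - ft * u) + ft * sample_x(t0 + 1, x + (1 - ft) * u)

In the situation of Figure 11(c), the naive version can blend a white region with darker, misaligned neighbouring pixels and yield gray, whereas following the motion keeps the sampled region white.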

Currently the animation feature does not allow the user to specify keyframes with differing numbers of control points in either curve. This leads to problems when trying to create an animation. It is somewhat unreasonable to expect that a user will know at the beginning of the process how many control points will be needed to create the desired animation; this particularly becomes a problem in longer animations. Two solutions have been discussed. One possibility is to store a superset of control points containing every control point from every keyframe. When the user indicates that they are done creating keyframes, all the keyframes can be recalculated with the entire superset of control points added. This is not the optimal solution, because the new control points introduced into each keyframe will have adverse effects on the final appearance of the frame, and also because the superset could easily become too cumbersome to use efficiently in computation. An alternate solution would involve using the Dynamic Time Warping algorithm ([3]) to map control points in each keyframe to control points in the next, allowing for intelligent insertion of a minimal number of new control points, or possibly for removing the restriction that each keyframe must have the same number of control points. Removing this restriction would entail calculating certain bounds within which a control point exists from keyframe to keyframe and only interpolating its value within these bounds. DTW mapping is a relatively intuitive way to recognize which control points have newly appeared in a keyframe and which have just been deleted.

There are a few more minor problems. The slice curve is not guaranteed to stay within the x-t frame even if all the control points are inside the frame; currently, the tool simply clamps the portion of the curve that leaves the frame to the frame boundary, but a more elegant solution is needed. Also, the mapping curve is not guaranteed to be a monotonically increasing function. The entire C² cubic spline is determined by the tangents at the two endpoints, and depending on these tangents certain intermediate Bezier control points can end up in undesirable positions. It seems preferable to be able to control each tangent independently rather than constraining them by enforcing C² continuity. Lastly, the system is not optimized for performance. It does work at an interactive rate, but there is definitely more room for speeding up the system.
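To make the DTW-based keyframe matching discussed above concrete, here is a minimal sketch, not taken from [3]: plain dynamic time warping over the control points' output-image x coordinates, where dtw_match and the absolute-difference cost are illustrative choices and a derivative-based cost as in [3] could be substituted.

import numpy as np

def dtw_match(a, b):
    """a, b: 1-D arrays of control-point x positions in two consecutive keyframes.
    Returns (i, j) index pairs along the optimal alignment path; an index of one
    keyframe paired with several of the other indicates insertions or deletions."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    path, i, j = [], n, m                      # backtrack the optimal path
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]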

Figure 8: (a) Selected input frames. (b) Output image. (c) Slice curve.

Figure 9: (a) Selected input frames. (b) Output image 1. (c) Slice curve 1. (d) Output image 2. (e) Slice curve 2.


Figure 10: Suppose horizontal positions in the scene are numbered 1, 2, ..., 9 from left to right, and the camera also films the scene from left to right. The desired output is an image smoothly capturing positions 2-4 and 5-7. (a) If line constraints are specified for perspective views of 2-4 and 5-7 at the left- and rightmost frames, the curve has to go up and down to connect them, and ends up including undesired samples. (b) If a straight line is used as the slice instead, it successfully captures 2-4 and 5-7 without intervening artifacts.

Figure 11: (a) Zigzag-looking artifacts generated in an output image. (b) An example where bilinear sampling makes a mistake while sampling along the optical flow direction works correctly. (c) The slice should produce an entirely white image, but bilinear sampling produces a wrong one.

6 Future work

In this section, we discuss a few possible extensions for the more distant future.

One obvious extension of the system is to make it handle an arbitrary 3D surface instead of a vertical 2D slice. Since we need to solve for both the shape and the parameterization of the surface, it may be appropriate to define the surface as a set of point samples that are mapped to the 2D grid of the output image, and to solve a constrained optimization problem. It would also increase the flexibility of the system to support multiple disconnected slices or discontinuities within a slice.

Another possible extension is an object-based user interface. Compared to the current interface for selecting columns or hand-drawn areas of input frames, it would be much more useful and intuitive to be able to perform operations at the object level. Simple object-level operations such as translation, rotation, and scaling of an object are possible candidates to implement, as well as freeform object-level distortions. With object-level support, it would also be easier to composite results from multiple input videos. For instance, recently developed video matting techniques could be used to track and extract an object, and composite a distorted version of it onto a background from another input video. Since objects have a 2D appearance, this requires increasing the dimension of the slice; the object-based interface is therefore closely tied to the problem of extending the system to 3D.

References

[1] AGRAWALA, M., ZORIN, D., AND MUNZNER, T. Artistic multiprojection rendering. In Proc. Eurographics Rendering Workshop (2000).

[2] GLASSNER, A. Cubism and cameras: Free-form optics for computer graphics. Tech. Rep. MSR-TR-2000-05, Microsoft Research, January 2000.

[3] KEOGH, E. J., AND PAZZANI, M. J. Derivative dynamic time warping. In Proc. First International Conference on Data Mining (2001).

[4] KLEIN, A. W., SLOAN, P.-P. J., COLBURN, R. A., FINKELSTEIN, A., AND COHEN, M. F. Video cubism. Tech. Rep. MSR-TR-2001-45, Microsoft Research, 2001.

[5] MICROSOFT RESEARCH GRAPHICS GROUP. Video cube, version 1.0.

[6] RADEMACHER, P., AND BISHOP, G. Multiple-center-of-projection images. In Proc. SIGGRAPH 98 (July 1998), pp. 199–206.

[7] WOOD, D. N., FINKELSTEIN, A., HUGHES, J. F., THAYER, C. E., AND SALESIN, D. H. Multiperspective panoramas for cel animation. In Proc. SIGGRAPH 97 (1997), pp. 243–250.