Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths
Total Page:16
File Type:pdf, Size:1020Kb
Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths Matthias Grundmann1;2 Vivek Kwatra1 Irfan Essa2 [email protected] [email protected] [email protected] 1Google Research, Mountain View, CA, USA 2Georgia Institute of Technology, Atlanta, GA, USA Abstract smooth camera path, and (3) Synthesizing the stabilized video using the estimated smooth camera path. We present a novel algorithm for automatically applying We address all of the above steps in our work. Our key constrainable, L1-optimal camera paths to generate stabi- contribution is a novel algorithm to compute the optimal lized videos by removing undesired motions. Our goal is to steady camera path. We propose to move a crop window compute camera paths that are composed of constant, lin- of fixed aspect ratio along this path; a path optimized to ear and parabolic segments mimicking the camera motions include salient points and regions, while minimizing an L1- employed by professional cinematographers. To this end, smoothness constraint based on cinematography principles. our algorithm is based on a linear programming framework Our technique finds optimal partitions of smooth paths by to minimize the first, second, and third derivatives of the re- breaking the path into segments of either constant, linear, or sulting camera path. Our method allows for video stabiliza- parabolic motion. It avoids the superposition of these three tion beyond the conventional filtering of camera paths that types, resulting in, for instance, a path that is truly static only suppresses high frequency jitter. We incorporate addi- within a constant segment instead of having small residual tional constraints on the path of the camera directly in our motions. Furthermore, it removes low-frequency bounces, algorithm, allowing for stabilized and retargeted videos. e.g. those originating from a person walking with a camera. Our approach accomplishes this without the need of user We pose our optimization as a Linear Program (LP) subject interaction or costly 3D reconstruction of the scene, and to various constraints, such as inclusion of the crop window works as a post-process for videos from any camera or from within the frame rectangle at all times. Consequently, we an online source. do not perform additional motion inpainting [10,3], which is potentially subject to artifacts. 1. Introduction Related work: Current stabilization approaches employ Video stabilization seeks to create stable versions of casu- key-point feature tracking and linear motion estimation in ally shot video, ideally relying on cinematography princi- the form of 2D transformations, or use Structure from Mo- ples. A casually shot video is usually filmed on a handheld tion (SfM) to estimate the original camera path. From this device, such as a mobile phone or a portable camcorder original shaky camera path, a new smooth camera path is es- with very little stabilization equipment. By contrast, pro- timated by either smoothing the linear motion models [10] fessional cinematographers employ a wide variety of sta- to suppress high frequency jitter, or fitting linear camera bilization tools, such as tripods, camera dollies and steady- paths [3] augmented with smooth changes in velocity to cams. Most optical stabilization systems only dampen high- avoid sudden jerks. If SfM is used to estimate the 3D path frequency jitter and are unable to remove low-frequency of the camera, more sophisticated smoothing and linear fits distortions that occur during handheld panning shots, or for the 3D motion may be employed [8]. videos shot by a walking person. To overcome this limita- To rerender the original video as if it had been shot from tion, we propose an algorithm that produces stable versions a smooth camera path, one of the simplest and most robust of videos by removing undesired motions. Our algorithm approaches is to designate a virtual crop window of pre- works as a post process and can be applied to videos from defined scale. The update transform between the original any camera or from an online source without any knowl- camera path and the smooth camera path is applied to the edge of the capturing device or the scene. crop window, casting the video as if it would have been shot In general, post-process video stabilization [10] consists from the smooth camera path. If the crop window does not of the following three main steps: (1) Estimating the origi- fit within the original frame, undefined out-of-bound areas nal (potentially shaky) camera path, (2) Estimating a new may be visible, requiring motion-inpainting [3, 10]. Addi- 225 Figure 1: Five stills from our video stabilization with saliency constraints using a face detector. Original frames on top, our face-directed final result at the bottom. The resulting optimal path is essentially static in y (the up and down motion of camera is completely eliminated) and composed of linear and parabolic segments in x. Our path centers the object of interest (jumping girl) in the middle of the crop window (bottom row) without sacrificing smoothness of the path. Please see accompanying video. tionally, image-based rendering techniques [1] or light-field We want our computed camera path P (t) to adhere to rendering (if the video was captured by a camera array [13]) these cinematographic characteristics, but choose not to in- can be used to recast the original video. troduce additional cuts beyond the ones already contained While sophisticated methods for 3D camera stabiliza- in the original video. To mimic professional footage, we tion [8] have been recently proposed, the question of how optimize our paths to be composed of the following path the optimal camera path is computed is deferred to the user, segments: either by designing the optimal path by hand or selecting • A constant path, representing a static camera, a single motion model for the whole video (fixed, linear or i.e. DP (t) = 0, D being the differential operator. quadratic), which is then fit to the original path. The work • A path of constant velocity, representing a panning or 2 of Gleicher and Liu [3] was the first to our knowledge to a dolly shot, i.e. D P (t) = 0. use a cinematography-inspired optimization criteria. Beau- • A path of constant acceleration, representing the ease- in and out transition between static and panning cam- tifully motivated, the authors propose a system that creates 3 a camera path using greedy key-frame insertion (based on eras, i.e. D P (t) = 0. To obtain the optimal path composed of distinct constant, a penalty term), with linear interpolation in-between. Their linear and parabolic segments, instead of a superposition of system supports post-process saliency constraints. Our al- them, we cast our optimization as a constrained L1 min- gorithm approximates the input path by multiple, sparse imization problem. L1 optimization has the property that motion models in one unified optimization framework in- the resulting solution is sparse, i.e. it will attempt to sat- cluding saliency, blur and crop window constraints. Re- isfy many of the above properties along the path exactly. cently, Liu et al. [9] introduced a technique that imposes The computed path therefore has derivatives which are ex- subspace constraints [5] on feature trajectories when com- actly zero for most segments. On the other hand, L2 min- puting the smooth paths. However, their method requires imization will satisfy the above properties on average (in long feature tracks over multiple frames. a least-squared sense), which results in small but non-zero Our proposed optimization is related to L1 trend filtering gradients. Qualitatively, the L2 optimized camera path al- [6], which obtains a least square fit, while minimizing the ways has some small non-zero motion (most likely in the second derivate in L1 norm, therefore approximating a set direction of the camera shake), while our L1 optimized path of points with linear path segments. However, our algorithm is only composed of segments resembling a static camera, is more general, as we also allow for constant and parabolic (uniform) linear motion, and constant acceleration. paths (via minimizing the first and third derivate). Figure8 Our goal is to find a camera path P (t) minimizing the shows that we can achieve L1 trend filtering through a par- above objectives while satisfying specific constraints. We ticular weighting for our objective. explore a variety of constraints: Inclusion constraint: A crop window transformed by the 2. L1 Optimal Camera Paths path P (t) should always be contained within the frame rectangle transformed by C(t), the original camera From a cinematographic standpoint, the most pleasant path. When modeled as a hard constraint, this allows viewing experience is conveyed by the use of either static us to perform video stabilization and retargeting while cameras, panning ones mounted on tripods or cameras guaranteeing that all pixels within the crop window placed onto a dolly. Changes between these shot types can contain valid information. be obtained by the introduction of a cut or jerk-free transi- Proximity constraint: The new camera path P (t) should tions, i.e. avoiding sudden changes in acceleration. preserve the original intent of the movie. For example, 226 if the original path contained segments with the camera Residual R1 R2 zooming in, the optimal path should also follow this motion -1 -1 -1 B1 = C 1 P1 B2 = C 2 P2 B3 = C 3 P3 motion, but in a smooth manner. Crop Saliency constraint: Salient points (e.g. obtained by a face window detector or general mode finding in a saliency map) Camera F2 F3 path Ct should be included within all or a specific part of the (known) crop window transformed by P (t). It is advantageous to model this as a soft constraint to prevent tracking C1 C2 C3 of salient points, which in general leads to non-smooth Figure 2: Camera path.