
Smooth Loops from Unconstrained Video

L. Sevilla-Lara†1, J. Wulff1, K. Sunkavalli2 and E. Shechtman2

1 Max Planck Institute for Intelligent Systems    2 Adobe Research

† Portions of this work were performed while the first author was at the University of Massachusetts, Amherst.

Abstract

Converting unconstrained video sequences into videos that loop seamlessly is an extremely challenging problem. In this work, we take the first steps towards automating this process by focusing on an important subclass of videos containing a single dominant foreground object. Our technique makes two novel contributions over previous work: first, we propose a correspondence-based similarity metric to automatically identify a good transition point in the video where the appearance and dynamics of the foreground are most consistent. Second, we develop a technique that aligns both the foreground and background about this transition point using a combination of global camera path planning and patch-based video morphing. We demonstrate that this allows us to create natural, compelling, loopy videos from a wide range of videos collected from the internet.

1. Introduction

Loopy videos and animated GIFs have gained tremendous popularity in the last few years with the ease of video capture, and the introduction of video sharing services like Vine.co and Instagram.com. More than 100 million people watch Vine videos every month, and over one billion loops are played daily on Vine alone [Vin14]. The typical length of these videos is surprisingly short – up to six seconds on Vine and 15 on Instagram. These videos are popular in social networks, blogs, digital marketing, music clips and art, because they capture key scene dynamics and can convey a richer meaning than a single photograph, but are more concise, portable, and sharable than long videos. Most such videos are created by cutting a short clip from a longer video. This frequently leads to abrupt changes from the last to the first frame, resulting in an uncomfortable experience when watching them played as a loop. One popular "trick" to avoid this is to play the video back-and-forth (by concatenating a copy of the video in reverse order). While this alleviates abruptness due to changes in the position of the objects in the video, the changes in motion are still abrupt, and often lead to unrealistic motions due to time-reversal.

In contrast, artists, animators, and professional photographers create strikingly hypnotizing micro-videos by perfectly "closing the loop" to seamlessly transition from the last frame back to the first frame (e.g., [Inc, Raj, Car]). Creating perfectly loopy clips from casual video can be highly tedious or even impossible for some videos. It typically involves manually finding the right locations in the video clip, and aligning the two ends with professional video editing tools. The goal of our work is to automate these two steps, and make the process of creating compelling loopy video clips significantly easier.

The seminal "Video Textures" work by Schödl et al. [SSSE00] proposed an elegant framework to automate this process for specific types of content. They showed that videos with dynamic texture-like characteristics (such as the flame of a candle) often contain multiple moments with similar appearance and dynamics that can be used as transition points for creating infinite loopy videos. The camera and background in these videos are static.

In this work, we generalize the Video Textures framework to handle videos "in the wild". These are typically captured by hand-held devices and contain arbitrary camera motion (including translation and zoom) and complex, non-rigid scene dynamics (including human motion). We focus on one popular type of content: videos of a dominant moving foreground, such as a moving person, animal or an object, in front of a roughly static background (small motion in the background is usually fine), captured by a moving camera. Motivated by research on visual attention [FS11] that shows that people have higher tolerance to inaccuracies in the periphery of the point of attention, our key observation is that, in many cases, finding moments where only the foreground is similar is sufficient to produce pleasant looping videos.

In order to handle such challenging videos we replace both the analysis and synthesis components of the Video Textures framework with new algorithms. During analysis, we find moments in the input video where the dominant foreground is similar both in appearance and dynamics. We start with a rough segmentation of the foreground in a scene. We develop a similarity metric based on this segmentation to robustly assess similarity in the motion and appearance of the foreground between two sets of video frames. In the synthesis step, we propose a patch-based method to morph between two video clips using second-order motion constraints for the foreground and automatic temporal gap estimation based on the dynamics of the scene. We show compelling results on several challenging videos downloaded from the internet, as well as comparisons to previous methods.

2. Related Work

The analysis of scene dynamics in videos is a critical component of many video editing tasks, and has been studied extensively in the graphics and vision literature. We focus on the techniques that are particularly relevant to our work.

Video Transitions  Combining multiple video clips is one of the most common video editing operations, and film editors have developed a taxonomy of the different kinds of transitions (or "cuts") used to achieve this (see Goldman [Gol07] for an overview). While general video editing requires a significant amount of skill and time, it can be (semi-)automated in specific instances. Zheng et al. [ZCA∗09] leverage information in a light field to automatically author cinematic effects from photographs. Kemelmacher-Shlizerman et al. [KSSGS11] generate smooth animations from personal photo albums by aligning and transitioning between the faces in the photographs. Berthouzoz et al. [BLA12] focus on editing interview footage, and propose a method that uses audio-visual features to automatically find good cut locations. Most of this previous work is applicable only to specific classes of data (e.g., faces) and cannot be trivially extended to general video sequences. In contrast, our technique does not make strong assumptions about the content of the input video sequences, and, to our knowledge, is the first general purpose approach for synthesizing realistic video transitions automatically in the presence of camera motion and complex foreground dynamics.

Video Morphing  Transitioning between two shots might require morphing between the two (especially when the content is not properly aligned, or has significant differences). There is a significant amount of literature on image morphing [Gom99], and most techniques compute correspondences between images, and construct motion trajectories from these correspondences. Both of these are challenging problems that become especially harder in the presence of complex camera motion and scene dynamics. Previous work has tackled this by relying on user input to specify the region of interest [BAAR12, RWSG13], or correspondences between pairs of videos [LLN∗14]. These methods cannot handle regions without correspondences, as often happens with the backgrounds in our examples. In addition, the synthesized motion trajectories need to be consistent with the motion in the original footage for the morphed result to look natural. Many morphing techniques (e.g., Shechtman et al. [SRAIS10]) do not account for this, and produce non-realistic results for general video sequences. In our work, we compute correspondences between video frames using the technique of HaCohen et al. [HSGL11]. We morph the background and the foreground separately to account for the fact that they might move in different ways. In addition, we synthesize background motion trajectories using linear interpolation (or use linear motion constraints when background correspondences do not exist), while using parabolic constraints to synthesize foreground motion trajectories. Unlike previous work, this allows us to handle both moving cameras and fairly general scene dynamics.

Video Textures and Cinemagraphs  Video Textures [SSSE00, KSE∗03, DCWS03, AZP∗05] create infinitely looping videos by finding similar frames in a video clip (based on image features), and transitioning between them using morphing. However, these methods were designed to work on videos that are shot by a static camera (or a smoothly moving camera), and where the dynamics are either local (e.g., a swinging candle flame or flapping flags) or stochastic in nature (e.g., flowing water, fire flames). More recent efforts use spatially varying dynamics to handle multiple motions [LJH13] and to include manual interaction [TPSK11], but they assume a static camera. Cinemagraphs are a related form of media that lie between video and photographs; the salient objects in the scene remain animated while the surrounding objects are held still. Recent work has proposed interactive tools for their creation [TPSK11, BAAR12, JMD∗12]. These methods work by using one of the video frames for the background and pasting the moving foreground on top. The inputs to these methods have to be captured using a static camera, the motion is often localized or repetitive in nature, and the methods require some user interaction. Unlike this previous work, our technique can handle both camera motion and non-stochastic scene dynamics (including highly structured motions like human movement). We are able to achieve this by considering the background and foreground separately while finding good transition points, and aligning and morphing them.

3. Overview


Given an input video, V, the goal of our method is to use a subset of the original frames and produce a shorter video, Vout, whose last frame seamlessly transitions to the first frame to produce a realistic and compelling video loop. In other words, any differences between the foreground and background appearance (illumination, location, pose) or scene dynamics (velocities and higher location derivatives) between these frames should change as smoothly as possible so that the transition is not noticeable. We achieve this using a two-step process. In the first step – the analysis stage – we analyze the input video to automatically choose the short subset clip that we consider to be most likely to produce a loop with a smooth transition of the foreground. We do this by finding frames in the video sequence that have the most similar foreground regions up to a global rigid alignment. This results in a frame pair (a, b) that denotes the start and end of the subset clip that we use to synthesize the video loop. While the appearance of the foreground in these frames is similar, it needs to be aligned properly for the loop to be seamless. In the second step – the synthesis stage – we morph the frames around this transition point, i.e., frames V(a + k) to V(a) and V(b) to V(b − k), to handle the differences in appearance and motion, and synthesize a smooth transition of both the foreground and the background from frame b back to frame a. This results in the output loopy video Vout.

Figure 1: At each frame V(i) we compute a foreground mask M(i) and draw a bounding box B(i) around it.

Figure 2: Similarity metric. (a) and (b) Pair of bounding boxes containing the foreground mask (colored light grey). (c) Confidence mask computed using NRDC (colored light grey). (d) Foreground mask and confidence mask overlapped. Only the pixels contained in both masks (colored in the brightest grey) are used for computing the similarity.

Pre-processing  Our system is meant to work on videos "in the wild", which are often captured with handheld cameras and contain high frequency camera motion from unintended jitter. This makes it difficult to make assumptions about the motion, and so we stabilize the input to remove only the high frequency component of the camera motion using Adobe After Effects, set to smooth motion. Previous methods [SSSE00, LTK12, LJH13] do not handle camera motion and thus stabilize the input videos to eliminate background motion entirely.

We assume that the input video contains a single dominant foreground motion, and we identify this region using the motion segmentation algorithm of Papazoglou and Ferrari [PF13]. We choose this method because it is automatic and is designed with minimal assumptions on the input videos. It consists of an initial estimate based on motion boundaries followed by refinement based on a spatio-temporal extension of GrabCut [RKB04]. The per-frame masks computed by this technique may contain several non-connected components, and may miss some portions of the foreground. We smooth the mask with a median filter, fill in holes by dilating the mask, and choose the main connected component as the foreground. For each frame V(i), we denote the binary mask that we obtain as M(i). For future computation, we also construct a bounding box around it, and call this bounded portion of the image B(i). This notation is demonstrated in Fig. 1.
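To make the mask clean-up step concrete, the sketch below shows one plausible implementation of the per-frame post-processing described above (median filtering, hole filling, keeping the largest connected component, and extracting B(i)). It assumes a binary mask from any off-the-shelf video segmentation; the function name, the exact filter sizes, and the SciPy-based implementation are our own illustration, not the authors' code.

```python
import numpy as np
from scipy import ndimage

def clean_mask(mask, median_size=5):
    """Post-process a raw per-frame foreground mask M(i) and return
    the cleaned mask together with its bounding box B(i).

    mask: 2-D boolean array from a motion segmentation method
    (e.g., Papazoglou & Ferrari); all parameters are illustrative.
    """
    # Remove speckle noise with a median filter.
    m = ndimage.median_filter(mask.astype(np.uint8), size=median_size) > 0
    # Fill holes inside the foreground region (the paper dilates the mask;
    # hole filling is a stand-in with a similar effect).
    m = ndimage.binary_fill_holes(m)
    # Keep only the largest connected component as the foreground.
    labels, n = ndimage.label(m)
    if n == 0:
        return m, None
    sizes = ndimage.sum(m, labels, index=range(1, n + 1))
    m = labels == (np.argmax(sizes) + 1)
    # Bounding box B(i) around the cleaned mask.
    ys, xs = np.nonzero(m)
    bbox = (xs.min(), ys.min(), xs.max(), ys.max())  # (x0, y0, x1, y1)
    return m, bbox
```

In this sketch the bounding box is axis-aligned and tight around the cleaned mask; in practice one would also crop the frame to this box (and rescale it) before computing the similarity described in the next section.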

4. Analysis – Choosing transition points

We analyze the input video V to choose the pair of frames (a, b) that we will use as a transition point. The choice of the transition point plays a key role in the quality of the output. To create the most natural looking video loops, we would like to find the pair of frames where the foreground is the most similar in terms of both appearance and motion. Our method for choosing this pair of frames is inspired by the work of Video Textures [SSSE00]. In particular, we choose this pair of frames by maximizing

    (a, b) = argmax_{(i, j)} [ S_app(i, j) + S_motion(i, j) ],        (1)

where the terms S_app and S_motion refer to the similarity in appearance and motion respectively, and are computed on the bounding box of the foreground in each frame, B(i).

Appearance similarity  There are different metrics to assess the similarity of the foreground of two frames. We can use pixel-wise metrics like the sum of squared differences (SSD) of color values, as in [SSSE00], but restricted to the bounding boxes. However, these pixel-wise metrics are sensitive to transformations and deformations. A small inaccuracy of the segmentation mask might lead to a shift in the bounding box, and thus a low similarity score, even if the foreground appearance is very similar. Even if the bounding boxes are well aligned, non-rigid deformations of the foreground (due to motion) may lead to low similarity scores.

Instead, we find correspondences between the foregrounds of two video frames and use them as a proxy for appearance similarity. This is based on the observation that the more similar the foregrounds are, the more correspondences we are likely to find between the frames. While there are many methods for finding such correspondences, in our work we use the Non-Rigid Dense Correspondence (NRDC) algorithm [HSGL11]. We choose NRDC for several reasons: it is robust to changes in illumination, it is designed for pairs of images where only part of the content is shared and unmatched regions do not hurt the estimation of the correspondence, and it provides a confidence map that indicates which pixels have a correspondence. Given two bounding boxes B(i) and B(j), we compute NRDC and obtain a confidence map that indicates whether a pixel in B(i) has found a correspondence in B(j). Our proposed similarity is the ratio of foreground pixels that are shared between the bounding boxes. We compute this as

    S_pair(i, j) = [ Σ_x γ(B(x, i), B(x, j)) ⊙ M(x, i) ] / [ Σ_x M(x, i) ],        (2)

where the function γ(·) returns the correspondence confidence from the NRDC algorithm (0 indicates that there is no confident match, 1 that there is absolute certainty about the match), ⊙ is the element-wise product, and M(x, i) is pixel x of mask M(i). Bounding boxes are scaled to a resolution of 200 × 200 for this similarity measure. This computation is illustrated in Fig. 2.

To increase robustness and temporal coherence we seek transition points where a sequence of frames has high similarity. Similarly to [SSSE00], we capture similarity across a range of frames by summing it over a temporal window around a considered transition point:

    S_app(i, j) = Σ_{−l ≤ k ≤ l} w(k) S_pair(B(i + k), B(j + k)),        (3)

where w(k) is a Gaussian weight with a standard deviation of 1, and |l| is the size of the neighborhood (in our case |l| = 5).

Figure 3: Given frames V1(i) and V2(i), if we used only the appearance similarity (Eq. 3) within the original bounding boxes (green boxes), we could potentially transition between two clips with different foreground motion, leading to a semantically wrong motion. Transforming the bounding boxes in the second clip according to the motion in the first (red boxes) and using Eq. 4 results in a low score in this case.

Motion similarity  S_app (Eq. 3) ensures similar appearance of the foreground across frames. This is usually sufficient to ensure temporal smoothness since often the camera follows the foreground, and therefore it stays in the same region of the image. However, S_app does not consider the global motion of the foreground relative to the background. This means that in a repetitive motion like a child jumping on a trampoline, a series of frames where the child is going upward could potentially be matched to a series of frames where the child is going downward. If we were to stitch the clips, this would lead to a non-realistic change in the global motion. The objective function should also capture the relative rigid motion of the foreground object with respect to the background.

We include an additional term S_motion that ensures the similarity of this global motion. We do this by transforming the bounding boxes in one of the videos to follow the relative motion of the other video, and computing the similarity using the new bounding boxes. For a pair of frames V(i), V(j) we compute the transformation T̂ that best aligns the foreground in V(i) to the one in V(j). We then transform the location of the bounding boxes B around frame j, i.e., B(j ± k), to compute the transformed bounding boxes B′(j ± k). If the global motion of the foreground is similar in both videos, then the original bounding boxes B(j ± k) and the transformed ones B′(j ± k) will be similar, and therefore B′(j ± k) will overlap largely with the foreground. However, if the motion in both clips is not similar, then B′(j ± k) will mostly contain background pixels. This process is illustrated in Figure 3. We use this transformed bounding box to compute the motion similarity between the two frames as:

    S_motion(i, j) = Σ_{−l ≤ k ≤ l} S′_app(i + k, j + k),        (4)

where S′_app computes the appearance similarity as in Equation 3 but using the transformed bounding box B′(j) for the second frame. We use l = 3 frames to compute the motion similarity.
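The sketch below illustrates Eqs. 2 and 3 under the assumption that an NRDC-like matcher returns a per-pixel confidence map in [0, 1]; the `conf_fn` callable is a placeholder for such a matcher (NRDC itself is not packaged this way), and the window parameters simply mirror the values given above. It is an illustration of the metric, not the authors' implementation.

```python
import numpy as np

def s_pair(conf_ij, mask_i):
    """Eq. 2: fraction of foreground pixels in B(i) that have a confident
    correspondence in B(j).

    conf_ij: HxW confidence map in [0, 1] between B(i) and B(j),
             both assumed resized to 200x200.
    mask_i:  HxW binary foreground mask M(i) cropped to B(i).
    """
    fg = mask_i.astype(np.float64)
    return float((conf_ij * fg).sum() / max(fg.sum(), 1.0))

def s_app(i, j, conf_fn, masks, l=5, sigma=1.0):
    """Eq. 3: Gaussian-weighted sum of S_pair over a temporal window.

    conf_fn(a, b) -> confidence map between B(a) and B(b); placeholder
                     for the NRDC correspondence step.
    masks[t]      -> foreground mask cropped to B(t).
    Boundary handling (i + k or j + k outside the clip) is omitted.
    """
    ks = np.arange(-l, l + 1)
    w = np.exp(-0.5 * (ks / sigma) ** 2)
    vals = [s_pair(conf_fn(i + k, j + k), masks[i + k]) for k in ks]
    return float(np.dot(w, vals))
```

The motion term S_motion (Eq. 4) reuses the same machinery, but with the bounding boxes around frame j replaced by their transformed counterparts B′(j ± k) and an unweighted window of l = 3.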


Figure 4: Comparison of similarity metrics. For different metrics, we show the chosen transition frames (center) with two preceding and subsequent frames, after rigid alignment. To visualize the quality of the alignment, each image is generated using the red and blue channels of one frame and the green channel of the other frame. Well aligned regions appear naturally colored, while misaligned regions appear oddly colored (green or pink). Using SSD over the whole frame (top) leads to results where the foreground is misaligned. Using SSD only in the bounding box (middle) is an improvement. The proposed method (bottom) is significantly better – the foreground is almost perfectly aligned at the cost of errors in the background.

Optimizing the similarity function  Optimizing Eq. 1 can be expensive, since it requires computing dense correspondences between every possible pair of frames. In order to make the method more efficient, we make two important approximations. First, most pairs of frames are actually bad candidates for a transition point. Since the similarity measure behaves smoothly around the good transition points (because of the averaging over multiple frames), we use a coarse-to-fine strategy to find the transition points. We evaluate S_app at every fifth pair of frames, and then choose the top 40 to explore at the finest temporal resolution. Second, computing S_motion is computationally expensive because it involves computing correspondences on the order of O(n³) times. To avoid such computations, we use a two-step process: we first compute the top 20 candidate pairs using S_app, and then we compute S_motion only on these. Finally, we avoid short loops by constraining the search to frames that are at least 20 frames apart (i.e., |i − j + 1| ≥ 20).

The total computation time varies depending on the size of the input video. In our experiments the average is around 2 hours. The average length of our videos is around 200 frames, typically of size 360 × 480. Most of this time is spent computing correspondences with NRDC, and thus a faster alternative would greatly improve the efficiency of our method.

We compare the performance of our foreground similarity metric with two baseline techniques on one of our examples. These techniques are: first, SSD computed on the entire frame [SSSE00], and second, SSD computed only in the masked region. These results are shown in Figure 4. Our method achieves significantly better spatio-temporal alignment of the foreground. Our technique for handling residual misalignments of the foreground, as well as differences in the background, is discussed next.

5. Synthesis – Video Morphing

During the synthesis stage our method takes the transition point (a, b) (b > a, w.l.o.g.) resulting from optimizing Eq. 1, and produces an output video Vout that closes the loop seamlessly. Our goal is to ensure that the motion and appearance of the foreground change smoothly, while having a reasonable transition of the background. A good transition of both is in general not possible, and we focus on the foreground. This is based on the assumption that most of the viewer's attention is focused on the foreground, and, therefore, a smoother foreground transition will be more compelling, as long as the background does not change abruptly. In this section we describe the video morphing stage of our method. First, we align the videos using a global, coarse, rigid alignment of the foregrounds. Second, we use a local, detailed, non-rigid alignment of the foreground and background using correspondence-guided patch-based synthesis.

5.1. Global rigid alignment via camera planning

Since our similarity measure is based on correspondences, it does not guarantee the alignment of the foregrounds in frames V(a) and V(b). We first handle gross global differences in the alignment by constructing a virtual camera path. As in Eq. 2, we compute a transformation T̂, consisting of a translation and a scale, that best aligns these frames. To ensure that the transition is smooth, we spread this transformation over a window of frames after V(a) and before V(b). For this we fit a cubic spline f(t), of length L and with zero first and second derivatives at the end points, for each of the three transformation parameters (2-D translation and 1-D scale), sample these splines to compute interpolated transformations, and apply them to the frames. The value of L is half the size of the loop, to maximize the size of the window in which the transformation happens and achieve a smoother transition.
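As an illustration of this camera-path planning step, the sketch below spreads an aligning transform (2-D translation plus scale) over a window of L frames with an ease-in/ease-out profile. The paper fits a cubic spline with zero first and second derivatives at the endpoints; here a quintic "smootherstep" blend, which satisfies the same endpoint conditions in closed form, stands in for that fit, so treat it as an assumption rather than the authors' exact implementation.

```python
import numpy as np

def smootherstep(t):
    """Quintic ease from 0 to 1 with zero 1st and 2nd derivatives at both ends."""
    t = np.clip(t, 0.0, 1.0)
    return t ** 3 * (t * (6.0 * t - 15.0) + 10.0)

def camera_path(t_hat, L):
    """Distribute the aligning transform over a window of L frames.

    t_hat: (dx, dy, s) translation and scale that aligns V(a) to V(b),
           estimated from foreground correspondences (an assumed
           parameterization of the paper's T-hat).
    Returns a list of per-frame (dx, dy, s), easing from the identity
    transform (0, 0, 1) to t_hat.
    """
    dx, dy, s = t_hat
    path = []
    for f in range(L + 1):
        w = smootherstep(f / L)
        path.append((w * dx, w * dy, (1.0 - w) + w * s))
    return path
```

Each interpolated (dx, dy, s) would then be applied to the corresponding frame (e.g., as an affine warp), so that half the loop on each side of the transition absorbs the alignment gradually.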


5.2. Local non-rigid alignment via morphing

Applying the transformations to the start and end of the sub-clip gives us two sets of frames around the transition point that are rigidly aligned, but there might still be misalignments because of small changes in appearance or non-rigid motion in the scene. As a result, naïve approaches like concatenation or cross-fading would create artifacts like abrupt cuts or ghosting, respectively. Traditional morphing (TM) techniques use a combination of optical flow and cross-fading to alleviate this problem. However, our input videos are often noisy and have large displacements, and this leads to poor flow estimates that, as illustrated in the results section, result in poor synthesis results.

Instead, we generalize the image-based Regenerative Morphing (RM) technique [SRAIS10] to video morphing. In RM, intermediate frames are synthesized using patches from two source frames while preserving local similarity to the sources and to consecutive frames for temporal coherence. RM has demonstrated good performance in different scenarios, ranging from interpolating nearby views to morphing entirely different images. However, it is designed to morph between images, where motions do not need to be realistic. We extend RM to the problem of morphing two roughly aligned video clips by adding two novel components:

Parabolic motion: RM incorporates correspondences between frames as morphing constraints. When correspondences exist, pixels in one source can be constrained to move on a linear path towards their corresponding location in the second source. In our case, the input contains complex camera motions and non-rigid scene dynamics (e.g., moving people), where the linear motion assumption does not hold and leads to unrealistic dynamics. To tackle this problem, we generalize this method to use a parabolic motion for each pixel. A parabola is computed at each pixel by using correspondences between 4 frames. The motion of the pixel is generated by sampling along this parabola. As shown in Fig. 5, this leads to more realistic dynamics of the foreground in the morphed frames, compared to the linear constraints originally used in RM.

Figure 5: Linear (top) vs. parabolic (bottom) constraints. The transition happens around the peak point during the jump. The parabolic constraints capture the correct dynamics and lead to a natural up-down motion during the transition, while the linear constraints make the kid "freeze" in mid-air during the transition.

Separate foreground and background morphing: Another simplifying assumption in RM is to use a single morphing process for the entire frame. While this produces reasonable results when morphing images, in videos unnatural motions are very noticeable and unpleasant. We extend RM to contain two morphing processes, one for the foreground and one for the background. Each process models aspects specific to the foreground and background.

The foreground morph uses parabolic motion (since it is expected to contain non-rigid objects like people) and a short transition of fixed size (4 frames). This produces good results because our input videos are already aligned in the foreground, and the role of morphing is to improve small details. When there are not enough correspondences to compute a plausible morph, the foreground is simply concatenated. We use the NRDC confidence map to decide whether to use correspondence constraints or concatenation. More formally, we make this decision using Eq. 2 between the two reference frames used for the foreground morph (frames A and B in Fig. 6), with a threshold of 0.6.

The background morph uses linear constraints for the motion. This is a better assumption for morphing rigid scenes from different viewpoints, which is often the content of the background. Also, correspondences in the background may be arbitrarily far apart. If the backgrounds of the two frames that we want to morph are close, a short transition is good. But if they are far, a longer transition is needed. We choose the window size of the transition automatically for the background, adapting to each video. The criterion we use is that the velocity of the camera (direction and magnitude) should change as little as possible when introducing the morph. Sometimes the camera moves fast and there are not enough correspondences in the background to compute a plausible morph. In this case, the morph uses motion constraints to move pixels at the same velocity as before and after the transition. We use the NRDC confidence map to decide whether to use correspondence constraints or motion constraints. As in the foreground morph, we make this decision using Eq. 2, this time between the two reference frames used for background morphing (C and D in Fig. 6). The integration of foreground and background morphing is described in Fig. 6. Once the foreground region is morphed, it is "pasted" on the background, or in other words, included as a hard constraint in the background morph.
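To make the parabolic constraint from this section concrete, here is a minimal sketch of fitting a parabola (as a function of time) to a pixel's corresponding positions in four frames and sampling intermediate positions from it. The per-coordinate least-squares fit is our own illustration of "a parabola computed at each pixel using correspondences between 4 frames"; the constraint is actually applied inside the patch-based synthesis, which is not shown here.

```python
import numpy as np

def parabolic_trajectory(times, positions, sample_times):
    """Fit x(t) and y(t) as quadratics through a pixel's tracked positions.

    times:        4 frame indices where the pixel has correspondences.
    positions:    4x2 array of (x, y) positions at those frames.
    sample_times: times at which to sample the in-between positions.
    Returns a len(sample_times) x 2 array of positions on the parabola.
    """
    t = np.asarray(times, dtype=np.float64)
    p = np.asarray(positions, dtype=np.float64)
    # Degree-2 least-squares fit per coordinate (4 points, 3 coefficients).
    cx = np.polyfit(t, p[:, 0], 2)
    cy = np.polyfit(t, p[:, 1], 2)
    s = np.asarray(sample_times, dtype=np.float64)
    return np.stack([np.polyval(cx, s), np.polyval(cy, s)], axis=1)
```

Sampling this trajectory at the times of the morphed frames yields per-pixel targets with consistent velocity and acceleration, which is what lets the bouncing child in Fig. 5 keep moving through the transition instead of freezing in mid-air.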


Figure 6: Overview of video morphing. The same video sub-clip is shown as two separate clips aligned at the transition point; the start of V1 and the end of V2 are the same, and we want to make the loop seamless at the transition point. Frames A through B are used to synthesize the foreground. A longer window – frames C through D – is used to synthesize the background, with the previously computed foreground as a constraint. This is signified by the small red boxes. For frames where the foreground is not synthesized (between C and A, and between B and D) the foreground constraint is simply the original foreground.

6. Experiments

We test our method on a set of videos collected from the Internet containing a wide range of background and foreground dynamics (Figure 8). The quality of our results is easier to appreciate in the accompanying supplementary video. In order to evaluate both the similarity metric and video morphing components of our technique, we compare to three different combinations of analysis and synthesis methods. First, we compare to Video Textures (VT) because it is the most similar end-to-end method. To make it easier for VT, we stabilize the input to have no motion at all whenever the input video allows it. VT uses full-frame similarity to find transition points and a multi-way morphing technique to align the frames across these transition points. In unconstrained video sequences like ours, this similarity metric produces poor transition points leading to poor synthesis results. Second, to evaluate the quality of our similarity metric, we compute loops using the transition points computed with our similarity metric but align them using traditional morphing, i.e., warping using optical flow and blending using cross-fading (referred to as TM). The quality is better than VT because the chosen transition frames are better aligned, but the morphing itself leads to artifacts like ghosting in regions with erroneous optical flow. Our third comparison (referred to as RM) uses the original RM algorithm in combination with our transition points and camera path planning. While RM does not have the same ghosting artifacts as TM, it was designed for two-frame morphing and is not able to capture the dynamics of the motion in our videos very well. In contrast, our video morphing – which extends RM with parabolic foreground motion constraints, linear motion or correspondence constraints in the background, and automatic selection of the morphing window – produces results that are visually more compelling. The differences in these techniques are much easier to appreciate in video form, and our supplementary material contains all these comparisons.

Figure 7: Comparison of morphing techniques. Top: Video Textures. Middle: Traditional morphing. Bottom: Proposed method. VT morphing is presented for the transition chosen by VT, while the other two show a morph for our transition.

Violin  This video contains an almost fixed camera, with detailed and structured foreground motions. Parabolic motion produces a more realistic result than traditional morphing.

Trampoline  The small foreground causes a full-frame similarity metric to be dominated by the background. By focusing on the foreground we produce a better transition point. In addition, video morphing the background is challenging because the scene moves quickly; we use motion constraints to maintain the same motion during the morph.

Basketball  The large scale changes in this video are handled by our foreground-based similarity metric and camera path. Our video morphing technique is best at handling the complex motions (hands and basketballs) because of the parabolic motion constraints.
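For reference, the TM baseline used in the comparisons above amounts to flow-based warping plus cross-fading. The sketch below shows one common recipe for it, assuming OpenCV's Farnebäck flow and two already rigidly aligned frames; it is a stand-in for a generic traditional-morphing implementation, not the authors' code or the exact baseline they ran.

```python
import cv2
import numpy as np

def tm_morph(frame_a, frame_b, n_inbetween=4):
    """Generate in-between frames by warping along dense flow and cross-fading.

    frame_a, frame_b: HxWx3 uint8 frames (already rigidly aligned).
    """
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    # Dense optical flow in both directions.
    flow_ab = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None,
                                           0.5, 3, 15, 3, 5, 1.2, 0)
    flow_ba = cv2.calcOpticalFlowFarneback(gray_b, gray_a, None,
                                           0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray_a.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    out = []
    for i in range(1, n_inbetween + 1):
        t = i / (n_inbetween + 1)
        # Backward-warp each source part of the way towards the other.
        map_a = np.dstack([grid_x + t * flow_ba[..., 0],
                           grid_y + t * flow_ba[..., 1]]).astype(np.float32)
        map_b = np.dstack([grid_x + (1 - t) * flow_ab[..., 0],
                           grid_y + (1 - t) * flow_ab[..., 1]]).astype(np.float32)
        warp_a = cv2.remap(frame_a, map_a, None, cv2.INTER_LINEAR)
        warp_b = cv2.remap(frame_b, map_b, None, cv2.INTER_LINEAR)
        # Cross-fade the two warped sources.
        out.append(cv2.addWeighted(warp_a, 1 - t, warp_b, t, 0))
    return out
```

The ghosting artifacts mentioned above arise exactly where the two flow fields disagree, so the two warped sources do not line up before the cross-fade.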


Figure 8: Morphed results. Results of our technique on a wide variety of examples. From top to bottom, violin, trampoline, basketball, skydiving, tango, dog, ski-slalom, and back-flip. See supplementary video for input and resulting loopy videos.

Skydiving  Our foreground-based similarity metric produces better transitions. Also, traditional morphing fails because flow estimation is unreliable due to low-texture regions.

Tango  In this video, the background moves very slowly while the foreground moves faster. Our method automatically chooses a large window size of 17 frames to morph the background, allowing the camera motion to stay slow and smooth.

Dog  In this video there is very small overlap of the foreground in frames that are well separated, and VT picks a poor pair of transition frames. This video also illustrates the importance of the virtual camera path. Using TM, the foreground is morphed and displaced simultaneously, creating an unrealistic effect.

Ski-slalom  The frame is dominated by the background, and VT does not find good transition points that align the skier well. Our method is able to produce a nice loop of the skier using parabolic motion, while interpolating the mountains in the background using linear motion constraints.

Back-flip  This video depicts a gymnast moving in front of a rapidly changing background. While our method captures the natural motion of the gymnast, the synthesized background has artifacts because of the large differences between the start and end of the clip.

Trampoline-Kid  The motion of the bouncing kid in this example leads to ghosting artifacts in the VT and TM results, and an unnatural freeze in mid-air in the RM result, which are not present in our results.

7. Limitations

We designed our technique to be fully automatic for real-world cases by making assumptions about the scene.


We assume the foreground is located in one contiguous region and has roughly consistent motion that is different from the background. We also assume that the background has similar visual properties in the two clips, failing which morphing produces artifacts; the back-flip result is our worst result in that respect. Second, our technique relies on a few automatic computer vision components – i.e., foreground segmentation [PF13] and correspondences [HSGL11] – that may not always work. The most fragile component of our technique is the automatic foreground segmentation. In the supplementary material we also include results using manual segmentation to illustrate the possible improvement.

8. Conclusions and Future Work

In this paper, we have described an automatic method for creating infinitely looping clips from casual videos. Our method is capable of handling complex camera and foreground motions, and we have demonstrated this on a variety of unconstrained videos downloaded from the internet. Our method can be easily generalized to stitch two input videos instead of one, if the two clips capture the same scene under a similar viewpoint. This could be used for many applications, like video editing (a generalization of [BLA12]), view interpolation [BBPP10], or video summarization (e.g., summaries from multiple videos of the same event [APS∗14]). In addition, different components of our pipeline could be changed for specific applications. For example, the similarity metric could be modified to evaluate only motion similarity, instead of motion and appearance. This would allow stitching videos with similar dynamics but different appearances, such as different people doing the same dance or practicing the same sport.

References

[APS∗14] AREV I., PARK H. S., SHEIKH Y., HODGINS J., SHAMIR A.: Automatic editing of footage from multiple social cameras. ACM Trans. on Graph. (Proc. SIGGRAPH) 33, 4 (July 2014), 81:1–81:11.

[AZP∗05] AGARWALA A., ZHENG K. C., PAL C., AGRAWALA M., COHEN M., CURLESS B., SALESIN D., SZELISKI R.: Panoramic video textures. ACM Trans. on Graph. (Proc. SIGGRAPH) 24, 3 (July 2005), 821–827.

[BAAR12] BAI J., AGARWALA A., AGRAWALA M., RAMAMOORTHI R.: Selectively de-animating video. ACM Trans. on Graph. (Proc. SIGGRAPH) 31, 4 (July 2012), 66:1–66:10.

[BBPP10] BALLAN L., BROSTOW G. J., PUWEIN J., POLLEFEYS M.: Unstructured video-based rendering: Interactive exploration of casually captured videos. ACM Trans. on Graph. (Proc. SIGGRAPH) 29, 4 (July 2010), 87:1–87:11.

[BLA12] BERTHOUZOZ F., LI W., AGRAWALA M.: Tools for placing cuts and transitions in interview video. ACM Trans. Graph. 31, 4 (July 2012), 67:1–67:8.

[Car] CARRÉ L.: URL: http://lillicarre.tumblr.com/.

[DCWS03] DORETTO G., CHIUSO A., WU Y., SOATTO S.: Dynamic textures. IJCV 51, 2 (2003), 91–109.

[FS11] FREEMAN J., SIMONCELLI E. P.: Metamers of the ventral stream. Nature Neuroscience 14, 9 (2011), 1195–1201.

[Gol07] GOLDMAN D. R.: A framework for video annotation, visualization, and interaction. PhD thesis, University of Washington, 2007.

[Gom99] GOMES J.: Warping and Morphing of Graphical Objects. Morgan Kaufmann, 1999.

[HSGL11] HACOHEN Y., SHECHTMAN E., GOLDMAN D. B., LISCHINSKI D.: Non-rigid dense correspondence with applications for image enhancement. ACM Trans. on Graph. (Proc. SIGGRAPH) 30, 4 (2011), 70:1–70:9.

[Inc] INCI E.: URL: http://erdalinci.tumblr.com/.

[JMD∗12] JOSHI N., MEHTA S., DRUCKER S., STOLLNITZ E., HOPPE H., UYTTENDAELE M., COHEN M.: Cliplets: Juxtaposing still and dynamic imagery. In Proc. UIST (2012), pp. 251–260.

[KSE∗03] KWATRA V., SCHÖDL A., ESSA I., TURK G., BOBICK A.: Graphcut textures: Image and video synthesis using graph cuts. ACM Trans. on Graph. (Proc. SIGGRAPH) 22, 3 (July 2003), 277–286.

[KSSGS11] KEMELMACHER-SHLIZERMAN I., SHECHTMAN E., GARG R., SEITZ S. M.: Exploring photobios. ACM Trans. on Graph. (Proc. SIGGRAPH) 30, 4 (July 2011), 61:1–61:10.

[LJH13] LIAO Z., JOSHI N., HOPPE H.: Automated video looping with progressive dynamism. ACM Trans. on Graph. (Proc. SIGGRAPH) 32, 4 (July 2013), 77:1–77:10.

[LLN∗14] LIAO J., LIMA R. S., NEHAB D., HOPPE H., SANDER P. V.: Semi-automated video morphing. Comp. Graph. Forum (Proc. EGSR) 33, 4 (2014), 51–60.

[LTK12] LEVIEUX P., TOMPKIN J., KAUTZ J.: Interactive viewpoint video textures. In Proc. CVMP (December 2012).

[PF13] PAPAZOGLOU A., FERRARI V.: Fast object segmentation in unconstrained video. In ICCV (2013).

[Raj] RAJKOVIC M.: URL: http://milosrajkovic.tumblr.com/.

[RKB04] ROTHER C., KOLMOGOROV V., BLAKE A.: GrabCut: Interactive foreground extraction using iterated graph cuts. In ACM Trans. on Graph. (Proc. SIGGRAPH) (2004), pp. 309–314.

[RWSG13] RÜEGG J., WANG O., SMOLIC A., GROSS M.: DuctTake: Spatiotemporal video compositing. Comp. Graph. Forum 32, 2pt1 (2013), 51–61.

[SRAIS10] SHECHTMAN E., RAV-ACHA A., IRANI M., SEITZ S. M.: Regenerative morphing. In CVPR (2010), pp. 615–622.

[SRDB10] SUN D., ROTH S., BLACK M.: Secrets of optical flow estimation and their principles. In CVPR (2010).

[SSSE00] SCHÖDL A., SZELISKI R., SALESIN D. H., ESSA I.: Video textures. In ACM Trans. on Graph. (Proc. SIGGRAPH) (2000), pp. 489–498.

[TPSK11] TOMPKIN J., PECE F., SUBR K., KAUTZ J.: Towards moment images: Automatic cinemagraphs. In Proc. CVMP (November 2011), pp. 87–93.

[Vin14] Vine statistics, Aug 2014. URL: http://blog.vine.co/post/95288683756/new-vine-camera-shoot-import-edit-and-share.

[ZCA∗09] ZHENG K. C., COLBURN A., AGARWALA A., AGRAWALA M., CURLESS B., SALESIN D., COHEN M.: Parallax photography: Creating 3D cinematic effects from stills. In Graph. Interface (May 2009).
