Roto++: Accelerating Professional Rotoscoping Using Shape Manifolds*
Total Page:16
File Type:pdf, Size:1020Kb
Roto++: Accelerating Professional Rotoscoping using Shape Manifolds Wenbin Li 1 Fabio Viola 2;∗ Jonathan Starck 2 Gabriel J. Brostow 1 Neill D. F. Campbell 3 University College London 1 The Foundry 2 University of Bath 3 Intermediate Baseline Ours Frame or = Rotoscoping Keyframes 6 Clicks + 6 Drags 1 Select + 1 Drag Final Result Figure 1: An example of our new Roto++ tool working with a professional artist to increase productivity. The artist has already specified a number of keyframes but is not satisfied with one of the intermediate frames. Under standard baselines, correcting the erroneous curve requires moving the individual control points of the spline. Using our new shape model we are able to provide an Intelligent Drag Tool that can generate likely shapes given the other keyframes. In our new interaction, the user simply selects the incorrect points and drags them all towards the correct shape. Our shape model then correctly proposes the new control point locations, allowing the correction to be performed in a single operation. Abstract 1 Introduction Rotoscoping (cutting out different characters/objects/layers in raw Visual effects (VFX) in film, television, and even games are created video footage) is a ubiquitous task in modern post-production and through a process called compositing in the post-production industry. represents a significant investment in person-hours. In this work, we Individual shots are broken down into visual elements that are then study the particular task of professional rotoscoping for high-end, modified and combined, or seamlessly integrated with computer live action movies and propose a new framework that works with generated (CG) imagery. This process of assembling each shot from roto-artists to accelerate the workflow and improve their productivity. different sources requires teams of highly skilled artists working on a Working with the existing keyframing paradigm, our first contribu- range of problems from removing unwanted objects such as rigging; tion is the development of a shape model that is updated as artists modifying appearance to create a specific look; combining 2D live add successive keyframes. This model is used to improve the output action elements from different clips, to integrating 3D elements to of traditional interpolation and tracking techniques, reducing the augment a set; or adding digital effects. A fundamental requirement number of keyframes that need to be specified by the artist. Our of this work is that it is procedural: each effect has to be shared, second contribution is to use the same shape model to provide a new reviewed, and modified with multiple iterations between artists, interactive tool that allows an artist to reduce the time spent editing before being reproduced when rendering the final composite. each keyframe. The more keyframes that are edited, the better the Rotoscoping is the technique used in live-action movies to separate interactive tool becomes, accelerating the process and making the the elements in each shot and allow artists to perform these com- artist more efficient without compromising their control. Finally, we positing tasks. Imagine a set augmentation where a reflection in also provide a new, professionally rotoscoped dataset that enables a window is changed or a set extension where futuristic buildings truly representative, real-world evaluation of rotoscoping methods. are added in the background; these tasks both require separation of We used this dataset to perform a number of experiments, including all the foreground layers in the shot so that the background can be an expert study with professional roto-artists, to show, quantitatively, updated. the advantages of our approach. Professional Rotoscoping Rotoscoping itself is a time- CR Categories: I.3.7 [Computer Graphics]: Methodology and consuming, manual task. It requires the set up of complex shapes to Techniques—Interaction Techniques; I.4.1 [Image Processing and decompose a shot into the different elements and then tracking or Computer Vision]: Scene Analysis—Shape animating the shapes to match the shot. An experienced artist can rotoscope on average 15 frames per day, depending on the complexi- Keywords: Rotoscoping, Video Segmentation, Manifold Models, ty of the scene. In big-budget movies, effect-rich shots are routinely Shape Models rotoscoped to separate every element ready for compositing, and in 2D to 3D conversion all shots must be rotoscoped to allocate each ∗Fabio Viola is currently affiliated with Google DeepMind item a specific depth in the scene. This rotoscoping work is often Permission to make digital or hard copies of all or part of this work for outsourced to dedicated artists, a lengthy process that creates a bot- personal or classroom use is granted without fee provided that copies are not tleneck and dependency in post-production pipeline. These artists made or distributed for profit or commercial advantage and that copies bear are highly trained and very demanding of the tools they use; they this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting by the owner/author(s). Publication rights licensed to ACM. with credit is permitted. To copy otherwise, or republish, to post on servers SIGGRAPH 2016 Technical Paper, July 24 - 28, 2016, Anaheim, CA or to redistribute to lists, requires prior specific permission and/or a fee. ISBN: 978-1-4503-4279-7/16/07 Request permissions from [email protected]. © 2016 Copyright held DOI: http://dx.doi.org/10.1145/2897824.2925973 require complete control over the shapes and contours they produce and suggest which keyframes the artists should edit next. This since they will often have to draw-in details that are obscured in the feedback process further reduces the number of keyframes shot by complications such as motion blur or smoke. needed across a variety of different shots. 2. Intelligent Drag Tool: After consultation with professional Why not Green Screen? The simplest way to achieve the sep- roto-artists we have developed a new interactive tool for editing aration of foreground elements in live-action footage is by using a individual keyframes. Traditionally, global models have been green (or blue) screen background so that the foreground elements avoided in the editing process since artists need to maintain can be extracted through chroma-keying, for example using the work control and do not like local edits to have non-local effects. of Smith and Blinn [1996]. This, however, requires the placement Through careful design, we have produced a new tool that ex- of a screen on-set which interrupts production; it is much simpler ploits the “global” shape information across the shot, from our to shoot in a natural environment. Also, there is risk that a screen, shape model, with a locally constrained editing process. This if poorly located or badly lit, will result in blue or green color spill new tool reduces the amount of time the artists needs to spend on the foreground. Furthermore, a screen cannot be used to separate editing each keyframe. The more keyframes that are edited, the multiple foreground elements or layers which may be present. Thus, better the interactive tool becomes, accelerating the process there is a trade-off between the creative impact of shooting against a and making the artist more efficient without compromising green screen and the cost of rotoscoping in post-production. There their control and precision. is a clear need to be able to shoot in natural scenes without incurring 3. Professional Rotoscoping Dataset: We provide a new, pro- large post-production costs for rotoscoping. fessionally rotoscoped dataset that enables truly representative, real-world evaluation of rotoscoping methods. It comprises a Why not Brush-Based Matting? Image matting is a well stud- large number of shots over a range of complexity ratings repre- ied problem in computer vision. Many techniques have been pro- sentative of a high-end, live-action movie. The ground truth posed based on either a trimap that separates a shot into a known data was generated professional artists and comprises over 700 foreground and known background region with an uncertain region person-days of rotoscoping effort. We used this dataset to per- for the matte, or through a scribble interface where a user can anno- form a number of experiments, including an expert study with tate a shot to define the foreground and background regions. The professional roto-artists to show, quantitatively, the advantages Video SnapCut method of Bai et al. [2009] is an example of the of our approach. state-of-the-art in this approach. While there are many use cases where these approaches perform well, brush or scribble based meth- 2 Background ods are universally not used by artists in our target area of high-end VFX post-production. This is due to the lack of procedural control of Both the interfaces and algorithms underpinning rotoscoping build the generated alpha matte that this interaction paradigm offers. The from research in several domains. We now outline the main con- workflow quickly degrades into an iterative process of defining more nections to color segmentation, contour tracking, and shape models. detailed constraints for the optimization to improve the quality of In examining these works, one should keep in mind the variety the matte, at which point it is simply faster to paint the alpha matte of everyday rotoscoping challenges, including non-rigid objects, by hand. occlusion and out-of-plane rotation, textured surfaces, changing illumination/shadows, and blur caused by motion or defocus. In Keyframe Paradigm The standard interface paradigm used by practice, these situations regularly challenge automatic algorithms, professional roto-artists is that of keyframing. In this framework, the so roto-artists are distrustful of automation, and insist on override artist picks a subset of the video frames in a shot to be keyframes.