15 Facial Performance Capture

Total Page:16

File Type:pdf, Size:1020Kb

15 Facial Performance Capture Spring 2019 CSCI 621: Digital Geometry Processing 15 Facial Performance Capture Hao Li http://cs621.hao-li.com 1 Performance to Facial Animation 2 Motion Capture 3 Motion Capture 4 Facial Animation in Films 5 Weta Digital Facial Modeling and Scanning 7 Facial Modeling and Scanning acquisition registration initial merging alignment copyright Paramount Pictures 8 Facial Modeling and Scanning 9 High-End 3D Scanning 10 Low-Cost Passive Scanning (AGI Soft) stereo pair 3D scan 11 Low-Cost Active Scanning Microsoft Kinect & Kinect Fusion 12 Rigging & Animation 13 Blendshapes & Correctives for Realism 14 Motion Capture Technologies Sparse Markers Dense Markers Markerless MOVA Image Metrics 15 Using Markers tracking input video input performance with markers retargeting 16 Using Dense Markers 17 Vision-Based Tracking & Texturing LA Noire 18 Typical Animation Workflow in Industry Modeling + Light-weight Motion Cleanup & 3D Scanning Fitting Rigging Capture Key-Framing Source Character Complex Key-Framing + Modeling Retargeting Rigging Proc.+Sim. Target Character 19 Markerless Facial Capture 20 3D Range Sensor Motion can be Captured at the Same Resolution as the Geometry 21 Vapor Ware? (Spatial Phase Imaging) 22 USC ICT Light Stage 5 23 Template Fitting 24 Template Fitting with PCA 25 Template Based Tracking 26 Overview depth for tracking texture for tracking Li et al. Siggraph Asia Borshukov et al. 2003 2009 coupled tracking + texture and depth for reconstruction tracking Valgaerts et al. Siggraph Asia Beeler et al. Siggraph 2011 2012 27 Using Light Stage X Using Light Stage X Using Light Stage X Using Light Stage X Using Light Stage X Requirements for a Practical System 1.Real-time performance 2.Robustness to noise 3.High-level semantics 33 Realtime Facial Capture 34 Why Realtime? VFX/Game Production Virtual Avatars Robotics AR/Virtual Mirror 35 Objective 36 Building Expression Space tracked template input scan 37 Expression PCA for Reduced Dimension Principal Component Analysis = +w1 +w2 +w3 +w4 38 Realtime Systems depth sensor as input with training with little training little to no training Weise et al. SCA 09 Li et al. Siggraph 2010 Weise et al. Siggraph 2011 & Bouaziz et al. Siggraph 2013 reduced calibration and more accessible 39 Realtime Systems video as input without training without training with training Saragih et al. IJCV 2011 Image Metrics 2011 Cao et al. Siggraph 2013 increased accuracy and expressiveness 40 Automatic Facial Rigging 41 Blendshape Animation Blendshape Animation blending weights = B0 + α1B1 + α2B2 + α3B3 + ... laughing neutral face blendshapes SIGGRAPH 2010 42 Blendshape Retargeting laughing ... many blendshapes SIGGRAPH 2010 43 Expression Transfer prior blendshapes ... [Noh & Neumann ’01] [Sumner & Popovic ’04] reconstructed blendshapes ... 44 Problems prior expression transfer ground truth 45 Example Based-Facial Rigging input face ... input training poses Facial Rigging ... prior generic rig blending weights output blendshapes ... ... 46 Bilinear Problem ... input face input examples n B + α B S 0 ij i ⇡ j i=1 X blending weights output blendshapes ... ... 47 Decoupled Optimization n B + α B S 0 ij i ⇡ j i=1 X 48 Decoupled Optimization n B + α B S Step A 0 ij i ⇡ j i=1 X ... n Step B B + α B S 0 ij i ⇡ j i=1 X ... 49 Gradient Domain Optimization n n 2 ˜ 2 S 2 2 argmin B0 + αijBi Sj + β Bi Bi argmin M0 + αijMi Mj +β Mi + M0 Gi M0 k − k k − k M k − k ⇥ − · ⇥ Bi i=1 i i=1 X X 50 Comparison prior without examples with 6 examples input example whistle surprise 51 Directable Facial Animation 3D scans facial tracking 52 Blendshapes for Tracking ICP with Blendshapes 53 Animation Prior 54 Problem: Noisy Input input scans tracking goal 55 Performance-Based Facial Tracking expression model animation prior ... ... input data expression parameters Stable Tracking temporal coherence 56 Animation as Prior reference video 9500 frames artist 57 N-Dim Expression Space smile mouth open 58 Animation Manifold correction ↵t smile mouth open 59 Probabilistic Animation Prior correction ↵t smile mouth open Lau et al. 2009 60 Probabilistic Animation Prior mouth open correction ↵t smile time mouth open 61 Probabilistic Animation Prior correction ↵t smile t M ↵ − mouth open 62 Temporal Joint Probabilistic Distribution M time K t t M t t M T 2 p(α ,..,α − )= π (α ,..,α − µ ,C C + ⇥ I). kN | k k k k kX=1 MPPCA principal Gaussian weights mean model components noise 63 MAP Estimation t t 1 t M ↵ = arg max p(↵ D, ↵ − ,..,↵ − ) ↵ | MPPCA t 1 t M arg max p(D ↵) p(↵, ↵ − ,..,↵ − ) ≈ ↵ | likelihood prior | {z } | {z } V T 2 n (v v⇤) geometry p(G x)= k exp( || i i − i || ) | geo − 2σ2 i=1 geo Y V T 2 I (p p⇤) texture p(I x)= k exp( ||⇥ i i − i || ) | im − 2σ2 i=1 im Y 64 ILM’s Kinect Monster Mirror 65 Fast Calibration Li et al. SIGGRAPH 2013 Facial Performance Capture Li et al. SIGGRAPH 2013 Pipeline Pipeline Overview input output output retargeting retargeting 69 Pipeline Overview input output tracking retargeting 70 Pipeline Overview input output tracking 71 Pipeline Overview input tracking output rigid motion 72 Pipeline Overview input tracking output rigid motion 73 Pipeline Overview input tracking output rigid motion blendshape 74 Pipeline Overview input tracking output rigid motion blendshape 75 Pipeline Overview input tracking output per-vertex rigid motion blendshape deformation initial blendshapes 76 Building Initial Blendshape Model rigid ICP template deformation deformation deformation PCA fitting deformation fitting transfer transfer transfer non-rigid ICP 77 Pipeline Overview input tracking output per-vertex rigid motion blendshape deformation initial blendshapes initial blendshapes 78 Rigid Motion Tracking S ni v c (R, t)= n>(v (R, t) p ) i i i i − i pi S E =min ci (R, t) rigid R,t i X 79 Rigid Motion Tracking input tracking output per-vertex rigid motion blendshape deformation initial blendshapes 80 Blendshape Tracking (0) (l) vi(x)= vi + vi xl Xl S ni u c (x) = n> (vi(x) pi) v j i i − p i i F cj (x)= Hj(uj) P vj(x) E = min (cS(x))2 + w cF(x) 2 bs x i k j k i j X X x [0, 1] l 2 81 Pipeline Overview input tracking output per-vertex rigid motion blendshape deformation initial blendshapes 82 Per-Vertex Deformation P c (∆vi) = (pi vi) ∆v i − − i u j W pi vi cj (∆vj) = Hj(uj) P ∆vj L c (∆v) = L(b0) ∆v I G Q ∆v = a 2 3 L 4 5 83 Fast Laplacian Deformation I G Q ∆v = a 2 3 uj L pi vi 4 5 G K ∆v = a 1 1 ∆v = K>[KK>]− G− a sparse pre-factorized sparse constant in time constant in time trivially inversed 84 Pipeline Overview input tracking output per-vertex rigid motion blendshape deformation initial blendshapes 85 Pipeline Overview input tracking output per-vertex rigid motion blendshape deformation initial blendshapes 86 Pipeline Overview input tracking output per-vertex rigid motion blendshape deformation initial blendshapes 87 Pipeline Overview input tracking output per-vertex rigid motion blendshape adaptive PCA deformation adaptive PCA model initial blendshapes 88 Pipeline Overview input tracking output per-vertex rigid motion blendshape adaptive PCA deformation anchor corrective incremental shapes shapes PCA initial blendshapes adaptive PCA model 89 Tracking Comparison depth map & our Weise et al. ’11 our Weise et al. ’11 2D features tracking tracking retargeting retargeting 90 Tracking Basic Emotions Li et al. SIGGRAPH 2013 anger surprise joy sadness disgust fear input data our method Weise et al. 2011 Faces 2014: Discussion Bouaziz et al. 2013 Cao et al. 2014 Li et al. 2013 input 3D/2D input 2D input 3D/2D no calibration neutral face neutral face blendshape blendshape per-vertex deformation Into the Mainstream Ninja Theory / Epic Games Motivation (Human Digitization) State-of-the-art Digital Humans (Siren) Snapchat Facebook / MSQRD FaceX Robustness FaceX Instant User Switching FaceX Kids State-of-the-Art in Real-Time Facial Tracking Preliminary Findings: Segmentation Microsoft 2015 Pipeline Two-Stream Deconvolution Network Comparison Open Problems USC/ICT Activision Open Problems Melbourne Acting School 2010 Virtual Reality Once upon a dream Virtual Reality Reloaded Oculus VR 2012 / Crytek 2014 Consuming VR NextGen Communication Platform Online Virtual Worlds Oculus Oculus Virtual Training, Simulation, Education Challenges 2016 2021 ▪Accurate simulation/training ▪Building trust, communicate feelings, resolving conflicts Occlusions Preliminary Findings: Segmentation Microsoft 2015 Facial Performance Sensing HMD Facial Performance Sensing HMD Facial Performance Sensing HMD Ultra Thin Flexible Electronic Materials Live Demo HMD Prototype Design Microsoft 2015 Feature-Based Tracking Kazemi and Sullivan, CVPR 2014 Cao et al., SIGGRAPH 2014 High Dimensionality & Non-Linearity occlusions, lighting, expressions, identity sticky lips, biting, visemes, … Deep Learning Model for Facial Expressions non-linear high dimensional low dimensional Online Operation Label Transfer via Audio Alignment Eye Animation Results Retargeting Motivation (Human Digitization) AR Telepresence (Hololens 2) Full Facial Performance Capture Pinscreen Facial Tracker (2018) Pinscreen (2017) Real-Time Lighting Estimation Deep Learning-Based Face Synthesis Deep Learning-Based Face Synthesis Masked Neutral Input Deformed Neutral Generator Synthesized Expression Neutral Expression Normal + Depth Conditional GAN Deep Learning-Based Face Synthesis Deep Learning-Based Face Synthesis Deep Learning-Based Face Synthesis NextGen Photoreal Avatars confidential www.pinscreen.com Full Video Inference Interaction With 3D Avatars Blade Runner 2049 (2017) http://cs621.hao-li.com Thanks! 162.
Recommended publications
  • Learning to Infer Implicit Surfaces Without 3D Supervision
    Learning to Infer Implicit Surfaces without 3D Supervision , , , , Shichen Liuy x, Shunsuke Saitoy x, Weikai Chen (B)y, and Hao Liy x z yUSC Institute for Creative Technologies xUniversity of Southern California zPinscreen {liushichen95, shunsuke.saito16, chenwk891}@gmail.com [email protected] Abstract Recent advances in 3D deep learning have shown that it is possible to train highly effective deep models for 3D shape generation, directly from 2D images. This is particularly interesting since the availability of 3D models is still limited compared to the massive amount of accessible 2D images, which is invaluable for training. The representation of 3D surfaces itself is a key factor for the quality and resolution of the 3D output. While explicit representations, such as point clouds and voxels, can span a wide range of shape variations, their resolutions are often limited. Mesh-based representations are more efficient but are limited by their ability to handle varying topologies. Implicit surfaces, however, can robustly handle complex shapes, topologies, and also provide flexible resolution control. We address the fundamental problem of learning implicit surfaces for shape inference without the need of 3D supervision. Despite their advantages, it remains nontrivial to (1) formulate a differentiable connection between implicit surfaces and their 2D renderings, which is needed for image-based supervision; and (2) ensure precise geometric properties and control, such as local smoothness. In particular, sampling implicit surfaces densely is also known to be a computationally demanding and very slow operation. To this end, we propose a novel ray-based field probing technique for efficient image-to-field supervision, as well as a general geometric regularizer for implicit surfaces, which provides natural shape priors in unconstrained regions.
    [Show full text]
  • CEO & Co-Founder, Pinscreen, Inc. Distinguished Fellow, University Of
    HaoLi CEO & Co-Founder, Pinscreen, Inc. Distinguished Fellow, University of California, Berkeley Pinscreen, Inc. Email [email protected] 11766 Wilshire Blvd, Suite 1150 Home page http://www.hao-li.com/ Los Angeles, CA 90025, USA Facebook http://www.facebook.com/li.hao/ PROFILE Date of birth 17/01/1981 Place of birth Saarbrücken, Germany Citizenship German Languages German, French, English, and Mandarin Chinese (all fluent and no accents) BIO I am CEO and Co-Founder of Pinscreen, an LA-based Startup that builds the world’s most advanced AI-driven virtual avatars, as well as a Distinguished Fellow of the Computer Vision Group at the University of California, Berkeley. Before that, I was an Associate Professor (with tenure) in Computer Science at the University of Southern California, as well as the director of the Vision & Graphics Lab at the USC Institute for Creative Technologies. I work at the intersection between Computer Graphics, Computer Vision, and AI, with focus on photorealistic human digitization using deep learning and data-driven techniques. I’m known for my work on dynamic geometry processing, virtual avatar creation, facial performance capture, AI-driven 3D shape digitization, and deep fakes. My research has led to the Animoji technology in Apple’s iPhone X, I worked on the digital reenactment of Paul Walker in the movie Furious 7, and my algorithms on deformable shape alignment have improved the radiation treatment for cancer patients all over the world. I have been keynote speaker at numerous major events, conferences, festivals, and universities, including the World Economic Forum (WEF) in Davos 2020 and Web Summit 2020.
    [Show full text]
  • Innovations Dialogue. DEEPFAKES, TRUST & INTERNATIONAL SECURITY 25 AUGUST 2021 | 09:00-17:30 CEST
    the 2021 innovations dialogue. DEEPFAKES, TRUST & INTERNATIONAL SECURITY 25 AUGUST 2021 | 09:00-17:30 CEST WELCOME AND OPENING REMARKS 09:00 Robin Geiss United Nations Institute for Disarmament Research KEYNOTE ADDRESS: TRUST AND INTERNATIONAL SECURITY IN THE ERA 09:10 OF DEEPFAKES Nina Schick Tamang Ventures The scene-setting keynote address will explore the importance of trust for international security and stability and shed light on the extent to which the growing deepfake phenomenon could undermine this trust. UNPACKING DEEPFAKES – CREATION AND DISSEMINATION OF DEEPFAKES 09:40 MODERATED BY: Giacomo Persi Paoli United Nations Institute for Disarmament Research FEATURING: Ashish Jaiman Microsoft Hao Li Pinscreen and University of California, Berkeley Katerina Sedova Center for Security and Emerging Technology, Georgetown University This panel will provide a technical overview of how visual, audio and textual deepfakes are generated and disseminated. The key questions it will seek to address include: what are the underlying technologies of deepfakes? How are deepfakes disseminated, particularly on social media and communication platforms? What types of deepfakes currently exist and what is on the horizon? Which actors currently have the means to create and disseminate them? COFFEE BREAK 11:15 UNDERSTANDING THE IMPLICATIONS FOR INTERNATIONAL SECURITY 11:30 AND STABILITY MODERATED BY: Kerstin Vignard United Nations Institute for Disarmament Research FEATURING: Saiffudin Ahmed Nanyang Technological University Alexi Drew RAND Europe Anita Hazenberg INTERPOL Innovation Centre Moliehi Makumane United Nations Institute for Disarmament Research Carmen Valeria Solis Rivera Ministry of Foreign Affairs of Mexico Petr Topychkanov Stockholm International Peace Research Institute This panel will examine the uses of deepfakes and the extent to which they could erode trust and present novel risks for international security and stability.
    [Show full text]
  • Employing Optical Flow on Convolutional Recurrent Structures for Deepfake Detection
    Rochester Institute of Technology RIT Scholar Works Theses 6-13-2020 Employing optical flow on convolutional recurrent structures for deepfake detection Akash Chintha [email protected] Follow this and additional works at: https://scholarworks.rit.edu/theses Recommended Citation Chintha, Akash, "Employing optical flow on convolutional recurrent structures for deepfake detection" (2020). Thesis. Rochester Institute of Technology. Accessed from This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact [email protected]. Employing optical flow on convolutional recurrent structures for deepfake detection by Akash Chintha June 13, 2020 A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Engineering Department of Computer Engineering Kate Gleason College of Engineering Rochester Institute of Technology Approved By: Dr. Raymond Ptucha Date Primary Advisor - RIT Dept. of Computer Engineering Dr. Matthew Wright Date Secondary Advisor - RIT Dept. of Computing and Information Sciences Dr. Shanchieh Yang Date Secondary Advisor - RIT Dept. of Computer Engineering Akash Chintha's Masters Thesis Document 1 ACKNOWLEDGEMENTS I would like to take this opportunity to thank Dr. Raymond Ptucha and Dr. Matthew Wright for continuous guidance and support in my research. I am also thankful to Dr. Shanchieh Yang for showing interest in my work and being a part of the thesis committee. I would like to thank my coresearchers Saniat Sohrawardi, Aishwarya Rao, Bao Thai, Akhil Santha, and Kartavya Bhatt for assisting me with my thesis research.
    [Show full text]
  • Rapid Avatar Capture and Simulation Using Commodity Depth Sensors
    Rapid Avatar Capture and Simulation using Commodity Depth Sensors Ari Shapiro∗1, Andrew Feng†1, Ruizhe Wang‡2, Hao Li§2, Mark Bolas¶1, Gerard Medionik2, and Evan Suma∗∗1 1Institute for Creative Technologies, University of Southern California 2University of Southern California Abstract objects into a virtual environment in 3D. For ex- We demonstrate a method of acquiring a 3D ample, a table, a room, or work of art can be model of a human using commodity scanning quickly scanned, modeled and displayed within hardware and then controlling that 3D figure in a virtual world with a handheld, consumer scan- a simulated environment in only a few minutes. ner. There is great value to the ability to quickly The model acquisition requires 4 static poses and inexpensively capture real-world objects taken at 90 degree angles relative to each and create their 3D counterparts. While numer- other. The 3D model is then given a skeleton ous generic 3D models are available for low- and smooth binding information necessary for or no-cost for use in 3D environments and vir- control and simulation. The 3D models that tual worlds, it is unlikely that such acquired 3D are captured are suitable for use in applications model matches the real object to a reasonable where recognition and distinction among char- extent without individually modeling the object. acters by shape, form or clothing is important, In addition, the ability to capture specific ob- such as small group or crowd simulations, or jects that vary from the generic counterparts is other socially oriented applications. Due to the valuable for recognition, interaction and com- speed at which a human figure can be captured prehension within a virtual world.
    [Show full text]
  • ARCH: Animatable Reconstruction of Clothed Humans
    ARCH: Animatable Reconstruction of Clothed Humans Zeng Huang1,2,∗ Yuanlu Xu1, Christoph Lassner1, Hao Li2, Tony Tung1 1Facebook Reality Labs, Sausalito, USA 2University of Southern California, USA [email protected], [email protected], [email protected], [email protected], [email protected] Canonical Reconstructed Abstract Input Reposed Reconstruction 3D Avatar In this paper, we propose ARCH (Animatable Recon- struction of Clothed Humans), a novel end-to-end frame- work for accurate reconstruction of animation-ready 3D clothed humans from a monocular image. Existing ap- proaches to digitize 3D humans struggle to handle pose variations and recover details. Also, they do not produce models that are animation ready. In contrast, ARCH is a learned pose-aware model that produces detailed 3D rigged full-body human avatars from a single unconstrained RGB image. A Semantic Space and a Semantic Deformation Field are created using a parametric 3D body estimator. They allow the transformation of 2D/3D clothed humans into a canonical space, reducing ambiguities in geometry caused by pose variations and occlusions in training data. Figure 1. Given an image of a subject in arbitrary pose (left), Detailed surface geometry and appearance are learned us- ARCH creates an accurate and animatable avatar with detailed ing an implicit function representation with spatial local clothing (center). As rigging and albedo are estimated, the avatar can be reposed and relit in new environments (right). features. Furthermore, we propose additional per-pixel su- pervision on the 3D reconstruction using opacity-aware dif- Moreover, the lack of automatic rigging prevents animation- ferentiable rendering. Our experiments indicate that ARCH based applications.
    [Show full text]
  • Headphones Needed: ☒ YES ☐ NO
    Erik Amerikaner.guru Tech In The News Headphones Needed: ☒ YES ☐ NO Step One: Watch this video Step Two: Read the story at the bottom of this document Step Three: Navigate to this website and review the information Step Four: Create a Word document, and list the five W’s (Who,What,Where,When,Why) of the story. Why is this technology potentially so important to society? What are the pros/cons of this technology? Can it be used for good and/or evil? What should a ‘consumer’ of this technology be aware of? If you were showing this technology to a young person, how would you describe it? Do you think this technology should be developed? Why/why not? Submit Your PinScreen_your name Assignment : To Mr. Amerikaner Gdrive Using: 2/19/2018 Fake videos are on the rise. As they become more realistic, seeing shouldn't always be believing ADVERTISEMENT 12 WEEKS FOR 99¢ TOPICS LOG IN Sale ends 2/28 SPECIAL OFFER | 12 WEEKS FOR 99¢ Former Trump aide Richard Netanyahu in deeper peril as L.A. County's homeless problem Toyota Prius so Gates to plead guilty; agrees to more Israeli officials are arrested is worsening despite billions reduce fuel effi TECHNOLOGY BUSINESS LA TIMES testify against Manafort, sourc… on corruption charges from tax measures say Fake videos are on the rise. As they become more realistic, seeing shouldn't always be believing By DAVID PIERSON FEB 19, 2018 | 4:00 AM 12 WEEKS FOR ONLY 99¢ SPECIAL OFFER SAVE NOW Hurry, offer ends 2/28 Pinscreen, a Los Angeles start-up, uses real-time capture technology to make photorealistic avatars.
    [Show full text]
  • Chloe A. Legendre E-Mail: [email protected] Cell: (443) 690-6924
    Chloe A. LeGendre E-mail: [email protected] Cell: (443) 690-6924 www.chloelegendre.com RESEARCH I develop and apply computer vision and computational photography techniques to STATEMENT solve problems in computer graphics, towards the goal of creating compelling, photo- realistic content for people to enjoy. RESEARCH Computational Photography, Appearance Capture, Lighting Capture, Color Imaging & INTERESTS Measurement, Computer Vision for Computer Graphics EDUCATION University of Southern California, Los Angeles, CA August 2015 - May 2019 Ph.D., Computer Science; GPA: 3.82 Dissertation: ”Compositing Real and Virtual Objects with Realistic, Color-Accurate Illumination” Stevens Institute of Technology, Hoboken, NJ September 2012 - May 2015 M.S., Computer Science; GPA: 4.00 University of Pennsylvania, Philadelphia, PA September 2005 - May 2009 B.S. in Engineering; GPA: 3.69 PROFESSIONAL Google Research, Los Angeles, CA EXPERIENCE Senior Software Engineer - Computational Photography June 2020 - present • Core technology research and development for computational photography. • Developed machine learning based lighting estimation module for the Portrait Light feature, a hero launch for the Google Pixel 2020 phone in both Google Camera and Google Photos. Google Daydream (VR/AR), Los Angeles, CA Software Engineer - Augmented Reality June 2019 - June 2020 • Core technology research and development for mobile AR. Student Researcher - Augmented Reality December 2017 - May 2019 • ML-based lighting estimation technique was published at CVPR 2019 and shipped as the Environmental HDR Lighting Estimation module for ARCore in June 2019. • Trained ML models using distributed Tensorflow environment. Software Engineering Intern - Augmented Reality June 2017 - August 2017 • Developed mobile AR lighting estimation techniques for Google Pixel Phone’s AR Stickers camera mode (Playground).
    [Show full text]
  • Protecting World Leaders Against Deep Fakes
    Protecting World Leaders Against Deep Fakes Shruti Agarwal and Hany Farid University of California, Berkeley Berkeley CA, USA {shruti agarwal, hfarid}@berkeley.edu Yuming Gu, Mingming He, Koki Nagano, and Hao Li University of Southern California / USC Institute for Creative Technologies Los Angeles CA, USA {ygu, he}@ict.usc.edu, [email protected], [email protected] Abstract peared [5], and used to create non-consensual pornography in which one person’s likeliness in an original video is re- The creation of sophisticated fake videos has been placed with another person’s likeliness [13]; (2) lip-sync, in largely relegated to Hollywood studios or state actors. Re- which a source video is modified so that the mouth region is cent advances in deep learning, however, have made it sig- consistent with an arbitrary audio recording. For instance, nificantly easier to create sophisticated and compelling fake the actor and director Jordan Peele produced a particularly videos. With relatively modest amounts of data and com- compelling example of such media where a video of Presi- puting power, the average person can, for example, create a dent Obama is altered to say things like “President Trump is video of a world leader confessing to illegal activity leading a total and complete dip-****.”; and (3) puppet-master, in to a constitutional crisis, a military leader saying something which a target person is animated (head movements, eye racially insensitive leading to civil unrest in an area of mil- movements, facial expressions) by a performer sitting in itary activity, or a corporate titan claiming that their profits front of a camera and acting out what they want their puppet are weak leading to global stock manipulation.
    [Show full text]
  • Arxiv:2012.02512V3 [Cs.CV] 20 Aug 2021 a High Level of Realism
    ID-Reveal: Identity-aware DeepFake Video Detection Davide Cozzolino1 Andreas Rossler¨ 2 Justus Thies2,3 Matthias Nießner2 Luisa Verdoliva1 1University Federico II of Naples 2Technical University of Munich 3Max Planck Institute for Intelligent Systems, Tubingen¨ Abstract A major challenge in DeepFake forgery detection is that state-of-the-art algorithms are mostly trained to detect a specific fake method. As a result, these approaches show poor generalization across different types of facial manip- ulations, e.g., from face swapping to facial reenactment. To this end, we introduce ID-Reveal, a new approach that learns temporal facial features, specific of how a person moves while talking, by means of metric learning coupled with an adversarial training strategy. The advantage is that Figure 1: ID-Reveal is an identity-aware DeepFake video detec- we do not need any training data of fakes, but only train on tion. Based on reference videos of a person, we estimate a tem- real videos. Moreover, we utilize high-level semantic fea- poral embedding which is used as a distance metric to detect fake tures, which enables robustness to widespread and disrup- videos. tive forms of post-processing. We perform a thorough exper- imental analysis on several publicly available benchmarks. Compared to state of the art, our method improves gener- basis. As a result, supervised detection, which requires ex- alization and is more robust to low-quality videos, that are tensive training data of a specific forgery method, cannot usually spread over social networks. In particular, we ob- immediately detect a newly-seen forgery type. tain an average improvement of more than 15% in terms of This mismatch and generalization issue has been ad- accuracy for facial reenactment on high compressed videos.
    [Show full text]
  • Cultural Leaders Annual Meeting of the New Champions 2019
    Cultural Leaders Annual Meeting of the New Champions 2019 This year at the Annual Meeting of the New Champions in Dalian, a select Introduction group of Cultural Leaders will lend their unique voices to broaden the dialogue and reflect on the complexity of some of the world’s most pressing issues. As innovators and pioneers in their respective fields, these artists will share their vision and values to shape a new era of globalization by taking part in multiple sessions and interactive experiences as part of the official programme. Cultural Leaders - Annual Meeting of the New Champions 2019 2 Cultural Leaders Jesse Appell, People’s Republic of China/USA Jesse Appell is a Beijing-based Xiangsheng performer and standup comedian, committed to using the power of humor as a means to bridge cultural divides between China and the Western world. His website LaughBeijing serves as a platform for his unique combination of Chinese and Western comedic styles as well as the proliferation of comedy as a means of intercultural communication. Speaking roles in the official programme: The Creativity Lab 1 July, 09.15 – 10.00 Betazone Promoting Responsible Leadership 1 July, 12.15 – 13.30 Studio The Art of Influence 1 July, 17.00 – 18.00 Salon Andrea Bandelli, Ireland Andrea Bandelli is the Executive Director of Science Gallery International. He has led and managed international projects linked with science, art, democracy and civic participation, and widely writes on public engagement with science. Bandelli is also a strong advocate for diversity and inclusion in STEM, particularly on LGBTI inclusion. He is a member of the World Economic Forum’s Expert Network.
    [Show full text]
  • View the Revised S2018 Advance Program
    PLAN YOUR EXPERIENCE ADVANCE PROGRAM The 45th International Conference & Exhibition on Computer Graphics and Interactive Techniques TABLE OF CONTENTS SCHEDULE AT A GLANCE ................................................... 3 CURATED CONTENT REASONS TO ATTEND ......................................................... 6 SIGGRAPH 2018 offers several events and sessions that are individually chosen by program chairs to CONFERENCE OVERVIEW ...................................................7 address specific topics in computer graphics and interactive techniques. CONFERENCE SCHEDULE ................................................ 10 Curated content is not selected through the regular APPY HOUR ..........................................................................19 channels of a comprehensive jury. ART GALLERY ......................................................................20 ART PAPERS........................................................................23 INTEREST AREAS SIGGRAPH brings together a wide variety of BUSINESS SYMPOSIUM ...................................................25 professionals who approach computer graphics and COMPUTER ANIMATION FESTIVAL: interactive techniques from different perspectives. ELECTONIC THEATER ........................................................26 Our programs and events align with five broad interest areas (listed below). Use these interest areas to help COMPUTER ANIMATION FESTIVAL: VR THEATER ........ 27 guide you through the content at SIGGRAPH 2018. COURSES .............................................................................28
    [Show full text]