Deep Learning for Deepfakes Creation and Detection: A Survey

Thanh Thi Nguyen, Quoc Viet Hung Nguyen, Cuong M. Nguyen, Dung Nguyen, Duc Thanh Nguyen, Saeid Nahavandi, Fellow, IEEE


arXiv:1909.11573v3 [cs.CV] 26 Apr 2021

Abstract—Deep learning has been successfully applied to solve various complex problems ranging from big data analytics to computer vision and human-level control. Advances in deep learning, however, have also been employed to create software that can threaten privacy, democracy and national security. One deep learning-powered application to emerge recently is the deepfake. Deepfake algorithms can create fake images and videos that humans cannot distinguish from authentic ones. The development of technologies that can automatically detect and assess the integrity of digital visual media is therefore indispensable. This paper presents a survey of the algorithms used to create deepfakes and, more importantly, the methods proposed in the literature to date to detect deepfakes. We present extensive discussions on challenges, research trends and directions related to deepfake technologies. By reviewing the background of deepfakes and state-of-the-art deepfake detection methods, this study provides a comprehensive overview of deepfake techniques and facilitates the development of new and more robust methods to deal with increasingly challenging deepfakes.

Impact Statement—This survey provides a timely overview of deepfake creation and detection methods and presents a broad discussion of challenges, potential trends, and future directions. We conduct the survey with a different perspective and taxonomy compared to existing survey papers on the same topic. Informative graphics are provided for guiding readers through the latest developments in deepfake research. The methods surveyed are comprehensive and will be valuable to the artificial intelligence community in tackling the current challenges of deepfakes.

Keywords: deepfakes, face manipulation, artificial intelligence, deep learning, autoencoders, GAN, forensics, survey.

T. T. Nguyen and D. T. Nguyen are with the School of Information Technology, Deakin University, Victoria, Australia. Q. V. H. Nguyen is with the School of Information and Communication Technology, Griffith University, Queensland, Australia. C. M. Nguyen is with LAMIH UMR CNRS 8201, Université Polytechnique Hauts-de-France, 59313 Valenciennes, France. D. Nguyen is with the Faculty of Information Technology, Monash University, Victoria, Australia. S. Nahavandi is with the Institute for Intelligent Systems Research and Innovation, Deakin University, Victoria, Australia. Corresponding e-mail: [email protected] (T. T. Nguyen).

I. INTRODUCTION

In a narrow definition, deepfakes (a portmanteau of "deep learning" and "fake") are created by techniques that superimpose face images of a target person onto a video of a source person to make a video of the target person doing or saying the things the source person does. This constitutes one category of deepfakes, namely face-swap. In a broader definition, deepfakes are artificial intelligence-synthesized content that can also fall into two other categories, i.e., lip-sync and puppet-master. Lip-sync deepfakes are videos modified to make the mouth movements consistent with an audio recording. Puppet-master deepfakes are videos of a target person (the puppet) who is animated following the facial expressions, eye and head movements of another person (the master) sitting in front of a camera [1].

While some deepfakes can be created by traditional visual effects or computer-graphics approaches, the common underlying mechanism of recent deepfake creation is deep learning models such as autoencoders and generative adversarial networks, which have been applied widely in the computer vision domain [2]–[4]. These models are used to examine the facial expressions and movements of a person and to synthesize facial images of another person making analogous expressions and movements [5]. Deepfake methods normally require a large amount of image and video data to train models to create photo-realistic images and videos. As public figures such as celebrities and politicians may have a large number of videos and images available online, they were the initial targets of deepfakes. Deepfakes have been used to swap the faces of celebrities or politicians onto bodies in porn images and videos. The first deepfake video emerged in 2017, in which the face of a celebrity was swapped onto the face of a porn actor. Deepfake methods threaten world security when they are employed to create videos of world leaders with fake speeches for falsification purposes [6]–[8]. Deepfakes can therefore be abused to cause political or religious tensions between countries, to fool the public and affect the results of election campaigns, or to create chaos in financial markets by spreading fake news [9]–[11]. They can even be used to generate fake satellite images of the Earth containing objects that do not really exist in order to confuse military analysts, e.g., creating a fake bridge across a river although there is no such bridge in reality; this can mislead troops who have been guided to cross the bridge in a battle [12], [13].

As the democratization of creating realistic digital humans has positive implications, there are also positive uses of deepfakes, such as their applications in visual effects, digital avatars, Snapchat filters, creating voices for those who have lost theirs, or updating episodes of movies without reshooting them [14]. However, the malicious uses of deepfakes largely dominate the positive ones. The development of advanced deep neural networks and the availability of large amounts of data have made forged images and videos almost indistinguishable to humans and even to sophisticated computer algorithms. The process of creating these manipulated images and videos is also much simpler today, as it needs as little as an identity photo or a short video of a target individual; less and less effort is required to produce stunningly convincing tampered footage. Recent advances can even create a deepfake from just a single still image [15]. Deepfakes can therefore threaten not only public figures but also ordinary people. For example, a voice deepfake was used to scam a CEO out of $243,000 [16]. The recent release of a software tool called DeepNude shows even more disturbing threats, as it can transform a person into non-consensual porn [17]. Likewise, the Chinese app Zao has gone viral lately, as less-skilled users can swap their faces onto the bodies of movie stars and insert themselves into well-known movies and TV clips [18]. These forms of falsification create a huge threat of violation of privacy and identity, and affect many aspects of human lives.

Finding the truth in the digital domain has therefore become increasingly critical. It is even more challenging when dealing with deepfakes, as they are mostly used to serve malicious purposes and almost anyone can create deepfakes these days using existing deepfake tools. Thus far, numerous methods have been proposed to detect deepfakes [19]–[23]. Most of them are based on deep learning, and thus a battle between malicious and positive uses of deep learning methods has been arising. To address the threat of face-swapping technology, the United States Defense Advanced Research Projects Agency (DARPA) initiated a research scheme in media forensics (named Media Forensics, or MediFor) to accelerate the development of fake digital visual media detection methods [24]. Recently, Facebook Inc., teaming up with Microsoft Corp. and the Partnership on AI coalition, launched the Deepfake Detection Challenge to catalyse more research and development in detecting and preventing deepfakes from being used to mislead viewers [25]. Data obtained from https://app.dimensions.ai at the end of 2020 show that the number of deepfake papers has increased significantly in recent years (Fig. 1). Although the obtained numbers of deepfake papers may be lower than the actual numbers, the research trend in this topic is obviously increasing.

Fig. 1. Number of papers related to deepfakes in the years from 2016 to 2020, obtained from https://app.dimensions.ai at the end of 2020 with the search keyword "deepfake" applied to the full text of scholarly papers. The numbers of such papers in 2018, 2019 and 2020 are 64, 368 and 1268, respectively.

This paper presents a survey of methods for creating as well as detecting deepfakes. Survey papers on this topic already exist [26]–[28]; we, however, carry out the survey with a different perspective and taxonomy. In Section II, we present the principles of deepfake algorithms and how deep learning has been used to enable such disruptive technologies. Section III reviews different methods for detecting deepfakes as well as their advantages and disadvantages. We discuss challenges, research trends and directions on deepfake detection and multimedia forensics problems in Section IV.

II. DEEPFAKE CREATION

Deepfakes have become popular due to the quality of the tampered videos and also the ease of use of their applications for a wide range of users with computer skills ranging from professional to novice. These applications are mostly developed based on deep learning techniques. Deep learning is well known for its capability of representing complex and high-dimensional data. One variant of deep networks with that capability is the deep autoencoder, which has been widely applied for dimensionality reduction and image compression [29]–[31]. The first attempt of deepfake creation was

TABLE I
SUMMARY OF NOTABLE DEEPFAKE TOOLS

Faceswap (https://github.com/deepfakes/faceswap)
- Uses two encoder-decoder pairs.
- Parameters of the encoder are shared.

Faceswap-GAN (https://github.com/shaoanlu/faceswap-GAN)
- Adversarial loss and perceptual loss (VGGFace) are added to an auto-encoder architecture.

Few-Shot Face Translation (https://github.com/shaoanlu/fewshot-face-translation-GAN)
- Uses a pre-trained face recognition model to extract latent embeddings for GAN processing.
- Incorporates semantic priors obtained by modules from FUNIT [42] and SPADE [43].
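The first entry in Table I describes the classic face-swap recipe: two encoder-decoder pairs, one trained on faces of each person, with the encoder parameters shared between the pairs, so that the swap amounts to exchanging decoders at inference time. The following is a minimal linear sketch of that idea in numpy; the dimensions, weights and names are purely illustrative stand-ins for trained networks, not taken from any real tool.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for trained weights (hypothetical sizes):
# one shared encoder and one person-specific decoder per identity.
D, H = 64, 8                             # flattened-face dim, latent dim
W_enc = 0.1 * rng.normal(size=(H, D))    # shared encoder
W_dec_a = 0.1 * rng.normal(size=(D, H))  # decoder trained on person A
W_dec_b = 0.1 * rng.normal(size=(D, H))  # decoder trained on person B

def encode(face):
    # Shared encoder: learns identity-agnostic features
    # (expression, pose, illumination) in the latent code.
    return W_enc @ face

def decode(latent, w_dec):
    # Person-specific decoder: renders the latent code with
    # that person's appearance.
    return w_dec @ latent

# Face swap at inference time: encode a face of person A with the
# shared encoder, then decode with person B's decoder, so the output
# shows B's face performing A's expression and pose.
face_a = rng.normal(size=D)
swapped = decode(encode(face_a), W_dec_b)
print(swapped.shape)  # (64,)
```

In real tools the encoder and decoders are deep convolutional networks trained jointly, each pair minimizing a reconstruction loss on its own person's faces; sharing the encoder is what forces the latent code to capture person-independent structure so that the decoder exchange produces a plausible swap.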