
Rochester Institute of Technology RIT Scholar Works

Theses

6-2020

Deepfakes Generation using LSTM based Generative Adversarial Networks

Akhil Santha [email protected]

Follow this and additional works at: https://scholarworks.rit.edu/theses

Recommended Citation: Santha, Akhil, "Deepfakes Generation using LSTM based Generative Adversarial Networks" (2020). Thesis. Rochester Institute of Technology. Accessed from

This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact [email protected].

Deepfakes Generation using LSTM based Generative Adversarial Networks

By Akhil Santha June 2020

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Engineering from Rochester Institute of Technology

Approved by:

______Dr. Raymond Ptucha, Assistant Professor Date Thesis Advisor, Department of Computer Engineering

______Dr. Sonia Lopez Alarcon, Associate Professor Date: Thesis Advisor, Department of Computer Engineering

______Dr. Clark Hochgraf, Associate Professor Date: Thesis Advisor, Department of

Deepfakes Generation using LSTM based Generative Adversarial Networks

Akhil Santha May 2020

KATE GLEASON COLLEGE OF ENGINEERING A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Engineering

Department of Computer Engineering

Acknowledgements

I am very grateful to Dr. Raymond Ptucha, my primary advisor, for his constant encouragement and guidance throughout my master's. The team meetings and the courses he taught helped me build new research and innovation skills. I would like to thank the deepfakes team and the MIL lab for constantly giving me input on my research work. I would also like to thank the MIL team for their support and for making the lab a friendly environment.

Abstract

Deep learning has been achieving promising results across a wide range of complex task domains. However, recent advancements in deep learning have also been employed to create software which poses threats to people's privacy and to national security. One among them is deepfakes, which creates fake images as well as videos that humans cannot detect as forgeries. Fake speeches of world leaders can even pose a threat to world peace. Apart from malicious usage, deepfakes can also be used for positive purposes, such as post-production dubbing in films or language translation. The latter case was recently used in an Indian election, where a politician's speeches were converted into many Indian dialects across the country. This work was traditionally done using computer graphics technology and 3D models, but with advances in deep learning and computer vision, in particular GANs, the earlier methods are being replaced by deep learning methods. This research focuses on using deep neural networks for generating manipulated faces in images and videos.

This master's thesis develops a novel architecture which can generate a full sequence of video frames given a source image and a target video. We were inspired by the work done by Wang et al. in vid2vid and few-shot vid2vid, where they learn to map source video domains to target domains. In our work, we propose a unified model using LSTM based GANs along with a motion module which uses a keypoint detector to generate dense motion. The generator network employs warping to combine the appearance extracted from the source image and the motion from the target video to generate realistic videos, and also to decouple occlusions. Training is done end-to-end and the keypoints are learnt in a self-supervised way. Evaluation is demonstrated on the recently introduced FaceForensics++ and VoxCeleb datasets.

The main contribution of our work is to develop novel neural networks for generating images/videos. Furthermore, our main motive is to generate datasets for deepfake detection challenges.

Table of Contents:

Signature Sheet……………………………………………………………………..…………….3

Abstract………………………………………………………..……………..…………...………5

List of Figures……………………………………………………………………….……………8

List of Tables…………………………………………………………….....……….………..…10

Acronyms………………………………………………………………………………………..11

Chapter 1………………………………………………………………..…………...….……….12

1.1 Introduction …………………………………………………………….…………..12

1.2 Deepfakes and its types ……………………………………………………..……...14

1.3 Contributions ………………………………………………….……………….…..16

Chapter 2 ………………………………………………………………………………....……..17

2.1 Convolutional Neural networks ……………………………………………...……17

2.2 Generative Adversarial Networks………………………………………...……….18

2.3 Long short-term memory ………………………………………………....……….19

2.4 Optical flow and Flownet ………………………………………………….………21

Chapter 3 …………………………………………………………………………….………….23

3.1 Faceswap ……………………………………………………....……………………23

3.2 Video-to-Video Synthesis …………………………...……………...……………...24

3.3 Few-shot Video-to-Video Synthesis …………………………………...…………..25

3.4 First Order Model for Image Animation …………………………………………27

Chapter 4………………………………………………………………………………..…...….30

4.1 FaceForensics++: Learning to Detect Manipulated Facial Images ……...……..30

4.1.1 FaceSwap…………………………………………..…………………..….30

4.1.2 DeepFakes ……………….……………………...…………....….………..31

4.1.3 Face2Face …………………………………………...….…...…………….31

4.1.4 NeuralTextures ………………………………………….….…………….32

4.1.5 Post processing of dataset ………………………………………..………33

4.2 Deep Learning Face Attributes in the Wild (CelebA) ……………….....………..34

4.3 VoxCeleb: Large-scale speaker verification in the wild …………………….……35

4.4 Preprocessing of datasets …………………………………………………....……..37

Chapter 5 ……………………………………………………………………………....………..39

5.1 Base Model ……………………………………………………………….…………39

5.2 LSTM based Generative Adversarial Networks …………………………………40

5.3 LSTM based GANs with Flow …………………………………….………………41

Chapter 6 ……………………………………………………………………………………….45

6.1 Image Reconstruction Loss …………………………………………...…………...45

6.2 Adversarial Loss …………………………………………………………...……….46

6.3 Equivariance Loss …………………………………………………………...…….46

Chapter 7 ………………………………………………………………………………………..47

7.1 Implementation details ………………………………………………….…...…….47

7.2 Results……………………………………………………...………………..………47

Chapter 8 ………………………………………………………………………………………..53

8.1 Conclusion……………..……………………………………………………...…….53

8.2 Future work….……………………………………………...………………………53

References ……………………………………………………………………………………....55

List of Figures:

Figure 1 An example of Convolutional Neural Network used for facial recognition………..…....13

Figure 2: An example of Faceswap……………………………………………………….…..…14

Figure 3: An example of facial reenactment……………………………………………….…….15

Figure 4: An example of Convolutional Neural Network……………………………….…..…...18

Figure 5: A typical example of Generative Adversarial Network………………………...... 19

Figure 6: A typical example of Recurrent Neural Networks……………………………...... 20

Figure 7: Long Short Term Memory Network……………………………………………….…..21

Figure 8: An example of Optical Flow network……………………………………………..…...22

Figure 9: Faceswap GAN network…………………………………………………………….....24

Figure 10: Network architecture of pix2pixHD……………………………………………….…25

Figure 11: Comparison between vid2vid and few-shot vid2vid……………………………..…...26

Figure 12: The network contains SPADE residual blocks with upsampling layers…….………..27

Figure 13: An overview of monkey net architecture………………………………………...... 28

Figure 14: Overview of first order model for image animation approach……………………...…29

Figure 15: An example of two categories of face manipulation methods…………….…...…30

Figure 16: Overview of Faceswap GAN………………………………….…………………32

Figure 17: Statistics of FaceForensics++ dataset……………………….……………………….33

Figure 18: Sample images from FaceForensics++ dataset…………………………….……34

Figure 19: Distribution of VoxCeleb dataset………………………………………...………36

Figure 20: Sample images from VoxCeleb dataset………………………………………….36

Figure 21: A typical example showing facial landmark points………………………..38

Figure 22: A sample example showing an image and its corresponding heatmap………….38

Figure 23: Our Base Model network……………………………………………………..…39

Figure 24: LSTM based Generator………………………………………………………….41

Figure 25: Our LSTM based generator with dense motion as conditional input…………....42

Figure 26: Motion module network…………………………………………………………42

Figure 27: Results of our Model on FaceForensics++ dataset……………..…………………48

Figure 28: Results of our Model on VoxCeleb dataset………………..……………...………49

Figure 29: Comparison of our results with vid2vid models…………..……………………...50

Figure 30: Comparison of our results with other models…………..……………………...…51

Figure 31: Comparison with one LSTM and Bidirectional LSTM models………....…52

List of Tables:

Table 1: Number of images per each method……………………………………………..34

Table 2: Statistics of CelebA dataset.……………………………………………………..35

Table 3: Statistics of VoxCeleb verification dataset…………………………………....…36

Table 4: Statistics of VoxCeleb identification dataset…………………………………….36

Acronyms:

CNN: Convolutional Neural Networks
GAN: Generative Adversarial Networks
LSTM: Long Short Term Memory unit
RNN: Recurrent Neural Networks
FF++: FaceForensics++
DFDC: Deepfake Detection Challenge

Chapter 1

Introduction

1.1 Introduction:

Deep learning along with computer vision has made significant advances in recent years.

It is now an integral part of everyone's life and is used almost everywhere, from mobile phones to the medical field. Within the next decade, deep learning tools and libraries will be standard components of many software toolkits. Recent advances in technology such as self-driving cars, facial recognition, object detection and handwriting recognition wouldn't be possible without deep learning. Deep learning methods aim at learning high-level features from the provided data and mapping the inputs to appropriate outputs without depending on human-crafted features. The methods or processes for extracting features may vary. Deep learning methods such as Convolutional Neural Networks (CNNs) and Long Short Term Memory modules (LSTMs) have emerged over the past decade. CNNs and LSTMs help in feature extraction and have empirically been shown to solve real-world problems such as image/video classification, object detection, segmentation and sentiment analysis.

Image classification and recognition are two significant tasks in deep learning.

Classification refers to assigning objects to different classes, whereas recognition refers to correctly recognizing an object in the image. Object detection is one of the recent advances which uses image processing, deep learning and computer vision techniques to detect instances of objects such as faces, cars and roads. Face detection and recognition are a substantial part of current deepfake operations. Face detection localizes the face within an image, and facial recognition

determines the identity of the found face. These operations can be highly complex because of the variabilities present across faces, such as pose, orientation and facial hair. Hence, it is an important area of research in the field of deep learning.

Face detection and recognition are additionally used in many real-world applications such

as in biometrics and mobile phones. There are many face detection methods such as feature based,

appearance based and template matching. Nowadays, deep learning methods such as OpenCV [46]

and Dlib [47] networks are amongst the most popular methods for accomplishing these tasks.

These networks are trained on millions of faces and hence achieved better results.

Deep learning uses CNNs to extract the information from the images and map them to

appropriate outputs. The size of the filters is called the receptive field and the filter

taps are parameters that are learned via a process called backpropagation. CNNs can have millions

of parameters and are interposed with nonlinear activation functions and normalization techniques.

Face detection and recognition techniques use CNNs as a backbone. An example of a typical CNN

network used for facial recognition is shown in Figure 1.

Figure 1: A typical example of Convolutional Neural Network used for facial recognition [54].

1.2 Deepfakes and types of fakes:

The advances of deep learning have also been helpful in creating software that can cause public threats. One of the recent threats among them is deepfakes. Deepfakes can create fake images and videos of politicians and actors/actresses that even humans have trouble distinguishing from real ones. The first deepfake video was created by a Redditor under the username "deepfakes", hence the name [8]. There are several methods for creating deepfakes. Initially, computer graphics techniques were used to create deepfakes, but recently deep learning methods have taken over.

The first deepfake video was created using GANs [11] with an encoder-decoder network as the generator and a CNN as the discriminator. Later on, many methods emerged for generating manipulated images and videos. Face manipulation mainly has two categories, namely faceswap and facial expression reenactment. Faceswap refers to swapping a source face with the target one, whereas facial reenactment aims at transferring a source image's facial pose and expression to the target person's face. An example of Faceswap is shown in Figure 2 and an example of facial reenactment is shown in Figure 3.

Figure 2: An example of Faceswap [55].


Figure 3: An example of facial reenactment where the target actor is made to mimic the source actor facial expression [56].

Figures 2 and 3 show an example of both types of fakes. Figure 2 shows the example of

Faceswap. The left one is the source image, the middle one is the target image. The face in the target image is swapped with the face in the source image to create a fake, shown at right in Figure

2. Figure 3 shows the example of facial reenactment. Given a target image and a source video, the facial expressions of the source video are transferred to the target face to generate a fake video as shown in the right side of Figure 3.

Hence, to address the issue of face swapping and facial reenactment, many deep learning methods have emerged to detect the presence of these manipulations. Recently, Facebook teamed up with partner organizations [48] and launched the Deepfake Detection Challenge to improve research in detecting and preventing deepfakes, citing concern that these videos might pose a public threat by spreading misinformation or false speeches.

Deep neural networks are data hungry, and for deepfake detection we need a huge amount of data. There are many publicly available datasets such as FaceForensics++ [5] and the Deep Fake

Detection Challenge [2] dataset released by Facebook. But all those datasets have fakes created

using DeepFakes [7], Faceswap [8], Face2Face [41] and NeuralTextures [42] methods. Faceswap and DeepFakes methods belong to the same category where the face in the source image is swapped with a target face. Faceswap is a graphic based approach while DeepFakes is based on GANs.

Face2Face and NeuralTextures methods transfer the facial expressions of the target image to the source image. The Face2Face method is based on transferring the source expression to the target video using blendshape coefficients, whereas the NeuralTextures method uses GANs to learn the texture of the target person. The above approaches don't have generalization capability, i.e. they cannot generalize to unseen images/videos. This thesis concentrates on generating deepfakes using image animation techniques given only a single source image and a target video. Our model tries to transfer the facial expressions of a source person to the target person's face. The main objective of this thesis is to generate datasets required for building robust detection models.

1.3 Contributions:

Main contributions of this thesis work are:

1. A novel architecture based on LSTM GANs which trains on video sequences and learns to

generate realistic videos given a source image and a target video.

2. Create datasets required for deepfake detection tasks.

Chapter 2

Background

2.1 Convolutional Neural Networks:

Convolutional Neural Networks (CNNs) are deep learning algorithms which take image inputs and learn features using different operations for appropriate output learning tasks. A typical

CNN consists of the following layers:

● Convolution layer

● Pooling layer

● Activation layer

● Fully connected layer

A convolution layer has multiple filters which slide across the image and apply

convolution on each region of the image. Each filter learns different features from the image such

as the edge, color and shape information. The size of the filter is normally called the receptive field

and the filter values are called parameters and are learned via backpropagation.

A pooling layer is responsible for reducing the spatial size of the feature maps outputted

from the convolution layer output. Pooling layers can be useful for extracting the dominant features

from the convolved output. There are two popular types of pooling layers namely max pooling and

average pooling. Max pooling outputs the maximum pixel value from the portion of the image under the window, whereas average pooling outputs the average of all the values under the window.

An activation layer is generally introduced after a convolution layer to add non-linearity to the output. Commonly used activation functions include tanh, sigmoid and ReLU [49]. The activation functions help the network learn high-level representations of an input image.

A fully connected layer is the final layer in many tasks. It transforms the high-dimensional representation into an n-dimensional output, such as class scores.

By optimizing the cross-entropy loss between the ground truth and the predicted values, the filter values and the fully connected layer weights are updated during training. A typical CNN is shown in Figure 4.

Figure 4: An example of a typical Convolutional Neural Network [57].
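To make the layer-by-layer description above concrete, here is a minimal PyTorch sketch of such a CNN; the layer sizes, the 64x64 input and the 10-class output are illustrative assumptions rather than any network used later in this thesis.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Minimal CNN: convolution -> activation -> pooling -> fully connected."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolution layer (3x3 receptive field)
            nn.ReLU(),                                     # activation layer
            nn.MaxPool2d(2),                               # pooling layer halves the spatial size
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # fully connected layer

    def forward(self, x):
        x = self.features(x)          # extract feature maps
        x = x.flatten(start_dim=1)    # flatten for the fully connected layer
        return self.classifier(x)     # n-dimensional output (class scores)

# Cross-entropy loss between prediction and ground truth updates the filters via backpropagation.
model = SimpleCNN()
logits = model(torch.randn(1, 3, 64, 64))          # one 64x64 RGB image
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0]))
loss.backward()
```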

2.2 Generative Adversarial Networks:

GANs are a class of deep neural networks introduced by Goodfellow et al. [38]. The network is similar to CNNs except that it consists of two sub-networks: a generator, which generates new examples, and a discriminator, which classifies input data as either real or fake. Since humans do not need to manually label training images, GANs belong to the class of unsupervised learning algorithms.

GANs are generative models which generate new data instances that resemble the input data. For example, GANs can generate realistic faces even though the faces don't belong to any real person. The generator and discriminator are trained together in an adversarial way until the discriminator is fooled, which means the generator is capable of generating realistic-looking data.

The generator takes random noise as input and generates an image, which is then fed to the discriminator. The discriminator takes the real and the fake (generated) images as input and outputs a probability between 0 and 1, where 0 denotes fake and 1 denotes real. So, generative models try to learn the distribution of the data, whereas discriminative models learn the boundary between real-looking and fake images. A typical GAN network is shown in Figure 5.

Figure 5: A typical example of Generative Adversarial Network [58].
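As a small illustration of the adversarial training described above, the sketch below alternates discriminator and generator updates; the tiny fully connected generator and discriminator, the 64-dimensional noise and the random "real" batch are placeholder assumptions, not the GANs used in this work.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())           # noise -> fake image
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())  # image -> real probability

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.rand(32, 784)                 # stand-in for a batch of real images
for step in range(100):
    # Discriminator: push D(real) toward 1 and D(G(z)) toward 0.
    z = torch.randn(32, 64)
    fake = G(z).detach()
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to fool D so that D(G(z)) moves toward 1.
    z = torch.randn(32, 64)
    g_loss = bce(D(G(z)), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```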

2.3 Long short-term memory (LSTM):

Recurrent neural networks (RNNs) are a class of artificial neural networks that differ from feed-forward neural networks. RNNs can take one or more inputs and produce outputs which are influenced not just by the present input but also by prior inputs/outputs. So, even for the same input, the output might differ due to differences in the prior inputs.

LSTMs are a type of RNN capable of remembering long sequences. Standard RNNs fail to remember when the time lags are more than 5-10 steps, while LSTMs have been shown to be effective for sequences longer than 100 steps. The two main drawbacks of RNNs are vanishing gradients and exploding gradients. LSTMs overcome these drawbacks through their specially designed architecture. The LSTM layer consists of recurrently connected memory blocks. Each memory block consists of a recurrently connected memory cell and three gates, namely an input gate, an output gate and a forget gate, which provide write, read and reset operations for the cell. The rest of the network can interact with the cells only through these gates. Applications of LSTMs include speech recognition, video captioning and machine translation. A typical RNN and an LSTM are shown in Figures 6 and 7.

Figure 6: A typical example of Recurrent Neural Networks [59].


Figure 7: An example of Long Short Term Memory Network with gates that can regulate the flow of information [60].
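A brief sketch of how an LSTM consumes a sequence while carrying hidden and cell states (regulated by the gates described above) across time steps, using PyTorch's nn.LSTM; the batch size, sequence length and feature sizes are arbitrary illustrative values.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64, num_layers=1, batch_first=True)

x = torch.randn(4, 10, 32)      # batch of 4 sequences, 10 time steps, 32 features each
h0 = torch.zeros(1, 4, 64)      # initial hidden state
c0 = torch.zeros(1, 4, 64)      # initial cell state (read/written through the gates)

outputs, (hn, cn) = lstm(x, (h0, c0))
print(outputs.shape)            # (4, 10, 64): one output per time step
print(hn.shape, cn.shape)       # (1, 4, 64): final hidden and cell states
```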

2.4 Optical flow and Flownet:

Optical flow is a method of calculating the motion of image intensities. Basically, it refers to the pattern of motion of image objects between consecutive frames. Videos have both temporal and spatial structure, whereas images only have spatial structure; hence, optical flow helps in achieving temporal understanding. CNNs have been used successfully in many applications, including optical flow estimation. Optical flow improves performance in video classification tasks and helps in video generation using GANs by providing motion information, so that they can generate temporally coherent outputs. An example of an optical flow network, FlowNet, proposed by Fischer et al. [43], is shown in Figure 8.


Figure 8: An example of optical flow network [43].
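For intuition, the following OpenCV sketch estimates dense optical flow between two consecutive frames using the classical Farnebäck method rather than FlowNet; the video path is a placeholder assumption.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("example_video.mp4")   # placeholder path
ok1, frame1 = cap.read()
ok2, frame2 = cap.read()
cap.release()

gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

# Dense flow: one (dx, dy) displacement vector per pixel between consecutive frames.
flow = cv2.calcOpticalFlowFarneback(gray1, gray2, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
print("mean motion magnitude:", float(np.mean(magnitude)))
```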

Chapter 3

Related work

3.1 Faceswap:

Faceswap is a technique that superimposes the face of a target person onto a source person to create a video of the target person doing or saying things as similar as possible to the source person.

These fakes have become more popular due to their quality and were mostly developed using deep learning techniques such as GANs. The first deepfake video was developed by a user with the username "deepfakes", hence the name [8]. Several creation and detection methods have been proposed, which are discussed by Nguyen et al. [1]. To create deepfakes, two encoder-decoder pairs are needed, where each pair is trained separately, sharing the same encoder but using different decoder networks. At test time, an image from class A is encoded with the common encoder and decoded with the decoder of class B to generate a fake. Later on, many Faceswap GANs emerged that use additional inputs such as face binary masks and facial landmark heatmaps. A typical Faceswap GAN is shown in Figure 9.


Figure 9: Deepfake creation model using encoder-decoder networks [1]. During the training process (top), two networks use the same encoder but different decoders. During the testing phase (bottom), face A is encoded with common encoder and decoded with decoder B to generate a deepfake.
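A minimal sketch of the shared-encoder, per-identity-decoder idea shown in Figure 9; the layer sizes are illustrative assumptions and not the architecture of any particular faceswap implementation.

```python
import torch
import torch.nn as nn

def make_encoder():
    return nn.Sequential(
        nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
        nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
    )

def make_decoder():
    return nn.Sequential(
        nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
        nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid()  # 32 -> 64
    )

encoder = make_encoder()      # shared between both identities
decoder_a = make_decoder()    # reconstructs faces of identity A
decoder_b = make_decoder()    # reconstructs faces of identity B

# Training: each decoder learns to reconstruct its own identity from the shared latent code.
face_a = torch.rand(1, 3, 64, 64)
recon_a = decoder_a(encoder(face_a))
loss_a = nn.functional.l1_loss(recon_a, face_a)

# Testing (the swap): encode a face of A, but decode it with B's decoder.
with torch.no_grad():
    fake_b = decoder_b(encoder(face_a))
```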

3.2 Video-to-Video Synthesis:

Wang et al. [27], introduced a video-to-video synthesis approach, where the goal is to learn a mapping between the input source video to the output realistic video using an adversarial learning framework. Their model is an extension of their previous work [37] image-to-image translation where they use local and global generators to generate high resolution images. In the vid2vid network [27], they use pix2pixHD generator network [37]. They decompose the generator into two sub-networks, a global generator and a local generator for generating high resolution realistic videos. They first train the global generator and then train the local generator and finally fine tune all networks. They use this strategy to aggregate both global and local information. They also use multiscale discriminators so as to differentiate between high-resolution real and synthesized images. They use three discriminators with different receptive fields. Wang et al. [27] used two types of discriminators, an image and video discriminator in vid2vid network. The purpose of the Department of Computer Engineering

24 image discriminator is to ensure that the generated frame resembles a ground truth frame whereas the video discriminator ensures the generated frames resembles the ground truth frames. But the limitation of this network is that we cannot specify the target images and it is data hungry. Also, it doesn’t have generalization capability. The backbone network of vid2vid is the pix2pixHD [37] network as shown in Figure 10.

Figure 10: Network architecture of pix2pixHD [37], the backbone network of vid2vid [27]. The global generator G1 is first trained on lower-resolution images, and then the local enhancer G2 is appended to G1 and the two are trained jointly on high-resolution images.

3.3 Few-shot Video-to-Video Synthesis:

The authors of vid2vid synthesis proposed a few-shot vid2vid framework [29], to address the limitations of vid2vid [27]. First, the model is data hungry. Numerous target images are required for training. Second, the vid2vid network has limited generalization capability as it struggles to generalize to unforeseen images. Figure 11 shows the difference between vid2vid and few-shot vid2vid. The few-shot vid2vid network takes a few target images as input along with the input video while the vid2vid network takes only one input. The few-shot vid2vid network uses the few target images to dynamically configure the vid2vid mechanism via a weight generation mechanism.


Figure 11: Comparison between vid2vid and few-shot vid2vid [29]. In addition to the input video, few-shot vid2vid (right) takes a few target images as input to dynamically configure the vid2vid mechanism (left) via a weight generation mechanism.

Their model achieves few-shot generalization capability via the adaptive SPADE network proposed by Park et al. [31], shown in Figure 12. Vid2vid synthesis aims at learning a mapping that converts input videos into realistic output videos, whereas few-shot vid2vid can additionally generate videos of domains it was not explicitly trained on. Unlike vid2vid, it takes two inputs for generating a video. In addition to the input video, it takes a second input consisting of a few images of the target domain made available at test time. These examples are used dynamically by the weight generation mechanism. They also use warping, as it helps when the example and target images are similar in most regions. Both vid2vid and few-shot vid2vid are trained on the FaceForensics++ dataset [5].


Figure 12: The network contains SPADE residual blocks with upsampling layers [31].

3.4 First order model for Image Animation:

Siarohin et al. [33] proposed a novel architecture that decouples appearance and motion information via self-supervised training. Their method is based on image animation, which refers to generating a video sequence from a source image given a target driving video. The motivation of their work is to avoid relying on pretrained models for keypoint estimation, such as facial landmark detectors. Monkey-Net [32] addresses this limitation by using a motion estimation module that detects keypoints with a self-supervised training method. Figure 13 shows the network architecture of Monkey-Net, which proposed a framework for motion-driven image animation that generates videos by combining the motion of a driving video with the appearance of a source image. The model has three main networks: a keypoint detector network for extracting object keypoints, a dense motion network for generating dense heatmaps, and a motion transfer network for combining the appearance of the source image with the motion of the driving video. The major weakness of Monkey-Net is poor generation quality when pose variations are large. In the first order model paper, the authors use the same motion estimation approach but add a local affine transformation to overcome the limitations of Monkey-Net. The first order model also takes a source image and a frame of a driving video as input.


Figure 13: An overview of the Monkey-Net architecture [32]. It is composed of three networks: a motion-specific keypoint detector (bottom-left), an image generator (top) and a dense motion network (bottom-right).

The keypoint detector detects sparse keypoints based on the dense motion of faces from

the driving frame to the source image. The generator takes the source image as input and uses the

dense motion and the occlusion map as conditional input to generate the target image according to

driving frame expressions. They train their network end-to-end with a reconstruction loss and

equivariance loss. The equivariance constraint is added for detecting sparse keypoints accurately.

Their network is trained and evaluated on the VoxCeleb dataset [34]. The loss functions are

shown below:

$L_{rec} = \sum \lVert G(x) - gt \rVert$   (1)

where $G(x)$ is the generated image and $gt$ is the ground truth image.

$L_{eqv} = \sum_{k=1}^{K} \lVert g(x_k^1, y_k^1) - (x_k, y_k) \rVert^2$   (2)

where $g(x_k^1, y_k^1)$ is the coordinate transformation by thin plate spline deformation [62], $(x_k, y_k)$ are the landmarks, and $K$ is the total number of landmarks.

Figure 14: Overview of first order model for image animation approach [33].

Chapter 4

Datasets

4.1 FaceForensics++: Learning to Detect Manipulated Facial Images

The recent progress in fake image and video generation raises concerns in society by

creating manipulated videos of politicians and actors/actresses. Hence, if the trend continues,

it may become difficult to trust digital content, as spreading false information and fake

news may become commonplace. As we know, faces play an important role in conveying a

message. Hence, deepfakes are becoming a significant concern in current society.

As discussed in Section 1.2, face manipulation falls into two categories: facial expression manipulation and face identity manipulation. The FaceForensics++ dataset [5] uses different

face manipulation methods such as Face2Face, Deepfakes, Faceswap and NeuralTextures. An

example showing different types of methods is shown in Figure 15.

Figure 15: An example of two categories of face manipulation methods [5]. Deepfakes and Faceswap methods belong to the identity swap category and Face2Face and NeuralTextures methods belong to the facial expression manipulation category.

Different types of methods used in creating FaceForensics++ dataset [5] are:

4.1.1 Faceswap:

It is a graphics-based approach which transfers the face of the source image to the target video. Facial landmarks are extracted from the source image, a 3D template model is fitted using blendshapes, and the model is projected onto the target image. Image blending and color correction are applied to blend the morphed face onto the target frame and create a realistic-looking fake image. This process is repeated for every frame to generate a video.

4.1.2 DeepFakes:

The first deepfake video was created by a Redditor using an encoder-decoder network [8]. Implementations are available in the faceswap GitHub repository [8] and FakeApp [10]. The method is based on two encoder-decoder networks which share the same encoder but use different decoders during the training phase. In the testing phase, a face of class A is passed through the shared encoder and the decoder of class B to create a fake, as shown in Figure 16. The FaceForensics++ [5] paper uses the faceswap GitHub repository for creating fake videos.


Figure 16: Overview of the Faceswap GAN [8]. During the training phase, the two networks use the same encoder but different decoders. During the testing phase, face A is passed through the shared encoder and decoder B to generate a fake.

4.1.3 Face2Face:

The Face2Face [50] method belongs to the facial reenactment category, which aims at transferring facial expressions of the source image to the target image. In this approach, a 3D model with blendshape coefficients is used to transfer facial expressions from one image to another. The first frames are used to build a temporary face identity, and the keypoints of the remaining frames are used to create the fake expressions.

4.1.4 NeuralTextures:

This method also belongs to the facial reenactment category and is based on the neural textures rendering approach of [51]. It uses GANs to generate fakes with a reconstruction loss and an adversarial loss.

4.1.5 Post processing of videos:

The videos are post-processed by compressing them with the H.264 codec [61], as is commonly done by social networks. A quantization rate of 23 is used to generate high-quality videos and a rate of 40 to generate low-quality videos.
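A comparable compression step could be reproduced by calling FFmpeg's H.264 encoder with a constant rate factor, roughly corresponding to the quantization rates quoted above; this sketch assumes FFmpeg is installed, and the file names are placeholders.

```python
import subprocess

def compress_h264(src, dst, crf=23):
    """Re-encode a video with libx264; crf=23 ~ high quality, crf=40 ~ low quality."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-crf", str(crf), dst],
        check=True,
    )

compress_h264("raw_video.mp4", "video_c23.mp4", crf=23)   # high-quality version
compress_h264("raw_video.mp4", "video_c40.mp4", crf=40)   # low-quality version
```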

Figure 17: Statistics of FaceForensics++ dataset [5]. a) Gender distribution b) Resolution of images c) number of videos with bounding box height on x-axis.

4.1.6 Statistics of dataset:

The FaceForensics++ dataset [5] consists of 1000 original videos and their manipulated counterparts created using various deep learning face manipulation methods such as Faceswap, Face2Face, Deepfakes and NeuralTextures. They also provide binary masks of the faces to support image classification and segmentation tasks. Figure 18 shows sample images from the FaceForensics++ dataset. Table 1 shows the total number of images for each face manipulation method.

Figure 18: Sample images from the FaceForensics++ dataset [5].

Methods Train Validation Test

Real 366,847 68,511 73,770

DeepFakes 366,835 68,506 73,768

Face2Face 366,843 68,511 73,770

Faceswap 291,434 54,618 59,640

NeuralTextures 291,834 54,630 59,672

Table 1: Number of images per each method.

4.2 Deep Learning Face Attributes in the Wild (CelebA):

CelebA [4] is a large-scale face dataset consisting of about 200k celebrity images of 10,177 different identities. The images cover a wide range of poses and have large diversity. The dataset is designed for various tasks such as face recognition, face detection and landmark localization. Many research works use this dataset for generating manipulated images. Since the CelebA dataset has no videos, its celebrity faces are used only as source images when creating datasets.


Name Number

Identities 10,177

Total Images 202,599

Table 2: Statistics of CelebA dataset.

4.3 VoxCeleb: Large-scale speaker verification in the wild:

Nagrani et al. [34] introduced the VoxCeleb: Large-scale speaker verification in the wild dataset primarily for speaker identification and verification. The work has two goals, the first goal is to propose a fully automated pipeline for creating a large-scale speaker identification dataset by using speaker identification and face verification methods, while the second goal is to research different architectures and techniques to train CNNs on spectrograms and compare the results with others.

The VoxCeleb dataset contains more than 100,000 videos of more than 1,200 celebrities extracted from YouTube and is roughly gender balanced, with 55% male celebrities. There are no overlapping identities in the dataset, and the celebrities span different ethnicities, professions and ages. The dataset is primarily used for speaker identification and verification, but we use it mainly for creating fake videos to build datasets. The distribution of the dataset is shown in Figure 19. Tables 3 and 4 show the statistics of the verification and identification datasets, respectively. The dataset is publicly available and can be downloaded from [35]. Sample images from the VoxCeleb dataset are shown in Figure 20.


Figure 19: Distribution of the VoxCeleb dataset [34]. The left figure shows video lengths in seconds, the middle figure shows the gender distribution, and the right figure shows the distribution by speaker nationality.

Figure 20: Sample images from VoxCeleb dataset [34].

dev test

No. of speakers 1,211 40

No. of videos 21,819 677

No. of utterances 148,642 4,874

Table 3: Statistics of VoxCeleb verification dataset.

dev test

No. of speakers 1,251 1,251

No. of videos 21,245 1,251

No. of utterances 145,265 8,251

Table 4: Statistics of VoxCeleb identification dataset.

4.4 Preprocessing of dataset:

Detecting facial keypoints is a shape prediction problem. The input to the shape predictor is an image of a face, and it tries to localize facial keypoints. Generally, it is a two-step iterative process: step 1 localizes the facial feature points, step 2 constrains them to plausible locations, and the process repeats. There are many methods for face detection, such as HAAR cascades, the OpenCV deep learning method, the HOG method and the Dlib deep learning method, which are explained in [44]. For detecting landmarks, Dlib and OpenCV are the most widely used methods, as their results are better than others. There are a variety of landmark detection methods, but we use Dlib as it predicts 68 landmark points covering parts such as the mouth, eyebrows, nose, eyes and jaw.

The Dlib landmark method is an implementation of the "One Millisecond Face Alignment with an Ensemble of Regression Trees" paper by Kazemi and Sullivan [45]. It was trained on more than one million images with manually labelled x-y points. The pretrained model is publicly available for both academic and commercial use under the Dlib license. The pretrained landmark model estimates 68 x-y coordinates that map to parts of the face. The indexes of the 68 landmark points are shown in Figure 21. We use these points and generate heatmaps by joining the landmark points. The heatmap is a binary image depicting the facial contours. Apart from joining the landmarks, we also draw the upper part of the face using symmetry. An example of a heatmap generated from the landmarks is shown in Figure 22.


Figure 21: A typical example showing Dlib facial landmark points [44].

Figure 22: A sample example showing an image and its corresponding heatmap.
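The landmark-to-heatmap preprocessing described above could look roughly like the sketch below, using Dlib's 68-point shape predictor and OpenCV to draw the contours; the predictor file, the image path and the exact contour groups drawn are assumptions for illustration (the symmetric upper-face contour mentioned above is omitted).

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # placeholder path

image = cv2.imread("face.jpg")                      # placeholder path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
face = detector(gray)[0]                            # assume one detected face
shape = predictor(gray, face)
points = np.array([[shape.part(i).x, shape.part(i).y] for i in range(68)], dtype=np.int32)

# Binary heatmap: join the landmark points of each facial part with line segments.
heatmap = np.zeros(gray.shape, dtype=np.uint8)
parts = {"jaw": points[0:17], "right_brow": points[17:22], "left_brow": points[22:27],
         "nose": points[27:36], "right_eye": points[36:42], "left_eye": points[42:48],
         "mouth": points[48:68]}
for name, pts in parts.items():
    closed = name in ("right_eye", "left_eye", "mouth")
    cv2.polylines(heatmap, [pts.reshape(-1, 1, 2)], isClosed=closed, color=255, thickness=1)
```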

Chapter 5

Methodology

5.1 Base model

Figure 23 shows our baseline model. The network is a simple encoder-decoder network.

The input to the network is an image concatenated channel-wise with its landmarks and the target landmarks, which is fed to an encoder to generate a feature map. The encoder consists of convolutional layers for downsampling. The output of the encoder is then fed into a decoder which consists of upsampling layers. The generated image is fed to the discriminator as input along with the ground truth image. The discriminator tries to discriminate between the real and the fake image. The motivation of the network is to reconstruct the source image with the target's facial expressions. So, the input to the network is a source image, and the generated image is the source image with the target image's facial expressions.

Figure 23: Our baseline model. The input to the generator is the source image, its landmarks and the target frame landmarks. The generated image and the ground truth image are used by the discriminator to differentiate between the real and the fake image.
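A hedged sketch of this baseline generator follows: the source image is concatenated channel-wise with its landmark heatmap and the target landmark heatmap and passed through an encoder-decoder; the channel counts and layer depths are illustrative assumptions, not our exact architecture.

```python
import torch
import torch.nn as nn

class BaselineGenerator(nn.Module):
    """Encoder-decoder conditioned on the source image plus source/target landmark heatmaps."""
    def __init__(self):
        super().__init__()
        # 3 image channels + 1 source heatmap + 1 target heatmap = 5 input channels
        self.encoder = nn.Sequential(
            nn.Conv2d(5, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, source_img, source_heatmap, target_heatmap):
        x = torch.cat([source_img, source_heatmap, target_heatmap], dim=1)  # channel-wise concat
        return self.decoder(self.encoder(x))

gen = BaselineGenerator()
fake = gen(torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256), torch.rand(1, 1, 256, 256))
print(fake.shape)   # (1, 3, 256, 256): source identity with the target expression (once trained)
```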


5.2 LSTM based Generative Adversarial Networks:

Figure 24 shows the LSTM based GAN architecture. The model is based on three networks: a generator, a discriminator and an LSTM network. The input to the generator is a source image. The generator network is divided into two parts, an encoder and the decoder.

The encoder network has convolutional downsampling layers. The output of the encoder is fed to a single-layer LSTM network, and the output of the LSTM block is the input to the decoder. The decoder network consists of convolutional upsampling layers. The output of the network is a facial image with the target person's facial expressions. Since the input to the network is a sequence of video frames, we use a video discriminator, which differentiates between real and fake videos. By using the LSTM network, videos can be synthesized sequentially frame by frame. The loss functions used for this model are the reconstruction loss between the generated and ground truth images, both image and video adversarial losses, and a pairwise feature matching loss. The feature matching loss provides training stability and produces realistic videos. This architecture is inspired by CycleGAN [51] and StarGAN [53], whose work is based on translating target domain attributes to the source image. We also use the same discriminator, a patch discriminator, which has been shown to produce better results [51].


Figure 24: Long Short Term Memory based Generative Adversarial Network. The input to the network is the source image concatenated with its landmarks and the target frame landmarks at the respective time steps. The input to the discriminator is the generated frame at each time step and its ground truth.
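The frame-by-frame recurrence described above can be sketched as follows: each time step's conditioning input is encoded, passed through a fully connected bottleneck and a single-layer LSTM, and decoded back to an image. The layer sizes, the 64x64 resolution and the bottleneck dimension are illustrative assumptions rather than our exact network.

```python
import torch
import torch.nn as nn

class LSTMGenerator(nn.Module):
    """Per-frame encoder, an LSTM over time steps, and a decoder shared across time."""
    def __init__(self, latent=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(5, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
        )
        self.to_vec = nn.Linear(64 * 16 * 16, latent)                # fully connected bottleneck
        self.lstm = nn.LSTM(latent, latent, batch_first=True)
        self.from_vec = nn.Linear(latent, 64 * 16 * 16)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, cond):                              # (B, T, 5, 64, 64): image + heatmaps per step
        b, t = cond.shape[:2]
        feats = self.encoder(cond.flatten(0, 1))          # (B*T, 64, 16, 16)
        seq = self.to_vec(feats.flatten(1)).view(b, t, -1)
        out, _ = self.lstm(seq)                           # recurrence carries temporal context
        maps = self.from_vec(out.reshape(b * t, -1)).view(b * t, 64, 16, 16)
        return self.decoder(maps).view(b, t, 3, 64, 64)   # video synthesized frame by frame

gen = LSTMGenerator()
video = gen(torch.rand(1, 4, 5, 64, 64))                  # 4 conditioning time steps
print(video.shape)                                        # (1, 4, 3, 64, 64)
```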

5.3 LSTM based GANs with Flow:

Figures 25 and 26 show our final network, which contains several elements: a generator, a discriminator, an LSTM block and a dense motion network. The input to our network is the source image concatenated channel-wise with its landmarks and the target landmarks. The dense motion network takes the source image and the target frame as inputs and outputs a dense motion map of the target image with respect to the source image, along with an occlusion map.


Figure 25: Our Long Short Term Memory based generator with dense motion as conditional input. The output of the encoder is warped with the dense motion field to combine the appearance of the source image with the motion of the driving frame. The output of the encoder and the output of the LSTM are element-wise added and fed to the decoder.

Figure 26: Motion module network. The input to the network is the source image and the target frame. The first step generates keypoints and applies local affine transformations. The dense motion network then takes the resulting representation as input and generates the dense motion of the driving video with respect to the source image.

The dense motion and occlusion maps are given as conditional inputs to the generator.

The output of the encoder is then passed to a single-layer LSTM. The output of the LSTM block is element-wise added to the output of the encoder and fed to the decoder. The output of the decoder is a facial image with the target facial expressions. The generated image and the ground truth image are fed to the discriminator, which tries to differentiate between the real and the fake videos.

The dense motion network does two tasks: keypoint extraction and the local affine transformation. The dense motion network has two encoder-decoder networks. The first encoder-decoder network extracts the sparse key points from both the source and the target image and applies local affine transformations on the output points. The output of the local affine transformations is then fed into another encoder-decoder network to output the dense motion and the occlusion map. The aim of a dense motion network is to estimate dense motion from the target frame to the source image by mapping each pixel in the driving frame to the source image.

In the first task, both the source image and the target frame are fed into an encoder-decoder network to learn the keypoint locations in a self-supervised way. It extracts keypoints separately for the source image and the driving frame, resulting in a compact motion representation. We were inspired by the Monkey-Net keypoint detector [32], which extracts keypoints in a self-supervised fashion. The limitation of the Monkey-Net keypoint detector is that it doesn't capture accurate motion when the pose variation is large. Hence, we also perform local affine transformations, inspired by the first order model for image animation [33]. As a final step, we apply a Taylor expansion to the generated representations to produce a feature map.

In the second step, we feed the output of the first network to another encoder-decoder network to estimate a dense motion field of the driving frame with respect to the source image.

We also generate an occlusion map which is given as a conditional input to the generator. The

feature map obtained from the generator is warped with the dense motion so that it combines the appearance of the source image and the motion of the driving frame.

Our generation module is an encoder-decoder network with an LSTM block. The input to the network is the source image concatenated with its landmarks and the target landmarks. We also employ warping at the end of the encoder network: the encoded source image is warped with the dense motion field obtained from the motion network and multiplied by the occlusion map generated by the motion network. The output from a final fully connected layer is passed to an LSTM block. The output of the LSTM block and the output of the encoder are element-wise added and fed to the decoder. Our generator network has a convolution layer, two down-convolution layers followed by two bottleneck layers and two more down-convolution layers, an LSTM block, and four up-convolution layers. The generator also has skip connections between the last two layers of the encoder and the first two layers of the decoder. The output of the decoder is a facial image with the target frame's expressions. Our discriminator network takes the generated image and the ground truth frame as inputs. We employ a patch discriminator similar to the vid2vid and few-shot vid2vid approaches, using two discriminators: a frame discriminator and a video discriminator for temporal smoothing.

As discussed above, our network has two main modules: a dense motion module and a generation module with an LSTM network. Our work is inspired from the first order model

[33], which uses a dense motion module and a generation module. Our dense motion network is similar to the one used by the first order model. We also use an improved generator network, adding residual layers to capture finer features. We process the frames sequentially through our network using an LSTM layer between the encoder and decoder, whereas the first order model randomly selects two frames (one as the source image and the other as the target frame), feeds them into the generator network, and generates videos that are temporally incoherent, since it neither processes frames sequentially nor considers previously generated frames in its loss functions. Our motivation is to generate temporally coherent videos by adding an LSTM layer and a temporal loss, improving over other works which do not use LSTM layers [33, 29]. Compared to other methods [27, 29], we replace optical flow with dense motion. The reason for using the dense motion network is that it models occlusions arising from the driving frame.
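A minimal sketch of the warping step described above, applying a dense motion field and an occlusion map to encoder features with torch.nn.functional.grid_sample; the tensor shapes and the identity motion grid are placeholder assumptions, since in our network these inputs come from the encoder and the dense motion module.

```python
import torch
import torch.nn.functional as F

def warp_features(feature_map, dense_motion, occlusion_map):
    """Warp encoder features with a dense motion field and mask occluded regions.

    feature_map:   (B, C, H, W) appearance features from the source image
    dense_motion:  (B, H, W, 2) sampling grid in normalized [-1, 1] coordinates,
                   mapping each driving-frame pixel back to the source image
    occlusion_map: (B, 1, H, W) values in [0, 1]; 0 marks regions the motion cannot explain
    """
    warped = F.grid_sample(feature_map, dense_motion, align_corners=True)
    return warped * occlusion_map   # the decoder must inpaint the masked (occluded) regions

# Toy shapes only; the identity grid below stands in for a real dense motion field.
feats = torch.rand(1, 64, 32, 32)
identity_grid = F.affine_grid(torch.eye(2, 3).unsqueeze(0), size=(1, 64, 32, 32), align_corners=True)
occlusion = torch.ones(1, 1, 32, 32)
out = warp_features(feats, identity_grid, occlusion)
print(out.shape)   # (1, 64, 32, 32)
```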

Chapter 6

Loss Functions:

6.1 Image Reconstruction Loss

The reconstruction loss is the mean squared error or cross entropy loss between the generated image and the ground truth.

$L_{rec} = \sum \lVert G(x) - gt \rVert$   (3)

where $G(x)$ is the generated image and $gt$ is the ground truth image.
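As a small sketch, the reconstruction term can be computed with an L1 or mean squared error criterion between the generated frame and the ground truth; whether L1 or MSE is used here is left as a configurable assumption.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(generated, ground_truth, use_l1=True):
    """Pixel-wise reconstruction loss between G(x) and the ground truth frame."""
    if use_l1:
        return F.l1_loss(generated, ground_truth)   # mean absolute difference
    return F.mse_loss(generated, ground_truth)      # mean squared error alternative

fake = torch.rand(1, 3, 256, 256)
real = torch.rand(1, 3, 256, 256)
print(reconstruction_loss(fake, real).item())
```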

6.2 Adversarial Loss:

By using only the reconstruction loss, the generated images might be blurry. Hence, we

add a GAN adversarial loss. In GANs, the generator tries to fool the discriminator, while the

discriminator tries to differentiate between the real and the fake data. Since we employ both a

frame and video discriminator, we use both frame loss and video loss:

$L_f = \mathbb{E}_x[\log(D(x))] + \mathbb{E}_z[\log(1 - D(G(z)))]$   (4)

where Lf is the frame discriminator loss, D(x) is the frame discriminator's probability that a real image is classified as real, E denotes the expected value, G(z) is the generator's output given input noise z, and D(G(z)) is the discriminator's estimate of the probability that a fake image is real.

$L_v(1, T) = \mathbb{E}_x[\log(D_v(x))] + \mathbb{E}_z[\log(1 - D_v(G(z)))]$   (5)

where Dv(x) is the video discriminator's probability that a real video is classified as real, (1, T) refers to frames 1 to T, E denotes the expected value, G(z) is the generator's output frames from time steps 1 to T given input noise z, and Dv(G(z)) is the video discriminator's estimate of the probability that a fake video is real.
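A hedged sketch of the two adversarial terms in Equations (4) and (5), written as binary cross-entropy objectives for a frame discriminator and a video discriminator; the tiny fully connected discriminators and tensor shapes are placeholders, since the actual networks are convolutional patch discriminators.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def frame_adversarial_loss(d_frame, real_frame, fake_frame):
    """Equation (4): log D(x) + log(1 - D(G(z))) for a single frame, as a BCE objective."""
    real_pred = d_frame(real_frame)
    fake_pred = d_frame(fake_frame)
    return bce(real_pred, torch.ones_like(real_pred)) + bce(fake_pred, torch.zeros_like(fake_pred))

def video_adversarial_loss(d_video, real_clip, fake_clip):
    """Equation (5): the same objective applied by the video discriminator over frames 1..T."""
    real_pred = d_video(real_clip)
    fake_pred = d_video(fake_clip)
    return bce(real_pred, torch.ones_like(real_pred)) + bce(fake_pred, torch.zeros_like(fake_pred))

# Placeholder discriminators; a real implementation would use convolutional (patch) discriminators.
d_frame = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1), nn.Sigmoid())
d_video = nn.Sequential(nn.Flatten(), nn.Linear(4 * 3 * 64 * 64, 1), nn.Sigmoid())

loss_f = frame_adversarial_loss(d_frame, torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
loss_v = video_adversarial_loss(d_video, torch.rand(1, 4, 3, 64, 64), torch.rand(1, 4, 3, 64, 64))
```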

6.3 Equivariance Loss:

The landmarks should show equivariance to transformations, i.e. if a transformation is applied to the image, the landmarks should move according to that transformation. The equivariance loss is shown below:

$L_{eqv} = \sum_{k=1}^{K} \lVert g(x_k^1, y_k^1) - (x_k, y_k) \rVert^2$   (6)

where $g(x_k^1, y_k^1)$ is the coordinate transformation that maps the image to the transformed image by thin plate spline deformation [62], $(x_k, y_k)$ are the landmarks, and $K$ is the total number of landmarks.
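A simplified sketch of Equation (6): it uses a known affine warp in place of the thin plate spline deformation and a placeholder keypoint detector, so it only illustrates the structure of the constraint, not our exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder keypoint detector: predicts K=10 keypoints in normalized [-1, 1] coordinates.
class KPDetector(nn.Module):
    def __init__(self, k=10):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 2 * k), nn.Tanh())
        self.k = k
    def forward(self, img):
        return self.net(img).view(-1, self.k, 2)

def equivariance_loss(detector, image, theta):
    """Keypoints detected on the warped image, mapped through the known warp g (here an
    affine matrix theta), should land on the keypoints detected on the original image."""
    grid = F.affine_grid(theta, size=image.shape, align_corners=True)
    warped = F.grid_sample(image, grid, align_corners=True)     # g applied to the image
    kp_orig = detector(image)                                    # (x_k, y_k)
    kp_warp = detector(warped)                                   # keypoints of the warped image
    ones = torch.ones(*kp_warp.shape[:2], 1)
    kp_mapped = torch.cat([kp_warp, ones], dim=-1) @ theta.transpose(1, 2)  # g(x_k^1, y_k^1)
    return ((kp_mapped - kp_orig) ** 2).sum(dim=-1).mean()

detector = KPDetector()
image = torch.rand(1, 3, 64, 64)
theta = torch.tensor([[[0.9, 0.1, 0.05], [-0.1, 0.9, -0.05]]])   # a small affine warp
print(equivariance_loss(detector, image, theta).item())
```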

Chapter 7

7.1 Implementation Details:

We train our network in an end-to-end fashion on both the FaceForensics++ [5] and VoxCeleb [35] datasets. We used hyperparameters similar to the vid2vid network [27] for training: a learning rate of 0.00001, a batch size of four, and an input frame size of 256✕256, training for 150 epochs. Our training set constitutes ~1200 videos which are randomly taken from both FaceForensics++ and VoxCeleb. We use the PyTorch framework on Tesla P100 GPUs to perform all our experiments, and training takes ~5 days on 2 GPUs.
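The stated hyperparameters could be collected in a small configuration sketch like the one below; the use of the Adam optimizer and the placeholder generator module are assumptions not stated in the text.

```python
import torch

config = {
    "learning_rate": 1e-5,        # as stated above
    "batch_size": 4,
    "frame_size": (256, 256),
    "num_epochs": 150,
    "num_training_videos": 1200,  # ~1200 videos from FaceForensics++ and VoxCeleb
}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = torch.nn.Conv2d(5, 3, 3, padding=1).to(device)   # placeholder for the real generator
# Adam is a common choice for GAN training, assumed here rather than stated in the thesis.
optimizer = torch.optim.Adam(generator.parameters(), lr=config["learning_rate"])
```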

7.2 Results:

We evaluated our model on both FaceForensics++ [5] and VoxCeleb [35] datasets. We

randomly chose a source image and a driving video from the FaceForensics++ or VoxCeleb datasets and generated fakes.

Figure 27 shows the results of our model on the FaceForensics++ dataset. Sample

frames from the generated videos are shown below. We randomly selected two videos from

the test set and considered one of the videos as the target video. From the other video, we

choose the first frame as a source image. We applied our trained model on the source image

and the target video to generate a fake which has the appearance of the source image and

motion of the target video. Sample frames from the generated video are shown below.


(a)

(b)

(c)

Figure 27: Visualization of frames generated using our model on the FaceForensics++ dataset. Given an input image and a target video, our model generates a video of the source person with the target video's facial expressions.

Figure 28 shows the results of our model on the VoxCeleb [35] dataset. We selected two videos from the test set and considered them as the source and the target videos, respectively. Sample frames from the generated videos are shown below.


(a)

(b)

(c)

Figure 28: Visualization of frames generated using our model on the VoxCeleb dataset. Given an input image and a target video, our model generates a video of the source person with the target video's facial expressions.

We compared our results with recent works on video generation such as few-shot vid2vid [29] and the first order model [33]. Figure 30 shows the comparison of generated frames with the other models. We randomly selected a source image and a target video from our test set and generated fakes using the above models. For generating fakes with few-shot vid2vid, we trained the model on the FaceForensics++ dataset [5], whereas for the first order model we used the pretrained model provided by the authors. Some of the frames generated using the video-to-video synthesis (vid2vid) [27] model are shown in Figure 29. As the vid2vid network takes only a single input, as discussed in Section 3, we cannot directly compare its results with our model; we used the pretrained model provided by the authors to generate fakes. Some frames from the generated videos are shown below.

Figure 29: Visualization of frames generated using vid2vid model [27].


Figure 30: Generated frames using (a) the first order model for image animation [33], (b) the few-shot vid2vid network [29], and (c) our model.

7.3 Ablation studies:

We first trained our baseline model: the simplest model, without the dense motion network and without warping in our generator network. We did not achieve good results with this baseline. We were inspired by the motion module proposed by Siarohin et al. in their works Monkey-Net [32] and the First Order Model for Image Animation [33], which introduced a dense motion network that extracts the motion of the driving frame with respect to the source image using a keypoint detector module. Inspired by their work, we then introduced warping in our generator network between the source image features and the motion extracted from the dense motion module, and achieved better results, as shown in Figures 27 and 28. We also performed experiments replacing the one-layer LSTM with a bidirectional LSTM in our model. Figure 31 shows sample frames generated with the one-layer LSTM and the bidirectional LSTM. The generated frames are very similar. However, on close inspection, the eyes and mouth in the frames generated with the one-layer LSTM match the target frame less closely than those generated with the bidirectional LSTM, and the facial contour is also better with the bidirectional LSTM.

(a)

(b)

Figure 31: Results of our model with bidirectional LSTM and one layer LSTM

We use an LSTM layer in our network because videos are temporally dependent data, and the LSTM captures this recurrence. Looking at the results, we see that our model generates temporally coherent videos compared to other models [33] which do not use LSTM layers.

We also hypothesize that using the source and target frame landmark heatmaps as conditional inputs helps our network focus on the salient details of the video. Since we are generating deepfakes, the relevant details are the locations of the eyes, nose, lips, etc., and facial landmarks allow us to capture those details.

Chapter 8

Conclusion

8.1 Discussion

Generating video sequences has proven to be a difficult task for generative models.

Many prior works have generated realistic videos but have low generalization capability to unforeseen domains and are not temporally coherent. We proposed a novel architecture which consists of an LSTM based generator, a discriminator and a dense motion network. We show that our model can generate realistic videos given a single image and a target video by combining the appearance of the source image with the motion of the driving video. We also showed that our model generalizes to unseen videos.

8.2 Future work:

The primary goal of this thesis work is to generate datasets for deepfake detection challenges. Our model generates realistic videos combining a source face image with the facial expressions of the target video frames. Some possible extensions of our research work are:

• Train and test the network on different domains. Some prior video generation works

such as vid2vid [27], few-shot vid2vid [29] and the first order model for image animation [33] generated videos of different domains such as faces, human poses and semantic

segmentation tasks. We can extend our network to train and test on human pose data

for generating realistic videos.

• We can generate high resolution videos by using local and global generators in the

generator network which was proposed by pix2pixHD [37]. They have generated 2K

resolution images by using both local and global generators.

• Evaluate the generated videos by submitting both original and generated videos to Amazon Mechanical Turk, where the evaluation is performed by humans.

• Evaluation: we would like to design a psychophysical study that provides a fair comparison of different deepfake techniques. We would randomly select 50 original videos and their counterparts generated with our model, the first order motion model, and the few-shot vid2vid model, and ask human raters to score each video on a 5-point Likert scale from poor to excellent following these instructions:
o How realistic is the video (is it real or fake)?
o How well does the source image match the generated video?
o Rate the smoothness of the frame-by-frame motion.
o Rate the face/background for artifacts.
o Rate the face/background on the realness or fakeness of the video.



Bibliography

[1] Thanh Thi Nguyen, Cuong M. Nguyen, Dung Tien Nguyen, Duc Thanh Nguyen. Deep Learning for Deepfakes Creation and Detection. Institute for Intelligent Systems Research and Innovation, Deakin University, Australia.
[2] Schroepfer, M. (2019, September 5). Creating a data set and a challenge for deepfakes. Retrieved from https://ai.facebook.com/blog/deepfake-detection-challenge.
[3] Guera, D., and Delp, E. J. (2018, November). Deepfake video detection using recurrent neural networks. In 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) (pp. 1-6). IEEE.
[4] Large-scale CelebFaces Attributes (CelebA) Dataset. Retrieved from http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html.
[5] Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess. FaceForensics++: Learning to Detect Manipulated Facial Images. 26 August 2019.
[6] Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess. FaceForensics: A Large-scale Video Dataset for Forgery Detection in Human Faces. 24 March 2018.
[7] DeepFaceLab for generating deepfake videos. Retrieved from https://github.com/iperov/DeepFaceLab.
[8] Faceswap. Retrieved from https://github.com/deepfakes/faceswap.
[9] Few-shot face translation. Retrieved from https://github.com/shaoanlu/fewshot-face-translation-GAN.
[10] FakeApp 2.2.0. Retrieved from https://www.malavida.com/en/soft/fakeapp/.
[11] Faceswap-GAN. Retrieved from https://github.com/shaoanlu/faceswap-GAN.
[12] VidTIMIT database. Retrieved from http://conradsanderson.id.au/vidtimit/.
[13] Schroff, F., Kalenichenko, D., and Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 815-823).
[14] Nguyen, H. H., Yamagishi, J., and Echizen, I. (2019, May). Capsule-forensics: Using capsule networks to detect forged images and videos. In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2307-2311). IEEE.
[15] Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., and Nießner, M. (2018). FaceForensics: A large-scale video dataset for forgery detection in human faces. arXiv preprint arXiv:1803.09179.
[16] Pavel Korshunov and Sebastien Marcel. Vulnerability assessment and detection of Deepfake videos. Idiap Research Institute, Martigny, Switzerland.
[17] Yuezun Li, Siwei Lyu. Exposing DeepFake Videos By Detecting Face Warping Artifacts. University at Albany, State University of New York, USA. CVPR.
[18] Shruti Agarwal and Hany Farid. Protecting World Leaders Against Deepfakes. University of California, Berkeley, CA, USA.
[19] Darius Afchar, Vincent Nozick, Junichi Yamagishi. MesoNet: a Compact Facial Video Forgery Detection Network. IEEE Xplore.
[20] Marissa Koopman, Andrea Macarulla Rodriguez, Zeno Geradts. Detection of Deepfake Video Manipulation. University of Amsterdam & Netherlands Forensic Institute. IMVIP 2018.
[21] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley. Generative Adversarial Nets.
[22] Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. University of Oxford.
[23] Brian Dolhansky, Russ Howes, Ben Pflaum, Nicole Baram, Cristian Canton Ferrer. The Deepfake Detection Challenge (DFDC) Preview Dataset. AI Red Team, Facebook AI.
[24] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 2012.
[25] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet. Going deeper with convolutions.
[26] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. Microsoft Research.
[27] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao. Video-to-Video Synthesis. NVIDIA, MIT CSAIL.
[28] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. NVIDIA, UC Berkeley.
[29] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao. Few-shot Video-to-Video Synthesis. NVIDIA Corporation.
[30] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros. Image-to-Image Translation with Conditional Adversarial Nets. University of California, Berkeley. CVPR, 2017.
[31] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, Jun-Yan Zhu. Semantic Image Synthesis with Spatially-Adaptive Normalization. NVIDIA, UC Berkeley.
[32] Aliaksandr Siarohin, Stephane Lathuiliere, Sergey Tulyakov, Elisa Ricci. Animating Arbitrary Objects via Deep Motion Transfer. University of Trento, Italy.
[33] Aliaksandr Siarohin, Stephane Lathuiliere, Sergey Tulyakov, Elisa Ricci. First Order Motion Model for Image Animation. University of Trento, Italy.
[34] Arsha Nagrani, Joon Son Chung, Andrew Zisserman. VoxCeleb: a large-scale speaker identification dataset. University of Oxford, UK.
[35] VoxCeleb dataset download page. Retrieved from http://www.robots.ox.ac.uk/~vgg/data/voxceleb/.
[36] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, Jaegul Choo. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation.
[37] Phillip Isola, Jun-Yan Zhu, Alexei A. Efros. Image-to-Image Translation with Conditional Adversarial Networks.
[38] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley. Generative Adversarial Nets.
[39] Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Hausser, Caner Hazırbas, Vladimir Golkov. FlowNet: Learning Optical Flow with Convolutional Networks.
[40] Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, Thomas Brox. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. University of Freiburg, Germany.
[41] Justus Thies, Michael Zollhofer, Marc Stamminger, Christian Theobalt, Matthias Nießner. Face2Face: Real-time Face Capture and Reenactment of RGB Videos.
[42] Justus Thies, Michael Zollhofer, and Matthias Nießner. Deferred neural rendering: Image synthesis using neural textures. ACM Transactions on Graphics (TOG), 2019.
[43] Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Hausser. FlowNet: Learning Optical Flow with Convolutional Networks.
[44] Facial landmark detection methods. Retrieved from https://www.pyimagesearch.com/2017/04/03/facial-landmarks-dlib-opencv-python/.
[45] Vahid Kazemi and Josephine Sullivan. One Millisecond Face Alignment with an Ensemble of Regression Trees. KTH Royal Institute of Technology.
[46] Mamata S. Kalas. Real Time Face Detection and Tracking Using OpenCV.
[47] Facial landmark detection methods. Retrieved from https://www.pyimagesearch.com/2017/04/03/facial-landmarks-dlib-opencv-python/.
[48] Deepfake Detection Challenge. Retrieved from https://deepfakedetectionchallenge.ai/.
[49] 7 types of neural network activation functions. Retrieved from https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/.
[50] Justus Thies, Michael Zollhofer, Marc Stamminger. Face2Face: Real-time face capture and reenactment of RGB videos.
[51] Justus Thies, Matthias Nießner, Michael Zollhofer. Deferred Neural Rendering: Image Synthesis using Neural Textures.
[52] Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.
[53] Yunjey Choi, Minje Choi, Sunghun Kim. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation.
[54] Soad Almabdy, Lamiaa Elrefaei. Deep Convolutional Neural Network-Based Approaches for Face Recognition.
[55] Lingzhi Li, Jianmin Bao, Hao Yang. FaceShifter: Towards High Fidelity and Occlusion Aware Face Swapping. Microsoft Research.
[56] Facial Reenactment blog. Retrieved from https://animplay.wordpress.com/2015/10/19/real-time-expression-transfer-for-facial-reenactment/.
[57] A Beginner's Guide to Understanding Convolutional Neural Networks. Retrieved from https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/.
[58] GANs beyond generation: 7 alternative use cases. Retrieved from https://medium.com/@alexrachnog/gans-beyond-generation-7-alternative-use-cases-725c60ba95e8.
[59] Dishasree Gupta. Fundamentals of Deep Learning – Introduction to Recurrent Neural Networks.
[60] Illustrated Guide to LSTMs and GRUs: A step by step explanation. Retrieved from https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21.
[61] H.264 video compression. Retrieved from haivision.com/resources/streaming-video-definitions/h-264/.
[62] Fred L. Bookstein. Principal Warps: Thin-Plate Splines and the Decomposition of Deformations.
[63] Olaf Ronneberger, Philipp Fischer, Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation.
