Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21)

An Examination of Fairness of AI Models for Deepfake Detection

Loc Trinh∗, Yan Liu
Department of Computer Science, University of Southern California
floctrinh, [email protected]
∗Contact Author

Abstract

Recent studies have demonstrated that deep learning models can discriminate based on protected classes like race and gender. In this work, we evaluate the bias present in deepfake datasets and detection models across protected subgroups. Using facial datasets balanced by race and gender, we examine three popular deepfake detectors and find large disparities in predictive performance across races, with up to a 10.7% difference in error rate between subgroups. A closer look reveals that the widely used FaceForensics++ dataset is overwhelmingly composed of Caucasian subjects, with the majority being female Caucasians. Our investigation of the racial distribution of deepfakes reveals that the methods used to create deepfakes as positive training signals tend to produce "irregular" faces, in which a person's face is swapped onto another person of a different race or gender. This causes detectors to learn spurious correlations between the foreground faces and fakeness. Moreover, when detectors are trained with the Blended Image (BI) dataset from Face X-Rays, we find that those detectors develop systematic discrimination towards certain racial subgroups, primarily female Asians.

1 Introduction

Synthetic media have become so realistic with the advancement of deep neural networks that they are often indiscernible from authentic content. However, synthetic media designed to deceive pose a dangerous threat to many communities around the world [Cahlan, 2020; Ingram, 2019]. In this context, deepfake videos, which portray human subjects with altered identities or malicious or embarrassing actions, have emerged as a vehicle for misinformation. With the current advancement and growing availability of computing resources, sophisticated deepfakes have become more pervasive, especially to generate revenge pornography [Hao, 2019] and defame celebrities or political targets [Vaccari and Chadwick, 2020]. Hence, there is a critical need for automated systems that can effectively combat misinformation on the internet.

To address this challenge, the vision community has conducted a series of excellent works on detecting deepfakes [Tolosana et al., 2020; Mirsky and Lee, 2021]. Sophisticated facial forgery detection tools [Afchar et al., 2018; Li et al., 2020; Liu et al., 2020] and advanced training sets [Rössler et al., 2019; Jiang et al., 2020] were developed to train detectors capable of identifying deepfakes with high precision. These results have also had real-world impact with Microsoft's release of Video Authenticator [Burt and Horvitz, 2020], an automated tool trained on the publicly available FaceForensics++ dataset that analyzes a still photo or video and provides a percentage chance that the media is artificially manipulated. It works by detecting the blending boundary of the deepfake and subtle fading or grayscale elements that might not be detectable by the human eye. Facebook, on the other hand, has been pioneering its own system to detect AI-generated profiles and has banned hundreds of fake accounts, pages, posts, and social groups¹, along with strengthening its policy on deepfakes and authentic media².

¹https://about.fb.com/news/2019/12/removing-coordinated-inauthentic-behavior-from-georgia-vietnam-and-the-us/
²https://about.fb.com/news/2020/01/enforcing-against-manipulated-media/

While these works have made good progress on the prediction task, detecting fake videos at a low false-positive rate is still a challenging problem [Li et al., 2020]. Moreover, since most studies focus on the visual artifacts existing within deepfakes, little is discussed about how such systems perform on diverse groups of real people across gender and race, which is the common setting where personal profiles and videos are audited en masse for authenticity via automated systems. In this context, a small percentage difference in false-positive rates between subgroups would mean that millions of people of a particular group are more likely to be mistakenly classified as fake.
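To make the stakes concrete, the sketch below (hypothetical Python; function names and numbers are illustrative, not from the paper) computes a per-subgroup false-positive rate over real images only and reports the largest gap between subgroups, which is the quantity at issue here.

```python
import numpy as np

def subgroup_false_positive_rates(y_pred, subgroup):
    """Per-subgroup false-positive rate, computed on real (non-fake) images only.

    y_pred   : 0/1 detector decisions on real images (1 = flagged as fake)
    subgroup : subgroup label for each image (e.g., "Caucasian-Female")
    """
    rates = {}
    for g in np.unique(subgroup):
        flagged = y_pred[subgroup == g]
        # Every "fake" decision on a real image is a false positive.
        rates[g] = flagged.mean()
    return rates

# Toy numbers for illustration only.
preds = np.array([0, 1, 0, 0, 1, 1, 0, 0])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
rates = subgroup_false_positive_rates(preds, groups)
gap = max(rates.values()) - min(rates.values())
print(rates, f"largest FPR gap: {gap:.1%}")
# At the scale of automated profile audits, even a 1-point FPR gap applied to
# 100 million real profiles from one subgroup translates to roughly 1 million
# extra users wrongly flagged as fake.
```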
This draws a connection to fairness in machine learning, where growing concerns about unintended consequences from biased or flawed systems call for a careful and thorough examination of both datasets and models. Gender Shades [Buolamwini and Gebru, 2018] demonstrated how facial recognition systems discriminate across gender and race, showing a large gap in the accuracy of gender classifiers across different intersectional groups: darker-skinned females are misclassified in up to 34.7% of cases, while the maximum error rate for lighter-skinned males is only 0.8%. Others have shown that training with biased data has resulted in algorithmic discrimination [Bolukbasi et al., 2016]. Although many works have studied how to create fairer algorithms and benchmarked discrimination in various contexts [Hardt et al., 2016; Liu et al., 2018], few works have done this analysis for computer vision in the context of synthetic media and deepfakes. Our contributions are as follows:

1. We find that the FaceForensics++ dataset commonly used for training deepfake detectors is overwhelmingly composed of Caucasian subjects, with the majority (36.6%) of videos featuring female Caucasian subjects.

2. We also find that approaches to generate fake samples as positive training signals tend to overwhelmingly produce "irregular" deepfakes, in which a person's face is swapped onto another person of a different race or gender; this leads to detectors learning spurious correlations between foreground faces and fakeness.

3. Using facial datasets balanced by gender and race, we find that classifiers designed to detect deepfakes have large predictive disparities across racial groups, with up to a 10.7% difference in error rate.

4. Lastly, we observe that when detectors are trained with the Blended Images (BI) from Face X-Rays [Li et al., 2020], they develop systematic discrimination towards female Asian subjects.

2 Related Work

2.1 Deepfake Detection

Early deepfake forensic work focused on hand-crafted facial features such as eye colors, light reflections, and 3D head poses and movements. However, these approaches do not scale well to more advanced GAN-based deepfakes. To combat the new generation of deepfakes, researchers leverage deep learning and convolutional networks to automatically extract meaningful features for face forgery detection [Rössler et al., 2019]. Shallow networks such as MesoInception4 [Afchar et al., 2018] and the Patch-based CNN [Chai et al., 2020] were developed to focus on low- and medium-level manipulation artifacts. Deep networks, such as Xception [Rössler et al., 2019], have also demonstrated success, achieving state-of-the-art results via [...]

[...] detectors towards unseen manipulations. In particular, ForensicTransfer [Cozzolino et al., 2018] proposes an autoencoder-based network to transfer knowledge between different but related manipulations via the hidden latent space. Face X-ray [Li et al., 2020] addressed the problem by focusing on the more general blending artifacts, as well as creating a blended image dataset to help networks generalize across unseen manipulations. In addition to generalization, recent work has also demonstrated the vulnerability of deepfake detectors to adversarial attacks [Carlini and Farid, 2020], where small tailored perturbations generated via either black-box or white-box attacks can easily fool the networks. This raises a concern about the robustness and commercial readiness of deepfake detectors. In contrast to complex adversarial attacks, our work examines the performance of deepfake detectors on natural images spanning different gender and racial groups, as well as investigating the real-world consequences if deepfake detectors are commercially adopted.
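As a rough illustration of the blended-image idea referenced above (a simplified sketch with illustrative assumptions, not the authors' pipeline nor the exact Face X-ray procedure), a synthetic positive sample can be composed by alpha-blending a foreground face into a background image with a soft mask; the tell-tale artifacts then sit along the mask boundary.

```python
import numpy as np

def blend_face(foreground, background, mask):
    """Compose a synthetic 'fake' training image by mask-based blending.

    foreground, background : aligned face images, float arrays (H, W, 3) in [0, 1]
    mask                   : soft blending mask, float array (H, W, 1) in [0, 1]
    """
    blended = mask * foreground + (1.0 - mask) * background
    # A rough proxy for the blending boundary: large where the mask transitions
    # between 0 and 1, zero where the pixel is purely foreground or background.
    boundary = 4.0 * mask * (1.0 - mask)
    return blended, boundary

# Toy example with random "images". A real pipeline would align facial
# landmarks and soften the mask (e.g., with a Gaussian blur) before blending.
H, W = 64, 64
fg, bg = np.random.rand(H, W, 3), np.random.rand(H, W, 3)
mask = np.zeros((H, W, 1))
mask[16:48, 16:48] = 1.0
mask = (mask + np.roll(mask, 3, axis=0) + np.roll(mask, 3, axis=1)) / 3.0  # crude softening
blended, boundary = blend_face(fg, bg, mask)
print(blended.shape, float(boundary.max()))
```

Because only the blended-in foreground changes between a real image and its fake counterpart, a detector trained on such pairs can end up keying on properties of the foreground face itself, which is how the spurious correlations noted in the contributions can arise.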
2.3 Algorithmic Fairness and Consequences

Concerns about malicious applications of AI and unintended consequences from flawed or biased systems have propelled many investigations into representational and algorithmic bias. Gender Shades [Buolamwini and Gebru, 2018] demonstrated how facial recognition systems discriminate across gender and race, especially against darker-skinned females. [Liu et al., 2018] showed that common fairness criteria may in fact harm underrepresented or disadvantaged groups due to delayed outcomes. [Celis et al., 2019] proposed a framework to combat echo chambers created by highly personalized recommendations on social media that reinforce people's biases and opinions. In terms of standardized approaches for the field, [Mitchell et al., 2019] and [Gebru et al., 2018] recommend the use of model cards and datasheets to better document the intended usage of models and data. Although many works have studied how to create fairer algorithms and benchmarked discrimination in various contexts [Hardt et al., 2016; Liu et al., 2018], we conduct a fairness analysis in the context of deepfake detection, which requires bookkeeping of the racial distribution of face swaps and providing subgroup-specific deepfakes for audit.

3 Deepfake Detection

We investigate 3 popular deepfake detection models of vari-