“Excavating AI” Re-excavated: Debunking a Fallacious Account of the JAFFE Dataset

Michael J. Lyons
Ritsumeikan University

Abstract

Twenty-five years ago, my colleagues Miyuki Kamachi and Jiro Gyoba and I designed and photographed JAFFE, a set of facial expression images intended for use in a study of face perception. In 2019, without seeking permission or informing us, Kate Crawford and Trevor Paglen exhibited JAFFE in two widely publicized art shows. In addition, they published a nonfactual account of the images in the essay “Excavating AI: The Politics of Images in Machine Learning Training Sets.” The present article recounts the creation of the JAFFE dataset and unravels each of Crawford and Paglen’s fallacious statements. I also discuss JAFFE more broadly in connection with research on facial expression, affective computing, and human-computer interaction.

Keywords: facial expression, machine learning, training sets, affective computing, artificial intelligence

“To make a mistake and not correct it: this is a real mistake.” Confucius, The Analects

In September of 2019, Kate Crawford and Trevor Paglen opened the art exhibition Training Humans at the Milan Osservatorio of the Fondazione Prada. The stated objective of the exhibition was to explore:

... how humans are represented, interpreted and codified through training datasets, and how technological systems harvest, label and use this material.

Accompanying the exhibition, the essay “Excavating AI: The Politics of Images in Machine Learning Training Sets” was self-published online.1 To our surprise and dismay, Training Humans exhibited more than a hundred facial images taken from the JAFFE dataset,2 a set of posed facial expressions photographed by my colleagues and me for use in our research [39]. Though we provide the JAFFE dataset to other researchers for purposes of non-commercial scientific research, Crawford and Paglen’s public exhibition of the images was a violation of the terms of use [36]. Meanwhile, “Excavating AI” spotlighted JAFFE as an ‘anatomical model’ of machine learning training data, using it to expound an ‘archaeological method’ of “cataloguing the principles and values by which [it] was constructed.” The essay so misportrayed JAFFE as to render it nearly unrecognizable by those most familiar with the dataset: its creators.

1 https://excavating.ai/
2 JAFFE is available for non-commercial scientific research via the Zenodo repository. Note that all JAFFE images in this article are subject to specific terms of use and may not be reused without permission, regardless of the license applied to the document as a whole.
A previous commentary [36] outlined the errors in Crawford and Paglen’s account of JAFFE and explained that their unauthorized use of the images violated not only the terms of use but also the principle of informed consent. By misusing JAFFE, the authors of “Excavating AI” breached the very ethics they claimed to espouse. Despite the objections raised in my commentary, they continue to propagate a fallacious account of JAFFE by republishing the nearly unmodified text of “Excavating AI”: it has now been reprinted in a publication3 accompanying the 11th edition of the Liverpool Biennial of Contemporary Art [24] and published in a special issue of the peer-reviewed scholarly journal AI & Society [1].4 This has motivated the present in-depth exposition of Crawford and Paglen’s misleading statements about JAFFE.

The present article describes the JAFFE dataset in some detail, summarizing information from the published documentation of the dataset. Each of Crawford and Paglen’s claims regarding JAFFE is explicitly refuted. Analysis of the general pattern of the claims shows that “Excavating AI” makes liberal use of the common fallacy of informal logic known as the Straw Man argument [56]. Concurrently, the article situates JAFFE in the broader context of experimental psychology, neural computation, affective computing, and human-computer interaction.

3 https://www.biennial.com/journal/issue-9/
4 An editor of the special issue assured me that the latter publication had indeed undergone peer review, but the text of the AI & Society version is nearly identical to the original web essay, casting doubt on the thoroughness of the referees’ evaluations. As of June/July 2021, the article has been retracted from the AI & Society website but is still available elsewhere online.

What is JAFFE?

JAFFE, the Japanese Female Facial Expression dataset, is a set of images depicting facial expressions posed by Japanese women, accompanied by semantic ratings on nouns describing the expressions. JAFFE was created specifically for use in an experiment that compared a biologically-inspired representation of facial appearance [30, 31, 40] with facial expression perception by human observers [27, 28, 29, 39]. JAFFE is described as a dataset because it includes observer ratings for each facial expression image. Without this data, it would be more accurate to describe JAFFE as an image set or stimulus set.

Figure 1: Posed facial expression images from the JAFFE dataset.
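In code, the pairing of each image with its semantic ratings might be represented as follows. This is a minimal illustrative sketch in Python, not the dataset’s actual distribution format: the field names and rating values are hypothetical, chosen only to show that each image carries a mean observer rating on each of the six adjectival nouns.

    from dataclasses import dataclass

    @dataclass
    class JaffeRecord:
        image_path: str            # one posed-expression photograph
        poser: str                 # identifier of the volunteer (e.g., "YM")
        requested_expression: str  # expression the pose was requested with
        mean_ratings: dict         # mean observer rating per adjectival noun

    # Hypothetical record; the rating values are invented for illustration.
    example = JaffeRecord(
        image_path="path/to/image.tiff",
        poser="YM",
        requested_expression="happiness",
        mean_ratings={"happiness": 4.5, "sadness": 1.2, "surprise": 2.0,
                      "anger": 1.1, "disgust": 1.1, "fear": 1.3},
    )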
We planned and assembled JAFFE in 1996 in collaboration with Miyuki Kamachi and Jiro Gyoba of the Psychology Department at Kyushu University, Japan. Miyuki recruited volunteers in the Psychology Department and asked them to pose three or four examples of six facial expressions (happiness, sadness, surprise, anger, disgust, fear) and a neutral or expressionless face.5 These six expressions are the basic facial expressions (BFE) studied by Paul Ekman and many others, and thought by some of these scientists to be recognizable, to an extent, in all cultures [18].6 JAFFE volunteers posed the requested expressions while facing the camera through a transparent, semi-reflective plastic pane [39] and triggered the shutter remotely while viewing the reflection of their face: the JAFFE images are selfies. Unlike some other facial expression image sets (see, for example, [17, 42]), volunteers were not coached to pose specified facial configurations or to imitate prototypes, nor were they trained in the facial action coding system (FACS) [19]. The volunteers were given a Japanese adjectival noun for each of the six basic facial expressions and asked to photograph three or four poses for each of these while attending to their reflected expressions. JAFFE images did not undergo a selection procedure, as with, for example, Ekman’s POFA (Pictures of Facial Affect) set, for which an iterative procedure selected the most prototypical images for each basic facial expression [17]. JAFFE includes nearly all of the photographs posed by all of the volunteers.7 Because of this, the facial configuration for each expression category varies considerably between poses (Figure 2) and between posers (Figure 3).

5 Facial expression poses were requested using the Japanese adjectival nouns: 幸福、悲しみ、驚き、怒り、嫌悪、恐れ、and 無表情 for the expressionless pose. We used the same adjectival nouns to obtain semantic ratings by Japanese female observers.
6 The universality hypothesis for facial expression is attributed to Charles Darwin, who thought that at least seven categories of facial expressions (the six BFE and, additionally, contempt) are recognizable by all humans [15]. The extent to which facial expressions are universal remains a matter of controversy [4, 5] and a topic of active research [11, 53].
7 The publicly available version of JAFFE contains 213 images: ten expressers posing three (or in a few cases four) examples of six expressions and a neutral pose. The original, complete set contains 219 images, but the publicly available set omitted six of the images due to an oversight. There is nothing unusual about the six images missing from the published set.

Figure 2: Two happy expressions posed by YM.

We created JAFFE to experimentally test a mathematical model of facial expression representation derived from a simplified description of the receptive fields of neurons in the visual cortex [29, 30, 39, 40]. We aimed to compare the model with human perception. To reduce arbitrary assumptions, we compared image similarity derived from the model with similarity measured from human observations of the facial expression images. We did not use categorical facial expression labels directly in the comparison. Instead, we obtained an empirical six-dimensional description for each image by asking observers to view expression images and rate the perceived expression using a five-level Likert scale for each of the six adjectival nouns used to request poses from the volunteers. Images were viewed and rated by 60 Japanese female observers (volunteers recruited at Kyushu University), and their responses were averaged. Figure 2 shows sample mean ratings for two JAFFE images.
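As an illustration of the averaging step, the following minimal Python sketch turns per-observer Likert responses for one image into a six-dimensional mean rating vector. It is not our original analysis code: the array layout, the English glosses of the adjectival nouns, and the randomly generated responses are assumptions made for the example.

    import numpy as np

    # English glosses of the six Japanese adjectival nouns used for rating.
    NOUNS = ["happiness", "sadness", "surprise", "anger", "disgust", "fear"]

    def mean_semantic_rating(ratings):
        """Average per-observer Likert ratings into one 6-D vector.

        ratings: array of shape (n_observers, 6) holding responses on the
        five-level scale (1 = not at all ... 5 = very much).
        """
        ratings = np.asarray(ratings)
        assert ratings.ndim == 2 and ratings.shape[1] == len(NOUNS)
        return ratings.mean(axis=0)

    # Hypothetical example: 60 observers rating one image (values invented).
    rng = np.random.default_rng(0)
    one_image = rng.integers(1, 6, size=(60, len(NOUNS)))  # 1..5 inclusive
    print(dict(zip(NOUNS, mean_semantic_rating(one_image).round(2))))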
The semantic ratings afford a more detailed and nuanced description of the images than that obtained with the more commonly used forced categorization approach [5]. Similarity calculated from the multidimensional ratings was correlated with similarity derived from our mathematical model. Model and empirical similarity measures were also analyzed using non-metric multidimensional scaling, revealing low-dimensional structure resembling the affective circumplex [47, 50].
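The comparison and scaling analysis can be sketched in Python as follows. The specific choices here (Euclidean distance between mean rating vectors, Spearman rank correlation, scikit-learn’s non-metric MDS, and the invented input data) are illustrative assumptions, not necessarily the methods of the original study.

    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.manifold import MDS

    def rating_dissimilarity(mean_ratings):
        """Pairwise Euclidean distances between 6-D mean rating vectors.

        mean_ratings: (n_images, 6) array; returns an (n_images, n_images)
        symmetric dissimilarity matrix.
        """
        diff = mean_ratings[:, None, :] - mean_ratings[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))

    def compare_and_embed(model_dissim, empirical_dissim, seed=0):
        """Correlate model and empirical dissimilarities over unique image
        pairs, then embed the empirical matrix in two dimensions with
        non-metric multidimensional scaling."""
        iu = np.triu_indices_from(empirical_dissim, k=1)  # unique pairs
        rho, _ = spearmanr(model_dissim[iu], empirical_dissim[iu])
        nmds = MDS(n_components=2, metric=False,
                   dissimilarity="precomputed", random_state=seed)
        coords = nmds.fit_transform(empirical_dissim)  # 2-D configuration
        return rho, coords

    # Hypothetical usage with invented data for 213 images:
    rng = np.random.default_rng(1)
    empirical = rating_dissimilarity(rng.uniform(1.0, 5.0, size=(213, 6)))
    model = empirical * rng.uniform(0.8, 1.2, size=empirical.shape)  # fake model
    model = (model + model.T) / 2.0  # keep the matrix symmetric
    rho, coords = compare_and_embed(model, empirical)
    print(f"Spearman rho = {rho:.2f}, embedding shape = {coords.shape}")

Non-metric (rather than metric) MDS fits this setting because only the rank order of the dissimilarities, not their absolute magnitudes, is assumed to be meaningful.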