
Large image datasets: A pyrrhic win for computer vision?

Anonymous submission

Paper ID

Abstract

In this paper we investigate problematic practices and consequences of large scale vision datasets. We examine broad issues such as the question of consent and justice as well as specific concerns such as the inclusion of verifiably pornographic images in datasets. Taking the ImageNet-ILSVRC-2012 dataset as an example, we perform a cross-sectional model-based quantitative census covering factors such as age, gender, NSFW content scoring, class-wise accuracy, human-cardinality-analysis, and the semanticity of the image class information in order to statistically investigate the extent and subtleties of ethical transgressions. We then use the census to help hand-curate a look-up-table of images in the ImageNet-ILSVRC-2012 dataset that fall into the categories of verifiably pornographic, shot in a non-consensual setting (up-skirt), beach voyeuristic, and exposed private parts. We survey the landscape of harm and threats that both society broadly and individuals face due to uncritical and ill-considered dataset curation practices. We then propose possible courses of correction and critique the pros and cons of these. We have duly open-sourced all of the code and the census meta-datasets generated in this endeavor for the computer vision community to build on. By unveiling the severity of the threats, our hope is to motivate the constitution of mandatory Institutional Review Boards (IRB) for large scale dataset curation processes.

1. Introduction

Born from World War II and the haunting and despicable practices of Nazi era experimentation [4], the 1947 Nuremberg code [84] and the subsequent 1964 Helsinki declaration [30] helped establish the doctrine of Informed Consent, which builds on the fundamental notions of human dignity and agency to control dissemination of information about oneself. This doctrine has shepherded data collection endeavors in the medical and psychological sciences concerning human subjects, including photographic data [8, 56], for the past several decades. A less stringent version of informed consent, broad consent, proposed in 45 CFR 46.116(d) of the Revised Common Rule [24], has been recently introduced that still affords the basic safeguards towards protecting one's identity in large scale databases. However, in the age of Big Data, the fundamentals of informed consent, privacy, and agency of the individual have gradually been eroded. Institutions, academia, and industry alike amass millions of images of people without consent, and often for unstated purposes, under the guise of anonymization, a claim that is both ephemeral [57, 68] and vacuous [30]. As can be seen in Table 1, several tens of millions of images of people are found in peer-reviewed literature. These images are obtained without consent or awareness of the individuals, or IRB approval for collection. In Section 5-B of [79], for instance, the authors nonchalantly state "As many images on the web contain pictures of people, a large fraction (23%) of the 79 million images in our dataset have people in them". With this background, we now focus on one of the most celebrated and canonical large scale image datasets: the ImageNet dataset.

1.1. ImageNet: A brief overview

The emergence of the ImageNet dataset [21] is widely considered a pivotal moment² in the deep learning revolution that transformed computer vision (CV), and artificial intelligence (AI) in general. Prior to ImageNet, computer vision and image processing researchers trained image classification models on small datasets such as CalTech101 (9k images), PASCAL-VOC (30k images), LabelMe (37k images), and the SUN (131k images) dataset (see slide-37 in [51]). ImageNet, with over 14 million images spread across 21,841 synsets, replete with 1,034,908 bounding box annotations, brought in an aspect of scale that was previously missing. A subset of 1.2 million images across 1000 classes was carved out from this dataset to form the ImageNet-1k dataset (popularly called ILSVRC-2012), which formed the basis for the Task-1: classification challenge in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). This soon became widely touted as the Computer Vision Olympics³. The vastness of this dataset allowed a Convolutional Neural Network (CNN) with 60 million parameters [49], trained by the SuperVision team from University of Toronto, to usher in the rebirth of the CNN era (see [3]), which is now widely dubbed the AlexNet moment in AI.

²"The data that transformed AI research—and possibly the world": https://bit.ly/2VRxx3L
³https://engineering.missouri.edu/2014/01/team-takes-top-rankings-in-computer-vision-olympics/


Dataset                          Number of images    Number of categories    Number of consensual
                                 (in millions)       (in thousands)          images
JFT-300M ([41])                  300+                18                      0
Open Images ([50])               9                   20                      0
Tiny-Images ([79])               79                  76                      0
Tencent-ML ([89])                18                  11                      0
ImageNet-(21k, 11k, 1k) ([70])   (14, 12, 1)         (22, 11, 1)             0
Places ([93])                    11                  0.4                     0

Table 1: Large scale image datasets containing people's images

Although ImageNet was created over a decade ago, it remains one of the most influential and powerful image databases available today. Its power and magnitude is matched by its unprecedented societal impact. Although an a posteriori audit might seem redundant a decade after its creation, ImageNet's continued significance and the culture it has fostered for other large scale datasets warrants an ongoing critical dialogue.

From the questionable ways images were sourced, to troublesome labeling of people in images, to the downstream effects of training AI models using such images, ImageNet and large scale vision datasets (LSVD) in general constitute a Pyrrhic win for computer vision. We argue that this win has come at the expense of harm to minoritized groups and has further aided the gradual erosion of privacy, consent, and agency of both the individual and the collective.

The rest of this paper is structured as follows. In section 2, we cover related work that has explored the ethical concerns that arise with LSVD. In section 3, we describe the landscape of both the immediate and long term threats that individuals and society as a whole encounter due to ill-considered LSVD curation. In Section 4, we propose a set of solutions which might assuage some of the concerns raised in section 3. In Section 5, we present a template quantitative auditing procedure using the ILSVRC2012 dataset as an example and describe the data assets we have curated for the computer vision community to build on. We conclude with broad reflections on LSVDs, society, ethics, and justice.

2. Background and related work

The very declaration of a taxonomy brings some things into existence while rendering others invisible [9]. A gender classification system that conforms to essentialist binaries, for example, operationalizes gender in a cis-centric way, resulting in exclusion of non-binary and transgender people [48]. Categories simplify and freeze nuanced and complex narratives, obscuring the political and moral reasoning behind a category. Over time, messy and contingent histories hidden behind a category are forgotten and trivialized [75]. With the adoption of taxonomy sources, image datasets inherit seemingly invisible yet profoundly consequential shortcomings. The dataset creation process, its implications for ML systems, and, subsequently, the societal impact of these systems have attracted a substantial body of critique. We categorize this body of work into two groups that complement one another. While the first group can be seen as concerned with the broad downstream effects, the other concentrates mainly on the dataset creation process itself.

2.1. Broad critiques:

The absence of critical engagement with canonical datasets disproportionately negatively impacts women, racial and ethnic minorities, and vulnerable individuals and communities at the margins of society [7]. For example, image search results both exaggerate stereotypes and systematically under-represent women in search results for occupations [47]; object detection systems designed to detect pedestrians display higher error rates for recognition of demographic groups with dark skin tones [87]; and gender classification systems show disparities in image classification accuracy, where lighter-skin males are classified with the highest accuracy while darker-skin females suffer the most misclassification [14]. Gender classification systems that lean on binary and cis-genderist constructs operationalize gender in a trans-exclusive way, resulting in tangible harm to trans people [48]. With a persistent trend where minoritized and vulnerable individuals and communities often disproportionately suffer the negative outcomes of ML systems, [25] have called for a shift in rethinking ethics not just as a fairness metric to mitigate the narrow concept of bias, but as a practice that results in justice for the most negatively impacted. Similarly, [46] contend that perspectives that acknowledge existing inequality and aim to redistribute power are pertinent, as opposed to fairness-based perspectives. Such an understanding of ethics as justice then requires a focus beyond 'bias' and 'fairness' in LSVDs, and a questioning of how images are sourced, how they are labelled, and what it means for models to be trained on them.

One of the most thorough investigations in this regard comes from [20]. In this recent work, Crawford and Paglen present an in-depth critical examination of ImageNet, including the dark and troubling results of classifying people as if they were objects. Offensive and derogatory labels that perpetuate historical and current prejudices are assigned to people's actual images. The authors emphasise that not only are images that were scraped across the web appropriated as data for computer vision tasks, but the very act of assigning labels to people based on physical features also raises fundamental concerns around reviving long-discredited pseudo-scientific ideologies of physiognomy [90].

2.2. Critiques of the curation phase:

Within the dataset creation process, taxonomy sources pass on their limitations and taken-for-granted assumptions. The adoption of underlying structures presents a challenge where, without critical examination of the architecture, ethically dubious taxonomies are inherited. This has been one of the main challenges for ImageNet, given that the dataset is built on the backbone of WordNet's structure. Acknowledging some of the problems, the authors from the ImageNet team did recently attempt to address [91] the stagnant concept vocabulary of WordNet. They admitted that only 158 out of the 2,832 existing synsets should remain in the person sub-tree⁴. Nonetheless, some serious problems remain untouched. This motivates us to address in greater depth the overbearing presence of the WordNet effect on image datasets.

2.3. The WordNet Effect

ImageNet is not the only large scale vision dataset that has inherited the shortcomings of the WordNet taxonomy. The 80 Million Tiny Images dataset [79], which grandfathered the CIFAR-10/100 datasets, also took the same path. Unlike ImageNet, this dataset has never been audited or scrutinized, and some of the sordid results of the inclusion of ethnophaulisms in its label space are displayed in Figure 1. The figure shows both the number of images in a subset of the offensive classes (sub-figure (a)) and exemplar images (sub-figure (b)) from the noun-class labelled n****r⁵, a fact that serves as a stark reminder that a great deal of work remains to be done by the ML community at large.

Figure 1: Results from the 80 Million Tiny Images dataset. (a) Class-wise counts of the offensive classes (including classes labelled with ethnic slurs as well as molester, pedophile, rape_suspect, and child_molester). (b) Sample images from the class labelled n****r.

And finally, the labeling and validation stages of the curation process also present ethical challenges. Recent works such as [37] have explored the intentionally hidden labour, which they term Ghost Work, behind such tasks. Image labeling and validation requires the use of crowd-sourced platforms such as MTurk, often contributing to the exploitation of underpaid and undervalued gig workers. Within the topic of image labeling but with a different focus, recent work such as [80] and [6] has examined the shortcomings of the human-annotation procedures used during ImageNet dataset curation. These shortcomings, the authors point out, include the single-label-per-image procedure, which causes problems given that real-world images often contain multiple objects, and inaccuracies due to "overly restrictive label proposals".

⁴In order to prune all the nodes, they also took into account the imageability of the synsets and the skewed representation in the images pertaining to the image retrieval phase.
⁵Due to its offensiveness, we have censored this word here; however, it remains uncensored on the website at the time of writing.

3. The threat landscape

In this section, we survey the landscape of harm and threats, both immediate and long term, that emerge with dataset curation practices carried out in the absence of careful ethical considerations and anticipation of negative societal consequences. Our goal here is to bring awareness to the ML and AI community regarding the severity of these threats and to motivate a sense of urgency to act on them. We hope this will result in practices such as the mandatory constitution of Institutional Review Boards (IRB) for large scale dataset curation processes.

1: The rise of reverse image search engines, loss of privacy, and the blackmailing threat: Large image datasets, when built without careful consideration of societal implications, pose a threat to the welfare and well-being of individuals. Most often, vulnerable people and marginalised populations pay a disproportionately high price. Reverse image search engines that allow face search, such as [2]⁶, have gotten remarkably and worryingly efficient in the past year. For a small fee, anyone can use their portal or their API to run an automated process⁷ to uncover the "real-world" identities of the humans of the ImageNet dataset. In societies where sex work is socially condemned or legally criminalized, for example, re-identification of a sex worker through image search bears a real danger for the individual victim. Harmful discourses such as revenge porn are part of a broader continuum of image-based sexual abuse [52]. To further emphasize this specific point, many of the images in classes such as maillot, brassiere, and bikini contain images of beach voyeurism and other non-consensual cases of image gathering (covered in detail in Section 5). We were able to (unfortunately) easily map the victims, most of whom are women, in these pictures to "real-world" identities of people belonging to a myriad of backgrounds, including teachers, medical professionals, and academic professors, using reverse image search engines such as [63]. Paying heed to the possibility of the Streisand effect⁸, we took the decision not to divulge any further quantitative or qualitative details on the extent or the location of such images in the dataset, besides alerting the curators of the dataset(s) and making a passionate plea to the community not to underestimate the severity of this particular threat vector.

2: The emergence of even larger and more opaque datasets: The attempt to build computer vision systems has been gradual and can be traced as far back as 1966 to Papert's The Summer Vision Project [60], if not earlier. However, ImageNet, with its vast amounts of data, has not only erected a canonical landmark in the history of AI, it has also paved the way for even bigger, more powerful, and suspiciously opaque datasets. The lack of scrutiny of the ImageNet dataset by the wider computer vision community has only served to embolden institutions, both academic and commercial, to build far bigger datasets without scrutiny (see Table 1). Various highly cited and celebrated papers in recent years [10, 16, 41, 77], for example, have used the unspoken unicorn amongst large scale vision datasets, that is, the JFT-300M dataset [?]⁹. This dataset is inscrutable and operates in the dark, to the extent that there has not even been official communication as to what JFT-300M stands for. All that the ML community knows is that it purportedly boasts more than 300M images spread across 18k categories. The open source variant(s) of this, the Open Images V4-5-6 dataset [50], contains a subset of 30.1M images covering 20k categories (and also has an extension dataset with 478k crowd-sourced images across more than 6000 categories). While parsing through some of the images, we found verifiably¹⁰ non-consensual images of children that were siphoned off of flickr, hinting towards the prevalence of similar issues for JFT-300M, from which this was sourced. Besides the other large datasets in Table 1, we have cases such as the CelebA-HQ dataset, which is actually a heavily processed dataset whose grey-box curation process only appears in Appendix-C of [45], where no clarification is provided on the "frequency based visual quality metric" used to sort the images based on quality. Benchmarking any downstream model trained on such an opaque, biased, and (semi-)synthetic dataset will only result in controversial scenarios such as [53], where the authors had to hurriedly incorporate addendums admitting biased results. Hence, it is important to reemphasize that the existence and use of such datasets bears direct and indirect impact on people, given that decision making on social outcomes increasingly leans on ubiquitously integrated AI systems trained and validated on such datasets. Yet, despite such profound consequences, critical questions such as where the data comes from or whether the images were obtained consensually are hardly considered part of the LSVD curation process.

The more nuanced and perhaps indirect impact of ImageNet is the culture that it has cultivated within the broader AI community; a culture where the appropriation of images of real people as raw material free for the taking has come to be perceived as the norm. Such a norm and lack of scrutiny have played a role in the creation of monstrous and secretive datasets without much resistance, prompting further questions such as 'what other secretive datasets currently exist hidden and guarded under the guise of proprietary assets?'

⁶For example, PimEyes: https://bit.ly/3bSKcZQ
⁷Please refer to the supplementary material for the screenshots.
⁸The Streisand effect "is a social phenomenon that occurs when an attempt to hide, remove, or censor information has the unintended consequence of further publicizing that information, often via the Internet" [86].
⁹We have decided to purposefully leave the '?' in place and plan to revisit it only after the dataset's creator(s) publish the details of its curation.
¹⁰See https://bit.ly/2y1sC7i. We performed verification with the uploader of the image via the Flickr link shared.

Current work that has sprung out of secretive datasets, such as Clearview AI [40]¹¹, points to a deeply worrying and insidious threat not only to vulnerable groups but also to the very meaning of privacy as we know it [44].

3: The Creative Commons fallacy: In May 2007 the iconic case of Chang versus Virgin Mobile: The school girl, the billboard, and virgin [17] unraveled in front of the world, leading to widespread debate on the uneasy relationship between personal privacy, consent, and image copyright, and initiating a substantial corpus of academic debate (see [15, 18, 19, 39]). A Creative Commons license addresses only copyright issues, not privacy rights or consent to use images for training. Yet, many of the efforts beyond ImageNet, including the Open Images dataset [50], have been built on top of the Creative Commons loophole that large scale dataset curation agencies interpret as a free-for-all, consent-included green flag. This, we argue, is fundamentally fallacious, as is evinced in the views presented in [54] by the Creative Commons organization, which read: "CC licenses were designed to address a specific constraint, which they do very well: unlocking restrictive copyright. But copyright is not a good tool to protect individual privacy, to address research ethics in AI development, or to regulate the use of surveillance tools employed online.". Datasets culpable of this CC-BY heist, such as MS-Celeb-1M and IBM's Diversity in Faces, have now been deleted in response to the investigations (see [28] for a survey), lending further support to the Creative Commons fallacy.

4: Blood diamond effect in models trained on this dataset: Akin to the ivory carving-illegal poaching and diamond jewelry art-blood diamond nexuses, we posit that there is a similar moral conundrum at play here that affects all downstream applications entailing models trained using a tainted dataset. Often, these transgressions may be rather subtle. In this regard, we pick an exemplar field of application that on the surface appears to be a low risk application area: neural generative art. Neural generative art created using tools such as BigGAN [11] and Artbreeder [1], which in turn use pre-trained deep-learning models trained on ethically dubious datasets, bears the downstream burden¹² of the problematic residues from non-consensual image siphoning, thus running afoul of the Wittgensteinian edict of ethics and aesthetics being one and the same [29]. We also note that there is a privacy-leakage facet to this downstream burden. In the context of face recognition, works such as [74] have demonstrated that CNNs with high predictive power unwittingly accommodate accurate extraction of subsets of the facial images that they were trained on, thus abetting dataset leakage.

5: Perpetuation of unjust and harmful stereotypes: Finally, zooming out and taking a broad perspective allows us to see that the very practice of embarking on a classification, taxonomization, and labeling task endows the classifier with the power to decide what is a legitimate, normal, or correct way of being, acting, and behaving in the social world [9]. For any given society, what comes to be perceived as normal or acceptable is often dictated by dominant ideologies. Systems of classification, which operate within a power-asymmetrical social hierarchy, necessarily embed and amplify historical and cultural prejudices, injustices, and biases [75]. In western societies, "desirable", "positive", and "normal" characteristics and ways of being are constructed and maintained in ways that align with the dominant narrative, giving advantage to those that fit the status quo. Groups and individuals on the margins, on the other hand, are often perceived as the "outlier" and the "deviant". Image classification and labelling practices, without the necessary precautions and awareness of these problematic histories, pick up these stereotypes and prejudices and perpetuate them [31, 58, 59]. AI systems trained on such data amplify and normalize these stereotypes, inflicting unprecedented harm on those that are already on the margins of society. While the ImageNet team did initiate strong efforts towards course-correction [92], the Tiny Images dataset still contains harmful slurs and offensive labels. And worse, we remain in the dark regarding the secretive and opaque LSVDs.

4. Candidate solutions: The path ahead

Decades of work within the fields of Science and Technology Studies (STS) and the Social Sciences show that there is no single straightforward solution to most of the wider social and ethical challenges that we have discussed [5, 25, 76]. These challenges are deeply rooted in social and cultural structures and form part of the fundamental social fabric. Feeding AI systems on the world's beauty, ugliness, and cruelty, but expecting it to reflect only the beauty is a fantasy [5]. These challenges and tensions will exist as long as humanity continues to operate. Given the breadth of the challenges that we have faced, any attempt at a quick fix risks concealing the problem and providing a false sense of solution. The idea of a complete removal of biases, for example, might in reality be simply hiding them out of sight [36]. Furthermore, many of the challenges (bias, discrimination, injustice) vary with context, history, and place, and are concepts that continually shift and change, constituting a moving target [7]. The pursuit of a panacea in this context, therefore, is not only unattainable but also misguided.

¹¹Clearview AI is a US based privately owned technology company that provides facial recognition services to various customers, including North American law enforcement agencies. With more than 3 billion photos scraped from the web, the company operated in the dark until its services to law enforcement were reported in late 2019.
¹²Please refer to the supplementary material where we demonstrate one such real-world experiment entailing unethically generated neural art, replete with responses obtained from human critics as to what they felt about the imagery being displayed.

Dataset audit card – ImageNet (Figure 5)

Census audit:
• 83,436 images with 101,070 to 132,201 persons (models: DEX ([69]), InsightFace ([38]))
• Mean age (male): 33.24; mean age (female): 25.58 (RetinaFace [23], ArcFace [22])
• Confirmed misogynistic images: 62. Number of classes with infants: 30

Metrics: class-level mean count ($\eta_c^{(A)}$), mean gender skewness ($\xi_c^{(A)}$) and mean age ($\alpha_c^{(A)}$):

$$\eta_c^{(A)} = \frac{1}{N_c}\sum_{i=1}^{N_c}\mathbb{I}[\phi_i], \qquad \alpha_c^{(A)} = \frac{1}{N_c}\sum_{i=1}^{N_c}\mathbb{I}[\phi_i]\,a_i, \qquad \xi_c^{(A)} = \frac{1}{N_c}\sum_{i=1}^{N_c}\mathbb{I}[\phi_i]\left(\frac{g_i-\mu_c^{(A)}}{\sigma_c^{(A)}}\right)^{3},$$

where $\phi_i = 1$ if a face is present in the $i$-th image and $0$ otherwise, $a_i$ and $g_i$ are the age and gender estimates for that image, and $\mu_c^{(A)}$ and $\sigma_c^{(A)}$ are the mean and standard deviation of the gender estimates of the images in class $c$ as estimated by algorithm $(A)$.

Figure 2: Class-wise cross-categorical scatter-plots across the cardinality, age and gender scores. (Agreement between the InsightFace and DEX estimates: number of faces, Kendall's tau 0.746, Spearman's r 0.895, Pearson's r 0.973; gender skewness, 0.557, 0.739, 0.723; mean age, 0.391, 0.556, 0.567.)

Figure 3: Statistics and locationing of the hand-labelled images. (Cross-tabulated counts of the hand-labelled images across the categories beach_voyeur, upskirt, exposed private parts, and verifiably pornographic for the classes bikini/two-piece, brassiere, holster, kimono, maillot/tank suit, maillot, miniskirt, and shower cap, alongside their mean NSFW scores: 0.859, 0.61, 0.058, 0.092, 0.768, 0.802, 0.619, and 0.13 respectively.)

Figure 4: Known human co-occurrence based gender-bias analysis. (Mean and skewness of the gender estimates (DEX) for dog-breed groups and musical-instrument classes, split into women-majority and men-majority classes.)

Figure 5: Dataset audit card for the ImageNet dataset.
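As a companion to the audit card, the following is a minimal sketch (in Python/pandas) of how the class-level census metrics defined above could be computed from a per-image annotation table. The table layout and column names here (a faces.csv holding one face-presence flag, one age estimate, and one gender score per image) are illustrative assumptions, not the schema of our released meta-datasets.

```python
import pandas as pd

# Hypothetical per-image annotation table with columns:
#   wnid          : WordNet ID of the ImageNet class
#   face_present  : 1 if the annotation model found a face, else 0 (phi_i)
#   age_estimate  : estimated age for the image (a_i)
#   gender_score  : scalar gender estimate for the image (g_i)
df = pd.read_csv("faces.csv")

def census_metrics(group: pd.DataFrame) -> pd.Series:
    """Compute eta (mean count), alpha (mean age) and xi (gender skewness)
    for one class, following the audit-card definitions."""
    n_c = len(group)                                   # N_c: images in the class
    phi = group["face_present"]                        # indicator I[phi_i]
    eta = phi.sum() / n_c                              # class-level mean count
    alpha = (phi * group["age_estimate"]).sum() / n_c  # class-level mean age
    g = group.loc[phi == 1, "gender_score"]
    mu, sigma = g.mean(), g.std()
    # third standardized moment of the gender estimates, normalized by N_c
    xi = (((g - mu) / sigma) ** 3).sum() / n_c if sigma > 0 else 0.0
    return pd.Series({"eta": eta, "alpha": alpha, "xi": xi})

class_census = df.groupby("wnid").apply(census_metrics)
print(class_census.head())
```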

Having said that, there are remedies that can be applied to overcome the specific harms that we have discussed in this paper, which eventually potentially play constituent roles in improving the wider and bigger social and structural issues in the long run.

4.1. Remove, replace, and open strategy

In [92], the authors concluded that within the person sub-tree of the ImageNet dataset, 1593 of the 2832 people categories were potentially offensive labels and planned to "remove all of these from ImageNet". We strongly advocate a similar path for the offensive noun classes in the Tiny Images dataset that we have identified in section 2.1, as well as for images that fall into the categories of verifiably pornographic, shot in a non-consensual setting (up-skirt), beach voyeuristic, and exposed genitalia in the ImageNet-ILSVRC-2012 dataset. In cases where the image category is retained but the images are not, the option of replacement with consensually shot, financially compensated images arises. It is possible that some of the people in these images might come forward to consent and contribute their images in exchange for fair financial compensation, credit, or out of sheer altruism [12]. We re-emphasize that our consternation focuses on the non-consensual aspect of the images and not on the category-class and the ensuing content of the images in it. This solution, however, brings forth further questions: does this make image datasets accessible only to those who can afford it? Will we end up with a pool of images with predominantly financially disadvantaged participants?

Science is self-correcting so long as it is accessible and open to critical engagement, and this is what we have done given what we know of these LSVDs. The secretive and opaque LSVDs tread a dangerous territory, given that they directly or indirectly impact society. We strongly contend that making them open and accessible is a crucial first step towards an ethical scientific endeavour.

4.2. Differentially private obfuscation of the faces

This path entails harnessing techniques such as DP-Blur [32], with quantifiable privacy guarantees, to obfuscate the identity of the humans in the image. The Inclusive Images challenge [73], for example, already incorporated blurring during dataset curation¹³ and addressed the downstream effects surrounding the change in predictive power of models trained on the blurred versions of the curated dataset. We believe that replication of this template, which also clearly included avenues for recourse in case of an erroneously non-blurred image being sighted by a researcher, will be a step in the right direction for the community at large.

¹³https://www.kaggle.com/c/inclusive-images-challenge
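To make the obfuscation route of Section 4.2 concrete, below is a minimal sketch of differentially private pixelization in the spirit of DP-Blur/DP-Pix [32]: each b x b cell of the face region is replaced by its mean value plus Laplace noise calibrated to the pixelization sensitivity. The function name and the parameter choices (b, m, eps) are ours for illustration; [32] should be consulted for the actual mechanism and its formal guarantees.

```python
import numpy as np

def dp_pixelize(region: np.ndarray, b: int = 8, m: int = 16, eps: float = 0.5) -> np.ndarray:
    """Differentially private pixelization of a greyscale face region.

    Each b x b cell is replaced by its mean pixel value plus Laplace noise with
    scale 255*m / (b*b*eps), i.e. the sensitivity of the cell mean when up to m
    pixels may differ between neighbouring images (cf. DP-Pix [32])."""
    h, w = region.shape
    out = region.astype(np.float64).copy()
    scale = 255.0 * m / (b * b * eps)
    for y in range(0, h, b):
        for x in range(0, w, b):
            cell = region[y:y + b, x:x + b]
            noisy_mean = cell.mean() + np.random.laplace(0.0, scale)
            out[y:y + b, x:x + b] = noisy_mean
    return np.clip(out, 0, 255).astype(np.uint8)

# Usage: obfuscate a detected face bounding box before release, e.g.
#   face_crop = image[y0:y1, x0:x1]            # greyscale crop of the face
#   image[y0:y1, x0:x1] = dp_pixelize(face_crop)
```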
4.3. Synthetic-to-real and Dataset distillation

The basic idea here is to utilize (or augment with) synthetic images in lieu of real images during model training. Approaches include using hand-drawn sketch images (ImageNet-Sketch [82]), using GAN-generated images [26], and techniques such as Dataset distillation [83], where a dataset or a subset of a dataset is distilled down to a few representative synthetic samples. This is a nascent field with some promising results emerging in unsupervised domain adaptation across visual domains [61] and universal digit classification [64].

4.4. Ethics-reinforced filtering during the curation

The specific ethical transgressions that emerged during our longitudinal analysis of ImageNet could have been prevented if there were explicit instructions provided to the MTurkers during the dataset curation phase to enable filtering of these images at the source (see Fig. 9 in [67] for an example). We hope ethics checks become an integral part of the User-Interface deployed during the humans-in-the-loop validation phase for future dataset curation endeavors.

4.5. Dataset audit cards

Much along the lines of model cards [55] and datasheets for datasets [35], we propose the dissemination of dataset audit cards. This allows large scale image dataset curators to publish the goals, curation procedures, known shortcomings and caveats alongside their dataset dissemination. In Figure 5, we have curated an example dataset audit card for the ImageNet dataset using the quantitative analyses carried out in Section 5.

5. Quantitative dataset auditing: ImageNet as a template

We performed a cross-categorical quantitative analysis of ImageNet to assess the extent of the ethical transgressions and the feasibility of model-annotation based approaches. This resulted in an ImageNet census, entailing both image-level as well as class-level analysis across 57 different metrics (see supplementary section) covering Count, Age and Gender (CAG), NSFW-scoring, semanticity of class labels, and accuracy of classification using pre-trained models. We have distilled the important revelations of this census into a dataset audit card presented in Figure 5. This audit also entailed a human-in-the-loop based hybrid approach that used the pre-trained-model annotations (along the lines of [27, 92]) to segment the large dataset into smaller sub-sets and hand-label the smaller subsets to generate two lists covering 62 misogynistic images and 30 image-classes with co-occurring children. We used the DEX [69] and the InsightFace [38] pre-trained models¹⁴ to generate the cardinality, gender skewness, and age-distribution results captured in Figure 2.

¹⁴While harnessing these pre-trained gender classification models, we would like to strongly emphasize that the specific models and the problems that they were intended to solve, when taken in isolation, stand on ethically dubious grounds themselves. In this regard, we strongly concur with previous work such as [85] that gender classification based on the appearance of a person in a digital image is both scientifically flawed and a technology that bears a high risk of systemic abuse.
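For readers who want to replicate this annotation pass, the sketch below shows its overall shape under simplifying assumptions: analyze_faces is a hypothetical stand-in for a pre-trained face-analysis model (DEX [69] and InsightFace [38] play this role in our audit, with different APIs), and the output CSV layout is illustrative only.

```python
from pathlib import Path
import csv

def analyze_faces(image_path: str) -> list:
    """Hypothetical wrapper around a pre-trained face-analysis model.
    Expected to return one dict per detected face, e.g.
    {"age": 31.2, "gender_score": 0.87}. Replace with actual DEX / InsightFace calls."""
    raise NotImplementedError

def build_image_level_table(ilsvrc_train_dir: str, out_csv: str) -> None:
    """Walk the class folders (one per WordNet ID) and record, for every image,
    the number of detected faces and their mean age / gender estimates."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["wnid", "image", "n_faces", "mean_age", "mean_gender"])
        for class_dir in sorted(Path(ilsvrc_train_dir).iterdir()):
            if not class_dir.is_dir():
                continue
            for img in class_dir.glob("*.JPEG"):
                faces = analyze_faces(str(img))
                ages = [fc["age"] for fc in faces]
                genders = [fc["gender_score"] for fc in faces]
                writer.writerow([
                    class_dir.name, img.name, len(faces),
                    sum(ages) / len(ages) if ages else "",
                    sum(genders) / len(genders) if genders else "",
                ])

# build_image_level_table("ILSVRC2012/train", "faces.csv")
```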

This resulted in the discovery of 83,436 images with persons, encompassing 101,070 to 132,201 individuals, thus constituting 8 to 10% of the dataset. Further, we munged together the gender, age, class semanticity¹⁵ and NSFW content flagging information from the pre-trained NSFW-MobileNet-v2 model [34] to help perform a guided search for misogynistic, consent-violating transgressions. This resulted in the discovery of 62 images¹⁶ across four categories, beach-voyeur-photography, exposed-private-parts, verifiably-pornographic, and upskirt, in the following classes: 445-bikini, 638-maillot, 639-tank suit, 655-miniskirt and 459-brassiere (see Figure 3). Lastly, we harnessed literature from areas spanning dog-ownership bias ([42, 66]) to the engendering of musical instruments ([88, 13]) to generate an analysis of subtle forms of human co-occurrence-based gender bias in Figure 4.

Captured in Table 2 are the details of the csv-formatted data assets curated for the community to build on. The CAG statistics are covered in df_insightface_stats.csv and df_audit_age_gender_dex.csv. Similarly, we have also curated NSFW scoring (df_nsfw.csv), accuracy (df_acc_classwise_resnet50.csv, df_acc_classwise_NasNet_mobile.csv) and semanticity (df_imagenet_names_umap.csv) datasets. df_census_imagenet_61.csv contains the 61 cumulative parameters for each of the 1000 classes (with their column interpretations in df_census_columns_interpretation.csv). We have duly open-sourced these meta-datasets and 14 tutorial-styled Jupyter notebooks (spanning both the ImageNet and Tiny-Images datasets) for community access¹⁷.

file_name                             shape        file_contents
df_insightface_stats.csv              (1000, 30)   24 class-wise statistical parameters obtained by running the InsightFace model ([38]) on the ImageNet dataset
df_audit_age_gender_dex.csv           (1000, 12)   11 class-wise (ordered by the wordnet-id) statistical parameters obtained from the json files (of the DEX paper) [69]
df_nsfw.csv                           (1000, 5)    The mean and std of the NSFW scores of the train and val images arranged per class (Unnamed: 0: WordNet ID of the class)
df_acc_classwise_resnet50.csv         (1000, 7)    Class-wise accuracy metrics (and the image-level predictions) obtained by running the ResNet50 model on the ImageNet train and val sets
df_acc_classwise_NasNet_mobile.csv    (1000, 7)    Class-wise accuracy metrics (and the image-level predictions) obtained by running the NasNet model on the ImageNet train and val sets
df_imagenet_names_umap.csv            (1000, 5)    2D UMAP embeddings of the GloVe vectors of the classes of the ImageNet dataset
df_census_imagenet_61.csv             (1000, 61)   The main census dataframe covering class-wise metrics across 61 parameters, all of which are explained in df_census_columns_interpretation.csv
df_census_columns_interpretation.csv  (61, 2)      The interpretations of the 61 metrics of the census dataframe above
df_hand_survey.csv                    (61, 3)      Details of the 61 images unearthed via the hand survey (do not pay heed to the 61; it is a mere coincidence)
df_classes_tiny_images_3.csv          (75846, 3)   The class_ind, class_name (WordNet noun) and n_images for the Tiny Images dataset
df_dog_analysis.csv                   (7, 4)       Breed, gender_ratio and survey results from the paper 'Breed differences in canine aggression'

Table 2: Meta datasets curated during the audit processes

¹⁵Obtained using GloVe embeddings [62] on the labels.
¹⁶Listed in df_hand_survey.csv.
¹⁷Link: https://rb.gy/zccdps
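As a usage example for the meta-datasets in Table 2, the short pandas sketch below merges the class-wise NSFW scores with the main census table and ranks classes that combine a high mean NSFW score with a high estimated human presence, roughly the guided-search heuristic described above. Apart from the "Unnamed: 0" WordNet-ID column documented in Table 2, the column names used here (mean_nsfw_train, count_faces) are illustrative assumptions; the authoritative column meanings live in df_census_columns_interpretation.csv.

```python
import pandas as pd

# Class-wise NSFW scores; per Table 2 the WordNet ID sits in column "Unnamed: 0".
nsfw = pd.read_csv("df_nsfw.csv").rename(columns={"Unnamed: 0": "wnid"})
# Main census table with the 61 class-wise metrics (see
# df_census_columns_interpretation.csv for the authoritative column meanings).
census = pd.read_csv("df_census_imagenet_61.csv")

# The join key and the metric columns below are illustrative assumptions;
# substitute the actual names documented in the interpretation file.
merged = census.merge(nsfw, on="wnid", how="inner")
candidates = (
    merged.assign(flag_score=merged["mean_nsfw_train"] * merged["count_faces"])
          .sort_values("flag_score", ascending=False)
          .head(20)
)
print(candidates[["wnid", "mean_nsfw_train", "count_faces"]])
```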
6. Conclusion and discussion

We have sought to draw the attention of the machine learning community towards the societal and ethical implications of large scale datasets, such as the problem of non-consensual images and the oft-hidden problems of categorizing people. ImageNet has been championed as one of the most incredible breakthroughs in computer vision, and AI in general. We indeed celebrate ImageNet's achievement and recognize the creators' efforts to grapple with some ethical questions. Nonetheless, ImageNet as well as other large image datasets remain troublesome. In hindsight, perhaps the ideal time to have raised ethical concerns regarding LSVD curation would have been in 1966, at the birth of The Summer Vision Project [60]. The right time after that was when the creators of ImageNet embarked on the project to "map out the entire world of objects". Nonetheless, these are crucial conversations that the computer vision community needs to engage with now, given the rapid democratization of scraping tools ([71, 72, 81]) and dataset-zoos ([43, 65, 78]). Continued silence will only serve to cause more harm than good in the future. In this regard, we have outlined a few solutions, including audit cards, that can be considered to ameliorate some of the concerns raised. We have also curated meta-datasets and open-sourced the code to carry out quantitative auditing using the ILSVRC2012 dataset as a template. However, we posit that the deeper problems are rooted in the wider structural traditions, incentives, and discourse of a field that treats ethical issues as an afterthought; a field where in the wild is often a euphemism for without consent. We are up against a system that has veritably mastered ethics shopping, ethics bluewashing, ethics lobbying, ethics dumping, and ethics shirking [33].

Within such an ingrained tradition, even the most thoughtful scholar can find it challenging to pursue work outside the frame of the "tradition". Subsequently, radical ethics that challenge deeply ingrained traditions need to be incentivised and rewarded in order to bring about a shift in culture that centres justice and the welfare of disproportionately impacted communities. We urge the machine learning community to pay close attention to the direct and indirect impact of our work on society, especially on vulnerable groups. Awareness of the historical antecedents and the contextual and political dimensions of current work is imperative in this regard. We hope this work contributes to raising awareness and adds to a continued discussion of ethics and justice in machine learning.

References

[1] Collaborative art tool for discovering images. https://ganbreeder.app, 2019. [Online; accessed 9-8-2019].
[2] Face search • PimEyes. https://pimeyes.com/en/, May 2020. (Accessed on 05/04/2020).
[3] Md Zahangir Alom, Tarek M Taha, Christopher Yakopcic, Stefan Westberg, Paheding Sidike, Mst Shamima Nasrin, Brian C Van Esesn, Abdul A S Awwal, and Vijayan K Asari. The history began from AlexNet: A comprehensive survey on deep learning approaches. arXiv preprint arXiv:1803.01164, 2018.
[4] Emily Bazelon. Nazi anatomy history: The origins of conservatives' anti-abortion claims that rape can't cause pregnancy. http://www.slate.com/articles/life/history/2013/11/nazi_anatomy_history_the_origins_of_conservatives_anti_abortion_claims_that.html, Nov 2013. (Accessed on 06/16/2020).
[5] Ruha Benjamin. Race after technology: Abolitionist tools for the new Jim code. John Wiley & Sons, 2019.
[6] Lucas Beyer, Olivier J. Hénaff, Alexander Kolesnikov, Xiaohua Zhai, and Aäron van den Oord. Are we done with ImageNet?, 2020.
[7] Abeba Birhane and Fred Cummins. Algorithmic injustices: Towards a relational ethics. arXiv preprint arXiv:1912.07376, 2019.
[8] Colin Blain, Margaret Mackay, and Judith Tanner. Informed consent: the global picture. British Journal of Perioperative Nursing (United Kingdom), 12(11):402–407, 2002.
[9] Geoffrey C Bowker and Susan Leigh Star. Sorting things out: Classification and its consequences. MIT Press, 2000.
[10] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
[11] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
[12] Alexander L Brown, Jonathan Meer, and J Forrest Williams. Why do people volunteer? An experimental analysis of preferences for time donations. Management Science, 65(4):1455–1468, 2019.
[13] Claudia Bullerjahn, Katharina Heller, and Jan Hoffmann. How masculine is a flute? A replication study on gender stereotypes and preferences for musical instruments among young children. In Proceedings of the 14th International Conference on Music Perception and Cognition, pages 5–9, 2016.
[14] Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency, pages 77–91, 2018.
[15] Emma Carroll and Jessica Coates. The school girl, the billboard, and Virgin: The Virgin Mobile case and the use of Creative Commons licensed photographs by commercial entities. Knowledge policy for the 21st century. A legal perspective, pages 181–204, 2011.
[16] François Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1251–1258, 2017.
[17] Creative Commons. Chang v. Virgin Mobile - Creative Commons. https://wiki.creativecommons.org/wiki/Chang_v._Virgin_Mobile, Jun 2013. (Accessed on 06/03/2020).
[18] Susan Corbett. Creative Commons licences: A symptom or a cause? Available at SSRN 2028726, 2009.
[19] Susan Corbett. Creative Commons licences, the copyright regime and the online community: Is there a fatal disconnect? The Modern Law Review, 74(4):503–531, 2011.
[20] Kate Crawford and Trevor Paglen. Excavating AI. https://www.excavating.ai/, Sep 2019. (Accessed on 04/30/2020).
[21] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[22] Jiankang Deng, Jia Guo, Xue Niannan, and Stefanos Zafeiriou. ArcFace: Additive angular margin loss for deep face recognition. In CVPR, 2019.
[23] Jiankang Deng, Jia Guo, Zhou Yuxiang, Jinke Yu, Irene Kotsia, and Stefanos Zafeiriou. RetinaFace: Single-stage dense face localisation in the wild. In arXiv, 2019.
[24] Executive departments and agencies of the federal government of the United States. eCFR — Code of Federal Regulations. https://www.ecfr.gov/cgi-bin/text-idx?SID=d387165baf23de2b80af8ea39e2addad&mc=true&node=se45.1.46_1116&rgn=div8, Jun 2020. (Accessed on 06/02/2020).
[25] Catherine D'Ignazio and Lauren F Klein. Data feminism. MIT Press, 2020.
[26] Fabio Henrique Kiyoiti dos Santos Tanaka and Claus Aranha. Data augmentation using GANs. Proceedings of Machine Learning Research XXX, 1:16, 2019.
[27] Chris Dulhanty and Alexander Wong. Auditing ImageNet: Towards a model-driven framework for annotating demographic attributes of large-scale image datasets. arXiv preprint arXiv:1905.01347, 2019.
[28] Chris Dulhanty and Alexander Wong. Investigating the impact of inclusion in face recognition training data on individual face identification, 2020.
[29] Robert Eaglestone. One and the same? Ethics, aesthetics, and truth. Poetics Today, 25(4):595–608, 2004.
[30] Editorial. Time to discuss consent in digital-data studies. https://www.nature.com/articles/d41586-019-02322-z, July 2019. (Accessed on 06/02/2020).
[31] Virginia Eubanks. Automating inequality: How high-tech tools profile, police, and punish the poor. St. Martin's Press, 2018.
[32] Liyue Fan. Image pixelization with differential privacy. In IFIP Annual Conference on Data and Applications Security and Privacy, pages 148–162. Springer, 2018.
[33] Luciano Floridi. Translating principles into practices of digital ethics: five risks of being unethical. Philosophy & Technology, 32(2):185–193, 2019.
[34] Bedapudi Praneeth and Gant Laborde. NSFW detection machine learning model. https://github.com/GantMan/nsfw_model, Jan 2019. (Accessed on 05/31/2020).

[35] Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. Datasheets for datasets. arXiv preprint arXiv:1803.09010, 2018.
[36] Hila Gonen and Yoav Goldberg. Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 609–614, 2019.
[37] Mary L Gray and Siddharth Suri. Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. Eamon Dolan Books, 2019.
[38] Jia Guo and Jiankang Deng. deepinsight/insightface: Face analysis project on MXNet. https://github.com/deepinsight/insightface, May 2020. (Accessed on 05/31/2020).
[39] Herkko Hietanen. Creative Commons Olympics: How big media is learning to license from amateur authors. J. Intell. Prop. Info. Tech. & Elec. Com. L., 2:50, 2011.
[40] Kashmir Hill. The secretive company that might end privacy as we know it, 2020.
[41] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
[42] Elizabeth C Hirschman. Consumers and their animal companions. Journal of Consumer Research, 20(4):616–632, 1994.
[43] Google Inc. Dataset search. https://datasetsearch.research.google.com/, Sep 2018. (Accessed on 06/17/2020).
[44] Khari Johnson. ACLU sues facial recognition startup Clearview AI for privacy and safety violations | VentureBeat, May 2020. (Accessed on 06/02/2020).
[45] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
[46] Maximilian Kasy and Rediet Abebe. Fairness, equality, and power in algorithmic decision making. Technical report, Working paper, 2020.
[47] Matthew Kay, Cynthia Matuszek, and Sean A Munson. Unequal representation and gender stereotypes in image search results for occupations. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 3819–3828, 2015.
[48] Os Keyes. The misgendering machines: Trans/HCI implications of automatic gender recognition. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW):1–22, 2018.
[49] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[50] Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Tom Duerig, et al. The Open Images dataset V4: Unified image classification, object detection, and visual relationship detection at scale. arXiv preprint arXiv:1811.00982, 2018.
[51] Fei Fei Li and Jia Deng. Where have we been? Where are we going? http://image-net.org/challenges/talks_2017/imagenet_ilsvrc2017_v1.0.pdf, Sep 2017. (Accessed on 05/01/2020).
[52] Clare McGlynn, Erika Rackley, and Ruth Houghton. Beyond revenge porn: The continuum of image-based sexual abuse. Feminist Legal Studies, 25(1):25–46, 2017.
[53] Sachit Menon, Alexandru Damian, Shijia Hu, Nikhil Ravi, and Cynthia Rudin. PULSE: Self-supervised photo upsampling via latent space exploration of generative models, 2020.
[54] Ryan Merkley. Use and fair use: Statement on shared images in facial recognition AI - Creative Commons, Mar 2019. (Accessed on 06/03/2020).
[55] Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. Model cards for model reporting. arXiv preprint arXiv:1810.03993, 2018.
[56] S Naidoo. Informed consent for photography in dental practice: communication. South African Dental Journal, 64(9):404–406, 2009.
[57] Arvind Narayanan and Vitaly Shmatikov. Robust de-anonymization of large sparse datasets. In 2008 IEEE Symposium on Security and Privacy (SP 2008), pages 111–125. IEEE, 2008.
[58] Safiya Umoja Noble. Algorithms of oppression: How search engines reinforce racism. NYU Press, 2018.
[59] Cathy O'Neil. Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books, 2016.
[60] Seymour A Papert. The Summer Vision Project. AIM-100, 1966.
[61] Xingchao Peng, Ben Usman, Neela Kaushik, Dequan Wang, Judy Hoffman, and Kate Saenko. VisDA: A synthetic-to-real benchmark for visual domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 2021–2026, 2018.
[62] Jeffrey Pennington, Richard Socher, and Christopher D Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
[63] PimEyes. Face search • PimEyes. https://pimeyes.com/en/, Jun 2020. (Accessed on 06/03/2020).
[64] Vinay Uday Prabhu, Sanghyun Han, Dian Ang Yap, Mihail Douhaniaris, Preethi Seshadri, and John Whaley. Fonts-2-handwriting: A seed-augment-train framework for universal digit classification. arXiv preprint arXiv:1905.08633, 2019.
[65] PyTorch. torchvision.datasets — PyTorch 1.5.0 documentation. https://pytorch.org/docs/stable/torchvision/datasets.html, Jun 2020. (Accessed on 06/17/2020).
[66] Michael Ramirez. "My dog's just like me": Dog ownership as a gender display. Symbolic Interaction, 29(3):373–391, 2006.
[67] Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do ImageNet classifiers generalize to ImageNet? arXiv preprint arXiv:1902.10811, 2019.
[68] Luc Rocher, Julien M Hendrickx, and Yves-Alexandre De Montjoye. Estimating the success of re-identifications in incomplete datasets using generative models. Nature Communications, 10(1):1–9, 2019.

[69] Rasmus Rothe, Radu Timofte, and Luc Van Gool. Deep expectation of real and apparent age from a single image without facial landmarks. International Journal of Computer Vision, 126(2-4):144–157, 2018.
[70] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
[71] Anantha Natarajan S. ImageScraper · PyPI. https://pypi.org/project/ImageScraper/, May 2015. (Accessed on 06/17/2020).
[72] Anubhav Sachan. bingscraper · PyPI. https://pypi.org/project/bingscraper/, July 2018. (Accessed on 06/17/2020).
[73] Shreya Shankar, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, and D Sculley. No classification without representation: Assessing geodiversity issues in open data sets for the developing world. arXiv preprint arXiv:1711.08536, 2017.
[74] Congzheng Song, Thomas Ristenpart, and Vitaly Shmatikov. Machine learning models that remember too much. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 587–601, 2017.
[75] Susan Leigh Star and Geoffrey C Bowker. Enacting silence: Residual categories as a challenge for ethics, information systems, and communication. Ethics and Information Technology, 9(4):273–280, 2007.
[76] Lucy Suchman. Human-machine reconfigurations: Plans and situated actions. Cambridge University Press, 2007.
[77] Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta. Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE International Conference on Computer Vision, pages 843–852, 2017.
[78] TensorFlow. TensorFlow datasets. https://www.tensorflow.org/datasets, Jun 2020. (Accessed on 06/17/2020).
[79] Antonio Torralba, Rob Fergus, and William T Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1958–1970, 2008.
[80] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Andrew Ilyas, and Aleksander Madry. From ImageNet to image classification: Contextualizing progress on benchmarks, 2020.
[81] Amol Umrale. imagebot · PyPI. https://pypi.org/project/imagebot/, July 2015. (Accessed on 06/17/2020).
[82] Haohan Wang, Songwei Ge, Eric P. Xing, and Zachary C. Lipton. Learning robust global representations by penalizing local predictive power, 2019.
[83] Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, and Alexei A Efros. Dataset distillation. arXiv preprint arXiv:1811.10959, 2018.
[84] Paul Weindling. The origins of informed consent: the international scientific commission on medical war crimes, and the Nuremberg code. Bulletin of the History of Medicine, pages 37–71, 2001.
[85] Sarah Myers West, Meredith Whittaker, and Kate Crawford. Discriminating systems. https://ainowinstitute.org/discriminatingsystems.html, 2019.
[86] Wikipedia. Streisand effect - Wikipedia. https://en.wikipedia.org/wiki/Streisand_effect, April 2020. (Accessed on 04/29/2020).
[87] Benjamin Wilson, Judy Hoffman, and Jamie Morgenstern. Predictive inequity in object detection. arXiv preprint arXiv:1902.11097, 2019.
[88] Elizabeth R Wrape, Alexandra L Dittloff, and Jennifer L Callahan. Gender and musical instrument stereotypes in middle school children: Have trends changed? Update: Applications of Research in Music Education, 34(3):40–47, 2016.
[89] Baoyuan Wu, Weidong Chen, Yanbo Fan, Yong Zhang, Jinlong Hou, Jie Liu, and Tong Zhang. Tencent ML-Images: A large-scale multi-label image database for visual representation learning. IEEE Access, 7:172683–172693, 2019.
[90] Blaise Aguera y Arcas, Margaret Mitchell, and Alexander Todorov. Physiognomy's new clothes. Medium (6 May 2017), online: https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a, 2017.
[91] Kaiyu Yang, Klint Qinami, Li Fei-Fei, Jia Deng, and Olga Russakovsky. Towards fairer datasets: Filtering and balancing the distribution of the people subtree in the ImageNet hierarchy. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 547–558, 2020.
[92] Kaiyu Yang, Klint Qinami, Li Fei-Fei, Jia Deng, and Olga Russakovsky. Towards fairer datasets: Filtering and balancing the distribution of the people subtree in the ImageNet hierarchy. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 547–558, 2020.
[93] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2017.