
Large image datasets: A pyrrhic win for computer vision?

Anonymous submission

Paper ID

Abstract

In this paper we investigate problematic practices and consequences of large scale vision datasets. We examine broad issues such as the question of consent and justice as well as specific concerns such as the inclusion of verifiably pornographic images in datasets. Taking the ImageNet-ILSVRC-2012 dataset as an example, we perform a cross-sectional model-based quantitative census covering factors such as age, gender, NSFW content scoring, class-wise accuracy, human-cardinality-analysis, and the semanticity of the image class information in order to statistically investigate the extent and subtleties of ethical transgressions. We then use the census to help hand-curate a look-up-table of images in the ImageNet-ILSVRC-2012 dataset that fall into the categories of verifiably pornographic, shot in a non-consensual setting (up-skirt), beach voyeuristic, and exposed private parts. We survey the landscape of harm and threats that both society broadly and individuals face due to uncritical and ill-considered dataset curation practices. We then propose possible courses of correction and critique the pros and cons of these. We have duly open-sourced all of the code and the census meta-datasets generated in this endeavor for the computer vision community to build on. By unveiling the severity of the threats, our hope is to motivate the constitution of mandatory Institutional Review Boards (IRB) for large scale dataset curation processes.

1. Introduction

Born from World War II and the haunting and despicable practices of Nazi era experimentation [4], the 1947 Nuremberg code [84] and the subsequent 1964 Helsinki declaration [30] helped establish the doctrine of Informed Consent, which builds on the fundamental notions of human dignity and agency to control dissemination of information about oneself. This doctrine has shepherded data collection endeavors in the medical and psychological sciences concerning human subjects, including photographic data [8, 56], for the past several decades. A less stringent version of informed consent, broad consent, proposed in 45 CFR 46.116(d) of the Revised Common Rule [24], has been recently introduced that still affords the basic safeguards towards protecting one's identity in large scale databases. However, in the age of Big Data, the fundamentals of informed consent, privacy, and agency of the individual have gradually been eroded. Institutions, academia, and industry alike amass millions of images of people without consent, and often for unstated purposes, under the guise of anonymization, a claim that is both ephemeral [57, 68] and vacuous [30]. As can be seen in Table 1, several tens of millions of images of people are found in peer-reviewed literature. These images are obtained without consent or awareness of the individuals, or IRB approval for collection. In Section 5-B of [79], for instance, the authors nonchalantly state "As many images on the web contain pictures of people, a large fraction (23%) of the 79 million images in our dataset have people in them". With this background, we now focus on one of the most celebrated and canonical large scale image datasets: the ImageNet dataset.

1.1. ImageNet: A brief overview

The emergence of the ImageNet dataset [21] is widely considered a pivotal moment² in the deep learning revolution that transformed computer vision (CV), and artificial intelligence (AI) in general. Prior to ImageNet, computer vision and image processing researchers trained image classification models on small datasets such as CalTech101 (9k images), PASCAL-VOC (30k images), LabelMe (37k images), and the SUN (131k images) dataset (see slide-37 in [51]). ImageNet, with over 14 million images spread across 21,841 synsets, replete with 1,034,908 bounding box annotations, brought in an aspect of scale that was previously missing. A subset of 1.2 million images across 1000 classes was carved out from this dataset to form the ImageNet-1k dataset (popularly called ILSVRC-2012), which formed the basis for the Task-1: classification challenge in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). This soon became widely touted as the Computer Vision Olympics³. The vastness of this dataset allowed a Convolutional Neural Network (CNN) with 60 million parameters [49], trained by the SuperVision team from University of Toronto, to usher in the rebirth of the CNN era (see [3]), which is now widely dubbed the AlexNet moment in AI.

²"The data that transformed AI research—and possibly the world": https://bit.ly/2VRxx3L
³https://engineering.missouri.edu/2014/01/team-takes-top-rankings-in-computer-vision-olympics/


Dataset                          Number of images    Number of categories    Number of consensual
                                 (in millions)       (in thousands)          images
JFT-300M ([41])                  300+                18                      0
Open Images ([50])               9                   20                      0
Tiny-Images ([79])               79                  76                      0
Tencent-ML ([89])                18                  11                      0
ImageNet-(21k, 11k, 1k) ([70])   (14, 12, 1)         (22, 11, 1)             0
Places ([93])                    11                  0.4                     0

Table 1: Large scale image datasets containing people's images

Although ImageNet was created over a decade ago, it remains one of the most influential and powerful image databases available today. Its power and magnitude is matched by its unprecedented societal impact. Although an a posteriori audit might seem redundant a decade after its creation, ImageNet's continued significance and the culture it has fostered for other large scale datasets warrants an ongoing critical dialogue.

From the questionable ways images were sourced, to troublesome labeling of people in images, to the downstream effects of training AI models using such images, ImageNet and large scale vision datasets (LSVD) in general constitute a Pyrrhic win for computer vision. We argue that this win has come at the expense of harm to minoritized groups and has further aided the gradual erosion of privacy, consent, and agency of both the individual and the collective.

The rest of this paper is structured as follows. In section 2, we cover related work that has explored the ethical concerns that arise with LSVD. In section 3, we describe the landscape of both the immediate and long term threats that individuals and society as a whole encounter due to ill-considered LSVD curation. In Section 4, we propose a set of solutions which might assuage some of the concerns raised in section 3. In Section 5, we present a template quantitative auditing procedure using the ILSVRC2012 dataset as an example and describe the data assets we have curated for the computer vision community to build on. We conclude with broad reflections on LSVDs, society, ethics, and justice.

2. Background and related work

The very declaration of a taxonomy brings some things into existence while rendering others invisible [9]. A gender classification system that conforms to essentialist binaries, for example, operationalizes gender in a cis-centric way, resulting in exclusion of non-binary and transgender people [48]. Categories simplify and freeze nuanced and complex narratives, obscuring the political and moral reasoning behind a category. Over time, messy and contingent histories hidden behind a category are forgotten and trivialized [75]. With the adoption of taxonomy sources, image datasets inherit seemingly invisible yet profoundly consequential shortcomings. The dataset creation process, its implications for ML systems, and, subsequently, the societal impact of these systems have attracted a substantial body of critique. We categorize this body of work into two groups that complement one another. While the first group can be seen as concerned with the broad downstream effects, the other concentrates mainly on the dataset creation process itself.

2.1. Broad critiques:

The absence of critical engagement with canonical datasets disproportionately negatively impacts women, racial and ethnic minorities, and vulnerable individuals and communities at the margins of society [7]. For example, image search results both exaggerate stereotypes and systematically under-represent women in search results for occupations [47]; object detection systems designed to detect pedestrians display higher error rates for recognition of demographic groups with dark skin tones [87]; and gender classification systems show disparities in image classification accuracy, where lighter-skin males are classified with the highest accuracy while darker-skin females suffer the most misclassification [14]. Gender classification systems that lean on binary and cis-genderist constructs operationalize gender in a trans-exclusive way, resulting in tangible harm to trans people [48]. With a persistent trend where minoritized and vulnerable individuals and communities often disproportionately suffer the negative outcomes of ML systems, [25] have called for a shift in rethinking ethics not just as a fairness metric to mitigate the narrow concept of bias, but as a practice that results in justice for the most negatively impacted. Similarly, [46] contend that perspectives that acknowledge existing inequality and aim to redistribute power are pertinent, as opposed to fairness-based perspectives. Such an understanding of ethics as justice then requires a focus beyond 'bias' and 'fairness' in LSVDs, and a questioning of how images are sourced, how they are labelled, and what it means for models to be trained on them.

One of the most thorough investigations in this regard comes from [20]. In this recent work, Crawford and Paglen present an in-depth critical examination of ImageNet, including the dark and troubling results of classifying people as if they were objects. Offensive and derogatory labels that perpetuate historical and current prejudices are assigned to people's actual images. The authors emphasise that not only are images that were scraped across the web appropriated as data for computer vision tasks, but the very act of assigning labels to people based on physical features also raises fundamental concerns around reviving long-discredited pseudo-scientific ideologies of physiognomy [90].

2.2. Critiques of the curation phase:

Within the dataset creation process, taxonomy sources pass on their limitations and taken-for-granted assumptions. The adoption of underlying structures presents a challenge where, without critical examination of the architecture, ethically dubious taxonomies are inherited. This has been one of the main challenges for ImageNet, given that the dataset is built on the backbone of WordNet's structure. Acknowledging some of the problems, the authors from the ImageNet team did recently attempt to address [91] the stagnant concept vocabulary of WordNet. They admitted that only 158 out of the 2,832 existing synsets should remain in the person sub-tree⁴. Nonetheless, some serious problems remain untouched. This motivates us to address in greater depth the overbearing presence of the WordNet effect on image datasets.

2.3. The WordNet Effect

ImageNet is not the only large scale vision dataset that has inherited the shortcomings of the WordNet taxonomy. The 80 Million Tiny Images dataset [79], which grandfathered the CIFAR-10/100 datasets, also took the same path. Unlike ImageNet, this dataset has never been audited or scrutinized, and some of the sordid results of the inclusion of ethnophaulisms in its label space are displayed in Figure 1. The figure shows both the number of images in a subset of the offensive classes (sub-figure (a)) and exemplar images (sub-figure (b)) from the noun-class labelled n****r⁵, a fact that serves as a stark reminder that a great deal of work remains to be done by the ML community at large.

Figure 1: Results from the 80 Million Tiny Images dataset. (a) Class-wise counts of the offensive classes (including classes labelled with ethnic slurs as well as molester, pedophile, rape_suspect, and child_molester). (b) Sample images from the class labelled n****r.

And finally, the labeling and validation stages of the curation process also present ethical challenges. Recent works such as [37] have explored the intentionally hidden labour, which they term Ghost Work, behind such tasks. Image labeling and validation requires the use of crowd-sourced platforms such as MTurk, often contributing to the exploitation of underpaid and undervalued gig workers. Within the topic of image labeling but with a different focus, recent work such as [80] and [6] has examined the shortcomings of the human-annotation procedures used during ImageNet dataset curation. These shortcomings, the authors point out, include the single-label-per-image procedure, which causes problems given that real-world images often contain multiple objects, and inaccuracies due to "overly restrictive label proposals".

⁴In order to prune all the nodes, they also took into account the imageability of the synsets and the skewed representation in the images pertaining to the image retrieval phase.
⁵Due to its offensiveness, we have censored this word here; however, it remains uncensored on the website at the time of writing.

3. The threat landscape

In this section, we survey the landscape of harm and threats, both immediate and long term, that emerge with dataset curation practices carried out in the absence of careful ethical considerations and anticipation of negative societal consequences. Our goal here is to bring awareness to the ML and AI community regarding the severity of these threats and to motivate a sense of urgency to act on them. We hope this will result in practices such as the mandatory constitution of Institutional Review Boards (IRB) for large scale dataset curation processes.

1: The rise of reverse image search engines, loss of privacy, and the blackmailing threat: Large image datasets, when built without careful consideration of societal implications, pose a threat to the welfare and well-being of individuals. Most often, vulnerable people and marginalised populations pay a disproportionately high price. Reverse image search engines that allow face search, such as [2]⁶, have gotten remarkably and worryingly efficient in the past year. For a small fee, anyone can use their portal or their API to run an automated process⁷ to uncover the "real-world" identities of the humans of the ImageNet dataset. In societies where sex work is socially condemned or legally criminalized, for example, re-identification of a sex worker through image search bears a real danger for the individual victim. Harmful discourses such as revenge porn are part of a broader continuum of image-based sexual abuse [52]. To further emphasize this specific point, many of the images in classes such as maillot, brassiere, and bikini contain images of beach voyeurism and other non-consensual cases of image gathering (covered in detail in Section 5). We were able to (unfortunately) easily map the victims, most of whom are women, in these pictures to "real-world" identities of people belonging to a myriad of backgrounds, including teachers, medical professionals, and academic professors, using reverse image search engines such as [63]. Paying heed to the possibility of the Streisand effect⁸, we took the decision not to divulge any further quantitative or qualitative details on the extent or the location of such images in the dataset, besides alerting the curators of the dataset(s) and making a passionate plea to the community not to underestimate the severity of this particular threat vector.

2: The emergence of even larger and more opaque datasets: The attempt to build computer vision systems has been gradual and can be traced as far back as 1966 to Papert's The Summer Vision Project [60], if not earlier. However, ImageNet, with its vast amounts of data, has not only erected a canonical landmark in the history of AI, it has also paved the way for even bigger, more powerful, and suspiciously opaque datasets. The lack of scrutiny of the ImageNet dataset by the wider computer vision community has only served to embolden institutions, both academic and commercial, to build far bigger datasets without scrutiny (see Table 1). Various highly cited and celebrated papers in recent years [10, 16, 41, 77], for example, have used the unspoken unicorn amongst large scale vision datasets, that is, the JFT-300M dataset [?]⁹. This dataset is inscrutable and operates in the dark, to the extent that there has not even been official communication as to what JFT-300M stands for. All that the ML community knows is that it purportedly boasts more than 300M images spread across 18k categories. The open source variant(s) of this, the Open Images V4-5-6 dataset [50], contains a subset of 30.1M images covering 20k categories (and also has an extension dataset with 478k crowd-sourced images across more than 6000 categories). While parsing through some of the images, we found verifiably¹⁰ non-consensual images of children that were siphoned off of flickr, hinting towards the prevalence of similar issues for JFT-300M, from which this was sourced. Besides the other large datasets in Table 1, we have cases such as the CelebA-HQ dataset, which is actually a heavily processed dataset whose grey-box curation process only appears in Appendix-C of [45], where no clarification is provided on the "frequency based visual quality metric" used to sort the images based on quality. Benchmarking any downstream model trained on such an opaque, biased, and (semi-)synthetic dataset will only result in controversial scenarios such as [53], where the authors had to hurriedly incorporate addendums admitting biased results. Hence, it is important to reemphasize that the existence and use of such datasets bears direct and indirect impact on people, given that decision making on social outcomes increasingly leans on ubiquitously integrated AI systems trained and validated on such datasets. Yet, despite such profound consequences, critical questions such as where the data comes from or whether the images were obtained consensually are hardly considered part of the LSVD curation process.

The more nuanced and perhaps indirect impact of ImageNet is the culture that it has cultivated within the broader AI community; a culture where the appropriation of images of real people as raw material free for the taking has come to be perceived as the norm. Such a norm and lack of scrutiny have played a role in the creation of monstrous and secretive datasets without much resistance, prompting further questions such as 'what other secretive datasets currently exist hidden and guarded under the guise of proprietary assets?'

⁶For example, PimEyes: https://bit.ly/3bSKcZQ
⁷Please refer to the supplementary material for the screenshots.
⁸The Streisand effect "is a social phenomenon that occurs when an attempt to hide, remove, or censor information has the unintended consequence of further publicizing that information, often via the Internet" [86].
⁹We have decided to purposefully leave the '?' in place and plan to revisit it only after the dataset's creator(s) publish the details of its curation.
¹⁰See https://bit.ly/2y1sC7i. We performed verification with the uploader of the image via the Flickr link shared.

Current work that has sprung out of secretive datasets, such as Clearview AI [40]¹¹, points to a deeply worrying and insidious threat not only to vulnerable groups but also to the very meaning of privacy as we know it [44].

3: The Creative Commons fallacy: In May 2007 the iconic case of Chang versus Virgin Mobile: The school girl, the billboard, and virgin [17] unraveled in front of the world, leading to widespread debate on the uneasy relationship between personal privacy, consent, and image copyright, and initiating a substantial corpus of academic debate (see [15, 18, 19, 39]). A Creative Commons license addresses only copyright issues, not privacy rights or consent to use images for training. Yet, many of the efforts beyond ImageNet, including the Open Images dataset [50], have been built on top of the Creative Commons loophole that large scale dataset curation agencies interpret as a free-for-all, consent-included green flag. This, we argue, is fundamentally fallacious, as is evinced in the views presented in [54] by the Creative Commons organization, which read: "CC licenses were designed to address a specific constraint, which they do very well: unlocking restrictive copyright. But copyright is not a good tool to protect individual privacy, to address research ethics in AI development, or to regulate the use of surveillance tools employed online.". Datasets culpable of this CC-BY heist, such as MS-Celeb-1M and IBM's Diversity in Faces, have now been deleted in response to the investigations (see [28] for a survey), lending further support to the Creative Commons fallacy.

4: Blood diamond effect in models trained on this dataset: Akin to the ivory carving-illegal poaching and diamond jewelry art-blood diamond nexuses, we posit that there is a similar moral conundrum at play here that affects all downstream applications entailing models trained using a tainted dataset. Often, these transgressions may be rather subtle. In this regard, we pick an exemplar field of application that on the surface appears to be a low risk application area: neural generative art. Neural generative art created using tools such as BigGAN [11] and Artbreeder [1], which in turn use pre-trained deep-learning models trained on ethically dubious datasets, bears the downstream burden¹² of the problematic residues from non-consensual image siphoning, thus running afoul of the Wittgensteinian edict of ethics and aesthetics being one and the same [29]. We also note that there is a privacy-leakage facet to this downstream burden. In the context of face recognition, works such as [74] have demonstrated that CNNs with high predictive power unwittingly accommodate accurate extraction of subsets of the facial images that they were trained on, thus abetting dataset leakage.

5: Perpetuation of unjust and harmful stereotypes: Finally, zooming out and taking a broad perspective allows us to see that the very practice of embarking on a classification, taxonomization, and labeling task endows the classifier with the power to decide what is a legitimate, normal, or correct way of being, acting, and behaving in the social world [9]. For any given society, what comes to be perceived as normal or acceptable is often dictated by dominant ideologies. Systems of classification, which operate within a power-asymmetrical social hierarchy, necessarily embed and amplify historical and cultural prejudices, injustices, and biases [75]. In western societies, "desirable", "positive", and "normal" characteristics and ways of being are constructed and maintained in ways that align with the dominant narrative, giving advantage to those that fit the status quo. Groups and individuals on the margins, on the other hand, are often perceived as the "outlier" and the "deviant". Image classification and labelling practices, without the necessary precautions and awareness of these problematic histories, pick up these stereotypes and prejudices and perpetuate them [31, 58, 59]. AI systems trained on such data amplify and normalize these stereotypes, inflicting unprecedented harm on those that are already on the margins of society. While the ImageNet team did initiate strong efforts towards course-correction [92], the Tiny Images dataset still contains harmful slurs and offensive labels. And worse, we remain in the dark regarding the secretive and opaque LSVDs.

4. Candidate solutions: The path ahead

Decades of work within the fields of Science and Technology Studies (STS) and the Social Sciences show that there is no single straightforward solution to most of the wider social and ethical challenges that we have discussed [5, 25, 76]. These challenges are deeply rooted in social and cultural structures and form part of the fundamental social fabric. Feeding AI systems on the world's beauty, ugliness, and cruelty, but expecting it to reflect only the beauty is a fantasy [5]. These challenges and tensions will exist as long as humanity continues to operate. Given the breadth of the challenges that we have faced, any attempt at a quick fix risks concealing the problem and providing a false sense of solution. The idea of a complete removal of biases, for example, might in reality be simply hiding them out of sight [36]. Furthermore, many of the challenges (bias, discrimination, injustice) vary with context, history, and place, and are concepts that continually shift and change, constituting a moving target [7]. The pursuit of a panacea in this context, therefore, is not only unattainable but also misguided.

¹¹Clearview AI is a US based privately owned technology company that provides facial recognition services to various customers, including North American law enforcement agencies. With more than 3 billion photos scraped from the web, the company operated in the dark until its services to law enforcement were reported in late 2019.
¹²Please refer to the supplementary material where we demonstrate one such real-world experiment entailing unethically generated neural art, replete with responses obtained from human critics as to what they felt about the imagery being displayed.

Dataset audit card – ImageNet (Figure 5)

Census audit:
• 83,436 images with 101,070 to 132,201 persons (models: DEX ([69]), InsightFace ([38]))
• Mean age (male): 33.24; mean age (female): 25.58 (RetinaFace [23], ArcFace [22])
• Confirmed misogynistic images: 62. Number of classes with infants: 30

Metrics: class-level mean count ($\eta_c^{(A)}$), mean gender skewness ($\xi_c^{(A)}$) and mean age ($\alpha_c^{(A)}$):

$$\eta_c^{(A)} = \frac{1}{N_c}\sum_{i=1}^{N_c}\mathbb{I}[\phi_i], \qquad \alpha_c^{(A)} = \frac{1}{N_c}\sum_{i=1}^{N_c}\mathbb{I}[\phi_i]\,a_i, \qquad \xi_c^{(A)} = \frac{1}{N_c}\sum_{i=1}^{N_c}\mathbb{I}[\phi_i]\left(\frac{g_i-\mu_c^{(A)}}{\sigma_c^{(A)}}\right)^{3},$$

where $\phi_i = 1$ if a face is present in the $i$-th image and $0$ otherwise, $a_i$ and $g_i$ are the age and gender estimates for that image, and $\mu_c^{(A)}$ and $\sigma_c^{(A)}$ are the mean and standard deviation of the gender estimates of the images in class $c$ as estimated by algorithm $(A)$.

Figure 2: Class-wise cross-categorical scatter-plots across the cardinality, age and gender scores. (Agreement between the InsightFace and DEX estimates: number of faces, Kendall's tau 0.746, Spearman's r 0.895, Pearson's r 0.973; gender skewness, 0.557, 0.739, 0.723; mean age, 0.391, 0.556, 0.567.)

Figure 3: Statistics and locationing of the hand-labelled images. (Cross-tabulated counts of the hand-labelled images across the categories beach_voyeur, upskirt, exposed private parts, and verifiably pornographic for the classes bikini/two-piece, brassiere, holster, kimono, maillot/tank suit, maillot, miniskirt, and shower cap, alongside their mean NSFW scores: 0.859, 0.61, 0.058, 0.092, 0.768, 0.802, 0.619, and 0.13 respectively.)

Figure 4: Known human co-occurrence based gender-bias analysis. (Mean and skewness of the gender estimates (DEX) for dog-breed groups and musical-instrument classes, split into women-majority and men-majority classes.)

Figure 5: Dataset audit card for the ImageNet dataset.
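As a companion to the audit card, the following is a minimal sketch (in Python/pandas) of how the class-level census metrics defined above could be computed from a per-image annotation table. The table layout and column names here (a faces.csv holding one face-presence flag, one age estimate, and one gender score per image) are illustrative assumptions, not the schema of our released meta-datasets.

```python
import pandas as pd

# Hypothetical per-image annotation table with columns:
#   wnid          : WordNet ID of the ImageNet class
#   face_present  : 1 if the annotation model found a face, else 0 (phi_i)
#   age_estimate  : estimated age for the image (a_i)
#   gender_score  : scalar gender estimate for the image (g_i)
df = pd.read_csv("faces.csv")

def census_metrics(group: pd.DataFrame) -> pd.Series:
    """Compute eta (mean count), alpha (mean age) and xi (gender skewness)
    for one class, following the audit-card definitions."""
    n_c = len(group)                                   # N_c: images in the class
    phi = group["face_present"]                        # indicator I[phi_i]
    eta = phi.sum() / n_c                              # class-level mean count
    alpha = (phi * group["age_estimate"]).sum() / n_c  # class-level mean age
    g = group.loc[phi == 1, "gender_score"]
    mu, sigma = g.mean(), g.std()
    # third standardized moment of the gender estimates, normalized by N_c
    xi = (((g - mu) / sigma) ** 3).sum() / n_c if sigma > 0 else 0.0
    return pd.Series({"eta": eta, "alpha": alpha, "xi": xi})

class_census = df.groupby("wnid").apply(census_metrics)
print(class_census.head())
```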

Having said that, there are remedies that can be applied to overcome the specific harms that we have discussed in this paper, which eventually potentially play constituent roles in improving the wider and bigger social and structural issues in the long run.

4.1. Remove, replace, and open strategy

In [92], the authors concluded that within the person sub-tree of the ImageNet dataset, 1593 of the 2832 people categories were potentially offensive labels and planned to "remove all of these from ImageNet". We strongly advocate a similar path for the offensive noun classes in the Tiny Images dataset that we have identified in section 2.1, as well as for images that fall into the categories of verifiably pornographic, shot in a non-consensual setting (up-skirt), beach voyeuristic, and exposed genitalia in the ImageNet-ILSVRC-2012 dataset. In cases where the image category is retained but the images are not, the option of replacement with consensually shot, financially compensated images arises. It is possible that some of the people in these images might come forward to consent and contribute their images in exchange for fair financial compensation, credit, or out of sheer altruism [12]. We re-emphasize that our consternation focuses on the non-consensual aspect of the images and not on the category-class and the ensuing content of the images in it. This solution, however, brings forth further questions: does this make image datasets accessible only to those who can afford it? Will we end up with a pool of images with predominantly financially disadvantaged participants?

Science is self-correcting so long as it is accessible and open to critical engagement, and this is what we have done given what we know of these LSVDs. The secretive and opaque LSVDs tread a dangerous territory, given that they directly or indirectly impact society. We strongly contend that making them open and accessible is a crucial first step towards an ethical scientific endeavour.

4.2. Differentially private obfuscation of the faces

This path entails harnessing techniques such as DP-Blur [32], with quantifiable privacy guarantees, to obfuscate the identity of the humans in the image. The Inclusive Images challenge [73], for example, already incorporated blurring during dataset curation¹³ and addressed the downstream effects surrounding the change in predictive power of models trained on the blurred versions of the curated dataset. We believe that replication of this template, which also clearly included avenues for recourse in case of an erroneously non-blurred image being sighted by a researcher, will be a step in the right direction for the community at large.

¹³https://www.kaggle.com/c/inclusive-images-challenge
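To make the obfuscation route of Section 4.2 concrete, below is a minimal sketch of differentially private pixelization in the spirit of DP-Blur/DP-Pix [32]: each b x b cell of the face region is replaced by its mean value plus Laplace noise calibrated to the pixelization sensitivity. The function name and the parameter choices (b, m, eps) are ours for illustration; [32] should be consulted for the actual mechanism and its formal guarantees.

```python
import numpy as np

def dp_pixelize(region: np.ndarray, b: int = 8, m: int = 16, eps: float = 0.5) -> np.ndarray:
    """Differentially private pixelization of a greyscale face region.

    Each b x b cell is replaced by its mean pixel value plus Laplace noise with
    scale 255*m / (b*b*eps), i.e. the sensitivity of the cell mean when up to m
    pixels may differ between neighbouring images (cf. DP-Pix [32])."""
    h, w = region.shape
    out = region.astype(np.float64).copy()
    scale = 255.0 * m / (b * b * eps)
    for y in range(0, h, b):
        for x in range(0, w, b):
            cell = region[y:y + b, x:x + b]
            noisy_mean = cell.mean() + np.random.laplace(0.0, scale)
            out[y:y + b, x:x + b] = noisy_mean
    return np.clip(out, 0, 255).astype(np.uint8)

# Usage: obfuscate a detected face bounding box before release, e.g.
#   face_crop = image[y0:y1, x0:x1]            # greyscale crop of the face
#   image[y0:y1, x0:x1] = dp_pixelize(face_crop)
```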
4.3. Synthetic-to-real and Dataset distillation

The basic idea here is to utilize (or augment with) synthetic images in lieu of real images during model training. Approaches include using hand-drawn sketch images (ImageNet-Sketch [82]), using GAN-generated images [26], and techniques such as Dataset distillation [83], where a dataset or a subset of a dataset is distilled down to a few representative synthetic samples. This is a nascent field with some promising results emerging in unsupervised domain adaptation across visual domains [61] and universal digit classification [64].

4.4. Ethics-reinforced filtering during the curation

The specific ethical transgressions that emerged during our longitudinal analysis of ImageNet could have been prevented if there were explicit instructions provided to the MTurkers during the dataset curation phase to enable filtering of these images at the source (see Fig. 9 in [67] for an example). We hope ethics checks become an integral part of the User-Interface deployed during the humans-in-the-loop validation phase for future dataset curation endeavors.

4.5. Dataset audit cards

Much along the lines of model cards [55] and datasheets for datasets [35], we propose the dissemination of dataset audit cards. This allows large scale image dataset curators to publish the goals, curation procedures, known shortcomings and caveats alongside their dataset dissemination. In Figure 5, we have curated an example dataset audit card for the ImageNet dataset using the quantitative analyses carried out in Section 5.

5. Quantitative dataset auditing: ImageNet as a template

We performed a cross-categorical quantitative analysis of ImageNet to assess the extent of the ethical transgressions and the feasibility of model-annotation based approaches. This resulted in an ImageNet census, entailing both image-level as well as class-level analysis across 57 different metrics (see supplementary section) covering Count, Age and Gender (CAG), NSFW-scoring, semanticity of class labels, and accuracy of classification using pre-trained models. We have distilled the important revelations of this census into a dataset audit card presented in Figure 5. This audit also entailed a human-in-the-loop based hybrid approach that used the pre-trained-model annotations (along the lines of [27, 92]) to segment the large dataset into smaller sub-sets and hand-label the smaller subsets to generate two lists covering 62 misogynistic images and 30 image-classes with co-occurring children. We used the DEX [69] and the InsightFace [38] pre-trained models¹⁴ to generate the cardinality, gender skewness, and age-distribution results captured in Figure 2.

¹⁴While harnessing these pre-trained gender classification models, we would like to strongly emphasize that the specific models and the problems that they were intended to solve, when taken in isolation, stand on ethically dubious grounds themselves. In this regard, we strongly concur with previous work such as [85] that gender classification based on the appearance of a person in a digital image is both scientifically flawed and a technology that bears a high risk of systemic abuse.
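For readers who want to replicate this annotation pass, the sketch below shows its overall shape under simplifying assumptions: analyze_faces is a hypothetical stand-in for a pre-trained face-analysis model (DEX [69] and InsightFace [38] play this role in our audit, with different APIs), and the output CSV layout is illustrative only.

```python
from pathlib import Path
import csv

def analyze_faces(image_path: str) -> list:
    """Hypothetical wrapper around a pre-trained face-analysis model.
    Expected to return one dict per detected face, e.g.
    {"age": 31.2, "gender_score": 0.87}. Replace with actual DEX / InsightFace calls."""
    raise NotImplementedError

def build_image_level_table(ilsvrc_train_dir: str, out_csv: str) -> None:
    """Walk the class folders (one per WordNet ID) and record, for every image,
    the number of detected faces and their mean age / gender estimates."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["wnid", "image", "n_faces", "mean_age", "mean_gender"])
        for class_dir in sorted(Path(ilsvrc_train_dir).iterdir()):
            if not class_dir.is_dir():
                continue
            for img in class_dir.glob("*.JPEG"):
                faces = analyze_faces(str(img))
                ages = [fc["age"] for fc in faces]
                genders = [fc["gender_score"] for fc in faces]
                writer.writerow([
                    class_dir.name, img.name, len(faces),
                    sum(ages) / len(ages) if ages else "",
                    sum(genders) / len(genders) if genders else "",
                ])

# build_image_level_table("ILSVRC2012/train", "faces.csv")
```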

This resulted in the discovery of 83,436 images with persons, encompassing 101,070 to 132,201 individuals, thus constituting 8 to 10% of the dataset. Further, we munged together the gender, age, class semanticity¹⁵ and NSFW content flagging information from the pre-trained NSFW-MobileNet-v2 model [34] to help perform a guided search for misogynistic, consent-violating transgressions. This resulted in the discovery of 62 images¹⁶ across four categories, beach-voyeur-photography, exposed-private-parts, verifiably-pornographic, and upskirt, in the following classes: 445-bikini, 638-maillot, 639-tank suit, 655-miniskirt and 459-brassiere (see Figure 3). Lastly, we harnessed literature from areas spanning dog-ownership bias ([42, 66]) to the engendering of musical instruments ([88, 13]) to generate an analysis of subtle forms of human co-occurrence-based gender bias in Figure 4.

Captured in Table 2 are the details of the csv-formatted data assets curated for the community to build on. The CAG statistics are covered in df_insightface_stats.csv and df_audit_age_gender_dex.csv. Similarly, we have also curated NSFW scoring (df_nsfw.csv), accuracy (df_acc_classwise_resnet50.csv, df_acc_classwise_NasNet_mobile.csv) and semanticity (df_imagenet_names_umap.csv) datasets. df_census_imagenet_61.csv contains the 61 cumulative parameters for each of the 1000 classes (with their column interpretations in df_census_columns_interpretation.csv). We have duly open-sourced these meta-datasets and 14 tutorial-styled Jupyter notebooks (spanning both the ImageNet and Tiny-Images datasets) for community access¹⁷.

file_name                             shape        file_contents
df_insightface_stats.csv              (1000, 30)   24 class-wise statistical parameters obtained by running the InsightFace model ([38]) on the ImageNet dataset
df_audit_age_gender_dex.csv           (1000, 12)   11 class-wise (ordered by the wordnet-id) statistical parameters obtained from the json files (of the DEX paper) [69]
df_nsfw.csv                           (1000, 5)    The mean and std of the NSFW scores of the train and val images arranged per class (Unnamed: 0: WordNet ID of the class)
df_acc_classwise_resnet50.csv         (1000, 7)    Class-wise accuracy metrics (and the image-level predictions) obtained by running the ResNet50 model on the ImageNet train and val sets
df_acc_classwise_NasNet_mobile.csv    (1000, 7)    Class-wise accuracy metrics (and the image-level predictions) obtained by running the NasNet model on the ImageNet train and val sets
df_imagenet_names_umap.csv            (1000, 5)    2D UMAP embeddings of the GloVe vectors of the classes of the ImageNet dataset
df_census_imagenet_61.csv             (1000, 61)   The main census dataframe covering class-wise metrics across 61 parameters, all of which are explained in df_census_columns_interpretation.csv
df_census_columns_interpretation.csv  (61, 2)      The interpretations of the 61 metrics of the census dataframe above
df_hand_survey.csv                    (61, 3)      Details of the 61 images unearthed via the hand survey (do not pay heed to the 61; it is a mere coincidence)
df_classes_tiny_images_3.csv          (75846, 3)   The class_ind, class_name (WordNet noun) and n_images for the Tiny Images dataset
df_dog_analysis.csv                   (7, 4)       Breed, gender_ratio and survey results from the paper 'Breed differences in canine aggression'

Table 2: Meta datasets curated during the audit processes

¹⁵Obtained using GloVe embeddings [62] on the labels.
¹⁶Listed in df_hand_survey.csv.
¹⁷Link: https://rb.gy/zccdps
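As a usage example for the meta-datasets in Table 2, the short pandas sketch below merges the class-wise NSFW scores with the main census table and ranks classes that combine a high mean NSFW score with a high estimated human presence, roughly the guided-search heuristic described above. Apart from the "Unnamed: 0" WordNet-ID column documented in Table 2, the column names used here (mean_nsfw_train, count_faces) are illustrative assumptions; the authoritative column meanings live in df_census_columns_interpretation.csv.

```python
import pandas as pd

# Class-wise NSFW scores; per Table 2 the WordNet ID sits in column "Unnamed: 0".
nsfw = pd.read_csv("df_nsfw.csv").rename(columns={"Unnamed: 0": "wnid"})
# Main census table with the 61 class-wise metrics (see
# df_census_columns_interpretation.csv for the authoritative column meanings).
census = pd.read_csv("df_census_imagenet_61.csv")

# The join key and the metric columns below are illustrative assumptions;
# substitute the actual names documented in the interpretation file.
merged = census.merge(nsfw, on="wnid", how="inner")
candidates = (
    merged.assign(flag_score=merged["mean_nsfw_train"] * merged["count_faces"])
          .sort_values("flag_score", ascending=False)
          .head(20)
)
print(candidates[["wnid", "mean_nsfw_train", "count_faces"]])
```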
6. Conclusion and discussion

We have sought to draw the attention of the machine learning community towards the societal and ethical implications of large scale datasets, such as the problem of non-consensual images and the oft-hidden problems of categorizing people. ImageNet has been championed as one of the most incredible breakthroughs in computer vision, and AI in general. We indeed celebrate ImageNet's achievement and recognize the creators' efforts to grapple with some ethical questions. Nonetheless, ImageNet as well as other large image datasets remain troublesome. In hindsight, perhaps the ideal time to have raised ethical concerns regarding LSVD curation would have been in 1966, at the birth of The Summer Vision Project [60]. The right time after that was when the creators of ImageNet embarked on the project to "map out the entire world of objects". Nonetheless, these are crucial conversations that the computer vision community needs to engage with now, given the rapid democratization of scraping tools ([71, 72, 81]) and dataset-zoos ([43, 65, 78]). Continued silence will only serve to cause more harm than good in the future. In this regard, we have outlined a few solutions, including audit cards, that can be considered to ameliorate some of the concerns raised. We have also curated meta-datasets and open-sourced the code to carry out quantitative auditing using the ILSVRC2012 dataset as a template. However, we posit that the deeper problems are rooted in the wider structural traditions, incentives, and discourse of a field that treats ethical issues as an afterthought; a field where in the wild is often a euphemism for without consent. We are up against a system that has veritably mastered ethics shopping, ethics bluewashing, ethics lobbying, ethics dumping, and ethics shirking [33].

Within such an ingrained tradition, even the most thoughtful scholar can find it challenging to pursue work outside the frame of the "tradition". Subsequently, radical ethics that challenge deeply ingrained traditions need to be incentivised and rewarded in order to bring about a shift in culture that centres justice and the welfare of disproportionately impacted communities. We urge the machine learning community to pay close attention to the direct and indirect impact of our work on society, especially on vulnerable groups. Awareness of the historical antecedents and the contextual and political dimensions of current work is imperative in this regard. We hope this work contributes to raising awareness and adds to a continued discussion of ethics and justice in machine learning.

References

[1] Collaborative art tool for discovering images. https://ganbreeder.app, 2019. [Online; accessed 9-8-2019].
[2] Face search • PimEyes. https://pimeyes.com/en/, May 2020. (Accessed on 05/04/2020).
[3] Md Zahangir Alom, Tarek M Taha, Christopher Yakopcic, Stefan Westberg, Paheding Sidike, Mst Shamima Nasrin, Brian C Van Esesn, Abdul A S Awwal, and Vijayan K Asari. The history began from AlexNet: A comprehensive survey on deep learning approaches. arXiv preprint arXiv:1803.01164, 2018.
[4] Emily Bazelon. Nazi anatomy history: The origins of conservatives' anti-abortion claims that rape can't cause pregnancy. http://www.slate.com/articles/life/history/2013/11/nazi_anatomy_history_the_origins_of_conservatives_anti_abortion_claims_that.html, Nov 2013. (Accessed on 06/16/2020).
[5] Ruha Benjamin. Race after technology: Abolitionist tools for the new Jim code. John Wiley & Sons, 2019.
[6] Lucas Beyer, Olivier J. Hénaff, Alexander Kolesnikov, Xiaohua Zhai, and Aäron van den Oord. Are we done with ImageNet?, 2020.
[7] Abeba Birhane and Fred Cummins. Algorithmic injustices: Towards a relational ethics. arXiv preprint arXiv:1912.07376, 2019.
[8] Colin Blain, Margaret Mackay, and Judith Tanner. Informed consent: the global picture. British Journal of Perioperative Nursing (United Kingdom), 12(11):402–407, 2002.
[9] Geoffrey C Bowker and Susan Leigh Star. Sorting things out: Classification and its consequences. MIT Press, 2000.
[10] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
[11] Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
[12] Alexander L Brown, Jonathan Meer, and J Forrest Williams. Why do people volunteer? An experimental analysis of preferences for time donations. Management Science, 65(4):1455–1468, 2019.
[13] Claudia Bullerjahn, Katharina Heller, and Jan Hoffmann. How masculine is a flute? A replication study on gender stereotypes and preferences for musical instruments among young children. In Proceedings of the 14th International Conference on Music Perception and Cognition, pages 5–9, 2016.
[14] Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency, pages 77–91, 2018.
[15] Emma Carroll and Jessica Coates. The school girl, the billboard, and Virgin: The Virgin Mobile case and the use of Creative Commons licensed photographs by commercial entities. Knowledge policy for the 21st century. A legal perspective, pages 181–204, 2011.
[16] François Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1251–1258, 2017.
[17] Creative Commons. Chang v. Virgin Mobile - Creative Commons. https://wiki.creativecommons.org/wiki/Chang_v._Virgin_Mobile, Jun 2013. (Accessed on 06/03/2020).
[18] Susan Corbett. Creative Commons licences: A symptom or a cause? Available at SSRN 2028726, 2009.
[19] Susan Corbett. Creative Commons licences, the copyright regime and the online community: Is there a fatal disconnect? The Modern Law Review, 74(4):503–531, 2011.
[20] Kate Crawford and Trevor Paglen. Excavating AI. https://www.excavating.ai/, Sep 2019. (Accessed on 04/30/2020).
[21] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[22] Jiankang Deng, Jia Guo, Xue Niannan, and Stefanos Zafeiriou. ArcFace: Additive angular margin loss for deep face recognition. In CVPR, 2019.
[23] Jiankang Deng, Jia Guo, Zhou Yuxiang, Jinke Yu, Irene Kotsia, and Stefanos Zafeiriou. RetinaFace: Single-stage dense face localisation in the wild. In arXiv, 2019.
[24] Executive departments and agencies of the federal government of the United States. eCFR — Code of Federal Regulations. https://www.ecfr.gov/cgi-bin/text-idx?SID=d387165baf23de2b80af8ea39e2addad&mc=true&node=se45.1.46_1116&rgn=div8, Jun 2020. (Accessed on 06/02/2020).
[25] Catherine D'Ignazio and Lauren F Klein. Data feminism. MIT Press, 2020.
[26] Fabio Henrique Kiyoiti dos Santos Tanaka and Claus Aranha. Data augmentation using GANs. Proceedings of Machine Learning Research XXX, 1:16, 2019.
[27] Chris Dulhanty and Alexander Wong. Auditing ImageNet: Towards a model-driven framework for annotating demographic attributes of large-scale image datasets. arXiv preprint arXiv:1905.01347, 2019.
[28] Chris Dulhanty and Alexander Wong. Investigating the impact of inclusion in face recognition training data on individual face identification, 2020.
[29] Robert Eaglestone. One and the same? Ethics, aesthetics, and truth. Poetics Today, 25(4):595–608, 2004.
[30] Editorial. Time to discuss consent in digital-data studies. https://www.nature.com/articles/d41586-019-02322-z, July 2019. (Accessed on 06/02/2020).
[31] Virginia Eubanks. Automating inequality: How high-tech tools profile, police, and punish the poor. St. Martin's Press, 2018.
[32] Liyue Fan. Image pixelization with differential privacy. In IFIP Annual Conference on Data and Applications Security and Privacy, pages 148–162. Springer, 2018.
[33] Luciano Floridi. Translating principles into practices of digital ethics: five risks of being unethical. Philosophy & Technology, 32(2):185–193, 2019.
[34] Bedapudi Praneeth and Gant Laborde. NSFW detection machine learning model. https://github.com/GantMan/nsfw_model, Jan 2019. (Accessed on 05/31/2020).

[35] Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. Datasheets for datasets. arXiv preprint arXiv:1803.09010, 2018.
[36] Hila Gonen and Yoav Goldberg. Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 609–614, 2019.
[37] Mary L Gray and Siddharth Suri. Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. Eamon Dolan Books, 2019.
[38] Jia Guo and Jiankang Deng. deepinsight/insightface: Face analysis project on MXNet. https://github.com/deepinsight/insightface, May 2020. (Accessed on 05/31/2020).
[39] Herkko Hietanen. Creative Commons Olympics: How big media is learning to license from amateur authors. J. Intell. Prop. Info. Tech. & Elec. Com. L., 2:50, 2011.
[40] Kashmir Hill. The secretive company that might end privacy as we know it, 2020.
[41] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
[42] Elizabeth C Hirschman. Consumers and their animal companions. Journal of Consumer Research, 20(4):616–632, 1994.
[43] Google Inc. Dataset search. https://datasetsearch.research.google.com/, Sep 2018. (Accessed on 06/17/2020).
[44] Khari Johnson. ACLU sues facial recognition startup Clearview AI for privacy and safety violations | VentureBeat, May 2020. (Accessed on 06/02/2020).
[45] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
[46] Maximilian Kasy and Rediet Abebe. Fairness, equality, and power in algorithmic decision making. Technical report, Working paper, 2020.
[47] Matthew Kay, Cynthia Matuszek, and Sean A Munson. Unequal representation and gender stereotypes in image search results for occupations. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 3819–3828, 2015.
[48] Os Keyes. The misgendering machines: Trans/HCI implications of automatic gender recognition. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW):1–22, 2018.
[49] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[50] Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Tom Duerig, et al. The Open Images dataset V4: Unified image classification, object detection, and visual relationship detection at scale. arXiv preprint arXiv:1811.00982, 2018.
[51] Fei Fei Li and Jia Deng. Where have we been? Where are we going? http://image-net.org/challenges/talks_2017/imagenet_ilsvrc2017_v1.0.pdf, Sep 2017. (Accessed on 05/01/2020).
[52] Clare McGlynn, Erika Rackley, and Ruth Houghton. Beyond revenge porn: The continuum of image-based sexual abuse. Feminist Legal Studies, 25(1):25–46, 2017.
[53] Sachit Menon, Alexandru Damian, Shijia Hu, Nikhil Ravi, and Cynthia Rudin. PULSE: Self-supervised photo upsampling via latent space exploration of generative models, 2020.
[54] Ryan Merkley. Use and fair use: Statement on shared images in facial recognition AI - Creative Commons, Mar 2019. (Accessed on 06/03/2020).
[55] Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. Model cards for model reporting. arXiv preprint arXiv:1810.03993, 2018.
[56] S Naidoo. Informed consent for photography in dental practice: communication. South African Dental Journal, 64(9):404–406, 2009.
[57] Arvind Narayanan and Vitaly Shmatikov. Robust de-anonymization of large sparse datasets. In 2008 IEEE Symposium on Security and Privacy (SP 2008), pages 111–125. IEEE, 2008.
[58] Safiya Umoja Noble. Algorithms of oppression: How search engines reinforce racism. NYU Press, 2018.
[59] Cathy O'Neil. Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books, 2016.
[60] Seymour A Papert. The Summer Vision Project. AIM-100, 1966.
[61] Xingchao Peng, Ben Usman, Neela Kaushik, Dequan Wang, Judy Hoffman, and Kate Saenko. VisDA: A synthetic-to-real benchmark for visual domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 2021–2026, 2018.
[62] Jeffrey Pennington, Richard Socher, and Christopher D Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
[63] PimEyes. Face search • PimEyes. https://pimeyes.com/en/, Jun 2020. (Accessed on 06/03/2020).
[64] Vinay Uday Prabhu, Sanghyun Han, Dian Ang Yap, Mihail Douhaniaris, Preethi Seshadri, and John Whaley. Fonts-2-handwriting: A seed-augment-train framework for universal digit classification. arXiv preprint arXiv:1905.08633, 2019.
[65] PyTorch. torchvision.datasets — PyTorch 1.5.0 documentation. https://pytorch.org/docs/stable/torchvision/datasets.html, Jun 2020. (Accessed on 06/17/2020).
[66] Michael Ramirez. "My dog's just like me": Dog ownership as a gender display. Symbolic Interaction, 29(3):373–391, 2006.
[67] Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do ImageNet classifiers generalize to ImageNet? arXiv preprint arXiv:1902.10811, 2019.
[68] Luc Rocher, Julien M Hendrickx, and Yves-Alexandre De Montjoye. Estimating the success of re-identifications in incomplete datasets using generative models. Nature Communications, 10(1):1–9, 2019.

[69] Rasmus Rothe, Radu Timofte, and Luc Van Gool. Deep expectation of real and apparent age from a single image without facial landmarks. International Journal of Computer Vision, 126(2-4):144–157, 2018.
[70] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
[71] Anantha Natarajan S. ImageScraper · PyPI. https://pypi.org/project/ImageScraper/, May 2015. (Accessed on 06/17/2020).
[72] Anubhav Sachan. bingscraper · PyPI. https://pypi.org/project/bingscraper/, July 2018. (Accessed on 06/17/2020).
[73] Shreya Shankar, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, and D Sculley. No classification without representation: Assessing geodiversity issues in open data sets for the developing world. arXiv preprint arXiv:1711.08536, 2017.
[74] Congzheng Song, Thomas Ristenpart, and Vitaly Shmatikov. Machine learning models that remember too much. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 587–601, 2017.
[75] Susan Leigh Star and Geoffrey C Bowker. Enacting silence: Residual categories as a challenge for ethics, information systems, and communication. Ethics and Information Technology, 9(4):273–280, 2007.
[76] Lucy Suchman. Human-machine reconfigurations: Plans and situated actions. Cambridge University Press, 2007.
[77] Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta. Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE International Conference on Computer Vision, pages 843–852, 2017.
[78] TensorFlow. TensorFlow datasets. https://www.tensorflow.org/datasets, Jun 2020. (Accessed on 06/17/2020).
[79] Antonio Torralba, Rob Fergus, and William T Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1958–1970, 2008.
[80] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Andrew Ilyas, and Aleksander Madry. From ImageNet to image classification: Contextualizing progress on benchmarks, 2020.
[81] Amol Umrale. imagebot · PyPI. https://pypi.org/project/imagebot/, July 2015. (Accessed on 06/17/2020).
[82] Haohan Wang, Songwei Ge, Eric P. Xing, and Zachary C. Lipton. Learning robust global representations by penalizing local predictive power, 2019.
[83] Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, and Alexei A Efros. Dataset distillation. arXiv preprint arXiv:1811.10959, 2018.
[84] Paul Weindling. The origins of informed consent: the international scientific commission on medical war crimes, and the Nuremberg code. Bulletin of the History of Medicine, pages 37–71, 2001.
[85] Sarah Myers West, Meredith Whittaker, and Kate Crawford. Discriminating systems. https://ainowinstitute.org/discriminatingsystems.html, 2019.
[86] Wikipedia. Streisand effect - Wikipedia. https://en.wikipedia.org/wiki/Streisand_effect, April 2020. (Accessed on 04/29/2020).
[87] Benjamin Wilson, Judy Hoffman, and Jamie Morgenstern. Predictive inequity in object detection. arXiv preprint arXiv:1902.11097, 2019.
[88] Elizabeth R Wrape, Alexandra L Dittloff, and Jennifer L Callahan. Gender and musical instrument stereotypes in middle school children: Have trends changed? Update: Applications of Research in Music Education, 34(3):40–47, 2016.
[89] Baoyuan Wu, Weidong Chen, Yanbo Fan, Yong Zhang, Jinlong Hou, Jie Liu, and Tong Zhang. Tencent ML-Images: A large-scale multi-label image database for visual representation learning. IEEE Access, 7:172683–172693, 2019.
[90] Blaise Aguera y Arcas, Margaret Mitchell, and Alexander Todorov. Physiognomy's new clothes. Medium (6 May 2017), online: https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a, 2017.
[91] Kaiyu Yang, Klint Qinami, Li Fei-Fei, Jia Deng, and Olga Russakovsky. Towards fairer datasets: Filtering and balancing the distribution of the people subtree in the ImageNet hierarchy. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 547–558, 2020.
[92] Kaiyu Yang, Klint Qinami, Li Fei-Fei, Jia Deng, and Olga Russakovsky. Towards fairer datasets: Filtering and balancing the distribution of the people subtree in the ImageNet hierarchy. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 547–558, 2020.
[93] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2017.