Arxiv:2004.07999V4 [Cs.CV] 23 Jul 2021

REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets Angelina Wang · Alexander Liu · Ryan Zhang · Anat Kleiman · Leslie Kim · Dora Zhao · Iroha Shirai · Arvind Narayanan · Olga Russakovsky Abstract Machine learning models are known to per- petuate and even amplify the biases present in the data. However, these data biases frequently do not become apparent until after the models are deployed. Our work tackles this issue and enables the preemptive analysis of large-scale datasets. REVISE (REvealing VIsual bi- aSEs) is a tool that assists in the investigation of a visual dataset, surfacing potential biases along three Fig. 1: Our tool takes in as input a visual dataset and its dimensions: (1) object-based, (2) person-based, and (3) annotations, and outputs metrics, seeking to produce geography-based. Object-based biases relate to the size, insights and possible actions. context, or diversity of the depicted objects. Person- based metrics focus on analyzing the portrayal of people within the dataset. Geography-based analyses consider the representation of different geographic loca- the visual world, representing a particular distribution tions. These three dimensions are deeply intertwined of visual data. Since then, researchers have noted the in how they interact to bias a dataset, and REVISE under-representation of object classes (Buda et al., 2017; sheds light on this; the responsibility then lies with Liu et al., 2009; Oksuz et al., 2019; Ouyang et al., 2016; the user to consider the cultural and historical con- Salakhutdinov et al., 2011; J. Yang et al., 2014), ob- text, and to determine which of the revealed biases ject contexts (Choi et al., 2012; Rosenfeld et al., 2018), may be problematic. The tool further assists the user object sub-categories (Zhu et al., 2014), scenes (Zhou by suggesting actionable steps that may be taken to et al., 2017), gender (Kay et al., 2015), gender con- mitigate the revealed biases. Overall, the key aim of texts (Burns et al., 2018; J. Zhao et al., 2017), skin our work is to tackle the machine learning bias prob- tones (Buolamwini and Gebru, 2018; Wilson et al., 2019), lem early in the pipeline. REVISE is available at https: geographic locations (Shankar et al., 2017) and cul- //github.com/princetonvisualai/revise-tool. tures (DeVries et al., 2019). The downstream effects arXiv:2004.07999v4 [cs.CV] 23 Jul 2021 of these under-representations range from the more in- Keywords computer vision datasets · bias mitigation · nocuous, like limited generalization of car classifiers (Tor- tool ralba and Efros, 2011), to the much more serious, like having deep societal implications in automated facial analysis (Buolamwini and Gebru, 2018; Hill, 2020). Ef- 1 Introduction forts such as Datasheets for Datasets (Gebru et al., Computer vision dataset bias is a well-known and much- 2018) have played an important role in encouraging studied problem. Torralba and Efros (2011) highlighted dataset transparency through articulating the intent the fact that every dataset is a unique slice through of the dataset creators, summarizing the data collection processes, and warning downstream dataset users Angelina Wang of potential biases in the data. However, this alone is E-mail: [email protected] not sufficient, as there is no algorithm to identify all 2 Angelina Wang et al. biases hiding in the data, and manual review is not a { In the object detection dataset COCO (Lin et al., feasible strategy given the scale of modern datasets. 2014), some objects, e.g., airplane, bed and pizza, are frequently large in the image. This is because Bias detection tool: To mitigate this issue, we pro- fewer images of airplanes appear in the sky (far vide an automated tool for REvealing VIsual biaSEs away; small) than on the ground (close-up; large). (REVISE) in datasets. REVISE is a broad-purpose tool This may be a problem since object size plays a key for surfacing the under- and different- representations role in recognition accuracy. One mitigation is to hiding within visual datasets. For the current explo- query for images of airplane appearing in scenes ration we limit ourselves to three sets of metrics: (1) of mountains, desert, sky. object-based, (2) person-based and (3) geography-based. { The OpenImages dataset (Krasin et al., 2017) de- Object-based analysis is most familiar to the com- picts a large number of people who are too small in puter vision community (Torralba and Efros, 2011), as the image for human annotators to determine their many of the popular visual recognition datasets are gender; nevertheless, we found that annotators in- object-centric (Everingham et al., 2010; Russakovsky fer that they are male 69% of the time, especially in et al., 2015). Thus, these analyses focus on consider- scenes of outdoor sports fields, parks. Com- ing statistics about object frequency, scale, context, or puter vision researchers might want to exercise cau- diversity of representation. tion with these gender annotations so they don't Person-based analyses began to gain attention with propagate into the model. research showing unequal performance for people of { In the computer vision and multimedia dataset different genders and skin tones (Gebru et al., 2018; YFCC100m (Yahoo Flickr Creative Commons 100 J. Zhao et al., 2017). This line of analysis considers million) (Thomee et al., 2016) images come from the representation of people of different demographics 196 different countries. However, we estimate that within the dataset, and allows the user to assess what for around 47% of those countries | especially in potential downstream consequences this may have in developing regions of the world | the images are order to consider how best to intervene. It also builds predominantly photos taken by visitors to the coun- on the object-based analysis by considering how the try rather than by locals, potentially resulting in a representation of objects with people of different demo- stereotypical portrayal. graphic groups differs. A benefit of our tool is that a user doesn't need to Finally, geography-based analysis considers the por- have specific biases in mind, as these can be hard to enu- trayal of different geographic regions within the dataset; merate. Rather, the tool automatically surfaces unusual this is a relatively new but very important conversation patterns. REVISE cannot automatically say which of within the community (DeVries et al., 2019; Shankar et these patterns, or lack of patterns, are problematic, and al., 2017). This axis of analysis is deeply intertwined leaves that analysis up to the user's judgment and ex- with the previous two, as geography influences both pertise. We note that \bias" is a contested term, and the types of objects that are represented, as well as the while our tool seeks to surface a variety of findings that different people that are pictured. are interesting to dataset creators and users, not all We imagine two primary use cases for our tool: (1) may be considered forms of bias by everyone. dataset builders can use the actionable insights pro- duced by our tool during the process of dataset compi- lation to guide the direction of further data collection, 2 Related Work and (2) dataset users who train models can use the tool to understand what kinds of biases their models may Data collection: Visual datasets are constructed in inherit as a result of training on a particular dataset. various ways, with the most common being through keyword queries to search engines, whether singular Example Findings: REVISE automatically surfaces (e.g., ImageNet (Russakovsky et al., 2015)) or pairwise a variety of metrics that highlight unrepresentative or (e.g., COCO (Lin et al., 2014)), or by scraping web- anomalous patterns in the dataset. To validate the use- sites like Flickr (e.g., YFCC100m (Thomee et al., 2016), fulness of the tool, we have used it to analyze several OpenImages (Krasin et al., 2017)). There is extensive datasets commonly used in computer vision: COCO (Lin preprocessing and cleaning done on the datasets. Hu- et al., 2014), OpenImages (Krasin et al., 2017), man annotators, sometimes in conjunction with auto- YFCC100m (Thomee et al., 2016), and BDD100K (Yu mated tools (Zhou et al., 2017), then assign various et al., 2020). Some examples of the kinds of automatic labels and annotations. Dataset collectors put in sig- insights our tool has found include: nificant effort to deal with problems like long-tails to REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets 3 ensure a more balanced distribution, and intra-class di- Pleiss et al., 2017; Zhang et al., 2018) that are often versity by doing things like explicitly seeking out non- deemed to be mathematically incompatible with each iconic images beyond just the object itself in focus. other (Chouldechova, 2017; Kleinberg et al., 2017), but in this work, we look at the problem earlier in the Dataset Bias: Rather than pick a single definition, pipeline from the dataset side. we adopt an inclusive notion of bias and seek to highlight ways in which the dataset builder can monitor and Automated bias detectors: Facebook's Fairness Flow control the distribution of their data. Proposed ways (Facebook AI, 2021) focuses on assessing model predic- to deal with dataset bias include cross-dataset anal- tions for their contextual fairness, though also looks ysis (Torralba and Efros, 2011) and having the ma- at the labels themselves. IBM's AI Fairness 360 (Bel- chine learning community learn from data collection lamy et al., 2018) similarly discovers biases in machine approaches of other disciplines (Brown, 2014; Jo and learning models, and also looks into the datasets. How- Gebru, 2020). Recent work (Prabhu and Birhane, 2020) ever, its look into dataset biases is limited in that it has looked at dataset issues related to consent and jus- first trains a model on that dataset, then interrogates tice; the authors advocate for enforcing Institutional this trained model with specific questions.

Arxiv:2004.07999V4 [Cs.CV] 23 Jul 2021

What's the Point: Semantic Segmentation with Point Supervision

Feature Mining: a Novel Training Strategy for Convolutional Neural

AI for All by Mr. S. Arjun

Crowdsourcing in Computer Vision

Cornernet-Lite- Efficient Keypoint Based Object Detection

Deep Learning with Spherical Data Min Sun National Tsing Hua University Vision Science Lab Self-Intro

WHEN SMART MACHINES ARE BIASED Olga Russakovsky Is Working to Change That

Masters Thesis: Development of a Deep Learning Model for 3D Human Pose Estimation in Monocular Videos

Computer Vision and Its Implications Based on the Paper Imagenet Classification with Deep Convolutional Neural Networks by Alex Krizhevsky, Ilya Sutskever, Geoffrey E

A Study of Face Obfuscation in Imagenet

On Imagenet Roulette and Machine Vision

Demystifying Artificial Intelligence What Business Leaders Need to Know About Cognitive Technologies