Minimax Group Fairness: Algorithms and Experiments
Emily Diana (University of Pennsylvania; Amazon AWS AI)
Wesley Gill (University of Pennsylvania; Amazon AWS AI)
Michael Kearns (University of Pennsylvania; Amazon AWS AI)
Krishnaram Kenthapadi (Amazon AWS AI)
Aaron Roth (University of Pennsylvania; Amazon AWS AI)

ABSTRACT

We consider a recently introduced framework in which fairness is measured by worst-case outcomes across groups, rather than by the more standard differences between group outcomes. In this framework we provide provably convergent oracle-efficient learning algorithms (or equivalently, reductions to non-fair learning) for minimax group fairness. Here the goal is that of minimizing the maximum error across all groups, rather than equalizing group errors. Our algorithms apply to both regression and classification settings and support both overall error and false positive or false negative rates as the fairness measure of interest. They also support relaxations of the fairness constraints, thus permitting study of the tradeoff between overall accuracy and minimax fairness. We compare the experimental behavior and performance of our algorithms across a variety of fairness-sensitive data sets and show empirical cases in which minimax fairness is strictly and strongly preferable to equal outcome notions.

CCS CONCEPTS

• Theory of computation → Design and analysis of algorithms.

KEYWORDS

Fair Machine Learning; Minimax Fairness; Game Theory

ACM Reference Format:
Emily Diana, Wesley Gill, Michael Kearns, Krishnaram Kenthapadi, and Aaron Roth. 2021. Minimax Group Fairness: Algorithms and Experiments. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES '21), May 19–21, 2021, Virtual Event, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3461702.3462523

1 INTRODUCTION

Machine learning researchers and practitioners have often focused on achieving group fairness with respect to protected attributes such as race, gender, or ethnicity. Equality of error rates is one of the most intuitive and well-studied group fairness notions, and in enforcing it one often implicitly hopes that higher error rates on protected or "disadvantaged" groups will be reduced towards the lower error rate of the majority or "advantaged" group. But in practice, equalizing error rates and similar notions may require artificially inflating error on easier-to-predict groups — without necessarily decreasing the error for the harder-to-predict groups — and this may be undesirable for a variety of reasons.

For example, consider the many social applications of machine learning in which most or even all of the targeted population is disadvantaged, such as predicting domestic situations in which children may be at risk of physical or emotional harm [8]. While we might be interested in ensuring that our predictions are roughly equally accurate across racial groups, income levels, or geographic location, if this can only be achieved by raising lower group error rates without lowering the error for any other population, then arguably we will have only worsened overall social welfare, since this is not a setting where we can argue that we are "taking from the rich and giving to the poor." Similar arguments can be made in other high-stakes applications, such as predictive modeling for medical care. In these settings it might be preferable to consider the alternative fairness criterion of achieving minimax group error, in which we seek not to equalize error rates, but to minimize the largest group error rate — that is, to make sure that the worst-off group is as well-off as possible.
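To make the minimax criterion concrete, here is a minimal sketch (ours, not code from the paper; the toy data, group masks, and use of NumPy are illustrative assumptions) that computes per-group error rates for a fixed classifier and contrasts the overall error with the worst group error that a minimax-fair model would seek to minimize.

```python
import numpy as np

def group_error_rates(y_true, y_pred, group_masks):
    """Return the 0/1 error rate of y_pred on each (possibly overlapping) group."""
    errors = (y_true != y_pred).astype(float)
    return {name: errors[mask].mean() for name, mask in group_masks.items()}

# Toy example: 8 labeled points and two overlapping groups.
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 1])
groups = {
    "group_A": np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool),
    "group_B": np.array([0, 0, 1, 1, 1, 1, 1, 1], dtype=bool),
}

per_group = group_error_rates(y_true, y_pred, groups)
overall = (y_true != y_pred).mean()
print("overall error:", overall)                       # average error over all points
print("per-group errors:", per_group)
print("minimax objective:", max(per_group.values()))   # largest group error rate
```

A minimax-fair learner chooses the model in its class that minimizes this last quantity, rather than forcing the two group errors to be equal.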
Minimax group fairness, which was recently proposed by [21] in the context of classification, has the property that any model that achieves it Pareto dominates (at least weakly, and possibly strictly) an equalized-error model with respect to group error rates — that is, if $\gamma_i$ is the error rate on group $i$ in a minimax solution, and $\gamma'$ is the common error rate for all groups in a solution that equalizes group error rates, then $\gamma' \geq \gamma_i$ for all $i$. If one or more of these inequalities is strict, it constitutes a proof that equalized errors can only be achieved by deliberately inflating the error of one or more groups in the minimax solution. Put another way, one technique for finding an optimal solution subject to an equality of group error rates constraint is to first find a minimax solution, and then to artificially inflate the error rate on any group that does not saturate the minimax constraint — an "optimal algorithm" that makes plain the deficiencies of equal error solutions.
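As a small worked illustration of that inflation construction (our own sketch, not the paper's algorithm; the error rates below are made up), note that flipping a binary prediction with probability p moves a group's error rate from e to e(1 - p) + (1 - e)p, so one can solve for the flip probability that drags each non-saturating group up to the maximum group error:

```python
def flip_probability_to_reach(e, target):
    """Probability of flipping a binary prediction needed to raise a group's
    error rate from e to target (assumes e <= target < 0.5, so 1 - 2e > 0)."""
    return (target - e) / (1.0 - 2.0 * e)

# Hypothetical per-group errors of a minimax-fair binary classifier.
minimax_errors = {"group_A": 0.08, "group_B": 0.17, "group_C": 0.17}
worst = max(minimax_errors.values())

# "Equalizing" by inflation: groups already at the maximum need no noise;
# every other group has its predictions randomly flipped until it matches.
inflation = {g: flip_probability_to_reach(e, worst) for g, e in minimax_errors.items()}
equalized_errors = {g: e * (1 - p) + (1 - e) * p
                    for (g, e), p in zip(minimax_errors.items(), inflation.values())}

print(inflation)          # group_A needs p ≈ 0.107, groups B and C need p = 0
print(equalized_errors)   # all groups now sit at the worse common error of 0.17
```

Every group ends up weakly worse off than under the minimax solution, which is exactly the sense in which the minimax model Pareto dominates the equal-error one.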
In contrast to approaches that learn separate models for each protected group — which also satisfy minimax error, simply by optimizing each group independently (e.g. [11, 30]) — the minimax approach we use here has two key advantages:

• The minimax approach does not require that groups of interest be disjoint, which is a requirement for the approach of learning a different model for each group. This allows for protecting groups defined by intersectional characteristics as in [17, 18], protecting (for example) not just groups defined by race or gender alone, but also by combinations of race and gender.

• The minimax approach does not require that protected attributes be given as inputs to the trained model. This can be extremely important in domains (like credit and insurance) in which using protected attributes as features for prediction is illegal.

Our primary contributions are as follows:

(1) First, we propose two algorithms. The first finds a minimax group fair model from a given statistical class, and the second navigates tradeoffs between a relaxed notion of minimax fairness and overall accuracy.

(2) Second, we prove that both algorithms converge and are oracle efficient — meaning that they can be viewed as efficient reductions to the problem of unconstrained (nonfair) learning over the same class. We also study their generalization properties.

(3) Third, we show how our framework can be easily extended to incorporate different types of error rates – such as false positive and false negative rates – and explain how to handle overlapping groups.

(4) Finally, we provide a thorough experimental analysis of our two algorithms under different prediction regimes. In this section, we focus on the following:

• We start with a demonstration of the process of learning a fair predictor from the class of linear regression models. This setting matches our theory exactly, because weighted least squares regression is a convex problem, and so we really do have efficient subroutines for the unconstrained (nonfair) learning problem.

• We conduct an exploration of the fairness vs. accuracy tradeoff for regression and highlight an example in which our minimax algorithms provide a substantial Pareto improvement over the equality of error rates notion.

… set of non-convex losses. In applying similar techniques to a fair classification domain, our conceptual contributions include the ability to handle overlapping groups.

Our work builds upon the notion of minimax fairness proposed by Martinez et al. [21] in the context of classification. Their algorithm – Approximate Projection onto Star Sets (APStar) – is also an iterative process which alternates between finding a model that minimizes risk and updating group weightings using a variant of gradient descent, but they provide no convergence analysis. More importantly, APStar relies on knowledge of the base distributions of the samples (while our algorithms do not), and it does not provide the capability to relax the minimax constraint and explore an error vs. fairness tradeoff. [21] analyze only the classification setting, but we provide theory and perform experiments in both classification and regression settings. Because our meta-algorithm is easily extensible, we are able to generalize to non-disjoint groups and various error types (with an emphasis on false positive and false negative rates).

Achieving minimax fairness over a large number of groups has been proposed by [20] as a technique for achieving fairness when protected group labels are not available. Our work relates to [20] in much the same way as it relates to [21], in that [20] is purely empirical, whereas we provide a formal analysis of our algorithm, as well as a version of the algorithm that allows us to relax the minimax constraint and explore performance tradeoffs.

Technically, our work uses a similar approach to [1, 2], which also reduce a "fair" classification problem to a sequence of unconstrained cost-sensitive classification problems via the introduction of a zero-sum game. Crucial to this line of work is the connection between no-regret learning and equilibrium convergence, originally due to [14].

2 FRAMEWORK AND PRELIMINARIES

We consider pairs of feature vectors and labels $(x_i, y_i)_{i=1}^{n}$, where each $x_i$ is a feature vector, divided into groups $\{G_1, \ldots, G_K\}$. We choose a class $H$ of models (either classification or regression models), mapping features to predicted labels, with loss function $L$ taking values in $[0, 1]$, and average population error:
$$\epsilon(h) = \frac{1}{n} \sum_{i=1}^{n} L(h(x_i), y_i)$$
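The paper's actual algorithms, convergence guarantees, and relaxation scheme are developed in the sections that follow. Purely as a rough sketch of the style of reduction described above (a no-regret "regulator" reweighting groups against a weighted ERM oracle, in the spirit of [1, 2, 14]), one might implement the loop as follows. Everything here is an illustrative assumption: the weighted least squares oracle, the multiplicative-weights update, the step size, and the model averaging are generic choices of ours, not the paper's stated method.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Unconstrained (nonfair) learning oracle: weighted least squares fit.
    Returns coefficients minimizing sum_i w_i * (x_i . beta - y_i)^2."""
    s = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)
    return beta

def minimax_reweighting_sketch(X, y, group_masks, rounds=200, eta=1.0):
    """Generic no-regret reweighting loop (illustrative only): a 'regulator'
    keeps a distribution over groups, the learner best-responds with a
    weighted ERM fit, and group weights grow multiplicatively with each
    group's current loss. Returns the average of the models played."""
    K = len(group_masks)
    lam = np.ones(K) / K                      # distribution over groups
    betas = []
    for _ in range(rounds):
        # Per-sample weights induced by the current group distribution
        # (samples in several overlapping groups accumulate weight).
        w = np.zeros(len(y))
        for k, mask in enumerate(group_masks):
            w[mask] += lam[k] / mask.sum()
        beta = weighted_least_squares(X, y, w)
        betas.append(beta)
        # Group losses of the model just played (squared error here).
        group_losses = np.array([np.mean((X[m] @ beta - y[m]) ** 2) for m in group_masks])
        # Multiplicative-weights update: groups with higher loss gain weight.
        lam *= np.exp(eta * group_losses)
        lam /= lam.sum()
    return np.mean(betas, axis=0)             # averaged model, as in the game-theoretic view
```

Averaging (or randomizing over) the models played corresponds to the standard equilibrium-convergence argument from no-regret dynamics cited above; the paper's own analysis makes this precise for its algorithms.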