Automating Procedurally Fair Feature Selection in Machine Learning

Clara Belitz, University of Illinois Urbana–Champaign, Champaign, IL, USA ([email protected])
Lan Jiang, University of Illinois Urbana–Champaign, Champaign, IL, USA ([email protected])
Nigel Bosch, University of Illinois Urbana–Champaign, Champaign, IL, USA ([email protected])

ABSTRACT
In recent years, machine learning has become more common in everyday applications. Consequently, numerous studies have explored issues of unfairness against specific groups or individuals in the context of these applications. Much of the previous work on unfairness in machine learning has focused on the fairness of outcomes rather than process. We propose a feature selection method inspired by fair process (procedural fairness) in addition to fair outcome. Specifically, we introduce the notion of unfairness weight, which indicates how heavily to weight unfairness versus accuracy when measuring the marginal benefit of adding a new feature to a model. Our goal is to maintain accuracy while reducing unfairness, as defined by six common statistical definitions. We show that this approach demonstrably decreases unfairness as the unfairness weight is increased, for most combinations of metrics and classifiers used. A small subset of all the combinations of datasets (4), unfairness metrics (6), and classifiers (3), however, demonstrated relatively low unfairness initially. For these specific combinations, neither unfairness nor accuracy were affected as unfairness weight changed, demonstrating that this method does not reduce accuracy unless there is also an equivalent decrease in unfairness. We also show that this approach selects unfair features and sensitive features for the model less frequently as the unfairness weight increases. As such, this procedure is an effective approach to constructing classifiers that both reduce unfairness and are less likely to include unfair features in the modeling process.

CCS CONCEPTS
• Computing methodologies → Feature selection; Machine learning.

KEYWORDS
Feature selection, fairness, bias, machine learning

ACM Reference Format:
Clara Belitz, Lan Jiang, and Nigel Bosch. 2021. Automating Procedurally Fair Feature Selection in Machine Learning. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES '21), May 19–21, 2021, Virtual Event, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3461702.3462585

1 INTRODUCTION
Approaches to improving algorithmic fairness in the context of machine learning applications have mainly focused on three categories of methods: pre-processing [9, 28], in-processing [17, 29, 46], and post-processing [23]. Each proposes an intervention at a specific stage of the machine learning process to achieve fairness, either before, during, or after the model training process. In addition, a fourth category of work involves ensuring that even earlier steps, like data collection and annotation, are fair [27]. Most of these methods focus on fair predictions and derive their assessment of fairness by measuring outcome alone. In this paper, however, we explore a machine learning method that modifies the fairness of the model building process by selecting which variables may be used when making decisions. This allows us to balance not only the compromise between accuracy and cost in the outcome, but also the fairness of the features used in the learning process.

The previously described approaches tend to focus specifically on the concept of fairness. In related work outside of machine learning, however, a distinction has been drawn between that of fairness and unfairness. For example, Cojuharenco and Patient [13] show that when presented with language around unfairness, people were more likely to consider inputs and processes rather than outcomes. Given our focus on inputs (features), we therefore use measurements of unfairness in this paper. This allows us to measure the reduction of unfairness as we improve processes, rather than outcome fairness alone.
Outcome fairness is also referred to as distributive fairness, which refers to the fairness of the outcomes of decision making, while procedural fairness refers to the fairness of the decision making processes that lead to those outcomes [22]. We can interpret the idea of fair process in this context to mean building the model itself in a way that incorporates concerns of fairness [22]. This leads us to ask questions of the model. For example: are protected features included? Protected features describe groups that must not be discriminated against in a formal or legal sense, as defined by each notion of (un)fairness [33]; examples of common protected features are race and gender. Other demographic categories may not be legally protected, but may still be questionable to use, such as whether a student lives in a rural area. The combination of protected and otherwise undesirable features will together be referred to as sensitive features in this paper. A second question we might ask is: are unfair features being included in decision making processes? Unfair features are those where one group is likely to benefit more than another from their inclusion. Unfair features may be a proxy for protected features (e.g., ZIP code as a proxy for race), biased due to historical or social contexts (e.g., standardized college entrance exam scores [37]), or otherwise statistically skewed. Of course, a dataset may include more than one sensitive or unfair feature and it can be difficult to determine the full set of these features. As such, one previous approach has examined how humans rate the inclusion of sensitive features, in general and when given knowledge about how those features affect the accuracy and fairness of the classifier [22]. In this paper we take inspiration from Grgić-Hlača et al. [22] to move beyond distributive fairness, and examine how automating feature selection can contribute to a fair process for building classifiers. The value of our approach to practitioners is that our feature selection process can avoid sensitive features that may have been overlooked. Rather than declaring one or more features sensitive and thus off-limits, this approach generalizes the desired statistical measure of unfairness to ensure that each feature selected is more fair to use.

In sum, in this paper we define a straightforward process for building procedurally fair classifiers by incorporating both unfairness and accuracy considerations during feature selection. We investigate how this process affects both accuracy and unfairness outcomes in practice. As such, we explore how an automated feature selection process can incorporate elements of fairness in order to improve both process and outcomes. We also specifically investigate how a fairer feature selection process affects the inclusion of both sensitive features and unfair features in models. We explore how this approach works when applied to three commonly used machine learning classifiers, and how it performs in terms of both accuracy, as measured by AUC, and unfairness, as measured by six different statistical definitions (defined in the Unfairness Definitions section). We demonstrate that our approach generally reduced unfairness, with an inevitable reduction in accuracy [17, 18, 22], for three real-world datasets. Additionally, for simulated data, we demonstrate that our method chose both a sensitive feature and an unfair feature less frequently. This demonstrates that the method caused the classifier to be less likely to include sensitive and unfair features in the decision-making process, leading to a fairer model with regards to procedure as well as predictions.

1.1 Previous Conceptualizations of Fairness
The study of fairness in machine learning has seen a dramatic increase because unfairness is pervasive in algorithmic decision making [36]. As machine learning has proliferated, so have questions of its impacts [3, 11, 34]. One of the commonly studied impacts is that of fair usage and outcomes. Much of the work evaluating the fairness of machine learning models has focused on a variety of statistical definitions describing fair outcomes [20]. These quantitative definitions primarily arose from fairness literature based in educational testing, but have been based in legal definitions or strictly mathematical uses as well [25]. It has been proven that, except in highly specialized cases, different definitions of fairness cannot all be satisfied at the same time [11, 30]. In addition, which definition is most applicable is often context dependent and has different implications that depend on how outcomes are measured. For example, statistical parity is a measure of population-wide fairness, and may not account for discrimination against specific demographics [15]. Given this constraint, and the fact that fairness is a social concept [11], it is generally necessary for researchers and users to choose which definitions are appropriate to apply to a given classifier and dataset, based on the targeted fair prediction outcomes. In this paper, we thus demonstrate the effect of our proposed strategy on a variety of these fairness measures.

Our proposed strategy is based on altering the definition of accuracy to incorporate a conceptualization of unfairness during the feature selection process. Selecting which features to include in a model is crucial because adding more features may lead to worse predictions [44], in addition to increasing model complexity. When training a classifier, then, only the "best" features should be included in the model. How "best" is defined, however, has a strong effect on the resulting model [41]. Past research has tended to use "most accurate," as defined by a specific measure of error, as the indicator of best. However, measures of accuracy optimize prediction outcomes based on a particular dataset, but the data used can be unfair because it was created in an unfair world, it is missing information, or it is otherwise unrepresentative [3, 5]. Therefore, accuracy alone is not a neutral measure of classifier value.

Merely excluding sensitive features (e.g., protected group status like race) is not, however, a perfect remedy for fixing unfairness. Often, removing a sensitive feature does not remove all correlated proxy features, and even removing all of the correlated features does not always provide an acceptable trade-off in accuracy and fairness [33]. In addition, previous work has shown that including seemingly "unfair" features can actually lead to better outcomes in terms of both fairness and accuracy, since the model can then account for past unfairness that created the initial dataset [10, 11, 30, 33]. Therefore, selecting appropriate features for a fair model is not a simple matter of applying a pre-processing rule to the data, and may be better accomplished by a more complex feature selection approach.

1.2 Our Study
We propose incorporating unfairness into feature selection as an alternative to optimizing for accuracy alone. We demonstrate an approach to model building with classification algorithms that aims to minimize unfairness while simultaneously maximizing accuracy during the feature selection step. Furthermore, we trained three classical machine learning algorithms—Gaussian naïve Bayes, logistic regression, and decision trees—in order to explore the generality of our approach.

We define the following two research questions (RQs) to explore this problem:

RQ1: Does our proposed method reduce unfairness? If so, according to which unfairness definitions? Addressing this question will determine whether the method affects model predictions as expected. In particular, we assess how accuracy, as measured by AUC, and unfairness, as defined in the Unfairness Definitions section, are affected by our approach. We look at this question overall as well as examining the impact across levels of the trade-off between accuracy and unfairness.

RQ2: How does the selection of an unfair feature and the sensitive feature affect accuracy and unfairness? We examine simulated data to understand when the unfair feature and sensitive feature are included during feature selection. We look at whether this varies with different definitions of unfairness and perform statistical tests to determine whether the sensitive group status and an unfair feature are selected less frequently as unfairness is weighted more heavily in the accuracy–unfairness trade-off.

Note that while we are measuring a single sensitive feature per dataset in this paper, the approach can be applied to multiple sensitive features—for example, by averaging unfairness metrics for each sensitive feature during feature selection.
1.3 Related Work
As discussed above, approaches to reducing unfairness in machine learning generally fall into three categories: pre-processing, in-processing, and post-processing. Kamiran and Calders [28] demonstrated three now-common approaches to pre-processing: 1) suppression, which entails removing the sensitive feature and its most correlated features from the data; 2) "massaging" the dataset, which involves changing a strategic selection of labels in the dataset in order to remove discrimination; and 3) "reweighing", which involves resampling instances in the dataset to make the data discrimination-free with regards to the sensitive feature. Using these approaches, they demonstrated that removing the sensitive feature from the dataset does not always result in the removal of the discrimination. Massaging and resampling were more effective, "leading to an effective decrease in discrimination with a minimal loss in accuracy." For example, they showed a decrease in discrimination from 16.48% to 3.32%, while accuracy only decreased from 86.05% to 84.3%. Similarly, Calmon et al. [9] use a distortion function based on an unfair feature in the initial data to create a transformed dataset. They showed that this transformed dataset led to fairer classification, though the reduction in unfairness came with an accuracy penalty. This reduction was due to the restrictions imposed on transformation. In general, restrictions in feature or data usage will tend to lead to a reduction in accuracy due to a loss of information [9, 22].

The second category of approaches is in-processing, referring to making the learning algorithm itself less unfair. Many papers have addressed this topic [8, 15, 17, 29, 45, 46]. Fish et al. [17] use two in-processing approaches. The first is a shifted decision boundary, which finds the minimal-error decision boundary shift for the sensitive group that achieves statistical parity (equal predicted rate for all groups). The second is fair weak learning, which is specific to adaptive boosting (a machine learning method) and replaces a standard boosting weak learner with one which tries to minimize a linear combination of error and unfairness. They demonstrated that using a shifted decision boundary allows a substantial reduction in bias before there is significant drop-off in accuracy for large enough datasets. Zafar et al. [46] proposed a new statistical definition of unfairness, and proceeded to train logistic regressors that were constrained by false positives. They were able to build a classifier that did not use sensitive features in decision making, and still achieved similar results to post-processing approaches like those of Hardt et al. [23].

The final category is post-processing, which makes the model decisions less biased after the model has been built. Hardt et al. [23] demonstrate an example of this; they designed a simple post-processing step that allowed them to avoid changing a complex training pipeline and avoid loss of information from the original data. They proposed post-processing as a last resort when better features, and more and better data, cannot be obtained. They proposed a new notion of non-discrimination, "obliviousness," based only on the joint distribution of the true target outcomes, the predicted target outcomes, and the sensitive feature. Obliviousness does not evaluate the features nor the functional form of the prediction algorithm. They used a case study of financial risk assessment to show that it was possible to maintain close to full profitability with some approaches, such as a race-blind one, but that others, such as equalized odds (equal bias and equal accuracy across all categories of the sensitive feature), are costlier. Their findings highlight the inherent trade-offs of fairness and accuracy, and show that different fairness standards affect accuracy and other measurements of success differently.
The first is a shifted decision bound- best set of features from among all those that we explored during ary which finds the minimal-error decision boundary shift for the the feature selection process. sensitive group that achieves statistical parity (equal predicted rate Our method differs from typical forward feature selection in the for all groups). The second is fair weak learning, which is specific model evaluation step. Typically, models (and thus feature sets) to adapted boosting (a machine learning method) and replaces a would be evaluated based on an accuracy metric, such as AUC [26], standard boosting weak learner with one which tries to minimize a Cohen’s kappa [12], or another metric [41]. We instead evaluated linear combination of error and unfairness. They demonstrated that models based on a combination of accuracy and unfairness. Specif- using a shifted decision boundary allows a substantial reduction in ically, we maximized 퐴푈퐶 − 푤푒푖푔ℎ푡 ∗ 푢푛푓 푎푖푟푛푒푠푠, where 푤푒푖푔ℎ푡 bias before there is significant drop-off in accuracy for large enough is a hyperparameter selected by the experimenter to balance the datasets. Zafar et al. [46] proposed a new statistical definition of relationship between accuracy and unfairness, and 푢푛푓 푎푖푟푛푒푠푠 is a unfairness, and proceeded to train logistic regressors that were con- measure of inequality in the model’s predictions. We explored six strained by false positives. They were able to build a classifier that different unfairness definitions, described inthe Unfairness Defi- did not use sensitive features in decision making, and still achieved nitions section, which were all implemented such that they ranged similar results to post-processing approaches like those of Hardt from 0 (no unfairness) to 1 (maximal unfairness). Thus, features et al. [23]. are selected in this method if they yielded a model that improved accuracy while not adding a large amount of unfairness. For exam- of training and testing datasets. We performed the feature selec- ple, consider a situation where one feature has a correlation with tion method, described above, using nested 4-fold cross-validation the outcome variable in one demographic group but not another, within training data only. That is, we further split training data whereas another feature has a weaker correlation that is consistent into subsets to train and evaluate the models built as part of feature across groups. In this case, we expect the method will preferentially selection, thereby avoiding the feature selection step select the latter feature. based on testing set accuracy. Apart from feature selection, we Note that AUC, used during forward feature selection, tends to did not tune any classifier hyperparameters, leaving them at the yield models that predict positive and negative cases at rates that scikit-learn default settings. We expect that the key results explored differ from the original data (the base rate), versus some measures in this study would be unaffected by hyperparameter tweaks made like and Cohen’s kappa [41]. Matching pre- to classifiers, and leave such analysis to future work. dicted rates to base rates is related to some unfairness definitions (e.g., statistical parity), and thus there may be an interaction be- 2.3 Unfairness Definitions tween choice of accuracy metric and unfairness metric. We leave In this paper, we used six statistical unfairness definitions to mea- this consideration to future work, though in principle our method sure unfairness in our experiments. 
2.2 Model Training
We explored three common classification algorithms: Gaussian naïve Bayes, logistic regression, and decision trees (specifically, classification and regression trees, or CART [7]). We selected these three as examples because they are well-known, widely used, and vary considerably in how they make classification decisions. Naïve Bayes has no inherent feature selection capabilities, and is sensitive to the "curse of dimensionality", wherein unnecessary features negatively impact model predictions [19]. Hence, feature selection is a typical step in the training process for naïve Bayes models. Similarly, logistic regression models may suffer when unnecessary features are included, especially if those features are highly collinear [42]; hence, feature selection is common. Logistic regression often incorporates feature selection via L1 regularization [43], which effectively eliminates features from a model by setting their weight coefficients to 0. However, logistic regression may still benefit from our feature selection method in terms of reducing unfairness, since L1 regularization does not penalize unfairness. Finally, we trained decision trees, which are typically robust to the presence of unnecessary features [6]. Features that are uncorrelated with the outcome can be ignored in a decision tree, since the model may simply choose to make branching decisions based on other features. However, we expect that our method will reduce unfairness even if feature selection is not otherwise needed.

We trained models with 4-fold cross-validation, in which training data came from 75% of randomly-chosen individuals, and data from the remaining 25% of individuals were used to test the model accuracy and unfairness. We repeated the process 4 times so that each individual was in the test data exactly once. We repeated each experiment 100 times with different randomly-chosen cross-validation data partitions, then calculated means of all evaluation metrics so that results were not influenced by the random choice of training and testing datasets. We performed the feature selection method, described above, using nested 4-fold cross-validation within training data only. That is, we further split training data into subsets to train and evaluate the models built as part of feature selection, thereby avoiding biasing the feature selection step based on testing set accuracy. Apart from feature selection, we did not tune any classifier hyperparameters, leaving them at the scikit-learn default settings. We expect that the key results explored in this study would be unaffected by hyperparameter tweaks made to classifiers, and leave such analysis to future work.
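A compact sketch of this evaluation protocol follows. The outer 4-fold split and 100 repetitions mirror the description above; the estimator choice (Gaussian naïve Bayes), the reuse of the forward_select helper from the earlier sketch, and the collapsing of the inner nested cross-validation into a single training-set evaluation are simplifying assumptions, not the authors' exact pipeline.

```python
# Sketch of the evaluation protocol: 100 repetitions of 4-fold cross-validation,
# with feature selection performed only on training folds so the held-out fold
# never influences which features are chosen. Unfairness metrics would be
# computed on the held-out fold in the same way as AUC.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score


def run_experiment(X, y, group, weight, n_repeats=100):
    """X: pandas DataFrame; y, group: NumPy arrays aligned with X's rows."""
    aucs = []
    for rep in range(n_repeats):
        outer = KFold(n_splits=4, shuffle=True, random_state=rep)
        for train_idx, test_idx in outer.split(X):
            X_tr, X_te = X.iloc[train_idx], X.iloc[test_idx]
            y_tr, y_te = y[train_idx], y[test_idx]
            # Feature selection sees only the training data (inner CV omitted here).
            feats = forward_select(GaussianNB(), X_tr, y_tr, group[train_idx], weight)
            model = GaussianNB().fit(X_tr[feats], y_tr)
            aucs.append(roc_auc_score(y_te, model.predict_proba(X_te[feats])[:, 1]))
    return np.mean(aucs)
```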
2.3 Unfairness Definitions
In this paper, we used six statistical unfairness definitions to measure unfairness in our experiments. Specifically, from among many possible definitions, we chose the set of definitions in Berk et al. [4].

As background for these definitions, let there exist a machine learning classifier that predicts a set of values, Y′, based on both "legitimate" predictors of the outcome of interest and "sensitive" predictors, which may be legally protected, such as race, or otherwise questionable to use, such as ZIP code. Y′, the outcome predicted by the classifier, is an estimate of Y, the true outcome of interest. Let TP = true positives, FN = false negatives, FP = false positives, and TN = true negatives, such that TP + FN + FP + TN add up to the sample size. The category of "true" results are when Y′ equals Y, while "false" results are where they differ; "positive" refers to values of Y or Y′ that belong to a specific class (e.g., students with above-median grade in the Student Academics dataset described below), while "negative" refers to all other values.

Our fairness definitions are based on the relationships between measured outcomes and ground truth. We consider only example datasets with a binary sensitive group (e.g., male or female) and two outcome classes, but we have implemented these metrics for general multi-class, multi-group cases by taking the maximum unfairness across classes and groups.

• Overall Accuracy Equality: achieved by a classifier when overall procedure accuracy is the same for each category of the sensitive feature. That is, true positives + true negatives is the same proportion ((TP + TN) / (TP + FN + FP + TN)) for each group. We measured unfairness with this definition by calculating the absolute difference between proportion correct per group. This definition is imperfect in that it does not distinguish between accuracy for correctly identified positives and accuracy for correctly identified negatives. That is, overall accuracy equality assumes that true negatives are as desirable as true positives. However, no metric can account for every conceivable dimension of fairness, so we also evaluated other related metrics.

• Statistical Parity: achieved by a classifier when the marginal distributions of the predicted classes are the same for each category of the sensitive feature. That is, the proportion of all positives ((TP + FP) / (TP + FN + FP + TN)) and all negatives ((FN + TN) / (TP + FN + FP + TN)) are the same for all groups. This definition of statistical parity, sometimes called "demographic parity," also has flaws because it can lead to highly undesirable decisions [15]. It is not necessarily valuable to force all groups to have equivalent statistical distributions if the groups don't require equal treatment (e.g., recommending interventions for students who don't need them just to keep the numbers equivalent). We calculated unfairness for this definition as the difference between the predicted rate (proportion of positive class predictions) across groups.

• Conditional Procedure Accuracy Equality: achieved by a classifier when conditional procedure accuracy is the same for each category of the sensitive feature. That is to say, true positives versus true positives and false negatives (actual positives; TP / (TP + FN)) is the same for all groups, and true negatives versus true negatives and false positives (actual negatives; TN / (FP + TN)) is the same for all groups. In other words, when conditioning on the known outcome, is the approximating function equally accurate across sensitive group categories? This is the same as considering whether the false negative rate and the false positive rate, respectively, are the same for all groups. Thus, we measured unfairness with this definition as the absolute difference between recall scores across groups. Note that this measure has a special case called "equality of opportunity" that effectively is the same as conditional procedure accuracy equality, but only for the outcome class that is more desirable (e.g., positives for hiring), which may raise the question of which outcome class is more desirable. In addition, equality of opportunity is not always a useful measure, in that it cannot account for bias that was present at the time of data creation [33].

• Conditional Use Accuracy Equality: achieved by a classifier when conditional use accuracy is the same for each category of the sensitive feature. This definition is conditioning on the algorithm's predicted outcome, not the actual outcome. That is, true positives versus true and false positives (all predicted positives; TP / (TP + FP)) is the same for all groups, and true negatives versus true and false negatives (all predicted negatives; TN / (FN + TN)) is the same for all groups. We calculated unfairness according to this definition as the absolute difference in precision across groups.

• Treatment Equality: achieved by a classifier when the ratio of false negatives and false positives (FP/FN or FN/FP) is the same for all categories of the sensitive feature. The term "treatment" is used to convey that such ratios can be a policy lever with which to achieve other kinds of fairness. This allows the modeler to decide how costly a wrong outcome is for a specific category of the sensitive feature (e.g., a false negative is worse than a false positive), which is the quality missing from overall accuracy equality. We measured unfairness by whichever ratio was larger; however, unlike the other unfairness measures, these ratios may exceed a value of 1. Thus, we applied a sigmoid function to transform values into the [0, 1] range like the other measures, since our feature selection method assumes the accuracy and unfairness measures have similar scale.

• Total Average Equality: defined as the mean of the previous five equality metrics. Note that it is impossible for all five of the preceding definitions to be satisfied at the same time. For example, conditional procedure equality (matching false positive and negative rates across groups) cannot be achieved alongside statistical parity (matching positive and negative predicted rates across groups) if the ground truth proportion of positive or negative instances differs across groups.
2.4 Datasets
We tested our feature selection method on four datasets. The first three datasets are all publicly available from the UCI Machine Learning Repository, while the fourth was generated by the authors. The Student Performance and Student Academics datasets were collected in educational contexts, where machine learning is valued because early prediction of student success can drive adaptations that improve students' learning [2, 16]. However, machine learning models derived from such data may be unfair [21, 35], including potential structural unfairness (e.g., predicting a student will succeed simply because of their affluence). The third dataset, Adult, is commonly used in machine learning literature, and as such is helpful for baseline comparisons. The fourth dataset is a simulated dataset, which we created to explicitly test whether the feature selection method would select a feature known to be unfair.

Student Performance: This dataset consists of person-level survey responses from students and administrative data provided by schools (including demographic information and students' grades) for two courses [14]. In this study, we analyzed data from a mathematics course with 395 students, predicting whether each student's final grade was above or below the median. The data include predictor variables like number of absences and attitudes toward school, some of which are examples of variables that may be predictive but which could be procedurally unfair to use for grade prediction (e.g., age, romantic relationship status). We extracted 33 features from the numeric and nominal information in this dataset. We treated students' home community size (rural versus not rural) as the sensitive feature for measuring unfairness, given previous research showing the potential for machine learning models to generalize poorly across these categories [35].

Student Academics: This dataset consists primarily of demographic information for students, including parental occupations, family size, gender (only male and female), and others for 131 students [24]. In total, we extracted 22 features. Final grades are also included, which we predicted in this study as a binary outcome variable (above or below the median). This dataset includes some non-demographic predictors, such as attendance records, but includes several demographic variables with potential to induce model unfairness if selected. As in the Student Performance dataset, we treated rural/non-rural status as the sensitive feature for measuring unfairness.

Adult: This dataset consists of information from the 1994 U.S. Census database [31]. This dataset contains information for 48,842 American adults, predicting whether a person earns above or below $50K a year. In total, we extracted 22 features, after transforming categorical features to binary features and removing infrequently-occurring categories. The data include predictor variables like education level, marital status, and hours worked per week, but also include some demographic variables with potential to introduce unfairness if selected, such as race. In this study, we treated sex as the sensitive feature for measuring unfairness, as has been done in past literature that uses this dataset [17, 28, 29].

Simulated Data: This dataset is artificially constructed, and contains 1,000 rows in total. It consists of four columns: group (the sensitive group membership status, 0 or 1), the binary outcome column, a "fair" feature (each group has a similar correlation between this feature and the outcome), and an "unfair" feature (where the correlation with outcome differs for group 0 and group 1). Sensitive group status consisted of 500 rows each for groups 0 and 1. We generated a normally-distributed random variable, normalized the values to the [0, 1] range, and generated the outcome column by rounding to 0 or 1. We then transformed the same underlying normally-distributed random variable to generate the fair and unfair features. When generating the fair feature, we randomly replaced 300 of the 1,000 values with uncorrelated values. Conversely, when generating the unfair feature, we randomly replaced 225 of the values from group 0 with uncorrelated values but only 75 of the values from group 1. Point-biserial correlations indicated the expected patterns; the overall correlations with outcome for the fair and unfair features were similar (r_pb = .328 and r_pb = .345, respectively), but the unfair feature's correlation differed by group (r_pb = .145 for group 0, r_pb = .554 for group 1) while the fair feature's correlation differed little (r_pb = .290 vs. r_pb = .367).
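A dataset with the structure described above can be generated with a few lines of NumPy. The sketch below reproduces the row counts and replacement counts from the text, while the random seed, the uniform replacement values, and the min-max normalization are our own assumptions rather than the authors' exact procedure.

```python
# Sketch of a simulated dataset like the one described above: 1,000 rows, a
# binary group, an outcome derived from a latent normal variable, a "fair"
# feature (similar noise in both groups) and an "unfair" feature (noisier for
# group 0 than for group 1).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

group = np.repeat([0, 1], n // 2)
latent = rng.normal(size=n)
scaled = (latent - latent.min()) / (latent.max() - latent.min())  # normalize to [0, 1]
outcome = np.round(scaled).astype(int)


def with_noise(values, idx_to_replace):
    noisy = values.copy()
    noisy[idx_to_replace] = rng.random(len(idx_to_replace))  # uncorrelated replacements
    return noisy


# Fair feature: 300 of the 1,000 values replaced, regardless of group.
fair = with_noise(scaled, rng.choice(n, size=300, replace=False))
# Unfair feature: 225 replacements in group 0 but only 75 in group 1.
unfair = with_noise(scaled, np.concatenate([
    rng.choice(np.where(group == 0)[0], size=225, replace=False),
    rng.choice(np.where(group == 1)[0], size=75, replace=False),
]))

simulated = pd.DataFrame({"group": group, "fair": fair, "unfair": unfair, "outcome": outcome})
```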
3 RESULTS
We focus results on the research questions outlined in Introduction: Our Study. In this section, we describe our results as well as the tests used to assess effectiveness of our method. We examine trends across datasets and classifiers, as well as the effect of unfairness weight on accuracy and unfairness. Most importantly, we demonstrate that our approach reduced unfairness for our test datasets. We conclude by looking at how unfairness weight affects the inclusion of specific features, and show that the sensitive feature and unfair feature were selected less frequently as the unfairness weight was increased.

3.1 Reduction in Unfairness (RQ1)
RQ1 asked whether the proposed method reduces unfairness. Overall, our method reduced unfairness in our datasets as measured by our six definitions. For the Student Performance dataset, for example, at unfairness weight 4 (i.e., where reducing unfairness is approximately 4 times as important as accuracy), averaging across all metrics, mean unfairness decreased 54% (0.215 to 0.123) for logistic regression, 55% (0.217 to 0.123) for Gaussian naïve Bayes, and 27% (0.158 to 0.121) for decision trees, when compared to unfairness weight 0. This aligned with our expectations that decision trees, which are robust to the presence of unnecessary features, would still benefit from feature selection to decrease unfairness.

3.1.1 Effect of Unfairness Weight. As we increased the unfairness weight, for the majority of unfairness metrics and classifiers, the unfairness of predictions significantly decreased (as expected). We measured the decrease statistically with Mann-Whitney U tests, given that unfairness values were ordinal but not normally-distributed. In total, we ran 90 experiments per dataset, consisting of each combination of 3 classifiers, 6 unfairness metrics, and 5 unfairness weights. Of these 90 experiments, 72 had a positive unfairness weight (i.e., at least some unfairness penalty during feature selection). For the Simulated data, 72% (52 out of 72) of the experiments yielded a significant decrease in unfairness relative to the previous (1 unit smaller) unfairness weight. Findings were similar for the Student Performance, Student Academics, and Adult datasets, which had significant decreases for 58%, 56%, and 82% of the experiments, respectively.

For each classifier, different unfairness metrics started off as more or less unfair at unfairness weight = 0. This was expected, since each metric measures a different aspect of unfairness that may be more or less present in a particular dataset. The relevance of particular metrics can be observed in the Adult dataset, shown in Figure 1. Unfairness was most evident in terms of Treatment Equality and Statistical Parity measures, but the unfairness was greatly reduced by our feature selection method without excessive decrease in accuracy. At unfairness weight 1 for logistic regression using Statistical Parity, unfairness decreased 60% (from 0.183 to 0.098) while accuracy only decreased 6% (from 0.759 to 0.716). This reduction in unfairness brought statistical parity to between 5% and 12% at unfairness weight 1 for all three models.

For most of the datasets, one or more metrics showed relatively low initial unfairness and maintained that regardless of unfairness weight. For example, overall accuracy equality had no statistical decrease, as measured by Mann-Whitney U tests, in unfairness for any classifier applied to the Student Academics dataset. We observed a similar resistance to the method across datasets and classifiers, though for different unfairness metrics (Figure 2). In general, unfairness metrics were not reduced by our feature selection method if unfairness was already relatively low according to that metric before applying the method (i.e., when unfairness weight = 0).

In general, the most fair and unfair metrics tended to be consistent across classifiers for a given dataset. For example, as can be seen in Figure 1, statistical parity showed substantial unfairness for the Adult dataset across all three classifiers. This is unsurprising, given that 30.38% of men, but only 10.93% of women, reported earnings of more than $50K in the dataset.
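The per-weight comparisons described above can be expressed as a short helper; the arrays of per-repetition unfairness values are placeholders for results produced by the experiment loop sketched earlier.

```python
# Sketch of the statistical comparison described above: a one-sided
# Mann-Whitney U test of whether unfairness at weight w+1 is lower than at
# weight w across the 100 repetitions.
from scipy.stats import mannwhitneyu


def unfairness_decreased(unfairness_at_w, unfairness_at_w_plus_1, alpha=0.05):
    """True if unfairness at the larger weight is stochastically smaller (p < alpha)."""
    stat, p = mannwhitneyu(unfairness_at_w_plus_1, unfairness_at_w, alternative="less")
    return p < alpha
```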

Figure 1: Three classifiers for the Adult dataset. These figures plot the mean of unfairness across 100 iterations using each unfairness metric. (Panels: Decision Tree, Gaussian NB, and Logistic Regression on the Adult dataset; x-axis: Unfairness Weight; y-axis: Result (Mean Over 100 Trials); one line per unfairness metric.)

3.1.2 Unfairness vs. Accuracy Relationship. As Figure 2 shows, when the unfairness weight increased and unfairness decreased, so did accuracy. Accuracy decrease was expected, because as we constrained which features could be selected, there was less information for the model to learn from. Previous work in fair learning has demonstrated that some decrease in accuracy is unavoidable [17, 18, 22]. In cases where dimensionality is a large problem, reductions in accuracy may be alleviated because feature selection is essential [44], but our datasets do not have a particularly large number of features (e.g., more features than instances). Since we had a limited number of features to choose from—22 for Student Academics, 33 for Student Performance, 22 for Adult, and 3 for the Simulated Data—we anticipated a loss of accuracy when we constrained feature selection. We demonstrated the expected reduction in accuracy with our experimental results by showing that accuracy and unfairness were moderately correlated. For the 80% of metrics that were impacted by this approach, mean r = .545 with p < .01. As a demonstration of the correlation between accuracy and unfairness, we plotted three metric and classifier combinations for the Student Academics dataset, as seen in Figure 3. These combinations are a decision tree using conditional procedure accuracy (Figure 3a), Gaussian naïve Bayes using treatment equality (Figure 3b), and logistic regression using total average equality (Figure 3c).

For the remaining 20% of metrics, mean r = -.081. The set of metrics that did not have any correlation between accuracy and unfairness were the same as the ones that were most resistant to change using our method. We hypothesize that this is because they started with low unfairness. For example, for the Simulated data, statistical parity had weak or no correlation between accuracy and unfairness for all classifiers. But both accuracy and unfairness were unaffected by the change in unfairness weight (Figure 2).

For the Student Academics and Simulated data at unfairness weight 1, all three classifiers had a "best" or "worst" metric (or both). That is to say, a metric existed that had both the highest unfairness and lowest accuracy or lowest unfairness and highest accuracy. These metrics were the same as the ones that were the least or most unfair with the baseline classifier (unfairness weight of 0). In Figure 2, we can see that for logistic regression on the Simulated data, the best metric was statistical parity, while the worst was treatment equality. For decision trees on the Student Academics data, the best metric was overall accuracy equality, while the worst was treatment equality.

As unfairness weight was increased to 2 or more, however, the trend of a best or worst metric broke. At unfairness weights 3 and 4, some unfairness metric results were mixed, exhibiting the least unfairness and the worst accuracy—for example in the treatment equality results for the Simulated data, as seen in Figure 2. This could be attributed to an over-optimization for unfairness as the unfairness weight was increased, sacrificing accuracy in the process. In comparison, some metrics appeared to achieve results that balanced unfairness and accuracy at weights 2 and 3. At these weights some metrics were able to attain very similar accuracy to weight 0, with significantly less unfairness. For example, this decrease in unfairness occurred with logistic regression and the statistical parity metric for the Simulated data (Figure 2). Thus, results show that some amount of human intervention is still likely necessary to determine what unfairness weight is appropriate for a specific application, since the results will vary depending on the dataset and model.

We previously mentioned that some unfairness metrics remained unaffected by our feature selection approach for specific combinations of datasets and classifiers. The metrics that were resistant to change were resistant in terms of both accuracy and unfairness. Generally, we saw that the unfairness metric that changed the least was frequently the least unfair at the beginning. Moreover, when the method failed to reduce unfairness it also did not reduce accuracy. Thus, there was no cost to trying this approach. For example, conditional use accuracy as well as statistical parity for both logistic regression and Gaussian naïve Bayes were unaffected by changing the unfairness weight for the Simulated data (Figure 2). Accuracy was also unaffected, and there was no correlation for statistical parity between accuracy and unfairness. These specific metrics were also the least unfair from the beginning, ranking second and first, respectively, for the Simulated data. In other words, when a model was already fair and accurate, applying our method did not produce worse results.

Figure 2: All classifiers for the Simulated data. These figures plot the mean of unfairness or accuracy (AUC) across 100 iterations using each unfairness metric. (Panels: Gaussian NB, Logistic Regression, and Decision Tree, each shown for Unfairness and for Accuracy on the Simulated data; x-axis: Unfairness Weight; y-axis: Result (Mean Over 100 Trials); one line per unfairness metric.)

3.2 Inclusion of Sensitive and Unfair Features (RQ2)
RQ2 asked how the selection of the unfair feature and the sensitive feature (group status) affected outcomes. Using Mann-Whitney U tests, we found that both the unfair feature and the sensitive feature were selected significantly less frequently as the unfairness weight increased, for the Simulated data. We exclusively analyzed the Simulated data for this research question, since we knew exactly which features were fair and unfair, and in what way the unfair feature was unfair.

We measured whether the sensitive feature was included in the classifier less frequently for each increase (by 1) of the unfairness weight. For the sensitive feature, 70% (51 out of 72) of unfairness weight increases resulted in the sensitive feature being selected significantly less frequently (p < .05). This result was across all combinations of classifiers and metrics. The unfairness metrics that did not exhibit a statistically significant decrease in selection of the sensitive feature aligned with the ones that were also unaffected by the method in general. Worth noting, however, is that there were no metrics for which every single step was ineffective. For example, for Gaussian naïve Bayes, optimizing for statistical parity did not significantly affect the selection of the sensitive feature from unfairness weights 2 to 3 and 3 to 4, but did significantly change from steps 0 to 1 and 1 to 2. For metrics that start with low unfairness, increases in the unfairness weight may not effect change beyond a certain point. In our results, weighting unfairness at all (i.e., weight > 0) affected the selection of the sensitive feature. Beyond that, however, we may have already made whatever intervention was available to be made.

For the unfair feature, 64% (46 out of 72) of increases by 1 to the unfairness weight resulted in the unfair feature being selected significantly less frequently (p < .05). Similarly to the sensitive feature, selection of the unfair feature was less affected by our feature selection method for unfairness metrics that did not decrease significantly when the unfairness weight increased. In fact, there was one instance where no effect was observed; statistical parity with Gaussian naïve Bayes did not select the unfair feature with any less frequency at any unfairness weight. Statistical parity was, however, the metric that started out with low unfairness compared to other metrics, so this finding aligned with our other findings regarding resistant metrics.

Figure 3: Plots of correlation between accuracy and unfairness for three combinations of metrics and classifiers for the Student Academics dataset. (Panels: (a) Decision Tree, Conditional Procedure Equality; (b) Gaussian NB, Treatment Equality; (c) Logistic Regression, Total Average Equality; x-axis: Accuracy; y-axis: Unfairness.)

4 LIMITATIONS AND FUTURE WORK
The experiments in this paper have a few limitations. First, two of the publicly available datasets we analyzed were similar education-related datasets, though education is not the only field to which our method applies. We chose these datasets as examples because education has a long history of looking at unfairness and bias (e.g., test unfairness), and many of our current formal definitions in machine learning echo those defined in educational contexts [25]. The Adult dataset provided a baseline to other machine learning work, but future work should explore our method in additional fairness-sensitive domains, such as healthcare.

Second, our datasets were relatively small; most notably, the Simulated data had only three features. It was intentionally constructed in this way to make the relationship between the selection of the unfair feature and the unfairness weight as clear as possible—however, in the future we will evaluate this approach on datasets with more features and thus a larger search space for feature selection.

Third, the method we described is based on wrapper feature selection, which integrated well with our approach and is appropriate for smaller datasets. Wrapper feature selection, however, is slow compared with other feature selection methods because of its computational complexity [40]. Furthermore, wrapper feature selection is not the only way to select features. As such, future work could apply our method to other feature selection methods, such as RELIEF-F [32] or model-based methods [1].

5 CONCLUSION
We devised an approach to fair feature selection, inspired by the general framework of fair model building. In particular, we introduced the notion of an unfairness weight in feature selection, which indicates how heavily to weight unfairness versus accuracy when measuring the marginal benefit of adding a new feature to a model. Our goal was to maintain accuracy while reducing unfairness, as defined by six common statistical definitions.

We demonstrated that our method decreased unfairness. Moreover, our method does not reduce accuracy when fairness will not be improved. There is, however, an inevitable trade-off between improved fairness and accuracy, which we showed by measuring the correlation of accuracy and unfairness. By automating the selection of features to produce fairer predictions, we provide an approach that improves model fairness by eliminating decisions based on unfair information. Because this approach affects feature selection only, it can also be combined with other common fairness methods, such as pre-processing. As long as the desired definition of unfairness is known, it can be applied to this approach. Moreover, our method could easily be extended to other statistical unfairness definitions beyond the six that we tested. This procedure is an effective approach to constructing classifiers that both reduce unfairness and are less likely to include unfair features in the modeling process.

REFERENCES
[1] André Altmann, Laura Toloşi, Oliver Sander, and Thomas Lengauer. 2010. Permutation importance: a corrected feature importance measure. Bioinformatics 26, 10 (May 2010), 1340–1347. https://doi.org/10.1093/bioinformatics/btq134
[2] Ryan S. J. d. Baker and Kalina Yacef. 2009. The state of educational data mining in 2009: a review and future visions. Journal of Educational Data Mining 1, 1 (Oct. 2009), 3–17. https://doi.org/10.5281/zenodo.3554657
[3] Solon Barocas and Andrew D. Selbst. 2016. Big data's disparate impact. California Law Review 104, 671 (2016).
[4] Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. 2017. Fairness in criminal justice risk assessments: the state of the art. arXiv:1703.09207 (May 2017).
[5] danah boyd and Kate Crawford. 2012. Critical Questions for Big Data. Information, Communication & Society 15, 5 (June 2012), 662–679. https://doi.org/10.1080/1369118X.2012.678878
[6] Leo Breiman. 2001. Random Forests. Machine Learning 45, 1 (Oct. 2001), 5–32. https://doi.org/10.1023/A:1010933404324
[7] Leo Breiman, Jerome Friedman, Charles J. Stone, and Richard A. Olshen. 1984. Classification and Regression Trees. CRC Press.
[8] Toon Calders and Sicco Verwer. 2010. Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery 21, 2 (Sept. 2010), 277–292. https://doi.org/10.1007/s10618-010-0190-x
[9] Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush Varshney. 2017. Optimized Pre-Processing for Discrimination Prevention. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 3992–4001.
[10] Silvia Chiappa. 2019. Path-Specific Counterfactual Fairness. Proceedings of the AAAI Conference on Artificial Intelligence 33, 01 (July 2019), 7801–7808. https://doi.org/10.1609/aaai.v33i01.33017801
[11] Alexandra Chouldechova. 2017. Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data 5, 2 (June 2017), 153–163. https://doi.org/10.1089/big.2016.0047
[12] Jacob Cohen. 1960. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 20, 1 (April 1960), 37–46. https://doi.org/10.1177/001316446002000104
[13] Irina Cojuharenco and David Patient. 2013. Workplace fairness versus unfairness: Examining the differential salience of facets of organizational justice. Journal of Occupational and Organizational Psychology 86, 3 (2013), 371–393. https://doi.org/10.1111/joop.12023
[14] Paulo Cortez and Alice Silva. 2008. Using data mining to predict secondary school student performance. In Proceedings of the 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008). 5–12.
[15] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (ITCS '12). Association for Computing Machinery, New York, NY, USA, 214–226. https://doi.org/10.1145/2090236.2090255
[16] Christian Fischer, Zachary A. Pardos, Ryan Shaun Baker, Joseph Jay Williams, Padhraic Smyth, Renzhe Yu, Stefan Slater, Rachel Baker, and Mark Warschauer. 2020. Mining Big Data in Education: Affordances and Challenges. Review of Research in Education 44, 1 (March 2020), 130–160. https://doi.org/10.3102/0091732X20903304
[17] Benjamin Fish, Jeremy Kun, and Ádám D. Lelkes. 2016. A Confidence-Based Approach for Balancing Fairness and Accuracy. In Proceedings of the 2016 SIAM International Conference on Data Mining. https://doi.org/10.1137/1.9781611974348.17
[18] Sorelle A. Friedler, Carlos Scheidegger, Suresh Venkatasubramanian, Sonam Choudhary, Evan P. Hamilton, and Derek Roth. 2019. A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* '19). Association for Computing Machinery, New York, NY, USA, 329–338. https://doi.org/10.1145/3287560.3287589
[19] Jerome H. Friedman. 1997. On Bias, Variance, 0/1—Loss, and the Curse-of-Dimensionality. Data Mining and Knowledge Discovery 1, 1 (March 1997), 55–77. https://doi.org/10.1023/A:1009778005914
[20] Pratik Gajane and Mykola Pechenizkiy. 2018. On Formalizing Fairness in Prediction with Machine Learning. arXiv:1710.03184 (May 2018). http://arxiv.org/abs/1710.03184
[21] Josh Gardner, Christopher Brooks, and Ryan Baker. 2019. Evaluating the Fairness of Predictive Student Models Through Slicing Analysis. In Proceedings of the 9th International Conference on Learning Analytics & Knowledge (LAK19). Association for Computing Machinery, New York, NY, USA, 225–234. https://doi.org/10.1145/3303772.3303791
[22] Nina Grgić-Hlača, Muhammad Bilal Zafar, Krishna P. Gummadi, and Adrian Weller. 2018. Beyond Distributive Fairness in Algorithmic Decision Making: Feature Selection for Procedurally Fair Learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
In Proceedings of the 9th method could easily be extended to other statistical unfairness defi- International Conference on Learning Analytics & Knowledge (LAK19). Association nitions beyond the six that we tested. This procedure is an effective for Computing Machinery, New York, NY, USA, 225–234. https://doi.org/10. 1145/3303772.3303791 approach to constructing classifiers that both reduce unfairness and [22] Nina Grgić-Hlača, Muhammad Bilal Zafar, Krishna P. Gummadi, and Adrian are less likely to include unfair features in the modeling process. Weller. 2018. Beyond Distributive Fairness in Algorithmic Decision Making: Fea- ture Selection for Procedurally Fair Learning. In Thirty-Second AAAI Conference on Artificial Intelligence. REFERENCES [23] Moritz Hardt, Eric Price, Eric Price, and Nati Srebro. 2016. Equality of Opportunity [1] André Altmann, Laura Toloşi, Oliver Sander, and Thomas Lengauer. 2010. Per- in . In NIPS’16: Proceedings of the 30th International Conference mutation importance: a corrected feature importance measure. Bioinformatics on Neural Information Processing Systems. 3323–3331. https://doi.org/10.5555/ 26, 10 (May 2010), 1340–1347. https://doi.org/10.1093/bioinformatics/btq134 3157382.3157469 [2] Ryan S. J. d Baker and Kalina Yacef. 2009. The state of educational data mining [24] Sadiq Hussain, Neama Abdulaziz Dahan, Fadl Ba-Alwib, and Ribata Najoua. 2018. in 2009: a review and future visions. JEDM | Journal of Educational Data Mining Educational data mining and analysis of students’ academic performance using 1, 1 (Oct. 2009), 3–17. https://doi.org/10.5281/zenodo.3554657 . Indonesian Journal of Electrical Engineering and Computer Science 9, 2 [3] Solon Barocas and Andrew D. Selbst. 2016. Big data’s disparate impact. California (Feb. 2018), 447–459. https://doi.org/10.11591/ijeecs.v9.i2.pp447-459 Law Review 104, 671 (2016). [25] Ben Hutchinson and Margaret Mitchell. 2019. 50 years of test (un)fairness: Lessons [4] Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. for machine learning. In Proceedings of the Conference on Fairness, Accountability, 2017. Fairness in criminal justice risk assessments: the state of the art. and Transparency (FAT* ’19). Association for Computing Machinery, New York, arXiv:1703.09207 (May 2017). NY, USA, 49–58. https://doi.org/10.1145/3287560.3287600 [5] danah boyd and Kate Crawford. 2012. Critical Questions for Big Data. Information, [26] László A. Jeni, Jeffrey F. Cohn, and Fernando De La Torre. 2013. Facing imbalanced Communication & Society 15, 5 (June 2012), 662–679. https://doi.org/10.1080/ data–recommendations for the use of performance metrics. In 2013 Humaine 1369118X.2012.678878 Association Conference on Affective Computing and Intelligent Interaction. 245–251. [6] Leo Breiman. 2001. Random Forests. Machine Learning 45, 1 (Oct. 2001), 5–32. https://doi.org/10.1109/ACII.2013.47 https://doi.org/10.1023/A:1010933404324 [27] Eun Seo Jo and Timnit Gebru. 2020. Lessons from archives: Strategies for collect- [7] Leo Breiman, Jerome Friedman, Charles J Stone, and Richard A Olshen. 1984. ing sociocultural data in machine learning. In Proceedings of the 2020 Conference Classification and Regression Trees. CRC Press. on Fairness, Accountability, and Transparency (FAT* ’20). Association for Comput- [8] Toon Calders and Sicco Verwer. 2010. Three naive Bayes approaches for ing Machinery, New York, NY, USA, 306–316. https://doi.org/10.1145/3351095. discrimination-free classification. 
REFERENCES
[1] André Altmann, Laura Toloşi, Oliver Sander, and Thomas Lengauer. 2010. Permutation importance: a corrected feature importance measure. Bioinformatics 26, 10 (May 2010), 1340–1347. https://doi.org/10.1093/bioinformatics/btq134
[2] Ryan S. J. d. Baker and Kalina Yacef. 2009. The state of educational data mining in 2009: a review and future visions. Journal of Educational Data Mining 1, 1 (Oct. 2009), 3–17. https://doi.org/10.5281/zenodo.3554657
[3] Solon Barocas and Andrew D. Selbst. 2016. Big data’s disparate impact. California Law Review 104 (2016), 671.
[4] Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. 2017. Fairness in criminal justice risk assessments: the state of the art. arXiv:1703.09207 (May 2017).
[5] danah boyd and Kate Crawford. 2012. Critical Questions for Big Data. Information, Communication & Society 15, 5 (June 2012), 662–679. https://doi.org/10.1080/1369118X.2012.678878
[6] Leo Breiman. 2001. Random Forests. Machine Learning 45, 1 (Oct. 2001), 5–32. https://doi.org/10.1023/A:1010933404324
[7] Leo Breiman, Jerome Friedman, Charles J. Stone, and Richard A. Olshen. 1984. Classification and Regression Trees. CRC Press.
[8] Toon Calders and Sicco Verwer. 2010. Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery 21, 2 (Sept. 2010), 277–292. https://doi.org/10.1007/s10618-010-0190-x
[9] Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush Varshney. 2017. Optimized Pre-Processing for Discrimination Prevention. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 3992–4001.
[10] Silvia Chiappa. 2019. Path-Specific Counterfactual Fairness. Proceedings of the AAAI Conference on Artificial Intelligence 33, 01 (July 2019), 7801–7808. https://doi.org/10.1609/aaai.v33i01.33017801
[11] Alexandra Chouldechova. 2017. Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments. Big Data 5, 2 (June 2017), 153–163. https://doi.org/10.1089/big.2016.0047
[12] Jacob Cohen. 1960. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 20, 1 (April 1960), 37–46. https://doi.org/10.1177/001316446002000104
[13] Irina Cojuharenco and David Patient. 2013. Workplace fairness versus unfairness: Examining the differential salience of facets of organizational justice. Journal of Occupational and Organizational Psychology 86, 3 (2013), 371–393. https://doi.org/10.1111/joop.12023
[14] Paulo Cortez and Alice Silva. 2008. Using data mining to predict secondary school student performance. In Proceedings of the 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008). 5–12.
[15] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (ITCS ’12). Association for Computing Machinery, New York, NY, USA, 214–226. https://doi.org/10.1145/2090236.2090255
[16] Christian Fischer, Zachary A. Pardos, Ryan Shaun Baker, Joseph Jay Williams, Padhraic Smyth, Renzhe Yu, Stefan Slater, Rachel Baker, and Mark Warschauer. 2020. Mining Big Data in Education: Affordances and Challenges. Review of Research in Education 44, 1 (March 2020), 130–160. https://doi.org/10.3102/0091732X20903304
[17] Benjamin Fish, Jeremy Kun, and Ádám D. Lelkes. 2016. A Confidence-Based Approach for Balancing Fairness and Accuracy. In Proceedings of the 2016 SIAM International Conference on Data Mining. https://doi.org/10.1137/1.9781611974348.17
[18] Sorelle A. Friedler, Carlos Scheidegger, Suresh Venkatasubramanian, Sonam Choudhary, Evan P. Hamilton, and Derek Roth. 2019. A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* ’19). Association for Computing Machinery, New York, NY, USA, 329–338. https://doi.org/10.1145/3287560.3287589
[19] Jerome H. Friedman. 1997. On Bias, Variance, 0/1—Loss, and the Curse-of-Dimensionality. Data Mining and Knowledge Discovery 1, 1 (March 1997), 55–77. https://doi.org/10.1023/A:1009778005914
[20] Pratik Gajane and Mykola Pechenizkiy. 2018. On Formalizing Fairness in Prediction with Machine Learning. arXiv:1710.03184 (May 2018). http://arxiv.org/abs/1710.03184
[21] Josh Gardner, Christopher Brooks, and Ryan Baker. 2019. Evaluating the Fairness of Predictive Student Models Through Slicing Analysis. In Proceedings of the 9th International Conference on Learning Analytics & Knowledge (LAK19). Association for Computing Machinery, New York, NY, USA, 225–234. https://doi.org/10.1145/3303772.3303791
[22] Nina Grgić-Hlača, Muhammad Bilal Zafar, Krishna P. Gummadi, and Adrian Weller. 2018. Beyond Distributive Fairness in Algorithmic Decision Making: Feature Selection for Procedurally Fair Learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
[23] Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of Opportunity in Supervised Learning. In NIPS’16: Proceedings of the 30th International Conference on Neural Information Processing Systems. 3323–3331. https://doi.org/10.5555/3157382.3157469
[24] Sadiq Hussain, Neama Abdulaziz Dahan, Fadl Ba-Alwib, and Ribata Najoua. 2018. Educational data mining and analysis of students’ academic performance using WEKA. Indonesian Journal of Electrical Engineering and Computer Science 9, 2 (Feb. 2018), 447–459. https://doi.org/10.11591/ijeecs.v9.i2.pp447-459
[25] Ben Hutchinson and Margaret Mitchell. 2019. 50 years of test (un)fairness: Lessons for machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* ’19). Association for Computing Machinery, New York, NY, USA, 49–58. https://doi.org/10.1145/3287560.3287600
[26] László A. Jeni, Jeffrey F. Cohn, and Fernando De La Torre. 2013. Facing imbalanced data–recommendations for the use of performance metrics. In 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction. 245–251. https://doi.org/10.1109/ACII.2013.47
[27] Eun Seo Jo and Timnit Gebru. 2020. Lessons from archives: Strategies for collecting sociocultural data in machine learning. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT* ’20). Association for Computing Machinery, New York, NY, USA, 306–316. https://doi.org/10.1145/3351095.3372829
[28] Faisal Kamiran and Toon Calders. 2012. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems 33, 1 (Oct. 2012), 1–33. https://doi.org/10.1007/s10115-011-0463-8
[29] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. 2012. Fairness-aware classifier with prejudice remover regularizer. In Machine Learning and Knowledge Discovery in Databases (Lecture Notes in Computer Science, Vol. 7524), Peter A. Flach, Tijl De Bie, and Nello Cristianini (Eds.). Springer, Berlin, Heidelberg, 35–50. https://doi.org/10.1007/978-3-642-33486-3_3
[30] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. 2016. Inherent trade-offs in the fair determination of risk scores. arXiv:1609.05807 (Nov. 2016).
[31] Ron Kohavi. 1996. Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. AAAI Press, 202–207.
[32] Igor Kononenko, Edvard Šimec, and Marko Robnik-Šikonja. 1997. Overcoming the myopia of inductive learning algorithms with RELIEFF. Applied Intelligence 7, 1 (Jan. 1997), 39–55. https://doi.org/10.1023/A:1008280620621
[33] Matt J. Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual fairness. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 4066–4076.
[34] Lydia T. Liu, Sarah Dean, Esther Rolf, Max Simchowitz, and Moritz Hardt. 2019. Delayed impact of fair machine learning. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, Macao, China, 6196–6200. https://doi.org/10.24963/ijcai.2019/862
[35] Jaclyn Ocumpaugh, Ryan Baker, Sujith Gowda, Neil Heffernan, and Cristina Heffernan. 2014. Population validity for educational data mining models: A case study in affect detection. British Journal of Educational Technology 45, 3 (May 2014), 487–501. https://doi.org/10.1111/bjet.12156
[36] Cathy O’Neil. 2016. Weapons of math destruction: How big data increases inequality and threatens democracy (1 ed.). Crown Publishers, New York, NY.
[37] Jason W. Osborne. 2001. Testing stereotype threat: Does anxiety explain race and sex differences in achievement? Contemporary Educational Psychology 26, 3 (July 2001), 291–310. https://doi.org/10.1006/ceps.2000.1052
[38] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 85 (2011), 2825–2830.
[39] Sebastian Raschka. 2018. MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack. Journal of Open Source Software 3, 24 (April 2018), 638. https://doi.org/10.21105/joss.00638
[40] Payam Refaeilzadeh. 2007. On comparison of feature selection algorithms. In Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07) Workshop Program. 6.
[41] Debopam Sanyal, Nigel Bosch, and Luc Paquette. 2020. Feature selection metrics: Similarities, differences, and characteristics of the selected models. In 13th International Conference on Educational Data Mining (EDM 2020). 212–223.
[42] Robert L. Schaefer. 1986. Alternative estimators in logistic regression when the data are collinear. Journal of Statistical Computation and Simulation 25, 1-2 (Aug. 1986), 75–91. https://doi.org/10.1080/00949658608810925
[43] Robert Tibshirani. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 1 (1996), 267–288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
[44] G. V. Trunk. 1979. A problem of dimensionality: A simple example. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-1, 3 (July 1979), 306–307. https://doi.org/10.1109/TPAMI.1979.4766926
[45] Berk Ustun, Yang Liu, and David Parkes. 2019. Fairness without harm: Decoupled classifiers with preference guarantees. In International Conference on Machine Learning. PMLR, 6373–6382.
[46] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P. Gummadi. 2017. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. Proceedings of the 26th International Conference on World Wide Web (April 2017), 1171–1180. https://doi.org/10.1145/3038912.3052660