Bias Mitigation in Data Sets

Shea Brown (ForHumanity, BABL), Ryan Carrier (ForHumanity), Merve Hickok (ForHumanity, AIethicist.org), Adam Leon Smith (ForHumanity, Dragonfly)
[email protected], [email protected], [email protected], adam@wearedragonfly.co

Abstract—We tackle sample bias, non-response bias, cognitive bias, and disparate impact associated with Protected Categories in three parts/papers: data, algorithm construction, and output impact. This paper covers the Data section.

Index Terms—artificial intelligence, machine learning, bias, data quality

I. INTRODUCTION

Bias is a loaded word. It has serious implications and is used heavily in today's debates about algorithms. Research around bias in AI/ML systems often draws in fairness, even though the two are very different concepts. To avoid confusion and to help with bias mitigation, we will keep the concepts separate. In its widest sense, bias can originate, or be found, across the whole spectrum of decisions made ...

2) Ethical and reputational risk - perpetuation of stereotypes and discrimination, or failure to abide by a Code of Ethics or social responsibility policy
3) Functional risk - poor validity, poor fit, and misalignment with the intended scope, nature, context and purpose of the system

The desire of the authors, and of ForHumanity's Independent Audit of AI Systems, is to explain and document mitigations related to all bias, in its many manifestations, in the data input phase of AI/ML model development.

Circling back to the societal concern with the statistical notion of bias: societies, under the laws of different jurisdictions, have determined that bias is prohibited with respect to certain data characteristics - human characteristics such as race, age, gender, etc.
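As a rough, illustrative sketch only (not a method prescribed in this paper): disparate impact with respect to a Protected Category is commonly quantified by comparing favourable-outcome rates across groups in the data set. The pandas-based example below does this; the column names ("gender", "outcome"), the helper name disparate_impact_ratio, and the 0.8 "four-fifths rule" threshold are assumptions made for illustration, not part of the authors' framework.

# Minimal illustrative sketch (not from the paper): quantifying disparate
# impact of a binary outcome with respect to a protected characteristic.
# Column names ("gender", "outcome") and the 0.8 threshold (the common
# "four-fifths rule") are assumptions for illustration only.
import pandas as pd

def disparate_impact_ratio(df: pd.DataFrame, protected: str, outcome: str) -> float:
    """Ratio of the lowest to the highest favourable-outcome rate across groups."""
    rates = df.groupby(protected)[outcome].mean()  # favourable-outcome rate per group
    return rates.min() / rates.max()

# Hypothetical usage on a toy data set
data = pd.DataFrame({
    "gender": ["f", "f", "m", "m", "m", "f"],
    "outcome": [1, 0, 1, 1, 1, 0],
})
ratio = disparate_impact_ratio(data, "gender", "outcome")
if ratio < 0.8:  # four-fifths rule of thumb
    print(f"Potential disparate impact: ratio = {ratio:.2f}")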