Decision making with limited feedback: Error bounds for recidivism prediction and predictive policing

Danielle Ensign (University of Utah), Sorelle A. Friedler (Haverford College), Scott Neville (University of Utah), Carlos Scheidegger (University of Arizona), Suresh Venkatasubramanian (University of Utah)

ABSTRACT
When models are trained for deployment in decision-making in various real-world settings, they are typically trained in batch mode: historical data is used to train and validate the models prior to deployment. However, in many settings, feedback changes the nature of the training process. Either the learner does not get full feedback on its actions, or the decisions made by the trained model influence what future training data it will see. We focus on the problems of recidivism prediction and predictive policing, showing that both problems (and others like these) can be abstracted into a general reinforcement learning framework called partial monitoring. We then design algorithms that yield provable guarantees on regret for these problems, and discuss the policy implications of these solutions.

CCS CONCEPTS
• Theory of computation → Reinforcement learning; Regret bounds; • Computing methodologies → Machine learning algorithms;

KEYWORDS
Feedback loops, predictive policing, recidivism prediction, partial monitoring

ACM Reference format:
Danielle Ensign, Sorelle A. Friedler, Scott Neville, Carlos Scheidegger, and Suresh Venkatasubramanian. 2017. Decision making with limited feedback: Error bounds for recidivism prediction and predictive policing. In Proceedings of FAT/ML 2017, Halifax, Nova Scotia, Canada, August 2017, 5 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

This research is funded in part by the NSF under grants IIS-1633387, IIS-1633724 and IIS-1513651.

1 INTRODUCTION
Machine learning models are increasingly being used to make real-world decisions, such as who to hire, who should receive a loan, where to send police, and who should receive parole. These deployed models mostly use traditional batch-mode machine learning, where decisions are made and observed results supplement the training data for the next batch.

However, the problem of feedback makes traditional batch learning frameworks both inappropriate and incorrect. First, hiring algorithms only receive feedback on people who were hired, predictive policing algorithms only observe crime in neighborhoods they patrol, and so on. Second, decisions made by the system influence the data that is fed to it in the future. For example, once a decision has been made to patrol a certain neighborhood, crime from that neighborhood will be fed into the training apparatus for the next round of decision-making.

In this paper, we model these problems in a reinforcement learning setting and derive algorithms with provable error bounds. Interestingly, these algorithms also translate into concrete procedures that differ from current practice in the problems under study.

The problems. We will focus on two problems of particular societal importance: recidivism prediction and predictive policing. These problems are at the core of the algorithmic pipeline in criminal justice, through which automated decision-making has a huge material impact on society. In addition, we focus on these problems because they serve as archetypal problems through which we can gain an understanding of generalizable issues faced in deployment. Another motivating factor in the choice of these questions is that systems for solving these problems are already in place and issues with these processes are already documented, making the discussion of remedies more urgent. The problems with recidivism prediction have been documented in the well-publicized, Pulitzer Prize-finalist work by ProPublica [1], although, interestingly, the problems of limited feedback have not been discussed. PredPol, a predictive policing system, has been shown to produce inaccurate feedback loops when deployed in batch mode [16], so that police are repeatedly sent back to the same neighborhoods, even though the underlying crime rate would suggest a different deployment.

Definition 1.1 (Recidivism Prediction). Given an inmate up for parole, use a model of re-offense (whether the individual will reoffend within a fixed time period after being released) to determine whether they should be granted parole.

Definition 1.2 (Predictive Policing). Given historical crime data for a collection of regions, decide how to allocate patrol officers to areas to detect crime.
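To make the feedback loop concrete, the following is a minimal simulation sketch of the patrol-allocation setting of Definition 1.2; it is our illustration, not code or data from the paper. Two neighborhoods have hypothetical true crime rates, patrols are allocated in proportion to crime observed so far, and crime is only recorded where an officer is present.

    import random

    def simulate(true_rates=(0.4, 0.3), rounds=1000, seed=0):
        """Toy batch-mode feedback loop for Definition 1.2 (illustrative)."""
        rng = random.Random(seed)
        observed = [1, 1]   # observed crime counts per neighborhood (1 = prior)
        visits = [0, 0]
        for _ in range(rounds):
            # Allocate the patrol in proportion to crime observed so far.
            p0 = observed[0] / (observed[0] + observed[1])
            target = 0 if rng.random() < p0 else 1
            visits[target] += 1
            # Crime can occur in either neighborhood, but it only enters the
            # training data where the officer patrols: limited feedback.
            if rng.random() < true_rates[target]:
                observed[target] += 1
        return observed, visits

    observed, visits = simulate()
    print(observed, visits)  # observed counts need not track the true 4:3 ratio

Because each round's data collection depends on earlier decisions, the observed counts form a self-reinforcing process rather than an unbiased sample of the underlying rates, which is exactly why batch retraining is problematic in this setting.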
Results. We formalize recidivism prediction and predictive policing in the context of reinforcement learning, providing regret bounds in both strong and weak settings. Specifically, we present an O(T^{2/3}) regret bound for predictive policing in the setting where we have multiple officers patrolling two neighborhoods, and O(√T) mistake bounds for recidivism prediction.

2 RELATED WORK
While our work does not directly address issues of fairness, accountability and transparency, it fits into the larger framework of the social implications of the use of algorithmic decision-making. The narrower question of defining notions of fairness in sequential learning settings such as the ones we describe has been studied extensively, primarily in the setting of bandits (regular, contextual and linear) and Markov decision processes [12, 13, 14, 15]. There, the primary goal is to understand how to define fairness in such a process, and how ensuring fairness might affect the ability to learn an accurate model.

We note that the perspective from Markov decision processes (and POMDPs) has much to offer; however, the problems of limited feedback relate more directly to the area of partial monitoring [9], which we employ heavily in this paper.

There are a number of systems currently in place for recidivism prediction and predictive policing. While the details of the actual implementations (such as COMPAS [18]) remain proprietary, [7] provide a comprehensive review of the methods used in this area. There has been important empirical work in [16] demonstrating the consequences of feedback loops in simulation in the predictive policing setting (specifically the system known as PREDPOL [17]).

3 PARTIAL MONITORING
The reinforcement learning framework we will use to evaluate the above problems is the well-known partial monitoring framework [19], [9, Chapter 6]. Formally, a partial monitoring problem P = (A, Y, H, L) consists of a set of n actions A = {a_1, a_2, ..., a_n} and a set of m outcomes (adversary actions) Y = {y_1, y_2, ..., y_m}. There is a feedback function (also called a feedback matrix) H : A × Y → S that takes in a learner action and an outcome and outputs some symbol s ∈ S denoting information that the learner receives. Finally, there is a hidden loss function (also called a loss matrix) L : A × Y → R that takes in a learner action and an outcome and outputs the loss the learner incurs.

In general, proving regret bounds for partial monitoring is hard because the feedback matrix H might bear no relation to the true loss matrix L. Thus, results in this area take two forms: one class of results looks at general bounds on partial monitoring under assumptions about the relation between H and L [6], and another class looks at special sub-cases that are more amenable to analysis (such as the vast literature on bandits [8]).

Context. In certain scenarios, the algorithm might be provided with context to help it decide on an appropriate action. The intuition behind context is that it provides a signal for what the outcome might be. Formally, fix a class of functions F from the set of contexts X to distributions over the space of outcomes. This class is known to both algorithm and adversary. Before the interaction starts, the adversary secretly picks some f ∈ F. At the start of each round, the algorithm is supplied with context x_t ∈ X. The adversary is then constrained to pick an outcome as a random draw from the distribution f(x_t) [4].

Regret and Mistake Bounds. For any partial monitoring algorithm, let the algorithm's actions be a_1, a_2, ..., a_T with corresponding outcomes o_1, o_2, ..., o_T. Note that the actions might be random variables. Then the (weak) regret of the algorithm is its loss compared to the loss of any fixed action:

    R_T = \sum_{t \le T} \ell(a_t, o_t) - \min_{a \in A} \sum_{t \le T} \ell(a, o_t)

and the expected weak regret is E[R_T]. We in turn maximize this over all outcomes, yielding the worst-case expected (weak) regret. Alternately, we can measure algorithm performance in terms of mistake bounds. A mistake is an action-outcome pair for which ℓ(a, o) > 0, and the mistake bound of an algorithm is the number of mistakes it makes. Note that mistake bounds are not relative with respect to some fixed action.
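As a concrete illustration of these definitions, the sketch below (our own toy instance, with placeholder matrices H and L rather than anything from the paper) runs the partial monitoring protocol, in which the learner sees only the feedback symbol H(a_t, o_t) and never the loss, and then evaluates both performance measures just defined.

    import random

    # A toy partial monitoring instance with n = 2 actions, m = 2 outcomes.
    # H[a][y] is the symbol the learner observes; L[a][y] is the hidden loss.
    H = [["seen", "seen"],   # action a_0: the outcome is revealed
         ["none", "none"]]   # action a_1: nothing is revealed
    L = [[1.0, 0.0],
         [0.0, 1.0]]

    def play(policy, outcomes, rng):
        """Run the protocol: in round t the learner picks a_t based only on
        past feedback symbols, incurs L[a_t][o_t], and observes H[a_t][o_t]."""
        feedback, actions = [], []
        for o in outcomes:
            a = policy(feedback, rng)
            actions.append(a)
            feedback.append(H[a][o])   # the loss itself is never shown
        return actions

    def weak_regret(actions, outcomes):
        """R_T = sum_{t<=T} l(a_t, o_t) - min_a sum_{t<=T} l(a, o_t)."""
        incurred = sum(L[a][o] for a, o in zip(actions, outcomes))
        best_fixed = min(sum(L[a][o] for o in outcomes) for a in range(len(L)))
        return incurred - best_fixed

    def mistakes(actions, outcomes):
        """Rounds with l(a, o) > 0; not relative to any fixed action."""
        return sum(1 for a, o in zip(actions, outcomes) if L[a][o] > 0)

    def naive_policy(feedback, rng):
        """Placeholder learner that ignores feedback; a real algorithm
        would use the symbols in `feedback` to choose actions."""
        return rng.randrange(2)

    rng = random.Random(0)
    outcomes = [rng.randrange(2) for _ in range(100)]   # adversary's choices
    actions = play(naive_policy, outcomes, rng)
    print(weak_regret(actions, outcomes), mistakes(actions, outcomes))

Note the design point the toy matrices encode: H and L are unrelated, so a learner can play action a_1 forever, accumulate loss, and receive no signal that anything is wrong, which is precisely why regret bounds require assumptions relating H to L.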
4 RECIDIVISM PREDICTION
We now formalize the problem of recidivism prediction in the context of partial monitoring. Recall from Section 1 that recidivism prediction is the problem of determining if someone convicted of a crime will reoffend if released (often measured based on rearrest within a fixed time period, say, two years). Such predictions are then used to determine if parole should be granted. The action here is the decision to grant parole, and the outcome is whether a crime is subsequently committed or not.
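Written as a partial monitoring instance, this problem has a distinctive structure (closely related to the classic "apple tasting" problem): feedback is available only when parole is granted. The encoding below is our illustration, with placeholder loss values rather than figures from the paper.

    # Actions: 0 = grant parole (release), 1 = deny parole (detain).
    # Outcomes: 0 = individual would reoffend, 1 = would not.
    ACTIONS = ("release", "detain")
    OUTCOMES = ("reoffend", "no_reoffend")

    # Feedback matrix H: the outcome is observable only on release;
    # a detained individual's counterfactual is never seen.
    H = [["reoffended", "clean"],      # release -> outcome revealed
         ["no_signal", "no_signal"]]   # detain  -> nothing learned

    # Hidden loss matrix L (illustrative costs): releasing someone who
    # reoffends and detaining someone who would not are the mistakes.
    L = [[1.0, 0.0],
         [0.0, 1.0]]

    def observe(action, outcome):
        """What the parole board actually learns from one decision."""
        return H[action][outcome]

    # Once parole is denied, both outcomes yield the same symbol, so the
    # learner cannot distinguish them: this is the limited feedback.
    assert observe(1, 0) == observe(1, 1) == "no_signal"

The key point is visible in the final assertion: after a denial, the question "would they have reoffended?" is never answered, so the training data available for the next round is censored by the decisions already made.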