
Proceedings of Machine Learning Research 83:1–9, 2018    Algorithmic Learning Theory 2018

Decision making with limited feedback: Error bounds for predictive policing and recidivism prediction∗

Danielle Ensign  [email protected]  School of Computing, University of Utah
Sorelle A. Friedler  [email protected]  Haverford College
Scott Neville  [email protected]  School of Computing, University of Utah
Carlos Scheidegger  [email protected]  University of Arizona
Suresh Venkatasubramanian  [email protected]  School of Computing, University of Utah

Editors: Mehryar Mohri and Karthik Sridharan

∗ This research is funded in part by the NSF under grants IIS-1633387, IIS-1633724 and IIS-1513651.
© 2018 D. Ensign, S.A. Friedler, S. Neville, C. Scheidegger & S. Venkatasubramanian.

Abstract

When models are trained for deployment in decision-making in various real-world settings, they are typically trained in batch mode. Historical data is used to train and validate the models prior to deployment. However, in many settings, feedback changes the nature of the training process: either the learner does not get full feedback on its actions, or the decisions made by the trained model influence what future training data it will see.

In this paper, we focus on the problems of recidivism prediction and predictive policing. We present the first algorithms with provable regret for these problems, by showing that both problems (and others like them) can be abstracted into a general reinforcement learning framework called partial monitoring. We also discuss the policy implications of these solutions.

Keywords: Partial monitoring, online learning, predictive policing, recidivism prediction

1. Introduction

Machine learning models are increasingly being used to make real-world decisions, such as whom to hire, who should receive a loan, where to send police, and who should receive parole. These deployed models mostly use traditional batch-mode machine learning, where decisions are made and observed results supplement the training data for the next batch.

However, the problem of feedback makes traditional batch learning frameworks both inappropriate and incorrect. First, feedback is limited: hiring algorithms only receive feedback on people who were hired, and predictive policing algorithms only observe crime in neighborhoods they patrol. Secondly, decisions made by the system influence the data that is fed to it in the future. For example, once a decision has been made to patrol a certain neighborhood, crime from that neighborhood will be fed into the training apparatus for the next round of decision-making.

In this paper, we model these problems in a reinforcement learning setting, and derive algorithms with provable error bounds. Notably, these algorithms also translate into concrete procedures that differ from current practice in the problems under study.

The problems. We will focus on two problems that are of particular societal importance: predictive policing and recidivism prediction. These problems are at the core of the algorithmic pipeline in criminal justice, through which automated decision-making has a material impact on society. They also serve as archetypal problems through which we can gain an understanding of generalizable issues faced in deployment. Another motivating factor is that systems for solving these problems are already in use, and issues with these processes are already documented, making the discussion of remedies urgent. While problems with recidivism prediction have been documented in the well-publicized and Pulitzer-prize-finalist work by ProPublica (Angwin et al., 2016), the complications that arise from limited feedback have not been discussed. PredPol, a predictive policing system, has been shown to produce inaccurate feedback loops when deployed in batch mode (Lum and Isaac, 2016), so that police are repeatedly sent back to the same neighborhoods, even though the underlying crime rate would suggest a different deployment.

Definition 1 (Predictive Policing) Given historical crime data for a collection of $d$ regions, decide how to allocate $k$ patrol officers to areas to detect crime.

Definition 2 (Recidivism Prediction) Given an inmate up for parole, use a model of re-offense (whether the individual will reoffend within a fixed time period after being released) to determine whether they should be granted parole.

Contributions. Our first contribution is a formal model for predictive policing which places it in the framework of partial monitoring. We exploit structure within the problem to reduce it to a combinatorial semi-bandit problem. Our reduction, combined with existing regret bounds for such problems, yields an algorithm (the first of its kind) for predictive policing that exhibits $O(kd\sqrt{kT})$ regret over $T$ iterations. This result, and the method used to prove it, is somewhat counter-intuitive: the "true loss", i.e., the actual crime rate, is not revealed to the learner, but we show that there are fully observable proxy losses that yield the same instantaneous (and therefore overall) regret.

We also consider the degree to which feedback affects algorithm performance, by considering instead the case when crime is reported rather than discovered by patrol officers. Using our framework from above, we show that this setting can be analyzed using a full-information online linear optimization framework, yielding an algorithm with regret $O(kd\sqrt{T \log k})$.

Turning now to recidivism prediction, we show that it too has a natural analog in the partial monitoring literature, in the form of the apple tasting problem. By invoking results in that model, we present an algorithm (the first with a provable guarantee) for recidivism prediction that achieves a mistake bound of $\sqrt{T}$.

We also examine the policy implications of these results. In the case of predictive policing, our results provide an alternative to currently deployed algorithms that are based on batch learning and are vulnerable to runaway feedback loops (Ensign et al., 2017). In the case of recidivism prediction, our algorithm suggests a random process by which inmates are released: while this might not be a tenable practical solution, it closely resembles practical approaches involving the random assignment of judges to decision-making.
2. Related Work

Our work fits into the larger framework of the social implications of algorithmic decision-making, and as such it overlaps with the recent interest in fairness, accountability, and transparency of these systems. The narrower question of defining notions of fairness in sequential learning settings such as the ones we describe has been studied extensively, primarily in the setting of bandits (regular, contextual and linear) and Markov decision processes (Kannan et al., 2017; Joseph et al., 2016b; Jabbari et al., 2016; Joseph et al., 2016a). There, the primary goal is to understand how to define fairness in such a process, and how ensuring fairness might affect the ability to learn an accurate model.

We note that the perspective from Markov decision processes (and POMDPs) has much to offer; however, the problems of limited feedback relate more directly to the area of partial monitoring (Cesa-Bianchi and Lugosi, 2006), which we employ heavily in this paper. One class of results in that area addresses the general setting, and another class of results looks at special sub-cases that are more amenable to analysis (such as the vast literature on bandits (Bubeck et al., 2012)).

There are a number of systems currently in place for recidivism prediction and predictive policing. While the details of the actual implementations (such as COMPAS (NorthPointe, Inc.)) remain proprietary, Berk and Bleich (2013) provide a comprehensive review of the methods used in this area. There has been important empirical work (Lum and Isaac, 2016) demonstrating the consequences of feedback loops in simulation in the predictive policing setting (specifically for the system known as PredPol (Mohler et al., 2015)).

3. Background

The reinforcement learning framework we will use to evaluate the above problems is the well-known partial monitoring framework (Piccolboni and Schindelhauer, 2001; Cesa-Bianchi and Lugosi, 2006, Chapter 6). Formally, a partial monitoring problem $P = (A, Y, H, L)$ consists of a set of $n$ actions $A = \{a_1, a_2, \ldots, a_n\}$ and a set of $m$ outcomes (adversary actions) $Y = \{y_1, y_2, \ldots, y_m\}$. There is a feedback function (also called a feedback matrix) $H \colon A \times Y \to \Sigma$ that takes in a learner action and an outcome and outputs some symbol $\sigma \in \Sigma$ denoting information that the learner receives. Finally, there is a loss function (also called a loss matrix) $L \colon A \times Y \to \mathbb{R}$ that takes in an action and an outcome and outputs a loss (which is usually assumed to be positive). We denote by $h(a_t, y_t) \in \Sigma$ the feedback value of $H$ for an action and an outcome, and by $\ell(a_t, y_t) \in \mathbb{R}$ the corresponding loss under $L$.
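To make this interaction concrete, the following is a minimal sketch (our own illustration, not code from the paper) that represents $L$ and $H$ as $n \times m$ arrays and plays one round. The two-action instance shown, in which only the first action yields informative feedback, is in the spirit of the apple tasting problem mentioned in the introduction.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny partial monitoring instance in the spirit of apple tasting:
# action 0 = inspect, action 1 = pass; outcome 0 = good, outcome 1 = bad.
# Rows index learner actions, columns index adversary outcomes.
L = np.array([[1.0, 0.0],    # inspecting wastes a good item, catches a bad one
              [0.0, 1.0]])   # passing a good item is free, passing a bad one costs
H = np.array([["good", "bad"],     # inspecting reveals the outcome...
              ["none", "none"]])   # ...passing reveals nothing

def play_round(choose_action, history):
    """One round of partial monitoring: the learner commits to an action,
    the adversary picks an outcome, and the learner observes only the
    feedback symbol h(a_t, y_t), never the loss itself."""
    a = choose_action(history)
    y = rng.integers(L.shape[1])     # stand-in for an adversarial outcome
    history.append((a, H[a, y]))     # record (action, feedback symbol)
    return L[a, y]                   # returned only so a simulator can tally regret
```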
Regret and Mistake Bounds. For any partial monitoring algorithm, let the algorithm's actions be $a_1, a_2, \ldots, a_T$ with corresponding outcomes $o_1, o_2, \ldots, o_T$. Note that the actions might be random variables. Then the (weak) regret of the algorithm is its loss compared to the loss of the best fixed action:

$$R_T = \sum_{t \le T} \ell(a_t, o_t) \; - \; \min_{a \in A} \sum_{t \le T} \ell(a, o_t)$$

and the expected weak regret is $\mathbb{E}[R_T]$. Our goal will be to optimize this quantity in a minimax way (i.e., over all adversaries and all strategies). Alternatively, we can measure algorithm performance in terms of mistake bounds. A mistake is an action-outcome pair for which $\ell(a, o) > 0$, and the mistake bound of an algorithm is its number of mistakes. Note that mistake bounds are not relative to some fixed action.
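As a concrete check of these definitions, the sketch below (ours, not the paper's) computes the weak regret of a played sequence against the best fixed action in hindsight, and the mistake count, which is absolute rather than relative.

```python
import numpy as np

def weak_regret(loss, actions, outcomes):
    """Weak regret of a played sequence against the best fixed action.

    loss     : (n_actions, n_outcomes) array; loss[a, y] is the loss of action a on outcome y
    actions  : length-T sequence of action indices a_1, ..., a_T
    outcomes : length-T sequence of outcome indices o_1, ..., o_T
    """
    loss = np.asarray(loss)
    played = sum(loss[a, o] for a, o in zip(actions, outcomes))
    # Best fixed action in hindsight: minimize total loss over the same outcomes.
    best_fixed = min(sum(loss[a, o] for o in outcomes) for a in range(loss.shape[0]))
    return played - best_fixed

def mistakes(loss, actions, outcomes):
    """Mistake count: rounds with strictly positive loss."""
    loss = np.asarray(loss)
    return sum(1 for a, o in zip(actions, outcomes) if loss[a, o] > 0)
```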
4. Modeling Predictive Policing

We now formalize predictive policing in a partial monitoring setting. Assume we have a police force consisting of $k$ officers patrolling a set of $d$ regions.
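The excerpt ends here, before the formalization is complete. As a rough sketch under our own assumptions (the names and representation below are ours, not the paper's), an action can be viewed as an allocation of the $k$ indistinguishable officers across the $d$ regions, with feedback restricted to crime discovered in patrolled regions.

```python
from itertools import combinations_with_replacement
from collections import Counter

def allocations(k, d):
    """Enumerate candidate learner actions: every way to assign k
    indistinguishable officers to d regions, as tuples (n_1, ..., n_d)
    with sum n_i = k."""
    for combo in combinations_with_replacement(range(d), k):
        counts = Counter(combo)
        yield tuple(counts.get(region, 0) for region in range(d))

def observed_crime(allocation, true_crime):
    """Partial feedback: crime is only discovered in regions patrolled by
    at least one officer; unpatrolled regions report nothing (None)."""
    return [c if n > 0 else None for n, c in zip(allocation, true_crime)]

# Example: 2 officers over 3 regions; per-region crime stays hidden
# wherever no officer is sent.
for action in allocations(2, 3):
    print(action, observed_crime(action, true_crime=[5, 1, 3]))
```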