
THE BIASED ALGORITHM: EVIDENCE OF DISPARATE IMPACT ON HISPANICS

Melissa Hamilton*

I. INTRODUCTION
II. ALGORITHMIC RISK ASSESSMENT
   A. THE RISE OF ALGORITHMIC RISK ASSESSMENT IN CRIMINAL JUSTICE
   B. PRETRIAL RISK ASSESSMENT
   C. THE PROPUBLICA STUDY
III. AUDITING THE BLACK BOX
   A. CALLS FOR THIRD PARTY AUDITS
   B. BLACK-BOX TOOLS AND ETHNIC MINORITIES
   C. LEGAL CHALLENGES
IV. EVALUATING ALGORITHMIC FAIRNESS WITH ETHNICITY
   A. THE SAMPLES AND THE TEST
   B. DIFFERENTIAL VALIDITY MEASURES
   C. TEST BIAS
   D. DIFFERENTIAL PREDICTION VIA E/O MEASURES
   E. MEAN SCORE DIFFERENCES
   F. CLASSIFICATION ERRORS
   G. LIMITATIONS
V. CONCLUSIONS

I. INTRODUCTION

Automated risk assessment is all the rage in the criminal justice system. Proponents view risk assessment as an objective way to reduce mass incarceration without sacrificing public safety. Officials thus are becoming heavily invested in risk assessment tools—with their reliance upon big data and algorithmic processing—to inform decisions on managing offenders according to their risk profiles. While the rise in algorithmic risk assessment tools has earned broad support, a group of over 100 legal organizations, government watch groups, and minority rights associations (including the ACLU, NAACP, and Electronic

* Senior Lecturer of Law & Criminal Justice, University of Surrey School of Law; J.D., The University of Texas at Austin School of Law; Ph.D., The University of Texas at Austin (criminology/criminal justice).


Frontier Foundation) recently signed onto “A Shared Statement of Civil Rights Concerns” expressing unease with whether the algorithms are fair.1 In 2016, the investigative journalist group ProPublica kickstarted a public debate on the topic when it proclaimed that a popular risk tool called COMPAS was biased against Blacks.2 Prominent news sites highlighted ProPublica’s message that this proved yet again an area in which criminal justice consequences were racist.3 Yet the potential that risk algorithms are unfair to another minority group has received far less attention in the media or amongst risk assessment scholars and statisticians: Hispanics.4 This general disregard persists despite Hispanics representing an important cultural group in the American population: recent estimates reveal that almost 58 million Hispanics live in the United States, that they are the second largest minority, and that their numbers are rising quickly.5 This Article intends to partly remedy this gap in interest by reporting on an empirical study about risk assessment with Hispanics at the center. The study uses a large dataset of pretrial defendants who were scored on a widely-used algorithmic risk assessment tool soon after their arrests. The report proceeds as follows. Section II briefly reviews the rise in algorithmic risk assessment in criminal justice generally, and then in pretrial contexts more specifically. The discussion summarizes ProPublica’s findings regarding the risk tool COMPAS from its analysis of COMPAS scores comparing Blacks and Whites.

1 Ted Gest, Civil Rights Advocates Say Risk Assessment may “Worsen Racial Disparities” in Bail Decisions, THE CRIME REPORT (July 31, 2018), https://thecrimereport.org/2018/07/31/civil-rights-advocates-say-risk-assessment-may-worsen-racial-disparities/.
2 See infra Section II.C.
3 E.g., Li Zhou, Is Your Software Racist?, POLITICO (Feb. 7, 2018, 5:05 AM), https://www.politico.com/agenda/story/2018/02/07/algorithmic-bias-software-recommendations-000631; Ed Yong, A Popular Algorithm is no Better at Predicting Crime than Random People, THE ATLANTIC (Jan. 18, 2018); Max Ehrenfreund, The Machines that Could Rid Courtrooms of Racism, WASHINGTON POST (Aug. 18, 2016), https://www.washingtonpost.com/news/wonk/wp/2016/08/18/why-a-computer-program-that-judges-rely-on-around-the-country-was-accused-of-racism/?noredirect=on&utm_term=.ce854f237cfe; NPR, The Hidden Discrimination in Criminal Risk-Assessment Scores (May 24, 2016), https://www.npr.org/2016/05/24/479349654/the-hidden-discrimination-in-criminal-risk-assessment-scores.
4 Stephane M. Shepherd & Roberto Lewis-Fernandez, Forensic Risk Assessment and Cultural Diversity: Contemporary Challenges and Future Directions, 22 PSYCHOL. PUB. POL’Y & L. 427, 428 (2016).
5 Antonio Flores, How the U.S. Hispanic Population is Changing, PEW RESEARCH (Sept. 18, 2017), http://www.pewresearch.org/fact-tank/2017/09/18/how-the-u-s-hispanic-population-is-changing/.


Section III discusses further concerns that algorithmic-based risk tools may not be as transparent and neutral as many presume them to be. Insights from behavioral sciences literature suggest that risk tools may not necessarily incorporate factors that are universal or culturally-neutral. Hence, risk tools developed mainly on Whites may not perform as well on heterogeneous minority groups. As a result of these suspicions, experts are calling on third parties to independently audit the accuracy and fairness of risk algorithms. The study reported in Section IV responds to this invitation. Using the same dataset as ProPublica, we offer a range of statistical measures testing COMPAS’ accuracy and comparing outcomes for Hispanics versus non-Hispanics. Such measures address questions about the tool’s validity, predictive ability, and the potential for algorithmic unfairness and disparate impact upon Hispanics. Conclusions follow.

II. ALGORITHMIC RISK ASSESSMENT

Risk assessment in criminal justice is about predicting an individual’s potential for recidivism in the future.6 Predictions have long been a part of criminal justice decisionmaking because of legitimate goals of protecting the public from those who have already been identified as offenders.7 Historically, risk predictions were generally based on gut instinct or the personal experience of the official responsible for making the relevant decision.8 Yet, advances in behavioral sciences, the availability of big data, and improvements in statistical modeling have ushered in a wave of more empirically-informed risk assessment tools.

A. The Rise of Algorithmic Risk Assessment in Criminal Justice

The “evidence-based practices movement” is the now popular term to describe the turn to drawing from behavioral sciences data to improve offender classifications.9 Scientific studies on recidivism outcomes are benefiting from the availability of large datasets (i.e., big data) tracking offenders post-release to statistically test for factors that correlate with recidivism.10 Risk assessment tool developers use computer modeling to

6 Melissa Hamilton, Risk and Needs Assessment: Constitutional and Ethical Challenges, 52 AM. CRIM. L. REV. 231, 232 (2015).
7 Jordan M. Hyatt et al., Reform in Motion: The Promise and Perils of Incorporating Risk Assessments and Cost-Benefit Analysis into Pennsylvania Sentencing, 49 DUQ. L. REV. 707, 724-25 (2011).
8 Cecelia Klingele, The Promises and Perils of Evidence-Based Corrections, 91 NOTRE DAME L. REV. 537, 556 (2015).
9 Faye S. Taxman, The Partially Clothed Emperor: Evidence-Based Practices, 34 J. CONTEMP. CRIM. JUST. 97, 97-98 (2018).
10 Kelly Hannah-Moffat, Algorithmic Risk Governance: Big Data Analytics, Race and

combine factors of sufficiently high correlation and to weight them accordingly with increasingly complex algorithms.11 Broadly speaking, “[d]ata-driven algorithmic decision making may enhance overall government efficiency and public service delivery, by optimizing bureaucratic processes, providing real-time feedback and predicting outcomes.”12 With such a tool in hand, criminal justice officials can more consistently input relevant data and receive software-produced risk classifications.13 Dozens of automated risk assessment tools to predict recidivism are now available.14 They are popular. The utility of risk instruments has attracted energetic support from reputable policy centers, namely the Justice Center of the Council of State Governments,15 the Justice Management Institute,16 the Center for Effective Public Policy,17 the Vera Institute,18 and the Center for Court Innovation.19 News headlines and academic literature have also been expounding upon the benefits generated by the government’s use of big data to predict the future risk posed by individuals.20 Algorithmic risk assessment tools offer the ability

Information Activism in Criminal Justice Debates, THEORETICAL CRIMINOLOGY (forthcoming 2018).
11 An algorithm refers to “computation procedures (which can be more or less complex) drawing on some type of digital data (“big” or not) that provide some kind of quantitative output (be it a single score or multiple metrics) through a software program.” Angéle Christin, Algorithms in Practice: Comparing Web Journalism and Criminal Justice, BIG DATA & SOC’Y 1, 2 (July-Dec. 2018).
12 Bruno Lepri et al., Fair, Transparent, and Accountable Algorithmic Decision-Making Processes, PHIL. & TECH. 1, 1 (forthcoming 2018), www.nuriaoliver.com/papers/Philosophy_and_Technology_final.pdf.
13 J. Stephen Wormith, Automated Offender Risk Assessment, 16 CRIMINOLOGY & PUB. POL’Y 281, 285 (2017).
14 Daniele Kehl et al., Algorithms in the Criminal Justice System: Assessing the Use of Risk Assessments in Sentencing 3, DASH (2017), https://dash.harvard.edu/bitstream/handle/1/33746041/2017-07_responsivecommunities_2.pdf?sequence=1.
15 COUNCIL OF STATE GOV’T, LESSONS FROM THE STATES: REDUCING RECIDIVISM AND CURBING CORRECTIONS COSTS THROUGH JUSTICE REINVESTMENT 6-7 (2013), http://csgjusticecenter.org/jr/publications/lessons-from-the-states.
16 MAREA BEEMAN & AIMEE WICKMAN, THE JUST. MGMT. INST., RISK AND NEEDS ASSESSMENT 3 (2013).
17 Ctr. for Effective Pub. Pol’y, Behavior Management of Justice-Involved Individuals 19 (2015), https://s3.amazonaws.com/static.nicic.gov/Library/029553.pdf.
18 Memorandum from Vera Inst. of Just. to Delaware Just. Reinvestment Task Force, Oct. 12, 2011, at 1-2.
19 MICHAEL REMPEL, CTR. FOR COURT INNOVATION, EVIDENCE-BASED STRATEGIES FOR WORKING WITH OFFENDERS 1-2 (2014).
20 E.g., Crysta Jentile & Michelle Lawrence, How Government Use of Big Data can Harm Communities, FORD FOUNDATION (Aug. 30, 2016), https://www.fordfoundation.org/ideas/equals-change-blog/posts/how-government-use-of-big-data-can-harm-communities/; Sony Kassam, Legality of Using Predictive Data to Determine Sentences Challenged in Wisconsin Supreme Court Case, A.B.A J. (June 27,

to reduce mass incarceration by diverting low risk defendants from prison, while targeting greater supervision and services to those at higher risk.21 Many parties presume that algorithmic risk assessment tools developed on big data represent a transparent, consistent, and logical method for classifying offenders.22 The mathematical character of risk assessment suggests the ability to quantify the future and transport it into the present.23 Evidence-based practices thereby present a welcome displacement of human instinct.24 Risk assessment practices have been heavily oriented to back-end decisions, such as sentencing, early release decisions, and post-incarceration supervision.25 More recent attention considers the potential benefits that automated risk assessment practices provide in pretrial settings.26

B. Pretrial Risk Assessment

Algorithmic risk assessment informs such pretrial decisions as deferred adjudication and bail.27 The basic idea of risk-informed decisions for pretrial purposes has a longer trajectory, being first approved by the United States Supreme Court in a 1987 opinion. In United States v. Salerno, the Court found constitutional the practice of ordering pretrial detention based on an estimate of the individual defendant’s future dangerousness.28 Still, these predictions must be taken with care because of the potential consequences to individual rights. “Pretrial decision-making involves a fundamental tension between the court’s desire to protect citizens from dangerous criminals, ensure that accused individuals are judged before the law, and minimize the amount of pretrial [punishment] meted out to legally innocent defendants.”29 The terminology has changed since Salerno from the vagueness of “future

2016, 1:07 PM), http://www.abajournal.com/news/article/legality_of_using_predictive_data_to_determine_sentences_challenged_in_wisc; Andrew Guthrie Ferguson, Big Data and Predictive Reasonable Suspicion, 163 U. PENN. L. REV. 327, 407 (2015).
21 Sarah L. Desmarais et al., Performance of Recidivism Risk Assessment Instruments in U.S. Correctional Settings, 13 PSYCHOL. SERV. 206, 206 (2016).
22 Jordan M. Hyatt et al., Reform in Motion: The Promise and Perils of Incorporating Risk Assessments and Cost-Benefit Analysis into Pennsylvania Sentencing, 49 DUQ. L. REV. 707, 725 (2011).
23 M. Roffey & S.Z. Kaliski, To Predict or not to Predict-That is the Question, 15 AFR. J. PSYCHIATRY 227, 227 (2012).
24 Alfred Blumstein, Some Perspectives on Quantitative Criminology Pre-JQC: And then Some, 26 J. QUANTITATIVE CRIMINOLOGY 549, 554 (2010).
25 Jessica M. Eaglin, Constructing Recidivism Risk, 67 EMORY L.J. 59, 67 (2017).
26 Kristin Bechtel et al., A Meta-Analytic Review of Pretrial Research: Risk Assessment, Bond Type, and Interventions, 42 AM. J. CRIM. JUST. 443, 444 (2017).
27 MAREA BEEMAN & AIMEE WICKMAN, THE JUST. MGMT. INST., RISK AND NEEDS ASSESSMENT 3 (2013).
28 United States v. Salerno, 481 U.S. 739, 751 (1987).
29 THOMAS BLOMBERG ET AL., VALIDATION OF THE COMPAS RISK ASSESSMENT CLASSIFICATION INSTRUMENT 4 (2010).

dangerousness” to a more refined perspective of “risk assessment.”30 Risk assessment has since evolved over the last three decades from unstructured clinical decisions to the actuarial form involving algorithmic processing.31 To improve the fairness and effectiveness of pretrial decisions, behavioral science experts encourage officials to use more objective criteria, such as those offered by evidence-based risk tools.32 Legal reformers generally welcome this practice as well. Risk assessment has become a foundation for the bail reform movement by offering a substitute for a long-standing dependence upon monetary bail.33 Releasing more defendants who do not pose a substantial risk can alleviate the harms that money bail systems disproportionately wreak on poor and minority defendants.34 At the same time, reducing the rate of pretrial detention prevents other negative consequences to individual offenders, as studies consistently show that pretrial detention is correlated with a greater likelihood of a guilty plea, a longer sentence, job loss, family disruption, and violent victimization in jail.35 Notwithstanding the broad support and high hopes for algorithmic risk assessment practices in criminal justice, an investigative report publicized in 2016 called into question their objectivity and fairness.

C. The ProPublica Study

News journalists at ProPublica reported on statistical analyses the group had conducted involving a real dataset and a popular risk tool named COMPAS—the acronym for Correctional Offender Management Profiling for Alternative Sanctions. ProPublica investigators obtained through Freedom of Information Act requests the data on over 7,000 arrestees who were scored on COMPAS in a pretrial setting in a southern county of Florida.36 ProPublica concluded COMPAS was racist in that its algorithm

30 KIRK HEILBRUN, EVALUATION FOR RISK OF VIOLENCE IN ADULTS 708 (2009).
31 Cecelia Klingele, The Promises and Perils of Evidence-Based Corrections, 91 NOTRE DAME L. REV. 537 (2015).
32 Arthur Rizer & Caleb Watney, Artificial Intelligence can Make our Jail System More Efficient, Equitable and Just, TEX. REV. L. & POLITICS 1, 5 (forthcoming 2018).
33 Megan Stevenson, Assessing Risk Assessment, 102 MINN. L. REV. (forthcoming 2018), https://papers.ssrn.com/sol3/Delivery.cfm/SSRN_ID3139944_code2420348.pdf?abstractid=3016088&mirid=1.
34 John Logan Koepke & David G. Robinson, Danger Ahead: Risk Assessment and the Future of Bail Reform, WASH. L. REV. (forthcoming 2018), https://papers.ssrn.com/sol3/Delivery.cfm/SSRN_ID3142948_code2621669.pdf?abstractid=3041622&mirid=1.
35 Laura I. Appleman, Justice in the Shadowlands: Pretrial Detention, Punishment, & the Sixth Amendment, 69 WASH. & LEE L. REV. 1297, 1320 (2012).
36 Julia Angwin et al., Machine Bias, PROPUBLICA (May 23, 2016), https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.

produced a much higher false positive rate for Blacks than Whites, meaning that it overpredicted high risk for Blacks.37 COMPAS’s corporate owner, Northpointe, quickly rejected that characterization.38 After running their own statistical analyses on the same dataset ProPublica had compiled, Northpointe statisticians asserted that their results demonstrated COMPAS outcomes achieved predictive parity for Blacks and Whites.39 It turns out a rather simple explanation accounts for the dispute: contrasting measures of algorithmic fairness. ProPublica touted the false positive rate, while Northpointe preferred an alternative measure called the positive predictive value.40 As will be addressed later below, these measures are not synonymous and offer distinct, sometimes conflicting, impressions of a tool’s accuracy.41

III. AUDITING THE BLACK BOX

For purposes here, a “black-box” tool refers to an algorithmic risk instrument that is not transparent about what is input into the software program and/or how the outputs are generated and quantified.42 This characterization is especially apt in the case of an algorithmic instrument that is proprietary and whose owner declines to reveal much information based on a claim of trade secrets.43 COMPAS, for example, is proprietary and its corporate owner declines to reveal its algorithm, which likely is a reason for ProPublica’s interest in auditing it.

A. Calls for Third Party Audits

ProPublica’s study certainly brought the issue of algorithmic fairness to the forefront in the popular media.44 Questions are being raised in the

37 Id.
38 Northpointe rebranded with the trade name equivant (lower case intended) in January 2017. Press Release, equivant, Courtview, Constellation & Northpointe Re-brand to equivant (2017), http://www.equivant.com/blog/we-have-rebranded-to-equivant.
39 William Dieterich et al., COMPAS Risk Scales: Demonstrating Accuracy Equity and Predictive Parity 2 (July 8, 2016), http://go.volarisgroup.com/rs/430-MBX-989/images/ProPublica_Commentary_Final_070616.pdf.
40 Tafari Mbadiwe, Algorithmic Injustice, THE NEW ATLANTIS 1, 18 (Winter 2018), https://www.thenewatlantis.com/publications/algorithmic-injustice.
41 Infra Section IV.F.
42 ROBYN CAPLAN ET AL., ALGORITHMIC ACCOUNTABILITY: A PRIMER 2-3 (Apr. 18, 2018), https://datasociety.net/pubs/alg_accountability.pdf.
43 Sarah Tan et al., Auditing Black-Box Models Using Transparent Model Distillation with Side Information, Paper presented at NIPS 2017 Symposium on Interpretable Machine Learning (2017), https://arxiv.org/pdf/1710.06169.
44 See e.g., Li Zhou, Is Your Software Racist?, POLITICO (Feb. 7, 2018, 5:05 AM), https://www.politico.com/agenda/story/2018/02/07/algorithmic-bias-software-recommendations-000631.

scientific and policy communities that even “reasonable algorithms” may fail to result in fair and equitable treatment of diverse populations.45 There is even a new scientific literature on the topic named FATML, or “fairness, accountability and transparency in machine learning.”46 There is some overlap among the FATML goals:

   Fairness can . . . be related to the notion of transparency – the question of how much we are entitled to know about any automated system that is used to make or inform a decision that affects us. Hiding the inner workings of an algorithm from public view might seem preferable, to avoid gaming the system. But without transparency, how can decisions be probed and challenged?47

Importantly, no formal mechanism in the law or in the sciences exists to consistently enforce any form of algorithmic accountability.48 Thus, observers call for third party audits, as exemplified by ProPublica’s efforts, to engage in whatever scientific inquiry is possible to reveal information about the empirical validity and fairness of black-box tools.49 Such data will be useful to legal practitioners and policymakers in considering—or reevaluating—the use of automated risk assessment to inform criminal justice decisions which carry significant consequences for individuals.50 Moreover, despite the many advantages of algorithmic assessment, risk profiling may fail to alleviate all of the harms of mass incarceration as some “scholars are suspicious that contemporary extensions of risk assessment and risk reduction will likely only reproduce, or may even exacerbate, the injustices of contemporary criminal justice policy under a more ‘objective’ guise.”51 Relatedly, a White House Report on Big Data from the Obama administration lauded the public benefits of big data in criminal justice, but also promoted academic research into big data systems “to ensure that people are treated fairly.”52

45 OSONDE OSABA & WILLIAM WELSER IV, AN INTELLIGENCE IN OUR IMAGE: THE RISKS OF BIAS AND ERRORS IN ARTIFICIAL INTELLIGENCE 19 (2017) (Rand Corporation publication), https://www.rand.org/pubs/research_reports/RR1744.html.
46 Harsh Gupta, Constitutional Perspectives on Machine Learning 4 (Dec. 17, 2017), https://osf.io/preprints/socarxiv/9v8js/download?format=pdf.
47 Sofia Olhede & Patrick Wolfe, When Algorithms go Wrong, Who is Liable?, 14 SIGNIFICANCE 8, 9 (2017).
48 See Robyn Caplan et al., Algorithmic Accountability: A Primer 10 (Apr. 18, 2018), https://datasociety.net/pubs/alg_accountability.pdf.
49 See id.
50 Jennifer Skeem et al., Gender, Risk Assessment, and Sanctioning, 40 LAW & HUM. BEHAV. 580, 590 (2016).
51 Seth J. Prins & Adam Reich, Can we Avoid Reductionism in Risk Reduction?, 22 THEORETICAL CRIMINOLOGY 258, 259 (2018).
52 Executive Office of the President, Big Data: A Report on Algorithmic Systems,


A particularly acute focus is the concern that potential unfairness will likely fall mostly upon already beleaguered minority groups. “The use of big data analytics might impose disproportionate adverse impacts on protected classes, even when organizations do not intend to discriminate and do not use sensitive classifiers like race and gender.”53 Moreover, cross-disciplinary sharing is necessary because of difficulties in translation, as data scientists are often not trained in law and policy, while civil rights experts in turn may not have statistical expertise.54 In sum, many are just realizing that big data analytics can create civil rights problems,55 such that the “patina of fairness” that otherwise seems to attach to big data algorithms may be unjustified.56 Thus, to the extent that justice decisions may bring negative consequences upon defendants, it is particularly advisable to study whether and how an algorithmic tool disparately impacts protected groups.57

B. Black-Box Tools and Ethnic Minorities

The ProPublica study was concerned with Black minorities. This Article focuses on the ethnic minority group of Hispanics. This Article follows the tradition of the United States Census Bureau in classifying Hispanics as an ethnicity rather than a race.58 Importantly, Hispanics comprise a significant proportion of the American population, making them a reasonable population to analyze. In addition, reasons exist to suspect that an algorithm may not assess an ethnic minority group very well. “[A] transparent, facially neutral algorithm can still produce discriminatory results.”59 Even if a particular tool is shown to perform well on its training sample(s), it is not advisable to simply transport that tool across to new populations and settings because of the potential for risk-relevant differences in offenders and the availability of rehabilitation-oriented services that can undermine the tool’s performance.60 Unfortunately, officials

Opportunity, and Civil Rights 22-23 (May 2016), https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/2016_0504_data_discrimination.pdf.
53 Mark MacCarthy, Standards of Fairness for Disparate Impact Assessment of Big Data Algorithms, 48 CUMB. L. REV. 67, 79 (2017).
54 Solon Barocas et al., Big Data, Data Science, and Civil Rights 6 (2017), https://arxiv.org/pdf/1706.03102.
55 Id. at 1.
56 Anupam Chander, The Racist Algorithm?, 115 MICH. L. REV. 1023, 1034 (2017).
57 Mark MacCarthy, Standards of Fairness for Disparate Impact Assessment of Big Data Algorithms, 48 CUMB. L. REV. 67, 67 (2017).
58 U.S. Census Bureau, Race and Ethnicity, https://www.census.gov/mso/www/training/pdf/race-ethnicity-onepager.pdf.
59 Anupam Chander, The Racist Algorithm?, 115 MICH. L. REV. 1023, 1024 (2017).
60 Sarah L. Desmarais et al., Performance of Recidivism Risk Assessment Instruments in

often disregard this advisory. Recent experience shows that “the application of risk knowledge is often haphazard: jurisdictions frequently deploy pre-existing screening tools in settings for which they were neither designed nor calibrated.”61 Another reason for suspicion is that, despite their disproportionate presence in criminal justice statistics, minorities tend to be underrepresented in testing or validation samples for most risk assessment tools.62 Yet it is unreasonable to assume risk assessment tools will perform as well for culturally diverse minority groups.63 A risk assessment process that presumes that risk tools are somehow universal, generic, or culturally-neutral may well be flawed on the following grounds:

   The over or under estimation of risk that can ensue from this process is entirely plausible given (a) the potential omission of meaningful risk items specific to minority populations, (b) the inclusion of risk factors that are more relevant to White offenders, and (c) variation in the cross-cultural manifestation and expression of existing risk items.64

For example, a risk tool may yield unequal results for cultural minorities if it fails to incorporate or otherwise consider their unique “behavioral practices and expectations, health beliefs, social/environmental experiences, phenomenology, illness narratives, deviant conduct, and worldview.”65 It is not surprising, then, given that risk tools are often normed on largely White samples, that at least some studies show risk tools provide more accurate predictions for Whites than for other groups.66 Regrettably, relatively few validation studies on ethnic minorities exist.67 Two recent validation studies that have included an ethnic minority group may be telling. The data reported with revalidation studies of the federal Pretrial Risk Assessment and the Post Conviction Risk Assessment tools show that each tool overpredicted or underpredicted recidivism for Hispanics

U.S. Correctional Settings, 13 PSYCHOL. SERV. 206, 207 (2016).
61 Seth J. Prins & Adam Reich, Can we Avoid Reductionism in Risk Reduction?, 22 THEORETICAL CRIMINOLOGY 258, 260 (2018) (internal citations omitted).
62 Stephane M. Shepherd & Roberto Lewis-Fernandez, Forensic Risk Assessment and Cultural Diversity: Contemporary Challenges and Future Directions, 22 PSYCHOL. PUB. POL’Y & L. 427, 428 (2016).
63 Id.
64 Id. at 429.
65 Id.
66 Jay P. Singh et al., Comparative Study of Violence Risk Assessment Tools: A Systematic Review and Metaregression Analysis of 68 Studies Involving 25,980 Participants, CLINICAL PSYCHOL. REV. (2011); Jay P. Singh & Seena Fazel, Forensic Risk Assessment: A Metareview, 37 CRIM. JUST. & BEHAV. 965, 978 (2010).
67 T. Douglas et al., Risk Assessment Tools in Criminal Justice and Forensic Psychiatry: The Need for Better Data, 42 EUR. PSYCHIATRY 134, 135 (2017).

for at least a few of the recidivism outcomes tested.68 Despite the seeming importance of these findings, the authors do not expound upon these results. With respect specifically to Hispanic Americans, academics have suggested that risk assessment tools may not perform well if they fail to “consider the centrality of family, acculturation strain, religiosity, gender role expectations, and culturally stoic responses to adversity” unique to this particular cultural group.69

C. Legal Challenges

The legal and big data communities are beginning to realize that there may be legal implications in terms of disparate impact if algorithmic bias exists.70 One commentator, while acknowledging the potential for unfairness in risk assessment-led decisions in criminal justice, has suggested that the practice at least is better than uninformed judgments about dangerousness and that the adversarial process will encourage attorneys to challenge the tools, which may yield improvements to the science underlying them.71 Yet, so far there is very little evidence of such confrontations in the courts. An exception is a 2016 case, in which a defendant protested the black-box nature of COMPAS (the same tool examined in ProPublica’s research and in the study presented herein). In State v. Loomis, the defendant claimed the use of COMPAS in his sentencing proceeding constituted a due process violation because the software owner’s claims of trade secrets prevented him from challenging the tool’s validity or its algorithmic scoring.72 The Wisconsin Supreme Court found against Loomis, but with some caveats.73 The majority ruled that any use of COMPAS must come with certain written cautions, including warnings regarding the proprietary (hence, secretive) nature of the tool and

68 Thomas A. Cohen & Christopher Lowenkamp, Revalidation of the Federal Pretrial Risk Assessment Instrument (PTRA): Testing the PTRA for Predictive Biases 23 (Feb. 2018), https://www.researchgate.net/publication/322958782_Revalidation_of_the_Federal_Pretrial_Risk_Assessment_Instrument_PTRA_Testing_the_PTRA_for_Predictive_Biases; Christopher T. Lowenkamp, PCRA Revisited: Testing the Validity of the Federal Post Conviction Risk Assessment (PCRA), 12 PSYCHOL. SERV. 149 tbl. 5 (2015).
69 Stephane M. Shepherd & Roberto Lewis-Fernandez, Forensic Risk Assessment and Cultural Diversity: Contemporary Challenges and Future Directions, 22 PSYCHOL. PUB. POL’Y & L. 427, 498 (2016).
70 Osonde Osaba & William Welser IV, An Intelligence in Our Image: The Risks of Bias and Errors in Artificial Intelligence, RAND 19 (2017), https://www.rand.org/content/dam/rand/pubs/research_reports/RR1700/RR1744/RAND_RR1744.pdf.
71 David E. Patton, Guns, Crime Control, and a Systemic Approach to Federal Sentencing, 32 CARDOZO L. REV. 1427, 1456-58 (2011).
72 State v. Loomis, 881 N.W.2d 749, 753 (Wis. 2016).
73 Id. at 769.

the potential that it may disproportionately classify minorities as high risk.74 The court seemed marginally concerned with the fact that COMPAS had never been validated on a Wisconsin-based population. The court suggested that more or less weight might be put on a COMPAS score if such a localized validation study were to be conducted.75 In contrast, a very recent decision by the Canadian Supreme Court found the lack of a relevant validation study dispositive. In Ewert v. Canada, the defendant, an indigenous minority, successfully protested the use of a different risk assessment tool (i.e., not COMPAS) because of the nonexistence of any validation studies of the tool specifically on indigenous groups.76 The Canadian Supreme Court, ruling in Ewert’s favor, determined that without evidence that the particular tool was free of cultural bias, it was unjust to use it on indigenous inmates.77 The justices observed that “substantive equality requires more than simply equal treatment” as treating groups identically may itself produce inequalities.78 Despite the divergence of the rulings in Loomis and Ewert, both courts supported the relevance of validation studies appropriate to the underlying population(s) on which the tool is to be used. According to the 100-plus civil rights and defense counsel organizations in their “Shared Statement of Civil Rights Concerns,” the stakes are significant. “If the use of a particular pretrial risk assessment instrument by itself does not result in an independently audited, measurable decrease in the number of people detained pretrial, the tool should be pulled from use until it is recalibrated to cause demonstrably decarceral results.”79 Moreover, this Shared Statement encourages defense attorneys to inspect any risk tool to review how it was created, the input factors used, weights assigned to the factors, thresholds for risk bins, and outcome data used in its development.80 The next Section responds to the call for third party auditing by evaluating how a tool may perform on a minority group.

74 Id.
75 Id.
76 Ewert v. Canada, 2018 S.C.R. 30 (S.C.C. June 13, 2018).
77 Id. at ¶63.
78 Id. at ¶54.
79 African American Ministers in Action et al., The Use of Pretrial “Risk Assessment” Instruments: A Shared Statement of Civil Rights Concerns 5 (2018), http://civilrightsdocs.info/pdf/criminal-justice/Pretrial-Risk-Assessment-Full.pdf.
80 Id. at 7.


IV. EVALUATING ALGORITHMIC FAIRNESS WITH ETHNICITY

This section reports on a study of COMPAS, a popular algorithmic risk assessment tool, employing the same real-world dataset that ProPublica compiled for its evaluation of outcomes for Blacks versus Whites.81 The data and the tool will be briefly addressed. Then a variety of empirical methods assess the accuracy, validity, and predictive ability of the tool and determine whether algorithmic unfairness or disparate impact appears present.

A. The Samples and the Test

The primary dataset includes individuals arrested in Broward County, Florida who were scored on a COMPAS risk scale soon after their arrests in 2013 and 2014.82 Notably, Broward County is among the top twenty largest American counties by population, thus improving the potential for a large and diverse sample set. The pretrial services division of the Broward County Sheriff’s Department has been using COMPAS since 2008 to inform judicial determinations concerning pretrial release.83 This study uses two subsets of data, one of which tracks general recidivism (n=6,172) and the other violent recidivism (n=4,020). The follow-up recidivism period is two years. COMPAS is a software application widely used across correctional institutions, offering a general recidivism risk scale and a violent recidivism risk scale.84 The general recidivism risk scale contains about two dozen items related to age at first arrest, age at intake, criminal history, drug problems, and vocational/educational problems (e.g., employment, possessing a skill or trade, high school grades, suspensions).85 The violent recidivism scale differs in that, in lieu of criminal history generally, it uses a history of violence and, instead of drug problems, it incorporates factors related to a history of non-compliance (e.g., previous parole violations, arrested while on probation).86 The COMPAS algorithms produce outcomes as decile scores of 1-10 with

81 COMPAS is the “most successful and popular application of machine learning.” J. Stephen Wormith, Automated Offender Risk Assessment, 16 CRIMINOLOGY & PUB. POL’Y 281, 285 (2017) (citing developers using a decision tree model to educate the tool).
82 See generally Jeff Larson et al., How We Analyzed the COMPAS Recidivism Algorithm, PROPUBLICA (May 23, 2016), https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm. ProPublica has generously made the data available for other researchers to access. https://github.com/propublica/compas-analysis.
83 THOMAS BLOMBERG ET AL., VALIDATION OF THE COMPAS RISK ASSESSMENT CLASSIFICATION INSTRUMENT 15-16 (2010).
84 EQUIVANT, COMPAS CLASSIFICATION 2 (2017), http://equivant.volarisgroup.com/assets/img/content/Classification.pdf.
85 NORTHPOINTE, COMPAS CORE NORMS FOR ADULT INSTITUTIONS 80 tbl. 3.41 (Feb. 11, 2014), https://epic.org/algorithmic-transparency/crim-justice/EPIC-16-06-23-WI-FOIA-201600805-WIDOC_DAI_norm_report021114.pdf.
86 NORTHPOINTE, PRACTITIONER’S GUIDE TO COMPAS CORE 28, 29 (2015).

higher deciles representing greater predicted risk. COMPAS then subdivides decile scores into three, ordinal risk bins: low (deciles 1-4), medium (deciles 5-7), and high (deciles 8-10). These analyses generally followed the methodology of the ProPublica researchers in terms of defining what acts comprise general or violent recidivism, excluding cases with missing data, and excluding cases where the individuals were not scored on COMPAS in a timely manner.87

B. Differential Validity Measures

Validity, while a technical term, simply means the extent to which a test properly reflects the concept it is designed to reflect.88 For purposes here, validity asks whether COMPAS adequately measures recidivism risk. The term differential validity applies when test validity varies between groups.89 Multiple measures related to validity are available to judge the diagnostic and prognostic capabilities of an assessment tool. Discrimination reflects how well the tool distinguishes higher risk from lower risk (i.e., relative accuracy).90 Calibration concerns how accurately the tool statistically estimates the outcome of interest (i.e., absolute predictive accuracy).91 A few descriptive statistics of the samples are provided in Figure 1.

Figure 1. Descriptive Statistics (means or proportions)

                       General Recidivism Sample      Violent Recidivism Sample
Factor                 Hispanic     Non-Hispanic      Hispanic     Non-Hispanic
Recidivism Rate        .37          .46***            .10          .17***
Decile Score           3.38         4.51***           2.64         3.33***
No. Prior Offenses     2.10         3.35***           1.70         2.52***
Age                    35.02        34.49             36.14        35.70
Females                .16          .19               .17          .21
n                      509          5663              355          3665

Notes: * p<.05, ** p<.01, *** p<.001.
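As a rough illustration of how Figure 1-style group comparisons can be reproduced, the following Python sketch works from ProPublica's publicly released Broward County data. The file name and column names (race, sex, age, decile_score, priors_count, two_year_recid) are assumptions taken from ProPublica's published repository, and the study's exact exclusion rules are not replicated here, so the output only approximates Figure 1.

```python
import pandas as pd
from scipy import stats

# ProPublica's two-year general recidivism file; file and column names are
# assumptions based on https://github.com/propublica/compas-analysis.
df = pd.read_csv("compas-scores-two-years.csv")
df["hispanic"] = (df["race"] == "Hispanic").astype(int)
df["female"] = (df["sex"] == "Female").astype(int)

hisp = df[df["hispanic"] == 1]
non = df[df["hispanic"] == 0]

def compare(col, kind):
    """Compare one Figure 1 factor across the two groups."""
    a, b = hisp[col].dropna(), non[col].dropna()
    if kind == "proportion":
        # chi-square test of the two-by-two table (difference in proportions)
        _, p, _, _ = stats.chi2_contingency(pd.crosstab(df["hispanic"], df[col]))
    else:
        # Welch t-test for a difference in means
        _, p = stats.ttest_ind(a, b, equal_var=False)
    print(f"{col:>14}: Hispanic={a.mean():.2f}  Non-Hispanic={b.mean():.2f}  p={p:.4f}")

compare("two_year_recid", "proportion")  # recidivism rate
compare("decile_score", "mean")          # mean COMPAS decile score
compare("priors_count", "mean")          # mean number of prior offenses
compare("age", "mean")                   # mean age
compare("female", "proportion")          # proportion female
print(f"n: Hispanic={len(hisp)}  Non-Hispanic={len(non)}")
```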

87 See generally Jeff Larson et al., How We Analyzed the COMPAS Recidivism Algorithm, PROPUBLICA (May 23, 2016), https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm.
88 MICHAEL G. MAXFIELD & EARL BABBIE, RESEARCH METHODS IN CRIMINAL JUSTICE AND CRIMINOLOGY 109 (2nd ed. 1998).
89 Christopher M. Berry et al., Can Racial/Ethnic Subgroup Criterion-to-Test Standard Deviation Ratios Account for Conflicting Differential Validity and Differential Prediction Evidence for Cognitive Ability Tests?, 87 J. OCCUPATIONAL & ORG. PSYCHOL. 208, 209 (2014).
90 L. Maaike Helmus & Kelly M. Babchishin, Primer on Risk Assessment and the Statistics Used to Evaluate its Accuracy, 44 CRIM. JUST. & BEHAV. 8, 11 (2017).
91 Id.

Figure 1 shows that general and violent recidivism rates are substantially lower for Hispanics, that is, 20 and 41 percent lower than for non-Hispanics, respectively. Decile score outcomes on both scales for Hispanics were also substantially lower. Lower mean decile scores for Hispanics can partly be explained by their also having lower mean numbers of prior offenses, as each scale includes factors that count criminal history. The table also includes a comparison on age and gender, as these factors have been shown in previous studies to be strong predictors of recidivism.92 Here, test comparison results between groups on the average age of offenders and their gender makeup are not statistically significant. Thus, these results generally indicate that any differential validity between these groups is not obviously a result of age or gender disparities. Figure 2 includes statistics to explore the degree (i.e., the strength) of the relationship between COMPAS scores and recidivism for each scale and compares Hispanics with non-Hispanics.

Figure 2. COMPAS Failure Rates

                       General Recidivism Sample      Violent Recidivism Sample
                       Hispanic     Non-Hispanic      Hispanic     Non-Hispanic
Risk Level
  Low                  .30          .32               .09          .11
  Medium               .55          .55               .08          .27***
  High                 .57          .75***            .46          .49
AUC                    .637         .714***           .641         .722**
r                      .238         .372**            .148         .319***

Notes: * p<.05, ** p<.01, *** p<.001.

The tabular statistics in Figure 2 contain a host of relevant information about COMPAS’ performance.

1. Validity of Risk Bins

Three rows display actual rates of recidivism by offenders classified in each of the COMPAS risk bins of low, medium, and high. The results show both intergroup and intragroup disparities. Significant intergroup differences in recidivism rates occur in the high risk classification for the general recidivism risk scale (57% for Hispanics versus 75% for non-Hispanics) and in the medium risk bin for the violent recidivism scale (8% for Hispanics versus 27% for non-Hispanics). Thus, these two risk bins perform disparately regarding reoffending based on ethnicity.
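The risk-bin failure rates in Figure 2 can be tabulated in the same spirit as the earlier sketch. The code below assumes the same hypothetical file and columns, plus a score_text column carrying COMPAS's Low/Medium/High labels, and uses a two-proportion z-test for the intergroup comparisons; the Article does not specify the precise test it employed.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Same assumed ProPublica file and columns as in the earlier sketch.
df = pd.read_csv("compas-scores-two-years.csv")
df["hispanic"] = (df["race"] == "Hispanic").astype(int)

for risk_bin in ["Low", "Medium", "High"]:
    sub = df[df["score_text"] == risk_bin]      # COMPAS risk bin label (assumed column)
    h = sub.loc[sub["hispanic"] == 1, "two_year_recid"]
    n = sub.loc[sub["hispanic"] == 0, "two_year_recid"]
    p1, p2 = h.mean(), n.mean()
    # two-proportion z-test on the failure (recidivism) rates
    pooled = (h.sum() + n.sum()) / (len(h) + len(n))
    se = np.sqrt(pooled * (1 - pooled) * (1 / len(h) + 1 / len(n)))
    z = (p1 - p2) / se
    p_val = 2 * (1 - norm.cdf(abs(z)))
    print(f"{risk_bin:>6}: Hispanic={p1:.2f}  Non-Hispanic={p2:.2f}  p={p_val:.4f}")
```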

92 See generally David E. Olson et al., Comparing Male and Female Prison Releasees across Risk Factors and Postprison Recidivism, 26 WOMEN & CRIM. JUST. 122 (2016) (listing studies).

For the non-Hispanic group, actual recidivism rates operate in a monotonic fashion as risk bins rise (low-medium-high), with substantively increasing rates of recidivism on each scale, thereby exemplifying favorable performance. By contrast, this is not the case for Hispanics. Highlighted within Figure 2, one can observe that with the general recidivism scale, the recidivism rate for Hispanics is only slightly higher in the high risk bin than the medium bin, with the difference not statistically significant. Then with the violent recidivism scale, the rate for Hispanics unexpectedly decreases from the low to the medium bins. Hence, these measures confirm differential validity in that COMPAS fails to perform equivalently for these groups. Indeed, the three risk bin strategy collapses for Hispanics. (A two risk bin scheme appears more appropriate for Hispanics, whereby the medium and high risk bins could be combined for general recidivism, while a low and medium risk bin combination might better fit the data for violent recidivism.)

2. The Area Under the Curve

The next row in Figure 2 involves a metric called the area under the curve (AUC)—it is derived from a statistical plotting of true positives and false positives across a risk tool’s rating system.93 More specifically, an AUC is a discrimination index that represents the probability that a randomly selected recidivist received a higher risk classification than a randomly selected non-recidivist.94 AUCs range from 0-1.0, with 0.5 indicating no better accuracy than chance and 1.0 meaning perfect discrimination.95 For this statistic, COMPAS decile scores are utilized, rather than the ordinal bins. Risk assessment scholars often refer to AUCs of .56, .64, and .71 as the thresholds for small, medium, and large effect sizes, respectively.96 Using these suggestions, it would mean that COMPAS operates effectively for both groups, albeit at disparate magnitudes. The AUCs would represent moderate effect sizes for Hispanics and large effect sizes for non-Hispanics. Notwithstanding, agreement on the strength of AUCs is not universal.97 A more conservative conceptualization is that AUCs between .60 and .69 are poor, from .70 to .79 are fair, .80 to .89 are good, and over .90 are excellent.98
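A minimal sketch of how group-specific AUCs like those in Figure 2 might be computed and compared appears below, again assuming the hypothetical file and columns introduced earlier. The standard error uses the Hanley-McNeil approximation and the z-test treats the two groups as independent; this is one common approach, not necessarily the estimator behind the Article's reported z-tests.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score
from scipy.stats import norm

df = pd.read_csv("compas-scores-two-years.csv")       # assumed ProPublica file, as above
df["hispanic"] = (df["race"] == "Hispanic").astype(int)

def auc_with_se(y, scores):
    """AUC plus the Hanley & McNeil (1982) approximate standard error."""
    a = roc_auc_score(y, scores)
    n_pos = int(y.sum())              # recidivists
    n_neg = int(len(y) - n_pos)       # non-recidivists
    q1, q2 = a / (2 - a), 2 * a**2 / (1 + a)
    var = (a * (1 - a) + (n_pos - 1) * (q1 - a**2)
           + (n_neg - 1) * (q2 - a**2)) / (n_pos * n_neg)
    return a, np.sqrt(var)

h = df[df["hispanic"] == 1]
n = df[df["hispanic"] == 0]
auc_h, se_h = auc_with_se(h["two_year_recid"], h["decile_score"])
auc_n, se_n = auc_with_se(n["two_year_recid"], n["decile_score"])

# z-test for the difference between two independent AUCs
z = (auc_h - auc_n) / np.sqrt(se_h**2 + se_n**2)
p = 2 * (1 - norm.cdf(abs(z)))
print(f"AUC Hispanic={auc_h:.3f}  Non-Hispanic={auc_n:.3f}  z={z:.2f}  p={p:.4f}")
```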

93 Jay P. Singh, Predictive Validity Performance Indicators in Violent Risk Assessment, 31 BEHAV. SCI. & L. 8, 15 (2013).
94 Jay P. Singh et al., Measurement of Predictive Validity in Violence Risk Assessment Studies, 31 BEHAV. SCI. & L. 55, 64 (2013).
95 Martin Rettenberger et al., Prospective Actuarial Risk Assessment: A Comparison of Five Risk Assessment Instruments in Different Sexual Offender Subtypes, 54 INT'L J. OFFENDER THERAPY & COMP. CRIMINOLOGY 169, 176 (2010).
96 L. Maaike Helmus & Kelly M. Babchishin, Primer on Risk Assessment and the Statistics Used to Evaluate its Accuracy, 44 CRIM. JUST. & BEHAV. 8, 12 (2017).
97 Jay P. Singh, Five Opportunities for Innovation in Violence Risk Assessment Research, 1 J. THREAT ASSESSMENT & MGMT. 179, 181 (2014).
98 L. Maaike Helmus & Kelly M. Babchishin, Primer on Risk Assessment and the Statistics Used to Evaluate its Accuracy, 44 CRIM. JUST. & BEHAV. 8, 9 (2017).


Based on these standards, the AUCs for Hispanics would be judged as poor and for non-Hispanics as fair. Still, the AUC has serious limitations as discussed elsewhere and thus cannot present a holistic portrait of a tool’s abilities.99 Unfortunately, the AUC is too commonly misinterpreted as measuring calibration accuracy; a higher AUC does not mean more accurate prospective prediction.100 Further, the AUC cannot calculate how well an instrument selects those at high risk.101 For example, the AUC could be very high even if no recidivists were ranked as high risk. To use a hypothetical, the AUC for COMPAS would reflect perfect accuracy (AUC=1.0) if all recidivists were classified in Decile 2 and all non-recidivists in Decile 1 (i.e., all were classified as low risk), despite very little distinction considering the decile scale ranges from 1 to 10. Importantly, significant group differences in AUCs, as indicated by corresponding z-tests, signal the presence of differential discrimination accuracy in terms of degree.102 Here, the AUCs indicate that COMPAS is a weaker discriminatory tool for Hispanics than non-Hispanics, as the AUC measures on both the general and violent recidivism scales are lower for Hispanics, and with statistical significance (z-tests of the differences indicate p=.000 and p=.001 for general and violent recidivism, respectively). These results on the AUCs are another indicator that COMPAS exhibits differential accuracy in its degree of discrimination utility on both scales relative to Hispanic ethnicity.

3. Correlations

The final row in Figure 2 contains correlation coefficients representing another barometer of the strength of COMPAS decile scores relative to recidivism. Like the AUC, correlations test discrimination ability.103
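The r row of Figure 2 can be approached in a similar fashion. The sketch below computes point-biserial correlations between decile score and the recidivism indicator for each group and compares them with a Fisher r-to-z test, a standard way of testing a difference between independent correlations; it again assumes the hypothetical file and columns from the earlier sketches rather than the Article's exact procedure.

```python
import numpy as np
import pandas as pd
from scipy.stats import pointbiserialr, norm

df = pd.read_csv("compas-scores-two-years.csv")       # assumed ProPublica file, as above
df["hispanic"] = (df["race"] == "Hispanic").astype(int)

def group_r(sub):
    """Point-biserial correlation between the recidivism indicator and decile score."""
    r, p = pointbiserialr(sub["two_year_recid"], sub["decile_score"])
    return r, p, len(sub)

r_h, p_h, n_h = group_r(df[df["hispanic"] == 1])
r_n, p_n, n_n = group_r(df[df["hispanic"] == 0])

# Fisher r-to-z test for the difference between two independent correlations
z_diff = (np.arctanh(r_h) - np.arctanh(r_n)) / np.sqrt(1 / (n_h - 3) + 1 / (n_n - 3))
p_diff = 2 * (1 - norm.cdf(abs(z_diff)))
print(f"r Hispanic={r_h:.3f} (p={p_h:.4f})  Non-Hispanic={r_n:.3f} (p={p_n:.4f})")
print(f"difference in r: z={z_diff:.2f}  p={p_diff:.4f}")
```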

99 Jay P. Singh, Predictive Validity Performance Indicators in Violent Risk Assessment, 31 BEHAV. SCI. & L. 8, 16-18 (2013); Stephane M. Shepherd & Danny Sullivan, Covert and Implicit Influences on the Interpretation of Violence Risk Instruments, 24 PSYCHIATRY, PSYCHOL. & L. 292, 294 (2017).
100 Jay P. Singh, Predictive Validity Performance Indicators in Violence Risk Assessment: A Methodological Primer, 31 BEHAV. SCI. & L. 8, 16 (2013). See e.g., Anthony W. Flores et al., False Positives, False Negatives, and False Analyses: A Rejoinder to “Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And It’s Biased Against Blacks,” 80 FED. PROBATION 38, 41 (2016) (incorrectly referring to the AUC of COMPAS using the Broward County data as indicating “predictive accuracy”).
101 Jay P. Singh, Predictive Validity Performance Indicators in Violence Risk Assessment: A Methodological Primer, 31 BEHAV. SCI. & L. 8, 17 (2013).
102 Anthony W. Flores et al., False Positives, False Negatives, and False Analyses, 80 FED. PROBATION 38, 41 (2016).
103 L. Maaike Helmus & Kelly M. Babchishin, Primer on Risk Assessment and the Statistics Used to Evaluate its Accuracy, 44 CRIM. JUST. & BEHAV. 8, 11 (2017).

Separate statistical runs (not shown in Figure 2) find that each of the four individual correlation coefficients is statistically significant at p=.005 or below. These individual correlations thus signify that the scales achieve some statistical strength relative to recidivism in both groups. Notwithstanding, correlation coefficients can signify differential validity as well.104 Here, comparing correlation statistics shows that the COMPAS scales are more weakly associated with general and violent recidivism for Hispanics, and that these differences are statistically significant. In sum, this result adds to the cumulative evidence of differential validity for COMPAS regarding Hispanics.

C. Test Bias

A well calibrated instrument is one that is also “free from predictive bias or differential prediction.”105 Differential prediction demonstrates group disparities in predictive ability. Researchers examining group bias in psychological testing in education have standardized a methodology to empirically study it, with the endorsement of the American Psychological Association.106 Group bias represents test bias, which refers to the existence of systematic errors in how a test measures members of any group.107 This gold standard for evaluating test bias involves a series of nested models of regression equations involving the test, the group(s) of interest, and an interaction term (test*group) as predictors of the outcome.108 Basically, a regression analysis is a statistical method to evaluate the relationship between one or more predictors and a response (outcome) variable.109 An interaction term refers to the product of two predictor variables (here, test and group) and is used to determine whether the effect on the outcome of either predictor is moderated by the presence of

104 Christopher M. Berry et al., Can Racial/Ethnic Subgroup Criterion-to-Test Standard Deviation Ratios Account for Conflicting Differential Validity and Differential Prediction Evidence for Cognitive Ability Tests?, 87 J. OCCUPATIONAL & ORG. PSYCHOL. 208, 209 (2014).
105 Alexandra Chouldechova, Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments, 5 BIG DATA 153, 154 (2017) (emphasis in original).
106 Nathan R. Kuncel & Davide M. Kleiger, Predictive Bias in Work and Educational Settings, in THE OXFORD HANDBOOK OF PERSONNEL ASSESSMENT 462, 463 (Neil Schmitt ed., 2012) (confirming endorsements also from the National Council on Measurement in Education and the American Educational Research Association).
107 Adam W. Meade & Michael Fetzer, Test Bias, Differential Prediction, and a Revised Approach for Determining the Suitability of a Predictor in a Selection Context, 12 ORG. RES. METHODS 738, 738 (2009).
108 Jeanne A. Teresi & Richard N. Jones, Bias in Psychological Assessment and Other Measures, in APA HANDBOOK OF TESTING AND ASSESSMENT IN PSYCHOLOGY 139, 144 (vol. 1 2013).
109 RONET D. BACHMAN & RAYMOND PATERNOSTER, STATISTICS FOR CRIMINOLOGY AND CRIMINAL JUSTICE 675 (1997).

the other (i.e., changes its strength or direction of influence).110 This nested models method detects group differences in the form of the relationship between the test and the outcome in terms of intercepts and slopes,111 in order to reveal differential prediction.112 The rule of thumb in the psychological assessment field is that a significant group difference in either the intercept or the slope indicates that a single regression equation for the groups combined will predict inaccurately for one or both groups, and therefore a separate equation for each group must be considered.113 Unequal intercepts or slopes also signify disparate impact.114 Disparate impact may be present when a system treats persons unfairly, without requiring evidence of any discriminatory intent.115 Some researchers in criminal justice have recently begun to apply this methodological practice of nested models to evaluate group bias in recidivism risk tools.116 The nested model structure here utilized variables labeled as Hispanic (coded as Hispanic=1, non-Hispanic=0), the COMPAS decile score, and an interaction between them as Hispanic*COMPAS decile score. A four model structure is employed, with the outcome variable being recidivism. Model 1 tests only the Hispanic variable; Model 2 tests just the COMPAS decile score; Model 3 includes both the Hispanic and COMPAS decile score variables; and Model 4 retains Hispanic and COMPAS decile score while adding the interaction term. The general recidivism model utilizes the COMPAS general recidivism scale, while the violent recidivism model relies upon the COMPAS violent recidivism scale. Figures 3 and 4 provide the relevant logistic regression results for general recidivism and violent recidivism, respectively. Logistic regression applies when investigating an association between one or more predictor variables and an outcome of interest that is dichotomous in nature

110 JAMES JACCARD, INTERACTION EFFECTS IN LOGISTIC REGRESSION 12 (2001).
111 Jennifer L. Skeem & Christopher T. Lowenkamp, Race, Risk, and Recidivism: Predictive Bias and Disparate Impact, 54 CRIMINOLOGY 680, 692 (2016).
112 Adam W. Meade & Michael Fetzer, Test Bias, Differential Prediction, and a Revised Approach for Determining the Suitability of a Predictor in a Selection Context, 12 ORG. RES. METHODS 738, 740 (2009).
113 Cecil R. Reynolds & Lisa A. Suzuki, Bias in Psychological Assessment: An Empirical Review and Recommendations, in HANDBOOK OF PSYCHOLOGY 82, 101 (Irving B. Weiner ed., 2003).
114 Adam W. Meade & Michael Fetzer, Test Bias, Differential Prediction, and a Revised Approach for Determining the Suitability of a Predictor in a Selection Context, 12 ORG. RES. METHODS 738, 741 (2009).
115 Andrew D. Selbst, Disparate Impact in Big Data Policing, 52 GA. L. REV. 109, 121-22 (2017).
116 Jennifer Skeem et al., Gender, Risk Assessment, and Sanctioning: The Cost of Treating Women Like Men, 40 LAW & HUM. BEHAV. 580, 585 (2016).


(e.g., recidivist=yes/no).117 The logistic coefficients have been translated in the tables into odds ratios for interpretive purposes.

Figure 3. Logistic Regression Predicting the Odds of General Recidivism

Predictor                       Model 1     Model 2     Model 3     Model 4
Hispanic                        .686***     ---         .914        1.284
Decile                          ---         1.323***    1.322***    1.331***
Hispanic*Decile Interaction     ---         ---         ---         .911*
Constant                        .861        .239        .242        .235
-2LL                            8490.49     7650.88     7650.11     7644.43
χ2                              15.93       855.53      856.30      861.99

n=6,172. Notes: * p<.05, ** p<.01, *** p<.001. Coefficients represent odds ratios.

Model 1 indicates that Hispanics are significantly less likely to engage in any act of recidivism. Model 2 supports the overall utility of COMPAS for the groups combined in that the odds of general recidivism increase by 32% for every one unit increase in COMPAS decile score. In Models 3 and 4, when adding the decile score, Hispanic ethnicity is no longer statistically significant. While the odds are lower for Hispanics in Model 3, such that there is evidence of overprediction for them, the difference is not statistically significant (p=.380). However, the interaction term in Model 4 is statistically significant (p=.015). This means that Hispanic ethnicity significantly moderates the relationship between COMPAS decile score and general recidivism. As the interaction term is below one, the regression slope is less steep for Hispanics, revealing that as the COMPAS score increases, it has a weaker influence on predicting recidivism for Hispanics. In other words, COMPAS does not predict as strongly for Hispanics and thus is not as valid a predictor for that group.118 Further, unequal slopes reflect differential test prediction and thus present as disparate impact.
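A minimal sketch of the four nested models for the general recidivism scale appears below, assuming the hypothetical file and columns from the earlier sketches and using the statsmodels formula interface. Coefficients are exponentiated into odds ratios to mirror Figures 3 and 4; the violent recidivism models would simply substitute the violent scale's decile score and outcome. This illustrates the nested-models technique rather than reproducing the Article's exact code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("compas-scores-two-years.csv")       # assumed ProPublica file, as above
df["hispanic"] = (df["race"] == "Hispanic").astype(int)

formulas = {
    "Model 1": "two_year_recid ~ hispanic",                 # group only
    "Model 2": "two_year_recid ~ decile_score",             # test only
    "Model 3": "two_year_recid ~ hispanic + decile_score",  # tests intercept differences
    "Model 4": "two_year_recid ~ hispanic * decile_score",  # adds the interaction (slope) term
}

for name, formula in formulas.items():
    res = smf.logit(formula, data=df).fit(disp=False)
    odds = np.exp(res.params)   # odds ratios, e.g., 1.32 means 32% higher odds per decile
    print(f"\n{name}: -2LL = {-2 * res.llf:.2f}, model chi2 = {2 * (res.llf - res.llnull):.2f}")
    for term in res.params.index:
        print(f"  {term:<26} OR = {odds[term]:.3f}  p = {res.pvalues[term]:.4f}")
```

A significant interaction odds ratio below one, as in Model 4 of Figure 3, corresponds to a flatter regression slope, that is, a weaker predictive relationship, for Hispanics.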

117 FRED C. PAMPEL, LOGISTIC REGRESSION 1 (2000).
118 See Christopher M. Berry, Differential Validity and Differential Prediction in Cognitive Ability Tests, 2 ANN. REV. ORG. PSYCHOL. & ORG. BEHAV. 435, 443 (2015).


Figure 4. Logistic Regression Predicting the Odds of Violent Recidivism

Predictor                      Model 1     Model 2      Model 3      Model 4
Hispanic                       .540***     ---          .678*        1.033
Decile                         ---         1.379***     1.375***     1.384***
Hispanic*Decile Interaction    ---         ---          ---          .891
Constant                       .202        .056         .058         .057
-2LL                           3550.99     3207.91      3203.36      3201.18
χ2                             13.00       356.08       360.63       362.80

n=4,020. Notes: * p<.05, ** p<.01, *** p<.001. Coefficients represent odds ratios.

With respect to the COMPAS violent risk scale, there are some consistencies with the general risk scale just discussed. In Figure 4, Model 1 shows that Hispanics are significantly less likely to commit a violent offense. The utility of the COMPAS violent recidivism scale generally is supported in Model 2: for every one-unit increase in the violent decile score, the odds of violent recidivism increase by 38%.

The results diverge from the regression models for general recidivism in two ways. The Hispanic variable in Model 3 remains statistically significant (p=.04), meaning that there are discrepancies in the intercepts of the regression lines for violent recidivism. The lower intercept signals overprediction for Hispanics on the violent recidivism scale.119 But as the Model 4 interaction is not significant (p=.143), there is no detectable variation in the slopes.

In sum, the nested regression models reveal differential predictive validity in COMPAS based on Hispanic ethnicity, though the form varies. For general recidivism the problem is asymmetry in the slopes, while for the violent recidivism scale it is unequal intercepts. In both cases differential predictive validity is shown, and COMPAS can be challenged as producing unfair and biased algorithmic results based on Hispanic ethnicity. Test bias also means that, while COMPAS scores may have some meaning within groups, comparisons across groups are problematic.120

One may wonder whether these regression models are too simplistic. For example, other researchers have found that gender, age, and criminal history are statistically significant factors in recidivism risk.121 Because of this, separate regression models (not offered herein) were run to include controls for gender, age, and number of prior counts. The results were substantially the same: unequal slopes for the general recidivism scale and unequal intercepts for the violent recidivism scale.

119 See Christopher M. Berry, Differential Validity and Differential Prediction in Cognitive Ability Tests, 2 ANN. REV. ORG. PSYCHOL. & ORG. BEHAV. 435, 443 (2015).
120 See Adam W. Meade & Michael Fetzer, Test Bias, Differential Prediction, and a Revised Approach for Determining the Suitability of a Predictor in a Selection Context, 12 ORG. RES. METHODS 738, 738 (2009).
121 John Monahan et al., Age, Risk Assessment, and Sanctioning, 41 LAW & HUM. BEHAV. 191 (2017); Jennifer Skeem et al., Gender, Risk Assessment, and Sanctioning: The Cost of Treating Women Like Men, 40 LAW & HUM. BEHAV. 580 (2016); Nicholas Scurich & John Monahan, Evidence-Based Sentencing: Public Openness and Opposition to Using Gender, Age, and Race as Risk Factors for Recidivism, 40 LAW & HUM. BEHAV. 36, 37 (2016).

D. Differential Prediction via E/O Measures

An alternative perspective on differential prediction looks to whether expected rates of recidivism match the observed recidivism rates, and whether that holds irrespective of group membership.122 Expected rates are generated from logistic regression models, whereas observed rates are the actual recidivism statistics. The differences can be shown by plotting expected rates of recidivism against the observed rates at every COMPAS decile score (with expected versus observed referred to herein as "E/O"). For each COMPAS scale, a single expected rate line was computed using the Model 2 statistics for general and violent recidivism from Figures 3 and 4, respectively. The single expected rate line represents the expected recidivism rate at each decile score, regardless of group. Two lines of observed rates of recidivism are presented (one for Hispanics and the other for non-Hispanics). Figures 5 and 6 provide bar graphs of these E/O plots for general recidivism and violent recidivism, respectively.
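A minimal sketch of how such an E/O comparison can be assembled, assuming hypothetical column names (recid, decile, hispanic) and the Model 2 specification described above:

```python
# Sketch of an expected-versus-observed (E/O) comparison by decile score.
# Column names and the "recid ~ decile" specification are assumptions for illustration.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("compas_pretrial.csv")  # recid: 0/1, decile: 1-10, hispanic: 0/1

# Expected rate: a single logistic fit on the decile score alone, evaluated at each decile.
model2 = smf.logit("recid ~ decile", data=df).fit(disp=False)
expected = model2.predict(pd.DataFrame({"decile": range(1, 11)}))

# Observed rates: the actual recidivism proportion at each decile, split by group.
observed = df.groupby(["hispanic", "decile"])["recid"].mean().unstack("hispanic")

eo = pd.DataFrame({
    "expected": expected.to_numpy(),
    "observed_non_hispanic": observed[0].reindex(range(1, 11)).to_numpy(),
    "observed_hispanic": observed[1].reindex(range(1, 11)).to_numpy(),
}, index=range(1, 11))
print(eo.round(3))  # plotting these three series reproduces the shape of Figures 5 and 6
```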

122 Richard Berk, Accuracy and Fairness for Juvenile Justice Risk Assessments 18-19 (2017), https://crim.sas.upenn.edu/sites/crim.sas.upenn.edu/files/Berk_FairJuvy_1.2.2018.pdf.


Figure 5. E/O General Recidivism Plots by Ethnicity

[Bar chart: expected versus observed general recidivism rates (0% to 100%) at each COMPAS decile score (1-10); series shown: Hispanic (observed), Non-Hispanic (observed), and Expected.]

The solid line represents expected recidivism rates by decile regardless of grouping. The observed plot of general recidivism outcomes for non-Hispanics moderately tracks the single expected-rate slope. The observed plot for Hispanics, though, does not present a similar shape and, indeed, fails to reflect a consistent linear increase in actual recidivism rates as decile scores rise. The plot thus visually represents differential prediction for COMPAS on the general recidivism scale. Figure 5 also graphically depicts accuracy errors for Hispanics.


Figure 6. E/O Violent Recidivism Plots by Ethnicity

[Bar chart: expected versus observed violent recidivism rates (0% to 80%) at each COMPAS decile score (1-10); series shown: Hispanic (observed), Non-Hispanic (observed), and Predicted.]

The plotting in Figure 6 follows the same patterns just discussed for general recidivism rates, though the discrepancies are more dramatic. Actual recidivism rates for Hispanics oddly decline from decile 2 to a 0% recidivism rate at decile 7, then increase substantially in deciles 8 and 9, before plummeting back to 0% in decile 10. Hence, the COMPAS violent decile score does not bear a linear relationship to violent recidivism for Hispanics, undermining its performance for this group. The plot also confirms differential prediction for COMPAS on its violent recidivism scale with respect to Hispanic ethnicity.

So far, comparisons of risk bin outcomes, AUCs, and correlation coefficients reflect differential validity, while logistic regressions and E/O plots present differential prediction for COMPAS. Some argue that these are not the only appropriate measures of a test's ability to classify individuals, to predict risk, or to reflect unfairness.123

E. Mean Score Differences

It was shown earlier that average COMPAS decile scores on the general and violent recidivism scales significantly varied by Hispanic ethnicity. Yet it is too simplistic to assert that mean score differences alone signify test bias (even though they violate the algorithmic fairness notion of statistical parity124), as there may be some risk-relevant differences between groups.125

123 Richard W. Elwood, Calculating Probability in Sex Offender Risk Assessment, 62 INT'L J. OFFENDER THERAPY & COMP. CRIMINOLOGY 1262, 1264 (2018).
124 Jon Kleinberg et al., Inherent Trade-Offs in the Fair Determination of Risk Scores, Conf. on Innovations in Theoretical Computer Science 11 (2017), https://arxiv.org/abs/1609.05807.

Still, closely related fairness indices are indicative of differential calibration—group variances in calibration—and potentially disparate impact. The algorithmic fairness concept of "balance for the positive class" requires that the mean test score for those in the positive class—here, meaning recidivists—be the same across groups.126 Correspondingly, "balance for the negative class" requires equal mean test scores for those in the negative class, i.e., non-recidivists. A test is well-calibrated with respect to these definitions of algorithmic fairness, then, if a particular score is associated with the same likelihood of the outcome occurring, regardless of group membership.127 Figure 7 presents mean COMPAS scores comparing recidivists and non-recidivists on both the general and violent recidivism scales.

Figure 7. Mean Decile Scores

                  General Recidivism Scale           Violent Recidivism Scale
                  Recidivists   Non-Recidivists      Recidivists   Non-Recidivists
Hispanic          4.17          2.92***              3.54          2.54**
Non-Hispanic      5.65          3.53***              5.03          2.98***

Notes: * p<.05, ** p<.01, *** p<.001.

All four comparisons of mean decile score differences between groups are statistically significant at the level of p=.005 or below. Mean scores for recidivist and non-recidivist Hispanics are substantially lower than for non-Hispanics. Practical differences exist as well. Mean scores for Hispanics show much less separation between recidivists and non-recidivists on both scales, being approximately one decile apart. The means for non-Hispanics are at least two deciles apart, signifying that the scoring distinguishes non-Hispanic recidivists from non-recidivists to a greater degree. Overall, the metrics in Figure 7 signify a failure of balance for the positive and negative classes and thus again indicate algorithmic unfairness and disparate impact in COMPAS scoring based on Hispanic ethnicity.
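A minimal sketch of the balance checks summarized in Figure 7, again using assumed column names rather than the study's actual code:

```python
# "Balance for the positive class": mean score among recidivists, by group.
# "Balance for the negative class": mean score among non-recidivists, by group.
# File and column names (recid, decile, hispanic) are illustrative assumptions.
import pandas as pd

df = pd.read_csv("compas_pretrial.csv")

balance = (
    df.groupby(["hispanic", "recid"])["decile"]
      .mean()
      .unstack("recid")
      .rename(columns={1: "recidivists (positive class)", 0: "non-recidivists (negative class)"})
)
print(balance.round(2))  # large group gaps within a column indicate a balance violation
```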

125 Russell T. Warne et al., Exploring the Various Interpretations of "Test Bias," 20 CULTURAL DIVERSITY & ETHNIC MINORITY PSYCHOL. 570, 572 (2014).
126 Jon Kleinberg et al., Inherent Trade-Offs in the Fair Determination of Risk Scores, Conf. on Innovations in Theoretical Computer Science 2 (2017), https://arxiv.org/abs/1609.05807 (emphasis in original).
127 Alexandra Chouldechova, Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments, 5 BIG DATA 153, 154 (2017).

F. Classification Errors

Additional computations measuring algorithmic accuracy and fairness are popular in the behavioral sciences and machine learning literatures. A few of these can provide further insight into possible disparate impact, which can occur when accuracy and error rates between groups are unequal.128

The first two are the true positive rate (TPR) and the true negative rate (TNR), representing a high risk and a low risk discrimination metric, respectively.129 The TPR is alternatively titled sensitivity in the field of statistics and represents the nonerror rate for recidivists.130 The TNR is alternatively titled specificity and represents the nonerror rate for non-recidivists.131 The TPR and TNR are retrospective in nature. Two metrics that more appropriately measure prospective predictive accuracy, and thereby are more important to practitioners interested in the predictive validity of risk tools, are the positive predictive value (PPV) and the negative predictive value (NPV).132 The PPV represents the probability that a high risk score will be correct, i.e., the proportion of high risk predictions who were recidivists.133 The NPV then is the proportion of those classified as low risk who did not recidivate. The PPV is a high risk calibration measure while the NPV is a low risk calibration measure.134

These calculations require that the sample be divided into two groupings: one representing individuals predicted to be recidivists and the other non-recidivists. Regarding COMPAS, researchers typically opt to lump COMPAS' low and medium risk bins into one group, to merge the medium and high risk bins together, or, for good measure, to offer both.135 Figures 8 and 9 present the two different cutpoints for dichotomizing the COMPAS general and violent recidivism scales, respectively.
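The four measures can be computed from a simple two-by-two classification table for each group and cutpoint. The sketch below is illustrative only; the column names and risk-bin labels are assumptions:

```python
# Illustrative computation of TPR, TNR, PPV, and NPV by group and cutpoint.
# Column names (recid, hispanic, risk_bin) are assumptions for this sketch.
import numpy as np
import pandas as pd

df = pd.read_csv("compas_pretrial.csv")  # recid: 0/1, hispanic: 0/1, risk_bin: Low/Medium/High

def metrics(actual, predicted_high):
    tp = np.sum((predicted_high == 1) & (actual == 1))
    tn = np.sum((predicted_high == 0) & (actual == 0))
    fp = np.sum((predicted_high == 1) & (actual == 0))
    fn = np.sum((predicted_high == 0) & (actual == 1))
    return {"TPR": tp / (tp + fn),   # sensitivity: correct calls among recidivists
            "TNR": tn / (tn + fp),   # specificity: correct calls among non-recidivists
            "PPV": tp / (tp + fp),   # share of high-risk predictions that recidivated
            "NPV": tn / (tn + fn)}   # share of low-risk predictions that did not

cutpoints = {
    "Low v. Medium/High": df["risk_bin"].isin(["Medium", "High"]).astype(int),
    "Low/Medium v. High": (df["risk_bin"] == "High").astype(int),
}
for label, predicted_high in cutpoints.items():
    for group, sub in df.groupby("hispanic"):
        m = metrics(sub["recid"].to_numpy(), predicted_high.loc[sub.index].to_numpy())
        print(label, "Hispanic" if group == 1 else "Non-Hispanic",
              {k: round(float(v), 2) for k, v in m.items()})
```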

128 Geoff Pleiss et al., On Fairness and Calibration 2, arxiv.org/pdf/1709.02012.pdf.
129 Jay P. Singh, Predictive Validity Performance Indicators in Violence Risk Assessment: A Methodological Primer, 31 BEHAV. SCI. & L. 8, 9 (2013).
130 Kristian Linnet et al., Quantifying the Accuracy of a Diagnostic Test or Marker, 58 CLINICAL CHEMISTRY 1292, 1296 (2012).
131 Id.
132 Jay P. Singh, Predictive Validity Performance Indicators in Violence Risk Assessment: A Methodological Primer, 31 BEHAV. SCI. & L. 8, 12 (2013).
133 ROBERT H. RIFFENBURGH, STATISTICS IN MEDICINE 254 tbl. 15.13 (2013).
134 Jay P. Singh, Predictive Validity Performance Indicators in Violence Risk Assessment: A Methodological Primer, 31 BEHAV. SCI. & L. 8, 11 fig. 1 (2013).
135 Anthony W. Flores et al., False Positives, False Negatives, and False Analyses, 80 FED. PROBATION 38, 42 (2016).


Figure 8. Measures for General Recidivism

                     Low v. Medium/High            Low/Medium v. High
Measure              Hispanic    Non-Hispanic      Hispanic    Non-Hispanic
Discrimination
  TPR                .42         .63***            .14         .31***
  TNR                .81         .69***            .94         .91
Calibration
  PPV                .56         .63**             .57         .75***
  NPV                .70         .68               .65         .61

Notes: * p<.05, ** p<.01, *** p<.001.

In Figure 8, five of the eight comparisons are statistically significant, again confirming differential validity and differential prediction. With respect to the true positive rate, the tool is significantly better at discriminating recidivists from non-recidivists for non-Hispanics at both cutpoints. Indeed, the general recidivism TPRs for non-Hispanics are approximately 50% and 120% higher than for Hispanics. The low TPRs for Hispanics are quite poor from both quantitative and qualitative perspectives. For example, at the higher cutpoint (low/medium versus high) the TPR is only 14%, meaning the tool fails to identify more than eight out of ten Hispanic recidivists. Likewise, the PPV differences show that COMPAS is a better predictor of recidivism for non-Hispanics at both cutpoints. For example, at the higher cutpoint, of those predicted to be at high risk of general recidivism, 57% of Hispanics, versus 75% of non-Hispanics, actually reoffended. The PPVs for Hispanics at both cutpoints are also considered poor, and indicate overprediction.136 Interestingly, one need not choose here between ProPublica's preferred measure (the TPR) and that of Northpointe, the owner of COMPAS (the PPV), or between cutpoints: on any of those measures, COMPAS performs far worse at predicting recidivists for Hispanics. The results thus suggest algorithmic unfairness.

In terms of identifying non-recidivists, one of the four measures is statistically significant. The tool performs modestly better at identifying non-recidivists among Hispanics.
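A quick arithmetic check (simple division of the Figure 8 values) of the relative TPR comparisons quoted above:

\[
\frac{.63}{.42} \approx 1.50 \ \ (\text{about 50\% higher}), \qquad
\frac{.31}{.14} \approx 2.21 \ \ (\text{about 120\% higher}).
\]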

136 See James C. DiPerna et al., Broadband Screening of Academic and Social Behavior, in UNIVERSAL SCREENING IN EDUCATIONAL SETTINGS 223, 235, 239 (R.J. Kettler et al. eds., 2014).


Figure 9. Measures for Violent Recidivism

                     Low v. Medium/High            Low/Medium v. High
Measure              Hispanic    Non-Hispanic      Hispanic    Non-Hispanic
Discrimination
  TPR                .29         .54***            .14         .21**
  TNR                .81         .77*              .98         .96
Calibration
  PPV                .14         .32***            .45         .49
  NPV                .91         .89               .91         .86**

Notes: * p<.05, ** p<.01, *** p<.001.

Similar results are shown for violent recidivism in Figure 9. COMPAS generally performs much better, from both discrimination and calibration perspectives, at classifying violent recidivists for non-Hispanics, with one caveat: at the higher cutpoint the PPV was still higher for non-Hispanics, but not to a statistically significant degree. The violent recidivism TPRs for non-Hispanics are 86% and 50% higher at the lower and higher cutpoints, respectively. The low PPVs for Hispanics again support the tendency of the tool to overclassify them. At the lower cutpoint, the predictions of violent recidivism for Hispanics were wrong at least eight out of ten times. Then, as with the general recidivism scale, the TPRs and PPVs at both cutpoints indicate that COMPAS achieves weaker accuracy at selecting recidivists for Hispanics. One need not choose between the TPR and the PPV here either: either definition of algorithmic fairness indicates unfairness to Hispanics. As with the general recidivism scale, the tool performs modestly better at selecting violent non-recidivists among Hispanics.

G. Limitations

Several limitations of this study should be mentioned. The single study site limits the generalizability of the results. The study relied upon archival data, so it is possible that systematic errors in data collection occurred that are not observable in a secondary data analysis. Recidivism outcomes were drawn from official records and thus do not include undetected crimes. The dataset did not include interrater reliability scores that would confirm the dependability of COMPAS scoring across evaluators and over time. Finally, it would have been preferable to control for aspects of supervision, as pretrial services and conditions may moderate reoffending rates, but that option is also not available in this secondary data analysis.


V. CONCLUSIONS

Algorithmic risk assessment holds promise for informing decisions that can reduce mass incarceration by releasing more prisoners through risk-based selections that consider public safety. Yet caution is in order, as presumptions of transparency, objectivity, and fairness in the algorithmic process may be unwarranted. Calls for third party audits of risk tools by those who heed such caution led to the study presented herein. This study rather uniquely focused on the potential of unfairness for Hispanics. Using multiple definitions of algorithmic unfairness, results consistently showed that COMPAS, a popular risk tool, is not well calibrated for Hispanics. The statistics presented evidence of differential validity and differential predictive ability based on Hispanic ethnicity. The tool fails to accurately predict actual outcomes in a linear manner and overpredicts risk for Hispanics. Overall, there is cumulative evidence of disparate impact. It appears quite likely that factors extraneous to those scored by the COMPAS risk scales, related to cultural differences, account for these results.137 This information should signal to officials that greater care must be taken to ensure that proper validation studies are undertaken to confirm that any algorithmic risk tool used is fair for its intended population and subpopulations.

137 See Adam W. Meade & Michael Fetzer, Test Bias, Differential Prediction, and a Revised Approach for Determining the Suitability of a Predictor in a Selection Context, 12 ORG. RES. METHODS 738, 741 (2009).

BY MEGAN T. STEVENSON AND JENNIFER L. DOLEAC
November 2018

TABLE OF CONTENTS

SUMMARY
BACKGROUND
ANALYSIS AND FINDINGS
KENTUCKY
VIRGINIA
DISCUSSION
ATTRIBUTIONS
SOURCES

SUMMARY

Algorithmic risk assessment tools have become a popular element of criminal justice reforms, often with the explicit goal of reducing incarceration rates. The hope is that these data-driven tools will standardize decisions about pretrial detention and sentencing, and ensure that only the most high-risk offenders are incarcerated. However, the effects of these tools depend crucially on how judges use them.

This brief considers judicial reforms in Kentucky and Virginia as case studies of the effects of algorithmic risk assessment tools in practice. We show that, in both states, reforms aimed at reducing incarceration for low-risk offenders had little to no impact on incarceration rates. While these tools clearly recommended less incarceration for a large share of defendants, they had little effect on judges' incarceration decisions. However, there is tremendous variation across judges in how closely they follow the risk assessment recommendations. We discuss what these results mean in terms of how to effect meaningful criminal justice reform, including ways citizens and policymakers can align judicial incentives so that reforms have their intended effects.

KEY FINDINGS:

• If Kentucky judges had followed the recommendations associated with the risk assessment in all cases, the pretrial release rate among low and moderate risk defendants would have jumped up by 37 percentage points after risk assessment was made mandatory. Instead, the pretrial release rate for low and moderate risk defendants increased by only four percentage points, from 63% to 67%.

• The median judge in Kentucky grants release without monetary bail to only 37% of defendants with low or moderate risk status. In other words, the median judge overrules the presumptive default associated with the risk assessment about 2/3 of the time.

• If judges had followed the recommendations associated with the risk assessment, there would have been up to a 25% increase in the diversion rate among risk-assessment-eligible offenders after risk assessment was adopted statewide. Instead, the diversion rate remained virtually unchanged, dipping just slightly from 34% to 33%.

• The median judge in Virginia diverts only about 40% of those who were recommended for diversion by the risk assessment instrument.

Background: Algorithmic Risk Assessment as Criminal Justice Reform

On August 28, 2018, California's governor signed a major criminal justice reform bill into law. This bill will replace monetary bail with a system in which pretrial custody is determined partially by an algorithmic evaluation of the likelihood that a defendant will commit future crime or fail to appear in court. California is following in the footsteps of many other jurisdictions that have implemented criminal justice reform by way of algorithmic risk evaluations.

In recent years, pretrial risk assessment has been adopted in dozens of jurisdictions and at least six entire states.1 Risk assessment is used in sentencing in at least twenty-eight full states; at least seven additional states have at least one county that uses risk assessment tools at sentencing.2 It is also widely used in determining whether a prisoner will be released on parole, in determining supervision levels for probation, and in selecting prison security level.

Risk assessment algorithms are designed to predict the likelihood of committing new crime or failing to appear in court. Built by data scientists, risk assessments are designed by estimating the correlation between "inputs" — criminal history, education, employment, age, gender, etc. — and the outcome that the tool is trying to predict. These statistical correlations will then be converted into an algorithm, in which points are added or subtracted from the total risk score based on the input values. A person will be classified as "high risk" if their total points exceed a certain cutoff.

Proponents of risk assessment have argued that they will help to lower incarceration rates with minimal effect on public safety. In fact, jurisdictions often adopt risk assessment with the stated goal of lowering jail and prison populations. The risk evaluations are paired with explicit action recommendations: for example, a jurisdiction might recommend release or diversion for all defendants who are ranked low or moderate risk. However, the final decision is usually left up to the judge.

Risk-assessment-based criminal justice reform has not always had the intended impact. Judges vary widely in their propensity to release low-risk defendants. Some follow the recommendations associated with the risk assessment tool most of the time; others only rarely. As a result, risk-assessment based reforms that are intended to lower incarceration rates may have little to no effect.
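A stylized illustration of the kind of points-based scoring described above. The inputs, point values, and cutoff in this sketch are invented for illustration and do not correspond to any actual instrument:

```python
# Hypothetical, simplified points-based risk score. The inputs, weights, and cutoff
# are invented for illustration and do not describe any real instrument.
def risk_points(prior_felonies: int, prior_ftas: int, age: int, employed: bool) -> int:
    points = 0
    points += 2 * min(prior_felonies, 3)   # criminal history adds points
    points += 1 * min(prior_ftas, 2)       # past failures to appear add points
    points += 2 if age < 23 else 0         # younger defendants score higher
    points -= 1 if employed else 0         # employment subtracts a point
    return points

def classify(points: int, cutoff: int = 5) -> str:
    # Defendants whose total points exceed the cutoff are labeled "high risk";
    # everyone else falls in the low/moderate band that reforms presumptively favor for release.
    return "high risk" if points > cutoff else "low/moderate risk"

print(classify(risk_points(prior_felonies=2, prior_ftas=1, age=21, employed=False)))  # -> high risk
```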

Background: State Court Judicial Selection

Judges are people. They come with their own beliefs about what kind of criminal justice response is appropriate in each particular circumstance. But being a judge is also a job. Understanding judicial incentives requires understanding the process by which judges are hired, retained, and promoted.

Almost all state court judges must obtain and keep their positions by means of highly-politicized election and appointment schemes. This is very different from the federal judicial system, in which, after judges are nominated by the President and confirmed by the Senate, they become largely insulated from political pressure by lifetime terms. Even after gaining office through election or appointment, almost all state court judges must routinely seek re-election or re-appointment on a regular basis. Trial court judges, who decide criminal cases, are often especially exposed to these political pressures.

Twenty-nine states elect their trial court judges, while two use a system of legislative appointment and re-appointment. Such systems, combined with relatively short terms for most trial court judges, mean that judges are routinely confronted with the reality that voters or legislators will be passing judgement on their judicial records. The incentive to avoid being characterized as "soft on crime" – and thus at risk of losing their position – is powerful.

The states that are the subject of this report – Kentucky and Virginia – employ systems of judicial selection and retention which have the potential to expose judges to significant political pressures relating to criminal justice issues. In both states the general jurisdiction trial courts with jurisdiction over felonies are called Circuit Courts. Under Kentucky law, Circuit Court judges obtain and retain office by running in nonpartisan elections, except in cases of filling an unexpired term, in which case they are appointed by the Governor. They serve eight-year terms.3 In Virginia, Circuit Court judges are appointed by the legislature, and also serve eight-year terms, at the conclusion of which they may be re-appointed or replaced.4

Figure 1. Judicial Selection Map

[Map legend: Nonpartisan Elections (20); Gubernatorial Appointment (9); Missouri Plan/Merit Selection (9); Partisan Elections (9); Hybrid (2); Legislative Appointment (2). Source: Brennan Center for Justice.]

Analysis and Findings

Here we analyze the impacts of risk-assessment based criminal justice reforms in two different contexts: determining pretrial custody and bail, and determining sentences. We analyze the impacts of pretrial risk assessment in Kentucky, and the impacts of sentencing risk assessment in Virginia. These locations were selected for several reasons. First, their reforms occurred at the statewide level instead of at the county level. This ensures that there is a large number of observations on which to base our analysis. This also allows us to identify cross-county and cross-judge variation in compliance with the reform efforts.

Second, these states were both early adopters of algorithmic tools to aid in criminal justice decision-making. Kentucky has used some sort of algorithm to aid in the pretrial custody decision since the 1970s, and made its use mandatory in 2011.5 Virginia is one of the first states to incorporate algorithmic risk assessment into sentencing; they piloted risk assessment in the late 1990's and adopted it statewide in 2002.6

Virginia and Kentucky were also selected because they adopted risk assessment with the explicit goal of reducing jail and prison populations. Kentucky adopted risk assessment after a decade of substantial growth in incarceration. By 2010, the jail and prison population imposed a significant burden on the state budget. Risk assessment was proposed as a "smart on crime" method of countering this growth. Virginia adopted risk assessment after a major "truth in sentencing" reform that increased sentences for violent offenders and mandated that felony offenders would serve at least 85% of their sentence. Concerned that this reform led to over-crowded jails and prisons, the Virginia legislature adopted risk assessment as a release valve. They asked the Virginia Sentencing Commission to design a risk assessment tool that could identify 25% of the lowest risk nonviolent offenders for diversion from jail or prison.

Kentucky and Virginia thus provide interesting case studies in how risk-assessment-based criminal justice reform affects incarceration rates. While each jurisdiction will have its own unique experience, evidence from Kentucky and Virginia shows that risk-assessment-based reform may not have the intended effect.

Kentucky

Pretrial risk assessment tools had long been available in Kentucky, but they were not consistently used. In 2011, Kentucky adopted a law making the risk assessment a mandatory part of determining bail and pretrial release.7 Furthermore, they adopted explicit recommendations for how the risk evaluation should influence pretrial custody. Kentucky statute declared that the presumptive default for all defendants ranked low and moderate risk would be immediate release without secured financial bond.8 However, the final decision was left up to the judges.

If Kentucky judges had followed the recommendations associated with the risk assessment in all cases, the pretrial release rate among low and moderate risk defendants would have jumped up by 37 percentage points after risk assessment was made mandatory. Instead, the pretrial release rate for low and moderate risk defendants increased by only four percentage points, from 63% to 67%. Furthermore, this change was not permanent. Within a couple of years the pretrial release rate had fallen even lower than it was before the reform.9

These results can be seen in Figure 1, which plots the average pretrial release rate on the vertical axis against the date of arrest on the horizontal axis. The reform bill was passed in February of 2011, at the dashed vertical line, and implemented in June of 2011, at the solid vertical line.

Figure 1. Kentucky's Pretrial Release Rate Before and After Reform
[Chart: Pretrial Release Rate (0 to 1.0) by Booking Date, 2010-2015, with vertical lines marking when risk assessment was made mandatory.]

Kentucky's judges vary widely in the extent to which they follow the recommendations associated with the risk assessment. Some judges grant non-financial release (release without secured monetary bond) for more than 70% of low and moderate risk defendants. Others virtually never grant non-financial release. This variation can be seen in Figure 2. Each spike in the figure shows the average non-financial release rate for a particular judge. The judges are ordered from the most strict on the left to the most lenient on the right. The median judge, represented by the red vertical line, grants non-financial release to only 37% of low and moderate risk defendants. In other words, the median judge overrules the presumptive default about 2/3 of the time.

Figure 2: Wide Variance Across Kentucky Judges in the Fraction of Low/Moderate Risk Defendants who are Granted Non-Financial Release
[Chart: per-judge non-financial release rates (x-axis: Non-Financial Release Rate, 0 to 0.8), with the median judge marked.]

VIRGINIA

In 2002, Virginia adopted a risk assessment tool with the goal of identifying the 25% lowest-risk non-violent offenders for diversion from prison or jail.10 In this context, diversion can mean one of two things. For defendants who were recommended to be placed in prison by the sentence guidelines, diversion means either probation, community sanctions, or a shorter jail sentence. For defendants who were recommended to be placed in jail by the sentence guidelines, diversion means being placed on probation or community sanctions instead.

Figure 3: No Change in the Diversion Rate for Virginia's Risk-Assessment Eligible Offenders as a Result of Risk Assessment
[Chart: Diversion Rate (0 to 1.0) by Fiscal Year, 2000-2015, with a vertical line marking "Risk assessment adopted statewide in FY 2003."]

Use of the nonviolent risk assessment tool was limited to defendants convicted of a drug, larceny or fraud offense. A prior or concurrent violent conviction rendered them ineligible for risk assessment. Furthermore, since the tool was designed for diversion, it was only used on defendants who were recommended for prison or jail by the sentence guidelines. The incarceration rate also remained virtually unchanged, increasing just slightly from 81% to 82% (Figure 4).12

Figure 4. No Change in the Incarceration Rate Among Risk-Assessment-Eligible Cases After Risk Assessment is Adopted Statewide
[Chart: Incarceration Rate (0 to 1.0) by Fiscal Year, 2000-2015, with a vertical line marking "Risk assessment adopted statewide in FY 2003."]

As in Kentucky, Virginia judges vary widely in their propensity to follow the recommendations associated with the risk assessment. This is seen in Figure 5, in which each vertical spike shows the fraction of low-risk offenders who were granted diversion by a particular judge. The strictest judges are to the left of the graph; they divert only around 20% of low-risk offenders. The more lenient judges, shown on the right of the graph, divert almost 70%. The median judge diverts only about 40% of those who were recommended for diversion by the risk assessment instrument. (Some judges may prefer to be more lenient but feel that their jurisdiction lacks sufficient alternatives to incarceration.13 However diversion rates remain low even in many of Virginia's most populous counties.)

Figure 5. Wide Variance in Virginia Judges’ Propensity to Divert Low-Risk, Non-Violent Offenders from Jail or Prison.

[Chart: per-judge diversion rates for low-risk offenders (y-axis: Diversion Rate, roughly 0.2 to 0.7), ordered from strictest to most lenient, with the median judge marked.]

There is also significant cross-county variation in adherence to the risk assessment. This is shown in Figure 6, which shows the per-county diversion rate for low-risk defendants in Virginia. There is no consistent regional pattern, nor does this diversion rate correlate strongly with population density.14

Figure 6. Diversion Rate for Low-Risk Defendants by County

[Map legend: per-county diversion rates binned as .15 – .26, .27 – .37, .38 – .49, .5 – .59, and .6 – .72; counties with too few observations are marked separately.]

Discussion

This evidence from Kentucky and Virginia contributes to a growing literature on judicial decision making. In this section we review the prior literature and discuss how it might inform our empirical findings.

Judges come to the bench with their own preferences about sentencing, ranging from lenient to harsh in particular types of cases or with particular types of defendants. This variation is natural but undesirable in a system where we hope the law provides an objective standard and that a defendant's outcome should not depend on which judge hears his case. While they have their own preferences, judges also respond to incentives. Judges know that some mistakes are costlier than others: It is commonly believed that voters will punish a judge for releasing someone who then commits a crime, but may not even notice when they lock up someone who is not a threat. As long as they are accountable to voters who behave in this way, judges will over-incarcerate – especially in cases where there is substantial media coverage.15 In addition, judges' sentencing behavior changes at different points in the election cycle. In particular, they tend to opt for harsher sentences when they are nearing reelection, to satisfy an electorate that (surveys suggest) tends to favor harsher sentences.16 This variation in behavior over time is problematic because it means defendants' outcomes depend on when their case is heard. However, it shows that voters can influence judges' sentencing decisions when they pay attention. Community oversight will be more effective – and more fair to defendants – if it occurs throughout a judge's term, not only during election seasons.


Legislatures have some tools at their disposal to shape judges' behavior. The way judges are selected to begin with is important. For instance, nonpartisan elected judges and judges selected by merit commissions write higher-quality judicial opinions than partisan elected judges, as measured by future citations of those opinions.17 Giving commission-selected judges stronger tenure (that is, renewing their terms via uncontested retention elections) improves the quality of their opinions even more, by reducing the political pressure they face while on the bench.18

Legislatively-imposed constraints on judicial discretion – such as sentencing guidelines – can also limit judges' responses to perceived political pressure.19 There is only so much harsher a judge can make sentences in the run-up to an election when there is an externally-determined cap on sentence length. Similarly, mandating the use of algorithmic risk assessment scores could theoretically mitigate the worst effects of political pressure by giving judges political cover to opt for more lenient sentences when that is their preference and offenders are rated low-risk. This may be particularly beneficial when the public or media exhibits bias based on race, socioeconomic status, or gender.

As we show here, however, legislative approaches are not always effective. Both Kentucky and Virginia adopted algorithmic risk assessments with the stated goal of reducing incarceration for low-risk offenders, but incarceration rates did not fall substantially in either state. There is good reason to allow judicial discretion to override the recommendation of one-size-fits-all risk scores, sentencing guidelines, and so on. But the existence of judicial discretion means that voters cannot assume that legislative fixes will achieve their stated goals.

That's not to say that legislative approaches like the adoption of risk assessment won't ever result in lower incarceration rates. New Jersey adopted a comprehensive bail reform bill that included the implementation of risk assessment and jail populations plummeted. However, there are noteworthy differences between New Jersey and the states we analyze here. Among them is the fact that judges in New Jersey are appointed by the governor, and both Governor Christie and Governor Murphy have been champions of criminal justice reform. Judges face a different incentive structure when they are appointed by someone who is supportive of reform, as opposed to gaining and retaining their jobs in a general election (as is the case in Kentucky) or via a legislature that likely is not as tightly focused on carrying through an agenda (as in Virginia).

Empirical evidence shows that voters have tremendous power to shape judges' decisions, by paying attention to their behavior on the bench and voting based on whether they like what they see. But it also shows that voters must play a continuous and active role if they want to build a criminal justice system that serves the best interests of their communities. Ultimately, policymakers interested in a less punitive and more consistent criminal justice system may also need to consider reforming how judges are selected and kept in their jobs. So long as judges are subject to politically-driven methods such as elections or legislative appointment and retention, strong incentives will remain in place to be perceived as "tough on crime." Until judges are freed from these political pressures, expecting them to adhere to decarceral reform initiatives may be little more than wishful thinking.

ATTRIBUTION

ABOUT THE AUTHORS

MEGAN T. STEVENSON Assistant Professor of Law at George Mason University

Megan T. Stevenson is an economist and legal scholar who is currently an Assistant Professor of Law at George Mason University. Her research uses econometric techniques to evaluate criminal law and policy in areas such as bail, pretrial detention, juvenile justice, misdemeanors, and risk assessment. Her studies have been published in top journals in both economics and law, such as the Stanford Law Review and the Review of Economics and Statistics. Her research has been funded by the National Science Foundation, the Russell Sage Foundation, and the Laura and John Arnold Foundation. Prior to joining the law faculty at George Mason, Professor Stevenson was a fellow at the Quattrone Center for the Fair Administration of Justice at the University of Pennsylvania Law School (2015-2017). She holds a BA in Interdisciplinary Studies (2009, with highest distinction) and a PhD in Agricultural and Resource Economics (2016), both from the University of California, Berkeley. She can be found on Twitter at @MeganTStevenson.

JENNIFER L. DOLEAC Associate Professor of Economics at Texas A&M University, and Director of the Justice Tech Lab

Jennifer L. Doleac is an Associate Professor of Economics at Texas A&M University, and Director of the Justice Tech Lab. She is also a Nonresident Fellow in Economic Studies at the Brookings Institution. Professor Doleac studies the economics of crime and discrimination, with a particular focus on the effects of technology on public safety. Past and current work addresses topics such as DNA databases, gun violence, and prisoner reentry. Her research has been supported by several governmental and philanthropic organizations, and published in leading academic journals including the Review of Economics and Statistics, the American Economic Journal: Applied Economics, and the Economic Journal. Professor Doleac’s work has also been highlighted in an array of media outlets, including the Washington Post, the Wall Street Journal, The Atlantic, The Guardian, and Time. She holds a Ph.D. in Economics from Stanford University, and a B.A. in Mathematics and Economics from Williams College. She tweets about criminal justice research and policy at @jenniferdoleac.

The American Constitution Society works to advance our democracy by promoting the law as a force to improve the lives of all people. Progress requires both ideas and the people to carry them through positions of leadership to be realized in legal precedent, policy, and practice. ACS works to shape the future of our democracy through building a pipeline and supporting networks of progressive legal minds, shaping the debate on the issues that dominate our national discussion, and providing a platform for progressive voices to advance the law as a force that protects fundamental rights.

Sources

1 A pretrial risk assessment tool designed by the Laura and John Arnold Foundation is used in over 40 jurisdictions, including three entire states. Pretrial Justice, Laura and John Arnold Foundation, http://www.arnoldfoundation.org/initiative/criminal-justice/pretrial-justice/ (last visited Oct. 4, 2018). Pretrial risk assessment is also used statewide in Indiana and Ohio. Indiana Justice Model-Indiana Risk Factor Assessment & Case Plan Component, Ind. Dep't of Corr., https://www.in.gov/idoc/2900.htm (last visited Oct. 4, 2018); Ohio Risk Assessment System, Ohio Dep't of Rehab. & Corr., http://drc.ohio.gov/oras (last visited Oct. 4, 2018). The newly adopted Senate Bill 10 will implement pretrial risk assessment statewide in California. Alexei Koseff, Jerry Brown Signs Bill Eliminating Money Bail in California, Sacramento Bee (Aug. 29, 2018), https://www.sacbee.com/news/politics-government/capitol-alert/article217461380.html.

2 Risk assessment is used in sentencing statewide in the following states: Alabama. Ala. Code § 12-25-33(6); Arizona, Indiana, Kentucky, Michigan, Missouri, Ohio, Oklahoma, Utah, Virginia, Washington and West Virginia. See Sonja B. Starr, Evidence-Based Sentencing and the Scientific Rationalization of Discrimination, 66 Stan. L. Rev. 803 (2014), available at http://www.stanfordlawreview.org/wp-content/uploads/sites/3/2014/04/66_Stan_L_Rev_803-Starr.pdf; California, Florida, and Wisconsin. See Danielle Kehl, Priscilla Guo & Samuel Kessler, Harvard Law Sch., Algorithms in the Criminal Justice System: Assessing the Use of Risk Assessments in Sentencing (2017), available at https://dash. harvard.edu/bitstream/handle/1/33746041/201707_responsivecommunities_2.pdf?sequence=1&isAllowed=y; Colorado. Colo. Rev. Stat. Ann. § 16-11-102 (1)(b)(II); Hawaii and Illinois. See Tammy Howell, LSI-R, LS/RNR and LS/ CMI Documentation, Pub. Safety Div., Reduce Recidivism, Reduce Unnecessary Spending, Increase Public Safety, available at https://www.scstatehouse.gov/Archives/CitizensInterestPage/SentencingReformCommission/Miscellaneous/ exhibittoDanford100809presentationsLSIdocumentation.doc; Idaho, Nebraska, Oregon. See Jennifer K. Elek, Roger K. Warren & Ramela M. Casey, Nat’l Ctr. for State Courts, Using Risk and Needs Assessment Information at Sentencing: Observations from Ten Jurisdictions, available at https://www.ncsc.org/~/media/Microsites/Files/CSI/RNA%202015/ Final%20PEW%20Report%20updated%2010-5-15.ashx; Iowa. See Presentation to the Iowa Board of Corrections: Risk Assessments in Presentence Investigations (2011), available at http://justicereformconsortium.org/wpcontent/ uploads/2011/11/BOC_LSIR1.pdf; Kansas. Kan. S. Ct. R. 110B, available at http://www.kscourts.org/rules/District_Rules/ Rule%20110B.pdf; New York. See N.y. State, Div. of Criminal Justice Servs., New York Correctional Offender Management Profiling for Alternative Sanctions (NYCOMPAS) Risk and Needs Assessment Instrument (2015), available at http://www. criminaljustice.ny.gov/opca/pdfs/2015-5-NYCOMPAS-Guidance-August-4-2015.pdf; North Dakota. See N. D. Dep’t of Corr. And Rehab., 2011-2013 Biennial Report (2013), available at https://docr.nd.gov/sites/www/files/documents/ Biennial%20Report%20Archive/Biannual%20Report%202011-2013.pdf; Pennsylvania. See Ryan S. Meyers, Pa. Comm’n on Sentencing, Introducing Risk Assessment at Sentencing in Pennsylvania (2018), available at https://www.pccd.pa.gov/training/ Documents/Conferences%20and%20Training/Sentence%20Risk%20Assessment.pdf); Tennessee. Tenn. Code Ann. § 41- 1-412; Vermont. See State Of Vt., Agency of Human Servs., Dep’t of Corr., Pre-Sentence Investigation (PSI) Reports (2011), available at http://doc.vermont.gov/about/policies/rpd/correctional-services-301-550/335-350-district-offices-general/ copy_of_342-01-pre-sentence-investigation-psi-reports). Risk assessment is used at sentencing in at least one county in the following states: Arkansas. See Roger K. Warren, Ctr. For Sentencing Initiatives, State Judicial Branch Leadership In Sentencing and Corrections Reforms (2013), available at https://www.ncsc.org/~/media/Microsites/Files/CSI/State%20 Judicial%20Branch%20Leadership%20Brief%20csi.ashx; Louisiana. La. Rev. Stat. Ann. § 15:326(A); Maine and Texas. See Starr supra; Elek, Warren, & Casey supra; Minnesota. See Minn. 
Dep’t of Corr., Study of Evidence-Based Practices in Minnesota: 2011 Report to the Legislature (2011), available at https://mn.gov/doc/assets/12-10EBPreport_tcm1089-271698. pdf; New Mexico. See Bernalillo County, State of N.m., New Mexico 2nd Judicial District Criminal Justice Strategic Plan (2012) available at https://www.bernco.gov/uploads/FileLinks/33e0766212d24ba7a61e6d4557817134/Bernalillo_County_ Criminal_Justice_Strategic_Plan.pdf; North Carolina. See Howell supra.

3 Ky. Const. § 119.

4 Va. Const. art. VI, § 7.

5 See Megan Stevenson, Assessing Risk Assessment in Action, 103 Minn. L. Rev. (forthcoming 2018), available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3016088, for more information on Kentucky's use of pretrial risk assessment.

6 See Brian J. Ostrom & Neil B. Kauder, The Evolution of Offender Risk Assessment in Virginia, 25 Fed. Sent’g Rep. 161 (2013), for more information on Virginia’s use of risk assessment at sentencing.

7 Public Safety and Offender Accountability Act, H.B. 463, 2011 Gen. Assemb., Reg. Sess. (Ky. 2011).

8 Ky. Rev. Stat. Ann. § 431.066(2) (instructing judges to consider the risk assessment when considering release and bail); Ky. Rev. Stat. Ann. § 431.066(3) (instructing release on unsecured bond or own recognizance for low risk defendants); Ky. Rev. Stat. Ann. § 431.066(4) (instructing release on unsecured bond or own recognizance for moderate risk defendants with possible supervision, monitoring or other conditions of release).

9 The data used for this analysis was provided by the Kentucky Administrative Office of the Courts. See Stevenson, supra note 5, for more detail.

10 See Ostrom & Kauder, supra note 6.

11 Judicial circuits that had participated in Virginia's risk assessment pilot – and therefore had access to a nonviolent risk assessment prior to fiscal year 2003 – were not included in Figure 3 or Figure 4. Nor were they included in the diversion/incarceration statistics cited above.

12 The data used for this analysis was provided by the Virginia Sentencing Commission.

13 Brandon Garrett & John Monahan, Judging Risk, 2018 U. Va. Sch. L. Pub. L. & Legal Theory Res. Paper 44 (2018), available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3190403.

14 County identifiers are not available in the Kentucky data.

15 See Claire S.H. Lim, James M. Snyder Jr. & David Strömberg, The Judge, the Politician, and the Press: Newspaper Coverage and Criminal Sentencing Across Electoral Systems, 7 Am. Econ. J.: Applied Econ. 103 (2015).

16 See Herbert M. Kritzer, Impact of Judicial Elections on Judicial Decisions, 12 Ann. Rev. of L. and Soc. Science 353 (2016).

17 See Elliott Ash & W. Bentley MacLeod, The Performance of Elected Officials: Evidence from State Supreme Courts (Nat’l Bureau of Econ. Research, Working Paper No. 22071, 2016).

18 Id.

19 See Carlos Berdejo and Noam Yuchtman, Crime, Punishment, and Politics: An Analysis of Political Cycles in Criminal Sentencing, 95 Rev. of Econ. and Stat. 741 (2013).


DANGER AHEAD: RISK ASSESSMENT AND THE FUTURE OF BAIL REFORM

John Logan Koepke & David G. Robinson*

Abstract

In the last five years, legislators in all fifty states have made changes to their pretrial justice systems. Reform efforts aim to shrink jails by incarcerating fewer people—particularly poor, low-risk defendants and racial minorities. Many jurisdictions are embracing pretrial risk assessment instruments—statistical tools that use historical data to forecast which defendants can safely be released—as a centerpiece of reform. Now, many are questioning the extent to which pretrial risk assessment instruments actually serve reform goals. Existing scholarship and debate centers on how the instruments themselves may reinforce racial disparities and on how their opaque algorithms may frustrate due process interests. This Article highlights three underlying challenges that have yet to receive the attention they require. First, today's risk assessment tools lead to what we term "zombie predictions." That is, predictive models trained on data from older bail regimes are blind to the risk-reducing benefits of recent bail reforms. This may cause predictions that systematically overestimate risk. Second, "decision-making frameworks" that mediate the court system's use of risk estimates embody crucial moral judgments, yet currently escape appropriate public scrutiny. Third, in the long-term, these tools risk giving an imprimatur of scientific objectivity to ill-defined concepts of "dangerousness," may entrench the Supreme Court's historically recent blessing of preventive detention for dangerousness, and could pave the way for an increase in preventive detention. Pretrial risk assessment instruments, as they are currently built and used, cannot safely be assumed to support reformist goals of reducing incarceration and addressing racial and poverty-based inequities. This Article contends that system stakeholders who share those goals are best off focusing their reformist energies on other steps that can more directly promote decarceral changes and greater equity in pretrial justice. Where pretrial risk assessments remain in use, this Article proposes two vital steps that should be seen as minimally necessary to address the challenges surfaced. First, where they choose to embrace risk assessment, jurisdictions must carefully define what they wish to predict, gather and use local, recent data, and continuously update and calibrate any model on which they choose to rely, investing in a robust data infrastructure where necessary to meet these goals. Second, instruments and frameworks must be subject to strong, inclusive governance.

* Respectively, Senior Policy Analyst, Upturn; Managing Director and co-founder, Upturn, and Visiting Scientist, AI Policy and Practice Initiative, Cornell University College of Computing and Information Science. The authors are grateful to those who provided feedback and guidance on drafts of this Article, including Alvaro Bedoya, Robert Brauneis, Julie Cohen, Rebecca Crootof, Malcom Feeley, Brandon L. Garrett, William Isaac, John Monahan, Janet Haven, Daniel J. Hemel, Sarah Lustbader, Sandra Mayson, Paul Ohm, Andrew Selbst, Megan T. Stevenson, Annie J. Wang, and Rebecca Wexler; to participants in workshops at Georgetown University Law Center, the Information Society Project at Yale Law School, and the Data & Society Research Institute; and to many others who have helped shape and continue to stimulate our thinking, including Mark Houldin, Alec Karakatsanis, Kristian Lum, Hannah Sassaman, and, of course, all of our colleagues at Upturn. We are solely responsible for any remaining errors. Upturn receives substantial support from the Ford, MacArthur, and Open Society Foundations, as well as Luminate, part of The Omidyar Group. Both authors are co-directors, with John Monahan and Kristian Lum, of the MacArthur Foundation’s Pretrial Risk Management Project. The views expressed here are entirely our own.





INTRODUCTION ...... 1727
A. The Story of Springfield ...... 1727
B. Bail, Reform, and Risk Assessment ...... 1729
I. AMERICA'S CONTESTED APPROACH TO BAIL ...... 1731
A. The Origins of American Bail ...... 1733
B. The Civil Rights Era: Fighting to End Poverty Jailing ...... 1735
C. The 1970s and 1980s: Reversing Course to Address "Dangerousness" ...... 1737
II. BAIL IN PRACTICE TODAY ...... 1743
A. Motivations for Reform ...... 1745
B. The Shape of Current Reforms: Away from Money, Toward Risk ...... 1746
III. THE CHALLENGES OF PRETRIAL RISK ASSESSMENT ...... 1750
A. How Pretrial Risk Assessment Works ...... 1752
B. Zombie Predictions ...... 1755
1. Today's Predictions Follow Yesterday's Patterns ...... 1756
2. Expanded Pretrial Services Will Change the True Risk of Failure to Appear ...... 1765
3. Replacing or Cabining Money Bail Could Reduce the True Risk of Rearrest for Those Released ...... 1769
4. Zombie Predictions May Dampen the Positive Effects of Other Reforms ...... 1771
C. Frameworks of Moral Judgment ...... 1776
1. Undemocratic Justice ...... 1778
D. Longer-term Dangers ...... 1780
1. Insulating Nebulous Concepts of "Dangerousness" From Scrutiny ...... 1781
2. The Expansion of Preventive Detention ...... 1782
3. Conceding Salerno ...... 1785
IV. ADDRESSING THE CHALLENGES ...... 1792
A. Pretrial Reform without Risk Assessment ...... 1792
B. Where Risk Assessment Is Used: Relevant, Timely Data ...... 1793
1. Risk Assessment Tools Should Always Rely on Recent, Local Data ...... 1793
2. Regularly Compare Predictions Against Outcomes ...... 1795
3. Focus on the Risks that Matter Most ...... 1797
C. Where Risk Assessment Is Used: Strong, Inclusive Governance ...... 1800
1. Risk Assessments and Frameworks Should Not Recommend Detention ...... 1800



2. Risk Assessments and Frameworks Must Be Public ...... 1801
3. Community Oversight of the Tools and Frameworks Is Essential ...... 1805
CONCLUSION ...... 1806

INTRODUCTION

A. The Story of Springfield

Springfield County1 has an open secret: It keeps many low-income people in jail who could safely be released. Most of these people are people of color. Two years ago, policymakers in Springfield joined a national trend, and approved a suite of reforms that they hoped would dramatically reduce the jail population. Their goal was to shrink the jail by detaining only those few defendants who were too risky to safely release pending trial, while sparing most of the large, low-income accused population from the life-altering consequences of even a short jail stay.

As a central focus of its reforms, Springfield adopted a popular "pretrial risk assessment tool"—a statistical tool that uses historical data to forecast and advise judges which defendants can be safely released at arraignment.2 County leaders and local advocates believed that the tool itself, by providing objective information about the low risk posed by most defendants, would lead judges to release more defendants, reduce existing racial disparities in the pretrial jail population, and reduce rearrests and missed court dates among those released. Buoyed partly by their optimism about the new digital tool, policymakers also approved several other changes that strengthened alternatives to pretrial incarceration, including drug counseling, text message reminders of upcoming court dates, and ankle-worn GPS monitors that could be mandated in lieu of incarceration.

Two years after reform was introduced, the situation remains disappointing. Springfield's average pretrial jail population has declined slightly, but the sweeping change supporters first imagined has not arrived. Moreover, the number of defendants remanded to custody with no offer of bail actually increased. And the number of individuals with non-financial conditions of release, especially GPS monitoring, has

1. An imaginary—but otherwise typical—U.S. jurisdiction. 2. How pretrial risk assessment tools are developed and work in practice is detailed in section III.A.


skyrocketed. Rearrest rates remain the same, while failure to appear rates have risen slightly. And judges reject the risk assessment tool’s recommendations in nearly half of cases—nearly always in a more punitive direction relative to what the tool recommended.3

In the wake of what was supposed to be landmark reform, local leaders are left asking: What happened? Several answers are apparent.

First, Springfield’s pretrial risk assessment tool has a numbers problem. The tool’s predictions are based on historic patterns in the combined data of many other jurisdictions—places that often had higher crime rates than Springfield and that offered little-to-nothing in the way of pretrial services during the periods when the data was generated. The tool’s predictions are neither tailored to Springfield nor updated to reflect the impact of Springfield’s latest reforms. Taken together, these discrepancies have the perverse effect not only of making the accused in Springfield appear more likely to be rearrested or fail to appear in court than they truly are, but also of continuing patterns of unneeded jailing and encouraging overuse of restrictive conditions, like new GPS monitors.

Second, there is a moral question at the heart of every risk assessment tool—namely, how to balance various risks with the liberty and due process interests of the accused—and Springfield’s leaders largely ignored that question. The “decision-making framework,” which converts raw risk assessment scores into proposed conditions of release, was treated as an afterthought, and in the end, the framework recommended far more defendants for burdensome conditions or detention than the reform’s architects anticipated. As a result, the new tool’s recommendations, even if strictly adhered to by judges, would not have done much to help Springfield achieve its stated goal of decarceration.

Third, the pretrial risk assessment tool was opaque, with vital information unavailable. Defendants, lawyers, and judges were not able to understand what factors led an individual to be forecast as a high risk of rearrest, nor could they learn precisely what data had been used, and in what way, to create the tool. Even if they had wanted to, Springfield officials would not have been able to update the system’s underlying model with new pretrial release data.

3. Throughout the Article, we use the term “judges” for the actors responsible for making pretrial decisions. Across the country, this title varies. We use “judge” for simplicity’s sake.
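The “numbers problem” described above (a score built from old, pooled data that keeps reproducing yesterday’s failure-to-appear rates) can be made concrete with a short sketch. The figures, grouping, and variable names below are hypothetical and are not drawn from any real tool, dataset, or jurisdiction; the sketch only illustrates the mechanism.

```python
# Illustrative sketch only: hypothetical numbers, not taken from any real tool,
# dataset, or jurisdiction. It shows the mechanism described above: a score built
# from old, pooled data keeps predicting yesterday's failure-to-appear (FTA) rates
# even after local conditions change.

# "Training" data: pooled historical records from other jurisdictions, collected
# before reminders and other pretrial services existed. Groups are keyed by a
# single, simplified risk factor (number of prior missed court dates).
old_pooled_records = {
    0: {"n": 10_000, "fta": 1_500},  # 15% FTA among defendants with no prior misses
    1: {"n": 4_000, "fta": 1_200},   # 30%
    2: {"n": 2_000, "fta": 900},     # 45%
}

# The "tool": predicted FTA risk is simply the historical frequency for the group.
predicted_risk = {k: v["fta"] / v["n"] for k, v in old_pooled_records.items()}

# What the local jurisdiction actually observes today, after court-date reminders
# and other services cut missed appearances roughly in half (hypothetical figures).
local_rate_today = {0: 0.07, 1: 0.16, 2: 0.22}

for priors, pred in predicted_risk.items():
    actual = local_rate_today[priors]
    print(f"prior misses={priors}: tool predicts {pred:.0%} FTA, "
          f"current local rate {actual:.0%} (overstated by {pred - actual:+.0%})")
```

Run as written, the sketch prints a predicted rate for each group that is roughly double the assumed current local rate, which is the pattern the Article later describes as a “zombie prediction.”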


B. Bail, Reform, and Risk Assessment

The parable of Springfield is fictional but holds important lessons. For nearly a century, scholars have charted how the American system of money bail needlessly jails low-income defendants, often derailing their lives, simply because they cannot afford to pay for release.4 Today, recognizing this problem, legislators and courts in many jurisdictions are exploring a wide range of reforms that replace or cabin money bail, aiming to provide reasonable assurance of the defendant’s reappearance at trial and the protection of public safety without financial conditions. Reform steps are varied, and include drug diversion programs, GPS monitoring of people who are released, and pretrial services, such as reminding defendants of upcoming court dates.5

One popular reform, addressed in at least twenty laws in fourteen states since 2012, is the introduction of statistical risk assessment tools.6 Such tools use historical data to describe how often defendants similar to the current one failed to appear for a court date, or were rearrested pending resolution of their cases. Until recently, risk assessments were widely portrayed as progressive tools that would help shrink jails by releasing indigent, low-risk defendants who could not afford to pay money bail. Last year, the National Association of Counties even called on local officials to adopt risk assessment tools, and a cohort of prominent public defense and criminal defense groups called for “the use of validated pretrial risk assessment in all jurisdictions, as a necessary

4. See generally ARTHUR BEELEY, THE BAIL SYSTEM IN CHICAGO 160 (1927) (documenting the inequities of the bail system, and also finding that “[t]he present system . . . neither guarantees security to society nor safeguards the rights of the accused”). 5. See NAT’L CONFERENCE OF STATE LEGISLATURES, TRENDS IN PRETRIAL RELEASE: STATE LEGISLATION 1–2 (2018) [hereinafter TRENDS IN PRETRIAL RELEASE 2018], http://www.ncsl.org/portals/1/ImageLibrary/WebImages/Criminal%20Justice/pretrialEnactments_2 017_v03.pdf [https://perma.cc/CUQ5-V7ZP]. 6. AMBER WIDGERY, NAT’L CONFERENCE OF STATE LEGISLATURES. TRENDS IN PRETRIAL RELEASE: STATE LEGISLATION 1 (2015) [hereinafter TRENDS IN PRETRIAL RELEASE 2015], http://www.ncsl.org/portals/1/ImageLibrary/WebImages/Criminal%20Justice/NCSL%20pretrialTre nds_v05.pdf [https://perma.cc/P2YD-QW5W] (noting that “[f]rom 2012 to 2014, 261 new laws in 47 states addressed pretrial policy,” to show that the pace of legislation related to pretrial policy is also rapid). From 2012 to 2017, every state enacted a new pretrial policy—there were 500 new enactments, with 122 new enactments alone in forty-two states in 2015. See NAT’L CONFERENCE OF STATE LEGISLATURES, TRENDS IN PRETRIAL RELEASE: STATE LEGISLATION UPDATE 1–2 (2017) [hereinafter TRENDS IN PRETRIAL RELEASE 2017], https://comm.ncsl.org/productfiles/98120201 /NCSL-Pretrial-Trends-Update.pdf [https://perma.cc/HAN8-QX6N].


component of a fair pretrial release system.”7

This Article offers a different view. Part I describes how America’s approach to money bail arose, the injustices that American bail practices have always entailed, and the challenges and reversals earlier reform efforts faced. Part II describes today’s reform efforts, which are motivated by the enormous human and social costs that current bail regimes impose on the accused, their families, and their communities. Part III explains the current practice of pretrial risk assessment and describes three underlying challenges that have yet to receive the attention they require. First, today’s risk assessment tools will likely lead to what we term “zombie predictions,” where old data, reflecting outdated bail practices, is newly reanimated through statistical prediction that is blind to the benefits of local or recent reforms. Jurisdictions often do not measure the changing landscape of actual risks their defendants face, let alone update their forecasts of risk to reflect that changing landscape. This Article argues that these issues lead many instruments to systematically overestimate risk, and it reviews early empirical evidence that suggests overestimation may already be occurring. Second, the “decision-making frameworks” that mediate the court system’s understanding and use of risk estimates embody crucial moral judgments and shape the impact of risk assessment tools on incarceration levels, yet currently escape broad public input and scrutiny. Harsh frameworks may undercut the apparent impact of other pretrial reforms, particularly when combined with exaggerated estimates of risk. Third, the embrace of pretrial risk assessment instruments exposes longer-term doctrinal and policy risks for advocates of bail reform. Specifically, these new tools risk giving an imprimatur of scientific objectivity to ill-defined concepts of “dangerousness,” and may entrench the Supreme Court’s recent blessing of preventive detention for dangerousness—which, in turn, could pave the way for a possible increase in preventive detention. Part IV proposes two vital steps that are minimally necessary to address these core challenges. First, where they choose to embrace risk assessment, jurisdictions must rely on local, recent data; continuously update and calibrate any model on which they choose to rely; and carefully define what it is they wish to predict. Notably, these steps require resources: many jurisdictions will need to make new investments

7. GIDEON’S PROMISE ET AL., JOINT STATEMENT IN SUPPORT OF THE USE OF PRETRIAL RISK ASSESSMENT INSTRUMENTS 4 (2017) (emphasis added), https://www.publicdefenders.us/files/Defenders%20Statement%20on%20Pretrial%20RAI%20May %202017.pdf [https://perma.cc/7FE5-NA7R].


to create a robust data infrastructure in order to become capable of wielding prediction responsibly. Second, where such instruments are used at all, the instruments and frameworks must be subject to strong, inclusive governance. Such governance should mean at least that the tools themselves never recommend detention, that the risk assessments and frameworks are public, and that the communities most impacted by mass incarceration are directly involved in shaping the tools and frameworks.

The Article concludes that pretrial risk assessment instruments, as currently used and implemented, cannot safely be assumed to advance reformist goals of reducing incarceration and enhancing the bail system’s fairness. Early evidence remains sparse, and risk assessment instruments may yet prove themselves effective tools in the arsenal of bail reform. But, to date, they have not done so. This Article finds that, without careful design and open governance, it is more likely than not that these tools will perpetuate or worsen the very problems reform advocates hope to solve. The United States has used money bail for more than a century, and reformers have been working for nearly as long to address its ills.8 Now that risk assessment tools are becoming a widespread part of the pretrial landscape, basic justice and equity require a clear-eyed view of these tools, their limits, and how those limits can be addressed. This Article aims to contribute to that effort.
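The roadmap above distinguishes the risk estimate itself from the “decision-making framework” that converts a raw score into a recommended condition of release. A minimal sketch, using an invented 1-to-6 score scale and invented cut points (no real tool’s bands are reproduced here), shows how much that conversion table alone can matter.

```python
# Minimal sketch of a "decision-making framework": a table that converts a raw
# risk score into a recommended condition of release. Scores, bands, and cut
# points are invented for illustration and do not reproduce any actual tool.

scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]  # hypothetical raw scores (1 = lowest risk)

def recommend(score, framework):
    """Return the recommended release condition for a score under a framework."""
    for ceiling, condition in framework:
        if score <= ceiling:
            return condition
    return "detain"

# Two hypothetical frameworks laid over the same 1-6 scale.
lenient = [(3, "release on recognizance"), (5, "conditional release"), (6, "secured bond")]
strict = [(1, "release on recognizance"), (3, "conditional release"), (4, "secured bond")]

for name, framework in [("lenient", lenient), ("strict", strict)]:
    recs = [recommend(s, framework) for s in scores]
    restrictive = sum(r in ("secured bond", "detain") for r in recs)
    print(f"{name} framework: {restrictive} of {len(scores)} defendants "
          f"recommended for money bail or detention")
```

With identical scores, the stricter table recommends money bail or detention for four of the ten hypothetical defendants instead of one, which is why the Article treats the framework as a moral and policy choice rather than a technical afterthought.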

I. AMERICA’S CONTESTED APPROACH TO BAIL

A bail hearing has always involved a prediction, but what is being predicted has changed. Historically, the goal of a bail hearing was to ensure a defendant’s appearance for trial, and the question was what it would take to ensure the defendant’s reappearance in court.9 Bail hearings have since evolved to incorporate (and in many cases to center on) predictions of a criminal defendant’s dangerousness—that is, the risk that he or she will commit future crimes if released.10 The story of this turn toward dangerousness begins, improbably, with the civil rights movement and what commentators often call the “first

8. See infra Part II (detailing the history of bail and bail reform in the United States). 9. Shima Baradaran, Restoring the Presumption of Innocence, 72 OH. ST. L.J. 723, 731–34 (2011). 10. John S. Goldkamp, Danger and Detention: A Second Generation of Bail Reform, 76 J. CRIM. L. & CRIMINOLOGY 1, 2, 15–16 (1985).


formal generation” of bail reform.11 These reforms focused on eliminating inappropriate uses of pretrial detention, especially among poor defendants.12 The story ends after a second wave of bail changes, during which the Supreme Court ultimately held that the Eighth Amendment provides individuals with no absolute right to bail.13 The pendulum of policy change swung first toward more liberal release policies as a part of the civil rights movement, then reversed as conservatives in the Nixon and Reagan years reengineered pretrial practice toward a “law and order” approach.14 Ultimately, this history suggests that risk assessment tools cannot safely be presumed to be instruments of decarceration, notwithstanding the widespread public hope that they will play that role.15

11. Timothy Schnacke, The Third Generation of Bail Reform, in NAT’L CTR. FOR STATE COURTS, TRENDS IN STATE COURTS 8, 9 (Deborah W. Smith, Charles F. Campbell & Blake P. Kavanagh eds., 2017), https://www.ncsc.org/~/media/Microsites/Files/Trends%202017/Trends- 2017-Final-small.ashx [https://perma.cc/8T74-P4B5]. 12. See TIMOTHY R. SCHNACKE, MICHAEL R. JONES & CLAIRE M.B. BOOKER, PRETRIAL JUSTICE INST., THE HISTORY OF BAIL AND PRETRIAL RELEASE 10–12 (2010), https://b.3cdn.net/crjustice/2b990da76de40361b6_rzm6ii4zp.pdf [https://perma.cc/W9XW-NM3P]. 13. United States v. Salerno, 481 U.S. 739 (1987). 14. ELIZABETH HINTON, FROM THE WAR ON POVERTY TO THE WAR ON CRIME: THE MAKING OF MASS INCARCERATION IN AMERICA 134, 139 (2016) (emphasis added). Our understanding and framing is indebted to the work of historians and scholars who have critically examined the role of the Johnson Administration in, paradoxically, helping pave the way for the massive changes under the Nixon and Reagan administrations. See, e.g., id. at 14 (“By expanding the federal government’s power in the pursuit of twinned social welfare and social control goals, Johnson paradoxically paved the way for the anticrime policies of the Nixon and Ford administrations to be turned against his own antipoverty programs. Nixon merely appropriated the regressive aspects of the Johnson administration as his own . . . .”); see also Elizabeth Hinton, “A War Within Our Own Boundaries” Lyndon Johnson’s Great Society and the Rise of the Carceral State, 102 J. AM. HIST. 100, 102 (2015) (“Far from being ambivalent about crime control as a major aim of domestic policy, Johnson and his radical domestic programs laid the foundation of the carceral state, opening an entirely new plane of domestic social programs centered on crime control, surveillance, and incarceration.”). 15. This observation, of course, is not new. Writing in 1992, Malcolm M. Feeley and Jonathan Simon detailed what they termed the “new penology,” which replaced “a moral or clinical description of the individual with an actuarial language of probabilistic calculations and statistical distributions applied to populations.” Malcom M. Feeley & Jonathan Simon, The New Penology Notes on the Emerging Strategy of Corrections and Its Implications, 30 CRIMINOLOGY 449, 452, 455 (1992). In their view, “[t]he new penology is neither about punishing nor about rehabilitating individuals. It is about identifying and managing unruly groups. It is concerned with the rationality not of individual behavior or even community organization, but of managerial processes. Its goal is not to eliminate crime but to make it tolerable through systemic coordination.” Id. See also Eric Silver & Lisa L. Miller, A Cautionary Note on the Use of Actuarial Risk Assessment Tools for Social Control, 48 CRIM. & DELINQ. 138, 157 (2002) (arguing that “[i]nsufficient attention has been paid to the negative potential embodied in actuarial social control technologies that, in the name of science and safety, increasingly conceptualize the individual in terms of population aggregates . . . [and] the activities that must follow once risk is assessed”).


A. The Origins of American Bail

Historically, American criminal defendants were generally presumed to have a right to bail—the right either to be released outright until their trial, or else to obtain release subject to some judicially imposed (usually financial) condition.16 The exceptions to this general rule were capital cases, where the threat of execution was presumed likely to impel a defendant to flee the jurisdiction if released.17 In early American history, such release conditions typically took the form of a third-party “pledge”—someone known to the court who would be financially liable if the defendant failed to appear when required.18 “Bail historically served the sole purpose of returning the defendant to court for trial, not preventing her from committing additional crimes.”19 Between the late nineteenth and early twentieth centuries, the commercial bond industry superseded the personal surety system.20 Despite that watershed revolution, a bail hearing’s predictive goal remained the same: ensuring a defendant’s appearance. For example, in

16. See, e.g., 4 WILLIAM BLACKSTONE, COMMENTARIES *296 (explaining that the accused must “put in securities for his appearance, to answer the charge against him”); TIMOTHY R. SCHNACKE, NAT’L INST. OF CORR., U.S. DEP’T OF JUSTICE, FUNDAMENTALS OF BAIL: A RESOURCE GUIDE FOR PRETRIAL PRACTITIONERS AND A FRAMEWORK FOR AMERICAN PRETRIAL REFORM (2014). 17. This exception was longstanding. As William Blackstone wrote, “[f]or what is it that a man may not be induced to forfeit to save his own life?” 4 WILLIAM BLACKSTONE, COMMENTARIES *297. Notably, a substantial number of crimes in the eighteenth century were capital offenses. See, e.g., John N. Mitchell, Bail Reform and the Constitutionality of Pretrial Detention, 55 VA. L. REV. 1223, 1227 (1969) (noting that “at that time the great majority of criminal offenses involving a threat of serious physical injury or death were punishable by death under state laws”). 18. SHIMA BARADARAN BAUGHMAN, THE BAIL BOOK: A COMPREHENSIVE LOOK AT BAIL IN AMERICA’S CRIMINAL JUSTICE SYSTEM 164 (2018). 19. Baradaran, supra note 9, 731. 20. First, the personal surety system of the early United States depended upon the sufficient availability of community members able to serve as sureties. But the pace at which the United States grew diluted the important community ties that made the personal surety click. Compounding this problem was a seemingly ever-expanding western frontier, which only appeared to increase an individual’s likelihood of flight. Peggy M. Tobolowsky & James F. Quinn, Pretrial Release in the 1990s Texas Takes Another Look at Nonfinancial Release Conditions, 19 NEW ENG. J. ON CRIM. & CIV. CONFINEMENT 267, 274 n.38 (1993). Second, changes in court practice contributed to the demise of the personal surety system and “courts began eroding historic rules against profiting from bail and indemnifying sureties, slowly ushering in the commercial bail bonding business at the end of the century.” TIMOTHY R. SCHNACKE, NAT’L INST. OF CORR., U.S. DEP’T OF JUSTICE, MONEY AS A CRIMINAL JUSTICE STAKEHOLDER: THE JUDGE’S DECISION TO RELEASE OR DETAIN A DEFENDANT PRETRIAL 26 (2014) [hereinafter MONEY AS A STAKEHOLDER]. These changes led states to experiment with new ways of ensuring a defendant’s appearance and administering bail, all of which “combined to give birth to . . . the commercial money bail bond industry.” SCHNACKE ET AL., supra note 12, at 6.


Stack v. Boyle,21 the Supreme Court detailed how the commercial bail bond industry complemented the longstanding purpose of bail: The right to release before trial is conditioned upon the accused’s giving adequate assurance that he will stand trial . . . . Like the ancient practice of securing the oaths of responsible persons to stand as sureties for the accused, the modern practice of requiring a bail bond or the deposit of a sum of money subject to forfeiture serves as additional assurance of the presence of an accused.22 A bail hearing’s officially narrow focus on ensuring a defendant would reappear at trial, however, was never the whole story. Unofficially, a defendant’s predicted dangerousness has always mattered to some degree, and it “was widely acknowledged that judges deliberately set unaffordable bail amounts on pretextual flight risk grounds so that dangerous individuals would be detained until trial.”23 By setting unattainable bail amounts—a practice known as sub rosa preventive detention—judges were able to prevent defendants they believed to be dangerous from getting out of jail.24 Despite a recognition that judges sometimes did look to considerations other than flight risk in setting bail, the propriety of looking beyond flight risk was hotly contested. At the National Conference on Bail and Criminal Justice in 1964, the issue of preventive detention on account of a defendant’s perceived dangerousness was described as “[p]erhaps the most perplexing of all problems.”25 Some argued that the “jailing of persons by courts because of anticipated, but uncommitted crimes, is a concept wholly at war with the basic traditions of American justice.”26 Others argued that the “possibility of preventive detention should be a matter of discretion in cases where the welfare and safety of the public is in peril.”27 The debate would run for more than a decade and, in the end, transform bail.

21. 342 U.S. 1 (1951). 22. Id. at 4–5 (emphasis added). 23. Lauryn P. Gouldin, Disentangling Flight Risk from Dangerousness, 2016 BYU L. REV. 837, 848 (2016). 24. Sandra G. Mayson, Dangerous Defendants, 127 YALE L.J. 490, 503 (2018). 25. U.S. DEP’T OF JUSTICE & THE VERA FOUND., PROCEEDINGS AND INTERIM REPORT OF NATIONAL CONFERENCE ON BAIL AND CRIMINAL JUSTICE xxix (1965), https://www.ncjrs.gov/pdffiles1/Photocopy/355NCJRS.pdf [https://perma.cc/XN24-GQ4W]. 26. Id. at 170. 27. Id. at 165.


B. The Civil Rights Era: Fighting to End Poverty Jailing

The early 1960s bail reform effort was tied to the larger movement for civil rights. It focused on the plight of poor defendants in crowded jails.28 The commercial bail bond industry promised to assist defendants in financial distress. In reality, however, it helped create that distress, mostly by supporting widespread financial conditions and then charging steep up-front fees and collateral for their services.29 Civil-rights era reformers built their advocacy, in part, on empirical work that established two key findings: that jails were overcrowded with defendants who could not meet financial conditions of release,30 and that defendants with community ties could, in fact, be released safely—even when they could not afford to pay bail.31 These findings were not necessarily new. As early as 1927, a seminal report on Chicago’s bail system detailed how poor defendants languished in pretrial detention solely because they could not pay small bail amounts.32 Similarly, a 1954 study of bail in Philadelphia found that the “practical effect of Philadelphia[’s] methods for determining the amount of bail is to deny bail to . . . a substantial proportion of those charged with lesser crimes [and also] explain the chronic overcrowding in the untried department of the County Prison.”33 Concern over New York’s money bail system led the Vera Institute of Justice to design and implement the 1961 Manhattan Bail Project, which demonstrated that defendants with strong community ties could be released on their own recognizance without increasing rates of failure to appear.34 Even if they

28. Goldkamp, supra note 10, at 2; see also Jeffrey Fagan & Martin Guggenheim, Preventive Detention and the Judicial Prediction of Dangerousness for Juveniles A Natural Experiment, 86 J. CRIM L. & CRIMINOLOGY 415, 417 (“The first reforms, in the 1960s, were aimed principally at eliminating the unregulated use of pretrial detention, primarily among poor defendants in urban jails. Reformers were critical of the conditions of confinement in American jails, the discriminatory setting of unaffordable bail for the urban poor and the indirect use of punitive detention.”). 29. MONEY AS A STAKEHOLDER, supra note 20, at 31. 30. Compelling Appearance in Court Administration of Bail in Philadelphia, 102 U. PA. L. REV. 1031 (1954) [hereinafter Administration of Bail in Philadelphia]. 31. See EVIE LOTZE ET AL., PRETRIAL SERVS. RES. CTR., THE PRETRIAL SERVICES REFERENCE BOOK 9 (1999), https://pretrial.harriscountytx.gov/Library1/The%20Pretrial%20Services%20 Reference%20Book.pdf [https://perma.cc/D346-R7NU]. The Manhattan Bail Project found that, after three years of operation, 65% of interviewees/arrestees could be safely released pretrial with only 1% of them failing to appear for trial. Id. 32. ARTHUR BEELEY, THE BAIL SYSTEM IN CHICAGO (1927). 33. Administration of Bail in Philadelphia, supra note 30, at 1048–49. 34. SCHNACKE ET AL., supra note 12, at 10 (“The project generated national interest in bail reform, and within two years programs modeled after the Manhattan Bail Project were launched in St. Louis, Chicago, Tulsa, Washington D.C., Des Moines, and Los Angeles.”).


were not necessarily revelatory, these findings spurred reform efforts that sought to minimize cash bail and increase the use of release alternatives. In 1966 Congress responded by nearly unanimously passing the Bail Reform Act to “assure that all persons, regardless of their financial status, shall not needlessly be detained” pretrial.35 The Bail Reform Act of 1966 sought to promote release on recognizance, and minimize reliance on money bail. It established that a defendant’s financial status should not be a reason for denying their pretrial release,36 made clear that the risk of nonappearance at trial should be the only criterion considered when bail is assessed,37 and mandated that non-capital defendants be released with the least restrictive set of conditions that would ensure their appearance at trial.38 The Act also generally forbade judges from treating a defendant’s dangerousness or risk to public safety as a reason for detention.39 There were, however, three key exceptions to that rule: capital cases, cases where convicted defendants awaited sentencing, and cases where convicted defendants filed an appeal.40 These exceptions represented the

35. Bail Reform Act of 1966, Pub. L. 89-465, 80 Stat. 214 (codified as amended at 18 U.S.C. §§ 3146-3151 (2018)). 36. Id. § 2 (“The purpose of this Act is to revise the practices relating to bail to assure that all persons, regardless of their financial status, shall not needlessly be detained pending their appearance . . . .”). 37. In making the determination that an individual would be likely to appear in court, the Act allowed judges to consider a wide range of factors, including the defendant’s “family ties, employment, financial resources, character and mental condition, the length of his residence in the community, his record of convictions, and his record of appearance at court proceedings or of flight to avoid prosecution or failure to appear at court.” 18 U.S.C. § 3146(b). 38. Id. § 3146(a). 39. In United States v. Leathers, the D.C. Circuit Court of Appeals held that the structure of the 1966 Act and “its legislative history make it clear that in noncapital cases pretrial detention cannot be premised upon an assessment of danger to the public.” 412 F.2d 169, 171 (D.C. Cir. 1969). That is because “only limited consideration was given to the protection of society from crimes which might be perpetrated by persons released under the Act; in fact, Congress specifically postponed consideration of those issues relating to crimes committed by persons released pending trial.” Warren L. Miller, The Bail Reform Act of 1966 Need for Reform in 1969, 19 CATH. U. L. REV. 24, 32 (1969). Nevertheless, some of the legislative history indicates a belief that detaining a defendant because of “predicted—but as yet unconsummated—offense” was illegal. Federal Bail Procedures Hearing on S. 1357 Before the Subcomm. on Constitutional Rights and the Subcomm. on Improvements in Judicial Machinery of the S. Comm. on the Judiciary, 89th Cong. 3 (1965) (statement of Sen. Ervin, Chairman, S. Subcomm. on Constitutional Rights). 40. In the section of the Act that governed “release in capital cases or after conviction,” judges were expressly authorized to consider “danger to any other person or to the community” as a proper element in setting bail in such cases. Pub. L. 89-465, 80 Stat. 214, 216. Thus, a person accused of a capital offense and those who had been convicted of any offense and were appealing that conviction


first time in American history that a law authorized a judge to consider dangerousness as a legitimate reason to deny bail.41 By allowing consideration of future dangerousness for a limited set of defendants, the Act opened a new door. If judges could consider future dangerousness for capital defendants, why not for other defendants, too? Were the circumstances so different?42 In sanctioning such predictions for one set of defendants, the new law legitimated the broader project of judicial predictions of dangerousness.43

C. The 1970s and 1980s: Reversing Course to Address “Dangerousness”

Shortly after these reforms arrived, the political consensus shifted decisively in the opposite direction.44 Rising crime rates fed a perception that earlier efforts had focused too much on the welfare of the accused, and not enough on the welfare of the public.45 Commentators observed

or were awaiting sentencing could have their “danger to any other person or to the community” considered at a bail hearing. Id. 41. Matthew J. Hegreness, America’s Fundamental and Vanishing Right to Bail, 55 ARIZ. L. REV. 909, 958 (2013). 42. For an even more foundational examination of this line of questioning, see Mayson, supra note 24, at 497 (“One way to start thinking about what level of risk justifies restraint is to ask a related question: is the answer different for defendants than for people not accused of any crime?”). 43. John B. Howard, The Trial of Pretrial Dangerousness: Preventive Detention after United States v. Salerno, 75 VA. L. REV. 639, 645 (1989) (“Although some argued that the exception simply recognized the unique temptation of the capital defendant to flee, others justified the exception by pointing to the dangers to the community of releasing a capital defendant. This latter argument, coupled with the view that bail is a statutory and not a constitutional right, formed the foundation of the argument in favor of the constitutionality of preventive pretrial detention.” (emphasis added)). 44. See COMM. ON CAUSES AND CONSEQUENCES OF HIGH RATES OF INCARCERATION, NAT’L RESEARCH COUNCIL OF THE NAT’L ACADEMIES, THE GROWTH OF INCARCERATION IN THE UNITED STATES: EXPLORING CAUSES AND CONSEQUENCES (Jeremy Travis, Bruce Western & Steven Redburn eds., 2014) (“The unprecedented rise in incarceration rates can be attributed to an increasingly punitive political climate surrounding criminal justice policy formed in a period of rising crime and rapid social change.”). 45. The recidivism problem might have been overstated. See, e.g., J.W. LOCKE ET AL., U.S. DEP’T OF COMMERCE, NAT’L BUREAU OF STANDARDS, COMPILATION AND USE OF CRIMINAL COURT DATA IN RELATION TO PRE-TRIAL RELEASE OF DEFENDANTS: PILOT STUDY 2 (1970), https://www.gpo.gov/fdsys/pkg/GOVPUB-C13-f5b5e5adeed0d6f0c7faf442cd8ee7d4/pdf/GOVPUB-C13-f5b5e5adeed0d6f0c7faf442cd8ee7d4.pdf [https://perma.cc/8DEP-Z76R] (providing statistics from a four-week period in 1968 in Washington D.C. showing that of 712 defendants who entered the District of Columbia Criminal Justice System, 11% of those released charged with misdemeanors or felonies were subsequently re-arrested on a second charge during the release period). But this rate is not wildly out of line with today’s levels of recidivism upon release—it is actually lower than what some jurisdictions experience today. See BRIAN REAVES, U.S. DEP’T OF JUSTICE, FELONY DEFENDANTS IN LARGE URBAN COUNTIES, 2009 - STATISTICAL TABLES 15


that crimes committed by individuals released pretrial remained a significant problem,46 in spite of the 1966 reforms.47 Civil unrest during the mid-to-late 1960s played a critical role in shifting perspectives.48 Across the country, prosecutors and courts adopted ad hoc policies of preventive detention to “safeguard” the community from further unrest.49 The 1968 Kerner Commission50 even recommended that under emergency conditions like civil disorder, the judiciary should have pretrial plans and procedures that “permit separation of minor offenders from those dangerous to the community, in order that serious offenders may be detained.”51 In many ways, the

(2013), https://www.bjs.gov/content/pub/pdf/fdluc09.pdf [https://perma.cc/C29F-J93B] (showing that nationally, 16% of released defendants were rearrested). 46. Here, the perception of pretrial recidivism mattered as much as—if not more than—reality. In 1969, Alan Dershowitz noted that the “net result of bail reform [from 1966] . . . has been that more criminal defendants spend more time out on the street awaiting their trials than ever before. This has led to an increase—or at least the appearance of an increase—in the number of crimes committed by some of these defendants.” Alan M. Dershowitz, On Preventive Detention, N.Y. REV. BOOKS, Mar. 13, 1969, at 22 (emphasis added). 47. Miller, supra note 39, at 32. 48. Of course, civil unrest and disorder of the 1960s “helped foster a receptive environment for political appeals for harsher criminal justice policies and laws.” COMM. ON CAUSES AND CONSEQUENCES OF HIGH RATES OF INCARCERATION, supra note 44, at 115. 49. In fact, civil unrest during the mid-to-late 1960s played a direct role in these changes. The 12th Street Riot in Detroit, Michigan is an illustrative example. On July 23, 1967, police raided an unlicensed speakeasy, where nearly 100 people were celebrating the return of two black servicemen from Vietnam. Soon, a riot started, sparked by rumors of police brutality. The riot lasted for five days and left forty-three people dead, over 1,000 injured, and more than 7,000 arrested. In response to the disorder, Detroit’s public prosecutor stated that his office would ask for prohibitively high bonds on all those arrested “so that even though they had not been adjudged guilty, we would eliminate the danger of returning some of those who had caused the riot to the street during the time of stress.” William A. Dobrovir, Preventive Detention: The Lesson of Civil Disorders, 15 VILL. L. REV. 313, 316–17 (1970) (citing Comment, The Administration of Justice in the Wake of the Detroit Civil Disorders of July 1967, 66 MICH. L. REV. 1542, 1549–50 (1968)) (emphasis added). One Detroit judge was quoted as saying that in cases like this, “[w]e will . . . allocate an extraordinary bond. We must keep these people off the streets. We will keep them off.” Id. at 317; see also Criminal Justice in Extremis: Administration of Justice During the April 1968 Chicago Disorder, 36 U. CHI. L. REV. 455, 576 (1969). The April 1968 riots which dominated Washington D.C. also temporarily brought a new standard for determining whether or not an arrestee should be released before trial: “whether in the judge’s view he was likely to contribute to further disorder, to commit further offenses.” Dobrovir, supra, at 322 (emphasis added). 50. The Kerner Commission, also known as the National Advisory Commission on Civil Disorders, was established by President Lyndon B. Johnson to investigate the underlying causes behind the 1967 race riots in the United States. The Commission found that poverty and institutional racism drove inner-city violence and called for aggressive federal spending to advance opportunities in African-American communities. See NAT’L CRIMINAL JUSTICE REFERENCE SERV., REPORT OF THE NATIONAL ADVISORY COMMISSION ON CIVIL DISORDERS (1968), https://www.ncjrs.gov/pdffiles1/Digitization/8073NCjRS.pdf [https://perma.cc/373j-Ajy7]. 51. Id. at 17.


judicial response to the civil unrest of 1967 and 1968 not only further normalized the task of predicting dangerousness, but also normalized preventive detention more generally. Against this backdrop, President Richard M. Nixon, elected in November 1968, included in his “War on Crime” a call for “temporary pretrial detention . . . [for people whose] pretrial release presents a clear danger to the community.”52 Citing high-profile crimes committed in D.C. and other cities by defendants released pretrial,53 the Nixon administration played a key role in raising the issue’s profile.54 Of course, the focus of Nixon’s campaign and administration was never really on crime per se; race, more than anything, loomed large behind Nixon’s “War on Crime.”55 In 1969, President Nixon’s Attorney General John Mitchell made the public argument for the necessity of predictions of dangerousness and preventive detention.56 Critically, he did so by drawing upon and exploiting the logic of the Bail Reform Act of 1966. For example, Mitchell observed that the 1966 Act “specifically permits pretrial detention of defendants who are charged with capital crimes and are considered likely either to flee or pose a danger to the community.”57 Next, Mitchell pointed out that no serious constitutional objection had been lodged against the practice of predicting a capital defendant’s future dangerousness. Finally, he noted that “objections to pretrial detention of dangerous defendants on the ground that it is improper to confine those not yet convicted apply with equal force to existing pretrial detention practices—detention because of risk of flight or of dangerous capital offense defendants.”58 Above all, Mitchell argued that society had an equal right to assure that “those charged with noncapital but dangerous

52. Presidential Report, 27 CONG. Q. WKLY. REP. 212, 238 (1969) (statement by President Nixon). 53. See PRESIDENT’S COMM’N ON CRIME IN D.C., REPORT ON THE METROPOLITAN POLICE DEPARTMENT 523 (1966), https://babel.hathitrust.org/cgi/pt?id=umn.31951d02987368r;view=1 up;seq=1 (last visited Nov. 12, 2018). 54. KARL E. CAMPBELL & SAM ERVIN, LAST OF THE FOUNDING FATHERS 217 (2007). 55. While the “southern strategy” is a historically problematic term, as both Republicans and Democrats had associated blackness with criminality, Nixon’s southern strategy was unique in that it “rested on politicizing the crime issue in a racially coded manner. Effectively politicizing crime and other wedge issues—such as welfare—would require the use of a form of racial coding that did not appear on its face to be at odds with the new norms of racial equality.” See COMM. ON CAUSES AND CONSEQUENCES OF HIGH RATES OF INCARCERATION, supra note 44, at 116. 56. John N. Mitchell, Bail Reform and the Constitutionality of Pretrial Detention, 55 VA. L. REV. 1223, 1240 (1969). 57. Id. 58. Id. (emphasis added).


crimes will not expose the community to unreasonable risks of danger prior to trial” and that, in making those predictions, “due process of law requires fundamental fairness, not perfect accuracy.”59 Many objected to this line of argument,60 but Mitchell’s arguments carried the day. Though the underlying philosophies motivating policy change could not have been more different, Nixon-era reformers exploited the logic and toolkit of reforms driven by civil rights leaders for their own purposes. Soon, a second generation of bail changes swept state and federal courts, enlisting judges en masse in the work of predicting defendants’ dangerousness. The 1970 District of Columbia Court Reform and Criminal Procedure Act61 (the D.C. Act) represented the first legislative move in this second wave of reform efforts, reaching outside the context of capital cases to allow “judges to detain a defendant pretrial without setting any bail if the defendant was deemed dangerous to society.”62 The D.C. Act had an immediate impact across the country: within eight years of its enactment, almost half of all states passed legislation pointing to danger as a factor in bail decisions.63 By 1984, the number of laws passed had grown to thirty-four.64 Four states—Nebraska and Texas in 1977, Michigan in 1978, and Wisconsin in 1981—amended their state constitutions to allow the denial of bail to defendants deemed to be dangerous.65 Meanwhile, at the national level, a progression of Supreme Court cases—including Jurek v. Texas,66 Bell v. Wolfish,67

59. Id. at 1241–42. 60. The American Bar Association argued that courts could not “with any degree of tolerable accuracy predict in advance the defendants who will commit a further crime.” ABA Opposes Preventive Detention in Congressional Testimony, 1 ABA SEC. INDIVIDUAL RTS. & RESPS. NEWSL. 4 (1969). An ABA committee considering the issue expressed “serious misgivings” on the proposals, noting that the “purpose of bail is to insure the defendant’s presence at the time of trial.” William H. Erickson, The Standards of Criminal Justice in a Nutshell, 32 LA. L. REV. 369, 377 (1972). Laurence Tribe and other scholars replied in part that a scheme authorizing pretrial detention based on future danger “has all the vices inherent in a law that makes the crime fit the criminal.” Laurence H. Tribe, An Ounce of Detention Preventive Justice in the World of John Mitchell, 56 VA. L. REV. 375, 392 (1970). 61. District of Columbia Court Reform and Criminal Procedure Act of 1970, Pub. L. No. 91-358, 84 Stat. 473. 62. Shima Baradaran & Frank L. McIntyre, Predicting Violence, 90 TEX. L. REV. 497, 504 (2012). 63. Id. at 506. 64. Id. 65. Donald B. Jr. Verrilli, Note, The Eighth Amendment and the Right to Bail Historical Perspectives, 82 COLUM. L. REV. 328, 353–54 (1982). 66. 428 U.S. 262 (1976). 67. 441 U.S. 520 (1979).


Barefoot v. Estelle,68 and Schall v. Martin69—led the Court to bless predictions of dangerousness, and approve the pretrial detention of allegedly dangerous defendants, even when they did not pose a flight risk.70 The Reagan administration, with overwhelming Democratic support, pushed these changes further, relying on tactics and rhetoric similar to those of the Nixon administration.71 The eventual 1984 Bail Reform Act became

68. 463 U.S. 880 (1983). 69. 467 U.S. 253 (1984). 70. Jurek v. Texas did not involve bail or pretrial detention. 428 U.S. at 262. In upholding the Texas death penalty statute, the Supreme Court concluded that predictions of dangerousness are “an essential element in many of the decisions rendered throughout our criminal justice system. The decision whether to admit a defendant to bail, for instance, must often turn on a judge’s prediction of the defendant’s future conduct.” Id. at 275. Importantly, the Court observed that though “[i]t is, of course, not easy to predict future behavior,” the “fact that such a determination is difficult, however, does not mean that it cannot be made.” Id. at 274–75. Bell v. Wolfish considered the conditions of confinement for pretrial detainees at the Metropolitan Correctional Center (MCC) in New York City. 441 U.S. at 539. In finding that the MCC’s conditions of confinement did not infringe on a pretrial detainee’s constitutional rights, the Court held that, as one scholar puts it, “due process only requires that pretrial detainees be free from ‘punishment,’ rather than from a restraint of liberty.” Baradaran, supra note 9, at 744. The Court noted that punishment does not exist pretrial if an action is “reasonably related to a legitimate governmental objective.” Bell, 441 U.S. at 539. “[I]n addition to ensuring the detainees’ presence at trial, [other objectives] may justify [the] imposition of conditions and restrictions of pretrial detention and dispel any inference that such restrictions are intended as punishment.” Id. at 440. In Barefoot, the Court authorized contentious expert testimony about dangerousness in a capital case, further illustrating a growing openness toward predictions of future criminality. 463 U.S. at 916. Schall upheld a New York state statute that authorized the preventive detention of juvenile delinquents. 467 U.S. at 253. Echoing the sentiments of Jurek, Chief Justice Rehnquist noted that, “from a legal point of view there is nothing inherently unattainable about a prediction of future criminal conduct” and that the Court had “specifically rejected the contention . . . ‘that it is impossible to predict future behavior and that the question is so vague as to be meaningless.’” Id. at 278–79 (citation omitted). And though the Court had previously left open the question as to whether any government objective other than ensuring a pretrial detainee’s presence at trial could survive constitutional scrutiny, the Court in Schall definitively closed the door. “Preventive detention [under the statute] serves the legitimate state objective, held in common with every State in the country, of protecting both the juvenile and society from the hazards of pretrial crime.” Id. at 274. In other words, in Schall, the Court for the first time held that pretrial detention based on objective other than ensuring a defendant’s appearance at trial passed constitutional muster. 71. For example, the case for bail reform in the 1980s rested on the familiar perception that the rate of pretrial recidivism was “extremely high,” despite strong contradictory evidence. See Comprehensive Crime Control Act of 1983: Hearing on S. 829 Before the Subcomm. on Criminal Law of the S. Comm. on the Judiciary, 98th Cong. 285, 286 (1983) (statement of LeRoy S. 
Zimmerman, National Association of Attorneys General) (testifying that because “the rate of recidivism for individuals released on bail is extremely high, consideration must be given to the dangerousness of the defendant and the risk to the community should he be released on bail pending trial”); H.R. REP. NO. 98-1121, at 11 (1984); Bail Reform Act of 1981–82 Hearing on H.R. 3006, H.R. 4264, and H.R. 4362 Before the H. Subcomm. on Courts, Civil Liberties, & the Admin. of Justice of the H. Comm. on the Judiciary, 97th Cong. 87 (1981) (prepared statement of Guy Willetts, Chief of the Pretrial Services Branch, Division of Probation, Administrative Offices of the


law with broad support.72 It “made public safety a central concern in the judicial officer’s choice [among] . . . pretrial custody options.”73 For the first time at the federal level,74 judges were asked to predict the danger a defendant’s release posed to the community.75 By 1984, “dangerousness” was included as a factor in bail decisions under federal law and in nearly two-thirds of the states. But these laws suffered from a common problem: how to define “danger.” Clues from the federal law pointed to a fairly broad definition of what judges could consider. For example, the Senate Committee on the Judiciary’s report noted that “the risk that a defendant will continue to engage in [drug] trafficking constitutes a danger to the ‘safety of any other person or the community.’”76 Even the risk of non-violent crimes, such as those against property, satisfied this expansive “danger” standard. State laws also consistently failed to provide specific standards to determine whether a defendant was “dangerous.”77 Despite these deficiencies, in 1987, the Supreme Court upheld the Bail Reform Act of 1984 as constitutional, and in turn, sanctioned preventive detention and predictions of dangerousness pretrial.78 In United States v. Salerno,79 the Court held that the Eighth Amendment does not grant an individual an absolute right to bail, that the denial of bail on the basis of dangerousness does not violate the Eighth Amendment, and that pretrial detention was a regulatory act, not

United States Courts) (testifying that in a sample of ten jurisdictions, “new crimes committed by federal offenders released on bail occurred at a rate of 8.4 percent”). 72. Stuart Taylor Jr., Senate Approves an Anticrime Bill, N.Y. TIMES, Feb. 3, 1984 (“The Senate today passed a bipartisan package that supporters call the most significant Federal anticrime measure in more than a decade. The vote was 91 to 1.”). 73. Goldkamp, supra note 10, at 42. 74. See, e.g., United States v. Himler, 797 F.2d 156, 158–59 (3d. Cir. 1986) (“The 1984 Act marks a radical departure from former federal bail policy. Prior to the 1984 Act, consideration of a defendant’s dangerousness in a pretrial release decision was permitted only in capital cases . . . . Under the new statute judicial officers must now consider danger to the community in all cases in setting conditions of release.” (emphasis added)). 75. Notably, the legislative history of the 1984 law demonstrates that Congress clearly understood they were transforming the fundamental premise of bail. See, e.g., Curtis E. Karnow, Setting Bail for Public Safety, 13 BERKELEY J. CRIM. L. 1, 7 (2008). 76. See S. REP. NO. 98-225, at 103 (1983), as reprinted in 1984 U.S.C.C.A.N. 3182, 3196, 1983. 77. Goldkamp, supra note 10, at 17–18. (“Over one-third of the public safety-oriented laws provide no definition of danger. . . . In approximately half of the states with explicit references, the definition of danger is vague.”) 78. United States v. Salerno, 481 U.S. 739, 755 (1987) (“In our society, liberty is the norm, and detention prior to trial or without trial is the carefully limited exception. We hold that the provisions for pretrial detention in the Bail Reform Act of 1984 fall within that carefully limited exception.”). 79. 481 U.S. 739 (1987).


punishment.80 In just over two decades, the predictive task of a bail hearing was fundamentally transformed. By 1987, states were passing laws mandating that “public safety be the primary [predictive] consideration” at a bail hearing.81 Throughout this period, conservative-minded reformers publicly argued that their set of bail reforms would make bail more honest by eliminating sub rosa detention through high bail. But more than three decades later, “after federal and state statutes were rewritten . . . [to] permi[t] judges to order dangerous defendants to be detained, money bail is still used as a back-door means to manage dangerousness.”82 In fact, what began in the mid-1960s as an effort to reduce poverty-based detention ended in the mid-1980s with a law that led to immediate and lasting increases in pretrial detention.83 Bail reform’s pivot in the late 1960s from a focus on unnecessary pretrial detention, to a focus on widespread prediction of dangerousness, took a mere two years. Critically, this pivot relied on law and order reformers successfully subverting the logic and tools of previous liberal reform efforts. That current reform efforts bear striking similarities to those discussed here should caution liberal reformers.84

II. BAIL IN PRACTICE TODAY

Across the country, after someone is arrested, they appear at a bail hearing. Sometimes the accused appears in court via videoconference from jail. Other times, the accused, en masse, sit in a makeshift courtroom in a jail for their arraignment. Specific practices vary widely from jurisdiction to jurisdiction. Nevertheless, at bail, the law generally

80. Id. at 753–55 (rejecting “the proposition that the Eighth Amendment categorically prohibits the government from pursuing other admittedly compelling interests through regulation of pretrial release” and noting that “when Congress has mandated detention on the basis of a compelling interest other than prevention of flight, as it has here, the Eighth Amendment does not require release on bail”). 81. Karnow, supra note 75, at 8. 82. Gouldin, supra note 23, at 863. 83. The U.S. Marshals Service found that there was “a 32 percent increase in prisoner population . . . during the first year after its passage.” John Riley, Preventive Detention Use Grows—But Is It Fair?, NAT’L L.J., Mar. 24, 1986, at 1. Three years later, a General Accounting Office report found that the 1984 Bail Reform Act led to a “greater percentage of defendants remain[ing] incarcerated during their pretrial period under the new law than under the [1966] law.” U.S. GEN. ACCOUNTING OFFICE, GAO/GGD-88-6, CRIMINAL BAIL: HOW BAIL REFORM IS WORKING IN SELECTED DISTRICT COURTS 18 (1987) https://www.gao.gov/assets/150/145896.pdf [https://perma.cc/K8QE-6CV2]. 84. See infra section II.C.


asks a judge to assess the risk that a defendant will flee the jurisdiction and the risk that the defendant would pose a danger to the community if released. The judge is asked to order the least restrictive set of conditions needed to ensure the defendant appears at future court dates and does not harm the community in the meantime.85 If a judge predicts, or a bail schedule requires, that no feasible combination of conditions could adequately ensure the defendant’s appearance at trial or protect public safety, then that individual is non-bailable, or not eligible for release before trial. These defendants will be remanded to custody and will wait in jail until their trial or other resolution of their case. On the other hand, if a judge believes that the defendant will appear for trial and will not be a danger to the community if released, or believes that some set of conditions would ensure both appearance and public safety, then that individual is bailable and eligible for some form of pretrial release. Complicating matters, in some jurisdictions a judge is constrained by a bail schedule, where certain crimes merit certain statutorily determined conditions.

There are three basic approaches to release:

Release on personal recognizance (sometimes called “ROR”): The defendant promises to reappear for his or her future court dates with no judicially imposed restrictions or conditions.86

Conditional release: The defendant is released with non-monetary conditions, such as a requirement to check in with a pretrial services agency, undergo drug treatment, or wear a GPS monitoring anklet.87

Release on bond: A set financial obligation is defined that the defendant will have to pay if she fails to return to court when required. A secured bond means the defendant must pay the amount up front in order to be released from jail, while an unsecured bond means that the defendant is released without paying but will become liable for the defined amount if she fails to appear in the future.88

85. Again, practices can vary widely across jurisdictions. We only generalize the process for the sake of clarity. 86. See, e.g., WASH. SUPER. CT. CRIM. R. 3.2(a) (“Any person, other than a person charged with a capital offense, shall at the preliminary appearance or reappearance . . . be ordered released on the accused’s personal recognizance pending trial unless [one of several factors is met].”). 87. See, e.g., OR. REV. STAT § 135.230(d) (2017) (“‘Conditional release’ means a nonsecurity release which imposes regulations on the activities and associations of the defendant.”). 88. See, e.g., VA. CODE ANN. § 19.2-123(2a), (3) (2018) (“[A]ny judicial officer may impose any one or any combination of the following conditions of release: . . . Require the execution of an unsecured bond; [or] [r]equire the execution of a secure bond which at the option of the accused shall be satisfied with sufficient solvent sureties, or the deposit of cash in lieu thereof.”).


Judges can also combine an offer of conditional release with a financial bond. Even when a judge sets a condition of release, the defendant still may not be freed before trial. For example, when a secured bond amount is set, the defendant remains in jail unless and until that money is given to the authorities. Many jurisdictions abide by a “10 percent” rule, where defendants only need to post 10% of a bond to secure their release that day. Funds can come from the defendant directly, from a friend, relative or community member, from a bail fund, or from a commercial bail bondsman (more on that below). But across the country, low-income defendants struggle and are often unable to raise the necessary funds. Even though a judge has approved a path for their release, they remain in jail.
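The release categories and the “10 percent” deposit arithmetic just described can be summarized in a short sketch. The bond amount and deposit rate below are hypothetical placeholders taken from the general description above, and the sketch does not model any particular jurisdiction’s rules or bail schedule.

```python
# A small sketch, with hypothetical amounts, of the release outcomes described
# above and the "10 percent" deposit arithmetic for a secured bond. It does not
# model any particular jurisdiction's rules or bail schedule.

def deposit_required(bond_amount, secured, deposit_rate=0.10):
    """Up-front payment needed for release under a bond of the given type."""
    return bond_amount * deposit_rate if secured else 0.0

# (label, bond amount in dollars or None, whether the bond is secured)
cases = [
    ("release on recognizance", None, False),
    ("conditional release", None, False),
    ("unsecured bond", 5_000, False),
    ("secured bond", 5_000, True),
]

for label, bond, secured in cases:
    upfront = deposit_required(bond, secured) if bond else 0.0
    liability = f" (liable for ${bond:,} on failure to appear)" if bond else ""
    print(f"{label:<25} up-front cost to be released: ${upfront:,.0f}{liability}")
```

The secured-bond row is the one that matters for the discussion that follows: it is the only option that requires cash up front, so it is the only one under which a defendant can remain jailed solely for lack of funds.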

A. Motivations for Reform

Bail decisions can upend people's lives. Before a trial has begun, without any finding of guilt, a judge may nonetheless deprive the defendant of her liberty, or impose a range of other burdens, during the weeks, months, or years that may pass until guilt or innocence is finally determined. Those who are denied bail, or who are offered it on terms they cannot afford, must stay in jail until trial. While they wait, they often lose their jobs, face eviction from their homes, and otherwise watch their lives crumble. Pretrial detention plays a central role in America's globally extraordinary patterns of incarceration. On any given day in the United States, more than 400,000 individuals are detained and awaiting trial.89 The total population estimated to be in local jails nationally is up about 20% since 2000, and 95% of that growth is attributable to people awaiting trial.90 Two specific motivations deserve to be highlighted. First, the longstanding ills of money bail remain. Inability to pay bail is the primary reason why pretrial defendants stay in jail until the disposition of their cases.91 Compounding the problem, the proportion of felony

89. TODD D. MINTON & ZHEN ZENG, U.S. DEP’T OF JUSTICE, JAIL INMATES AT MIDYEAR 2014, at 3 (2015), http://www.bjs.gov/content/pub/pdf/jim14.pdf [https://perma.cc/J9P3-KCR5]. 90. Id. at 4. 91. The most recent statewide data available shows that 38% of felony defendants in the largest seventy-five counties were detained until the end of their case. Of that group, about 90% were detained because they were unable to meet the financial conditions offered for release. The percentages are essentially the same for felony defendants in state courts, too. REAVES, supra note 45; see also THOMAS COHEN & BRIAN REAVES, U.S. DEP’T OF JUSTICE, PRETRIAL RELEASE OF


defendants subject to some financial condition for their release has skyrocketed.92 Second, and relatedly, the human cost of pretrial detention is staggering. A growing body of research indicates that pretrial detention itself directly increases the probability of worse case outcomes for the defendant—meaning a guilty plea or conviction at trial.93 Further, recent research shows that pretrial detention worsens the risks that judges aim to predict. That is, pretrial detention itself leads to higher rates of pretrial rearrests, more failures to appear, and greater long-term recidivism than the same defendants would have shown if immediately released.94 This finding has significant import for our core thesis, discussed in sections III.B.2 and B.3.

B. The Shape of Current Reforms: Away from Money, Toward Risk

Today’s reform efforts mark what some call the third generation of bail reform.95 The pace of reform is rapid, and the shape of reforms is varied. A central goal of most of these efforts is to end the wealth-based system, and move pretrial justice systems toward a risk-based model.96

FELONY DEFENDANTS IN STATE COURTS 2 (2007), https://www.bjs.gov/content/pub/pdf/prfdsc.pdf [https://perma.cc/NZ9N-XACT]. 92. From 1990 to 2009, the overall percentage of felony cases involving some sort of financial condition for release rose from just over one-third to nearly two-thirds of all releases. Meanwhile, the fraction of outright releases (without conditions) declined apace. See REAVES, supra note 45. 93. KRISTIAN LUM & MIKE BAIOCCHI, THE CAUSAL IMPACT OF BAIL ON CASE OUTCOMES FOR INDIGENT DEFENDANTS 1 (July 15, 2017) (“It has long been observed that those who are detained pre-trial are more likely to be convicted, but only recently have formal causal inference methods been brought to bear on the problem of determining whether pre-trial detention causes a higher likelihood of conviction. In each case where causal inference methods were used, a statistically significant effect was found.” (citations omitted) (emphasis omitted)); id. at 4 (“We find a strong causal relationship between setting bail and the outcome of a case . . . for cases for which different judges could come to different decisions regarding whether bail should be set, setting bail results in a 34% increase in the chances that they will be found guilty.”); see also EMILY LESLIE & NOLAN G. POPE, THE UNINTENDED IMPACT OF PRETRIAL DETENTION ON CASE OUTCOMES: EVIDENCE FROM NYC ARRAIGNMENTS (2016); MEGAN STEVENSON, DISTORTION OF JUSTICE: HOW INABILITY TO PAY AFFECTS CASE OUTCOMES (2016). 94. See Paul Heaton, Sandra Mayson & Megan Stevenson, The Downstream Consequences of Misdemeanor Pretrial Detention, 69 STAN. L. REV. (2017); CHRISTOPHER T. LOWENKAMP, MARIE VANNOSTRAND & ALEXANDER HOLSINGER, LAURA AND JOHN ARNOLD FOUND., THE HIDDEN COSTS OF PRETRIAL DETENTION (2013); Arpit Gupta, Christopher Hansman & Ethan Frenchman, The Heavy Costs of High Bail: Evidence from Judge Randomization, 45 J. LEG. STUD. 1, 3 (2016). 95. Timothy R. Schnacke, Claire M.B. Brooker & Michael R. Jones, The Third Generation of Bail Reform, DEN. L. REV. ONLINE (Mar. 14, 2011), http://www.denverlawreview.org/dlr-onlinearticle/2011/3/14/the-third-generation-of-bail-reform.html [https://perma.cc/N64N-6PSY]. 96. SCHNACKE, supra note 16, at 36.



In many states, legislatures are the first movers of reform. Since 2012, over 500 bills related to pretrial justice have been enacted across all fifty states, addressing financial and non-financial conditions for release, pretrial services and supervision, diversion programs, citation in lieu of arrest, and victim support and services.97 In 2016, forty-four states enacted nearly 120 laws related to pretrial administration.98 Almost two-thirds of those states enacted some sort of law related specifically to pretrial diversion.99 Twelve states, the District of Columbia, and the federal government have enacted a statutory presumption that defendants charged with bailable offenses be released on personal recognizance or unsecured bond “unless a judicial officer makes an individual determination that the defendant poses a risk that requires more restrictive conditions or detention.”100 Six more states have done so by court rule.101 Recent legislation also directly targets money bail. In June 2017, Connecticut passed legislation that barred “cash-only” bail for certain crimes and prohibited courts from imposing a financial condition of release on defendants charged with only a misdemeanor crime.102 New Jersey’s comprehensive bail reforms took effect in January 2017, virtually eliminating cash bail across the state.103 Illinois enacted legislation in June 2017 that requires judges to use the least restrictive conditions to assure a defendant’s appearance, with a presumption that any conditions of release would be non-monetary.104 In early 2017, New

97. TRENDS IN PRETRIAL RELEASE 2017, supra note 6. 98. TRENDS IN PRETRIAL RELEASE 2018, supra note 6, at 5. 99. Id. 100. ARTHUR W. PEPIN, CONFERENCE OF STATE COURT ADM’RS, 2012-2013 POLICY PAPER EVIDENCE-BASED PRETRIAL RELEASE 3 (2012), https://cosca.ncsc.org/~/media/Microsites/Files/COSCA/Policy%20Papers/Evidence%20Based%20 Pre-Trial%20Release%20-Final.ashx [https://perma.cc/BTM3-YPYG]. 101. Id. 102. An Act Concerning Pretrial Justice Reform, 2017 Conn. Acts 716 (Reg. Sess.). Three exceptions exist to the new misdemeanor release rule: if (1) person is charged with a family violence crime, (2) if a person requests such conditions, or (3) court makes a finding on the record that there is a likely risk that the arrested person will fail to appear in court, will obstruct or attempt to obstruct justice, or threaten, injure or intimidate or attempt to threaten, injure or intimidate a prospective witness or juror, or the arrested person will engage in conduct that threatens the safety of himself or herself or another person. Id. 103. Act of Aug. 11, 2014, ch. 31, 2014 N.J. Laws 467. 104. Bail Reform Act of 2017, No. 100-1, 2017 Ill. Legis. Serv. (West). The law also allows the state supreme court to establish a pretrial risk assessment tool, but does not require it. The state’s Administrative Office of the Courts already indicated its support for pretrial risk assessment, as did the Illinois Supreme Court in a statewide policy statement. See Illinois Supreme Court Adopts Statewide Policy Statement for Pretrial Services, ILL. ST. BAR ASS’N: ILL. LAW. NOW (May 1,



Orleans’s City Council passed an ordinance eliminating cash bail for defendants charged with minor, non-violent crimes.105 A new rule promulgated by Maryland’s Court of Appeals that instructs judges to first look to non-financial conditions of release went into effect on July 1, 2017, after the legislators in the state failed to pass new legislation before their session ended.106 Atlanta’s City Council passed an ordinance eliminating a cash bond requirement for low-level offenses107; Alaska enacted new reforms that eliminate money bail for most defendants108; and multiple New York City district attorneys have ordered prosecutors not to request money bail in most cases.109 Among the most popular reforms are policies that introduce or expand pretrial services and, at the same time, either introduce or expand actuarial risk assessment. Since 2012, at least twenty laws in fourteen states either created or standardized the use of pretrial risk assessment.110 In 2014 alone, eleven laws were passed to regulate how risk assessment tools were used pretrial. Almost half of the states that passed laws relating to pretrial services between 2012 and 2014 authorized or created statewide pretrial service programs.111 Cities and counties across the country have experimented with pretrial risk assessment—some develop their own tools, while others implement or purchase another tool.112 Among policymakers, actuarial tools enjoy broad support across the political spectrum. The American Bar Association specifically recommends that judges use actuarial models in making bail

2017), https://www.isba.org/iln/2017/05/01/illinois-supreme-court-adopts-statewide-policy- statement-pretrial-services [https://perma.cc/TS5K-XM23]. 105. Jessica Williams, City Council Unanimously Passes Overhaul to Municipal Court Bail System Fewer Defendants Will Have to Post Bail, NEW ORLEANS ADVOC. (Jan. 12, 2017), http://www.theadvocate.com/new_orleans/news/courts/article_eb41d288-d90b-11e6-b99c- 4bb3e5442d1b.html [https://perma.cc/BFP9-E8NJ]. 106. Michael Dresser, Maryland Court of Appeals Defendants Can’t Be Held in Jail Because They Can’t Afford Bail, BALT. SUN (Feb. 8, 2017), http://www.baltimoresun.com/news/maryland/bs-md-bail-rule-20170207-story.html [https://perma.cc/VC79-G8LD]. 107. Rhonda Cook, Atlanta Mayor Signs New Ordinance Changing Cash Bail System in a Nod to the Needy, ATLANTA-J. CONST.: MYAJC, (Feb. 5, 2018), https://www.myajc.com/news/local/atlanta-council-oks-changes-cash-bail-system-nod-the- needy/SW50dABJAtWgBwpB4vtgBN/ [https://perma.cc/XF8M-JPMZ]. 108. Senate Bill 91 Summary of Policy Reforms, ALASKA JUST. F., Spring 2016, at 1, 2–4. 109. James C. McKinley Jr., Some Prosecutors Stop Asking for Bail in Minor Cases, N.Y. TIMES, (Jan. 9, 2018), https://www.nytimes.com/2018/01/09/nyregion/bail-prosecutors-new-york.html (last visited Nov. 12, 2018). 110. WIDGERY, supra note 6, at 1. 111. Id. at 3. 112. Samuel R. Wiseman, Fixing Bail, 84 GEO. WASH. L. REV. 417, 442 (2016).


determinations.113 A cohort of prominent public defense and criminal defense groups recently called for “the use of validated pretrial risk assessment in all jurisdictions, as a necessary component of a fair pretrial release system.”114 The National Association of Counties recently adopted a resolution calling on the U.S. Department of Justice to advise state and county governments to adopt pretrial risk assessment and eliminate commercially secured bonds.115 The Conference of State Court Administrators (COSCA)116 and the Conference of Chief Justices have both called for the use of risk assessment.117 Most notably, during the time in which this Article was being edited, California passed, and its governor signed, Senate Bill 10 into law.118 The law completely eliminates California’s money bail system, replacing it with a system based on risk assessment.119 Despite significant, looming implementation hurdles—including a proposed referendum to repeal it120—the law is scheduled to take effect October 1, 2019.121

113. ABA, ABA STANDARDS FOR CRIMINAL JUSTICE: PRETRIAL RELEASE 5–6 (3d ed. 2007) (see standard 10.1.10 discussing the role of the pretrial services agency in determining release eligibility for defendants). 114. GIDEON’S PROMISE ET AL., supra note 7, at 4. The groups involved included the American Council of Chief Defenders, Gideon’s Promise, the National Association for Public Defense, the National Association of Criminal Defense Lawyers, and the National Legal Aid and Defenders Association. See id. at 1. 115. NAT’L ASSOC. OF CTYS., ADOPTED INTERIM POLICY RESOLUTIONS 11 (2017), http://www.naco.org/ [https://perma.cc/22YU-LKM6] (explaining the policy resolution on improving the pretrial justice process). 116. PEPIN, supra note 100. 117. CONFERENCE OF CHIEF JUSTICES, ENDORSING THE CONFERENCE OF STATE COURT ADMINISTRATORS POLICY PAPER ON EVIDENCE-BASED PRETRIAL RELEASE (2013), https://ccj.ncsc.org/~/media/Microsites/Files/CCJ/Resolutions/01302013-pretrial-release-Endorsing- COSCA-Paper-EvidenceBased-Pretrial-Release.ashx [https://perma.cc/9656-249H] (adopting the policy as proposed by the CCJ/COSCA Criminal Justice Committee at the Conference of Chief Justices midyear meeting). 118. Act of Aug. 28, 2018, ch. 244, 2018 Cal. Legis. Serv. (West) (S.B. 10). 119. The California Judicial Council is statutorily tasked with “[c]ompil[ing] and maintain[ing] a list of validated pretrial risk assessment tools” that jurisdictions may use. Id. § 1320.24(e)(1). According to the law, a validated risk assessment tool is one that “shall be demonstrated by scientific research to be accurate and reliable in assessing the risk of a person failing to appear in court as required or the risk to public safety due to the commission of a new criminal offense if the person is released before adjudication of his or her current criminal offense and minimize bias.” Id. § 1320.7(k). 120. Bob Egelko, Bail Bond Companies Gathering Signatures for Referendum to Keep Them in Business, S.F. CHRONICLE (Sep. 11, 2018, 6:10 PM), https://www.sfchronicle.com/news/article/Bail-bond-companies-seek-to-block-new-law-that- 13221653.php [https://perma.cc/CZ9U-LYMK]. 121. Act of Aug. 28, 2018, ch. 244, § 1320.6, 2018 Cal. Legis. Serv. (West) (S.B. 10).



However, a growing chorus of advocates has begun to raise objections. For example, Human Rights Watch argues that pretrial risk assessment tools should be opposed “entirely.”122 Over a hundred community and advocacy groups in New York recently argued that pretrial risk assessment tools will “further exacerbate racial bias in [the] criminal justice system” and that the tools will “likely lead to increases in pretrial detention in the state.”123 And a broad range of more than 100 civil rights, social justice, and digital rights groups declared that risk assessment instruments should not be used pretrial.124 Those same groups detail six principles that any pretrial risk assessment tool must follow “in order to ameliorate the strong dangers and risks we see in the implementation of risk assessment instruments.”125 Early evidence on the impact of risk assessment is limited, but the nascent findings are troubling. A recent study of Kentucky’s bail reforms by Megan Stevenson found that a new risk assessment tool and other policy reforms “led to only a trivial increase in pretrial release” and, simultaneously, “an uptick in failures-to-appear (FTAs) and pretrial crime; a disappointing counter to hopes that all three margins could be improved simultaneously.”126

III. THE CHALLENGES OF PRETRIAL RISK ASSESSMENT

Despite pretrial risk assessment’s broad and enthusiastic adoption, there are significant reasons for caution. Some of these reasons are underappreciated in the public debate. Existing skepticism about the adoption of pretrial risk assessment tools centers on concerns of racial

122. John Raphling, Human Rights Watch Advises Against Using Profile-Based Risk Assessment in Bail Reform, HUM. RTS. WATCH (Jul. 17, 2017), https://www.hrw.org/news/2017/07/17/human- rights-watch-advises-against-using-profile-based-risk-assessment-bail-reform [https://perma.cc/V6ZH-2JLM]. 123. Letter from Over 100 Cmty. & Advocacy Grps. Across N.Y. State to Governor Cuomo (Nov. 2017), http://bds.org/wp-content/uploads/Bail-Reform-Letter.pdf [https://perma.cc/A3P5- ZJAX] (discussing bail reform in New York). 124. THE LEADERSHIP CONFERENCE ON CIVIL AND HUMAN RIGHTS, THE USE OF PRETRIAL “RISK ASSESSMENT” INSTRUMENTS: A SHARED STATEMENT OF CIVIL RIGHTS CONCERNS, (2018) [hereinafter LEADERSHIP CONFERENCE SHARED STATEMENT], http://civilrightsdocs. info/pdf/criminal-justice/Pretrial-Risk-Assessment-Full.pdf [https://perma.cc/DK2Q-8LY3]. 125. Id. at 2. 126. Megan Stevenson, Assessing Risk Assessment in Action, 103 MINN. L. REV. (forthcoming 2018) (manuscript at 5), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3016088 (last visited Nov. 12, 2018).


bias and on due process inequities, both of which are substantial concerns.127 Though these debates are incredibly important, they have contributed to a lack of discussion of a more basic tension between statistical prediction and bail reform. On the one hand, to change a broken system, policymakers enact and implement policies that work to reduce the risk of rearrest and failure to appear. On the other hand, policymakers ask statistical tools to forecast those very same risks based on data from the very system under reform.128 As a result, without the right conditions and policies, risk assessment tools will typically be blind to the helpful impact of the very changes that reformers seek to introduce. In this Part, we first describe how actuarial risk assessment tools work.

127. In particular, ProPublica’s assertion that the COMPAS risk assessment tool was “biased against blacks” stimulated much of this research. See Julia Angwin et al., Machine Bias, PROPUBLICA (May 23, 2016), https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing [https://perma.cc/2C6W-5978]. COMPAS’s creator claims that their risk assessment tool is fair because it maintains “predictive parity”—meaning that defendants with the same risk score are equally likely to reoffend. For example, in Broward County, Florida, 60% of white and 61% of black defendants assigned a risk score of seven actually reoffended. See WILLIAM DIETERICH ET AL., NORTHPOINTE INC. RESEARCH DEP’T, COMPAS RISK SCALES: DEMONSTRATING ACCURACY EQUITY AND PREDICTIVE PARITY (2016), http://go.volarisgroup.com/rs/430-MBX-989/images/ProPublica_Commentary_Final_070616.pdf [https://perma.cc/A6X6-W4D2]. However, among defendants who ultimately did not reoffend, black defendants were twice as likely as white defendants to receive a medium or high risk score. Further, white defendants who subsequently reoffended had lower average risk scores than their black counterparts. Essentially, black defendants were over-classified as risky and unnecessarily subjected to harsher scrutiny. See Julia Angwin et al., ProPublica Responds to Company’s Critique of Machine Bias Story, PROPUBLICA (July 29, 2016), https://www.propublica.org/article/propublica-responds-to-companys-critique-of-machine-bias-story [https://perma.cc/JK4Z-D6YW]. The lesson of the ProPublica piece, however, is that this result is inevitable. Where there is a divergent base rate (on average, black defendants recidivate at higher rates) and predictive tools like COMPAS must maintain predictive parity (to pass statistical muster), a predictive algorithm cannot satisfy both fairness criteria (predictive parity and equalized false positive and negative rates) simultaneously. See Jon Kleinberg et al., Inherent Trade-Offs in the Fair Determination of Risk Scores, ARXIV (Nov. 17, 2016), https://arxiv.org/pdf/1609.05807.pdf [https://perma.cc/75NX-7PL9]. COMPAS is also at the center of another debate: due process. In Loomis v. Wisconsin, 881 N.W.2d 749 (Wis. 2016), cert. denied, 582 U.S. __, 137 S. Ct. 2290 (2017) (mem.), the question presented by the petitioner, Eric Loomis, was whether or not “it [is] a violation of a defendant’s constitutional right to due process for a trial court to rely on [proprietary] risk assessment results at sentencing: (a) because the proprietary nature of COMPAS prevents a defendant from challenging the accuracy and scientific validity of the risk assessment; and (b) because COMPAS assessments take gender and race into account in formulating the risk assessment?” Petition for Writ of Certiorari at i, Loomis, 137 S. Ct. 2290 (No. 16-6387). 128. See Gil Rothschild-Elyassi, Johann Koehler & Jonathan Simon, Actuarial Justice, in THE HANDBOOK OF SOCIAL CONTROL 194–206 (Mathieu Deflem ed., 2019); see also id. at 203 (noting, along similar lines, a “special twist to actuarial justice’s legitimacy puzzle: the very system that produced an outcome as illegitimate as mass incarceration must simultaneously be treated as legitimate in producing the data on which actuarial justice-based reform efforts must rely”).



Section III.B illustrates how current pretrial risk assessment tools will likely make “zombie predictions,” where old data, reflecting outdated bail practices, are newly reanimated. To do so, the Article focuses specifically on two popular strains of reform: creating or expanding pretrial services and limiting or eliminating cash bail—describing in turn why those reforms are likely to significantly change the risks that risk assessment tools seek to forecast. Section III.C examines the underappreciated role of decision-making frameworks, and the risks associated with their creation. These are matrices, akin to bail schedules, that attach a proposed course of action for a judge to a risk assessment score. When defendants are systematically regarded as riskier than they truly are, today’s decision-making frameworks may unnecessarily subject defendants to overly burdensome, and perhaps counterproductive, conditions of release. As a result, risk assessment tools and the accompanying decision-making frameworks may actually erode the benefits of risk-reducing bail reforms. Finally, section III.D explores the long-term dangers that pretrial risk assessment tools pose to bail reform and pretrial jurisprudence more generally. Pretrial risk assessment tools may further legitimize and expand preventive detention. As a result, there is good reason to reexamine whether United States v. Salerno was rightly decided. Even assuming that Salerno was properly decided, the case nevertheless left open significant questions that will have to be resolved—questions that the widespread adoption of pretrial risk assessment makes all the more urgent.

A. How Pretrial Risk Assessment Works

Typically, risk assessment tools use data about groups of people, like those who have been arrested or convicted, to assess the probability of future behavior. The creator of an actuarial tool may test hundreds of variables, like a previous failure to appear or age at current arrest, to determine which factors, when weighed together, are most predictive of rearrest and failure to appear. Although these tools are frequently drawn into broader debates about machine learning or artificial intelligence,129 they typically rely on

129. See, e.g., CHELSEA BARABAS ET AL., INTERVENTIONS OVER PREDICTIONS: REFRAMING THE ETHICAL DEBATE FOR ACTUARIAL RISK ASSESSMENT 4 (2017) (“Notwithstanding the popular discourse on the ethical use of risk assessments, the vast majority of these tools do not use new statistical methods frequently associated with ‘artificial intelligence,’ such as machine learning. They are overwhelmingly based on regression models.”).


longer-established statistical methods, like regression analysis.130 Often the fact or manner of their use is new, rather than the techniques employed in their creation.131 Tools vary in the kind and numbers of factors they use. Each included factor gets a weighting that reflects how strongly it correlates with rearrest or failure to appear.132 For example, historical data might show that defendants who are under the age of thirty when arrested are much more likely to be rearrested or fail to appear, compared to defendants who are over thirty at the time of arrest.133 Accordingly, if a person is under the age of thirty when arrested, that person might be assigned three points. Conversely, a person over thirty might be assigned a single point. The greater the numerical value, the more that variable is correlated with worse outcomes.134 When completing a risk assessment, either a human, a computer, or some mix of the two will determine the applicable variables and calculate the total risk score. Those risk scores are then transformed into risk categories or scales. For example, a tool might sort defendants into “low risk,” “moderate risk,” or “high risk” categories.
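To make these mechanics concrete, the following short sketch (in Python) implements a toy points-based score of the kind described above. The factors, point values, and category cut points are hypothetical illustrations chosen for this example; they are not the weights of the CPAT, the PSA, COMPAS, or any other actual instrument.

# A toy points-based pretrial risk score of the kind described above. The
# factors, point values, and cut points are hypothetical and are not drawn
# from any actual instrument.

def score_defendant(age_at_arrest, prior_fta, prior_convictions):
    """Sum weighted points across a handful of illustrative risk factors."""
    points = 0
    points += 3 if age_at_arrest < 30 else 1   # younger defendants weighted more heavily
    points += 2 if prior_fta else 0            # a prior failure to appear adds points
    points += min(prior_convictions, 3)        # prior record, capped at three points
    return points

def categorize(points):
    """Translate a raw point total into a category using fixed cut points."""
    if points <= 2:
        return "low risk"
    if points <= 5:
        return "moderate risk"
    return "high risk"

total = score_defendant(age_at_arrest=24, prior_fta=True, prior_convictions=1)
print(total, categorize(total))   # prints: 6 high risk

The point of the sketch is only that a completed assessment is additive and mechanical: whoever (or whatever) fills in the variables, the score and category follow automatically from the pre-set weights and cut points.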

130. Id. at 3–4 (“Regression modeling is particularly well-suited for prediction oriented assessment, because it enables researchers to identify variables that are predictive of an outcome of interest, without necessarily having to understand why that factor is significant.”); id. (“Regression analysis is widely used for purposes of forecasting future events. The main goal of regression is to identify a set of variables that are predictive of a given outcome variable. This is achieved by determining the optimal weights for a given set of covariates, ones that are best predictive of the outcome variable of interest. This is done through processes called model checking and selection, whereby statistical tests are run on each covariate to see how significantly predictive they are of the outcome variable.”). 131. There is a thriving debate about how and where the recent growth of machine learning methods (sometimes broadly termed “AI”) should stimulate changes to legal doctrine or public administration. Most of those questions are not (yet) presented in the pretrial context. It is unclear whether these newer technologies should really be seen by the law as something significantly new or different. See, e.g., Richard Berk & Justin Bleich, Statistical Procedures for Forecasting Criminal Behavior A Comparative Assessment, 12 CRIM. & PUB. POL’Y 513 (2013) (arguing that, after comparing various methods, “[t]here seems to be no reason for continuing to rely on traditional forecasting tools such as logistic regression”); Cary Coglianese & David Lehr, Regulating by Robot Administrative Decision Making in the Machine-Learning Era, 105 GEO. L.J. 1147 (2017); Paul Ohm & David Lehr, Playing with the Data What Legal Scholars Should Learn About Machine Learning, 51 U.C. DAVIS L. REV. 653–717 (2017). 132. See generally CYNTHIA A. MAMALIAN, PRETRIAL JUSTICE INST., STATE OF THE SCIENCE OF PRETRIAL RISK ASSESSMENT (2011). 133. See, e.g., PRETRIAL JUSTICE INST. & JFA INST., THE COLORADO PRETRIAL ASSESSMENT TOOL 16 (2012), https://university.pretrial.org/HigherLogic/System/DownloadDocumentFile.ashx? DocumentFileKey=64908e23-bf3e-9379-1a1f-f2d5b9e1702f&forceDialog=0 [https://perma.cc/64BL-NUF2]. 134. See, e.g., Jennifer L. Skeem & John Monahan, Current Directions in Violence Risk Assessment, 20 CURRENT DIRECTIONS PSYCHOL. SCI. 38, 39 (2011).



The Colorado Pretrial Assessment Tool (CPAT) offers an accessible example.135

Table 1: Risk Scores and Relevant Statistics136

Risk Category   Revised Risk Score   Public Safety Rate   Court Appearance Rate   Overall Combined Success Rate
1               0–17                 91%                  95%                     87%
2               18–37                80%                  85%                     71%
3               38–50                69%                  77%                     58%
4               51–82                58%                  51%                     33%
Average         30                   78%                  82%                     68%

In the above example, risk scores are sorted into four categories. Those categories represent the rate of rearrest and/or failure to appear. For example, a defendant who has a score of 39 falls into Category 3. Placement into that category means that, as a group, defendants assessed as similar to the current defendant were rearrested 31% of the time and failed to appear 23% of the time.137
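The same lookup can be written out in a few lines. The sketch below maps a raw CPAT score to its Table 1 category and the implied rearrest and failure-to-appear rates; the score ranges and rates are copied from Table 1, while the function and variable names are our own shorthand.

# Translating a raw CPAT score into its Table 1 category and the implied
# rearrest and failure-to-appear rates. Ranges and rates come from Table 1.

CPAT_TABLE = [
    # (category, score range, public safety rate, court appearance rate)
    (1, range(0, 18),  0.91, 0.95),
    (2, range(18, 38), 0.80, 0.85),
    (3, range(38, 51), 0.69, 0.77),
    (4, range(51, 83), 0.58, 0.51),
]

def lookup(score):
    for category, scores, safety_rate, appearance_rate in CPAT_TABLE:
        if score in scores:
            # the complements of the success rates are the rearrest and FTA rates
            return category, round(1 - safety_rate, 2), round(1 - appearance_rate, 2)
    raise ValueError("score falls outside the 0-82 CPAT range")

print(lookup(39))   # prints: (3, 0.31, 0.23)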

135. PRETRIAL JUSTICE INST. & JFA INST., supra note 133, at 18 tbl.2. 136. Id. at 18. 137. Id. at 15 fig.2.



Figure 1: Pretrial Misconduct Rates138

B. Zombie Predictions

Pretrial risk assessment tools developed on data that does not reflect changing ground realities as a result of risk-mitigating reforms will likely make “zombie predictions.”139 That is, the predictions of such a pretrial risk assessment tool may reanimate and give new life to old data and outcomes from a bail system that is presently under reform. Below, we detail the intersection between two common bail reforms, expanding pretrial services and cabining money bail, and zombie predictions. We finish by examining how such zombie predictions might actually dampen the otherwise positive effects of risk-mitigating policy reforms. Our criticism of zombie predictions should not be read as a general criticism of prediction. Predictions are always based on historical data, which, by definition, come from the past. But prediction at bail is problematic because the training data often come from times and places that are materially different from the ones where the predictions are being made, and few actors continuously update tools with new facts.

138. Id. at 15. 139. We use the term “zombie” here, not in the Oxford English Dictionary’s first-listed sense of “a soulless corpse said to have been revived by witchcraft,” but rather in the extended sense indicated in the December 14, 2016 online update to the OED’s Third Edition—now listed as the “most common sense” of the term—a “similar[ly] mindless creature.” OXFORD ENG. DICTIONARY ONLINE, http://www.oed.com/viewdictionaryentry/Entry/232982#eid13494009 (last visited Feb. 10, 2018). We choose this evocative term and its negative connotations intentionally.



Responsible, fully informed prediction, where outcomes are tracked and models refined and updated, is not susceptible to this objection.

1. Today’s Predictions Follow Yesterday’s Patterns

Using one jurisdiction’s data to predict outcomes in another is an inherently hazardous exercise, a challenge that is highlighted in the existing literature. When a risk assessment tool’s developmental sample does not reflect local conditions, it might not accurately classify risk.140 Geographic differences in law enforcement patterns, for example, can undermine tools’ accuracy. The factors that bring individuals into contact with the criminal justice system in one jurisdiction or country may not be the same as those for offenders in a different jurisdiction or country. Local differences in correctional resources can also make a substantial difference in predictive efficacy. In writing about risk assessment in the neighboring context of criminal sentencing, John Monahan and Jennifer Skeem note that “[v]ariables that predict recidivism in a jurisdiction with ample services for offenders may not predict recidivism in a resource-poor jurisdiction.”141 Take the Level of Service (LS) family of assessment instruments as an example. The LS tools are some of the most widely used risk and needs assessment tools in correctional settings.142 Notably, the tools were developed based on the histories of Canadian offenders.143 The creators of those instruments found “high predictive validity” in their published validation studies.144 But the “predictive performance results reported by [other assessors] . . . , especially those outside Canada, have not been as favorable.”145 One meta-analysis of LS-instruments found that their predictive capacity was significantly worse for U.S. offenders, compared to their performance for Canadian and other offenders.146

140. See, e.g., Melissa Hamilton, Adventures in Risk Predicting Violent and Sexual Recidivism in Sentencing Law, 47 ARIZ. ST. L. REV. 1, 33–39 (2015). 141. John Monahan & Jennifer L. Skeem, Risk Assessment in Criminal Sentencing, 12 ANN. REV. CLINICAL PSYCHOL, 489, 500 (2016). 142. Grant Duwe & Michael Rocque, Effects of Automating Recidivism Risk Assessment on Reliability, Predictive Validity, and Return on Investment (ROI), 16 CRIM. & PUB. POL’Y 235, 239 (2017). 143. Id. at 240 (“[These tools were] [d]eveloped on samples of Canadian offenders by the creators of the RNR approach . . . .”). 144. Id. 145. Id. at 239. 146. See MARK E. OLVER ET AL., Thirty Years of Research on the Level of Service Scales A Metaanalytic Examination of Predictive Accuracy and Sources of Variability, 26 PSYCHOL. ASSESSMENT. 156, 166 tbl.9 (2014).



The same problem could easily apply not only across geography, but also across time. Underlying conditions, like economic growth and development, can change across time and lead to different results for various reasons.147 Overall, problems arise when the group being assessed “is not similar to the developmental sample, [or] the developmental sample is not a representative reference for the individual to be assessed.”148 Just as a pretrial resource-poor jurisdiction in Wyoming differs in significant ways from a pretrial resource-rich jurisdiction in California, so too does a single jurisdiction when it significantly changes its bail system. Simply put, risk-mitigating policies will likely change the risks a defendant faces upon release, just like a change in economic conditions or in time can. Overall, using historic, pre-reform outcome data to predict future risks within a jurisdiction that significantly reforms its bail system deserves heightened, continued scrutiny.149 The challenge of time-based changes in risk applies equally to jurisdictions that develop their own pretrial risk assessment tool from scratch, and those that validate a tool developed elsewhere. For tools to make well-calibrated predictions from the start, they need to be trained on data that matches the conditions about which they are making predictions. There is strong reason to believe that the data used to build today’s risk assessment tools do not match the reality into which the tools are deployed. As we detailed in section II.C, jurisdictions across the country are pursuing reforms aimed at transforming a defendant’s odds of success upon release. But today’s risk assessment tools are not designed to incorporate the effects of those reforms. For example, consider the publicly available information about how the Public Safety Assessment (PSA) and Correctional Offender

147. For example, one can imagine that a jurisdiction that experiences substantial and equitable economic growth might see rearrest and failure to appear rates decline, given a better overall economic environment. 148. Hamilton, supra note 140, at 37. 149. In many ways, proposals for periodic, localized revalidation of risk assessment tools are similar to our argument. For example, in the criminal sentencing context, Monahan and Skeem argue that “[u]nless a tool is validated in a local system—and then periodically revalidated—there is little assurance that it works.” Monahan & Skeem, supra note 141, at 500. These proposals capture a sense that, within a jurisdiction, outcomes can change and that it is important for policymakers to track those changes. Our argument effectively extends and further underscores this conceptual point: thanks in part to the reforms that brought many risk assessment tools into existence, outcomes within jurisdictions are already changing. The need for what Monahan and Skeem call for in the sentencing context is heightened in the pretrial context, where the ground truth of rearrest and failure to appear rates may be significantly mitigated by other pretrial policies.



Management Profiling for Alternative Sanctions (COMPAS) were each developed on multi-jurisdiction data. PSA, an instrument developed by the Laura and John Arnold Foundation, is based on nearly 750,000 cases drawn from more than 300 jurisdictions.150 Similarly, COMPAS was initially developed on a sample of 30,000 survey responses, administered to prison and jail inmates, probationers, and parolees across the country, between January 2004 and November 2005.151 In all likelihood, there will be some degree of mismatch between the jurisdictions from which PSA and COMPAS drew their samples and those jurisdictions in which the tools were deployed. Pretrial risk assessment tools that are developed locally typically rely upon smaller samples of data, often dating from before significant reform efforts began. For example, the Florida Pretrial Risk Assessment instrument was developed on 1,757 cases across six counties, from January to March 2011.152 The Ohio Risk Assessment System’s Pretrial Assessment Tool was developed on “over 1,800” cases, from September 2006 to October 2007.153 The Virginia Pretrial Risk Assessment Instrument was originally developed on a sample of 1,971 cases between July 1, 1998 and June 30, 1999,154 though it was later revised based on data from 2005.155

150. LAURA & JOHN ARNOLD FOUND., DEVELOPING A NATIONAL MODEL FOR PRETRIAL RISK ASSESSMENT 3 (2013), https://www.arnoldfoundation.org/wp-content/uploads/2014/02/LJAF- research-summary_PSA-Court_4_1.pdf [https://perma.cc/945R-UWVD]. Notably, documents recently released in response to public records requests note that the development data for the tool came from state court systems in Colorado, Connecticut, Florida, Kentucky, Maine, Ohio, and Virginia, the District of Columbia, and the federal pretrial system. See ZACH DAL PRA, JUSTICE SYS. PARTNERS, LJAF PUBLIC SAFETY ASSESSMENT - PSA 27 (2016) https://cdn.muckrock.com/foia_files/2016/12/15/Volusia_Stakeholder_Training_10162015.pdf [https://perma.cc/GZW5-3NA7]. 151. NORTHPOINTE, INC., PRACTITIONER’S GUIDE TO COMPAS CORE 11 (2015) (“The Composite Norm Group consists of assessments from state prisons and parole agencies (33.8%); jails (13.6%); and probation agencies (52.6%).”), http://www.northpointeinc. com/downloads/compas/Practitioners-Guide-COMPAS-Core-_031915.pdf [https://perma.cc/RK2U- CPAR]. Agencies using COMPAS Core can select the default norm group, or a more specific subgroup, like “male jail” or “male prison/parole” or “female jail.” Id. at 11. 152. JAMES AUSTIN ET AL., FLORIDA PRETRIAL RISK ASSESSMENT INSTRUMENT 3–4 (2012), https://university.pretrial.org/HigherLogic/System/DownloadDocumentFile.ashx?DocumentFileKey =58add716-5e41-eba3-0a3d-f9298f7e1a54 [https://perma.cc/2HSS-LJAT]. 153. Edward J. Latessa et al., The Creation and Validation of the Ohio Risk Assessment System (ORAS), 74 FED. PROB. 21 (2010). 154. MARIE VANNOSTRAND, PRETRIAL RISK ASSESSMENT IN VIRGINIA 10 (2009), https://www.dcjs.virginia.gov/sites/dcjs.virginia.gov/files/publications/corrections/virginia-pretrial- risk-assessment-report.pdf [https://perma.cc/A3ZH-G7UB]. 155. MARIE VANNOSTRAND, VA. DEP’T OF CRIMINAL JUSTICE SERVS., ASSESSING RISK AMONG PRETRIAL DEFENDANTS IN VIRGINIA: THE VIRGINIA PRETRIAL RISK ASSESSMENT INSTRUMENT 4



Even where bail reform legislation simultaneously introduces broad pretrial reforms and risk assessment, the developers tasked with building such a tool still have to look backwards for their examples.156 In fact, the incentives to look further back in history can be strong. A larger, more diverse dataset from which to draw a sample is less likely to contain random statistical artifacts that could skew the results. But, in the bail context, doing so will also mean a deeper reliance on data that represents the historic risks of release, rather than the current ones. Take Colorado as an example. A revised version of the Colorado Pretrial Assessment Tool (CPAT) was released in October 2012.157 But Colorado’s sweeping new bail reform was not signed into law until May 2013.158 Thus, in Colorado, we would expect to see zombie-style predictions that overstate defendants’ true levels of risk. The available data suggest that this may indeed have happened. Below, we reproduce figures from The Colorado Bail Book: A Defense Practitioner’s Guide To Adult Pretrial Release.159

(2003), https://www.dcjs.virginia.gov/sites/dcjs.virginia.gov/files/publications/corrections/assessing -risk-among-pretrial-defendants-virginia-virginia-pretrial-risk-assessment-instrument.pdf [https://perma.cc/H47M-YAQV]. 156. Notably, smaller jurisdictions (aside from likely having fewer resources to dedicate to pretrial systems) might have fewer examples to create a locally-developed tool and thus have an even greater incentive to look farther back. 157. PRETRIAL JUSTICE INST. & JFA INST., supra note 133. 158. Act of May 11, 2013, ch. 202, 2013 Colo. Sess. Laws 820. 159. COLO. CRIMINAL DEFENSE INST., COLO. STATE PUB. DEFENDER & NAT’L ASS’N OF CRIMINAL DEFENSE LAWYERS (NACDL), THE COLORADO BAIL BOOK: A DEFENSE PRACTITIONER’S GUIDE TO ADULT PRETRIAL RELEASE 51 app. 3 (2015).



Table 2: CPAT Risk Level Research Projections Compared to Denver County Actuals160

                  CPAT 1    CPAT 2       CPAT 3    CPAT 4
CPAT Projected    20%       49%          23%       8%
Denver 2012       12%       39%          27%       22%
Denver 2013       11%       38%          28%       23%
Denver 2014       13%       39%          38%       20%
Avg. Diff.        -8%       -10.33...%   +8%       +13.66...%

Here, the share of defendants whom the tool’s designers expected to be classified as CPAT 3 or CPAT 4—the higher risk categories—is compared to the share of defendants who were actually classified as CPAT 3 or CPAT 4. The gap is noticeable. Based on their training data, the tool’s designers suggested that about a third of defendants would be classified as higher risk. But, for each year of data in Denver, essentially half of all defendants were assessed as higher risk. Based on the available data, it’s unclear why this is the case.161 Simply classifying more defendants as high risk offers little insight. The other side of the coin is how defendants who were classified as higher risk actually performed.
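For readers who wish to check the tables, the “Avg. Diff.” rows follow from simple arithmetic: average the observed rates across the available years and subtract the projected rate for each category. A minimal sketch, using the Table 2 classification shares:

# Reproducing the "Avg. Diff." row of Table 2: for each CPAT category, average
# the observed Denver shares across 2012-2014 and subtract the projected share.
# The numbers are taken from Table 2 above.

projected = {"CPAT 1": 20, "CPAT 2": 49, "CPAT 3": 23, "CPAT 4": 8}
denver_actuals = {
    "CPAT 1": [12, 11, 13],   # 2012, 2013, 2014
    "CPAT 2": [39, 38, 39],
    "CPAT 3": [27, 28, 38],
    "CPAT 4": [22, 23, 20],
}

for category, shares in denver_actuals.items():
    avg_diff = sum(shares) / len(shares) - projected[category]
    print(f"{category}: {avg_diff:+.2f} percentage points")
# CPAT 1: -8.00, CPAT 2: -10.33, CPAT 3: +8.00, CPAT 4: +13.67

The same calculation, applied to the failure-to-appear and new-offense rates, yields the “Avg. Diff.” rows in Tables 3 through 6.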

160. Id. 161. One might suspect that defendants from Denver are higher risk than other Coloradans and were not included, or were not heavily weighted, in the development sample of CPAT. In fact, defendants from Denver actually represented 13% of CPAT’s development sample. PRETRIAL JUSTICE INST. & JFA INST., supra note 133, at 9.



Table 3: Failure to Appear Rates Across CPAT Risk Category in Denver162

                     CPAT 1   CPAT 2   CPAT 3   CPAT 4
CPAT Projected       5%       15%      23%      49%
2013 Denver Actual   7%       11%      16%      20%
2014 Denver Actual   5%       14%      16%      23%
Avg. Diff.           +1%      -2.5%    -7%      -27.5%

Table 4: Failure to Appear Rates Across CPAT Risk Category in Mesa163

                     CPAT 1   CPAT 2   CPAT 3   CPAT 4
CPAT Projected       5%       15%      23%      49%
2014 Mesa Actual     7%       9%       13%      15%
Avg. Diff.           +2%      -6%      -10%     -34%

162. Id. 163. Id.



Figure 2: (Drawn from Tables 3 and 4)

Table 5: New Criminal Offense Rates Across CPAT Category in Denver164

                     CPAT 1   CPAT 2    CPAT 3    CPAT 4
CPAT Projected       9%       20%       31%       42%
2013 Denver Actual   3%       8%        15%       18%
2014 Denver Actual   4%       7%        14%       20%
Avg. Diff.           -5.5%    -12.5%    -16.5%    -23%

164. Id.



Table 6: New Criminal Offense Rates Across CPAT Category in Mesa165

                     CPAT 1   CPAT 2   CPAT 3   CPAT 4
CPAT Projected       9%       20%      31%      42%
2014 Mesa Actual     11%      21%      25%      28%
Avg. Diff.           +2%      +1%      -6%      -14%

Figure 3: (Drawn from Tables 5 and 6)

We find similar evidence in efforts to develop and validate the Public Safety Assessment.166 The rate of failures to appear in the PSA’s developmental sample167 diverged from the rate of failures to appear in

165. Id. 166. DAL PRA, supra note 150, at 31–32, 46–47. 167. The developmental sample for PSA came from jurisdictions in Colorado, Connecticut, Florida, Kentucky, Ohio, Maine, Virginia, Washington D.C. and the Federal Pretrial Services. See LAURA & JOHN ARNOLD FOUND., CRIMINAL JUSTICE DATA USED TO DEVELOP THE PUBLIC SAFETY ASSESSMENT, http://www.arnoldfoundation.org/wp-content/uploads/Criminal-Justice-Data-Used-to- Develop-the-Public-Safety-Assessment-Final.pdf [https://perma.cc/Q3ZE-YRZ5].


jurisdictions where PSA was subsequently validated.168 As Table 7 demonstrates for the top four risk grades—that is, defendants receiving FTA scores of 3, 4, 5, and 6—defendants on whom the PSA was actually used (that is, those in the validation sample) succeeded more often than their counterparts in the developmental sample. We see less of this phenomenon with respect to “new criminal activity” rates in Table 8. Nevertheless, it appears that defendants who are assessed as higher risk, in reality, present a lower risk than expected upon release.

Table 7: Failure to Appear Rates Across PSA FTA Categories169

                           FTA 1   FTA 2   FTA 3   FTA 4   FTA 5   FTA 6
PSA Developmental Sample   10%     15%     20%     31%     35%     40%
PSA Validation             12%     16%     18%     23%     27%     30%
Avg. Diff.                 +2%     +1%     -2%     -8%     -8%     -10%

Table 8: New Criminal Activity Rates Across PSA NCA Categories170

                           NCA 1   NCA 2   NCA 3   NCA 4   NCA 5   NCA 6
PSA Developmental Sample   10%     15%     23%     30%     48%     55%
PSA Validation             9%      15%     21%     34%     43%     52%
Avg. Diff.                 -2%     +0%     -2%     +4%     -5%     -3%

This is the stark, unfortunate irony at the heart of today’s bail reform: today’s pretrial risk assessment tools reflect and reinforce the very patterns of failure to appear that new, innovative policies work to

168. DAL PRA, supra note 150, at 31, 46. 169. Id. 170. Id.


change. Below, we consider specifically how these other policy changes may be reshaping failure to appear and rearrest risk.

2. Expanded Pretrial Services Will Change the True Risk of Failure to Appear

Small changes in the administration of bail can have a substantial impact on failure to appear rates in a jurisdiction. Many of these reforms are relatively low-cost and low-tech, such as text message reminders about upcoming court dates. Creating pretrial service agencies in jurisdictions that do not already have them, and expanding funding for those agencies already in place, will likely further curb the incidence of failures to appear.171 Of course, a given intervention may reduce failure to appear more in one population than another. For example, low-income defendants, who may lack stable housing, might disproportionately benefit from text message reminders, as opposed to physical postcards sent by mail. “Failure to appear” often reflects factors far more prosaic than a defendant absconding from the jurisdiction. As a 2001 National Institute of Justice report noted, when “released defendants miss a court appearance, it is often not because they are fleeing from prosecution but, rather, for other reasons ranging from genuine lack of knowledge about the scheduled date to forgetfulness.”172 What causes such non-flight failures to appear? Considering the available data for similarly charged defendants who are unable to meet financial conditions of release, it is likely that financial needs play a prominent role for those who do obtain release. People with jobs that have inflexible hours, or that require a significant commute, might find it difficult or impractical to miss work for a court date. A defendant might also fail to appear because they simply forget about an upcoming court appearance. They may be afraid or have insufficient information about how to get to court, what to do once there, and what will happen next. Some observers emphasize the slow pace of justice, arguing that “the typically long period of time between the citation and the court date

171. See, e.g., MARIE VANNOSTRAND, KENNETH J. ROSE & KIMBERLY WEIBRECHT, PRETRIAL JUSTICE INST., STATE OF THE SCIENCE OF PRETRIAL RELEASE RECOMMENDATIONS AND SUPERVISION (2011); NAT’L ASSOC. OF PRETRIAL SERVS. AGENCIES, STANDARDS ON PRETRIAL RELEASE 15 (3d ed. 2004) (“Pretrial programs are a vitally important part of [a criminal justice system] because they perform functions that, in their absence, are often performed inadequately or not at all.”). 172. BARRY MAHONEY ET AL., U.S. DEP’T OF JUSTICE, NAT’L INST. OF JUSTICE, PRETRIAL SERVICES PROGRAMS: RESPONSIBILITIES AND POTENTIAL 38 (2001).


naturally leads to [failures to appear] due to the relative instability of many defendants.”173 One study found that the amount of time between a defendant’s release and the disposition of her case was the most important factor in predicting failures to appear.174 Others argue that defendants are often “unaware that failing to show up for court can lead to an arrest warrant for seemingly minor violations of the law.”175 Still others counter that deliberate refusals to appear in court are commonplace. These are the risks that reforms are motivated to mitigate.176 A series of studies suggests that reminders, alone, may make a major difference. For example, administrators in Jefferson County, Colorado implemented live-caller reminders where, if the caller “successfully contacted a defendant, she read a script (in either English or Spanish) reminding the defendant of the court date, giving directions to the court, and warning the defendant of the consequences of failing to appear for court.”177 The results of the program were described as “exceptional.”178 In 2010, “the court-appearance rate for defendants who were successfully contacted was 91%, compared to an appearance rate of 71% for those who were not.”179 A study in fourteen counties across Nebraska found that postcard reminders significantly reduced failure to appear rates.180 The study examined three different types of postcard reminders (all in English/Spanish): one reminder-only postcard with just a notification,

173. Timothy R. Schnacke, Michael R. Jones & Dorian M. Wilderman, Increasing Court- Appearance Rates and Other Benefits of Live-Caller Telephone Court-Date Reminders The Jefferson County, Colorado, FTA Pilot Project and Resulting Court Date Notification Program, 48 CT. REV. 86, 87 (2012). 174. STEVENS H. CLARKE, JEAN L. FREEMAN & GARY KOCH, THE EFFECTIVENESS OF BAIL SYSTEMS: AN ANALYSIS OF FAILURE TO APPEAR IN COURT AND REARREST WHILE ON BAIL 34 (1976). 175. Schnacke, Jones & Wilderman, supra note 173, at 87. 176. PRETRIAL JUSTICE CTR. FOR COURTS, USE OF COURT DATE REMINDER NOTICES TO IMPROVE COURT APPEARANCE RATES 1 (2017) (“Several jurisdictions across the country have adopted a court date reminder process (or court date notification system) to improve court appearance rates, such as in Coconino County (AZ), Jefferson County (CO), Lafayette Parish (LA), Reno (NV), New York City (NY), Multnomah and Yamhill Counties (OR), Philadelphia (PA), King County (WA), and the states of Arizona, Kentucky, and Nebraska. Recently, Judge Timothy C. Evans, Chief Judge of the Cook County Circuit Court in Illinois, issued an order requiring the county to implement a pretrial notification system by December 1, 2017.”). 177. Schnacke, Jones & Wilderman, supra note 173, at 91. 178. Id. at 92. 179. Id. 180. Alan Tomkins et al, An Experiment in the Law Studying a Technique to Reduce Failure to Appear in Court, 48 CT. REV. 96, 100 (2012).


one reminder postcard that included the threat of sanctions for failure to appear, and one that included both the threat of sanctions and other elements of procedural justice.181 All led to significant reductions in failures to appear.182 Other initiatives look to capitalize on SMS text messages as a reminder. For example, the Court Messaging Project—an open-source initiative from Stanford’s Legal Design Lab—works “to make the court system more navigable and to improve people’s sense of procedural justice—that the legal system is fair, comprehensible, and user-friendly.”183 In New York City, a recent experiment found that simple text reminders reduced failures to appear by 21%, while those with more information led to a 26% drop.184 Uptrust, a company “which sends text message reminders to attend court and other obligations,” claims it can reduce failure to appear rates by 75%.185 Other reforms might reduce a defendant’s flight risk, too. For example, electronic monitoring—a contentious, highly invasive condition of release—already has a long history of pretrial use, and may deter defendants from fleeing.186 But unlike reminders—whose effectiveness is documented in a wide range of studies—the efficacy of GPS monitoring for reducing flight from the jurisdiction remains an open question.187 Still, it is certainly plausible to imagine that electronic monitoring would effectively deter some accused people who might flee from actually fleeing. If so, such an intervention would further reduce the risk of pretrial failure, compared with historical outcomes.188 But consider New York City. At a recent public event, the City’s Criminal Justice Agency announced that it would be developing a new

181. Id. 182. Id. at 98–100. 183. The Court Messaging Project, STAN. LEGAL DESIGN LAB, http://www.legaltechdesign.com/CourtMessagingProject/ [https://perma.cc/GJ82-K3QQ]. 184. BRICE COOKE ET AL., USING BEHAVIORAL SCIENCE TO IMPROVE CRIMINAL JUSTICE OUTCOMES: PREVENTING FAILURES TO APPEAR IN COURT 16 (2018), https://urbanlabs.uchicago.edu/attachments/store/9c86b123e3b00a5da58318f438a6e787dd01d66d0e fad54d66aa232a6473/I42-954_NYCSummonsPaper_Final_Mar2018.pdf [https://perma.cc/MRG8- RBZ8]. 185. What We Do - Our Results, UPTRUST, http://www.uptrust.co/what-we-do#our-results-section [https://perma.cc/GJ82-K3QQ]. 186. Samuel R. Wiseman, Pretrial Detention and the Right to Be Monitored, 123 YALE. L.J. 1344, 1365 (2014). 187. Id. at 1368–69. Most focus on post-conviction proceedings, though a few studies have examined the effectiveness at pretrial (albeit with small sample sizes). 188. Though, as we noted, the incidence of flight from a jurisdiction in 2017 is probably minimal.


risk assessment tool, based on seven years of criminal justice data.189 According to the announcement, source data for the new model would come from 2009–2015.190 But the City’s arrest practices during that period have been held unconstitutional,191 and therefore are not and should not be a reliable guide, legally or statistically, to who will be arrested in the future. That is particularly true for the very groups of people, like young men of color, who were disproportionately stopped, questioned, and frisked during this period of time.192 Complicating matters even further, the City only recently began an expansive supervised release program in March 2016.193 Earlier release programs were only pilot programs in select areas. Thus, as it is currently envisioned, New York City’s new risk assessment tool will base its predictions primarily on the outcomes of unsupervised release, even though it is being deployed in a setting where more releases can be supervised. As a result, more defendants might be classified as a high risk of failure to appear simply because the new model does not reflect their greater likelihood of reappearance thanks to supervision. At the margin, this would likely lead some defendants who could succeed on release (indeed, some of the very ones who could be most helped by the new program) to be jailed instead. In addition, the City recently began sending text message reminders to all accused people for whom it has a mobile number, building on a successful trial deployment.194 Of course, New York City is just a microcosm. Today’s suite of pretrial risk assessment tools was largely “trained” on populations that did not receive the benefit of newly-enacted, risk-mitigating reforms.195

189. Dr. Richard Peterson, Research Director, N.Y.C. Criminal Justice Agency, Address at Measuring Justice: Redesigning New York City's Pretrial Risk Assessment and Recommendation System (Sept. 18, 2017). 190. Id.; N.Y.C. CRIMINAL JUSTICE AGENCY, ANNUAL REPORT 2016, at 2 (2018), https://www.nycja.org/measuring-justice-panel/ (last visited Nov. 12, 2018). 191. See Floyd v. City of New York, 959 F. Supp. 2d 540, 667 (S.D.N.Y. 2013). 192. See Stop-and-Frisk Data, N.Y. CIVIL LIBERTIES UNION, https://www.nyclu.org/en/stop-and-frisk-data [https://perma.cc/B94T-2KR2] (observing that in 2009–2015 blacks and Latinos represented more than 80% of stops). 193. CINDY REDCROSS, MELANIE SKEMER, DANNIA GUZMAN, INSHA RAHMAN & JESSI LACHANCE, NEW YORK CITY'S PRETRIAL SUPERVISED RELEASE PROGRAM: AN ALTERNATIVE TO BAIL 4 (2017), https://www.mdrc.org/sites/default/files/SupervisedRelease%20Brief%202017.pdf [https://perma.cc/2GQ3-XRWV]. 194. See New Text Message Reminders for Summons Recipients Improves Attendance in Court and Dramatically Cuts Warrants, N.Y.C., http://www1.nyc.gov/office-of-the-mayor/news/058-18/new-text-message-reminders-summons-recipients-improves-attendance-court-dramatically [https://perma.cc/4GCZ-8VT6]. 195. Despite studies and pilot projects demonstrating the success of live-caller reminders, postcard reminders, and other reminders in helping reduce the number of failures to appear, the



As a result, the question that today’s tools answer is, “how likely is this defendant to return to court on schedule without being reminded to appear?” The tools do not measure, because the historical data does not and cannot reflect, the greater chance that some defendants will reappear after being reminded.

3. Replacing or Cabining Money Bail Could Reduce the True Risk of Rearrest for Those Released

Transformative bail reforms that reduce or altogether eliminate money bail—if they lead, as expected, to many more releases on recognizance—are likely to reduce the risk of pretrial rearrest. As detailed above,196 research demonstrates that pretrial detention itself actually increases the risk of pretrial rearrest once a defendant is released. And current statistics clearly show that money bail is the main reason defendants spend any significant amount of time in jail pretrial.197 Accordingly, policies that would reduce or eliminate money bail, and release those who would otherwise be held on small bail amounts, are likely to have a significant effect. Such policies would likely reduce the number of people who spend any time in jail pretrial. As a result, released defendants will likely face a lower risk of rearrest following their immediate release from custody. In short, by reducing the incidence of pretrial detention, jurisdictions may also reduce the overall level of rearrest risk.

Recent research lends support to this hypothesis.198 Professors Paul Heaton, Sandra Mayson, and Megan Stevenson find that if defendants in Harris County, Texas, who were assigned the lowest amount of cash bail ($500) had simply been released instead, the county would have released 40,000 more defendants pretrial between 2008 and 2013.199 They further find that if those defendants had been directly released (and

majority of pretrial service agencies have not adopted these relatively low-tech techniques. For example, a 2009 Pretrial Justice Institute survey of pretrial service programs found that from 1989 to 2009, the percentage of programs that called the defendant before a scheduled court date declined from over 40% in 1989 to 30% in 2009. In 2009, about 5% of programs used an automated dialing system to call and remind the defendant. The percentage of programs that produced a manually generated reminder letter in 1989 was just under 40%, but in 2009 was about 4%, while about 17% used automatically generated reminder letters. Almost 10% of surveyed pretrial service agencies had no court date reminder procedures at all. See PRETRIAL JUSTICE INST., 2009 SURVEY OF PRETRIAL SERVICES PROGRAMS 50 (2009). 196. See supra section III.B.3. 197. See Heaton, Mayson & Stevenson, supra note 94. 198. Id. 199. Id. at 787.


did not spend any time in jail at all between arraignment and trial), these defendants as a group would have committed 1,600 fewer felonies and 2,400 fewer misdemeanors than they ultimately did after their eventual, later releases.200 They find that pretrial detention increases the share of defendants charged with new misdemeanors by more than 13% thirty days post-bail hearing, and increases the share of defendants charged with new felonies by more than 30% within one year post-bail hearing.201

If Harris County abandoned its bail schedule in favor of a presumption of release on personal recognizance, the effect could be significant.202 The incidence of pretrial detention would likely decline significantly. Such a policy would, in turn, accomplish what Heaton, Mayson, and Stevenson simulate: reduce the number of pretrial rearrests, and of rearrests post-case disposition.

Yet, a risk assessment tool developed on data that predates these reforms would not capture this new ground truth. Such a risk assessment tool would, oddly, compare post-reform defendants, who benefitted from being released immediately, to a very different group: defendants who were detained a few days or several weeks before being able to meet their financial conditions of release. Here, the risk assessment tool would be blind to the range of possible risk-mitigating reforms and would instead look back to pre-reform outcomes to characterize the defendant's risk. Doing so will, in all likelihood, overstate those defendants' risk of rearrest.203 And in jurisdictions where a higher risk assessment score for rearrest can lead to

200. Id. 201. Id. at 768 tbl.8. 202. Judge Lee H. Rosenthal's April 28 injunction on Harris County's bail system effectively forces this to occur. Her order required the sheriff's office to release those charged with misdemeanors within twenty-four hours of their arrest, unless they are wanted on a detainer from other jurisdictions, on immigration proceedings, on mental health concerns, or are held under family violence protection measures. Since early June, more than 2,600 defendants have been released on personal bond. See Gabrielle Banks & Mihir Zaveri, Harris County's Bail Battle to Resume Before Fifth U.S. Circuit in October, CHRON (Aug. 1, 2017, 12:55 PM), https://www.chron.com/news/houston-texas/article/Harris-County-s-historic-bail-battle-to-resume-11723175.php [https://perma.cc/26RR-ZB8Z]. 203. As detailed above, if defendants assigned the lowest bail amounts had been directly released (and did not spend any time in jail at all between arraignment and trial), these defendants as a group would have committed 1,600 fewer felonies and 2,400 fewer misdemeanors than they ultimately did after their eventual, later releases. Now, imagine two different risk assessment tools: one built on pre-reform data, and one built on post-reform data. In all likelihood, the post-reform risk assessment would observe significantly fewer pretrial rearrests, indicating a diminished risk of rearrest among those released pretrial.


a presumptive pretrial detention hearing, like New Jersey,204 it is possible that some poor defendants will be jailed because, historically, the jailing of similarly situated people left them incredibly ill-prepared to succeed on release.

4. Zombie Predictions May Dampen the Positive Effects of Other Reforms

Zombie predictions might perversely undermine the apparent impact of other bail reforms. A robust literature on the communication and framing of risk suggests that judges will perceive "risky" defendants as tainted, and will treat them differently.205 Thus, zombie predictions may steer a sizable number of defendants away from risk-mitigating reforms and, in turn, dampen the positive effects of policy reforms. Such systematic over-prediction of risk may also make the ground realities in a jurisdiction seem worse than they truly are. That is, policymakers may see that there are more "risky" defendants in their jurisdiction than they might have expected. Accordingly, policymakers might second-guess their pursuit of risk-mitigating reforms and focus instead on more punitive, restrictive conditions of release. This potential feedback loop is subtle, and may be hard to detect, but it should nevertheless concern reformers—if they cannot champion the positive results of new policies and procedures, their endeavor may backfire.

204. Previously, Rule 3:4A(b)(5) indicated that “The court may consider as prima facie evidence sufficient to overcome the presumption of release a recommendation by the Pretrial Services Program established pursuant to N.J.S.A. 2A:162–25 that the defendant’s release is not recommended (i.e., a determination that “release not recommended or if released, maximum conditions” (emphasis omitted)). State v. Mercedes, 183 A.3d 914, 922–23 (N.J. 2018). In State v. Mercedes, the state supreme court revised the rule to read as follows: “The standard of proof for the rebuttal of the presumption of pretrial release shall be by clear and convincing evidence. To determine whether a motion for pretrial detention should be granted, the court may take into account information about the factors listed in N.J.S.A. 2A:162–20.” Id. at 926. As Justice Albin noted in his concurrence, “[t]he pretrial services recommendation, through the court rule, could have operated to undermine the rebuttable presumption favoring pretrial release.” Id. at 931 (Albin, J., concurring) (emphasis added). 205. See, e.g., MORRIS E. CHAFETZ, THE TYRANNY OF EXPERTS: BLOWING THE WHISTLE ON THE CULT OF EXPERTISE 103 (1996); Leam A. Craig & Anthony Beech, Best Practice in Conducting Actuarial Risk Assessments with Adult Sexual Offenders, 15 J. SEXUAL AGGRESSION 193, 197 (2009); Joseph Sanders, The Merits of the Paternalistic Justification for Restrictions on the Admissibility of Expert Evidence, 33 SETON HALL L. REV. 881, 906 (2003); Nicholas Scurich et al., Innumeracy and Unpacking Bridging the Nomothetic/Idiographic Divide in Violence Risk Assessment, 36 LAW & HUM. BEHAV. 548, 548 (2012); Nicholas Scurich & Richard S. John, The Effect of Framing Actuarial Risk Probabilities on Involuntary Civil Commitment Decisions, 35 LAW & HUM. BEHAV. 83 (2011); Paul Slovic et al., Violence Risk Assessment and Risk Communication The Effects of Using Actual Cases, Providing Instruction, and Employing Probability Versus Frequency Formats, 24 LAW & HUM. BEHAV. 271 (2000).



Once risk is overestimated, defendants may be subject to stricter conditions of release through the decision-making framework.206 This observation is especially relevant given the literature on how lower-risk defendants perform with certain release conditions.207 Studies have shown that lower-risk defendants succeed on release (meaning fewer failures to appear, and fewer rearrests upon release) more often when released without conditions, and that placing conditions of release on lower-risk defendants can actually worsen their odds of success.208 Such a scenario—where defendants are systematically overestimated as riskier than they truly are, leading lower-risk defendants to be subjected to conditions of release that are counterproductive—could perversely sustain an avoidably elevated pretrial failure rate. Similarly, a systematic overestimation of rearrest risk may lead jurisdictions to unduly lean on more controversial, restrictive reforms, such as electronic monitoring.

Consider a hypothetical defendant. She was assessed by a pretrial risk assessment tool that was developed on historical data that predates her county's bail reform. The tool forecast her to be a moderate failure to appear risk and rearrest risk. Accordingly, the jurisdiction's decision-making framework called for her to be subject to monthly in-person reporting to pretrial services, monthly phone check-ins, as well as a curfew. In reality, she was a busy single mother who simply needed a timely phone reminder to ensure her appearance. If she had been assessed as a lower failure to appear risk, that is the only intervention she would have received. However, the zombie prediction led her to be subject to counterproductive conditions of release. For a single mother, the in-person check-ins and curfew were difficult to manage. Ultimately, she failed to appear for some of her court dates, though she appeared at most.

This hypothetical is of course stylized. But the risks are plausible given the implementation of today's bail reforms.

206. Decision-making frameworks are discussed more in the next section. 207. See generally CHRISTOPHER T. LOWENKAMP & MARIE VANNOSTRAND, EXPLORING THE IMPACT OF SUPERVISION ON PRETRIAL OUTCOMES (2013) (finding that “the differences between those who received pretrial supervision and those who did not was most pronounced for higher-risk defendants”); Marie VanNostrand & Gena Keebler, Pretrial Risk Assessment in the Federal Court, FED. PROBATION, Sept. 2009, at 30, http://www.uscourts.gov/sites/default/files/fed_probation _sept_2009_test_2.pdf [https://perma.cc/63ZB-4B6Y] (“Paradoxically, when required of lower-risk defendants, i.e., risk levels 1 and 2, release conditions that included alternatives to detention were more likely to result in pretrial failure. These defendants were, in effect, over-supervised given their risk level.”). 208. See LOWENKAMP & VANNOSTRAND, supra note 207; VanNostrand & Keebler, supra note 207.



As an example, consider New Jersey. Below is a comparison of the percentage of defendants that the state expected would be subject to each set of conditions of release—what New Jersey calls pretrial monitoring levels (PML)209—versus the actual percentages of defendants who were subject to each level.210

Table 9: New Jersey’s Projected Conditions of Release vs. Actuals 1/1/17-12/31/17211

              DMF recommended %s    Actual State Avg. %s      Net difference
                                    (1/1/17-12/31/17)
ROR                 26.9                    7.5                   - 19.4
PML 1               24.6                   20.6                   - 4.0
PML 2               16.3                   14.8                   - 1.5
PML 3                9.8                   26.4                   + 16.6
PML 3+               2.4                    8.3                   + 5.9
Detention            5.1                   18.1                   + 13.0

209. ACLU OF N.J., NAT’L ASS’N OF CRIMINAL DEF. LAWYERS & STATE OF N.J. OFFICE OF THE PUB. DEF., THE NEW JERSEY PRETRIAL JUSTICE MANUAL 11 (2016) [hereinafter N.J. PRETRIAL JUSTICE MANUAL], https://www.nacdl.org/NJPretrial/ [https://perma.cc/R6V3-5BSG]. 210. Chart A Initial Release Decisions for Criminal Justice Reform Eligible Defendants January 1 – December 31, 2017, N.J. COURTS, https://www.judiciary.state.nj.us/courts/assets/criminal/cjrreport.pdf [https://perma.cc/3PWV- GVTY]; see also Chart A Initial Release Decisions for Criminal Justice Reform Eligible Defendants January 1, 2018 – September 30, 2018, N.J. COURTS, https://www.njcourts.gov/courts/assets/criminal/cjrreport2018.pdf?cacheID=R6PoVYo [https://perma.cc/M69V-RZDG]. 211. Chart A Initial Release Decisions for Criminal Justice Reform Eligible Defendants January 1 – December 31, 2017, supra note 210.



Figure 4: (Taken from Data in Table 9)

Table 10: New Jersey’s Projected Conditions of Release vs. Actuals 1/1/18-9/30/18212

              DMF recommended %s    Actual State Avg. %s      Net difference
                                    (1/1/18-9/30/18)
ROR                 26.9                    8.8                   - 18.1
PML 1               24.6                   17.2                   - 7.4
PML 2               16.3                   17.3                   + 1.0
PML 3                9.8                   27.2                   + 17.4
PML 3+               2.4                    5.7                   + 3.3
Detention            5.1                   19.2                   + 14.1

Here, roughly an additional one-third of all defendants were subject to the most restrictive conditions of release, or were denied release altogether, compared with what the state anticipated. And while the state expected more than a quarter of defendants to be released on recognizance, only 7.5–8.8% of defendants were. Meanwhile, although only about 10% of defendants were supposed to be subject to the relatively intense monitoring known as PML 3, in fact more than a

212. Chart A Initial Release Decisions for Criminal Justice Reform Eligible Defendants January 1, 2018 – September 30, 2018, supra note 210.


quarter were. The projected and actual intensities of pretrial supervision are nearly perfectly reversed.

Based on available data, it is difficult to explain these gaps.213 One potential explanation is that more defendants than expected were charged with crimes that carry a presumptive recommendation of detention or high levels of supervision. Another possibility is that defendants have systematically been overestimated as risky, and thus subjected to more punitive and restrictive conditions of release. Similarly, it could be that defendants' risk was assessed as lower, but judges systematically increased conditions of release. Of course, all of the above could be true. But, under this Article's argument, when defendants are systematically overestimated as riskier than they truly are (and thus subject to conditions of release that are potentially counterproductive), jurisdictions could, perversely, sustain an avoidably elevated pretrial failure rate.214

As a result, policymakers in the future might look back on the move toward non-financial conditions of release as misguided and might inaccurately conclude that, despite its ills, a money bail system is the least bad option. Here, the history of bail reform is instructive: conservatives in the late 1960s used the logic of liberal reforms to, in turn, advocate for a broader, more punitive system. Similarly, though pretrial risk assessment tools currently enjoy bipartisan support in the mission of decarceration, these tools and their accompanying decision-making frameworks are vulnerable to a new "law and order" turn. In fact, New Jersey's Attorney General recently released modified guidance related to the state's decision-making framework, adjusting many recommendations to favor lower standards for pretrial detention.215

213. To our knowledge, New Jersey has not released data on failure to appear or rearrest rates for 2017. Nor has New Jersey released data on how many defendants received what kind of PSA classification. Thus, we cannot compare risk forecasts against conditions of release and pretrial failure rates. As a result, we cannot clearly evaluate whether or not defendants were systematically overestimated as riskier than they truly were. Nor can we see the effects these conditions of release have. 214. We do not suggest that in every case this would be true. Of course, local conditions vary considerably. Some jurisdictions might have put a premium on releasing a majority of defendants on their own recognizance, or with a minimal set of conditions. Others might experience a decline in cases where certain charges require certain more restrictive conditions of release. 215. For example, many modifications "recalibrate[d] the presumptions for pretrial-detention applications that are triggered by the PSA scores" downward. See Memorandum from Christopher S. Porrino, N.J. Att'y Gen. to Director, Div. of Criminal Justice et al. (May 24, 2017), http://nj.gov/oag/newsreleases17/Revised-AG-Directive-2016-6_Introductory-Memo.pdf [https://perma.cc/722W-TVX2].



C. Frameworks of Moral Judgment

Between any statistical estimate of risk and the deciding judge, there is an important mediating layer—a series of choices about what the risk estimates mean, how much risk is tolerable, how to manage that risk, and how to communicate that information to a judge. Though there is growing public debate about the quantitative interstices of risk assessment, there is relatively little discussion of the vital policy judgments that render those risk numbers into actionable advice.216 These policy judgments are often represented in matrices, somewhat interchangeably referred to as a “structured decision-making process,” “pretrial decision-making matrix,” “decision making framework,” “praxis,” or, most recently, a “release conditions matrix.”217 These

216. See, e.g., Sam Corbett-Davies, Emma Pierson, Avi Feller & Sharad Goel, A Computer Program Used for Bail and Sentencing Decisions was Labeled Biased Against Blacks. It’s Actually Not That Clear, WASH. POST (Oct. 17, 2017), https://www.washingtonpost.com/news/monkey- cage/wp/2016/10/17/can-an-algorithm-be-racist-our-analysis-is-more-cautious-than- propublicas/?utm_term=.33e6169658ac [https://perma cc/8Q3U-25HS]; Sam Corbett-Davies, Sharad Goel & Sandra González-Bailón, Even Imperfect Algorithms Can Improve the Criminal Justice System, N.Y. TIMES (Dec. 20, 2017), https://www.nytimes.com/2017/12/20/upshot/algorithms-bail-criminal-justice-system.html (last visited Nov. 11, 2018); Issie Lapowsky, One State’s Bail Reform Exposes the Promise and Pitfalls of Tech-Driven Justice, WIRED (Sept. 5, 2017, 7:00 AM), https://www.wired.com/story/bail-reform- tech-justice/ [https://perma.cc/G9YK-7JUZ]; Adam Liptak, Sent to Prison by a Software Program’s Secret Algorithms, N.Y. TIMES (May 1, 2017), https://www.nytimes.com/2017/05/01/us/politics/sent-to-prison-by-a-software-programs-secret- algorithms.html (last visited Nov. 11, 2018); Megan Stevenson, Opinion, In US Bail Reform, Justice-By-Algorithm Can Only Go So Far, HILL (Feb. 2, 2018, 8:15 AM), https://thehill.com/opinion/civil-rights/371954-in-us-bail-reform-justice-by-algorithm-can-only-go- so-far [https://perma.cc/6V3U-GW64]; Elizabeth Glazer, Hannah Jane Sassaman & Jon Wool, Debating Risk-Assessment Tools Experts Weigh in on Whether Algorithms Have a Place in our Criminal Justice System, MARSHALL PROJECT (Oct. 25, 2017, 2:29 PM), https://www.themarshallproject.org/2017/10/25/debating-risk-assessment-tools [https://perma.cc/43MU-XA7R]; Logan Koepke, A Reality Check Algorithms in Courtroom, MEDIUM (Dec. 21 2017), https://medium.com/equal-future/a-reality-check-algorithms-in-the- courtroom-7c972da182c5 [https://perma.cc/S3ZR-TN6X]; Jon Schuppe, Post Bail, NBC NEWS (Aug. 22, 2017), https://www.nbcnews.com/specials/bail-reform [https://perma.cc/VPE4-SYFZ]. 217. See generally MONA DANNER, MARIE VANNOSTRAND & LISA SPRUANCE, RACE AND GENDER NEUTRAL PRETRIAL RISK ASSESSMENT, RELEASE RECOMMENDATIONS, AND SUPERVISION: VPRAI AND PRAXIS REVISED (2016), https://www.dcjs.virginia.gov/sites/dcjs.virginia.gov/files/publications/corrections/race-and-gender- neutral-pretrial-risk-assessment-release-recommendations-and-supervision.pdf [https://perma.cc/LER3-HTB2]; N.J. COURTS, PRETRIAL RELEASE RECOMMENDATION DECISION MAKING FRAMEWORK (DMF) (2018), https://www.njcourts.gov/courts/assets/criminal/decmakframwork.pdf [https://perma.cc/L598- Y5W3]; LAURA & JOHN ARNOLD FOUND., PUBLIC SAFETY ASSESSMENT IMPLEMENTATION GUIDES: 9. GUIDE TO THE RELEASE CONDITIONS MATRIX (2018) [hereinafter GUIDE TO THE


matrices attach risk assessment scores to a proposed course of action for a judge or pretrial service agency. The frameworks are similar to bail schedules. But, where charged offenses might have previously determined outcomes in bail schedules, risk assessment scores guide conditions of release (or non-release) in decision-making frameworks. Below is a visual summary of the decision-making framework used in New Jersey.218

Figure 5: New Jersey’s Decision-making Framework219

In the above example, a defendant who receives a new criminal activity score of four ("NCA 4") and a failure to appear score of three ("FTA 3") would be recommended for what New Jersey calls "pretrial monitoring level 2" ("PML 2"). Each pretrial monitoring level calls for different non-financial conditions of release, with each increase in pretrial monitoring level calling for a corresponding increase in the number or kind of conditions. For example, PML 2 includes various conditions for a

RELEASE CONDITIONS MATRIX], https://psapretrial.org/implementation/guides/managing- risk/guide-to-the-release-conditions-matrix (last visited Nov. 6, 2018). 218. N.J. COURTS, supra note 217, at 4. 219. Id.


pretrial defendant, like reporting once a month in person, checking in once a month by phone, and abiding by some set of other restrictions, like a curfew.220 (A simplified illustration of such a score-to-conditions mapping appears at the end of this discussion.)

In addition to the moral and political question of how to respond to risk, there are essential factual questions at the core of risk assessment implementation that today's instruments make no attempt to answer. These questions concern responsiveness: how will various defendants respond to particular pretrial services and conditions? Pretrial risk assessment instruments do not speak to which conditions will work best, or will prove harmful, for which defendants. In short, it is vitally important to understand that the specific contents of these grids are not supplied by statistical evidence. For all the public discussion of validation and evidence-based practice, implementing a risk assessment still requires someone to use clinical judgment to estimate, non-statistically, which defendants would be helped or harmed by which conditions. To be sure, some numbers do exist that bear on this question—for example, as discussed above, studies have found that imposing burdensome conditions can actually increase the risks posed by low-risk defendants. But when a jurisdiction applies such findings or assesses local numbers to create its own grid, it inevitably must run such numbers through a filter of human judgment. Even in the most evidence-based of jurisdictions, the decision-making framework necessarily reflects expert opinion about which conditions tend to work well for whom. And such opinions, while a vital resource, should not be regarded as scientific findings.
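To make the mechanics of such a grid concrete, below is a minimal, purely illustrative Python sketch of how a decision-making framework converts a pair of risk scores into a recommended monitoring level. The function name, score ranges, and every cutoff are invented for this illustration; they are not New Jersey's actual DMF or any other deployed framework, and the only constraint imposed here is that (NCA 4, FTA 3) maps to PML 2, consistent with the example in the text.

    def recommend_monitoring_level(nca: int, fta: int) -> str:
        # nca: new criminal activity score (1-6); fta: failure to appear score (1-6).
        # All cutoffs below are hypothetical; real frameworks set them by policy.
        if nca >= 6 or fta >= 6:
            return "PML 3+ or detention hearing"  # most restrictive tier
        if nca >= 5 or fta >= 5:
            return "PML 3"                        # e.g., weekly reporting
        if nca >= 3 or fta >= 3:
            return "PML 2"                        # e.g., monthly in-person and phone reporting
        if nca >= 2 or fta >= 2:
            return "PML 1"                        # e.g., monthly phone check-in
        return "ROR"                              # release on own recognizance

    print(recommend_monitoring_level(nca=4, fta=3))  # prints "PML 2" under these invented cutoffs

The point of the sketch is not the particular thresholds but their provenance: nothing in the underlying statistical model dictates where these lines are drawn. They are policy choices, which is precisely why this Article treats them as questions of moral and political judgment.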

1. Undemocratic Justice

Determining how much risk a society should tolerate, and then formalizing that answer inside decision-making frameworks, is a difficult political and moral question, not a primarily technical one. To date, however, this decision has generally not been a target of considered political or policy debate.

220. Id. at 10. Note that, in some jurisdictions, some charges or circumstances may predetermine an outcome where pretrial detention will be ordered regardless of the risk assessment result. Typically, these charges might include murder, rape, first degree robbery, felony domestic violence, violation of a protective order, felony sex crimes, or charges involving the use of a weapon. Another example of a DMF matrix can be seen in Volusia County, Florida. There, the release with conditions level 1 requires monthly reporting, release with conditions level 2 requires bi-weekly reporting, and release with conditions level 3 requires weekly in-person meetings. See DAL PRA, supra note 150, at 50.



The description of a defendant or group of defendants as "high risk," for example, singles that group out for different treatment, and there is no mathematical rule about how expansive the category should be, or what it should or does mean.221 To see the quandary, it may be helpful to imagine defendants lined up in descending order of their respective risk levels as calculated by a statistical model. The most widely used metric for the performance of a risk assessment instrument is the "Area Under the Curve" (AUC) metric, which simply measures the likelihood that when two individuals are picked at random, the one with the higher score actually does have a higher true level of risk.222 In other words, the AUC measures the extent to which a risk tool places different defendants into the correct rank order of riskiness for whatever the tool measures. The AUC does not, however, say anything about the size of the difference in risk level between any two individuals, or between any two deciles in the risk distribution. (A short illustrative sketch of this rank-ordering interpretation appears at the end of this subsection.)

The question of how to define and use risk categories, in short, may best be answered by tandem consideration of three things—a community's preferences and moral judgments; the specifics of how "risk" has been defined and measured; and a histogram that literally displays the shape of how that risk is distributed in the population of defendants. To date, few if any jurisdictions have successfully combined these elements. Instead, risk categories are defined by technicians and interpreted (or at times misinterpreted) by judges. One advocate who works on bail reform across many U.S. jurisdictions told us that, when he asks judicial system stakeholders what "high risk" means in their jurisdiction, many confess ignorance and others speculate that high risk may mean a greater than 50% chance of reoffending.223 (In fact, even for

221. This is especially true where the Public Safety Assessment’s dashboard displays a “stop sign” when a defendant’s new criminal activity is too high and is deemed to be an “elevated risk of violence.” See LAURA & JOHN ARNOLD FOUND., PUBLIC SAFETY ASSESSMENT 10, https://nccalj.org/wp-content/uploads/2016/02/Virginia-Bersch-PSA-State-of-North-Carolina.pdf (2016) [https://perma.cc/7SP9-7WXM] (PowerPoint presentation). 222. Jay P. Singh, Predictive Validity Performance Indicators in Violence Risk Assessment A Methodological Primer, 31 BEHAV. SCI. & LAW 8, 19 (2013) (“[A] number of performance indicators are available to researchers . . . the AUC has become ubiquitous in studies attempting to establish predictive validity.”). 223. More generally, the vocabulary used in the pretrial risk context—“this person is a high risk”—largely diverges from how individuals perceive the assessed risk. That is: though an individual may be assessed as “high risk,” the rate of reoffense for the “high risk” group may resemble a probability that is rather low, not “high.” Take the earlier CPAT example. The “highest risk” category of failure to appear in the PSA is 40%. Within the internal logic of the pretrial risk assessment tool and relative sample, 40% is “high risk.” But more generally, a 40% probability that


the highest risk categories, the actual failure rates are much lower than this.)

Further, the process by which a society determines and formalizes answers to how much risk to tolerate must be a democratic one. There is, potentially, a strong incentive for certain actors within the criminal justice system to perpetuate relatively high levels of pretrial detention. This is especially true of private actors who contract with local governments.224 This is not to say that local governments should be wary of expert help available from private industry, just that those actors should not play an outsized role in developing decision-making frameworks. Nor should policymakers simply rely upon a decision-making framework successfully developed and deployed in another jurisdiction. True, there may also be a strong political incentive to rely on contractors: the message that a framework developed by experts has succeeded elsewhere might seem attractive. However, the question being addressed is fundamentally about the community itself. Thus, the process should include elected policymakers, judges, public defenders, individuals returning from incarceration, advocates, prosecutors, and the general public.225 A broad-based coalition would not only likely enhance perceptions of a decision-making framework's legitimacy, but also empower communities to stick with their plan of reform after high-profile incidents of pretrial crime.
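As a concrete illustration of the rank-ordering interpretation of the AUC described above, the short Python sketch below computes an AUC directly as the share of pairs (one defendant who later failed pretrial, one who did not) in which the tool assigned the higher score to the defendant who failed, with ties counted as half. The scores are toy values invented for this illustration; they do not come from any real instrument or dataset.

    from itertools import product

    def pairwise_auc(scores_failed, scores_succeeded):
        # Share of (failed, succeeded) pairs ranked correctly; ties count half.
        pairs = list(product(scores_failed, scores_succeeded))
        correct = sum(1.0 if f > s else 0.5 if f == s else 0.0 for f, s in pairs)
        return correct / len(pairs)

    # Hypothetical risk scores on a 1-6 scale:
    failed = [5, 4, 6, 3]        # defendants who later failed pretrial
    succeeded = [2, 3, 1, 4, 2]  # defendants who did not

    print(pairwise_auc(failed, succeeded))  # 0.9 for these invented scores

The result says only that the tool tends to rank the two groups in the right order; it says nothing about how large the underlying differences in risk are, or where along that ranking a jurisdiction should draw the line labeled "high risk," which is precisely the gap that the policy judgments described in this subsection must fill.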

D. Longer-term Dangers

Beyond the immediate concerns detailed above, a bail reform movement predicated on the widespread adoption of pretrial risk assessment also presents at least three longer-term dangers to the norms

some event will occur could easily be understood as “unlikely” or “doubtful.” See, e.g., RICHARDS J. HEUER, JR., PSYCHOLOGY OF INTELLIGENCE ANALYSIS 154–55 (1999); Zonination, Perceptions of Probability and Numbers - Gallery, GITHUB, https://github.com/zonination/perceptions [https://perma.cc/96BZ-9JZS]. 224. That is to say, so long as “problems” continue to exist, private actors can continue to sell their services to help alleviate that problem. 225. Here, Mesa, Colorado actually serves as a good example. Officials developed a pretrial working group which consisted of “Judges, Public Defenders, District Attorneys, Private Defense Lawyers, Pretrial Services Officials, Mesa County Jail Officials, and Victim Advocates.” See MESA CTY., MESA COUNTY EVIDENCE-BASED PRETRIAL IMPLEMENTATION GUIDE 6 (2015), https://web.archive.org/web/20170202153208/http://www.apainc.org/wp-content/uploads/Mesa- County-Evidenced-Based-Pretrial-Implementation-Guide-1.pdf (last visited Nov. 12, 2018). Ultimately, “new Guidelines were developed collaboratively, albeit through many heated and confrontational meetings . . . . The Chief Judge, the District Attorney and the Sheriff signed this document, which showed a strong collaborative framework. The public and private defenders also showed support in the development and implementation of this document.” Id. at 11.


and jurisprudence of pretrial justice. First, statistical risk assessment—including future machine learning-based approaches—may insulate poorly defined concepts of "dangerousness" from essential scrutiny. Second, the embrace of a risk-based approach could ultimately trigger an increase in preventive detention. Third, the constitutionality of preventive detention will be left unexamined because pretrial risk assessments tacitly presume that Salerno's core holding was correct.

1. Insulating Nebulous Concepts of “Dangerousness” From Scrutiny

One of the great ironies of the push to consider dangerousness at bail is that none of the professional communities involved has seized the mantle to define dangerousness. As Marc Miller and Norval Morris argued, by initially considering predictions of dangerousness to be “‘the province of psychiatry,’ lawyers foreclosed appropriate jurisprudential consideration of the use of predictions.”226 That reliance came at a time when psychiatrists disclaimed their ability as a profession to predict future dangerousness.227 With no one stakeholder claiming the mantle, courts began to allow “much greater reliance . . . on psychological predictions of dangerousness than do the organized professions of psychiatry and psychology.”228 This pattern of de facto abdication of responsibility by lawyers and jurists continues today, but with the developers of risk assessment tools stepping into the extra-judicial, expert role formerly filled by psychologists.229 As risk assessment designers move from today’s logistic regression-based techniques toward more complex machine learning techniques, this may reinforce lawyers’ impression that they do not belong at the table. If lawyers conceive predictions of pretrial failure

226. Marc Miller & Norval Morris, Predictions of Dangerousness Ethical Concerns and Proposed Limits, 2 NOTRE DAME J.L. ETHICS & PUB. POL’Y 393, 404 (1987). 227. Brief for Am. Psychiatric Ass’n as Amicus Curiae at 12, Barefoot v. Estelle, 463 U.S. 880 (1983) (No. 82-6080) (“The unreliability of psychiatric predictions of long-term future dangerousness is by now an established fact within the profession.”). 228. Miller & Morris, supra note 226. 229. As an example, consider the Public Safety Assessment’s prediction of “New Violent Criminal Activity.” “Violent criminal activity” might, on the surface, seem like a good approximation for “dangerousness.” But, as detailed above, the PSA was developed on data from nearly 200 different counties and cities and nearly 100 federal judicial districts, which, as a group, do not share a uniform definition of violent felonies. Thus, when the PSA tool claims to predict “New Violent Criminal Activity,” it may not be possible to state precisely what the PSA’s violence model is even trying to predict. Put differently, the instrument may have as its outcome variable a statistical amalgam that does not precisely match any real world definition of “violence,” and is instead unaccountably intermediate among the range of definitions employed by the jurisdictions in its training set.


as the province of the data scientists who design new, advanced pretrial risk assessment tools, they may, yet again, foreclose appropriate jurisprudential consideration of the use of those predictions. Similarly, courts may begin to place much greater reliance on what a sophisticated computer program says, even where data scientists disclaim their program's ability to predict pretrial failure for a specific individual.

2. The Expansion of Preventive Detention

A bail regime predicated on pretrial risk assessment may reduce the overall incidence of pretrial detention, especially where a risk assessment replaces a money bail system. Simultaneously, however, such a bail regime could lead to an increase in the number of defendants who are preventively detained pretrial—meaning they were never offered a path to release. For various reasons, explored more below, such a development should concern bail reformers.

Consider Maryland. Statewide, one quarter of defendants are held without bail after their initial appearance.230 One explanation for this level of detention may be that only eleven of Maryland's twenty-four counties have pretrial service agencies.231 As a result, judges may feel that they must detain some individuals whom they would otherwise release if the necessary supports or services were available. However, that theory does not explain the results we see elsewhere in the state. In fact, only two counties in Maryland use both pretrial risk assessment and pretrial services.232 Montgomery County, the state's largest, is one of them.233 In September 2016, only 3.4% of defendants were held without bail after their initial appearance.234 One year later, as of September 2017, 19.3% of defendants were held without bail after their initial

230. MD. JUDICIARY, IMPACT OF CHANGES TO PRETRIAL RELEASE RULES 33 (2018), https://mdcourts.gov/sites/default/files/import/media/newsitem/reference/pdfs/impactofbailreviewre port.pdf [https://perma.cc/A3LV-238E]. 231. JAMES AUSTIN & JOHNETTE PEYTON, MARYLAND PRETRIAL RISK ASSESSMENT DATA COLLECTION STUDY app. B at 34 (2014), http://goccp.maryland.gov/pretrial/documents/2014- pretrial-commission-final-report.pdf [https://perma.cc/W8BQ-XDNS]. 232. CHRISTINE BLUMAUER ET AL., ADVANCING BAIL REFORM IN MARYLAND: PROGRESS AND POSSIBILITIES 26 (2018), http://wws.princeton.edu/sites/default/files/content/Advancing_Bail_Reform_In_Maryland_2018- Feb27_Digital.pdf [https://perma.cc/ZBQ7-EYQK]. 233. Bail System Reform FAQ, MD. ATT’Y GEN., http://www.marylandattorneygeneral.gov/Pages/BailReform.aspx [https://perma.cc/57BE-4WBQ] (“Montgomery County and St. Mary’s County are using a validated risk assessment tool for every defendant.”). 234. MD. JUDICIARY, supra note 230, at 19.


appearance.235 St. Mary's County, the other Maryland county with both a pretrial risk assessment and a pretrial services agency, saw the share of defendants held without bail grow from 5.7% in September 2016 to 15.7% in September 2017.236 Similarly, three months after reforms were enacted in New Jersey, 12.4% of defendants were preventively detained.237 As of December 2017, nearly 20% of defendants were detained without bail.238 Of course, these statistics do not capture the far greater number of detention motions filed. In 2017, New Jersey filed nearly 20,000 detention motions.239 The state has kept pace in 2018: from January 1 to September 30, 2018, nearly 17,000 detention motions were filed.240 Similarly, detention in Lucas County, Ohio, increased after the county implemented risk assessment.241 Overall, even where risk assessment tools are adopted to advance explicitly liberal reforms, nothing inherent to risk assessment guarantees liberal results.242

235. Id. at 33. 236. Id. at 19, 33. 237. Judiciary Budget for Fiscal Year 2018 Hearing Before the Senate Budget and Appropriations Committee, 217th Leg. 4 (N.J. 2017) (remarks of J. Glenn A. Grant, Acting Administrative Director of the Courts), http://www.judiciary.state.nj.us/pressrel/2017/SenateBudgetCommitteeRemarks_May_4_2017.pdf [https://perma.cc/MWP9-X736]. 238. Chart A Initial Release Decisions for Criminal Justice Reform Eligible Defendants January 1 – December 31, 2017, supra note 210. 239. Chart B - Supplemental Graph Detention Motions for Criminal Justice Reform Eligible Defendants, Defendants Arrested in January 1 - December 31, 2017, N.J. COURTS, https://www.judiciary.state.nj.us/courts/assets/criminal/cjrreport.pdf [https://perma.cc/3PWV-GVTY] (indicating that 41% of motions were granted). 240. Chart B - Supplemental Graph Detention Motions for Criminal Justice Reform Eligible Defendants, Defendants Arrested in January 1, 2018 - September 30, 2018, N.J. COURTS, https://www.njcourts.gov/courts/assets/criminal/cjrreport2018.pdf?cacheID=R6PoVYo [https://perma.cc/M69V-RZDG] (showing that 39% of motions were granted). 241. MARIE VANNOSTRAND, LUMINOSITY, ASSESSING THE IMPACT OF THE PUBLIC SAFETY ASSESSMENT, LUCAS COUNTY, OHIO 15 (observing that nearly 6% more defendants remained in custody until the final disposition of their case). 242. Consider the immigration context (which is different in significant ways but supports the broader point). In 2013, U.S. Immigration and Customs Enforcement (ICE) deployed the Risk Classification Assessment (RCA) system. Mark Noferi & Robert Koulish, The Immigration Detention Risk Assessment, 29 GEO. IMM. L.J. 45, 48 (2014). The RCA forecast public safety and flight risks to help ICE officials make release or detain recommendations. Id. The purpose of the RCA was to "foster alternatives to detention" by ascertaining the "'optimal pool of participants.'" Id. at 59–60. Notably, immigration advocates "uniformly embraced risk assessment with only qualified concerns." Id. at 48. But the tool's flight risk assessment was based on an interview with questions that such individuals, justifiably, might not want to answer fully or truthfully, like family, residency, and work authorization history. Ultimately, the RCA over-classified individuals as medium and high flight risks and recommended less than 1% of arrestees for release in Baltimore. Robert Koulish, Immigration Detention in the Risk Classification Assessment Era, 16 CONN. PUB.



Perhaps California’s new Senate Bill 10 is most illustrative of this concern: the law is rife with various detention provisions and triggers. For example, the statute directs the newly-established Pretrial Assessment Services to not release anyone assessed as high risk.243 Regardless of a person’s risk score, if they have violated a condition of release in the past five years, the law commands Pretrial Assessment Services to not release them.244 A high risk assessment score, combined with one of four other factors, establishes a “rebuttable presumption that no condition or combination of conditions of pretrial supervision will reasonably assure public safety.”245 While individuals assessed as low risk are supposed to be released,246 whether individuals assessed as medium risk will be released or detained is subject to the local rules of each superior court.247 More broadly, the law empowers prosecutors to file a motion seeking prevent detention at any time if the prosecutor believes there is a “substantial reason to believe that no nonmonetary condition or combination of conditions of pretrial supervision will reasonably assure protection of the public or a victim.”248 Notably, “substantial reason” is not defined.249 Functionally, that discretion is essentially unlimited.250

INT. L.J. 1, 5 (2016). Overall, though the RCA was supposed to aid ICE in fostering alternatives to detention and alleviate bed shortages for detainees, it had exactly the opposite effect. See id. at 33. 243. Act of Aug. 28, 2018, ch. 244, § 1320.10(e)(1), 2018 Cal. Legis. Serv. (West) (S.B. 10). 244. Id. § 1320.10(e)(11) (“Notwithstanding subdivisions (a) and (b), Pretrial Assessment Services shall not release . . . [a] person who has violated a condition of pretrial release within the past five years.”). 245. Id. § 1320(a)(2). 246. Id. § 1320.10(b). 247. Id. § 1320.11(a) (stating that the “local rule may further expand the list of offenses and factors for which prearraignment release of persons assessed as medium risk is not permitted but shall not provide for the exclusion of release of all medium-risk defendants by Pretrial Assessment Services”). Local advocates have expressed substantial fear related to this provision. See Eric Westervelt, California’s Bail Overhaul May Do More Harm Than Good, Reformers Say, NPR (Oct. 2, 2018), https://www.npr.org/2018/10/02/651959950/californias-bail-overhaul-may-do-more-harm- than-good-reformers-say [https://perma.cc/Y6FD-C9E7] (“Local counties based on their political environment and based on their own tendencies around incarceration could increase the net of pretrial detention by simply changing the dial and saying that not only are higher risk people now excluded from release but we also think this broad swath we considered moderate risk would also be excluded from release.” (quoting Raj Jayadev, co-founder of the social and legal advocacy group Silicon Valley De-Bug)). 248. Act of Aug. 28, 2018, ch. 244, § 1320.18(5), 2018 Cal. Legis. Serv. (West) (S.B. 10). 249. See Westervelt, supra note 247 (quoting Chesa Boudin, a public defender in San Francisco, who notes that “[u]nder this law prosecutors have the discretion to seek pre-emptive detention of a person with no criminal record charged with a low level misdemeanor”). 250. Cf. CAL. CONST. art. 1, § 12, which states that “[a] person shall be released on bail by sufficient sureties, except for” capital crimes, violent felonies, and felonies—where the latter two



In upholding the 1984 Bail Reform Act, Chief Justice Rehnquist wrote that, “[i]n our society liberty is the norm, and detention prior to trial or without trial is the carefully limited exception.”251 Of course, this stated principle rings hollow when compared to today’s stark reality. True, an emergent consensus says that individuals should not be detained before trial simply because they are too poor to pay their way out. Yet, even in New Jersey, a pioneering state that eliminated money bail in favor of widespread pretrial risk assessment, nearly one-fifth of defendants are detained pretrial. Ultimately, reforms predicated on pretrial risk assessment may expand the instances in which defendants are preventively detained and denied bail pretrial.252

3. Conceding Salerno

Only a few decades ago, the constitutional propriety of predicting dangerousness pretrial was a matter of vociferous and widespread debate. Yet, oddly, a practice that was once seen as fundamentally at odds with the U.S. Constitution and our system of moral judgment is now seen as an obvious, rational component of pretrial decision-making.253 Though there "has been relatively little innovation in the law and scholarship on bail in the twenty years since Salerno,"254 the new era of bail reform requires such innovation. In fact, the logic of a bail reform model predicated on risk assessment "requires that judges have authority to order pretrial preventive detention."255 But today, only twenty-two states, the District of Columbia, and the federal system authorize

offenses require other, particular findings, like "clear and convincing evidence that the person has threatened another with great bodily harm and that there is a substantial likelihood that the person would carry out the threat if released." 251. United States v. Salerno, 481 U.S. 739, 755 (1987). 252. True, in many states before risk assessment reforms were implemented the "detention net" was already wide. That is, risk assessment reforms did not drive the increase in charges in which a prosecutor could file a preventive detention motion. But the adoption of risk assessment as a means of reform, like in California, shines a bright light on that wide detention net because it will be the only way in which the criminal justice system detains individuals pretrial—money bail will not offer a sub rosa means to do so. 253. Mayson, supra note 24, at 495 (2018) ("[A]uthorities on pretrial law and policy—including pretrial laws themselves—now universally identify . . . protecting the public from harm at the hands of defendants" as a core purpose of the pretrial system). 254. Samuel Wiseman, Discrimination, , and the Bail Reform Act of 1984: The Loss of the Core Constitutional Protections of the Excessive Bail Clause, FORDHAM U. L.J. 121, 123 (2009). 255. Mayson, supra note 24, at 515.


preventive detention, and those only in some circumstances.256 "[T]wenty-three states still guarantee a broad constitutional right to bail and [would] have to amend their constitutions to authorize preventive detention without bail."257 Actuarial tools may, in short, offer bail reformers a Faustian trade-off: a new approach to pretrial justice that comes with a chance (and a hope) of reduced incarceration, but that also ratifies recent erosions of the fundamental rights of the accused. This Article contends that reformers need to be more attentive to this trade-off.

This question ultimately points back to Salerno, and the contested question of whether that case was rightly decided. In examining Salerno, scholars take particular issue with Salerno's conclusion that preventive detention would not be punitive, and with its treatment of the risk of error in preventive detention decisions. To determine whether or not a governmental act—in this case, preventive pretrial detention—had punitive effect, the Salerno Court applied Kennedy v. Mendoza-Martinez.258 Under that case, the first step was to examine the legislative history of the 1984 Bail Reform Act to determine if there was explicit punitive intent. According to the Court in Kennedy, if no punitive congressional intent is discernible, "each factor of the [following] test is to be weighed"259:

    [1] Whether the sanction involves an affirmative disability or restraint, [2] whether it has historically been regarded as a punishment, [3] whether it comes into play only on a finding of scienter, [4] whether its operation will promote the traditional aims of punishment—retribution and deterrence, [5] whether the behavior to which it applied is already a crime, [6] whether an alternative purpose to which it may rationally be connected is assignable for it, and [7] whether it appears excessive in relation to the alternative purpose assigned.260

Where "conclusive evidence of congressional intent as to the penal nature of a statute" is not available, the seven factors "must be considered in relation to the statute on its face."261

256. SUSAN KEILITZ & SARA SAPIA, NAT'L CTR. FOR STATE COURTS, PREVENTIVE DETENTION 1 (2017). 257. Mayson, supra note 24, at 515–16. 258. 372 U.S. 144 (1963). 259. Michael J. Eason, Eighth Amendment—What Will Become of the Innocent, 78 J. CRIM. L. & CRIMINOLOGY 1048, 1061–62 (1988). 260. Kennedy, 372 U.S. at 168–69 (citations omitted). 261. Id. at 169 (emphasis added).



But, as Professor Jean Koh Peters argues, the two key cases upon which Salerno relied had already derogated from this test. In Bell v. Wolfish,262 the Court stated that:

    [a]bsent a showing of an express[ ] intent to punish . . . that determination generally will turn on "whether an alternative purpose to which [the restriction] may rationally be connected is assignable for it, and whether it appears excessive in relation to the alternative purpose assigned [to it]." Thus, if a particular condition or restriction of pretrial detention is reasonably related to a legitimate governmental objective, it does not, without more, amount to "punishment."263

By narrowly interpreting Kennedy this way, "the Bell Court effectively amputated the first five Kennedy criteria."264 Of utmost importance, according to the Bell Court, was whether or not the policy in question was reasonably related to a "'legitimate state objective.'"265 After fully quoting the Kennedy test, Justice Rehnquist "supported his drastic restatement of the test with neither precedent nor logic. In fact, he did not even acknowledge the change."266 Schall, in turn, relied on Bell's truncated version of the Kennedy criteria to find that juveniles could be detained before trial to prevent their commission of conduct that would be a crime if committed by an adult.267 This is the foundation upon which Salerno relies. So long as a statute authorizing pretrial detention did not intend to be punitive, and so long as it could be understood as "reasonably related" to a legitimate governmental interest, it would not fall within the definition of punishment.268

262. 441 U.S. 520 (1979). 263. Id. at 538. 264. Jean Koh Peters, Schall v. Martin and the Transformation of Judicial Precedent, 31 B.C. L. REV. 641, 652 (1990). 265. Id. 266. Eason, supra note 259, at 1063 (emphasis added). 267. But in doing so, the Court actually further cabined its analysis. Specifically, the Court in Schall only evaluated whether or not the text of the New York Family Act statute evidenced punitive intent, whereas the Court in Kennedy had examined legislative history that “revealed not only a predecessor statute that had called the measure a ‘penalty,’ but also legislative memoranda and floor debates replete with punitive language.” Peters, supra note 264, at 659. 268. See, e.g., Albert Alschuler, Preventive Pretrial Detention and the Failure of the Interest- Balancing Approaches to Due Process, 85 MICH. L. REV. 510, 537 (1987) (“[T]he Court’s two- tiered framework [in Schall] seems to mandate the conclusion not only that every scheme of adult pretrial detention enacted by state and federal legislatures is constitutional but that detention simply on the basis of test scores would be constitutional as well.”); Margaret S. Gain, The Bail Reform Act of 1984 and United States v. Salerno Too Easy to Believe, 39 CASE W. RES. L. REV. 1371, 1378–84



Under this formulation, once the Salerno Court determined that Congress did not intend for pretrial detention to be a punitive restriction in the Bail Reform Act,269 it only needed to identify an "alternate purpose" for the restriction. There, the Salerno majority identified the prevention of danger to public safety as a "legitimate regulatory goal."270 The Salerno Court did not even mention the other five Kennedy criteria.271 Nor did the majority acknowledge, as an earlier case had held, that "even a clear legislative classification of a statute as 'non-penal' would not alter the fundamental nature," or effect, "of a plainly penal statute."272

Once it found that preventive pretrial detention was regulatory rather than punitive, the Salerno Court next applied Mathews v. Eldridge.273 Mathews established a three-factor balancing test to determine what kinds of procedures are required once an individual has been deprived of life, liberty, or property on a regulatory basis:

    First, the private interest that will be affected by the official action; second, the risk of an erroneous deprivation of such interest through the procedures used, and the probable value, if any, of additional or substitute procedural safeguards; and, finally, the Government's interest, including the function involved and the fiscal and administrative burdens that the additional or substitute procedural requirement would entail.274

Generally, when the liberty or property interest is weightier, more process is required. So, for example, depriving someone of their

(1989) (noting that so long as no congressional intent to punish could be discerned, the Court only needed to find an “alternate purpose” and “determine that the means chosen for implementation were not excessive in relation to that purpose”). 269. United States v. Salerno, 481 U.S. 739, 747 (1987). 270. Id. 271. M. Gray Styers, Jr., United States v. Salerno Pretrial Detention Seen Through the Looking Glass, 66 N.C. L. REV. 616, 626 (1988). 272. Trop v. Dulles, 356 U.S. 86, 95 (1958). In other contexts, the Court has distinguished between what Congress calls an action and the effect of that action. See, e.g., United States v. Constantine, 296 U.S. 287, 294 (1935) (“But even though the statute was not adopted to penalize violations of the amendment, it ceased to be enforceable at the date of repeal if, in fact, its purpose is to punish, rather than to tax.”); United States v. La Franca, 282 U.S. 568, 572 (1931) (“No mere exercise of the art of lexicography can alter the essential nature of an act or a thing; and if an exaction be clearly a penalty it cannot be converted into a tax by the simple expedient of calling it such.”); Lipke v. Lederer, 259 U.S. 557, 561–62 (1922) (“When, by its very nature the imposition is a penalty, it must be so regarded.”); Helwig v. United States, 188 U.S. 605, 613 (1903) (“[T]he use of those words does not change the nature and character of the enactment.”). 273. Mathews v. Eldridge, 424 U.S. 319 (1976). 274. Id. at 335.

fundamental interest in freedom from detention would require substantial process.275 But here again, Peters argues, an earlier case had distorted the relevant precedent and set the stage for Salerno to disregard that precedent. Schall had ignored the first part of Mathews's second factor, risk of error, "by focusing not upon the question of whether a prediction can be accurate, but rather upon the far more simplistic question of whether the prediction can be made or 'attain[ed].'"276 To the extent that the Schall majority confronted how predictions of future criminal conduct are made, the Court recognized that "prediction of future criminal conduct is 'an experienced prediction based on a host of variables' which cannot be readily codified . . . . The decision is based on as much information as can reasonably be obtained at the initial appearance."277 As Peters argued, if the Salerno Court were forced to apply Mathews's second criterion, "it would have been obliged to evaluate not whether any detentions are justified, but rather whether the risk of erroneous detentions would be unacceptable."278 But the Salerno Court did not do so. Following the Schall majority's lead, the Salerno Court's analysis was largely framed as a two-pronged balancing process: society's interest, on the one hand, in preventing crime, and the individual's interest, on the other, in their liberty.279 The Salerno Court neither "acknowledged nor discussed the [second] Mathews . . . criteria, the risk of error in current procedure and the probable value of additional or substitute procedures, respectively."280 To the extent the Salerno Court even referenced the procedures of the Bail Reform Act, it simply recited provisions of the Act, but did not

275. Hamdi v. Rumsfeld, 542 U.S. 507, 529 (2004) ("The Mathews calculus then contemplates a judicious balancing of these concerns, through an analysis of 'the risk of an erroneous deprivation' of the private interest if the process were reduced and the 'probable value, if any, of additional or substitute safeguards.'" (citation omitted)). 276. Peters, supra note 264, at 677. 277. Schall v. Martin, 467 U.S. 253, 279 (1984). 278. Peters, supra note 264, at 690 (emphasis omitted). 279. Jack F. Williams, Process and Prediction: A Return to a Fuzzy Model of Pretrial Detention, 79 MINN. L. REV. 325, 363 (1994) ("Again, the Court exhaustively considered treatment of the government's interests while only begrudgingly recognizing an adult individual's interest to be free from governmental restraint. Rhetorically, to mask the basis of the decision, the losing interest ought to receive more time than the winning one. In Salerno, this rhetoric is not the case."); see also Albert W. Alschuler, Preventive Detention and the Failure of the Interest-Balancing Approaches to Due Process, 85 MICH. L. REV. 510, 511 n.1 (1986) ("[T]he Supreme Court's current approach to the due process clause has tilted too far toward interest balancing and too far from historic concepts of individual freedom."). 280. Peters, supra note 264, at 686.

examine whether "additional or substitute" procedures held any probable value.281 Without any citation, the Court proclaimed that "the procedures by which a judicial officer evaluates the likelihood of future dangerousness [under the Bail Reform Act] are specifically designed to further the accuracy of that determination."282 Worse, whatever one thinks of the safeguards that the Bail Reform Act of 1984 provides, they are irrelevant when examining the "'probable value, if any, of additional or substitute procedural safeguards,' if indeed no procedures, no matter how intricate, could ever make the procedure more accurate."283 By ignoring the second part of the Mathews test, the Salerno Court avoided squarely addressing whether or not predictions of dangerousness could ever be tolerably accurate.284 Of course, since Salerno, the Supreme Court "reversed course" on the application of Mathews in matters of state criminal procedure.285 In Medina v. California,286 the Court held that "a state rule of criminal procedure not governed by a specific rule set out in the Bill of Rights violates the Due Process Clause of the Fourteenth Amendment only if it offends a fundamental and deeply rooted principle of justice."287 Whether Mathews's balancing test applies, or Medina's "principle of justice so rooted in the traditions and conscience"288 standard applies, the accuracy of a prediction of future danger should be core to a due process analysis.289 That's especially true for a pretrial justice system based on a risk assessment tool. The risk that a pretrial risk assessment tool will mislabel someone as risky based on old and otherwise non-representative data—thereby keeping them in jail longer, or making it more likely that they will commit crimes in the future—raises significant issues under Mathews or Medina.

281. Mathews v. Eldridge, 424 U.S. 319, 335 (1976). 282. United States v. Salerno, 481 U.S. 739, 751 (1987). 283. Peters, supra note 264, at 690 (emphasis added). 284. Charles P. Ewing, Schall v. Martin: Preventive Detention and Dangerousness Through the Looking Glass, 34 BUFF. L. REV. 173 (1985) (arguing that, based on a survey of the literature, predictions of violent and criminal behavior are wrong more often than they are right). 285. Niki Kuckes, Civil Due Process, Criminal Due Process, 25 YALE L. & POL'Y REV. 1, 15 (2006). 286. 505 U.S. 437 (1992). 287. Nelson v. Colorado, 581 U.S. __, 137 S. Ct. 1249, 1258 (2017) (Alito, J., concurring). 288. Medina, 505 U.S. at 445 (quoting Patterson v. New York, 432 U.S. 197, 202 (1977)). 289. Under Mathews, the relative accuracy of a risk assessment—as compared to another risk assessment, or other procedures—is of significant importance. Under Medina, the "fundamental and deeply rooted principle of justice" is the presumption of innocence. Nelson, 137 S. Ct. at 1258. There, what matters is the process by which that presumption is rebutted. See id. at 1255 ("'[A]xiomatic and elementary,' the presumption of innocence 'lies at the foundation of our criminal law.'" (quoting Coffin v. United States, 156 U.S. 432, 453 (1895))).

Further, if a bail system centered on pretrial risk assessment does, in some instances, lead to a higher incidence of preventive detention, it is all the more urgent for reformers to squarely address whether or not Salerno was rightly decided. Even if one were to agree with the core holdings of Salerno—that the Constitution does not provide an affirmative right to bail and does not definitively prohibit preventive detention—several pressing questions remain open and underexplored. First, Salerno said nothing about "what degree of risk it thought constitutionally sufficient to justify detention."290 That question is perhaps "the most important [element] of risk assessment . . . because it marks the compromise between the presumption of innocence, decarceration, and public safety."291 If it were determined that judges or pretrial risk assessment tools cannot predict dangerousness with adequate reliability, "pretrial detention [might] not be rationally related to the goal of reducing pretrial crime."292 Finally, Salerno dictates that "detention prior to trial . . . is the carefully limited exception."293 A risk-based bail system seemingly requires some defendants to be detained without ever having been offered a path to release. But how many such cases are allowable before it can no longer be said that detention is the "carefully limited exception"? Of course, risk assessment tools themselves cannot "answer [the] normative question at the heart of contemporary pretrial justice . . . how certain must we be that the person will commit a crime or not appear in court?"294 Instead, bail reform predicated on risk assessment makes answers to this question all the more urgent and necessary.295

290. Mayson, supra note 24, at 498. 291. Note, Bail Reform and Risk Assessment: The Cautionary Tale of Federal Sentencing, 131 HARV. L. REV. 1125, 1145 (2018). 292. Eason, supra note 259, at 1065. 293. United States v. Salerno, 481 U.S. 739, 755 (1987). 294. Note, supra note 291, at 1145. 295. Here, Sandra Mayson's recent work deserves special note. Mayson argues that [T]here is no clear, relevant distinction between defendants and non-defendants who are equally dangerous . . . there is no constitutional text or doctrine that clearly grants the state more expansive preventive authority over defendants than non-defendants. . . . [Further], the practical justifications proffered to support the special preventive restraint of defendants are, at best, incomplete. Mayson, supra note 24, at 499. Given that there is no moral or practical distinction, and like cases should be treated similarly, Mayson develops a "parity principle," which "holds that the state has no greater authority to preventively restrain a defendant than it does a non-defendant who poses an equal risk." Id. Overall, Mayson argues "[g]iven the trajectory of pretrial reform, it is both an important and an opportune time to clarify the contours of the state's pretrial powers." Id. at 500.

IV. ADDRESSING THE CHALLENGES

A. Pretrial Reform Without Risk Assessment

Although pretrial risk assessments are often described as a necessary or natural focus for bail reform efforts, this focus is misplaced. Considering the arguments detailed above, and on the basis of the preliminary evidence currently available, it is not safe to assume that pretrial risk assessment instruments will help to reduce today’s widespread overuse of pretrial incarceration, or to mitigate the longstanding racial disparities in pretrial justice.296 Other policy changes may more effectively and immediately serve the goals of decarceration and fairness in pretrial justice. These other steps include:

Automatic release of broad categories of defendants: As detailed above, defendants appear to be best able to succeed on release when they avoid any period at all of pretrial incarceration. Policies that provide for the summary, automatic release on recognizance of defendants charged with certain crimes, such as all defendants whose most serious charge is a misdemeanor, could help to ensure that large numbers of defendants are positioned to avoid pretrial incarceration and its harmful effects.

High procedural burdens for imposing pretrial detention or supervisory conditions: In New Jersey, for example, one major focus of reform efforts has been to encourage litigation of pretrial release decisions by requiring robust pretrial detention hearings.297 Practitioners report that these heightened evidentiary standards have played a central role in reducing the use of pretrial incarceration.

296. Our suggestions here largely accord with the recommendations of more than 100 civil rights, digital rights, and community-based groups, who argue that “[r]eal reform addresses underlying structural inequalities, rather than attempting to triage a structurally flawed system. Jurisdictions can—and should—abolish systems of monetary bail, combat mass incarceration, make meaningful investments in communities, and pursue pretrial fairness and justice without adopting such tools.” See LEADERSHIP CONFERENCE SHARED STATEMENT, supra note 124, at 2. Through our work at Upturn, we played a research and advisory role in the development of this statement. 297. For example, New Jersey Court Rule 3:4-2(c) requires the prosecution to turn over any available preliminary incident reports, affidavits of probable cause, “all statements or reports relating to the affidavit of probable cause, . . . all statements or reports relating to additional evidence the State relies on to establish probable cause at the hearing,” statements related to community safety when determining whether to release the defendant, and all . N.J. CT. R. 3:4-2(c)(2). But see State v. Dickerson, 177 A.3d 788 (2018) (limiting the discovery the state must automatically make available to defendants facing a pretrial detention hearing).

Text message reminders and other supportive pretrial services: Accused people are more likely to succeed upon release if they are reminded of upcoming court dates. Reminders should be a standard practice for all those released. Other services designed to support success, such as transportation assistance for people who may need help getting to court, can likewise make success on release more likely and thus make release more attractive.

Front-end reforms that address excessive and racially disparate charging patterns: Policing changes, such as issuing citations in lieu of arrest, as well as investments in non-police responses to public safety problems, may help to reduce the number of people who face criminal charges in the first place, and to reduce racial disparities in the pretrial population. Similarly, electoral efforts that support the election of progressive prosecutors can help create a world of more measured prosecutorial decision-making and hence less incarceration.

We do not believe that the above steps should be seen as flawless or guaranteed solutions to the deeply rooted injustices of today’s bail system. In fact, such steps can at times have paradoxical or unintended consequences, no less than other social policies.298 However, these steps do speak directly to the challenge of pretrial incarceration, and we believe it would be reasonable for advocates and reform-minded system actors to shift their change-making efforts to focus on steps like these.

B. Where Risk Assessment Is Used: Relevant, Timely Data

1. Risk Assessment Tools Should Always Rely on Recent, Local Data

Jurisdictions that are reforming bail practices should always rely on recent data, gathered after their other pretrial reforms have taken root, to construct or calibrate their risk assessment tools. Existing “off the shelf” risk assessment tools, whose predictions assume that defendants still face the same long odds of succeeding outside jail, should not be used without adjustment in jurisdictions where those risks have been

298. For example, Maryland Court Rule 4-216.1—which says that judges may not impose a financial condition of release if that condition would lead to the person's pretrial detention—is on its face a progressive policy that should help ensure fewer people are detained pretrial. MD. CT. R. 4-216.1. However, that policy has had its own unintended consequences. While 6% more defendants have been released on their own recognizance, nearly 12% more defendants were held without bail. Data suggests that, "in the absence of [pretrial service] options, many judges are opting to hold defendants in jail pretrial rather than release them on their own recognizance." See BLUMAUER ET AL., supra note 232, at 14.

mitigated. Risk assessment tools developed solely from historical data that predates the enactment of significant risk-mitigating reforms will not reflect defendants' new odds of success on release and could, in turn, hamper overall reform efforts. What counts as "recent data" will vary, depending on context. For example, a jurisdiction's decision to move from a money bail schedule to a system that presumptively releases all misdemeanor defendants would be a major risk-mitigating reform, and no risk assessment instrument should be used unless it reflects the outcomes of a presumptive-release regime. What's ultimately important is for jurisdictions to track changing patterns of risks and outcomes. Recent data is indispensable, but earlier data can potentially be used responsibly, given certain conditions. The critical concept is that the scoring process must evolve to reflect declining risks of pretrial failure. Models based partly on older data can be adjusted to reflect more recent developments. Adjustment is possible because the defendants who get released are themselves a diverse group, with different (albeit low) levels of failure risk, and their outcomes can be compared both before and after reforms. If released defendants are grouped by the risk score they were assigned at arraignment, so that there are separate groups of released defendants who earned scores of 1, 2, and 3 out of ten (say), the groups should each have different—presumably increasing—rates of pretrial failure. Those rates can be compared both before and after a reform to assess how scores may need to be adjusted. A regression comparing the failure rates of these groups before and after reform could reveal by how much, and for which defendants, risks have been reduced. A regression linking scored risk level to post-reform failure rates can reveal when a jurisdiction has succeeded in reducing the actual level of risk associated with each score. The jurisdiction can then either recalibrate the risk scale or simply begin to release more defendants at the higher score levels (which have come to betoken a lower true level of risk than they did initially). Ultimately, if jurisdictions are to truly rely on and promote "evidence-based practices," they must gather the evidence first. For a pretrial risk assessment tool to promote evidence-based practice, the tool must incorporate recent data that reflects the fact that bail reform policies have mitigated the risks defendants face once released. If jurisdictions cannot, or will not, wait for fresh data to introduce risk assessment, then they must vigorously collect data on the failure rates of defendants

before and after reform.299 This would empower policymakers to update the model early on; once they do, post-reform data should be weighted more heavily within the model.
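What follows is a minimal sketch, not a production pipeline or a feature of any existing tool, of how the before-and-after comparison described above might look in practice. The file name, the column names (score, failed, post_reform), and the use of a score-by-reform interaction in a logistic regression are illustrative assumptions.

```python
# A minimal sketch of the recalibration check described above: compare pretrial
# failure rates, grouped by the risk score assigned at arraignment, before and
# after a reform. The file and column names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

# One row per released defendant: assigned score, whether the person "failed"
# pretrial (FTA or rearrest, 0/1), and whether release occurred post-reform (0/1).
df = pd.read_csv("released_defendants.csv")  # hypothetical data extract

# Raw failure rates by score group, before versus after the reform.
rates = (
    df.groupby(["score", "post_reform"])["failed"]
      .mean()
      .unstack("post_reform")
      .rename(columns={0: "pre_reform_rate", 1: "post_reform_rate"})
)
print(rates)

# A logistic regression with score-by-reform interactions estimates how much the
# failure odds associated with each score level changed after the reform.
model = smf.logit("failed ~ C(score) * post_reform", data=df).fit()
print(model.summary())

# Large negative interaction terms for a score level suggest that the score now
# corresponds to a lower true level of risk than it did before the reform, i.e.,
# that the score-to-risk mapping is a candidate for recalibration.
```

Run periodically, a check along these lines would show when the failure rate attached to a given score has fallen far enough that the jurisdiction can either rescale the scores or release more defendants at the higher score levels.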

2. Regularly Compare Predictions Against Outcomes

It is vitally important that jurisdictions track risk scores and subsequent outcomes.300 This may seem like a simple or obvious suggestion, but current practice lags woefully far behind it. Data collection practices on pretrial outcomes at the county level are notoriously varied, often haphazard, and sometimes totally absent.301 Some agencies do not or cannot calculate failure to appear rates or pretrial rearrest rates.302 Even more basic information, like the average length of jail stay for detained pretrial defendants, is sometimes

299. We expand on the benefits of vigorous pretrial outcome data collection in the next section. 300. Stevenson, supra note 126, at 59 ("When a new technique is adopted, outcomes should be monitored to see if the desired effects were achieved. If they were not, adjustments can be made accordingly. In this paradigm, a method would be neither championed nor pilloried until its impacts in practice are clearly understood."). 301. A 2014 report by the Governor's Commission to Reform Maryland's Pretrial System identified the following gaps in available data about Maryland's pretrial population, which it labeled "Unanswered Questions": How many defendants post bond? How many defendants are released on pretrial supervision? How many defendants released pretrial are arrested prior to trial? Of those defendants on pretrial supervision, how many fail to appear for court or get arrested prior to trial? What is the risk level of each defendant detained pretrial in jail? How many pretrial defendants are detained in jail who could not post bond? What was the bond amount? What is the average length of stay of pretrial defendants detained in jail? COMM'N TO REFORM MD'S PRETRIAL SYSTEM, FINAL REPORT 15 (2014), http://goccp.maryland.gov/pretrial/documents/2014-pretrial-commission-final-report.pdf [https://perma.cc/U37Z-8SAM]. 302. See, e.g., OHIO CRIMINAL SENTENCING COMM'N, AD HOC COMM. ON BAIL & PRETRIAL SERVS., FINAL REPORT & RECOMMENDATIONS 18 (2017), https://www.supremecourt.ohio.gov/Boards/Sentencing/Materials/2017/March/finalAdHocBailReport.pdf [https://perma.cc/JSY4-TJVZ] ("As in other areas of Ohio's criminal justice system, data regarding pretrial decisions, agencies, and outcomes is rarely collected. Less than 20% of respondents to the Ad Hoc survey collect data on failure to appear rates and even less are collecting data regarding arrests for crimes committed while on release pretrial. The Ad Hoc Committee recommends a dedicated and concerted effort to increase data collection and analysis for all facets of the bail and pretrial system in Ohio. At a minimum, the committee recommends that collection of appearance rates, safety rates, and concurrence rates (how often a judge accepts a pretrial service agency recommendation) be mandated for each jurisdiction."). Four of fifty-six programs with survey responses in Ohio said they calculate failure to appear rates; zero of fifty-six programs with survey responses said they calculate pretrial rearrest rates, and two of fifty-six calculate release rates. Id. at 247, 251, 253.

unavailable or not tracked.303 Others alter the very pretrial failure rates they seek to measure through technological errors.304 Data on risk scores and subsequent outcomes, whether for all defendants or for a representative sample of them, is necessary in order to understand the relationship between scores and true levels of risk. Without such data, there is no way to know whether the risk assessment tool is systematically wrong about the risks posed by defendants. Such regular monitoring would not only allow jurisdictions to evaluate how well their risk assessment tool classifies risk, but also empower jurisdictions to track how reform efforts may be changing risk levels. It is also important for a defendant's risk assessment prediction, their subsequent outcome, and their case file to be linked. This would make it possible to analyze how predictions or outcomes correlate with other features, such as a defendant's race, socioeconomic status, recent rearrests, or type of pretrial monitoring. Doing so would help ensure that risk assessment tools lead to more equitable outcomes across race, gender, and socioeconomic status.305 Such tracking should also include data on how often judges concur with the risk assessment tool's recommendations, and ideally on their reasons for diverging when they do. By tracking concurrence, divergence, and why a judge diverged, policymakers may be able to create a positive feedback loop. The more that judges understand how a risk assessment tool works, and the more that the developers of a risk assessment tool understand how judges use—or do not use—their tool, the better. Further, validation studies should not stand in the way of continuous monitoring. Pretrial risk assessment validation studies take time simply because criminal cases themselves take a long time to resolve. In order for an entire six-month period of risk assessment predictions to be analyzed, one must observe each case until disposition. Such observation

303. Id. at 88, 94–95. 304. In Harris County, newly approved court rules schedule many hearings for within one business day. The county's computer system, however, still tells misdemeanor defendants to return to court in seven days. Misinformed defendants are missing their court dates, which may make the otherwise successful reform look like a failure. See Bryce Covert, Are Harris County Officials Trying to "Sabotage" Bail Reform with Misleading Data?, MEDIUM: IN JUST. TODAY (Dec. 8, 2017), https://medium.com/in-justice-today/advocates-say-harris-county-officials-are-trying-to-sabotage-bail-reform-with-misleading-data-81a1292edca1 [https://perma.cc/6VP8-8SYQ].

may last a year and a half, or longer. Accordingly, jurisdictions should resist the temptation to wait until every case is resolved before examining pretrial outcome data. Monthly reporting—though offering an incomplete view—can still provide valuable insight. Overall, validation studies and continuous monitoring need not stand in each other's way.306 Many jurisdictions may lack the technological infrastructure, expertise, and other resources to pay attention to whether their pretrial risk assessments are right or wrong. Such challenges, where they exist, must not be considered a warrant to ignore the question. To the extent policymakers imagine that they can combine bad data with good prediction, a shift in perspective is essential. Data infrastructure is not an afterthought, but an indispensable pillar of the responsible deployment of statistical predictions in pretrial justice.307
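To make these monitoring suggestions concrete, here is a minimal sketch. It assumes a hypothetical export in which each released defendant's score, the tool's recommendation, the judge's decision, the observed outcome, and basic demographics are already linked; the file and column names are illustrative, not drawn from any real system.

```python
# A minimal monitoring sketch over a hypothetical linked table of pretrial
# records. Column names (score, failed, judge_decision, tool_recommendation,
# race, release_date) are illustrative assumptions only.
import pandas as pd

records = pd.read_csv("linked_pretrial_records.csv")  # hypothetical export

# 1. Calibration check: observed failure rate for each score level. A score
#    whose observed rate drifts well below its historical rate suggests the
#    tool is overstating risk for that group.
calibration = records.groupby("score")["failed"].agg(["mean", "count"])
print(calibration)

# 2. Concurrence check: how often the judge's decision matched the tool's
#    recommendation, overall and broken out by race to surface disparities.
records["concurred"] = records["judge_decision"] == records["tool_recommendation"]
print(records["concurred"].mean())
print(records.groupby("race")["concurred"].mean())

# 3. Interim (e.g., monthly) reporting on cases observed so far, rather than
#    waiting for every case to reach disposition.
monthly = (
    records.assign(month=pd.to_datetime(records["release_date"]).dt.to_period("M"))
           .groupby("month")["failed"]
           .mean()
)
print(monthly)
```

Even a simple report along these lines, refreshed monthly, would tell a jurisdiction whether its scores remain well calibrated and whether concurrence patterns differ across groups.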

3. Focus on the Risks that Matter Most

Pretrial risk assessment tools generally forecast two outcomes: failure to appear and rearrest. It is important to interrogate the gap between the data jurisdictions have and the questions jurisdictions ask of that data.308 Currently, forty-five states and the District of Columbia permit either pretrial detention or release subject to restrictions “[a]fter a finding that a defendant poses a danger to an individual or community.”309 However, most of today’s risk assessment tools only predict future rearrest. As others have observed, the two are not the same. While rearrest for a violent crime might signal danger to an individual or community, rearrest writ large does not. Correlates of rearrest do not so much measure “dangerousness” as they measure—and anticipate—future contact with the criminal justice system.310

306. Civil rights advocates called for validation to occur "annually at the least, with the ideal being quarterly." Id. at 9. 307. See, e.g., ERIKA PARKS ET AL., URBAN INST., LOCAL JUSTICE REINVESTMENT 12 (2016) ("A critical component of justice reinvestment is data analysis and data-driven decision-making . . . . To improve data capacity, local sites developed data warehouses, integrated data systems, data dashboards, and jail population and cost-benefit projection tools."). 308. See LEADERSHIP CONFERENCE ON CIVIL AND HUMAN RIGHTS, SIX THINGS TO KNOW ABOUT ALGORITHM-BASED DECISION-MAKING TOOLS 1 (2018), http://civilrightsdocs.info/pdf/criminal-justice/Pretrial-Risk-Assessment-6Things.pdf [https://perma.cc/LGB4-UP45] ("'[R]isk scores' from algorithmic risk assessment tools have little bearing on the risks that really matter to communities."). 309. Baradaran & McIntyre, supra note 62, at 512. 310. DELBERT S. ELLIOTT, CTR. FOR THE STUDY & PREVENTION OF VIOLENCE, LIES, DAMN LIES AND ARREST STATISTICS 11 (1995), http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.182.9427&rep=rep1&type=pdf [https://perma.cc/6L6U-4BZ2].

For example, most rearrests pretrial appear to be for technical violations, not new felonies or violent crimes.311 Federal data from 2012–2014 show that, for defendants released to the community pretrial who had at least one violation while on release, technical violations of bail conditions, rather than violations in which a new offense was charged, represented the vast majority of all violations.312 Between 2008 and 2010, 90% of all pretrial violations by federal defendants released were technical violations.313 The technical conditions of bail are often mundane: curfews, travel restrictions, drug tests, and even keep-your-job requirements. Other bail reforms will make it all the more important to delineate technical violations from new violent arrests. Jurisdictions will likely release more defendants pretrial by increasing the use of non-financial conditions of release.314 In all likelihood, this will increase the incidence of rearrest for technical violations of release conditions. Ultimately, communities should determine which public safety risks matter most to them, and researchers, practitioners, and policymakers should act accordingly. Should decisions be based on "the risk of the defendant committing another crime pretrial or the risk of the defendant committing specific crimes pretrial (e.g., violent offences)?"315 And

311. A "technical violation" of conditional release, benign as it may sound, can also carry serious and immediate consequences (beyond making the person appear riskier in future risk assessments). The American Bar Association's Pretrial Release General Principles say that a "person who has been released on conditions and who has violated a condition of release, including willfully failing to appear in court, should be subject to a warrant for arrest, modification of release conditions, revocation of release, or an order of detention, or prosecution on available criminal charges." ABA, ABA STANDARDS FOR CRIMINAL JUSTICE 19 (3d ed. 2006). 312. MARK MOTIVANS, U.S. DEP'T OF JUSTICE, FEDERAL JUSTICE STATISTICS, 2012 - STATISTICAL TABLES 15 tbl.3.3 (2015), https://www.bjs.gov/content/pub/pdf/fjs12st.pdf [https://perma.cc/RH6N-KQ79]; MARK MOTIVANS, U.S. DEP'T OF JUSTICE, FEDERAL JUSTICE STATISTICS, 2013 - STATISTICAL TABLES 15 tbl.3.3 (2017), https://www.bjs.gov/content/pub/pdf/fjs13st.pdf [https://perma.cc/H8QZ-XDPM]; MARK MOTIVANS, U.S. DEP'T OF JUSTICE, FEDERAL JUSTICE STATISTICS, 2014 - STATISTICAL TABLES 15 tbl.3.3 (2017), https://www.bjs.gov/content/pub/pdf/fjs14st.pdf [https://perma.cc/38MP-7PJK]. 313. THOMAS H. COHEN, U.S. DEP'T OF JUSTICE, PRETRIAL RELEASE AND MISCONDUCT IN FEDERAL DISTRICT COURTS, 2008–2010, at 1 (2012), https://www.bjs.gov/content/pub/pdf/prmfdc0810.pdf [https://perma.cc/37CH-XDDT]. 314. That is at least already an observable trend in federal-level data—80% of defendants released on personal recognizance received some set of pretrial conditions, while only 40% of defendants with a surety bond release received pretrial conditions. Id. at 9; see also SANTA CRUZ CTY. PROB. DEP'T, ALTERNATIVES TO CUSTODY REPORT 2015 (2016) (detailing a decrease in the ADP in the first half of the year, followed by a modest increase in the next half). But see id. at 11 ("[F]ollowing modifications of the PSA-Court decision making framework, in the first quarter of CY2016 saw a dramatic rise of the [Average Daily Population on pretrial supervision]—almost double of previous years."). 315. MAMALIAN, supra note 132, at 13.

though developers of risk assessment tools sometimes concede that rearrest data is an imperfect measure, they generally defend it as one of the few measures available—or the least bad option. But distinguishing new violent crimes from technical violations is a bare minimum requirement for responsible use of these tools. Similar critiques can be made about failure to appear data. While data on failure to appear does not suffer from sampling bias,316 the reasons defendants fail to appear vary widely and should not automatically be construed as evidence of flight risk. Thus, generalized failure to appear data flattens the underlying circumstances. How might jurisdictions interrogate the gap between the data they have and the questions they ask of that data? As a start, jurisdictions should demand aggregate-level reporting of the development sample or training data for any pretrial risk assessment tool. Such a report should at least disclose the breakdown of rearrests by charge, severity of charge, age, race, and gender. Failures to appear should also be broken down along some meaningful dimension, such as whether the failures were persistent or sporadic. Such a report might tell policymakers that there is a significant gap between the data upon which a risk assessment tool is based and the questions that truly matter to them. Or it might not. Either way, such information is critical for jurisdictions to make an informed decision on whether or not to adopt a risk assessment tool. Moreover, the risks that matter most might not just be the risks of a defendant's rearrest or failure to appear. As Crystal Yang recently argued, jurisdictions could also consider the risks that pretrial detention might worsen a defendant's circumstances.317 In imagining a "net-benefit" assessment, Yang argues that risk assessment tools could be used to "maximize social welfare in the bail setting [by] . . . also us[ing] data to predict the likelihood of harms associated with detention."318 Given the emergent literature on the staggering downstream costs of pretrial detention, such a concept is more than worthy of future research.319
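As an illustration of what the aggregate-level report described above might look like in practice, here is a rough sketch. The input file and every column name (charge_category, charge_severity, age_group, race, gender, rearrested, fta_count) are hypothetical placeholders for whatever fields a tool's development sample actually contains.

```python
# A minimal sketch of an aggregate-level development-sample report, using
# hypothetical field names. It is an illustration of the kind of disclosure
# described above, not a description of any vendor's actual reporting.
import pandas as pd

sample = pd.read_csv("development_sample.csv")  # hypothetical training data extract

# Breakdown of rearrests by charge category and severity.
rearrest_by_charge = pd.crosstab(
    [sample["charge_category"], sample["charge_severity"]],
    sample["rearrested"],
    normalize="index",
)

# Breakdown of rearrest rates by demographic group.
rearrest_by_demographics = (
    sample.groupby(["race", "gender", "age_group"])["rearrested"].mean()
)

# Distinguish persistent from sporadic failures to appear.
sample["fta_pattern"] = pd.cut(
    sample["fta_count"], bins=[-1, 0, 1, float("inf")],
    labels=["none", "sporadic", "persistent"],
)
fta_report = sample["fta_pattern"].value_counts(normalize=True)

for name, table in [("Rearrest by charge", rearrest_by_charge),
                    ("Rearrest by demographics", rearrest_by_demographics),
                    ("FTA pattern", fta_report)]:
    print(f"\n== {name} ==")
    print(table)
```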

316. That is to say, a person either did or did not show up to a court date or hearing. A court does not only observe a select class of failures to appear. 317. Crystal S. Yang, Toward an Optimal Bail System, 92 N.Y.U. L. REV. 1399, 1488 (2017). 318. Id. ("For example, data on detained defendants can be used to identify factors that are most predictive of agreed-upon harms: whether someone is wrongfully convicted, whether someone loses their home, whether someone is unable to find employment in the formal labor market, and whether someone commits crime in the future."). 319. Id. at 1489–90. Though Yang does not advocate for the practicality of her suggestion, she does note that "[u]ltimately, by using data to predict both the costs and benefits of pre-trial detention for each defendant, jurisdictions could create 'net-benefit' assessment tools using largely the same set-up already employed for risk-assessment tools." Id. at 1490.

C. Where Risk Assessment Is Used: Strong, Inclusive Governance

1. Risk Assessments and Frameworks Should Not Recommend Detention

For the strongest chance at reducing incarceration—and in order to keep preventive detention within constitutional limits—policymakers must ensure that risk assessment instruments and their associated decision-making frameworks do not supplant the adversary hearing and specific findings of fact that are constitutionally necessary before a defendant can be preventively detained. Finding that the defendant is a member of a collectively high-risk group is the most specific observation that risk assessment instruments can make about any person. And such a finding does not answer, or even address, the question of whether preventive detention is the only way to reasonably assure that person's reappearance or the preservation of public safety. That question must be asked specifically about the individual whose liberty is at stake—and it must be answered in the affirmative in order for detention to be constitutionally justifiable. In practical terms, this means that a responsible and well-governed risk assessment instrument will never go so far as to propose detention or suggest that the defendant not be released. Instead, for those deemed highest risk, the decision-making framework can and should propose that a hearing be conducted to assess what combination of conditions can reasonably assure the reappearance of the accused and public safety. It is the role of the judge or magistrate at such a hearing—not the role of a risk assessment instrument—to decide whether the defendant should be detained. In addition to being prudent policy, our recommendation on this point is also essential as a matter of statistical practice. Today's risk assessment instruments make no attempt to measure the change in risk that is caused by the supportive services or supervisory conditions that a judge may impose in a particular case. And yet, it is precisely this modified level of risk that the judge must assess as the basis for a detention decision. (Another term for this change is the "responsiveness" of the defendant, as noted above in our discussion of risk assessment frameworks.) No risk assessment score and no decision-making recommendation should in and of itself trump a person's presumption of release. Likewise, such scores should never serve as prima facie

evidence that no condition of release would assure someone's appearance or public safety.320 After all, responsiveness to conditions is not something these tools measure. This recommendation is broadly endorsed: Civil rights groups have made this point directly,321 and so too have the largest proponents of risk assessment tools. For example, in new materials, the Arnold Foundation instructs that the decision-making framework—what it calls a "release conditions matrix"—should not include detention among its recommendations.322 Instead: If the judicial officer determines that a person is eligible for detention . . . under state law, the officer must hold a hearing in order to lawfully detain the person pretrial. [Such hearings must] adhere to certain fundamental legal foundations for detention, most of which are articulated in United States v. Salerno.323 The Foundation "encourages [jurisdictions] to understand the law in order to use the PSA in the most legally sound manner; this necessarily includes holding a due process hearing prior to intentional pretrial detention."324 We emphatically join in that particular piece of advice.

2. Risk Assessments and Frameworks Must Be Public

Risk assessment models used in the pretrial courtroom—and the process used to develop and test them—must be public. Making both public would do much to alleviate due process concerns. Risk assessment tools that rely on trade secret claims, like Equivant's COMPAS risk assessment tool, have already drawn due process challenges. But it is not just defendants who fear what may be happening behind the curtain of trade secrecy. Judges may also be wary.

320. The recent State v. Mercedes decision in New Jersey supports this view. See State v. Mercedes, 183 A.3d 914, 931 (N.J. 2018) (Albin, J., concurring) ("The pretrial services recommendation, through the court rule, could have operated to undermine the rebuttable presumption favoring pretrial release."). 321. See LEADERSHIP CONFERENCE SHARED STATEMENT, supra note 124. 322. GUIDE TO THE RELEASE CONDITIONS MATRIX, supra note 217, at 1–2 ("On its own, the PSA does not direct a judicial officer to release or detain a person or recommend a presumptive level of pretrial release (or its associated conditions) . . . . Detention is not included in the matrix because eligibility for detention is based on state law, and the matrix becomes relevant only after a judicial officer decides a person will be released."). 323. LAURA & JOHN ARNOLD FOUND., PUBLIC SAFETY ASSESSMENT IMPLEMENTATION GUIDES: 8. GUIDE TO THE PRETRIAL DECISION FRAMEWORK 10, https://www.psapretrial.org/implementation/guides/managing-risk/guide-to-the-pretrial-decision-framework (last visited Nov. 6, 2018). 324. Id.

For example, in State v. Loomis,325 the Wisconsin Supreme Court became the first court to address the relationship between trade secrets in risk assessments and due process principles in sentencing. Although the Court ultimately rejected Loomis' due process claim, one Justice noted in a concurrence that "this court's lack of understanding of COMPAS was a significant problem in the instant case. At oral argument, the court repeatedly questioned both the State's and defendant's counsel about how COMPAS works. Few answers were available."326 Though Loomis involved the use of a risk assessment tool's findings at sentencing, the same problems apply pretrial. Early evidence suggests that judges diverge from the recommendations of risk assessment tools at rates that should concern reformers and policymakers alike.327 One way to keep judicial concurrence rates with risk assessment recommendations high is to involve judges from the outset in the development, design, and testing of a new pretrial risk assessment system. A criminal justice system that is better understood and debated by all stakeholders will not only enjoy greater public support, but also greater legitimacy from all of those actors. Higher perceptions of legitimacy when it comes to pretrial risk assessment likely mean higher concurrence rates. Algorithmic trade secrecy is just one problem, however. For example, the PSA is a fairly simple system that can be implemented without a

325. 881 N.W.2d 749 (Wis. 2016). 326. Id. at 774 (Abrahamson, J., concurring). 327. Early evidence indicates a high rate of judicial overrides, in which judges depart from the recommendations of a risk assessment tool. A report by the Cook County sheriff's office reportedly found that Cook County judges diverged from the recommendations of their risk assessment tool more than 80% of the time. See Frank Main, Cook County Judges Not Following Bail Recommendations: Study, CHI. SUN-TIMES (July 3, 2016), https://chicago.suntimes.com/news/cook-county-judges-not-following-bail-recommendations-study-find/ [https://perma.cc/U8Y8-GNCK]; see also Stevenson, supra note 126, at 17–36. Other evidence suggests that these diversions are not randomly distributed. HUMAN RIGHTS WATCH, "NOT IN IT FOR JUSTICE": HOW CALIFORNIA'S PRETRIAL DETENTION AND BAIL SYSTEM UNFAIRLY PUNISHES POOR PEOPLE 93 (2017) ("[J]udges disregard release recommendations, setting bail for as much as 75 percent of all defendants determined to be 'low risk.'"); SANTA CRUZ CTY. PROB. DEP'T, supra note 314, at 2, 8. Santa Cruz County piloted PSA-Court from July 2014 to June 2015. During 2015, the Superior County Court considered 1,437 recommendations. Id. Six hundred and forty-four PSA-Court recommendations were for release and 793 were for detention. Judges departed from the release recommendations a little more than half of the time—53% of the time—but only departed from detain recommendations 16% of the time. Most of the departures, in other words, were in the direction of greater detention. Id. Cf. MATTHEW DEMICHELE ET AL., WHAT DO CRIMINAL JUSTICE PROFESSIONALS THINK ABOUT RISK ASSESSMENT AT PRETRIAL? 33 tbl.5 (noting that in a survey of criminal justice actors, all judges responded that they "often" or "sometimes" agree with the PSA recommendation and the vast majority of judges said that the PSA informed their release/bail decisions).

computer. And it appears that the Arnold Foundation is benevolently motivated—it provides the system free of charge. But, today, there is still a substantial amount that's unknown about the PSA. As Robert Brauneis and Ellen Goodman recently observed, the Arnold Foundation has not revealed how it developed its algorithms, why it used the data it chose to develop the system, whether it performed validation, and, if it did, what the outcomes were.328 Nor has it disclosed, in quantitative terms, what "low risk" and "high risk" mean. In order to see if court systems had this information, Brauneis and Goodman sent open records requests to sixteen different courts, only to be largely stymied.329 Of the five courts that responded to their request by providing documents, four of them "stated that they could not provide information about PSA because that information was owned and controlled by the Arnold Foundation," thanks to a Memorandum of Understanding "which contained identical language prohibiting the courts from disclosing any information about the PSA program."330 Such contractual confidentiality requirements may have some benefits. But such confidentiality can also be detrimental and worsen perceptions of procedural legitimacy. Definitions of input data and outcome measures must also be public. By this we mean that designers should disclose precisely what a tool attempts to predict, and for what time period, based on what inputs. For example, one of the input factors that the PSA uses to calculate a defendant's risk of rearrest (as well as the more specific risk of rearrest for violent crimes) is whether the defendant has been previously convicted of a "violent" crime.331 The specific crimes that can be prosecuted vary from one jurisdiction to another. Thus, in order to produce a PSA score, a jurisdiction must decide which of the specific charges in its local laws will be counted as "violent" crimes during scoring—by creating what the Arnold Foundation calls a Violent Offense List.332 This process is supposed to consist of the jurisdiction

328. Robert Brauneis & Ellen P. Goodman, Algorithmic Transparency for the Smart City, 20 YALE J.L. & TECH. 103, 138 (2018). 329. Id. 330. Id. at 138–39. 331. See LAURA & JOHN ARNOLD FOUND., PUBLIC SAFETY ASSESSMENT: RISK FACTORS AND FORMULA 2 (2016), https://www.arnoldfoundation.org/wp-content/uploads/PSA-Risk-Factors-and-Formula.pdf [https://perma.cc/UNG8-U7C8]. 332. LAURA & JOHN ARNOLD FOUND., PUBLIC SAFETY ASSESSMENT IMPLEMENTATION GUIDES: 10. GUIDE TO THE PSA VIOLENT OFFENSE LIST 1 (2018), https://www.psapretrial.org/implementation/guides/measuring-risk/guide-to-the-psa-violent-offense-list (last visited Nov. 19, 2018).

deciding which of its local charges fall within the Foundation's definition of violence. As the Foundation's implementation guidance explains: [A]n offense is categorized as violent if a person causes or attempts to cause physical injury through use of force or violence against another person. To ensure fidelity to the PSA and consistency with its underlying research, a jurisdiction must use this definition when creating its PSA Violent Offense List. You may not use a different definition based on law or policy.333 Based on our conversations with practitioners and policymakers, we believe that some are confused about this process, and incorrectly believe that they can decide what violence will mean in their jurisdictions' score reports. (For example, they may wrongly believe that by adding drug crimes to their local Violent Offense List, they can consider the PSA's violence flag to be a statistically valid tool for predicting drug crimes as well as other crimes of violence.) In fact, the PSA's statistical model is not modified locally, and any jurisdiction that deviates from Arnold's definition of violence would create a statistically incoherent situation. It would be feeding the tool information it does not expect, and preventing the tool from functioning as expected and verified in validation studies. At a minimum, these definitions must be public. Clear definitions of exactly what an outcome measure means, and for what relevant time period, are bare minimum requirements. The precise period of time over which rearrest and FTA are being predicted is a similarly confusing area that would likewise benefit from clear public disclosure. It would be easy for a layperson to conclude, incorrectly, that today's tools predict the likelihood of a defendant's rearrest or FTA during that defendant's particular pretrial period. But in order to make predictions over such individual time horizons, an instrument would need to incorporate some conjecture about how long this particular case will take to resolve, and none do so today. Instead, each of today's tools predicts rearrest and FTA over a fixed time period—such as, for example, two years. If a defendant's period of pretrial release is half as long as an instrument's time horizon, then the defendant will be less likely to be rearrested than the tool predicts. Thus, it is essential that each tool disclose and publicize the number of days over which it predicts rearrest or failure to appear. Similarly, the underlying data upon which the model was originally developed must be made public in some way. As mentioned in

333. Id.

section IV.B.3, aggregate-level reporting of the development sample or training data for any pretrial risk assessment tool is necessary. Such a report should at least disclose the breakdown of rearrests by charge, severity of charge, age, race, and gender. Disclosure of such data can help ensure that the model was not overly dependent on charges, like drug possession, whose enforcement is especially racially biased. Overall, in order for bail reform to be best positioned to succeed, the public needs a chance to identify the kinds of risks we have described elsewhere in this paper. Claims of trade secrecy or confidentiality immunize pretrial risk assessment tools from meaningful public inspection, including by judges.
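To illustrate the earlier point about fixed prediction horizons, consider a back-of-the-envelope sketch. It assumes, purely for illustration, a constant hazard of failure over time; real pretrial risk is not constant, so this is an approximation rather than a description of how any deployed tool works.

```python
# A rough illustration, under a constant-hazard assumption, of why a fixed
# prediction horizon overstates risk for defendants with shorter pretrial
# periods. The numbers below are made up for illustration only.
def rescale_probability(p_horizon: float, horizon_days: int, exposure_days: int) -> float:
    """Convert a failure probability defined over `horizon_days` into an
    approximate probability over `exposure_days`, assuming a constant hazard."""
    survival_per_day = (1.0 - p_horizon) ** (1.0 / horizon_days)
    return 1.0 - survival_per_day ** exposure_days

# Suppose a tool reports a 30% chance of rearrest over a two-year (730-day)
# window, but this defendant's case is expected to resolve in about 90 days.
print(rescale_probability(0.30, horizon_days=730, exposure_days=90))  # roughly 0.04
```

The gap between the reported two-year figure and the much smaller exposure-adjusted figure is exactly why the prediction window must be disclosed alongside the score.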

3. Community Oversight of the Tools and Frameworks Is Essential

As we argued in section III.C, the role of decision-making frameworks in bail reform is sorely underexamined. Substantial attention and scholarship are directed to the pretrial risk assessment tool. But the goal of pretrial risk assessment tools is limited: to classify risk. Yes, that classification is important in and of itself. But what judges do with that information matters. Ideally, more scholarship and experimentation will lead to a more robust debate regarding the role and importance of decision-making frameworks. Given the right conditions and sets of policies, decision-making frameworks could operate as a strong force for decarceration. For example, decision-making frameworks that presumptively favor release on recognizance, or the fewest, least restrictive conditions of release, for the vast majority of defendants would significantly mitigate the concerns we advance here—that systematically overestimating risk will, in turn, subject a substantial number of defendants to counterproductive conditions of release. Of course, most decision-making frameworks are advisory. Though a decision-making framework advises a judge as to what conditions of release—or non-release—are recommended for a given defendant's level of risk, a judge is largely free to assign whatever conditions of release they wish. In fact, maintaining this level of judicial discretion is seen as an important political bargain. But maintaining judicial discretion does not necessarily vitiate the need for community oversight. Ideally, a decision-making framework is the product of vibrant community input and debate, from advocates, former defendants, public defenders, district attorneys, the judiciary, policymakers, and more. At its best, the document should formalize the answer to the question: "How much risk will our community tolerate?" Understood this way, a decision-making framework would provide any

judge a strong signal as to what the community would like done with defendants of various risk levels. Accordingly, it should be the norm, not the exception, that judges concur with the recommendations of a decision-making framework. One way to increase concurrence rates is to make the decision-making framework presumptive, not advisory, for the release of certain classes of defendants, and to establish heightened evidentiary and procedural burdens for upward departures (that is, for steps that increase incarceration or the intensity of supervision). For example, jurisdictions might require judges to explain their decision when they diverge from the decision-making framework's recommendations. When judges do disagree with the recommendations of a decision-making framework—which would not be infrequent, even in a perfect world—a couple of steps could be required. First, a system could immediately capture the fact that the judge diverged from the recommended course of action.334 Second, the judge could be required to explain in writing why they diverged from the decision-making framework's recommendation. Ideally, this explanation should be released in machine-readable format. Under such a system, jurisdictions would be better positioned not only to ensure that judges follow the recommendations of a decision-making framework, but also to lock in decarceral results. Such a system would offer valuable data to policymakers and researchers: how often judges diverge from the recommendations, for which types of defendants they diverge, and why. The histories of bail reform, recited above, and of sentencing reform offer current bail reformers a useful cautionary tale about limiting judicial discretion through technocratic solutions.335
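As one illustration of what a machine-readable divergence record might look like, here is a minimal sketch; the schema, field names, and sample values are hypothetical and would need to be adapted to a jurisdiction's own case-management system.

```python
# A minimal, hypothetical schema for recording judicial divergence from a
# decision-making framework's recommendation in machine-readable form. Field
# names and the example values are illustrative assumptions only.
from dataclasses import dataclass, asdict
from datetime import date
import json

@dataclass
class DivergenceRecord:
    case_id: str                   # link back to the defendant's case file
    hearing_date: date
    framework_recommendation: str  # e.g., "release on recognizance"
    judicial_decision: str         # e.g., "release with electronic monitoring"
    direction: str                 # "upward" (more restrictive) or "downward"
    written_reason: str            # the judge's required written explanation

record = DivergenceRecord(
    case_id="2018-CR-000123",      # hypothetical case identifier
    hearing_date=date(2018, 11, 6),
    framework_recommendation="release on recognizance",
    judicial_decision="release with electronic monitoring",
    direction="upward",
    written_reason="Pending out-of-county charge not reflected in the score.",
)

# Serialize for aggregation by researchers and policymakers.
print(json.dumps(asdict(record), default=str, indent=2))
```

Records like these, linked to outcomes, would let researchers answer the questions posed above: how often judges diverge, for whom, and why.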

CONCLUSION

Pretrial risk assessment instruments, as they are currently used, cannot safely be assumed to advance reformist goals of reducing incarceration and enhancing the bail system’s fairness. Early evidence

334. Ideally, this would be linked to the defendant’s file at hand—that way policymakers and researchers could analyze why judges deviate from the recommended course of action. 335. Note, supra note 291, at 1138 (“The history of sentencing reform warns that technocratic criminal justice reform can be vulnerable on nearly all fronts. Powerful system actors can hijack tools of reform toward their own economic, structural, and racial ends. In the face of political pressure and media attention, the same legislature that passes reform can waver in its commitment to evidence-based practices and undermine the project. And without buy-in, aligned incentives, and limits on discretion, prosecutors and judges can manipulate technocratic reform. Technocratic tools can be useful, but they cannot answer tough normative questions at the heart of criminal justice.”).

remains sparse, and risk assessment instruments may yet prove themselves effective tools in the arsenal of bail reform. But they have not done so to date. Shifting to risk assessment-based bail will not necessarily reduce incarceration. Without stronger policies, data practices, and open governance, we believe it is likely that these tools will perpetuate or worsen the very problems reform advocates hope to solve. Stakeholders who are eager to reduce pretrial incarceration and improve the system's fairness and racial equity may wish to refocus their energies on policies whose benefits are clearer, such as automatically releasing broad categories of misdemeanor defendants. Where risk assessments remain in use, their design and governance must be improved. If history is any guide, the most significant impacts of today's bail reforms may turn out not to be the ones that reformers intend. By updating their models with recent, post-reform data, continuously monitoring outcomes, measuring the indicators that truly matter, opening tools to public scrutiny, and ensuring that risk assessments and frameworks never recommend detention, today's reformers can at least minimize their own risk of frustration in the vitally important work that they pursue.

The Risk Assessment Era: An Overdue Debate
Sonja B. Starr
Federal Sentencing Reporter, Vol. 27, No. 4, The Risk Assessment Era: An Overdue Debate (April 2015), pp. 205-206
Published by University of California Press on behalf of the Vera Institute of Justice
Stable URL: http://www.jstor.org/stable/10.1525/fsr.2015.27.4.205


GUEST EDITOR'S OBSERVATIONS

The Risk Assessment Era: An Overdue Debate

SONJA B. STARR*

Professor of Law, University of Michigan; Visiting Professor of Law, Harvard University

It is an understatement to refer to risk assessment as a criminal justice trend. Rather, we are already in the risk assessment era. Quantitative tools for the prediction of crime risk have quickly come to pervade the administration of criminal justice in the United States. In sentencing, their use is largely new and remains fast expanding: today at least twenty states use these tools in some or all sentencing decisions, and many more, plus the federal government, are considering reforms that would introduce them. The modern revision of the Model Penal Code, currently in draft form, endorses their use in sentencing.1 Risk assessment instruments are also commonly used to guide policing, bail decisions, diversion programs, prison assignment and programming, probation and parole supervision, discretionary parole decisions, and reentry programs. And yet these instruments have largely escaped both legal and public scrutiny. We have not had a national conversation about their use and the risk factors that they include. This Issue reflects the view that that conversation is overdue.

Risk assessment is meant to serve an important objective: protecting the public from crime while allowing better-tailored use of criminal justice resources (including incarceration). On the other hand, almost all the risk assessment instruments now in use base a defendant's risk assessment score in part on demographic and socioeconomic factors, not just on past and present criminal conduct. Whether it is appropriate to base defendants' risk assessment on such factors is a choice that involves serious value conflicts and raises serious constitutional questions.

This past summer, Attorney General Holder weighed in, raising the issue's public profile. While endorsing the broader practice of using data to inform criminal justice policy and practice, the Attorney General expressed serious concern about the use of risk assessment instruments in sentencing. In a speech to the National Association of Criminal Defense Lawyers, he stated:

Although these measures were crafted with the best of intentions, I am concerned that they may inadvertently undermine our efforts to ensure individualized and equal justice. By basing sentencing decisions on static factors and immutable characteristics like the defendant's education level, socioeconomic background, or neighborhood they may exacerbate unwarranted and unjust disparities . . . . Criminal sentences must be based on the facts, the law, the actual crimes committed, the circumstances surrounding each individual case, and the defendant's history of criminal conduct. They should not be based on unchangeable factors that a person cannot control, or on the possibility of a future crime that has not taken place.2

The Department of Justice also recently sent a letter asking the U.S. Sentencing Commission to examine the issue.3 We can expect that the Commission will do so soon. We have come to an important juncture, in which we can expect attention to the issue to increase among judges, policymakers, and the public. The risk assessment era is already underway, but it is young, and the trends are not irreversible. Much remains undetermined about what role these instruments will play in sentencing and other parts of our system, what factors they will include, and what courts will have to say about the constitutional issues.

So I think this Issue comes at an excellent time, and I am grateful for the opportunity to serve as its guest editor. I am also grateful to all of the contributors for taking the time to share their rich, informed, and often provocative insights. I have been an outspoken critic of the use of demographic and socioeconomic factors to shape sentencing risk assessments, but I have aimed to put together an issue with a range of different perspectives, including those of several of risk assessment's leading advocates.



The issue begins with the views of those advocates, and then proceeds to critiques. The first contribution is from the Honorable Richard Kopf, Senior U.S. District Judge for the District of Nebraska. Judge Kopf introduces readers to the objectives and mechanics of actuarial risk assessment, focusing on the example of the federal Post Conviction Risk Assessment tool (PCRA). Although the PCRA is currently only used for supervision decisions by the Probation Office, Judge Kopf makes a strong case for the use of similar instruments in sentencing, and specifically defends the use of demographic classifications to predict risk.

The second piece comes from former New Jersey Attorney General Anne Milgram, now Vice President of Criminal Justice at the Laura and John Arnold Foundation; Marie VanNostrand, a risk assessment expert with Luminosity, Inc.; Alex Holsinger, a professor of criminal justice at the University of Missouri-Kansas City; and Matthew Alsdorf, Director of Criminal Justice at the Arnold Foundation. They focus on the use of risk assessment in pretrial detention decisions, arguing for its importance in protecting the public while reducing unnecessary incarceration. The authors also describe new research showing that pretrial risk assessment can be done effectively even without demographic and socioeconomic classifications.

Third, two scholars offer a forward-looking piece exploring new methods for improving the accuracy of risk predictions. Richard Berk, a professor of statistics and criminology at the University of Pennsylvania, and Jordan Hyatt, an assistant professor of criminology and justice studies at Drexel University, are experts in machine learning methods, which have so far not been used much in the criminal justice system but could be the wave of the future. The authors provide a lucid introduction to these methods for lay readers. They also offer responses to several criticisms of the risk assessment movement more generally.

We then turn to several critical pieces. First, I present a constitutional and policy case against the widespread practice of using demographic, socioeconomic, and family background variables in risk assessments that affect the defendant's punishment. I argue that sentencing people based on poverty and identity sends a noxious expressive message and worsens existing disparities. Moreover, a number of the classifications used by the instruments are quite clearly unconstitutional, as I show by reviewing the Supreme Court's doctrine on "statistical discrimination" and on discrimination against indigent criminal defendants. I also respond to a variety of counterarguments, including the claim that the state's interest in crime prevention allows these classifications to survive heightened scrutiny.

Second, Professor Bernard Harcourt, a criminologist and law professor at Columbia University, raises an additional equality-related concern. He focuses on the risk factors that my piece leaves alone: those related to criminal history. Professor Harcourt argues that risk factors like prior arrests are essentially "proxies for race," and that further expanding the influence of criminal history by incorporating it into risk assessments in sentencing and parole will exacerbate existing racial disparities.
Next, Professor Kelly Hannah-Moffat, a criminologist at the University of Toronto, examines practical problems with the administration and use of risk assessments, drawing from her extensive research on their implementation in Canada. Hannah-Moffat argues that actuarial risk assessments are not as objective as their advocates suggest; many of the factors are subjective, and those who carry out the assessments have varied perspectives and sometimes actively manipulate results. Moreover, judges and other decision-makers vary in their willingness to use the risk assessments and in their interpretations of their results.

This Issue reprints Judge Roger Warren's statement to the U.S. Sentencing Commission about what he describes as evidence-based sentencing in state courts. We also reprint Attorney General Holder's speech and the Department's letter to the Sentencing Commission, which endorse certain developments in data-driven criminal justice policy while also raising serious reservations about the use of factors outside a defendant's control to determine his sentence. These were very important steps in kick-starting the conversation about risk assessment in our criminal justice system. We should take up Attorney General Holder's call. Whatever one's views on it, the rise of actuarial risk assessment is too important a development to occur without serious debate and consideration. I hope that the varied perspectives in this issue will help to move that debate forward.

Notes
* I am grateful to Doug Berman and Steven Chanenson for the opportunity to edit this issue, and to all the contributors.
1 Model Penal Code: Sentencing § 6B.09 cmt. a at 53, 55 (Tentative Draft No. 2, 2011).
2 U.S. Dep't of Justice, Attorney General Eric Holder Speaks at the National Association of Criminal Defense Lawyers' 57th Annual Meeting and 13th State Criminal Justice Network Conference: Remarks as Prepared for Delivery (release date Aug. 1, 2014), at 3.
3 Jonathan J. Wroblewski, U.S. Department of Justice, Letter to the Hon. Patti B. Saris, Chair, U.S. Sentencing Commission, July 29, 2014, available at http://www.justice.gov/criminal/foia/docs/2014annual letter final 072814.pdf.


Federal Supervised Release and Actuarial Data (including Age, Race, and Gender): The Camel's Nose and the Use of Actuarial Data at Sentencing
Richard G. Kopf
Federal Sentencing Reporter, Vol. 27, No. 4, The Risk Assessment Era: An Overdue Debate (April 2015), pp. 207-215
Published by University of California Press on behalf of the Vera Institute of Justice
Stable URL: http://www.jstor.org/stable/10.1525/fsr.2015.27.4.207


Federal Supervised Release and Actuarial Data (including Age, Race, and Gender): The Camel's Nose and the Use of Actuarial Data at Sentencing

RICHARD G. KOPF
Senior United States District Judge for the District of Nebraska

"If the camel once gets his nose in the tent, his body will soon follow." The camel's nose is a metaphor for a small act leading to a larger, undesirable action. Mixing metaphors, I intend to turn the tables on the anti-camel slur. I want to welcome our humped friend, particularly since the ruminant is already halfway into our tent. And, guess what? Like all beasts of burden, the camel has a lot to offer us.

In this piece, I will do three things. After I set out the context, I will first describe, albeit in an over-simplified manner, how the federal supervised release system presently uses actuarial data to supervise offenders once they are released from prison. Second, I will give examples of how certain types of actuarial data like age, race, and gender can properly influence the supervision of offenders. Third, I will propose that actuarial data of all types, including age, race, and gender, play an important role at sentencing and not only in the supervised release context.1

I. Attorney General Holder's Implicit Call to Reject Science

Attorney General Eric Holder, addressing criminal defense lawyers, recently expressed a concern about the use of actuarial data to sentence people. One assumes he has similar concerns about the use of such data to decide how offenders should be supervised when out on the streets after serving a prison sentence. The Wall Street Journal wrote the following on August 1, 2014, detailing Holder's remarks:

Attorney General Eric Holder warned [recently] that a new generation of data-driven criminal justice programs could adversely affect poor and minority groups, saying such efforts need to be studied further before they are used to sentence suspects.

In a speech in Philadelphia to a gathering of the National Association of Criminal Defense Lawyers, Mr. Holder cautioned that while such data tools hold promise, they also pose potential dangers.

"By basing sentencing decisions on static factors and immutable characteristics like the defendant's education level, socioeconomic background, or neighborhood they may exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society," Mr. Holder told the defense lawyers. Criminal sentences, he said, "should not be based on unchangeable factors that a person cannot control, or on the possibility of a future crime that has not taken place."2

Holder's remarks did not address hard data compiled by criminologists after many years of painstaking study and research. Consider the following.

Dr. J.C. Oleson was formerly a senior staff member of the Administrative Office of the U.S. Courts, and he is now a faculty member and Director of Research at the School of Social Sciences at the University of Auckland.3 In an extremely detailed and well-researched paper, Dr. Oleson found that there are "seventeen discrete variables that appeared to be significantly associated with recidivism."4 This conclusion was drawn from a meta-analysis (that is, examining individual studies for the purpose of integrating the findings of all the studies) of 131 different research papers that identified the static and dynamic variables that appear to be most predictive of reoffense. In descending order of strength of association, they are:

(1) criminal companions, (2) criminogenic needs, (3) antisocial personality, (4) adult criminal history, (5) race, (6) pre-adult antisocial behavior, (7) family rearing practices, (8) social achievement, (9) interpersonal conflict, (10) current age, (11) substance abuse, (12) intellectual functioning, (13) family structure, (14) criminality, (15) gender, (16) socio-economic status of origin, and (17) personal distress.5

Holder's speech brings to mind Oleson's introductory thoughts. We must ask ourselves whether we wish to follow science, understanding that it may open up "terrifying vistas of reality," or whether we will "flee from this deadly light into the peace and safety of a new dark age."6 I vote (twice if I could) for science, even though the reality may, at times, be terrifying, especially for lawyers (like Holder) and judges (like me) who lack scientific training.



II. The Present State of Affairs: Predicting and Mitigating Risk in the Federal Supervised Release Context Through the Use of Evidence-Based Practices such as Actuarial Data7

Criminal justice agencies in the United States began using actuarial risk assessment instruments for post-conviction supervision as early as 1923. The federal judiciary and the federal probation system got serious about using these scientific techniques to assist with post-prison supervision in 1974 and thereafter.8

A. Background Principles: Risk, Need, and Responsivity

Social science research has consistently demonstrated that effective interventions for offenders under supervision adhere to the principles of risk, need, and responsivity. According to the risk principle, the level of correctional intervention should match the offender's risk of recidivism. Higher-risk persons require more intensive services to reduce reoffending, whereas lower-risk persons need less intervention. The risk level is determined by the presence or absence of those factors likely to cause criminal behavior, and those are personal characteristics and circumstances statistically associated with an increased chance of recidivism.

When one speaks of "need" in this context, one is attempting to determine what interventions are necessary to change those factors that, when changed, lower the probability of recidivism. Empirical research has shown that the needs most associated with criminal activity include procriminal attitudes, procriminal associates, impulsivity, substance abuse, and deficits in educational, vocational, and employment skills. Although an assessment of overall risk suggests the level of correctional services that should be used, the assessment of need pertains to the human factors that should be changed to reduce recidivism. Though static factors such as criminal history are good predictors of offending, they do not identify what needs should be targeted to reduce the likelihood of future criminal behavior.

The responsivity principle calls for the identification of barriers that may frustrate the needs of the offender and the success of the treatment plan developed by the probation officer. After determining what barriers exist, the responsivity principle calls for a determination of the approach (treatment modality) most likely to remove the barrier and thereby increase the chance for success and reduce the risk of reoffense. For example, if one is impulsive, his or her responsivity to the need to avoid smoking crack is a barrier. In such a circumstance, the probation officer might require the offender to participate in cognitive behavioral therapy, an action-oriented therapeutic approach that can be tailored to provide the offender with specific strategies to help address impulsivity.

The most advanced assessment instruments incorporate the principles of risk, need, and responsivity by addressing all three components, that is, (1) whom to target for significant correctional intervention, (2) what needs to address, and (3) how to remove barriers to successful implementation of a supervision and treatment plan.

B. The Development of the PCRA

By 2009, the federal system began to develop a fourth-generation risk assessment tool. The Federal Post Conviction Risk Assessment (PCRA) apparatus, a scientifically based instrument, was developed by the Administrative Office of the United States Courts (hereinafter, the Administrative Office) with the help of Christopher T. Lowenkamp, Ph.D., a nationally recognized expert in risk assessment and community corrections research. Dr. Lowenkamp and other Administrative Office researchers constructed and validated the PCRA using data collected through the Probation/Pretrial Services Automated Case Tracking System (PACTS), existing risk assessments from the five federal districts with pilot risk assessment programs, criminal history records, and presentence reports.9

There was a construction sample consisting of 51,428 federal cases, a first validation sample consisting of 51,643 cases, and a second validation sample that included 193,586 persons. Using a statistical standard as a measuring stick, the PCRA is one of the most accurate instruments in the field of criminology, producing "area under the curve" (AUC) values of .709 to .783. The AUC is a commonly used statistic that measures the strength of association between risk classification and recidivism. AUCs range from .000 to 1.000, with higher AUCs demonstrating higher predictive accuracy for assessing an offender's risk.10 In other words, a perfect score is 1.0, while a score of .5 would be expected based on chance. The PCRA's predictive validity was confirmed for both short-term (6-12 months) and longer (up to 48 months) follow-up periods.

Bivariate and multivariate analyses were used to determine the most predictive elements for inclusion in the instrument. They included criminal history, education, employment, substance abuse, social networks, and cognition. Law enforcement records were used to identify any new arrests after the start of supervision.

Four risk categories were identified based on the statistical analysis: low, low/moderate, moderate, and high. In all three samples of offenders on federal supervised release, low and low/moderate risk persons accounted for at least 85 percent of the cases. Much smaller percentages were identified in each sample as moderate and high risk (approximately 12 percent and 1 percent, respectively). It is worth noting that this skewed distribution is unsurprising because the pool of persons under post-conviction supervision is presumably at lower risk than offenders in prison.


C. Content of the PCRA

The PCRA consists of two sections. One section is completed by the probation officer (Officer Assessment), and the other section is completed by the person under supervision (Offender Self Assessment). The Officer Assessment includes both "scored items" and "unscored items." Scored items have been demonstrated by the Administrative Office's empirical research to be statistically significant predictors of recidivism, and they contribute to the PCRA's final conclusion regarding risk level and needs. Unscored items have been shown by other empirical research to be predictors of recidivism, but have not been studied by the Administrative Office in federal cases due to the lack of necessary data.

There are currently 15 scored items and 41 unscored items. Information for all scored items and the majority of unscored items is obtained as part of the Officer Assessment based on the interviews and a review of file documents. The Offender Self Assessment is currently used only for 12 unscored items under the "cognitions" domain.

The PCRA includes information from the following seven domains: (1) Criminal History: 6 scored items, 1 unscored item; (2) Education/Employment: 3 scored items, 2 unscored items; (3) Substance Abuse: 2 scored items, 4 unscored items; (4) Social Networks: 3 scored items, 3 unscored items; (5) Cognitions: 1 scored item, 13 unscored items; (6) Other (Housing, Finances, Recreation): no scored items, 4 unscored items; and (7) Responsivity Factors: no scored items, 14 unscored items.

The criminal history domain is measured by whether the person was arrested at or under age 18, the number of prior misdemeanor and felony arrests, whether there are prior violent offenses, whether there is a varied (more than one offense type) offending pattern, whether there has been a revocation for new criminal behavior on supervision, whether there has been problematic institutional adjustment while imprisoned, and the person's age at the time of supervision.

The education and employment domain includes measures for the highest education level achieved. It also includes the degree of employment and number of jobs in the past 12 months. Drug and alcohol use is measured by whether there are disruptions at work, school, and home due to drug or alcohol use, whether the offender uses drugs or alcohol when it is physically hazardous, whether legal problems have occurred due to drug or alcohol use, whether the person continues to use drugs or alcohol despite social and interpersonal problems, and whether a current drug or alcohol problem exists.

Under the social networks category, the officer assesses marital status, whether the person lives with a spouse or children, whether there is a lack of family support, whether there is an unstable family situation, the nature of the person's relationship with peers, and whether the person lacks positive prosocial support.

Regarding the cognitions domain, the officer is directed to assess whether the person has antisocial attitudes and values and whether he is motivated toward supervision and change. The offender also takes part in an 80-question self-assessment, which is discussed further below. This assessment helps the probation officer get at the question of the offender's thinking style.

The housing, finances, and recreation domain assesses the level of home stability, whether there are criminal risks at home, the financial situation, and the level of engagement in healthy social activities.

For the responsivity category, the officer is directed to check for the following areas of concern: low intelligence, physical handicap, reading and writing limitations, mental health issues, no desire to change or participate in programs, homelessness, transportation, child care, language, ethnic or cultural barriers, history of abuse/neglect, and interpersonal anxiety.

D. Criminal Thinking as a Part of the PCRA

As noted above, the probation officer is directed to assess the degree to which the offender exhibits antisocial thinking styles. That effort is aided by the Offender Self Assessment. That assessment is based on the Psychological Inventory of Criminal Thinking Styles (PICTS), which was developed by Glenn Walters, Ph.D., using data collected on Federal Bureau of Prisons inmates.11

The PICTS is a quantifiable instrument that provides a reliable and valid method to assess criminal thinking styles. It is an 80-item self-report measure of criminal thinking styles created to provide the probation officer with information about how an offender thinks, which can be valuable for treatment and supervision purposes. It is designed to assess eight thinking styles that support and maintain criminal activity.

The PICTS also includes a General Criminal Thinking score, which is the sum of the raw scores for the items in the self-assessment that make up the eight PICTS thinking style scales. The PICTS also includes the Proactive Criminal Thinking composite scale and the Reactive Criminal Thinking composite scale, which identify the mode of criminal thinking an individual endorses. Proactive thinking is goal-directed. Persons who are proactive tend to expect positive things to come from their criminal behavior, such as money, status, and power. Others may describe them as devious, callous, calculating, and cold-blooded. Reactive thinking involves reacting to a situation rather than planned behavior. Persons who are reactive view the world suspiciously and misinterpret others as hostile. Others may describe them as impulsive, emotional, and hot-blooded.

E. Output from the PCRA

After the Officer Section and the Offender Self Assessment are completed, an output page is produced that lists the person's risk category, needs, and responsivity factors. The total risk score is determined by adding the points for each of the scored items in the seven domains. The score is then used to classify the person into one of four risk categories: low, low/moderate, moderate, and high.
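As a rough illustration of the scoring flow just described (summing the scored-item points across the seven domains, then mapping the total to one of the four categories), here is a minimal sketch. The point values and the category cut points below are invented placeholders for demonstration only; the actual PCRA weights and thresholds are not reproduced in this piece.

```python
# Illustrative sketch of the PCRA-style scoring flow described in the text:
# sum the points from the scored items in each domain, then map the total
# to one of four risk categories. The points and cut points below are
# invented placeholders, NOT the actual PCRA weights or thresholds.

def total_risk_score(domain_points: dict) -> int:
    """Add the scored-item points across the seven domains."""
    return sum(domain_points.values())


def risk_category(score: int) -> str:
    """Map a total score to a category using hypothetical cut points."""
    if score <= 5:
        return "low"
    elif score <= 9:
        return "low/moderate"
    elif score <= 12:
        return "moderate"
    return "high"


example_points = {
    "criminal_history": 4,      # hypothetical points from 6 scored items
    "education_employment": 2,  # 3 scored items
    "substance_abuse": 1,       # 2 scored items
    "social_networks": 2,       # 3 scored items
    "cognitions": 1,            # 1 scored item
    "other": 0,                 # no scored items in this domain
    "responsivity": 0,          # no scored items in this domain
}

score = total_risk_score(example_points)
print(score, risk_category(score))  # -> 10 moderate (under these placeholders)
```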


The Administrative Office's research indicates that, with each increase in risk category, the probability of failure (re-arrest and revocation) increases.

The majority of the persons under federal supervision fall into the low or low/moderate categories. In the low risk category, 8 percent of offenders have their supervision revoked, and 9 percent are re-arrested within the first 190 days from their initial assessment. Intensive interventions with this population have little impact and may increase the risk of recidivism. Typically, a low level of supervision is appropriate. The probation officer should consider low-intensity supervision and eventual early termination.

In the low/moderate category, 11 percent of offenders have their supervision revoked, and 15 percent are re-arrested within the 498 to 810 days from their initial assessment. If appropriate risk factors are effectively addressed, these failure rates decline in subsequent time periods.

In the moderate category, 47 percent of offenders have their supervision revoked, and 30 percent are re-arrested within the first 190 days from their initial assessment. If appropriate risk factors are effectively addressed, these failure rates decline in subsequent time periods.

In the high risk category, 74 percent of offenders have their supervision revoked, and 42 percent are re-arrested within the first 190 days from their initial assessment. If appropriate risk factors are effectively addressed, these failure rates decline in subsequent time periods.

Here is an example of an output page for an actual offender, which has been "scrubbed" to make it impossible to determine the identity of the person:

Risk Category
Federal Risk Screening Instrument Score: Moderate
In this category, 47% of offenders have their supervision revoked and 30% are re-arrested within the first 190 days from their initial assessment. If appropriate risk factors are effectively addressed, these failure rates decline in subsequent time periods.

Dynamic Risk Factors
Federal Needs Screening Instrument Indications:
#1 Cognitions
#2 Education/Employment

Offender Self Report Results:
Valid Profile: Yes
Exhibits General Criminal Thinking: Modest (55-59)
Proactive: None (<55)
Reactive: Moderate (55-69)
Predominant Style: Reactive
Profile Differentiated: Yes

Elevated Offender Thinking Styles
Cognitive Indolence (Ci): Lazy Thinking
Mollification (Mo): Making Excuses
Power Orientation (Po): Asserting Power Over Others12

F. Override of the PCRA

Probation officers can deviate from the PCRA risk category through a "policy override" for the following categories if officers believe that the PCRA risk score is not appropriate: sex offenders, persistently violent offenders, offenders with severe mental illness, and youthful offenders with extensive criminal histories. Officers are also permitted to deviate from the PCRA risk level for other reasons through a "professional override." These require a comprehensive justification. Any type of override requires the approval of a supervising officer.

G. Value of the PCRA

The PCRA is based on a dataset of unprecedented size that is representative of the population of federal offenders under supervision. It is also consistent with contemporary scientific research since it adheres to the principles of risk, need, and responsivity. Assessment information is used not only to measure risk to determine the appropriate supervision level, but to reduce risk going forward. Because it includes dynamic risk factors, the PCRA allows officers to identify needs that should be targeted for change and responsivity barriers to change. Simply put, the PCRA is a data-driven approach, and utilization of it results in the rational and equitable supervision of offenders.

III. Examples of the Proper Use of Age, Race, and Gender under the PCRA

It is helpful to look at how the PCRA utilizes age, race, and gender with respect to scored and unscored items. Remember that scored items have been demonstrated by the Administrative Office's empirical research to be statistically significant predictors of recidivism. Unscored items have been shown by other empirical research to be predictors of recidivism, but have not been studied by the Administrative Office in federal cases.

A. Age as a Scored Item

Using the PCRA, age is specifically taken into consideration when scoring an offender's criminal history for purposes of the assessment of risk. Presumably, age is one of those immutable characteristics that Attorney General Holder was concerned about.

Age is used in two ways when assessing criminal history under the PCRA. That is, the probation officer determines (1) whether the offender was first arrested by any jurisdiction before the age of 18, and (2) the age of the offender at the time of federal supervision. If the offender was arrested before the age of 18, that behavior increases the total criminal history score by 1 point (Item 1.1 of the PCRA). If the age at intake is between 26 and 40, 1 point is added to the total criminal history score (Item 1.7). If the age at intake is 25 years or younger, that fact increases the criminal history score by 2 points (Item 1.7). If the age at intake is 41 or above, no criminal history points are assessed for that category (Item 1.7).

That it is sensible to concentrate on age for criminal history purposes should surprise no one.
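The age-related rules just described (Items 1.1 and 1.7) are simple enough to restate as code. The sketch below is for illustration only and is not an official implementation of the instrument; it merely encodes the point rules as the text describes them.

```python
# Restatement, for illustration only, of the age-related criminal history
# points described in the text (PCRA Items 1.1 and 1.7). This is not an
# official implementation of the instrument.

def age_related_points(arrested_before_18: bool, age_at_intake: int) -> int:
    points = 0
    # Item 1.1: an arrest before age 18 adds one point.
    if arrested_before_18:
        points += 1
    # Item 1.7: points based on age at the time of supervision.
    if age_at_intake <= 25:
        points += 2
    elif age_at_intake <= 40:   # 26 to 40
        points += 1
    # 41 or above: no points for this item.
    return points


print(age_related_points(arrested_before_18=True, age_at_intake=23))   # -> 3
print(age_related_points(arrested_before_18=False, age_at_intake=45))  # -> 0
```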


That younger persons are seen as more likely to be at risk for reoffending should not surprise anyone either. That the Administrative Office's empirical research on age was consistent with the conclusions of many other studies is also not surprising. Based upon a meta-analysis of the research in this area, J.C. Oleson writes, "Those between the ages of about fifteen or sixteen and twenty-four or twenty-five appear to be at greatest risk of offending, but after that period, for a variety of possible reasons, adults gradually 'age out' of crime."13 Indeed, "the highest rates of violent crime occur[] at age eighteen."14 It is also worth noting that persons who started earlier in the commission of crime are more likely to have a greater risk of re-offense later in life as compared to persons of the same age whose criminal record did not start so early.

B. Age, Race, and Gender as Unscored Items

With respect to race and gender, the PCRA does not score such characteristics,15 but the PCRA does call for the analysis of race and gender. These characteristics come into play when the probation officer determines "responsivity" or the barriers that preclude successful satisfaction of the offender's needs. Except as noted above for criminal history purposes, age is likewise not scored, but it too is an important data point when considering barriers to correctional treatment.

Data on age, race, and gender can be used to assist the probation officer in overcoming barriers to the success of offenders.16 Recently, a study was completed of 19,753 federal offenders with an initial assessment upon commencement of supervised release occurring from November 2013 through March 2014. Of these offenders, race and ethnicity information was available for 97 percent, gender data was available for 98 percent, and age information was available for 98 percent.

By closely examining data from this large pool of federal offenders, these three "immutable characteristics" (age, race, and gender) stood out as especially relevant. From an examination of these characteristics, federal probation officers learned much about the barriers that certain offenders were facing. As it turns out, those barriers were not shared equally, but were distributed differentially according to discrete demographic characteristics.

Table 1. Presence of responsivity issues for federally supervised offenders at initial assessment, by offender demographic characteristics, November 2013-March 2014.

Offender demographics                 Number of offenders   Percent with responsivity issues
Any offender                          19,753                28%
Race/ethnicity
  American Indian or Alaska Native    557                   50
  Hispanic, any race                  4,623                 31
  White, not Hispanic                 6,916                 27
  Black or African American           6,576                 26
  Asian or Pacific Islander           518                   24
Gender
  Female                              3,644                 31
  Male                                15,698                27
Age
  20 or younger                       254                   34
  21-24                               1,301                 30
  25-34                               6,137                 27
  35-44                               5,732                 26
  45-54                               3,534                 30
  55 or older                         2,383                 32

Source: Cohen and Whetzel, note 16, at 15 and tbl. 2 (note omitted).

Although barriers to effective correctional treatment impacted 28 percent of all offenders, American Indians or Alaska Natives clearly faced the greatest hurdle of all demographic groups. One half (50 percent) of this population was found to have responsivity issues that stood in the way of meeting the needs of these offenders. Women were clearly more likely to face barriers to effective correctional treatment than men. Very young federal offenders (20 or younger) and older federal offenders (55 or older) were particularly likely to confront barriers to effective correctional treatment.

When the researchers drilled down even further, they found that particular barriers to effective correctional treatment were, once again, distributed unequally according to discrete demographic characteristics. American Indians, Alaska Natives, and women were more likely than other groups to confront mental health barriers to effective correctional treatment. The same was true for American Indians, Alaska Natives, and women when it came to a history of abuse and neglect. Moreover, as a demographic group, Native Americans and Alaska Natives were more likely to face more numerous barriers to effective correctional treatment than other racial or ethnic groups. Asian and Pacific Islanders were more likely to face language barriers to effective correctional treatment when compared to other racial or ethnic groups.

C. The Use of Age, Race, and Gender under the PCRA is Appropriate

The PCRA is not a blunt instrument. Nor is the approach it dictates. It would be difficult for a reasonable person to conclude that the utilization of the PCRA is in any way discriminatory. Although age, race, and gender are part of the PCRA approach to supervision, the use of those characteristics is carefully and narrowly tailored to further the government's compelling interest in seeing to it that offenders do not reoffend and instead are provided with the most effective correctional treatment. Indeed, it is hard to imagine an alternative that would serve the compelling governmental interest nearly as well.

IV. All Statistically Significant Actuarial Data Regarding Risk to Reoffend, including Age, Race, and Gender, Should be Used at Sentencing

Having shown that actuarial data has been a boon to the rational and nondiscriminatory supervision of federal offenders once they leave prison, it is not a great leap to suggest, as I do, that statistically significant actuarial data regarding the risk to reoffend should become a salient part of federal sentencing.


Table 2. Types of responsivity issues identified for federally supervised offenders at initial assessment, by offender demographic characteristics, November 2013-March 2014 (percent of offenders with responsivity issues).

Type of responsivity issue             AI/AN   Asian/PI   Black/AA   Hispanic   White   Female   Male
Transportation                         26%     3%         10%        7%         9%      9%       9%
Mental health                          11      2          6          6          10      12       7
Physical handicap                      5       2          4          2          5       4        4
Homeless or unstable housing           7       1          4          3          4       3        4
No desire to participate in programs   7       1          4          3          3       2        4
History of abuse or neglect            7       2          3          3          4       8        2
Reading and writing limitations        5       6          3          5          2       2        3
Low intelligence                       5       2          4          3          2       2        3
Language                               1  13  9  1  2  3
Interpersonal anxiety                  3  1  1  2  2  1
Ethnic or cultural issues              8  3  1  1  1
Child care                             2  0  1  1  2
Number of offenders                    557     518        6,576      4,623      6,916   3,644    15,698

Column abbreviations: AI/AN = American Indian or Alaska Native; Asian/PI = Asian/Pacific Islander; Black/AA = Black/African American; Hispanic = Hispanic, any race; White = White, not Hispanic.

Source: Cohen and Whetzel, note 16, at 16 & tbl. 3 (note omitted).
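For readers who want to work with figures like those in Tables 1 and 2 programmatically, a minimal sketch follows. The DataFrame layout and the choice of pandas are the editor's assumptions for illustration only; the percentages are taken from two of the complete rows of Table 2 above.

```python
# Minimal sketch: loading two complete rows of Table 2 into a pandas
# DataFrame and querying them. The DataFrame layout is an illustrative
# choice; the percentages are those reported in the table above.
import pandas as pd

groups = [
    "American Indian or Alaska Native",
    "Asian/Pacific Islander",
    "Black/African American",
    "Hispanic, any race",
    "White, not Hispanic",
    "Female",
    "Male",
]

table2 = pd.DataFrame(
    {
        "Transportation": [26, 3, 10, 7, 9, 9, 9],
        "Mental health": [11, 2, 6, 6, 10, 12, 7],
    },
    index=groups,
)

# Group with the highest reported rate for each barrier type.
print(table2.idxmax())
# Transportation -> American Indian or Alaska Native
# Mental health  -> Female
```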

Relying on Dr. Oleson once again, I describe how that could be done. After that, I address Attorney General Holder's concerns.

A. Actuarial Software

Dr. Oleson has proposed the use of software that could capture both the risk to reoffend and retribution-related concerns.17 Although I remain agnostic about the specific design and whether this software would replace, or serve as an adjunct to, the Federal Sentencing Guidelines, it is clear to me that Oleson's proposal has much merit. In very general terms, here is how it would work.

Let us assume that we want to compare the offender standing before the judge at sentencing with past offenders who have committed similar crimes and who have similar characteristics. By comparing the present offender with like past offenders, we should be able to predict risk based upon the past performance of similarly situated offenders. In other words, the object of our comparison is to predict the risk that the offender standing before the judge will engage in further criminal behavior by examining how other similar past offenders have fared.

There is a huge amount of actuarial data available to the Sentencing Commission and the federal courts about federal offenders. We now have access to that data through the federal court's Case Management and Electronic Case Filing (CM/ECF) system and otherwise. With that in mind, one can imagine software that would tap into that data and generate a scatter plot.

The severity of the sentence, showing the entire spectrum of imprisonment and supervised release available under the statute, would be plotted on the horizontal axis of a pictogram that would appear on the judge's computer. The duration without a new arrest (easily obtainable from such sources as NCIC, National Crime Information Center) for similarly situated past offenders would be plotted on the vertical axis. Each point in the scatter plot would represent a previous federal case matched to the offense of conviction, offender characteristics, and arrest records following conviction.18

Offender characteristics could include all or some19 of the 17 offender characteristics found statistically probative of reoffense by the meta-analysis of 131 different studies discussed earlier. (As a reminder, the characteristics are criminal companions, criminogenic needs, antisocial personality, adult criminal history, race, pre-adult antisocial behavior, family rearing practices, social achievement, interpersonal conflict, current age, substance abuse, intellectual functioning, family structure, criminality, gender, socio-economic status of origin, and personal distress.)

The judge would then "click on" a point within the scatter plot to compare the offender described on the scatter plot with the offender facing the judge. The judge could pull up the specifics of that prior offender's case: the name and photo of the offender, the offense of conviction, the characteristics of the offender, and the particulars of the sentence imposed. The judge would be able to review any educational, vocational, or treatment programs that successful offenders had completed while serving their sentences, and to search online for available, equivalent programs. If desired, the underlying documents associated with any of the previous cases could be retrieved through the judge's computer.


Dr. Oleson writes:

By concentrating on points near the top of the vertical axis (individuals who went long periods of time without a new arrest), the judge could engage in actuarial sentencing and impose a sentence that was effective in reducing recidivism among similar defendants convicted of similar crimes. A judge could divert correctional resources from low-risk offenders (who actually become more likely to reoffend if oversupervised) to high-risk offenders in greater need of intensive services. Defendants who are statistically most likely to recidivate could be sentenced to longer sentences (within the statutory range), while those who present little risk of recidivism could be sentenced to brief terms of incarceration or noncustodial sentences.20

The foregoing approach would also be useful if a judge was concerned with retribution. The software could be written to address retribution-related concerns by having a part of the horizontal axis of the plot in a different color, say red. "The red band would reflect the recommended sentencing range in terms of proportionality and just desserts. . . . [A]s a general rule, the red band would serve as an anchoring point for the moral wrongness of the offense, and suggest appropriately retributive penalties."21

B. Immutable Characteristics

Attorney General Holder was concerned about sentencing people on the basis of "immutable characteristics" that those people could not control. When subjected to a clear-eyed analysis, his arguments about "immutable characteristics" are not persuasive.

Initially, for the public that must bear the risk of the offender committing another crime, and considering the explicit command of 18 U.S.C. § 3553(a)(2)(C) requiring the judge to protect the public from further crimes of the defendant, differentiating between "immutable characteristics" and other characteristics of risk makes no sense. If a sociopath, for example, is more likely to offend again, why should the judge ignore that unchangeable characteristic for purposes of sentencing? The victim of the new offense will find no solace whatever in such ignorant behavior.

It is likely that Holder was worried only about certain highly sensitive "immutable" characteristics like age, race, or gender. There are (at least) two responses to such a concern: one relates to the legal justification for considering age, race, and gender, and the other to the question of luck.

While it is beyond the scope of this piece to fully articulate a legal rationale for consideration of age, race, or gender as a part of an actuarial sentencing scheme, a sentencing system based upon a robust actuarial data set consisting of all factors statistically correlated with risk would arguably pass constitutional muster, even under strict scrutiny.22 Quickly put, and concentrating on race as an example, the argument goes something like this:

First, "The 'legitimate and compelling state interest' in protecting the community from crime cannot be doubted."23 Second, "Highly relevant if not essential to [a judge's] selection of an appropriate sentence is the possession of the fullest information possible concerning the defendant's life and characteristics."24 Third, like the admission policy at the University of Michigan Law School that considered race as one of several factors and that survived strict scrutiny in Grutter v. Bollinger,25 utilization of race as part of a statistical model at sentencing, together with a host of other empirically validated characteristics, does not violate the Constitution. This is because the use of actuarial risk data serves the compelling governmental interest of protecting the public from further crime and because it is narrowly tailored to take into consideration only those factors that are statistically correlated with recidivism. Additionally, since research has shown that excluding race from mathematical models of recidivism degrades the predictive power of the model, no less restrictive means will satisfy the compelling governmental interest.26

Furthermore, no matter how we might wish that it were otherwise, luck plays an important part in every person's life. Some people are born smart, and others are born not so smart. Some are born rich, and others born poor. Some have had good parents, and others virtually no parents at all. Some are born male, while others are born female. Some are born black, and some are born white. Some offenders are young, and some are old. All of these, and many more, are attributes that the offender cannot be expected to change.

If protection of the public is central to the act of sentencing, carving out special classes of offenders with "bad" luck for relatively benign treatment, while simultaneously ignoring the "bad" luck of other offenders, makes no sense at all, especially if we are about protecting the public. On the contrary, considering all statistically valid predictors of risk seems far more fair and consistent with our obligation to protect the public than picking winners and losers based upon our uneven sympathy for certain classes of offenders who have had "bad" luck.27

C. Correlation is Not Causation

It should not be necessary, but one thing needs to be stated clearly: A black man is not inherently more likely to be a criminal than a white man. Age, race, and gender do not cause crime. Correlation is not causation.

None of the criminologists who painstakingly derived actuarial data to study crime did so because they believed that a statistical correlation between race (or some other variable) and crime proved causation. Rather, they relied upon statistical correlations to better help them understand crime. Although it may be true that race, for example, can be a proxy for past or present discrimination, it is also true that all statistically correlated variables are proxies for something else. In the case of race (or age or gender), one cannot reasonably argue that such a classification is inevitably a proxy for one thing and one thing only, that is, discrimination. Since that is so, and race is likely to be a proxy for a wide variety of potentially relevant variables that cannot be easily untangled from race, then, from the utilitarian perspective of 18 U.S.C. § 3553(a)(2)(C), the protection of the public from future crime demands that the sentencing judge be well informed of all actuarial data that helps him or her assess the risk of reoffense.


In summary, an actuarial data set that includes all of the 17 variables statistically correlated with crime, including age, race, and gender, would materially assist the federal judge when selecting a sentence. Those variables need not, and should not, drive the sentence. But they are worthy of careful and cautious consideration. In short, it is worth remembering that the blind cannot see.

D. A Few Words about Unwarranted Sentencing Disparity

Citing 18 U.S.C. § 3553(a)(6), which cautions against unwarranted sentencing disparity, a canny critic would argue that using actuarial data will cause sentencing disparity. But that is not true. The more closely the risk factors for the offender pending sentencing are matched with like risk factors for a past offender, the closer the judge will get to the ideal of sentencing like offenders in a like manner on a ground (risk to reoffend) that is indisputably proper. And, to the extent that an actuarial sentencing scheme also contains retributive considerations like proportionality and just desserts, "unwarranted" sentencing disparity becomes even less likely.

E. Punishment for a Crime Not Yet Committed

Attorney General Holder was concerned about punishing an offender for a crime that the offender had not (yet) committed. Implicit in this assertion is the belief that sentencing should not as a normative matter address utilitarian concerns. While Holder is, of course, free to express his views about the proper goals of sentencing, the decision about what concerns should be addressed at sentencing has already been made by Congress. Specifically, 18 U.S.C. § 3553(a)(2)(C) commands that the "court, in determining the particular sentence to be imposed, shall consider . . . the need for the sentence imposed . . . to protect the public from further crimes of the defendant. . . ." In short, Holder's argument against punishing someone for a crime they have not yet committed should not be taken seriously.

V. Conclusion

The more a judge knows about offenders and the risk they pose to reoffend, the better the judge can sentence offenders for their crimes and supervise them when they leave prison. Actuarial data, including age, race, and gender, materially assists the judge in such an effort. With that, I warmly invite the camel into our tent.

Notes
1 The nine-page "PCRA Officer Section," which is discussed later, is reproduced and is available at www.ned.uscourts.gov. Click on "Attorney," "Judges' Information," and "Richard G. Kopf"; scroll down to "Other Items of Interest" and "Federal Sentencing Reporter 2014-15." Likewise, the 80-question "PCRA Offender Section" discussed later is reproduced there, too. Finally, the articles cited in these endnotes are reproduced in this archive. When retrieving these documents, be patient. Some documents are large, and it takes time for the server to retrieve them.
2 Devlin Barrett, Holder Cautions on Risk of Bias in Big Data Use in Criminal Justice, Wall St. J., Aug. 1, 2014, available at http://online.wsj.com/articles/u s attorney general cautions on risk of bias in big data use in criminal justice 1406916606.
3 Oleson received his Ph.D. in criminology from the University of Cambridge, and his J.D. from the law school at the University of California, Berkeley (Boalt Hall).
4 J.C. Oleson, Risk in Sentencing: Constitutionally Suspect Variables and Evidence-Based Sentencing, 64 S.M.U. L. Rev. 1329, 1367 (2011) [hereinafter Risk in Sentencing].
5 Id. Oleson discusses in detail each of the seventeen factors. Id. at 1353-68. Although all the other factors are relatively self-evident, "criminogenic needs" requires explanation. Essentially, it means "holding criminal values," such as the rejection of lawful work and the adoption of crime as a lifestyle. Id. at 1354.
6 Id. at 1330.
7 An "evidence-based practice" is one that is not primarily based on intuition or personal experience, but rests on verifiable data. "Actuarial data" is a subset and refers to statistics about groups rather than stereotypical or other assumptions. As many philosophers of science and others have observed, there are limits to "common sense."
8 Most of the information contained in Part II is drawn from two excellent and heavily footnoted sources: (1) Administrative Office of the United States Courts, An Overview of the Federal Post Conviction Risk Assessment (Sept. 2011) and (2) James L. Johnson, Christopher T. Lowenkamp, Scott W. VanBenschoten & Charles R. Robinson, The Construction and Validation of the Federal Post Conviction Risk Assessment (PCRA), 75(2) Federal Probation (Sept. 2011), available at http://www.uscourts.gov/uscourts/FederalCourts/PPS/Fedprob/2011 09/index.html.
9 Id. at 4-8.
10 "For a frame of reference, it is helpful to consider the risk factors for a heart attack (e.g., high levels of bad cholesterol, smoking, and hypertension). These risk factors were identified in a study, which followed approximately 5,000 people over a 12-year period. When the risk factors are combined, the AUC falls between .74 and .77. See [D.A. Andrews & James Bonta, The Psychology of Criminal Conduct 276 (4th ed. 2007)] (citing W.F. Wilson et al., Prediction of Coronary Heart Disease Using Risk Factor Categories, 97 Circulation Journal of the American Heart Association 1837 (1998)). While perfect prediction is an impossibility in both the medical and criminal justice fields, the knowledge of risk has practical value. Id." Administrative Office of the United States Courts, supra note 8, at 9 n.35.
11 Since 1992, Doctor Walters has been employed as a clinical psychologist by the Federal Bureau of Prisons (Federal Correctional Institution Schuylkill). The original PICTS instrument was developed in 1989 and then updated in 2001. Glenn D. Walters, The Psychological Inventory of Criminal Thinking Styles (PICTS): A Review and Meta-Analysis, 9(3) Assessment 278-91 (September 2002).
12 The output page in the text was provided by Supervising United States Probation Officer Doug Steensma from the District of Nebraska. When SUSPO Steensma supplied this to me, he wrote: "I have shared the PICTS results with offenders along with the PCRA results. By doing so, it can create change talk which helps officers to guide offenders towards services or programs aimed at addressing identified needs." He added, "I reviewed these results with the offender, and the offender admitted to having the thinking styles listed. As an officer, the information was beneficial for the purpose of developing strategies for programming and supervision." I take this opportunity to thank SUSPO Steensma and Senior Probation Officer Todd Enger for their able assistance and patient teaching. In that same vein, I also compliment the Chief United States Probation Officer for the District of Nebraska, Mary Lee Ranheim, and her Chief Deputy, Kit Lemon, for running one of the best federal probation offices in the country.


13 Risk in Sentencing, supra note 4, at 1362 (footnote omitted).
14 Id.
15 Although "race" and "gender" are not scored in the federal supervised release system for purposes of "risk" assessment, those factors are clearly predictors of reoffense. Criminologists have found that "race" is a statistically significant predictor of recidivism, although the reasons why this is so are disputed. Risk in Sentencing, supra note 4, at 1356-59. Male gender is another significant predictor of reoffense, especially for serious crime. Id. at 1365-66.
16 The data and much of the analysis discussed in this section are derived from Thomas H. Cohen & Jay Whetzel, The Neglected "R": Responsivity and the Federal Offender, 78(2) Federal Probation 11 (Sept. 2014), available at http://www.uscourts.gov/uscourts/FederalCourts/PPS/Fedprob/2014-09/federal-offender.html.
17 J.C. Oleson, Blowing Out All the Candles: A Few Thoughts on the Twenty-Fifth Birthday of the Sentencing Reform Act of 1984, 45 U. Rich. L. Rev. 693 (2011).
18 Id. at 744-47.
19 For example, one could exclude race. But remember that for every omission, there is a consequent drop in the ability of the software to match the offender pending sentencing with other like past offenders. In short, the more personal characteristics that are omitted, the weaker the scatter plot becomes in terms of predicting risk of recidivism. Id. at 755-56.
20 Id. at 746-47 (footnotes omitted).
21 Id. at 748-49.
22 Risk in Sentencing, supra note 4, at 1372-88.
23 Schall v. Martin, 467 U.S. 253, 264 (1984) (quoting De Veau v. Braisted, 363 U.S. 144, 155 (1960)) (authorizing pretrial detention of an accused juvenile delinquent based on finding that there was "serious risk" that the juvenile "may before the return date commit an act which if committed by an adult would constitute a crime" and holding that such practice did not violate the Due Process Clause).
24 Williams v. New York, 337 U.S. 241, 247 (1949) (emphasis added) (New York judge, who was charged with the responsibility of sentencing under New York statute setting wide limits for maximum and minimum sentences, is not restricted by the Due Process Clause to information received in open court).
25 Grutter, 539 U.S. 306 (2003).
26 "Once the constitutional door is open to race, all other sentencing factors can pass through: gender, age, marital status, education, class, and so forth." Risk in Sentencing, supra note 4, at 1387.
27 By the way, it is not true that the examination of race will always adversely impact groups that have been historically discriminated against. Take child pornography as an example. The Sentencing Commission has found that white males comprise as much as 93 percent of offenders who do not manufacture child pornography. U.S. Sentencing Commission, Report to the Congress: Federal Child Pornography Offenses, ch. 11 at 304 & tbl. 11-2 (Dec. 2012) (Chapter 11 deals with recidivism). Indeed, the Commission has stated that such offenders are "overwhelmingly white." Id. at 308 n.56.


Machine Learning Forecasts of Risk to Inform Sentencing Decisions
Author(s): Richard Berk and Jordan Hyatt
Source: Federal Sentencing Reporter, Vol. 27, No. 4, The Risk Assessment Era: An Overdue Debate (April 2015), pp. 222-228
Published by: University of California Press on behalf of the Vera Institute of Justice
Stable URL: http://www.jstor.org/stable/10.1525/fsr.2015.27.4.222






RICHARD BERK,* Professor, Departments of Criminology & Statistics, University of Pennsylvania
JORDAN HYATT, Assistant Professor, Department of Criminology and Justice Studies, Drexel University

I. Introduction

When threats to public safety are a factor in sentencing decisions, forecasts of "future dangerousness" are necessarily being made. Sometimes the forecasts are effectively mandatory. Federal judges, for example, are required to assess risk in every case. Under 18 U.S.C. § 3553(a)(2)(C), "[t]he court, in determining the particular sentence to be imposed, shall consider ... (2) the need for the sentence imposed ... (C) to protect the public from further crimes of the defendant ..." A judge must look into the future, determine the likelihood and seriousness of criminal behavior, and within certain bounds, sentence to minimize the harm that could result.

Ideally, the forecasts should be highly accurate. They also should be derived from procedures that are practical, transparent, and sensitive to the consequences of forecasting errors. However, there is usually no compelling guidance on precisely how these goals can best be achieved. Subjective judgment, sometimes called "clinical judgment," is an approach that relies on intuition guided by experience. As discussed below, the resulting risk assessments are often wildly inaccurate and their rationale opaque. "Actuarial" methods depend on data that allow one to link "risk factors" to various outcomes of interest. The associations found can then be used to forecast those outcomes when they are not known. Over the past several decades, regression statistical procedures have dominated the actuarial determination of empirically based risk factors. By and large, this enterprise has been a success. But the increasing availability of very large datasets coupled with new data analysis tools promise dramatically better success in the future. Machine learning will be a dominant statistical driver.

There is now a substantial and compelling literature in statistics and computer science showing that machine learning statistical procedures will forecast at least as accurately, and typically more accurately, than older approaches commonly derived from various forms of regression analysis.1 The experience in a variety of criminal justice settings is no different.2 Claims to the contrary are badly misinformed.3

Forecasting methods derived from machine learning offer a singular opportunity to provide decision makers with more accurate, transparent, and properly evaluated predictions of criminal behavior. Although such forecasts are only one sentencing consideration, relying on more traditional forecasting approaches instead can disadvantage decision makers who are committed to preventing harm and are serious about evidence-based sentencing.4

In the next few pages, the machine learning approach is summarized, with special emphasis on a machine learning procedure called "random forests."5 Random forests has already been successfully applied in several challenging criminal justice forecasting exercises6 and will serve here as an excellent didactic device. Software is readily available.

II. The Criminal Justice Context for Machine Learning Forecasts

Forecasting has been an integral part of the criminal justice system in the United States since its inception.7 Judges, as well as law enforcement and correctional personnel, have long used projections of relative and absolute risk to help inform their decisions.8 Assessing the likelihood of future crime is not a new idea, although it has enjoyed a recent resurgence: an increasing number of jurisdictions mandate the explicit consideration of risk at sentencing.9

We begin by briefly describing the forecasting task that machine learning undertakes. Typically, the goal is to forecast one or more forms of human malfeasance. These can be relatively specific behavioral events such as an arrest for a homicide, or broad categories of behavioral events such as an arrest for any felony. The events can also be undesirable behavior that is not necessarily criminal, such as failing to report to a parole office for a drug test.

Rhetorical packaging usually centers on forecasting undesirable outcomes, but one automatically gets the flip side: forecasts of desirable outcomes as well. When, for instance, an individual is forecasted not to fail on probation, one has forecasted a "success." Forecasts of success can be coupled with less intrusive and costly criminal justice interventions, or no interventions at all, that may be used in a sentencing context to mitigate "over-incarceration." Also, one is not limited to forecasts of binary outcomes. Forecasts into any of several categories are easily accommodated (e.g., an arrest for a felony, an arrest for a misdemeanor, a technical violation, or no behavior that could lead to incarceration).

Because the forecasts are for different categories of behavior, the enterprise is often called "classification." There need be no numerical risk scores, either as



stand-alone forecasts or as precursors to a forecasted class. Thus, the output usually looks nothing like an LSI-R score (Level of Service Inventory-Revised, a general risk instrument) or a fitted probability from logistic regression. Rather, an individual can be placed directly into an outcome class. One important consequence is that machine learning forecasting algorithms can be tuned explicitly with forecasting accuracy as the primary objective.

Some forecasting errors are always to be expected. How those errors are treated can vary across forecasting methods. Forecasting into classes of behavior has the benefit of creating easy and explicit ways to incorporate properly the relative costs of different kinds of forecasting errors. Consider on the one hand a forecast that an individual being considered for probation will be crime free, whereas that individual subsequently commits a homicide. One has probably wasted an opportunity to prevent a murder. Consider on the other hand a forecast that an individual being considered for probation will commit a homicide, whereas that individual subsequently is crime free. One has probably misallocated resources and caused collateral injury to the offender. The consequences of the two kinds of forecasting errors, a false negative and a false positive, respectively, are very different. Their social and economic costs are different too. And tradeoffs can never be avoided.

Many criminal justice stakeholders will treat false negatives as more costly than false positives. When this policy preference applies, the standard of statistical proof necessarily will be lower for forecasts of a homicide. The intent is to not release an individual who will commit a homicide and, in trade, to accept a larger number of false positives. Should false positives be seen as more costly than false negatives, the standard of statistical proof will necessarily be lower for forecasts of no homicide. The intent is to not incarcerate individuals who will not commit a homicide and, in trade, accept a larger number of false negatives.

Consequently, the tradeoffs between false negatives and false positives directly impact the forecasts made. In particular, the forecasts respond to the ratio of the costs associated with false negatives to costs associated with false positives. For example, incorrectly forecasting that an individual will not commit a homicide might be seen as ten times worse than incorrectly forecasting that the individual will commit a homicide. It follows that a given individual properly forecasted to commit a homicide under one cost ratio can be properly forecasted to not commit a homicide under another cost ratio. This ratio, therefore, must be responsive to stakeholder preferences if the sentence given is to appropriately balance assessments of competing harms.

An unappreciated feature of cost ratios is that they can facilitate forecasts of relatively rare outcomes. For example, among a set of individuals convicted of a felony, the fraction released on probation that subsequently commit a homicide may be about 1 in 200.10 Yet it is precisely for the 0.5 percent that accurate forecasts are most needed. If one simply forecasted the absence of a homicide for all offenders, one would be correct 99.5 percent of the time. It is hard to imagine any forecasting method doing better than that, and there is no need to even think about predictors. However, if false negatives are especially costly, the 99.5 percent accuracy figure is very misleading. The 0.5 percent of the time that the forecasts are wrong is actually very significant and much more important than the magnitude of the 0.5 percent figure suggests. Fortunately, if the relative cost of false negatives is high, the cost ratio will alter the forecasts, implicitly giving the very rare outcomes far greater weight. It is as if, say, 25 percent of the offenders will actually commit a homicide (a very rare event now not nearly so rare), and forecasts that ignore the predictors will be correct only the equivalent of 75 percent of the time. There is room for the predictors to play a useful role, exactly as they should under that cost ratio.

In short, forecasts should be constructed that reflect "asymmetric" costs as needed, and this is precisely what machine learning classification procedures can often do. Conventional approaches typically proceed as if the costs of false negatives and false positives are exactly the same, even in the face of clear stakeholder preferences to the contrary. Terribly misleading forecasts can result.

III. Building and Evaluating a Machine Learning Forecasting Procedure

Machine learning classification procedures are algorithmic in nature.11 Unlike conventional risk assessment procedures, which generally rely on known "models" of likely behaviors, machine learning is atheoretical. As a result, explanation is a secondary concern. For example, from a modeling point of view, interpretable results depend on a relatively modest number of predictors having strong associations with the outcome. Predictors having weak associations with the outcome are implicitly folded into the "noise" (e.g., a "disturbance term") and assumed, on the average, to cancel out. Explanations for the outcome are, quite sensibly, derived solely from the strong-association predictors. But if forecasting accuracy is the goal, predictors having weak associations with the outcome should not be ignored because a large number of weak associations can in the aggregate dramatically improve forecasting accuracy. Whether all of those weak associations conform to social science theory or practitioner preconceptions is irrelevant in part because, one by one, they do not matter much. In practice, this means that machine learning procedures can work with, and should work with, a very large number of predictors even if there are more predictors than observations. Forecasts using well over 100 predictors are not unusual. This is well beyond the capabilities of traditional modeling approaches.

The procedure-building process begins with "training data." These are data that include the outcomes to be forecasted and the predictors on which the forecasts will be based. If the intent is to develop forecasting procedures to be used subsequently in real time, one should only include


predictors that will be routinely available when the forecasts will be needed. Important predictors included in the training data, but not likely to be routinely available subsequently, are of no help in practice. For example, performance in an experimental drug treatment diversion program open to a small subset of offenders cannot provide information on the performance of all offenders, or offenders in the future, should that program be discontinued.

Training data with many observations is desirable. Machine learning procedures can improve their forecasting skill by searching within the training data in ways that can automatically lead to transformations of the predictors provided, including functions that combine predictors in new ways. To take a simple example, age and gender individually may be strongly associated with recidivism. But when a convicted felon is a male in his late teens, the link to recidivism is strengthened; the pair of predictors acting in concert is stronger than the sum of the two associations by themselves. With a conventional modeling approach, this interaction effect would need to be anticipated and built into the model a priori. Such prescience is not required for machine learning approaches because if an interaction effect improves forecasting accuracy, it will likely be found. In effect, new predictors are constructed as needed using the existing predictors as the raw material.

One should never evaluate the forecasting accuracy of any procedure with the data used to build that procedure. Virtually all forecasting procedures, whether model based or algorithmic, will exploit local idiosyncrasies in the training data. This "overfitting" will lead to falsely optimistic assessments of forecasting performance and misleading evidence about how well the forecasting procedures will work in practice. Forecasting performance should be evaluated with test data that include the same predictors and outcomes as the training data, but for observations not used to build the forecasting procedure. This is the only way that honest assessments of forecasting performance can be obtained. Ideally, the training data and the test data should both be probability samples from the population of individuals for whom forecasts will later be needed.

There are several useful ways to measure forecasting accuracy in the test data. When the forecasts are outcome classes, a simple and sensible approach is to consider how often forecasting errors are made. There can be two kinds. First, what proportion of the time does the forecasting procedure correctly identify the observed outcome? Some call this "classification accuracy." If that outcome has, say, four outcome classes, there will be four proportions. These go directly to the quality of the forecasting procedure itself and can indicate where improvements are needed; they are useful diagnostics. By such measures, one can find peer-reviewed machine learning results for criminal justice outcomes in which actual outcomes are correctly identified in advance over 75 percent of the time.12

Second, when a forecast is actually made, what proportion of the time is that forecast correct? Some call this "forecasting accuracy." Again, if there are four outcome classes, there will be four proportions. These anticipate how well the forecasts are likely to perform in practice. For such proportions, very high accuracy rates can be difficult to achieve because the relative costs of forecasting errors will often have a substantial impact. For example, when larger relative costs are assigned to false negatives compared to false positives (e.g., 5 to 1), one necessarily builds into any forecasts a larger number of false positives. Thus, for a given outcome, stakeholders may be prepared to accept two false positives for every true positive. Then the accuracy rate for true positives cannot in practice be better than about 33 percent. If false negatives are given less weight, the forecasting accuracy rate for true positives can be higher. Thus, one might correctly classify fewer individuals who will commit a homicide, but when such individuals are identified, the forecast is more likely to be correct.

It is generally not helpful to combine either kind of proportion into a single, overall, performance measure because differences in the proportions can convey very important information to stakeholders. For example, a sentencing judge may well want to know whether a forecast that an individual will commit a violent crime is more accurate than a forecast that an individual will commit a nonviolent crime. However, if overall measures of forecasting accuracy are desirable, the different kinds of forecasting errors should be weighted by their relative costs. Some kinds of forecasting errors are worse than others and should count more in the overall performance measure.

Because the forecasts are outcome classes, and because for most procedures there are no numerical scores that claim to accurately represent model-based risk, many conventional measures of forecasting accuracy are irrelevant or ill advised. Such measures can be irrelevant because the kinds of distinctions just described have no analogues. It may be, for example, that no distinction between classification accuracy and forecasting accuracy can be provided. At sentencing, a judge needs measures of forecasting accuracy. Measures that do not capture forecasting accuracy can be ill advised because when employed where they do not belong, claims of accuracy can be very misleading. Misinformed sentencing decisions can be made if a judge is only given measures of classification accuracy, but mistakenly makes inferences about forecasting accuracy. The resulting sentences and their intended impact for the individual and community may be very wide of the mark.

IV. Random Forests as a Machine Learning Illustration

Random forests is a machine learning procedure that has proved to be very effective as a forecasting tool in criminal justice settings. There are other machine learning procedures that can perform about as well, even occasionally a bit better, but random forests comes with an especially rich menu of output features that can dramatically facilitate interpretations and in broad brush strokes is relatively easy to explain.

The random forests machinery proceeds in two stages. The first results in the construction of a large number of


classification trees. A common default is 500 trees. In the usual criminal justice setting, each tree is nothing more than a partitioning of the training data into a collection of case portraits. Each portrait is a set of predictor features associated with a given outcome class.

Suppose that an outcome to be forecasted is a subsequent arrest for intimate partner violence. The random forests algorithm searches over the training data for configurations of predictor values that are associated with such arrests. To take a simple example, the subset of defendants who are unemployed males, under 30 years of age, with prior arrests for assault, and who are in the process of breaking up with a pregnant wife or girlfriend may be likely to reoffend. In contrast, the subset of defendants who are retired males over 55, with no prior arrest for assault, and who have a long-term, ongoing relationship with a wife or girlfriend past childbearing age may be unlikely to reoffend. The different portraits can be visualized as paths down branches of a tree; hence the name "classification tree."

To facilitate the subsetting process, all predictors are utilized as binary variables. Consider a predictor such as the number of prior arrests. The algorithm splits the number of prior arrests into two categories using a threshold that maximizes any association with the outcome. For example, the split might be imposed at no priors versus one prior or more. No other partitioning on that variable will lead to a stronger association between the number of priors and the outcome. Then, cases can be a member of only one of the two subsets, depending on which side of the threshold they fall.

Categorical predictors are handled in a similar manner. Suppose the employment predictor has four categories: employed full time, employed part time, unemployed looking for work, and unemployed not looking for work. The algorithm considers all possible ways to collapse the four categories into two so that the association with the outcome is the strongest. The choice might be to include the individuals who are unemployed and not looking for work as one group, and individuals who fall into any of the other three categories as the other group.

Each classification tree is grown from a random sample of the training data so that each tree can respond to varying chance features of the data. Then, the development of forecasting portraits for a given tree proceeds sequentially under another kind of sampling procedure. At each step, the algorithm must choose the next predictor to include from a small random sample of all predictors. For example, in the first step, a random sample of, say, five predictors is chosen from which the algorithm has to decide which is most strongly related to the set of outcome classes once the best split of each predictor is determined. In the second step, the next predictor is chosen in the same manner from another random sample of five predictors, given that the first predictor is already included in its optimal binary form. All subsequent steps proceed in the same manner; with the two best predictors in place, a third is chosen, and so on. Along the way, a given predictor can be chosen more than once because additional splits of that predictor may prove productive. For example, a first threshold for age might be at 19 and then among those over 19, a second threshold might be at 28. By sampling in this manner, even very weak predictors can be chosen as "best" because they are sometimes a member of a random predictor subset that contains no competing strong predictors. The use of two forms of random sampling, of cases and of predictors, is where the "random" in random forests comes from.

A full set of classification trees is considered to be the random forest on which the second stage of the procedure is implemented. That stage combines the results from each tree to arrive at a forecast. A vote is taken. If a plurality of trees places an individual in a given outcome class, that becomes the forecasted class. In a sense, each tree is considered an expert at classification, and the vote can be seen as a form of crowdsourcing.

Where do the test data come in? Recall that each classification tree is grown from a random sample of the training data. The sampling is done "with replacement," which means that a given observation can be used more than once in the construction process. It follows that for any given tree, on the average, about a third of the training sample is not included at all by the luck of the draw. The observations not included when that tree is grown serve as test data, and legitimate forecasts can be made for those observations and that single tree. When the vote over trees is taken for any case, a vote only counts if it is derived from trees in which that case was at random not used as part of the training data. This clever procedure means that one does not need to begin the data analysis with a training data set and a separate test data set. Random forests constructs test data sets on the fly.

Both stages are usually repeated several times because, like most machine learning procedures, random forests comes with tuning parameters. For example, the number of trees grown is a tuning parameter. It is standard practice to try several values for certain tuning parameters in an effort to improve forecasting accuracy. Like the available predictors, tuning parameter values are specific to the application and context for which the model was constructed. Once a satisfactory set of tuning parameter values is determined, one has a forecasting procedure developed from the training sample that can then be ported elsewhere and used to forecast unknown results. For new observations with undetermined outcomes (e.g., post-sentence recidivism), the known values of predictors from those observations can be used to arrive at estimates of what the unknown outcomes will be.

In summary, random forests used for criminal justice forecasting is essentially a very flexible regression procedure for outcome variables that are usually represented by two or more categories. It can build in the asymmetric costs of forecasting errors and automatically provides forecasts from test data. There is no model in the conventional sense, so its predictive portraits are assembled inductively. Explanation necessarily takes a back seat to forecasting accuracy.
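The two-stage procedure just described, with asymmetric costs built in and out-of-bag observations serving as test data, can be made concrete with a short sketch. The code below is not the authors' software: it uses scikit-learn's RandomForestClassifier as a stand-in, the data are simulated, and the 10-to-1 cost ratio is implemented through class weights, which only approximates the cost-sensitive tree growing and voting the article describes.

```python
# A minimal sketch of the two-stage random forests workflow, assuming
# scikit-learn as a stand-in for the authors' procedure. The data, predictors,
# and the 10:1 cost ratio are illustrative assumptions only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical training data: each row is an offender, each column a predictor;
# y = 1 marks a later arrest for a serious offense, kept deliberately rare.
n, p = 5000, 20
X = rng.normal(size=(n, p))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 2.3).astype(int)

# Stage 1: grow many classification trees, each from a bootstrap sample of the
# cases, with a small random sample of predictors considered at each split.
# class_weight is one common way to impose asymmetric costs (a false negative
# here counts roughly ten times a false positive).
forest = RandomForestClassifier(
    n_estimators=500,            # a common default, as noted in the text
    max_features="sqrt",         # random subset of predictors per split
    class_weight={0: 1, 1: 10},  # assumed cost ratio, not the authors' figure
    oob_score=True,              # keep out-of-bag information for Stage 2
    random_state=0,
)
forest.fit(X, y)

# Stage 2: for each case, only trees that did not see that case in their
# bootstrap sample vote, which yields honest, test-data-like forecasts.
oob_forecast = forest.oob_decision_function_.argmax(axis=1)

# Cross-tabulate actual outcomes against out-of-bag forecasts.
table = np.zeros((2, 2), dtype=int)
for actual, forecast in zip(y, oob_forecast):
    table[actual, forecast] += 1

# "Classification accuracy": of those who actually had the outcome, what share
# did the procedure identify? (a row proportion)
# "Forecasting accuracy": of those forecasted to have the outcome, what share
# actually did? (a column proportion)
classification_acc = table[1, 1] / table[1, :].sum()
forecasting_acc = table[1, 1] / table[:, 1].sum()
print(table)
print("classification accuracy for the rare outcome:", round(float(classification_acc), 2))
print("forecasting accuracy for the rare outcome:", round(float(forecasting_acc), 2))

# Forecasting a new case: the fitted procedure places the observation directly
# into an outcome class.
new_case = rng.normal(size=(1, p))
print("forecasted class for a new case:", int(forest.predict(new_case)[0]))
```

The out-of-bag votes play the role of the test data that random forests "constructs on the fly"; in an applied setting one would report both kinds of proportions for every outcome class, not just the rare one.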
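Returning to the cost-ratio point made in Section II above (the 1-in-200 homicide example), one standard way to read the remark that a high cost ratio makes a very rare outcome behave as if it were common is to reweight the base rate by the ratio. The small sketch below does that arithmetic; the roughly 66-to-1 ratio needed to reach the 25 percent figure is an inference from the numbers in the text, not a ratio the authors give.

```python
# Effective base rate of a rare outcome once each false negative is weighted
# cost_ratio times as heavily as a false positive. This reading of the passage
# is an assumption; the article states the idea only informally.
def effective_base_rate(p: float, cost_ratio: float) -> float:
    return p * cost_ratio / (p * cost_ratio + (1.0 - p))

p = 0.005  # about 1 homicide per 200 probationers, as in the text
for r in (1, 10, 66.3):
    print(f"cost ratio {r:>5}: effective base rate {effective_base_rate(p, r):.3f}")
# cost ratio     1: effective base rate 0.005
# cost ratio    10: effective base rate 0.048
# cost ratio  66.3: effective base rate 0.250
```

At a weighting of roughly 66 to 1, the no-predictor forecast of "no homicide for everyone" is right only the cost-weighted equivalent of 75 percent of the time, which is the sense in which the predictors regain a useful role.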


Conventional, model-based forecasts typically reverse these priorities. It follows that at sentencing, random forests can provide forecasts that would often perform dramatically better than forecasts derived from conventional modeling approaches.

V. Some Common Misunderstandings

Risk assessments based on machine learning can be grouped with most current risk assessment tools as an actuarial method. Actuarial approaches in criminal justice settings have been of late subjected to substantial criticism. Random forests is then implicitly criticized as well. However, much of the criticism is on close inspection misinformed, and it is important to respond, even if briefly.

First, questions have been raised about the constitutional merit of the variables used to construct the forecasts that could be used in sentencing. These concerns are not unique to random forests, nor do they rest upon any characteristic of that procedure. Their very generality underscores the need for a thorough airing,13 but that would take us far beyond the scope of this venue. Moreover, it may well turn out that the challenges are more provocative than compelling. Years of legal precedent have shaped the information that can be considered at sentencing, and when the predictors employed stay within those boundaries, the use of actuarial methods likely remains constitutional. The explicit use of race, national origin, and other suspect classes for forecasting, regardless of the method, would likely fail to meet the necessary, strict scrutiny threshold.14 On the other hand, criminal history is relatively uncontroversial.15 Random forests procedures allow for the exclusion and inclusion, respectively, of these factors.

Few courts have explicitly considered the constitutionality of actuarial forecasting methods. In Malenchik v. Indiana,16 for example, the court evaluated the constitutionality of using LSI-R scores (a common and long-used tool for determining correctional supervision) at sentencing. In that case, and despite the "off label" application of the LSI-R, its use was upheld. Though not all courts agree,17 the logic behind the Malenchik decision bolsters a continued reliance on actuarial forecasting in sentencing. As the Supreme Court noted in Jurek v. Texas, "prediction of future conduct is an essential element in many of the decisions rendered throughout the criminal justice system ... And any sentencing authority must predict a person's probable future conduct when it engages in the process of determining what punishment to impose."18

The transparency afforded by actuarial methods also supports the constitutionality of their employ. Consider the alternatives. Under the current, unstructured process, absent an explicit (and unlikely) pronouncement that a sentence was based on race (or nationality, gender, etc.), one may have little information on the underlying goals or rationale for a particular sentence. With actuarial methods, the sentencing rationale can be made explicit. The goal of classification, including random forests, is to ensure that statistically like individuals are treated similarly. This is an improvement directly supporting due process and fairness.

Consistency in sentencing is another benefit that may favor the constitutionality of actuarial methods. Recall that the push for sentencing guidelines three decades ago was in part motivated by inexplicable and unacceptable variation in sentences across different jurisdictions and even among different judges within a given jurisdiction. There were law and order jurisdictions and permissive jurisdictions. There were "hanging judges" and "bleeding heart" judges. Although hardly a perfect solution, sentencing guidelines were seen as one constructive response. Another constructive response may be the use of actuarial forecasts at sentencing.

One must also consider, for constitutional purposes, the context in which prediction is undertaken. It may be impermissible and is certainly unwise to turn over the sentencing process in its entirety to an actuarial forecast, and indeed, no credible observers have to our knowledge seriously made this recommendation. Instead, risk information is used to inform and support decision making in line with permissible judicial discretion. Consider Pennsylvania: with an advisory guidelines system, judges are provided information about the instant offense and prior history that results in the calculation of a recommended range for the sentence. Then, the judge must consider aggravating and mitigating factors, of which risk is only one (though it cuts both ways), before arriving at a sentence. The federal system operates similarly. This approach, with qualifications, has been well vetted and deemed constitutionally permissible.19 The Sentencing Commission's clear, legislative purpose requires a sentencing system with "sufficient flexibility to permit individualized sentences when warranted by mitigating or aggravating factors not taken into account in the establishment of general sentencing practices,"20 as well as creating the opportunity for the consideration of "advancement[s] in knowledge of human behavior as it relates to the criminal justice process."21 Employing sophisticated, advanced forecasting data at sentencing is supported under these purposes, as well as in ensuring that sentences are "sufficient, but not greater than necessary," as required under 18 U.S.C. § 3553(a). As the Court notes in Rita v. United States,22 the resulting sentencing scheme, type of information that can be employed, as well as goals for this process "embody the § 3553(a) considerations, both in principle and in practice."

A second claim is that actuarial methods are insufficiently accurate. The obvious question is: Compared to what? If the comparison is to clinical methods, there is a very credible scientific literature, beginning with work by Robyn Dawes and others several decades ago, showing actuarial methods to be vastly superior.23 If the comparison is to criminal justice decisions made by judges, prison administrators, or parole boards, we have not found a single credible study in which empirical comparisons are actually made. And, given the results when clinical judgments are the benchmark, one must be very skeptical.


A third claim is that some of the predictors used in actuarial methods "double count" certain features of offenders. For example, a magistrate at an arraignment may use prior record to help decide whether a defendant should be held until a subsequent court appearance. Should the defendant later be convicted, prior record may be used to help determine the sentence. A prior record empirically associated with a risk to the public is considered twice. But with actuarial methods, the arraignment decision and sentencing decision are quite different decisions, made at quite different points in the adjudication process. The magistrate is concerned about the short term in which the choice can be jail or release. The judge is concerned about the long term in which the choice can be prison or probation. If one were to take claims about "double counting" seriously, the predictors used at an arraignment would need to be totally different from the predictors used at sentencing. This would be unwise and impractical. Imagine if one required that the sentences attached to separate charges had to be informed by disjoint, individual features of the crime and the offender.

A fourth claim is that actuarial methods are not "dynamic." Predictor values are frozen in time and do not properly take into account that people and their life circumstances change. There is absolutely nothing about actuarial methods that precludes the use of dynamic predictors, and many routinely do.

Age is the most obvious example and serves as an important proxy for any number of dynamic processes. Over time, an individual's physical, emotional, social, and circumstantial attributes change. One can also capitalize on chronological features of prior record. For example, forecasts of threats to public safety have used the elapsed time between the current offense and the most recent earlier offense. The shorter that elapsed time, the greater the risk of offending. Other dynamic factors that can (and have been) used to forecast threats to public safety include employment and marital status.24 For individuals who have previously served time in prison, measures of behavior in prison can be exploited; prison records often contain information about an inmate's performance on prison jobs, the number and kinds of prison rule infractions, and even scores on various psychological tests. Clinical judgment can also have merit, and some have advocated for the inclusion of clinically derived assessment data within actuarial forecasting processes.25 All actuarial methods, including random forests, can easily accommodate such predictors.

Finally, some have asserted that actuarial methods build in racial bias. An immediate question is, as before: Compared to what? Are criminal justice decision makers who rely on their own clinical judgment any more race neutral? Where's the evidence? In addition, actuarial methods need not include race as a predictor, and to the best of our knowledge, most do not.

A related claim is far more difficult to pin down and introduces a number of technical issues. The claim is that at least some of the included predictors are proxies for race. Unpacking the exact meaning of the proxy argument requires more space than we have, and an instructive discussion can be found elsewhere.26 Nevertheless, readers might consider the following fact: people disproportionally commit crimes against people like themselves. For example, homicide is by far the leading cause of death among young black males. And young black males disproportionately commit those homicides. Suppose one could purge actuarial methods of all racial factors captured indirectly through proxy predictors. It is almost certain that forecasting accuracy would decline, in some applications substantially. In the case of homicide, there would be an increase in the number of deaths among young black males that could have been forecasted and in principle prevented. How many such deaths are stakeholders prepared to trade for actuarial forecasts completely purged of racial content? Victims count for something too.

VI. Conclusion

Our focus has been on forecasts used to help inform sentencing decisions. The public at large, law enforcement personnel, and offenders themselves may all be put in harm's way if a sentence does not sufficiently reduce risk. At the same time, failing to identify nonviolent defendants amenable to reform can be costly and can directly disadvantage such individuals. In either case, forecasting accuracy is paramount. Machine learning offers superior forecasting accuracy and more, at least compared to traditional methods of forecasting and unstructured clinical judgment.

Actuarial risk assessment tools, including random forests and other advanced approaches, remain entirely compatible with a system of punishment based on the just deserts and limited retributivist ideologies. Predictions of risk, derived from machine learning procedures, can better help judges decide when to sentence toward the top (or the bottom) of the range of recommended sentences, or perhaps even outside of those ranges. Under a just deserts philosophy, balanced sentencing requires a consideration of blameworthiness and proportionality as well as how the defendant, sitting in court, is likely to act in the future.27 The integration of accurate forecasting directly incorporates the ideals of limited retributivism. Under this hybrid system, principles of uniformity and proportionality are used to set a range for a given sentence, after which "other principles provide the necessary fine tuning of the sentence imposed in a particular case, includ[ing] not only traditional crime control purposes such as deterrence, incapacitation, and rehabilitation, but also a concept known as parsimony, a preference for the least severe alternative that will achieve the purposes of the sentence."28 Forecasting provides the information essential to this process.

At best, alternate approaches for assessing risk, although sometimes appealing in their simplicity, share and, in most cases, redouble the limitations of machine learning. At worst, they sacrifice readily available statistical gains, responsiveness to stakeholder preferences, and


much needed transparency. Ignoring the potential value of modern forecasting methods is, in itself, a policy decision that can imply a willful blindness to procedures that could save lives, as well as spare individuals and the criminal justice system the costs of unnecessary incarceration. When applied to sentencing decisions, whether for individuals who are dangerous or those who are amenable to diversion, machine learning can provide a better way.

Notes
* This project was supported in part by a grant under the Bridging Research and Practice Program awarded by the National Institute of Justice. Points of view are those of the authors and do not necessarily represent the official position or policies of the Department of Justice.
1 Trevor Hastie, Robert Tibshirani & Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed. 2009).
2 Richard Berk, Criminal Justice Forecasts of Risk: A Machine Learning Approach (2012).
3 See Richard Berk & Justin Bleich, Forecasts of Violence to Inform Sentencing Decisions, 30 J. Quantitative Criminology 79-96 (2014).
4 See Steven L. Chanenson, Prosecutors and Evidence-Based Sentencing: Rewards, Risks, and Responsibilities, 1 Chap. J. Crim. Just. 27-41 (2009).
5 See Leo Breiman, Random Forests, 45 Machine Learning 5-32 (2001).
6 See Richard Berk, Brian Kriegler & Jong Ho Baek, Forecasting Dangerous Inmate Misconduct: An Application of Ensemble Statistical Procedures, 22 J. Quantitative Criminology 131-45 (2005); see also Richard Berk, Lawrence Sherman, Geoffrey Barnes, Ellen Kurtz & Lindsay Ahlman, Forecasting Murder Within a Population of Probationers and Parolees: A High Stakes Application of Statistical Learning, 172 J. Royal Stat. Soc'y Series A 191-211 (2009) (U.K.) (employing random forests to forecast homicide); see also Geoffrey C. Barnes & Jordan M. Hyatt, Classifying Adult Probationers by Forecasting Future Offending (March 2012) (unpublished report) (on file with the National Institute of Justice) (evaluating random forest model that forecasts probationer and parolee behavior); see also Berk & Bleich, supra note 3 (demonstrating that random forests can effectively be employed at sentencing).
7 See Don M. Gottfredson, Prediction and Classification in Criminal Justice Decision Making, 9 Crime & Just. 1-20 (1987).
8 See Mark D. Cunningham & Thomas J. Reidy, Violence Risk Assessment at Federal Capital Sentencing: Individualization, Generalization, Relevance, and Scientific Standards, 29 Crim. Just. & Behav. 512-37 (2002).
9 See Jordan M. Hyatt, Steven L. Chanenson & Mark H. Bergstrom, Reform in Motion: The Promise and Perils of Incorporating Risk Assessments and Cost-Benefit Analysis into Pennsylvania Sentencing, 49 Duq. L. Rev. 707-50 (2011); see also John Monahan & Jennifer L. Skeem, Risk Redux: The Resurgence of Risk Assessment in Criminal Sanctioning, 26 Fed. Sent'g Rep. 158-66 (2014).
10 See Richard Berk, The Role of Race in Forecasts of Violent Crime, 1 Race & Soc. Probs. 231-42 (2009).
11 See Leo Breiman, Statistical Modeling: The Two Cultures, 16 Stat. Sci. 199-231 (2001).
12 See Berk, supra note 10.
13 Melissa Hamilton, Risk Needs Assessment: Constitutional and Ethical Challenges, Am. Crim. L. Rev. (forthcoming 2015).
14 See Carissa Byrne Hessick & F. Andrew Hessick, Recognizing Constitutional Rights at Sentencing, 99 Calif. L. Rev. 47-94 (2011); see also Davinder Sidhu, Moneyball Sentencing, Boston Coll. L. Rev. (forthcoming 2015).
15 See Almendarez-Torres v. United States, 523 U.S. 224 (1998).
16 See Malenchik v. State, 928 N.E.2d 564 (Ind. 2010).
17 See James C. Oleson, Risk in Sentencing: Constitutionally Suspect Variables and Evidence-Based Sentencing, 64 SMU L. Rev. 1329-402 (2011).
18 See Jurek v. Texas, 428 U.S. 262 (1976).
19 See United States v. Booker, 543 U.S. 220 (2005).
20 See 28 U.S.C. § 991(b)(B) (2008).
21 See id. at (b)(C).
22 See Rita v. United States, 551 U.S. 338 (2007).
23 See Robyn M. Dawes, David Faust & Paul E. Meehl, Clinical Versus Actuarial Judgment, 243 Sci. 1668-74 (1989); see also William M. Grove & Paul E. Meehl, Comparative Efficiency of Informal (Subjective, Impressionistic) and Formal (Mechanical, Algorithmic) Prediction Procedures: The Clinical-Statistical Controversy, 2 Psychol. Pub. Pol'y & L. 293-323 (1996); see also William M. Grove, David H. Zald, Boyd S. Lebow, Beth E. Snitz & Chad Nelson, Clinical Versus Mechanical Prediction: A Meta-Analysis, 12 Psychol. Assessment 19-30 (2000); see also D.A. Andrews, James Bonta & J. Stephen Wormith, The Recent Past and Near Future of Risk and/or Need Assessment, 52 Crime & Delinq. 7-27 (2006).
24 See Berk, supra note 2.
25 See John Monahan, Henry J. Steadman, Paul S. Appelbaum, Thomas Grisso, Edward P. Mulvey, Loren H. Roth, Pamela Clark Robbins, Stephen Banks & Eric Silver, The Classification of Violence Risk, 24 Behav. Sci. & L. 721-30 (2006).
26 See Berk, supra note 10.
27 See Richard S. Frase, Punishment Purposes, 58 Stan. L. Rev. 67-84 (2005); see also Richard P. Kern & Mark H. Bergstrom, A View From the Field: Practitioners' Response to Actuarial Sentencing: An "Unsettled Proposition," 25 Fed. Sent'g Rep. 185-89 (2013).
28 See Frase, supra note 27, at 68.


The New Profiling: Why Punishing Based on Poverty and Identity Is Unconstitutional and Wrong
Author(s): Sonja B. Starr
Source: Federal Sentencing Reporter, Vol. 27, No. 4, The Risk Assessment Era: An Overdue Debate (April 2015), pp. 229-236
Published by: University of California Press on behalf of the Vera Institute of Justice
Stable URL: http://www.jstor.org/stable/10.1525/fsr.2015.27.4.229






SONJA B. STARR,* Professor of Law, University of Michigan; Visiting Professor of Law, Harvard University

Let's begin with a thought experiment. Suppose a judge is sentencing two co-defendants, Jones and Smith, for a small-scale cocaine trafficking offense, having found that both were equal contributors to their conspiracy. Neither has a criminal record, both express sincere remorse, and no mandatory minimum applies. The judge sentences Jones to probation and Smith to three years in prison. She explains:

Jones is a married, college-educated accountant who lives in a nice suburb. He grew up middle class in a stable household; nobody in his family has a criminal record or has been a crime victim. People like him might slip up, but they tend to learn their lesson the first time. But Smith is different. He grew up on public assistance, in and out of foster care, with his dad in prison. He dropped out of school at 17 in order to get a job, because his family was facing eviction. He worked for a year at an auto shop, but then the shop went under and since then he has only found short-term jobs. He lives in a high-crime neighborhood and got robbed twice last year. He's working on his G.E.D. and applying for jobs, but I bet he won't find anything more than minimum wage. For people like Smith, the first crime is often just the beginning. It's safer to send him to prison.

The judge's generalizations have some truth to them. People with Smith's disadvantages are, on average, likelier to reoffend. Even so, I suspect that by the time the judge finished her explanation, many jaws in the courtroom would have dropped. Perhaps none of us would be surprised to learn that defendants like Smith often face worse outcomes in the criminal justice system, but many of us think of this as an inequality that the system should work to eliminate, and we would not expect to hear a judge openly embrace it. (If anything, one might have expected Smith's counsel to recount the same facts, all of which concern hardships outside Smith's control, in an argument for mitigation.) And we might be even more shocked to hear a politician make a speech endorsing adoption of such reasoning in all cases, arguing explicitly that people should systematically face extra punishment specifically because they are poor or were born with the wrong traits or into the wrong families.

And yet exactly that reasoning is at the core of the "actuarial" risk assessment practices that increasingly pervade U.S. criminal sentencing and have long dominated parole decision making. Under these systems, judges and parole boards are told to consider risk scores that are based not only on criminal history, but also on socioeconomic and family-related disadvantages as well as demographic traits like gender and age. The advocates of this actuarial turn are not all law and order conservatives; they span the political spectrum, and include many progressive critics of mass incarceration. Their case, in a nutshell, is that risk assessment can help courts protect public safety (the Smiths of the world commit lots of crime!) while avoiding unnecessary incarceration (why would we lock up someone like Jones?). I agree that public safety and reducing incarceration are both important objectives, and that various forms of data-driven decision making can help us to balance them. I also agree that prevention of recidivism is one of the traditional purposes of sentencing, and I do not contend that this is improper; my focus is on how this goal should and should not be pursued. And I appreciate the sense of urgency that motivates many risk assessment advocates: the sense that our overextended criminal justice system is in crisis, and something has to be done about it.

But sentencing based on socioeconomic and demographic factors cannot be the right path. If judges or policymakers would be embarrassed to embrace ideas like "we should increase people's sentences for being poor" openly, then they should not do so covertly by relying on a risk score that is substantially driven by such factors. Doing so merely uses dry, technical language to obscure discrimination that we would otherwise never accept.

In this short piece, I focus exclusively on equality-related concerns associated with the use of demographic, socioeconomic, and family-related variables in risk assessments that are used to determine a defendant's punishment. Specifically, I seek to persuade readers that basing sentences on these variables is not only wrong, but unconstitutional, an argument that I laid out in much more detail in a recent Stanford Law Review article.1 Other than those troubling variables, modern risk assessment instruments also include criminal history factors and sometimes psychological or attitudinal assessments; although these factors may raise other concerns (such as those explored by Bernard Harcourt later in this Issue), I do not explore those here.2 I also bracket, for now, potentially serious procedural objections to the way the assessments are administered,



including their lack of transparency3 as well as the self-incrimination problem that emerges when defendants are effectively compelled to participate.4

I. The Dark Side of Actuarial Risk Assessment

Before proceeding to the constitutional argument, I want to highlight some facts that risk assessment advocates tend to play down, and point out the policy concerns they raise. Like our hypothetical judge, today's actuarial risk assessment instruments deem defendants riskier based on indicators of socioeconomic disadvantage. Some relatively spare instruments include only a few factors, typically including education, employment status, marital status, and age, along with past criminal history. The most widely used instruments incorporate much more detailed analyses of the defendant's financial, housing, family, and employment history, current situation, and prospects. Many consider gender, deeming males riskier. Some count crime victimization as a risk factor, as well as living in a high-crime neighborhood. Some also include assessments of the defendant's attitudes or mental health-related risk factors.5

Advocates often refer to the method as "evidence-based sentencing," although the term is confusing, since the risk score is typically unaffected by the evidence in the defendant's case (and by the charges for which he is being sentenced). A more descriptive phrase is "actuarial sentencing" or just "profiling." The "evidence" in question comprises empirical studies of average recidivism rates for past offenders with similar characteristics. For example, the weight a risk instrument assigns to unemployment would be loosely based on the strength of the association between unemployment and average recidivism rates in some past sample, holding other measured variables constant.

When the government instructs judges to consider risk scores based on factors like these, it is explicitly endorsing sentencing discrimination based on factors the defendant cannot control. It is embracing a system that is bound to worsen the intersectional racial, class, and gender disparities that already pervade our criminal justice system. The risk assessment instruments commonly in use do not consider race (although some advocates of actuarial sentencing, such as Judge Kopf elsewhere in this Issue, propose that it should be considered).6 But the socioeconomic, family, and neighborhood-related factors that the instruments do consider are highly race-correlated. As to gender, I suspect that many commentators do not worry so much about its inclusion because men are not generally a socially disadvantaged group. But the men in the criminal justice system are mostly disadvantaged along other dimensions, and gender-based sentencing could interact with these dimensions to exacerbate the extraordinary rates of incarceration of young and poor men of color.

By telling judges to sentence based on such factors, the state is also endorsing a message: that it considers certain groups of people dangerous because of who they are, not what they have done. This kind of message has a toxic history. For good reason, past generations of sentencing reformers sought to eliminate the influence of the very factors that the new reformers now quietly embrace.7 The U.S. Sentencing Guidelines, for example, forbid consideration of socioeconomic status and gender.8 Similarly, to sentence people based on their family members' or neighbors' conduct runs afoul of principles of individual responsibility and adds the state's imprimatur to the stigma people from troubled family backgrounds already face.

Finally, the system risks seriously exacerbating the legitimacy problems that the U.S. criminal justice system already has. How can anyone plausibly defend the justice system against a claim that it is biased against the poor, when the state's sentencing system outright punishes poverty? If you don't want people to think the system is rigged, it helps not to rig it.

II. The Constitutional Problem

Consideration of some of the above-described factors is also unconstitutional. Perhaps the most obvious problem concerns the use of gender in many instruments. But sentencing based on socioeconomic factors also runs afoul of Supreme Court doctrine that is well settled, albeit apparently entirely forgotten amidst the recent enthusiasm over risk assessment.

Let's consider gender first. A long line of Supreme Court cases forbids gender-based "statistical discrimination."9 That is, the government is prohibited from relying on generalizations about the behavioral tendencies of men and women, even if those generalizations have empirical support. So, for example, the government may not apply different drinking ages to men and women, even though studies show young men are more than ten times as likely to drive drunk.10 Likewise, the government may not rely on gendered generalizations about juror voting patterns,11 workforce participation,12 or students' preferred learning environments.13 The key principle is that individuals deserve to be treated as individuals, not as representatives of their genders.

Sentencing based on socioeconomic generalizations is unconstitutional for very similar reasons. This defect has been completely overlooked in the risk assessment debate, perhaps because, more generally, class-based distinctions don't receive heightened equal protection scrutiny. But the criminal justice system is an exception to this general rule. A long line of Supreme Court jurisprudence, intertwining equal protection and due process principles, applies special scrutiny to state actions that entail adverse treatment of indigent criminal defendants.14 These cases embody what is sometimes known as the "Griffin equality principle," after Griffin v. Illinois, a 1956 case in which the plurality opinion referred to the aim of providing "equal justice for poor and rich, weak and powerful alike" as "the central aim of our entire judicial system," and stated that the "constitutional guaranties of due process and equal protection both call for procedures in criminal trials which allow no invidious discriminations."15
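To make the point-scoring mechanics described in Section I above concrete, the following is a minimal sketch in Python of a point-style tally. The factor names and weights are invented for illustration only; they are not drawn from any actual instrument discussed in this Issue.

# Illustrative sketch only: a simplified point-style actuarial score with
# invented factor names and weights, not any real instrument's formula.

ILLUSTRATIVE_WEIGHTS = {
    "prior_convictions": 3,   # points per prior conviction (hypothetical)
    "unemployed": 2,          # flat add-on if currently unemployed (hypothetical)
    "age_under_25": 2,        # flat add-on for young defendants (hypothetical)
    "male": 1,                # flat add-on for male defendants (hypothetical)
}

def risk_score(defendant: dict) -> int:
    """Sum weighted points over a fixed factor list; higher means 'riskier'."""
    score = defendant.get("prior_convictions", 0) * ILLUSTRATIVE_WEIGHTS["prior_convictions"]
    for flag in ("unemployed", "age_under_25", "male"):
        if defendant.get(flag, False):
            score += ILLUSTRATIVE_WEIGHTS[flag]
    return score

# Two defendants with identical conduct but different socioeconomic profiles
# receive different scores, which is the feature the argument above targets.
print(risk_score({"prior_convictions": 1, "unemployed": True, "male": True}))    # 6
print(risk_score({"prior_convictions": 1, "unemployed": False, "male": False}))  # 3

In this toy tally, the socioeconomic and demographic add-ons do exactly the work the text describes: they raise the score of one defendant relative to another whose criminal conduct is identical.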


When reviewing wealth-related classifications in the criminal justice system, the Court applies a standard that does not quite match any of the traditional "tiers" of equal protection review, but rather "requires a careful inquiry into such factors as 'the nature of the individual interest affected, the extent to which it is affected, the rationality of the connection between legislative means and purpose, [and] the existence of alternative means for effectuating the purpose.'"16 Notwithstanding the term "rationality," it is clear from the case law that this test demands quite searching analysis. In fact, the Court's requirement that alternatives be considered mirrors a key feature of strict scrutiny.

For our purposes, the dispositive precedent is the Supreme Court's unanimous decision in Bearden v. Georgia.17 The petitioner, Bearden, had initially been sentenced to probation and restitution, and then after he lost his job, his probation was revoked. The Court, applying the test outlined above, held that this was unconstitutional wealth-based discrimination. The state had argued that the revocation was justified for two reasons: first, because the petitioner was no longer able to afford his restitution payments; and second, because petitioner's loss of employment rendered him a higher recidivism risk. In support of the second proposition, the state cited several empirical studies establishing a significant relationship between unemployment and recidivism.18 The Court did not question the existence of this relationship. Rather, it squarely rejected the state's attempt to use it to justify punishment:

[T]he State asserts that its interest in rehabilitating the probationer and protecting society requires it to remove him from the temptation of committing other crimes. This is no more than a naked assertion that a probationer's poverty by itself indicates he may commit crimes in the future. ... [T]he State cannot justify incarcerating a probationer who has demonstrated sufficient bona fide efforts to repay his debt to society, solely by lumping him together with other poor persons and thereby classifying him as dangerous. This would be little more than punishing a person for his poverty.19

This holding, which remains good law, should be fatal to the use of socioeconomic variables in "evidence-based sentencing." Today's risk assessment instruments do exactly what the Bearden Court condemned: they "lump [defendants] together with other poor persons and thereby classify [them] as dangerous."20 That classification may be grounded in empirics, but so was the state's attempt to make the same argument in Bearden. The principle here is the same one that underlies the gender cases discussed above: individuals deserve to be treated as individuals.21

A defendant's poverty might sometimes permissibly be considered in sentencing, but not as the basis for group-based behavioral generalizations. Instead, it might be relevant for narrower purposes in individual cases: for example, assessment of a defendant's ability to pay a fine or restitution. In Bearden, the Court was more sympathetic to the state's failure-to-pay argument than to its attempt to argue that poverty increased recidivism risk. The Court acknowledged that inability to pay might sometimes necessitate an alternative punishment to ensure that the "defendant's poverty in no way immunizes him from punishment."22 To avoid inequality in the treatment of rich and poor, the "sentencing court can consider the entire background of the defendant, including his employment history and financial resources."23 Even then, however, if a person is to be incarcerated because he cannot pay a financial penalty, the court must first determine "that alternatives to incarceration are not adequate ... to meet the State's interest in punishment and deterrence."24

In contrast to gender and socioeconomic variables, some other risk factors in the instruments are constitutionally permissible considerations. These include criminal history as well as some demographic classifications, such as age, that do not trigger special constitutional scrutiny. Other variables do potentially raise constitutional concerns, but challenges to them would push the boundaries of current doctrine. For example, one could argue that sentencing based on a parent's criminal record merits heightened scrutiny; this argument would be novel, but could rely on a very strong analogy to illegitimacy.25

III. Can Gender- and Poverty-Based Risk Assessments Survive Constitutional Scrutiny?

But can sentencing based on gender and socioeconomic status survive heightened scrutiny because of its crime prevention purpose? I think not, for at least five reasons. Many of these reasons also undercut the policy case for actuarial sentencing.

A. Reliance on Forbidden Generalizations

First and foremost, any plausible argument that these classifications survive heightened scrutiny is precluded by the Supreme Court doctrine outlined above. That doctrine does not suggest that, if a statistical generalization concerns a relationship with an important state interest, the state may rely on it after all. Instead, it makes clear that the bar on statistical generalizations constrains the type of argument the state can make to show such a relationship. Many of the cases cited above involved important state interests, such as prevention of drunk driving (in Craig) or crime more generally (in Bearden). But the prohibition on statistical discrimination is an additional constitutional constraint beyond the required relationship to a state interest. In United States v. Virginia, the Court further clarified this point by including the two points separately in a list of the key principles of its gender discrimination law.26

Any argument that these sentencing classifications survive strict scrutiny would have to rely on exactly the type of statistical generalization the Court forbids. The risk assessment instruments are nothing but statistical generalizations, and the entire case for them depends on the merits of those generalizations.


B. How Much Do the Instruments Improve Recidivism Prediction?

Even if the state were free to rely on statistical generalizations, it may have difficulty showing that the actuarial instruments substantially improve judges' ability to assess recidivism risk. To be sure, the literature does provide good empirical evidence that the risk scores have predictive power. They don't provide a precise prediction of what any individual defendant will do,27 but in the aggregate, they do substantially better than chance at sorting higher-risk from lower-risk defendants. However, the right comparison point isn't chance. Rather, the state must show that the instruments substantially outperform the individualized, informal assessments of the defendant's crime risk that judges routinely perform in the absence of actuarial instruments.

There is very little empirical research that bears on this comparison. The few relevant studies that do exist have reached differing conclusions, and have had quite serious flaws.28 For example, a recent Administrative Office of the U.S. Courts study of the new federal risk assessment instrument concluded that providing it to probation officers rendered their risk predictions more accurate.29 But the study, which tested officers' evaluations of a single fictitious case, had no actual case outcomes with which to measure "accuracy." Instead, circularly, it defined officers' predictions as accurate if they were consistent with the risk assessment instrument! This approach meant that its conclusion was hardly a surprise. Instead of demonstrating accuracy, the study actually simply shows that if you tell officers to use a risk assessment instrument, they will.

When advocates invoke the oft-repeated claim that actuarial prediction outperforms human decision makers' individualized predictions, they usually do not cite recidivism-specific studies at all. Rather, they cite reviews of studies drawn from a wide range of fields, such as medical diagnosis.30 As a whole, this literature does favor actuarial methods, but there is a lot of variability in results, which suggests that it is too simple to say that actuarial prediction is always better. Instead, one needs to analyze the specific actuarial method in question and compare it to the judgments of the relevant comparison set of decision makers. In the sentencing context, states have not done this yet, and they bear the burden of doing so.31

But even if this hasn't been done yet, shouldn't we logically expect better results from empirically grounded risk prediction instruments than from judges' intuition? Perhaps we should. But there is a counterargument. Actuarial risk assessments are based on a fixed list of risk factors, and some of the factors that are left off the list may be important. For example, the assessments incorporate little or no information about the defendant's criminal case. It is not facially obvious that a mechanical scoring system that relies only on factors unrelated to the case at hand should substantially outperform a judge who can consider those factors plus case-specific information. Again, it might; I wouldn't be surprised, in fact. But as of now, we don't know, and the state bears the burden of showing it.32

C. How Much Do the Problematic Classifications Improve the Instruments?

Let's suppose it could be shown that the actuarial instruments produce substantially better risk predictions than judges arrive at on their own. Even if so, this is not the key constitutional question. It is the contested classification itself that is subject to the relevant constitutional tests. To survive an equal protection challenge, the state must not just defend actuarial instruments; it must defend the use of gender and socioeconomic variables in those instruments. Unfortunately, most of the empirical literature on risk assessment does not assess the marginal gain in predictive accuracy that is added by including specific variables, which means the state, in general, cannot meet this burden. The recent research on pretrial risk assessment discussed in this issue by Anne Milgram, Alexander Holsinger, Marie VanNostrand, and Matthew Alsdorf supports the conclusion that risk assessment instruments need not rely on demographic or socioeconomic classifications; rather, criminal conduct factors were far more predictive.33 This research points to a model that could be used in sentencing, in place of the discriminatory instruments that states have so far adopted; in that case, one might still worry about certain other policy concerns,34 but the constitutional objections I raise here would be alleviated. Furthermore, their conclusion is consistent with that of an earlier study by Joan Petersilia and Susan Turner, which did assess the marginal predictive value of particular variables and found that including demographic and socioeconomic variables offers no noticeable improvement in accuracy.35

D. Comparison to Other Alternatives

Under the Bearden test, wealth-related classifications must also be compared to less discriminatory alternatives. So in addition to showing that including the socioeconomic variables substantially improves the risk instruments, the state must show that there aren't other nondiscriminatory ways to achieve the same gains. For example, one could imagine an instrument that included, instead of socioeconomic factors, the more traditional sentencing facts that today's risk assessments typically exclude. These include the nature and severity of the crime for which the defendant is being sentenced, her role in the offense, and perhaps her reasons for committing it. It is the state's burden to consider and analyze such (rather obvious) alternatives, and there is no evidence that states have.

E. Crime Prediction Versus Crime Prevention

Finally, even if using these classifications does improve the accuracy of risk predictions and outperforms alternatives, crime prediction is not the same as crime prevention. And the latter is the state interest that purportedly justifies gender or wealth discrimination.


No empirical study so far has assessed whether use of actuarial instruments reduces crime, much less whether any particular variable's inclusion does so.

If a sentencing judge is seeking to protect the public from the defendant's future crime risk, she needs to know more than just how "risky" that defendant is in the abstract. Rather, she needs to know how her sentencing decision will affect the defendant's crime risk. And the risk assessment instruments used in sentencing do not assess this.36 They do not, for example, tell our hypothetical judge whether Smith's expected crime risk would be lower if he spends three years in prison than if he only spends two, or none. Nor do they tell the judge whether it is Smith or Jones whose crime risk would be affected more by prison time.

One might respond that there is an intuitive link between crime prediction and crime prevention: if one person is dangerous and another is not, surely public safety is better served by locking up the dangerous one. And this may often be true, especially in the extreme cases; there is no recidivism prevention gain if a person poses no risk of recidivism to begin with, for example. But away from the extremes, the relationship between risk level and responsiveness of risk to punishment is not so obvious.

This is because recidivism prevention does not turn only on incapacitation. If incapacitation effects were all that mattered, then it would generally be true that incapacitating a higher-risk person for a given length of time would have a greater crime prevention effect. But sentencing decisions can also affect the defendant's post-release crime commission. These effects are potentially lifelong, whereas most incarceration sentences are quite short, so incapacitation effects are brief. Incarceration could have specific deterrence effects; it could promote or interfere with rehabilitation or job prospects; it could either promote or interfere with criminal networks. And all these effects may vary among different defendants. It's not at all intuitive that the defendants whose demographic or socioeconomic traits make them "high risk" are the ones who are likely to respond to incarceration by reducing their crime commission the most.37

One could imagine a different type of actuarial instrument that did focus on crime prevention, that is, on the responsivity of particular classes of defendants to different sentencing choices. But the causal effects of marginal increases in incarceration on crime are quite complicated to assess, even if one focuses only on overall averages.38 We are quite far from having the ability to tell judges how these effects vary based on defendants' characteristics. One could also imagine directly studying the effect of judges' use of today's risk prediction instruments on defendants' subsequent crime rates. For example, a state could randomly assign defendants to be sentenced using different risk prediction instruments (with and without the problematic variables), and compare their long-run outcomes. But such a study would require a constitutionally problematic sentencing scheme to be employed for many years while data are collected. And if general deterrence effects were also considered, the empirical inquiry would become even more complicated.

To summarize, we don't know whether actuarial predictions are better than those of judges, we don't know whether particular problematic variables substantially improve actuarial predictions, we don't know whether other alternatives might work better, and we don't know how any of these prediction-related questions relate to crime prevention. Given that this sentencing movement purports to be "evidence-based," this is a rather dire lack of evidence. And even if we did know all these things, and the answers were favorable, the state would still not be free to lump people together based on their socioeconomic traits or their gender and treat them accordingly.

IV. Other Counterarguments

Here, I briefly address three other arguments sometimes made by risk assessment advocates: that equality concerns are mitigated by the fact that judges consider a variety of sentencing goals and are not required to place any specific weight on the risk assessment, that risk assessment is a useful strategy for reducing incarceration, and that the equality concerns it raises are no worse than those presented by traditional sentencing.

A. "It's Just One Factor."

The first of these arguments is easiest to dismiss. It is true, thank goodness, that judges are usually free to decline to place much or any weight on the risk assessment. (Indeed, if you are a judge reading this article, I hope to persuade you to do precisely that.) But this is hardly a reason not to worry about them. The risk assessments are designed to be used, or their existence would be pointless. Judges are told to consider the defendant's recidivism risk, and then provided with a purportedly scientific estimate of that risk. Unless every judge simply ignores this estimate, we also should worry about the equality concerns this approach raises.

These concerns do not disappear just because other factors besides risk obviously also affect the sentence. Defendants are entitled to a decision-making process that does not include unconstitutional factors. If a judge said "I would have sentenced the defendant to three years if he were white, but since he's black I'm sentencing him to four," nobody would respond: "No worries, most of his sentence isn't based on his race." The role of risk assessment is less transparent: we may not know exactly what role any particular factor played in producing a defendant's score, and the judge may not state exactly what role the risk score played in the sentence. But lack of transparency is hardly a defense. When the state tells judges to use a scoring system that assigns all defendants extra risk points if they are male, poor, unemployed, and so forth, it is ensuring that some defendants will receive harsher sentences than they would have absent those characteristics. Indeed, that is the sole objective of including those characteristics in the scoring system.
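Returning to the marginal-gain question raised in Section III.C above, the sketch below shows one conventional way an evaluator might measure how much predictive accuracy the contested variables add over criminal-history information alone: fit models with and without them and compare out-of-sample discrimination (here, area under the ROC curve). The data, variable names, and effect sizes are all simulated for illustration; nothing here reproduces the studies cited in this Issue.

# Sketch, on synthetic data, of measuring the marginal predictive gain from
# adding socioeconomic/demographic variables to a criminal-history-only model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
priors = rng.poisson(1.5, n)            # prior convictions (simulated)
unemployed = rng.binomial(1, 0.3, n)    # socioeconomic flag (simulated)
male = rng.binomial(1, 0.8, n)          # demographic flag (simulated)
# Simulated recidivism, driven mostly by prior record, weakly by the others.
logit = -1.5 + 0.6 * priors + 0.2 * unemployed + 0.1 * male
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_hist = priors.reshape(-1, 1)
X_full = np.column_stack([priors, unemployed, male])

for name, X in [("criminal history only", X_hist), ("history + SES/gender", X_full)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    probs = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    print(f"{name}: AUC = {roc_auc_score(y_te, probs):.3f}")

The comparison of the two AUC values, not either value on its own, is what would speak to whether the contested classifications add anything; on real data the gap could of course come out differently than in this simulation.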


B. Do Judges Do This Anyway?

Advocates of actuarial sentencing often point out that risk assessments are inevitably part of sentencing. When judges consider the defendant's risk to public safety using more traditional means, they may well often consider the same kinds of troubling factors that are included in the risk assessment instruments.39 If so, perhaps there's no novel equality concern raised by the instruments; they just make judges' use of these factors more scientifically accurate.

This counterargument relies on an empirical claim that hasn't been tested. It would be useful for researchers to investigate the effect of adoption of actuarial sentencing on socioeconomic and other disparities in sentencing. Meanwhile, though, there's good reason to expect the intuitive result: giving judges a risk assessment that depends on particular characteristics will likely increase the role those characteristics play in determining the sentence.

First, in traditional sentencing, it's not entirely obvious which direction many of the characteristics in question cut. Through presentencing reports, judges may learn a number of biographical details about defendants, and many might consider these in formulating expectations about risk. But other judges will likely have the opposite reaction: they may consider factors like poverty or a difficult family life to be mitigating factors, from a retributive perspective.

Second, when the state gives judges a quantified risk assessment and tells them to use it, there is good reason to expect that many of them will place more weight on it than they would place on their own, subjective assessments of defendants' crime risk. Research on decision making in other contexts suggests that when you quantify what would otherwise be a subjective assessment, and tell people it represents a scientific or expert analysis, they will give it more weight.40 Judges traditionally must balance recidivism prevention with other sentencing objectives, such as general deterrence and retribution. The actuarial sentencing movement may well have the effect of increasing the salience of recidivism risk in this balance.41 If so, then the shift to actuarial sentencing would not merely be a matter of substituting one kind of risk prediction for another.

Finally, even if it's true that traditional sentencing includes similar sources of discrimination, that's a problem that we should be trying to solve. The actuarial sentencing movement instead openly endorses this discrimination. In my view this is worse, both because of its expressive message and because it gives up on the possibility of progress toward solutions.

C. Incarceration Reduction

Many risk assessment advocates focus on its potential to mitigate sentences.42 These supporters emphasize that many of the people sitting in prison pose low crime risks. They argue that if risk assessment can reduce this problem, it will disproportionately benefit the populations who are bearing the brunt of the mass incarceration crisis, that is, that it will help the people that I have argued it would harm. Even though people from disadvantaged backgrounds are penalized, some still will be scored relatively low risk, if they don't have other risk factors.

This argument is appealing, but it is misguided. First, we have no empirical information yet about whether giving judges risk scores before sentencing increases or reduces sentences on average. But it will certainly increase some people's sentences. It is unrealistic to expect judges to ignore high-risk assessments and act only on the low-risk ones. Elected judges, especially, will doubtless face pressure not to release defendants labeled high risk. Prosecutors may also be motivated to push for harsh sentencing of such defendants. One Virginia study found that judges were more likely to follow the actuarial instrument's risk classifications in cases scored high risk than in those scored low risk.43

Even if the sentencing scheme is formally structured so that risk assessment is a mitigating factor and not an aggravating one, it would be hard to prevent judges and prosecutors from taking a high-risk assessment into account. And even if risk assessments are carried out only after sentencing, to qualify defendants for a diversion program, their existence could affect sentencing. Judges might be more likely to give harsher sentences if they know that low-risk defendants won't end up having to serve them, but high-risk defendants will.

But let's suppose we could guarantee that incarceration would only be reduced. Would the state be justified in allowing socioeconomic and family background and demographics to determine who is eligible for this reduction? I think it wouldn't be. From a constitutional perspective, there's no difference between using otherwise impermissible factors to distribute benefits (or, in this case, relief from harms) and using them to distribute harms.44 And from a policy perspective, it would be very troubling for the state to set up a punishment reduction mechanism and then tell some prisoners that their disadvantaged backgrounds preclude them from taking advantage of it.

The incarceration reduction arguments for risk assessment seem to rely, at least implicitly, on the idea of a forced choice: Either we continue with criminal justice policies that gratuitously incarcerate enormous numbers of nondangerous people for minor crimes, or we try to reduce these numbers by sorting defendants according to their actuarial risks of recidivism. But this is a false dichotomy. It is possible to use risk assessment instruments that do not include socioeconomic, demographic, and family variables, and it is also easy to imagine any number of incarceration reduction measures that do not incorporate actuarial risk assessment at all. For example, legislatures could eliminate mandatory minimums or otherwise change the sentencing ranges associated with particular crimes. Such policy choices could be evidence-based, relying on the above-mentioned growing research about the effects of incarceration on crime. Not every evidence-based strategy for reducing crime or reducing incarceration involves profiling.

The most plausible case for the "forced choice" way of thinking is based on politics, not policy.


Perhaps legislatures will be more likely to support sentencing reform if they can be convinced that risk assessment will prevent dangerous criminals from being released. But although this is possible, there is no good reason to believe that the success of sentencing reform depends on adopting constitutionally and morally problematic risk assessment instruments.

Recently, we have seen the beginnings of a turn toward sentencing reform and a modest reversal of the long-term trend of incarceration growth. These developments have coincided with the risk assessment trend, but that does not mean they should be credited to it. Several other factors are almost certainly more important, including very serious fiscal pressures on states, the rise of a libertarian movement that has made bipartisan reform efforts possible, increasing attention to mass incarceration among civil rights advocates and other progressive groups, and judicial attention to prison overcrowding, including the Supreme Court's landmark holding in Brown v. Plata.45 But if the key political factors supporting criminal justice reforms have little to do with risk assessment, there's no reason to believe that including risk assessment is necessary to make reforms politically possible. If reformers succeed in making relief from harsh sentencing laws available to people with the right actuarial traits, they may sap the political momentum that could otherwise support relief that does not turn on wealth or identity.

Moreover, it is hard to believe that the idea of sentencing people more harshly based explicitly on their poverty or identity is a political winner in the first place. In fact, as noted above, I doubt many politicians would be willing to advocate it in public. In many jurisdictions, risk assessment has been adopted at the initiative of courts or sentencing commissions, not ordered by legislatures, and legislation that does require risk assessment typically is not specific about what variables need to be included. The public debate over risk assessment has played down the factors that the assessments rely on, employing feel-good euphemisms like "evidence-based sentencing." It is therefore not at all obvious that legislators support or even understand the details of the risk assessment instruments that courts have been using.

Conclusion

In short, sentencing people based on demographic, socioeconomic, and family characteristics is both morally troubling and, with respect to some factors, unconstitutional. Whatever gains actuarial sentencing offers are not worth these costs. History has seen many efforts to justify discrimination with appeals to science, and it has not looked kindly upon them. It is my hope that this latest effort will be relegated to history's dustbin sooner, rather than later. The efforts of the many thoughtful, serious people who have developed and advocated for these risk assessment instruments could be better channeled to the development of effective instruments that do not rely on these troubling variables, or to other data-driven efforts to reduce incarceration and crime.

Notes

* Thanks to Don Herzog, Ellen Katz, Richard Primus, Eve Brensike Primus, and Margo Schlanger for comments that improved the underlying academic work, and to Grady Bridges, Matthew Lanahan, and Jarred Klorfein for excellent research assistance.
1 Sonja Starr, Evidence-Based Sentencing and the Scientific Rationalization of Discrimination, 66 Stan. L. Rev. 803 (2014).
2 Bernard Harcourt, Risk as a Proxy for Race: The Dangers of Risk Assessment, 27 Fed. Sent'g Rep. 237 (2015) (critiquing the racially disparate impact associated with risk assessments based on criminal history).
3 Some advocates of actuarial sentencing, including Judge Richard Kopf, Richard Berk, and Jordan Hyatt, included elsewhere in this Issue, suggest that transparency is an advantage versus judges' less formal risk predictions, which are often not well explained. See Richard G. Kopf, Federal Supervised Release and Actuarial Data (including Age, Race, and Gender): The Camel's Nose and the Use of Actuarial Data at Sentencing, 27 Fed. Sent'g Rep. 207, at 208; Richard Berk & Jordan Hyatt, Machine Learning Forecasts of Risk to Inform Sentencing Decisions, 27 Fed. Sent'g Rep. 222, at 226. And this point could be right, but only if the details of the risk assessment instrument (including the weights given to each factor) are made available to the defense and the public. Unfortunately this is not the case in many jurisdictions; the most widely used risk assessment instruments are proprietary corporate products.
4 In Estelle v. Smith, 451 U.S. 454 (1981), the Supreme Court held that the defendant's privilege against self-incrimination was violated by the admission at sentencing, as evidence of recidivism risk, of the results of a compulsory pretrial psychiatric examination. If refusal to participate is penalized, today's sentencing-stage risk assessments may well run afoul of Estelle. See also Mitchell v. United States, 526 U.S. 314, 325-30 (1999) (reaffirming that the privilege applies at sentencing, even when the defendant has pled guilty); but see White v. Woodall, 572 U.S., n.6 (2014) (describing a circuit split about the scope of this principle).
5 See Starr, supra note 1, at 811-14 (providing a more detailed description of the instruments).
6 See Kopf, supra note 3.
7 See Joan Petersilia & Susan Turner, Guideline-Based Justice: Prediction and Racial Minorities, 9 Crime & Just. 151, 153-54, 160 (1987).
8 U.S. Sentencing Guidelines Manual § 5H1.10 (2012).
9 See Starr, supra note 1, at 823-29 (reviewing this case law).
10 Craig v. Boren, 429 U.S. 190, 191-92 (1976).
11 J.E.B. v. Alabama ex rel. T.B., 511 U.S. 127, 135 (1994).
12 Weinberger v. Wiesenfeld, 420 U.S. 636, 645 (1975); Frontiero v. Richardson, 411 U.S. 677, 690-91 (1973).
13 United States v. Virginia, 518 U.S. 515, 531 (1996).
14 See, e.g., Evitts v. Lucey, 469 U.S. 387, 405 (1985) (discussing case law).
15 Griffin v. Illinois, 351 U.S. 12, 16-17 (1956) (Black, J., plurality) (quoting Chambers v. Florida, 309 U.S. 227, 241 (1940)).
16 Bearden v. Georgia, 461 U.S. 660, 666-67 (1983) (quoting Williams v. Illinois, 399 U.S. 235, 260 (1970) (Harlan, J., concurring in the result)).
17 Id.
18 Id. at 671 n.11; see Brief for Respondent at 32-35, Bearden, 461 U.S. 660 (No. 81-6633), 1982 U.S. S.Ct. Briefs LEXIS 438, at *35-39.
19 Bearden, 461 U.S. at 671.
20 Id.


21 Note that the Bearden Court treated poverty, unemployment, and ability to pay as interchangeable in this discussion (that is, as equally impermissible factors), and its logic applies equally to many of the related factors included in risk assessment instruments. See Starr, supra note 1, at 835-36.
22 Bearden, 461 U.S. at 669-70.
23 Id.
24 Jennifer Skeem and Christopher Slobogin are simply wrong to dismiss Bearden as "merely stand[ing] for the proposition that inability to pay a fine should not, by itself, lead to imprisonment." Jennifer Skeem & Christopher Slobogin, Scientific Risk Assessment in Sentencing May Beat the Alternative, The Berkeley Blog, Sept. 30, 2014, available at http://blogs.berkeley.edu/2014/09/30/scientific-risk-assessment-in-sentencing-may-beat-the-alternative/. In fact, the Court separately rejected both the state's argument about ability to pay restitution and its separate argument about the heightened recidivism risk stemming from the defendant's job loss and poverty. Its reasoning on both points was essential to its judgment.
25 See Starr, supra note 1, at 821-22.
26 Virginia, 518 U.S. at 533.
27 See Starr, supra note 1, at 842-50.
28 Berk and Hyatt, supra note 3, at 222, likewise state that they "have not found a single credible study in which empirical comparisons are actually made." They suggest that given this absence of evidence, we should assume that actuarial prediction is more accurate, because actuarial prediction is more accurate than clinical prediction in other contexts that have been studied. But although this kind of extrapolation from other fields may have value for policy purposes (but see the discussion below), it is not constitutionally sufficient for the state to meet its burden to defend the gender and socioeconomic classifications in the instruments.
29 J.C. Oleson et al., Training to See Risk: Measuring the Accuracy of Clinical and Actuarial Risk Assessments Among Federal Probation Officers, 75(2) Fed. Probation 52, 54-55 (2011).
30 See Starr, supra note 1, at 851-55 (reviewing and critiquing this literature and advocates' claims about it).
31 See Virginia, 518 U.S. at 533 (holding that the burden of justifying gender discrimination "rests entirely on the state").
32 In theory, the sentencing judge could supplement the information the risk score provides with additional information at her disposal. But this cannot happen unless the judge has a clear understanding of what factors are and are not reflected in the risk score, which is generally unlikely, as Kelly Hannah-Moffat details elsewhere in this Issue, see The Uncertainties of Risk Assessment: Partiality, Transparency, and Just Decisions, 27 Fed. Sent'g Rep. 244 (2015). See also Brian J. Ostrom et al., Offender Risk Assessment in Virginia: A Three-Stage Evaluation, 57 (2002), available at http://www.vcsc.virginia.gov/risk off rpt.pdf (finding strong evidence that "total risk score is [treated by judges as] an offender gestalt. ... [J]udges used a more 'bottom line' metric ... rather than a multitude of individual factors"). Moreover, as noted above, many states use nontransparent proprietary risk assessment instruments.
33 See Anne Milgram et al., Pretrial Risk Assessment: Improving Public Safety and Fairness in Pretrial Decision Making, 27 Fed. Sent'g Rep. 216 (2015).
34 As Harcourt argues in this Issue, even an instrument that relied exclusively on criminal history would likely have a racially disparate impact, which does not make it unconstitutional under the Court's doctrine, but might make us worry about it on policy grounds. Harcourt, supra note 2, at 237. This, however, is not a concern unique to risk assessment, because criminal history already plays a substantial role in our sentencing system (although risk assessment might compound its impact). Id.
35 Petersilia & Turner, supra note 7, at 169-71 & fig. 1 (finding that demographic and socioeconomic variables do not noticeably improve predictive accuracy).
36 See Starr, supra note 1, at 855-62 (developing this critique).
37 See Bernard Harcourt, Against Prediction 123 (2007) (making a similar argument about responsiveness to police presence).
38 See David S. Abrams, The Imprisoner's Dilemma: A Cost-Benefit Approach to Incarceration, 98 Iowa L. Rev. 905, 929-36 (2013) (reviewing this literature).
39 See, e.g., Skeem & Slobogin, supra note 24; Model Penal Code: Sentencing § 6B.09 cmt. a at 53, 55 (Tentative Draft No. 2, 2011).
40 See Starr, supra note 1, at 865-66 (reviewing research).
41 See id. at 867-70 (reporting results of a small experiment tentatively supporting this prediction).
42 See id. at 816 (citing examples).
43 Ostrom et al., supra note 32, at 33-34. There is no way to know how judges would have assessed these cases absent the risk assessment, however.
44 Many equal protection cases concern access to some positive benefit offered by the government, such as government contracts or university admission.
45 Brown, 131 S.Ct. 1910 (2011). See generally Avlana Eisenberg, Incarceration Incentives in the Decarceration Era (2014) (examining these factors) (draft on file with author).

Risk as a Proxy for Race: The Dangers of Risk Assessment

Bernard E. Harcourt, Isidor and Seville Sulzbacher Professor of Law, Director of the Columbia Center for Contemporary Critical Thought, Columbia University

Actuarial risk assessment in the implementation and administration of criminal sentencing has a long history in this country, a long and fraught history.1 Today, many progressive advocates promote the use of actuarial risk assessment instruments as part of a strategy to reduce the problem of "mass incarceration." Former Attorney General Eric Holder has called on the U.S. Sentencing Commission to hold hearings to further consider the matter of risk assessment and prediction tools in sentencing and parole. The objective, to reduce our massive over-incarceration in this country, is critical and noble. But risk assessment tools are simply the wrong way forward. I argue here that we should resist the political temptation to embrace the progressive argument for risk prediction instruments because their use will unquestionably aggravate the already intolerable racial imbalance in our prison populations.

The fact is, risk today has collapsed into prior criminal history, and prior criminal history has become a proxy for race. The combination of these two trends means that using risk assessment tools is going to significantly exacerbate the unacceptable racial disparities in our criminal justice system. More generally, the use of actuarial tools is likely to produce a "ratchet effect" on all members of the higher-risk categories, whether along racial or other lines, with highly detrimental consequences on their employment, educational, familial, and social outcomes.2 For these simple reasons, we should avoid embracing actuarial risk prediction and instead turn to race-neutral solutions to mass incarceration.

There are, to be sure, political and strategic advantages to using "technological" instruments, such as actuarial tools, to justify prison releases. Risk assessment tools protect political actors and serve to de-responsibilize decision makers. Given that we still today "govern through crime,"3 these strategic considerations are undoubtedly important. But I think that this advantage is outweighed by the cost to racial justice. In the end, we need to find better solutions to reduce mass incarceration. Before making the argument, though, let me start here with a few cautionary tales about progressive arguments for prediction.

I. Cautionary Tales about Prediction

This is not the first time we have been tempted to use a metric of dangerousness as a way to empty "total institutions." We did the same thing with our asylums and mental hospitals in the 1950s, '60s, and '70s. That experiment revealed a number of things, including one dramatic consequence: the turn to dangerousness had a distinctly disproportionate effect on African American populations.

The proportion of minorities in mental hospitals increased significantly during the process of deinstitutionalization. From 1968 to 1978, for instance, there was a significant demographic shift among persons admitted to mental hospitals. In a 1984 study, Henry Steadman, John Monahan, and their colleagues tested the degree of reciprocity between the mental health and prison systems in the wake of state mental hospital deinstitutionalization, using a randomly selected sample of 3,897 male prisoners and 2,376 adult male admittees to state mental hospitals from six different states.4 What their research revealed is that the proportion of nonwhites admitted to mental facilities increased from 18.3 percent in 1968 to 31.7 percent in 1978: "Across the six states studied, the ... percentage of whites among admitted patients also decreased, from 81.7% in 1968 to 68.3% in 1978."5 As evidenced by Figure 1, the track record is damning: mental hospitals were deinstitutionalized by focusing on dangerousness, and the result was a sharp increase in the black representation in asylums and mental institutions.

Figure 1. Racial Breakdown of Mental Hospital Admissions, from Steadman et al. 1984 Study (note 4).

A second cautionary tale: selective incapacitation. Here, it is important to recall that the development of selective incapacitation in the 1970s had, as its explicit goal, the objective of reducing prison populations. One of the leading arguments for selective incapacitation, especially in California, was precisely the progressive case that it would lower prison populations.


As we know, the theory of selective incapacitation, which traced to the seminal research of Marvin Wolfgang, Robert Figlio, and Thorsten Sellin, rested on the idea that only a small subset of youths in their study (627 youths, or 6.3 percent of the cohort) was responsible for over 50 percent of the total crimes committed by that cohort.6 The modern idea of selective incapacitation, which was grounded on prediction and risk assessment, grew from this insight: predict, identify, and lock up those 6 percent (and only those 6 percent) and society could cut crime in half, while at the same time reducing prison populations and correctional costs. The only problem was how to identify the 6 percent who are chronic offenders; and the solution, naturally, was to turn to actuarial methods.

When Peter Greenwood and Allan Abrahamse issued their RAND report in 1982, the report that set forth the most fully articulated plan for implementing a strategy of selective incapacitation using actuarial methods, they made clear that the benefit of selective incapacitation was that it saved money by reducing prison populations. Their seven-factor prediction instrument7 was attractive, they argued, because it had the advantage of reducing the prison population. Greenwood and his colleagues concluded their study precisely on that note: "Increasing the accuracy with which we can identify high-rate offenders or increasing the selectivity of sentencing policies can lead to a decrease in crime, a decrease in the prison population, or both. Selective incapacitation is a way of increasing the amount of crime prevented by a given level of incarceration."8

Now, it turned out that their prediction instrument identified low- and medium-rate offenders, but was highly inaccurate regarding high-rate offenders: 91 to 92 percent of those scoring 0 or 1 (the lowest possible scores) turned out to be low- or medium-rate offenders; by contrast, only 50 percent of those scoring 5, 6, or 7 turned out to be high-rate burglars or robbers.9 The authors nevertheless concluded their study on an unwarranted optimistic note, and Greenwood and Turner issued a follow-up revised report, five years later, with a slightly different and more honest title: Selective Incapacitation Revisited: Why the High-Rate Offenders Are Hard to Predict.10

It is worth emphasizing that the advent of "selective" incapacitation and the deployment of these purportedly refined actuarial tools anticipated the largest single increase in prison populations in history.

With these cautionary tales in mind, let me turn then to the central issue: the relationship between risk and race.

II. From Race to Risk: Prior Criminal History as a Proxy for Skin Color

Risk, today, is predominantly tied to prior criminal history, and prior criminality has become a proxy for race. The result is that decarcerating by means of risk instruments is likely to aggravate the racial disparities in our already overly racialized prisons.

Surprisingly, but tellingly, it was not always this way. Throughout most of the twentieth century, race was used explicitly and directly as a predictor of dangerousness. From their inception in the 1920s to at least the 1970s, many of the prediction tools expressly used the nationality and race of the parents of the inmate as one of the central factors to predict future dangerousness. This practice ebbed in the 1970s as a result of the Civil Rights movement and constitutional developments in Equal Protection, but was nevertheless replaced with two other trends: the narrowing of the prediction instruments and the focusing of those tools on prior criminal history.

The combined effect of these various trends has been to turn risk into a proxy for race. The intermediation through criminal history took several decades, but its origins provide the best clue to decipher the problems of our present condition and the dangers of risk assessment. The evolution from race to risk can be traced neatly by examining the factors used in risk assessment tools during the early to mid-twentieth century. I will turn to that here.

A. Race as an Early Predictor of Parole Failure

The first parole prediction instrument that was ever used in the parole decision-making process, the Burgess method, developed in 1927 and 1928 by Professor Ernest Burgess at the University of Chicago, included the nationality or race of the father as one of 21 factors that predicted success or failure on parole.11 Burgess was particularly interested in the question of national origin. In his study of 3,000 Illinois inmates released on parole, he discovered that "the smallest ratio of violations [are] among more recent immigrants like the Italian, Polish and Lithuanian," and "the highest rates of violation [are] among the older immigrants like the Irish, British and German."12 Burgess also observed, referring to parole violators, that "[t]he group second in size was the Negro with 152 at Pontiac, 216 at Chester, and 201 at [Joliet]."13 Burgess' model was implemented by the Illinois Board of Paroles in 1933, and as a result, nationality and race were used expressly as a factor in the prognasio that served as the basis for the decision whether or not to parole an inmate. Race and nationality, it was believed, predicted parole violation.14

The use of parent nationality, race, color, and other ethnic identifiers, such as religious belief and church attendance, was a continuous thread that wove through the evolution of the parole prediction studies and instruments at least into the 1970s. In 1931, Clark Tibbitts, the former assistant to Ernest Burgess on the Illinois parole study, replicated the Burgess method (using a sample of 3,000 youths paroled from the Illinois reformatory at Pontiac over a seven-year period from 1921 to 1927) and included "nationality of his father" as one of his 23 factors.15 Not surprisingly, being "American (Colored)" was a predictor of parole violation, and being "American (White)" was a marker of success.16 The very next year, in 1932, Elio Monachesi published his Ph.D. dissertation on predicting probation violations and, like the others, included the nativity of the probationer, the nativity of his parents, the religion of the mother and father and probationer, and church attendance as predictors.
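As a rough illustration of the unit-weight logic behind the Burgess method described above, the sketch below marks each factor favorable or unfavorable by comparing its subgroup's historical violation rate to the overall rate, and then simply counts the favorable marks. The factor names and rates are invented placeholders; as the text explains, the historical instruments also counted parental nationality and race among their factors.

# Sketch of Burgess-style "unit weight" scoring with invented placeholder data.
OVERALL_VIOLATION_RATE = 0.25  # hypothetical base rate in a past parole sample

# Hypothetical violation rates observed for each factor value in that sample.
FACTOR_RATES = {
    ("work_record", "regular"): 0.06,
    ("work_record", "none"): 0.38,
    ("offense_type", "property"): 0.30,
    ("offense_type", "other"): 0.20,
    ("first_offender", True): 0.15,
    ("first_offender", False): 0.33,
}

def burgess_score(person: dict) -> int:
    """Count one point for every factor whose subgroup rate beats the overall rate."""
    favorable = 0
    for factor, value in person.items():
        rate = FACTOR_RATES.get((factor, value))
        if rate is not None and rate < OVERALL_VIOLATION_RATE:
            favorable += 1
    return favorable

print(burgess_score({"work_record": "regular", "offense_type": "other",
                     "first_offender": True}))  # 3 favorable marks

The point of the sketch is only the mechanism: whatever factors are fed in, favorable and unfavorable, are tallied with equal weight, which is why the choice of factors carried so much consequence.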


Nationality, race, and religion became staples of the prediction research. Courtlandt Churchill Van Vetchen included nationality and family religion in his 1935 study, Walter Argow included race and church attendance in his model the same year, Elizabeth Redden used color and nativity in 1939, and Bernard Kirby and F.J. Carney included race in their studies in 1954 and 1967, respectively. U.S. Attorney General Homer Cummings commissioned a large survey of parole practices in 1936, reviewing over 90,000 federal parole cases, and specifically found that "whites had better records on parole than Negroes."17

In terms of actuarial instruments that would actually be implemented, Burgess, Tibbitts and Hakeem all included race and nationality of the father in the prediction instrument. I discuss many of these tools in Against Prediction,18 but failed there to mention whether the prediction instruments included race. I remedy that in the Appendix table here, which lists the major prediction studies and tools, discussed herein, that were developed in the early to mid-twentieth century in the United States. The table discloses the use of race, nationality, or religion and whether the tools were actually implemented. The table reveals a continued use of race until at least the late 1960s. In fact, when California began using a parole prediction instrument in the 1970s, it used an actuarial device that relied on race. The first California "Base/Expectancy Score" narrowed in on race and only three other factors: prior commitments, offense type, and number of escapes.19

It is also interesting to note that the link to race was not only always direct, it was also at times metaphorical. Clark Tibbitts, for instance, in his 1931 replication of the Burgess method, referred to "white marks" and "black marks." So, he wrote, "A record of no work ... which shows a violation rate of 38.5 percent would be an unfavorable sign or what we have chosen to call 'a black mark,' while a good work record with only 5.6 percent failure would be favorable or 'a white mark.' The rates of violation either above or below the average permit the factors to be listed ... according to whether they are favorable or unfavorable, 'white' or 'black'."20 Not surprisingly, though, one has to wonder whether Tibbitts caught the lack of irony: being "American (Colored)" was a "black mark" and being "American (White)" was a "white mark."21

B. Narrowing on Prior Criminal History

The actuarial instruments ultimately evolved away from race as an explicit predictor, but that trend was accompanied by two others, and these too would have significant race effects: first, a general reduction in the number of predictive factors used and, second, an increased focus on prior criminal history.

The first trend was fueled by Sheldon and Eleanor Glueck. They developed the principal competitor to the Burgess method in their 1930 book, Five Hundred Criminal Careers. The Gluecks conducted extensive investigation into the lives of 510 inmates whose criminal sentences expired in 1921 and 1922, and became strong advocates of reducing prediction tools to a narrower set of factors. Theirs ultimately focused on only seven factors.22 The competition between the Burgess and Glueck methods generated a tremendous amount of research from the 1930s through the 1950s, and much of it was focused on narrowing the number of factors in the models.23 University of Chicago sociologist and heir to Burgess, Albert J. Reiss Jr., strenuously advocated limiting the number of factors in prediction instruments,24 as did Lloyd E. Ohlin, another prominent sociologist who taught at Harvard, and Daniel Glaser, who worked on the prediction tables used at the penitentiary at Pontiac, Illinois, in 1954 and 1955. Beginning in the 1970s, the federal government adopted more narrowly focused parole guidelines. The U.S. Parole Commission relied on the Salient Factor Score that used only seven predictive factors (and the majority of those seven factors related to prior delinquency). California adopted an actuarial model, the Base/Expectancy Score, that narrowed in on four factors.25 The narrowing of the prediction instruments can be visualized by plotting the number of factors used in parole prediction models over time and drawing a regression line through the plot, shown in Figure 2.

The second trend focused the predictors on prior criminal history as a proxy for future dangerousness. Practically all of the prediction studies converged on prior correctional contacts (arrests, convictions, and incarcerations) as one of the stronger predictors of recidivism. What developed, as a result, were more simplistic but easier-to-administer sentencing schemes that relied predominantly on prior criminal history.26 This new emphasis on criminal history shaped not only actuarial prediction, but also sentencing guidelines schemes. The federal sentencing guidelines, in fact, turned to criminal history as a more effective and simple way to predict future dangerousness after giving up on more complicated prediction instruments.27 Like the actuarial instruments, state and federal sentencing guidelines reflected the view that criminal history predicts recidivism risk. This is reflected well in the views of Paul Robinson, former member of the U.S. Sentencing Commission and a professor of law at the University of Pennsylvania:

The rationale for heavy reliance upon criminal history in sentencing guidelines is its effectiveness in incapacitating dangerous offenders. As the Guidelines Manual of the United States Sentencing Commission explains, "the specific factors included in [the calculation of the Criminal History Category] are consistent with the extant empirical research assessing correlates of recidivism and patterns of career criminal behavior."28

Most sentencing guidelines ensure that criminal history as a proxy for risk plays a troublingly large role at sentencing already.


[Figure 2. Number of Factors in Prediction Models. The figure plots the number of factors in risk assessment tools (y-axis: number of factors, 0 to 35) against year (x-axis: 1920 to 1990), with a fitted regression line showing the decline over time.]
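The trend Figure 2 depicts can be roughly reproduced from the Appendix table at the end of this article. The sketch below is only an illustration, not the figure's actual construction: it fits an ordinary least-squares line to a subset of the "number of factors investigated" column, whereas the original plot may use a different selection of studies or count only the factors retained in the derived instruments.

```python
# Minimal illustrative sketch: fit a linear trend to the number of factors in
# a subset of the prediction studies listed in the Appendix table below.
import numpy as np
import matplotlib.pyplot as plt

years = [1923, 1927, 1928, 1928, 1930, 1931, 1931, 1932, 1935, 1939,
         1943, 1948, 1951, 1954, 1954, 1967, 1976, 1978]
factors = [66, 15, 26, 22, 52, 34, 25, 54, 37, 22,
           14, 27, 27, 27, 33, 14, 10, 10]

# Ordinary least-squares line through the (year, factors) points.
slope, intercept = np.polyfit(years, factors, 1)

plt.scatter(years, factors, label="prediction studies")
plt.plot(years, np.polyval([slope, intercept], years),
         label=f"trend: {slope:.2f} factors/year")
plt.xlabel("Year")
plt.ylabel("Number of factors")
plt.legend()
plt.show()
```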

The expanding use of actuarial instruments in sentencing, as well as the increased importance of criminal history in the instruments used in parole, will only compound that role.

Whether prior criminal justice contact actually works to predict high or low risk offenders remains unclear, especially given the failure of most research to account for the nonrandom assignment to penal treatments.29 But what is clear is that prior criminality has become the predictor of choice in sentencing. If anything, it has increased in importance relative to other factors.

Current actuarial instruments vary widely in the number and type of risk factors that they include, but all place heavy weight on criminal history. Unfortunately, reliance on criminal history has proven devastating to African American communities and can only continue to have disproportionate impacts in the future. The reason is that the continuously increasing racial disproportionality in the prison population necessarily entails that the prediction instruments, focused as they are on prior criminality, are going to hit hardest the African American communities. The trend in incarceration over the twentieth century has been one of increasing disproportionality, as evidenced by the graphic representation of the proportion of the carceral system that is nonwhite (Figure 3).

In the end, the use of risk instruments focused on prior criminal history is toxic. The consequence is unacceptable: relying on prediction instruments to reduce mass incarceration will surely aggravate what is already an unacceptable racial disproportionality in our prisons.

More generally, as I have demonstrated elsewhere,30 using risk assessment tools produces a "ratchet effect" on the profiled populations, whether along racial or other lines. Putting actuarial prediction into effect is effectively like sampling more, or oversampling (in social scientific terms), from the higher risk group. The resulting sample set will contain a disproportionate number of those higher risk individuals: disproportionate, that is, as compared to their representation in the offending population. This imbalance will get incrementally worse each year if law enforcement departments rely on the evidence of last year's correctional traces (arrest or conviction rates) when setting next year's profiling targets. The resulting ratchet effect will have significant detrimental consequences on the employment, educational, familial, and social outcomes of the profiled populations, including, in the case of racial profiling, the devastating effects associated with the notion of black criminality that pervades the public imagination and, in the case of recidivists, the extreme difficulties of prisoner reentry. These high costs associated with any ratchet effect should temper our embrace of the actuarial. Assuming that people are only marginally responsive to greater punishment, the use of actuarial methods would impose excessive and counterproductive costs on all high risk populations and, overall, worsen the outcomes for society.
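To make the feedback dynamic concrete, here is a toy simulation of the ratchet described above. It is an illustrative construction, not the author's own model, and every number in it is invented: enforcement attention is reallocated each year in proportion to the previous year's arrest shares, and a group whose offending rate is only slightly higher ends up supplying an ever-growing share of arrests.

```python
# Toy ratchet simulation (illustrative only; all numbers invented).
pop = {"A": 0.3, "B": 0.7}        # population shares
rate = {"A": 0.12, "B": 0.10}     # offending rates, only slightly different
enforcement = dict(pop)           # year 0: enforcement proportional to population

# Group A's share of actual offending never changes (about 0.34 here).
offender_share_A = pop["A"] * rate["A"] / sum(pop[g] * rate[g] for g in pop)
print(f"group A's share of actual offending: {offender_share_A:.2f}")

for year in range(1, 11):
    # Arrests in each group scale with enforcement devoted to it and its offending rate.
    arrests = {g: enforcement[g] * rate[g] for g in pop}
    total = sum(arrests.values())
    # Feedback step: next year's profiling targets mirror this year's arrest shares.
    enforcement = {g: arrests[g] / total for g in pop}
    print(f"year {year}: group A arrest share = {enforcement['A']:.2f}")

# Group A's arrest share ratchets upward each year, well past its ~0.34 share
# of offending, even though the underlying offending rates never change.
```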


[Figure 3. State and Federal Prisons and Jail Population Ratios by Race, 1923-2008. The figure plots the nonwhite and white shares of the prison and jail population (in percent) for 1923, 1933, 1940, 1950, 1960, 1970, 1980, 1990, 2000, and 2008, showing the nonwhite share rising over the period.]

III. Back to Square One

This leaves me, once again, against prediction. But that may not be such a bad thing. What fueled mass incarceration, research shows, was a combination of increased prison admissions and lengthier sentences. John Pfaff's research, for example, highlights the importance of increased admissions in the prison build up.31 Derek Neal and Armin Rick have shown that increases in prison sentence length, largely due to changes in sentencing law, are also important reasons for the growth in incarceration.32 Bill Sabol's careful analysis of the carceral build up from 1994 to 2006 corroborates many of these findings.33 Although Sabol identifies different mechanisms for violent versus drug offenses (the first associated more with increased prosecutions, and the second with increased arrests), the bottom line is essentially the same: increased admissions, as well as length of stay, account for sharp increases in imprisonment. All of these factors disparately affect people of color. For example, Sonja Starr and Marit Rehavi's research highlights the role of mandatory minimum sentences, and prosecutors' disparate use of them in black men's cases, as drivers of racial disparity in incarceration.34

Rather than release through prediction, then, we need to be less punitive at the front end, reduce sentence lengths, and eliminate mandatory minimums, while remaining extremely attentive to the racial imbalances in our sentencing laws. Further reducing the crack cocaine disparity would be a step in the right direction; however, other immediate steps should include eliminating mandatory prison terms, reducing drug sentencing laws, creating additional diversion and alternative supervision programs, and decreasing the imposition of hard time. The real solution, then, is not to turn to the actuarial, but to reduce prison admissions and sentences.

Appendix. Major Prediction Studies and Tools Developed in the United States, 1923-1978

Year | Researcher | Factors investigated | Factors involving race, ethnicity, national origin, or religion | Derived workable instrument? | Publication citation
1923 | Warner | 66 | Color; nativity of parents; religion of parents; religion of prisoner | No | Sam B. Warner, Factors Determining Parole From the Massachusetts Reformatory, 14 J. Crim. L. & Criminology 172-207 (1923).
1924 | Hart | 66 | Religion of prisoner being "other"; claim to attend church regularly | Yes | Hornell Hart, Predicting Parole Success, 41(3) J. Am. Inst. Crim. L. & Criminology 405-13 (1923).
1927 | Witmer | 15 | [none] | No | Helen Leland Witmer, Some Factors in Success or Failure on Parole, 17 J. Crim. L. & Criminology 384-403 (Nov. 1927).
1928 | Borden | 26 | Nationality; race | No | Howard G. Borden, Factors for Predicting Parole Success, 19 J. Crim. L. & Criminology 328-36 (Nov. 1928).
1928 | Burgess | 22 | Nationality of father | Yes | Ernest W. Burgess, The Workings of the Indeterminate Sentence Law and the Parole System (1928).
1930 | Glueck | 52 | Nativity of parents and prisoner; religion; church attendance | Yes | S. Glueck & E.T. Glueck, Five Hundred Criminal Careers (1930).
1931 | Vold | 34 | [none] | Yes | George B. Vold, Prediction Methods and Parole (1931).
1931 | Tibbitts | 25 | Nationality of father | Yes | Clark Tibbitts, Success and Failure on Parole Can Be Predicted, 22 J. Crim. L. & Criminology 11-50 (1931).
1932 | Monachesi | 54 | Nativity of prisoner; nativity of parents; religion of mother; religion of father; religion of prisoner; church attendance | No | Elio D. Monachesi, Prediction Factors in Probation (1932).
1935 | Van Vechten | 225 | Nationality; family religion | No | Courtlandt Churchill Van Vechten Jr., A Study of Success and Failure of One Thousand Delinquents Committed to a Boy's Republic (1935) (unpublished Ph.D. dissertation, Department of Sociology, University of Chicago) (on file with author).
1935 | Argow | 37 | Race; church attendance | No | Walter Webster Argow, A Criminal Liability Index for Predicting Possibility of Rehabilitation, 26 J. Crim. L. & Criminology 561-71 (1935).
1935 | Laune | 54 | [none] | No | Ferris F. Laune, Predicting Criminality: Forecasting Behavior on Parole, 1 Nw. U. Stud. Soc. Sci. (1936).
1939 | Redden | 22 | Color; nativity | No | Elizabeth Redden, Embezzlement: A Study of One Kind of Criminal Behavior with Prediction Tables Based on Fidelity Insurance Records (1942) (unpublished Ph.D. thesis, University of Chicago) (on file with author).
1939 | Survey of Release Procedures | 82 | Race of father; citizenship of father; religion of father | No | U.S. Dep't of Justice, The Attorney General's Survey of Release Procedures, vol. 4, Parole (1939).
1942 | Jenkins et al. | 95 | [none] | No | R.L. Jenkins, Henry Harper Hart, Phillip I. Sperling & Sidney Axelrod, Prediction of Parole Success: Inclusion of Psychiatric Criteria, 33 J. Crim. L. & Criminology 38-46 (1942).
1943 | Weeks | 14 | Nativity of mother; religion of mother; nativity of father | No | H. Ashley Weeks, Predicting Juvenile Delinquency, 8 Am. Soc. Rev. 40-46 (1943).
1948 | Hakeem | 27 | Nationality of father | Yes | Michael Hakeem, The Validity of the Burgess Method of Parole Prediction, 53 Am. J. Soc. 376-86 (1948).
1951 | Ohlin | 27 | [none] | Yes | Lloyd E. Ohlin, Selection for Parole: A Manual of Parole Prediction (1951).
1954 | Glaser | 27+ | [none] | Yes | D. Glaser, A Reconsideration of Some Parole Prediction Factors, 20 Am. Soc. Rev. 335-41 (1954).
1954 | Kirby | 33 | Race | No | Bernard C. Kirby, Parole Prediction Using Multiple Correlation, 6 Am. Soc. Rev. 539-50 (1954).
1967 | Carney | 14 | Race | No | F.J. Carney, Predicting Recidivism in a Medium Security Correctional Institution, 58 J. Crim. L., Criminology & Police Sci. 338-48 (1967).
1973 | Hoffman & Beck | 66 | [none] | Yes | P.B. Hoffman & J.L. Beck, Parole Decision Making: A Salient Factor Score, 2 J. Crim. Just. 195-206 (1974).
1976 | Heilbrun, Knopf & Bruner | 10 | [none] | No | A.B. Heilbrun, I.J. Knopf & P. Bruner, Criminal Impulsivity and Violence and Subsequent Parole Outcome, 16 Brit. J. Criminology 367-77 (1976).
1978 | Brown | 10 | [none] | Yes | Lawrence D. Brown, The Development of a Parolee Classification System Using Discriminant Analysis, 15 J. Res. Crime & Delinq. 92-108 (1978).

Notes

* Special thanks to Sonja Starr for organizing this symposium and for helpful comments on this article; and to Shawn Bushway, Kelly Hannah-Moffat, and John Pfaff for comments, criticisms, and guidance on a prior draft.
1 I trace the history of actuarial risk assessment in criminal justice in the United States in a previous book: Bernard E. Harcourt, Against Prediction: Profiling, Policing, and Punishing in an Actuarial Age (2007).
2 See, generally, Bernard E. Harcourt, "A Reader's Companion to 'Against Prediction': A Reply to Ariela Gross, Yoram Margalioth, and Yoav Sapir on Economic Modeling, Selective Incapacitation, Governmentality, and Race," 33(1) Law & Soc. Inquiry 265-83, 270-73 (Winter 2008).
3 Jonathan Simon, Governing Through Crime (2007).
4 Henry J. Steadman, John Monahan, Barbara Duffee, Elliott Hartstone & Pamela Clark Robbins, The Impact of State Mental Hospital Deinstitutionalization on United States Prison Populations, 1968-1978, 75 J. Crim. L. & Criminology 474-90, 478 (1984).
5 Id. at 479. Note that there was a similar, though less stark, shift in prison admissions: "Across the six states, the ... percentage of whites among prison admittees was also relatively stable, decreasing only from 57.6% in 1968 to 52.3% in 1978." Id.
6 Marvin E. Wolfgang, Robert M. Figlio & Thorsten Sellin, Delinquency in a Birth Cohort 248 (1972).
7 Peter W. Greenwood & Allan Abrahamse, Selective Incapacitation (Rand Corporation, 1982). The seven factors relied on three primary categories: prior criminal record, history of drug abuse, and employment history. (Notice, for later discussion, that the first four of the seven factors focus on prior criminal history and dominate the prediction instrument.) Greenwood and his colleagues based their study on self report surveys from 2100 male prison and jail inmates from California, Michigan, and Texas in 1977. Id. at xii. The study focused on robbery and burglary offenses, and excluded more serious crimes such as murder or rape because these low rate crimes are harder to predict.
8 Id. at xiii (emphasis added).
9 Id. at 53.
10 Peter W. Greenwood & Susan Turner, Selective Incapacitation Revisited: Why the High Rate Offenders are Hard to Predict, National Institute of Justice, R-3397-NIJ (March 1987), available at https://www.ncjrs.gov/pdffiles1/Digitization/109924NCJRS.pdf.
11 Harcourt, supra note 1, at 57.
12 Andrew A. Bruce, Ernest W. Burgess & Albert M. Harno, A Study of the Indeterminate Sentence and Parole in the State of Illinois, 19 J. Crim. L. & Criminology, Part II, 1, 259 (1928).
13 Id.
14 See Harcourt, supra note 1.
15 Clark Tibbitts, Success and Failure on Parole Can Be Predicted, 22 J. Crim. L. & Criminology 11, 11 (1931).
16 Id. at 43.
17 U.S. Dep't of Justice, The Attorney General's Survey of Release Procedures, vol. 4, Parole, 541 (1939).
18 Harcourt, supra note 1.
19 Jonathan Simon, Poor Discipline: Parole and the Social Control of the Underclass 173 (1993).
20 Tibbitts, supra note 15, at 40.
21 Id. at 43.
22 Sheldon Glueck & Eleanor Glueck, Five Hundred Criminal Careers (1930). The seven factors were: (1) industrial habits, (2) seriousness and frequency of prereformatory crime, (3) arrests for crimes preceding, (4) penal experience preceding, (5) economic responsibility preceding, (6) mental abnormality on entrance, and (7) frequency of offences in the reformatory. Incidentally, they included the nativity of the parents and of the ex-inmates, as well as their religion and church attendance, in their study.
23 See, e.g., Elio D. Monachesi, Prediction Factors in Probation (1932); Michael Hakeem, The Validity of the Burgess Method of Parole Prediction, 53 Am. J. Soc. 376 (1948); Albert J. Reiss, The Accuracy, Efficiency, and Validity of a Prediction Instrument, 56 Am. J. Soc. 552-61 (1951); Daniel Glaser, A Reconsideration of Some Parole Prediction Factors, 20 Am. Soc. Rev. 335 (1954).
24 Reiss, supra note 23, at 561.
25 See Simon, supra note 19, at 173. One of the four factors was prior commitments, another was race, and the other two were offense type and number of escapes.
26 See generally Paul Gendreau, Tracy Little & Claire Goggin, A Meta-Analysis of the Predictors of Adult Offender Recidivism: What Works!, 34 Criminology 575-607 (1996).
27 See Harcourt, supra note 1, at 96-98.
28 Paul H. Robinson, Punishing Dangerousness: Cloaking Preventive Detention as Criminal Justice, 114 Harv. L. Rev. 1429, 1431 n.7 (2001).
29 Shawn Bushway & Jeffrey Smith, Sentencing Using Statistical Treatment Rules: What We Don't Know Can Hurt Us, 23 J. Quantitative Criminology 377-87 (2007).
30 Harcourt, supra note 1; and Harcourt, supra note 2.
31 See John Pfaff, A Plea for More Aggregation: The Looming Threat to Empirical Legal Scholarship (July 16, 2010), available at SSRN: http://ssrn.com/abstract=1641435; John Pfaff, The Durability of Prison Populations, U. Chi. Legal F. 73 (2010).
32 Derek Neal and Armin Rick, The Prison Boom and the Lack of Black Progress After Smith and Welch (working paper, November 2013), available at http://home.uchicago.edu/~arick/prs boom 201309.pdf.
33 William J. Sabol, "Implications of Criminal Justice System Adaptation for Prison Population Growth and Corrections Policy" (working paper, July 2010), available at http://www.albany.edu/scj/documents/Sabol ManagingPopulations.pdf.
34 Sonja B. Starr & M. Marit Rehavi, Mandatory Sentencing and Racial Disparity: Assessing the Role of Prosecutors and the Effects of Booker, 123 Yale L.J. 2 (2013); M. Marit Rehavi & Sonja B. Starr, Racial Disparity in Federal Criminal Sentences, 122 J. Pol. Economy 1320 (2014).



Evidence-Based Sentencing and the Scientific Rationalization of Discrimination
Sonja B. Starr*
Forthcoming, STANFORD LAW REVIEW, Vol. 66 (2014)

ABSTRACT This paper critiques, on legal and empirical grounds, the growing trend of basing criminal sentences on actuarial recidivism risk prediction instruments that include demographic and socioeconomic variables. I argue that this practice violates the Equal Protection Clause and is bad policy: an explicit embrace of otherwise-condemned discrimination, sanitized by scientific language. To demonstrate that this practice should be subject to heightened constitutional scrutiny, I comprehensively review the relevant case law, much of which has been ignored by existing literature. To demonstrate that it cannot survive that scrutiny and is undesirable policy, I review the empirical evidence underlying the instruments. I show that they provide wildly imprecise individual risk predictions, that there is no compelling evidence that they outperform judges’ informal predictions, that less discriminatory alternatives would likely perform as well, and that the instruments do not even address the right question: the effect of a given sentencing decision on recidivism risk. Finally, I also present new, suggestive empirical evidence, based on a randomized experiment using fictional cases, that these instruments should not be expected merely to substitute actuarial predictions for less scientific risk assessments, but instead to increase the weight given to recidivism risk versus other sentencing considerations.

* Professor of Law, University of Michigan. Thanks to Don Herzog, Ellen Katz, Richard Primus, and participants in Michigan Law’s Faculty Scholarship Brownbag Lunch for their comments, and to Grady Bridges, Matthew Lanahan, and Jarred Klorfein for research assistance.


INTRODUCTION...... 1 I. Actuarial Risk Prediction and the Movement Toward Evidence-Based Sentencing ...... 5 II. The Disparate Treatment Concern...... 12 A. Gender Classifications and the Problem with Statistical Discrimination...... 13 B. Wealth-Related Classifications in the Criminal Justice System ...... 17 C. The Social Harm of Demographic and Socioeconomic Sentencing Discrimination ...... 22 III. Assessing the Evidence for Evidence-Based Sentencing ...... 25 A. Precision, Group Averages, and Individual Predictions ...... 26 B. Do the Instruments Outperform Clinical Prediction and Other Alternatives?...... 32 C. Do the Risk Prediction Instruments Address the Right Question? ...... 36 IV. Will Risk Prediction Instruments Really Change Sentencing Practice?...... 41 A. Does EBS Merely Provide Information?...... 41 B. Does EBS Merely Replace One Form of Risk Prediction With Another?...... 43 CONCLUSION ...... 47


INTRODUCTION

“Providing equal justice for poor and rich, weak and powerful alike is an age-old problem. People have never ceased to hope and strive to move closer to that goal. ... In this tradition, our own constitutional guaranties of due process and equal protection both call for procedures in criminal trials which allow no invidious discriminations. … [T]he central aim of our entire judicial system [is that] all people charged with crime must, so far as the law is concerned, stand on an equality before the bar of justice in every American court.”
--Griffin v. Illinois, 351 U.S. 12, 16 (1956)

Criminal justice reformers have long worked toward a system in which defendants’ treatment does not depend on their socioeconomic status or demographics, but on their criminal conduct. How to achieve that objective is a complicated and disputed question. Many readers might assume, however, that there is at least a general consensus on some key “don’ts.” For example, judges should not systematically sentence defendants more harshly because they are poor or uneducated, or more lightly because they are wealthy and educated. They should not follow a policy of increasing the sentences of male defendants, or reducing those of females, on the explicit basis of gender. They likewise should not increase a defendant’s sentence specifically because she grew up without a stable, intact family, or because she lives in a disadvantaged and crime-ridden community.

It might surprise many readers, then, to learn that a growing number of U.S. jurisdictions are adopting policies that deliberately encourage judges to do all of these “don’ts.” These jurisdictions are directing sentencing judges to explicitly consider socioeconomic variables, gender, age, and sometimes family or neighborhood characteristics—not just in special contexts in which one of those variables might be particularly relevant (for instance, ability to pay in cases involving fines), but routinely, in all cases. This is not a fringe development. At least eight states are already implementing some form of it. One state supreme court has already enthusiastically endorsed it.1 And it now has been embraced by the American Law Institute in the draft of the newly revised Model Penal Code—a development that reflects its mainstream acceptance and, given the Code’s influence, may soon augur much more widespread adoption.2 There is a similar trend in Canada, the United Kingdom, and other foreign jurisdictions.3 Meanwhile, the majority of states now similarly direct parole boards to take demographic and socioeconomic factors into account.

The trend is called “evidence-based sentencing” (hereinafter EBS). “Evidence,” in this formulation, refers not to the evidence in the particular case, but to empirical research on factors predicting criminal recidivism. EBS seeks to help judges advance the crime-prevention objectives of punishment by equipping them with the tools of criminologists—recidivism risk prediction instruments grounded in regression models of past offenders’ outcomes. The instruments give considerable weight to criminal history, which is already central to modern sentencing schemes. However, they also add something new: explicit inclusion of gender, age,

1 Malenchik v. State, 928 N.E.2d 564 (Ind. 2010). 2 Model Penal Code: Sentencing § 6B.09 (Discussion Draft No. 4, 2012) (hereinafter “Draft MPC”). 3 See James Bonta, Offender Risk Assessment and Sentencing, 49 CAN. J. CRIMINOLOGY & CRIM. JUST. 519, 519-20 (2007).


and socioeconomic factors such as employment and education (with socioeconomically disadvantaged, male, and young defendants receiving higher risk scores). Some instruments also include family background, neighborhood of residence, and/or mental or emotional disorders.

EBS has been widely hailed by judges, advocates, and scholars as representing hope for a new age of scientifically guided sentencing. The idea is to replace judges’ “clinical” evaluations of defendants (that is, reliance on their own expertise) with “actuarial” risk prediction, which is purportedly more accurate. Incongruously, this trend is being pushed by progressive reform advocates, who hope it will reduce incarceration rates by enabling courts to identify low-risk offenders. In this Article, I argue that they are making a mistake. As currently practiced, EBS should be seen neither as progressive nor as especially scientific—and it is almost surely unconstitutional. This Article sets forth a constitutional and policy case against this approach, based on analysis of both the relevant doctrine and the empirical research supporting EBS. I show that the current prediction instruments should be subject to heightened equal protection scrutiny, and that the science falls short of allowing them to survive that scrutiny.

The concept of “evidence-based practice” is broad, and I do not mean to issue a sweeping indictment of all its many criminal justice applications. Indeed, I strongly endorse the general objective of informing criminal justice policy with data. Nor do I argue that actuarial prediction of recidivism is always inappropriate. My objection is specifically to the use of demographic, socioeconomic, and family status variables to determine whether and how long a defendant is incarcerated. I am concerned that a well-intentioned desire for data-driven decision-making is causing discrimination to be rationalized based on rather weak empirical evidence. I focus principally on the instruments’ use in sentencing, but virtually the same case can be made against their use in parole decisions, which is now established practice in thirty states.

The technocratic framing of EBS should not obscure an inescapable truth: sentencing based on such instruments amounts to overt discrimination based on demographics and socioeconomic status. The instruments typically do not include race as a variable (even their most enthusiastic defenders have limits to their comfort with group-based punishment), but sentencing based on socioeconomic predictors will have a racially disparate impact as well. Equal treatment of all persons is a central normative objective of the criminal justice system, and EBS may have serious social consequences, contributing to the concentration of the criminal justice system’s punitive impact among those who already disproportionately bear its brunt. Moreover, the expressive message of EBS—the justification of disparate treatment based on statistical generalizations about crime risk—is, when stripped of the anodyne scientific language, toxic. Group-based generalizations about dangerousness are not innocuous; they have an insidious history in our culture. And the express embrace of additional punishment for the poor conveys the message that the system is rigged. The instruments’ use of gender and socioeconomic variables should be subject to heightened constitutional scrutiny.
Gender is the only equal protection issue the existing literature pays any attention to, but I show that the socioeconomic variables trigger similar scrutiny under a line of Supreme Court doctrine concerning indigent criminal defendants—doctrine that the EBS literature completely ignores. In fact, the Court has specifically (and unanimously) condemned the notion of treating poverty as a predictor of recidivism risk in sentencing, even if there is statistical evidence supporting the correlation. Finally, while other variables in the instruments


(such as age and marital status) are subject only to rational basis review under current doctrine, I also argue that they raise substantial normative concerns. Contrary to the other commentators that have considered the gender discrimination issue, I do not think the EBS instruments can survive heightened scrutiny, nor are they justified as a policy matter. There are doubtless important and even compelling state interests at stake. But heightened scrutiny requires the state to prove a strong relationship to those interests, and the case law on wealth classifications in criminal justice also requires analysis of alternatives, as does sensible policymaking. With these principles in mind, I turn to the strength of the empirical evidence supporting EBS. It falls short for three principal reasons. First, the instruments provide nothing close to precise predictions of individual recidivism risk. The underlying regression models estimate average recidivism rates for offenders sharing the defendant’s characteristics. While some models have reasonably narrow confidence intervals for this predicted average, the uncertainty about what an individual offender will do is much greater. Individual recidivism outcomes vary for many reasons that are not captured by the models. While some uncertainty is inherent in predicting probabilistic future events, the risk prediction models also leave out many measurable variables that one might expect to be important—for instance, there are typically no variables relating to the crime of conviction or the case’s facts. The individual prediction problem is constitutionally important because the Supreme Court’s cases on gender and indigent defendants have consistently held that disparate treatment cannot be justified based on statistical generalizations about group tendencies, even if they are empirically supported. Instead, individuals must be treated as individuals. Second, it is not even clear that including the constitutionally problematic variables can substantially improve risk prediction in the aggregate. A core EBS premise is that actuarial risk prediction consistently outperforms clinical predictions. I examine the literature on which that claim is based, and find it unsupportive of this claim. To be sure, meta-analyses of “clinical versus actuarial” comparisons in various fields have given an edge on average to the actuarial— but not a large edge, and not a consistent one. The specifics of the actuarial instrument matter— one cannot say that any regression model is good by definition. Only a few comparative studies actually concern recidivism, and those have had mixed results. If anything, the studies support actuarial instruments that are very different from the crude ones that are actually being used— suggesting less discriminatory alternatives that could more effectively serve the state’s penological interests. Another alternative is simply to drop the constitutionally problematic variables, perhaps to be replaced with crime characteristics. The empirical research gives no reason to believe that including these variables offers any nontrivial predictive improvement. Third, even if the instruments predicted individual recidivism perfectly, they do not even attempt to predict the thing that judges need to know to use recidivism information in a utilitarian sentencing calculus. What judges need to know is not just how “risky” the defendant is in some absolute sense, but rather how the sentencing decision will affect his recidivism risk. 
For example, if a judge is deciding between a one-year and a two-year prison sentence for a minor drug dealer, it is not very helpful to know that the defendant’s characteristics predict a “high” recidivism risk, absent additional information that tells the judge how much the additional year in prison will reduce (or increase) that risk. Current risk prediction instruments do not provide that additional information. Future research might be able to fill that gap, but it will not be easy. Estimating the causal relationship between sentences and recidivism is challenging, in part because sentencing judges take recidivism risk into account, introducing reverse causality


concerns. Some researchers have used quasi-experimental methods to tease out these causal pathways, but so far their estimates of incarceration’s effects have not been demographically and socioeconomically specific. Finally, I consider two interrelated counterarguments that defend EBS essentially by saying that it doesn’t do much. The first is the claim that the instruments are innocuous because they do not directly specify a resulting sentence. Rather, they merely provide information—and what kind of obscurant would prefer sentencing to be ill-informed? This argument is not persuasive. The EBS instruments are meant to be used by judges, and to the extent they are used, they will systematically, and by design, produce disparate sentences across groups. The fact that the instruments do not exclusively determine the sentence might help in a “narrow tailoring” inquiry, but it is not enough alone to establish their constitutionality, nor their desirability. The second counterargument might be labeled the “So what else is new?” defense. Risk prediction has always been central to sentencing, implicating its incapacitation, rehabilitation, and specific deterrence objectives. EBS advocates thus often argue that judges will inevitably predict risk, and may well rely on demographic and socioeconomic factors, even if they do not say so expressly. The instruments, on this view, are merely there to improve this assessment’s accuracy. I argue, however, that EBS is not likely merely to replace one form of risk prediction with another. Rather, it will probably place a thumb on the sentencing scale in favor of more judicial emphasis on recidivism prevention relative to other sentencing goals. In many contexts, judges and other decision-makers tend to defer to “expert” assessments, especially with respect to scientific methods that they do not really understand. Moreover, providing risk predictions may simply increase the salience of crime prevention in judges’ minds. On this point, I also provide some new empirical evidence, based on a small experimental study that presented subjects with two fact patterns involving slight variations on the same crime. The two defendants varied sharply on several dimensions considered by risk prediction instruments. All subjects were presented with both scenarios and asked to recommend sentences; the experimental variation was that half the subjects were also presented with actuarial risk prediction scores. The effects of providing the scores were statistically significant and large. Subjects who did not receive the scores tended to give higher sentences to the lower- risk defendant, apparently focusing on small differences in the fact pattern that rendered that defendant more morally culpable. This pattern reversed when subjects received the scores, suggesting that the scores encouraged them to emphasize recidivism risk over moral desert. These results are tentative; judges in real cases might act differently. But the experiment adds to the existing empirical evidence that decision-making is affected by quantification and claims of scientific rigor. Part I of this Article introduces the EBS instruments, describes their rise, and reviews the literature. Part II sets forth the disparity concern and makes the case for heightened constitutional scrutiny, and Part III applies that scrutiny to the empirical evidence underlying EBS. Part IV considers the above-described counterarguments. Finally, I offer some conclusions. 
Ultimately, in my view, the equality concerns are so serious that aggravating sentences on the basis of demographics and poverty would be bad policy even if the instruments advanced the state’s interests far more substantially than they do. Likewise, the Supreme Court’s case law on statistical discrimination may simply preclude deeming people dangerous on the basis of gender or poverty even if those generalizations were sufficiently well-supported that doing so would advance important state interests. But the fact that the instruments, and the use


of the problematic variables therein, do not advance those interests strongly (if at all) means that there is no defense of them available. This approach does not satisfy heightened constitutional scrutiny, and courts and policymakers should not embrace it.

I. Actuarial Risk Prediction and the Movement Toward Evidence-Based Sentencing

“Evidence-based sentencing” (EBS) refers to the use of actuarial risk prediction instruments to guide the judge’s sentencing decision. The instruments are based on past regression analyses of the relationships between various offender characteristics and recidivism rates. Criminologists have developed a wide range of such instruments.4 All incorporate criminal history variables, such as number of past convictions, past incarceration sentences, and number of violent or drug convictions.5 Surprisingly, almost none include the crime of conviction in the case at hand. A few include very basic information such as whether it was a drug crime or a violent crime; others include no crime information.6 Most of the instruments include gender, age, and employment status; many also include education, and some include composite socioeconomic variables like “financial status.”7

Although risk prediction instruments used by some parole boards included race until as late as the 1970s, the modern EBS instruments overwhelmingly do not. One exception is a “sentencing support” software program promoted by an Oregon state judge, Michael Marcus,8 but this has not been formally adopted by any state. There appears to be a general consensus that using race would be unconstitutional. In 2000, the Supreme Court granted certiorari in a capital case to consider whether “a defendant’s race or ethnic background may ever be used as an aggravating circumstance”; the issue was not a judicial sentencing instrument, but problematic testimony by a prosecution expert.9 Before oral argument, the State of Texas conceded error and granted a new sentencing hearing, mooting the case.10

Most instruments now in sentencing use are limited to fairly objective factors, such as demographics, employment status, and criminal history.11 But others include much more abstract, conceptual variables, which are meant to be coded by experienced evaluators. For instance, the Indiana Supreme Court in 2010 upheld against a state law challenge, and endorsed enthusiastically, use in sentencing of the Level of Services Inventory-Revised (LSI-R), which is also used by at least eight states elsewhere in the corrections process.12 In addition to objective factors, the instrument also requires “subjective evaluations on … performance and interactions at work, family and marital situation, accommodations stability and the level of crime in the neighborhood, participation in organized recreational activities and use of time, nature and extent

4 See J.C. Oleson, Risk in Sentencing: Constitutionally Suspect Variables and Evidence-Based Sentencing, 64 S.M.U. L. REV. 1329, 1399 (2011) (listing variables in 19 different instruments); Malenchik, 928 N.E.2d at 571-73. 5 See Oleson, supra. 6 Id. 7 Id. 8 See Draft MPC § 6B.09, cmt. (i) (discussing and criticizing this system); Michael H. Marcus, Conversations on Evidence Based Sentencing, 1 CHAP. J. CRIM. JUST. 61 (2009). 9 See Saldano v. State, 70 S.W.3d 873, 875 (Tex. Crim. App. 2002) (describing the case’s history); Monahan, A Jurisprudence of Risk Assessment, 92 VA. L. REV. 391, 392-93 (2006). 10 Monahan, supra, at 393. 11 Oleson, supra. 12 See BERNARD E. HARCOURT, AGAINST PREDICTION: PROFILING, POLICING, AND PUNISHING IN AN ACTUARIAL AGE 78-84 (2007) (describing the LSI-R’s uses).


of social involvement with companions, extent of alcohol or drug problems, emotional/psychological status, and personal attitudes.”13

The instruments are mechanical: each possible value of each variable corresponds to a particular increase or reduction in the risk estimate in every case. The variables’ weights are not determined based on each case’s circumstances—for instance, men will always receive higher risk scores than otherwise-identical women (because, averaged across all cases, men have higher recidivism rates), even if the context is one in which men and women tend to have similar recidivism risks or in which women have higher risks.14 This is a feature of the simple underlying regression models, which generally have no interaction terms. Moreover, in practice the instruments use even simpler point systems, in which the “high risk” answer to a yes-or-no question results in a point or two being added to the defendant’s score, based only quite loosely on the underlying regression.15

Demographic variables and socioeconomic variables receive substantial weight. For instance, in Missouri, presentence reports include a score for each defendant on a scale from -8 to 7, where “4-7 is rated ‘good,’ 2-3 is ‘above average,’ 0-1 is ‘average’, -1 to -2 is ‘below average,’ and -3 to -8 is ‘poor.’”16 Unlike most instruments in use, Missouri’s does not include gender. However, an unemployed high school dropout will score three points worse than an employed high school graduate—potentially making the difference between “good” and “average,” or between “average” and “poor.”17 Likewise, a defendant under age 22 will score three points worse than a defendant over 45.18 By comparison, having previously served time in prison is worth one point; having four or more prior misdemeanor convictions that resulted in jail time adds one point (three or fewer adds none); having previously had parole or probation revoked is worth one point; and a prison escape is worth one point.19 Meanwhile, current crime type and severity receive no weight.
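As a concrete illustration of how such an additive point scale works, the sketch below is patterned on the Missouri description above and in notes 16 through 19. It is not the official Missouri instrument: the exact decomposition of the three-point employment/education gap and the three-point age gap across individual items is assumed, and real presentence reports include additional factors.

```python
# Illustrative sketch only, patterned on the text's description of Missouri's
# presentence point scale (-8 "poor" through 7 "good"); NOT the official instrument.

def missouri_style_score(employed, hs_graduate, age, prior_prison,
                         four_plus_jail_misdemeanors, prior_revocation, prison_escape):
    score = 0
    # An unemployed high-school dropout scores three points worse than an
    # employed graduate (assumed split: 2 points employment, 1 point education).
    score += 1 if employed else -1
    score += 1 if hs_graduate else 0
    # A defendant under 22 scores three points worse than one over 45.
    if age > 45:
        score += 1
    elif age < 22:
        score -= 2
    # Criminal-history items are worth one point each per the text.
    score -= 1 if prior_prison else 0
    score -= 1 if four_plus_jail_misdemeanors else 0
    score -= 1 if prior_revocation else 0
    score -= 1 if prison_escape else 0
    return score

def rating(score):
    if score >= 4: return "good"
    if score >= 2: return "above average"
    if score >= 0: return "average"
    if score >= -2: return "below average"
    return "poor"

# Two defendants with identical criminal history but different demographics:
young_dropout = missouri_style_score(False, False, 20, True, False, False, False)
older_graduate = missouri_style_score(True, True, 50, True, False, False, False)
print(rating(young_dropout), rating(older_graduate))  # "poor" vs. "above average"
```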

13 Malenchik, 928 N.E.2d at 572. 14 For instance, medical studies suggest that women are on average more vulnerable to addiction and relapse than men are, so it may be that for some drug crimes women are more likely to recidivate. See, e.g., Jill B. Becker & Ming Hu, Sex Differences in Drug Abuse, 29 FRONT NEUROENDOCRINOL 36 (2008). Recidivism studies do not break down gender effects like this, however. 15 The point additions are at best crude roundings of regression coefficients. Moreover, the instrument does not track the regression’s functional form. The underlying studies typically use logistic regression models, in which the coefficients translate nonlinearly into changes in probability of recidivism. When the instruments translate the coefficients into fixed, additive increases on a point scale, they are “linearizing” the variables’ effects, and the resulting instrument will be only loosely related to the underlying nonlinear model, especially (because of the probability curve’s shape) for very high-risk or very low-risk cases. 16 Michael A. Wolff, Missouri’s Information-Based Discretionary Sentencing System, 4 OHIO ST. J. CRIM. L. 95, 113 (2006). 17 Id. at 112-13. 18 Id. 19 Id. A defendant with every possible criminal history risk factor (four or more misdemeanors resulting in jail, two or more prior felonies, prior imprisonment, prior prison escape, convictions within five years, revocation of probation and parole, and past conviction on the same offense as the current charge) will score eight points higher than one with no criminal history--just two points more than the combined effect of age, employment status, and high school graduation. Id. 20 See HARCOURT, supra, at 1-2, 39-92 (reviewing this history).
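Note 15's point about "linearizing" can be seen numerically: in a logistic model, the same coefficient shifts the predicted probability by different amounts at different baselines, which a fixed point addition cannot capture. The coefficient and baseline values below are invented for illustration.

```python
# Sketch of the point in note 15: a fixed log-odds coefficient translates into
# different probability changes depending on where on the curve a defendant sits.
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

beta_unemployed = 0.7  # hypothetical log-odds coefficient for one risk factor

for baseline_logodds in (-3.0, 0.0, 2.0):  # low-, mid-, and high-risk baselines
    p0 = logistic(baseline_logodds)
    p1 = logistic(baseline_logodds + beta_unemployed)
    print(f"baseline risk {p0:.2f} -> {p1:.2f} (change {p1 - p0:+.2f})")

# The same 0.7 log-odds bump adds roughly +0.04 and +0.06 near the tails of the
# curve but about +0.17 in the middle, while a point system adds the same
# fixed amount everywhere.
```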


Recidivism risk prediction instruments have been developed in various forms by criminologists over nearly a century,20 and their use in parole determinations dates back decades, although it has expanded sharply beginning in the 1980s.21 Their use in sentencing is newer, however, and other than the state-specific instruments, none were initially designed for use in sentencing. For instance, the LSI-R manual specifically states that it “was never designed to assist in establishing the just penalty,” which did not discourage the Indiana Supreme Court from endorsing its use for that purpose.22 The first state to incorporate such an instrument in sentencing was Virginia in 1994, but the trend has taken off nationwide much more recently. Judge Roger Warren, the President Emeritus of the National Center for State Courts, argues that two developments in 2007 catalyzed this acceleration: a formal resolution of the Conference of Chief Justices and the Conference of State Court Administrators23 and a report by the NCSC, the Crime and Justice Institute, and the National Institute of Corrections.24 Another factor may be the recent shift toward discretionary sentencing after Blakely v. Washington and United States v. Booker. Tight sentencing guidelines leave little room for considering the defendant’s individual risk, but in discretionary systems, judges are expected to assess it.25 Whatever the reasons, in recent years an increasing number of states have followed Virginia’s lead.26 In fact, Douglas Berman states that “[i]n some form, nearly every state in the nation has adopted, or at least been seriously considering how to incorporate, evidence-based research and alternatives to imprisonment into their sentencing policies and practices.”27

EBS has many enthusiastic advocates in academia,28 the judiciary and sentencing commissions,29 and think tanks and advocacy organizations.30 The National Center for State Courts has advocated using risk instruments to guide decision-making at all process stages, including training prosecutors and defense counsel to identify high- and low-risk offenders and thereby shaping

21 Id. at 9, 77-80. 22 Malenchik, 928 N.E.2d at 572-73. 23 Conference of Chief Justices and Conference of State Court Administrators, Resolution 12 in Support of Sentencing Practices that Promote Public Safety and Reduce Recidivism, August 1, 2007; see Roger K. Warren, Evidence-Based Sentencing: Are We Up to the Task?, 23 FED. SENT. R. 153, 153 (2010). 24 Nat’l Inst. of Corr. and Crime & Justice Inst., Evidence-Based Practice to Reduce Recidivism (2007). 25 See Steven L. Chanenson, The Next Era of Sentencing Reform, 53 EMORY L.J. 377 (2005). 26 Warren, supra, usefully reviews national and state policies promoting EBS. 27 Douglas A. Berman, Editor’s Observations: Are Costs a Unique (and Uniquely Problematic) Kind of Sentencing Data?, 24 FED. SENT. R. 159 (2012). 28 E.g., Jordan M. Hyatt, Mark H. Bergstrom & Steven Chanenson, Follow the Evidence: Integrate Risk Assessment into Sentencing, 23 FED. SENT. R. 266 (2011); Lynn S. Branham, Follow the Leader: The Advisability and Propriety of Considering Cost and Recidivism Data at Sentencing, 24 FED. SENT. R. 169 (2012); Richard E. Redding, Evidence-Based Sentencing: The Science of Sentencing Policy and Practice, 1 CHAP. J. CRIM. JUST. 1, 1 & n.4 (reviewing articles praising EBS, and stating that failure to employ EBS “constitutes sentencing malpractice and professional incompetence”). 29 E.g., Marcus, supra; Warren, supra; Justice Michael Wolff (Chair, Missouri Sentencing Advisory Commission), Evidence-Based Judicial Discretion: Promoting Public Safety through State Sentencing Reform, 83 N.Y.U. L. Rev. 1389 (2008); Chief Justice William Ray Price, State of the Judiciary Address, Feb. 3, 2010, available at http://www.courts mo.gov/page.jsp?id=36875; Mark H. Bergstrom (Pa. Commission on Sentencing) & Richard P. Kern (Va. Criminal Sentencing Commission), A View from the Field: Practitioner’s Response to “Actuarial Sentencing: An ‘Unsettled’ Proposition,” 25 FED. SENT. R. 185 (2013). 30 E.g., Pamela M. Casey, Roger K. Warren & Jennifer K. Elek, USING OFFENDER RISK AND NEEDS INFORMATION AT SENTENCING 14 (Nat’l Ctr for State Courts 2011); PEW Ctr. on the States, Arming the Courts with Research: 10 Evidence-Based Sentencing Initiatives to Control Crime and Reduce Costs, 8 Pub. Safety Policy Brief 2-3 (2009); NAT’L CTR. FOR STATE COURTS, EVIDENCE-BASED SENTENCING TO IMPROVE PUBLIC SAFETY AND REDUCE RECIDIVISM: A MODEL CURRICULUM FOR JUDGES (2009); Matthew Kleiman, Using Evidence-Based Practices in Sentencing Criminal Offenders, in THE BOOK OF THE STATES (Council of State Gov’ts 2012).


plea-bargaining decisions.31 Other academics have offered more cautious takes, but have ultimately offered qualified endorsements.32

The new Model Penal Code, currently undergoing its first revision since its adoption in 1962, embraces this new movement. This is a serious development, both because it reflects an emerging academic consensus and because of the MPC’s influence. The original MPC was “one of the most successful law reform projects in American history,” producing “revised, modernized penal codes in a substantial majority of the states.”33 Section 6B.09 of the new Code not only endorses use of “actuarial instruments or processes, supported by current and ongoing recidivism research, that will estimate the relative risks that individual offenders pose to public safety,” but also their formal incorporation into presumptive sentencing guidelines.34 It also provides that when particularly low-risk offenders can be identified, otherwise-mandatory minimum sentences should be waived.35 While parts of the revision are still being drafted, the American Law Institute has already approved Section 6B.09.36

The official Commentary to the MPC revision illustrates the core argument for EBS: recidivism risk prediction is inevitably part of sentencing, and should be guided by the best available scientific research:

Responsible actors in every sentencing system—from prosecutors to judges to parole officials—make daily judgments about … the risks of recidivism posed by offenders. These judgments, pervasive as they are, are notoriously imperfect. They often derive from the intuitions and abilities of individual decisionmakers, who typically lack professional training in the sciences of human behavior. … Actuarial—or statistical—predictions of risk, derived from objective criteria, have been found superior to clinical predictions built on the professional training, experience, and judgment of the persons making predictions.37

Most EBS advocates frame it as a strategy for reducing incarceration and its budgetary costs and social harms.38 These advocates argue, or assume, that the prediction instruments will primarily allow judges to identify low-risk offenders whose sentences can be reduced, not high-risk offenders whose sentences must be increased. Some suggest that, absent scientific information on risk, judges probably already err on the side of longer sentences.39 Others suggest that the instruments should categorically only be used in mitigation.40

31 Casey et al., supra, at 23-26. 32 E.g., Margareth Etienne, Legal and Practical Implications of Evidence-Based Sentencing by Judges, 1 CHAPMAN J. CRIM. JUST. 43 (2009). 33 Gerald Lynch, Revising the Model Penal Code: Keeping It Real, 1 OH. ST. J. CRIM. L. 219, 220 (2003) (also observing that the Code’s classroom use makes it “the document through which most American lawyers come to understand criminal law”). 34 Draft MPC § 6B.09 (2). 35 Id. at § 6B.09 (2). 36 See id. at 133. 37 Draft MPC, § 6B.09(2), cmt. (a). See also, e.g., Wolff, supra, at 1406 (emphasizing superiority of actuarial prediction). 38 E.g., Nat’l Ctr. for State Courts, Using Offender Risk and Needs Assessment Information at Sentencing 2-3 (2011); Price, supra (citing EBS as a way to “move from -based sentencing” toward reduced incarceration); Wolff, supra, at 1390; PEW Ctr. on the States, supra, at 1; Michael Marcus, MPC—The Root of the Problem: Just Deserts and Risk Assessment, 61 FLA. L. REV. 751, 751 (2009). 39 E.g., Bonta, supra, at 524. 40 E.g., Etienne, supra.


In this spirit, the draft MPC Commentary asserts that “Section 6B.09 takes an attitude of skepticism and restraint concerning the use of high-risk predictions as a basis of elongated prison terms, while advocating the use of low-risk predictions as grounds for diverting otherwise prison-bound offenders to less onerous penalties.” However, despite this “attitude,” the actual content of Section 6B.09 endorses incorporation of risk assessment procedures into sentencing guidelines, including for the purpose of increasing sentences. The Commentary expresses hope that moving risk instruments from parole (the MPC would abolish parole) to sentencing will effectively constrain their “incapacitative” use, because access to counsel and greater transparency at sentencing would allow the defendant a chance to argue his case.41 But the Commentary never explains how these procedural protections will ameliorate the instruments’ substantive consequences for defendants whose objective characteristics render them “high risk.” Even the best counsel will have trouble contesting the defendant’s age, gender, education level, employment status, and past criminal convictions.42 Moreover, if state legislatures adopt Section 6B.09 but not the MPC’s recommendations concerning abolition of parole, the claim that parole-stage use is worse would be irrelevant.

Although most of the EBS literature is positive, or even celebratory, a few scholars have criticized it. The most thorough critique of risk prediction in criminal justice more broadly has come from Bernard Harcourt in his book AGAINST PREDICTION.43 Some of Harcourt’s arguments center on law enforcement profiling, but others apply to sentencing and parole. In particular, he argues that prediction instruments contravene punishment theory, because punishment turns on who the defendant is (and what he is therefore expected to do in the future), rather than just what he has done.44 Although Harcourt’s book primarily focuses on actuarial risk prediction, his theoretical objection is applicable to clinical prediction too—he seeks to “make criminal justice determinations blind to predictions of future dangerousness.”45 Likewise, advocates of purely retributive punishment have always held that a defendant’s future risk is morally irrelevant to the state’s justification for punishment.46 Indeed, beyond mere irrelevance, there may be direct conflict (raising practical dilemmas for defense counsel): some factors that heighten a defendant’s predicted recidivism risk, from young age to mental illness to socioeconomic disadvantage, are frequently considered mitigating factors from a retributive perspective.47 Other commentary on EBS has raised similar theoretical objections.48 John Monahan, while advocating actuarial prediction in other contexts (such as civil commitment), has argued

41 Id. 42 Because the MPC draft advocates mandatory sentencing guidelines, it points out that the Sixth Amendment would require aggravating factors (but not mitigating factors) to be found by juries. Id. cmt. (e). This constraint, if anything, seems likely to discourage states from including difficult-to-prove dynamic factors like “antisocial attitudes” in the instruments. For factors like gender, age, and employment, the jury trial requirement seems essentially irrelevant. 43 HARCOURT, supra note 12. 44 Id. at 31-34, 188-89. Another of Harcourt’s arguments is discussed below in Part III.C. 45 Id. at 5; see id. at 237-38 (arguing that clinical judgment is just as vulnerable to his critique); Yoav Sapir, Against Prevention? A Response to Harcourt’s Against Prediction on Actuarial and Clinical Predictions and the Faults of Incapacitation, 33 LAW & SOC. INQUIRY 253, 258-61 (2008) (arguing that the problem with the instruments is really a broader problem with incapacitation as a punishment objective, including via clinical judgment). 46 E.g., Paul Robinson, Punishing Dangerousness: Cloaking Preventive Detention as Criminal Justice, 114 HARVARD L. REV. 1429 (2001). 47 E.g., Graham v. Florida, 560 U.S. 48 (2010) (explaining mitigating role of young age). 48 See Oleson, supra, at 1388-92 (reviewing literature).

against the current instruments’ use in sentencing.49 His view is that, while recidivism risk may be a legitimate sentencing consideration, blameworthiness is nonetheless the central question, and thus the only risk factors that should be considered are those that also bear on the defendant’s moral culpability: past and present criminal conduct.50 Some critics protest the probabilistic nature of risk prediction, which guarantees “false positives”: defendants deemed high-risk who do not, in fact, recidivate.51 Others draw an unfavorable analogy to the science fiction movie “Minority Report,” in which the government punishes “pre-crime,” suggesting that even if the future could be known with certainty, punishing people for future acts is fundamentally unfair.52 Many commentators raise such criticisms but treat them merely as cautionary notes rather than as dispositive.53 For others, like Harcourt, they are more fundamental flaws.

I do not seek to answer foundational sentencing-philosophy questions here. I accept EBS advocates’ premise that recidivism prevention will inevitably play at least some role in the sentencing process in many cases (although I argue below that adoption of actuarial instruments will probably increase this role). The Supreme Court has affirmed the relevance of recidivism risk to sentencing, for example permitting judges to hear expert testimony concerning the defendant’s dangerousness.54 Instead, this Article’s central question is about discrimination and disparity: whether risk prediction instruments that classify defendants by demographic, socioeconomic, and family characteristics can be constitutionally or normatively justified. One could, after all, predict risk in other ways—for instance, based only on past or present criminal behavior, or based on individual assessment of a defendant’s conduct, mental states, and attitudes.

Current literature’s treatment of the disparity concern is surprisingly limited; the MPC Commentary, for instance, barely mentions it. Among scholars who do raise the issue, most treat it as a policy concern, rather than (also) a constitutional one. For example, Harcourt, addressing the instruments’ use in early release decisions, has argued that “risk is a proxy for race,” observing that the instruments give heavy weight to criminal history, which is highly correlated with race.55 He argues that this strategy will “unquestionably aggravate the already intolerable racial imbalance in our prison populations.”56 Kelly Hannah-Moffat has similarly critiqued the criminal history variables on grounds of racially disparate impact, and further emphasizes that criminal history may be influenced by past discriminatory decision-making.57
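Harcourt’s and Hannah-Moffat’s “proxy” point is, at bottom, a statistical one, and it can be illustrated with a toy simulation. The sketch below is not drawn from any actual instrument; the groups, the distribution of prior convictions, and the point values are invented solely to show the mechanism. A scoring rule that never references group membership, but that weights an input whose distribution differs across groups, will nonetheless produce different average scores, and different high-risk classification rates, for the two groups.

    import random

    random.seed(0)

    # Hypothetical setup: group B has more recorded prior convictions on average
    # than group A (e.g., because of historical enforcement patterns). The scoring
    # rule below never looks at group membership.
    def synthetic_defendant(group):
        mean_priors = 1.0 if group == "A" else 2.5
        priors = max(0, int(random.gauss(mean_priors, 1.5)))
        return {"group": group, "priors": priors}

    def risk_score(d):
        # Facially neutral rule: points for prior convictions only.
        return 2 * d["priors"]

    population = ([synthetic_defendant("A") for _ in range(10000)]
                  + [synthetic_defendant("B") for _ in range(10000)])

    for g in ("A", "B"):
        members = [d for d in population if d["group"] == g]
        mean_score = sum(risk_score(d) for d in members) / len(members)
        share_high = sum(risk_score(d) >= 6 for d in members) / len(members)
        print(f"group {g}: mean score {mean_score:.2f}, labeled high-risk {share_high:.1%}")

Because the disparity flows entirely through the correlated input, it would be invisible to an audit that asks only whether group membership appears in the formula, which is precisely the force of the “proxy” critique.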

49 In the civil commitment literature, scholars have focused on whether expert testimony predicting dangerousness is admissible evidence, rather than on the constitutionality or desirability of a particular judicial decision-making process. E.g., Alexander Scherr, Daubert and Danger: The ‘Fit’ of Expert Predictions in Civil Commitments, 55 HASTINGS L.J. 1, 5-28 (2003) (reviewing case law and literature). I do not focus on the evidence law issues here. 50 Monahan, supra, at 427-28. 51 The MPC Commentary raises, but ultimately is unswayed by, this objection; see infra note 62 and accompanying text. 52 E.g., Oleson, supra, at 1390; Etienne, supra, at 59; Peter Moskos, Book Review, Against Prediction, 113 AM. J. SOCIOLOGY 1475, 1477 (2008). 53 E.g., Oleson, supra, at 1397-98 (concluding simply that EBS “raises excruciatingly difficult questions” and that “judges and jurists must determine” how to answer them). 54 Barefoot v. Estelle, 463 U.S. 880 (1983); see also Jurek v. Texas, 428 U.S. 262 (1976) (holding that “prediction of future criminal conduct is an essential element in many” criminal justice-related decisions). 55 Bernard Harcourt, Risk as a Proxy for Race, CRIM. & PUBLIC POL’Y (forthcoming), draft available at http://www.law.uchicago.edu/files/file/535-323-bh-race.pdf. 56 Id. 57 Kelly Hannah-Moffat, Actuarial Sentencing: An ‘Unsettled’ Proposition, at 17, available at http://www.albany.edu/scj/documents/Hannah-Moffatt_RiskAssessment.pdf.

The existing constitutional analyses, meanwhile, have focused on gender (and the hypothetical use of race), and have been limited in their doctrinal analysis.58 The most extensive such analysis, by J.C. Oleson, concludes that the instruments survive even strict scrutiny.59 Similarly, Monahan, while opposing use of demographic variables in sentencing on punishment-theory grounds, defends the constitutionality of their use in civil commitment, arguing that only race and gender raise constitutional issues at all, and that gender survives intermediate scrutiny because the gender differences are real and the state interests are substantial.60

In my view, the existing literature has seriously understated both the breadth and the gravity of the constitutional concern. There is a strong case that most or all of the risk prediction instruments now in use are unconstitutional, and current literature has not made that case or even seriously examined it. I seek to fill that gap, comprehensively analyzing the relevant case law and empirical research. I show both that the use of gender cannot be defended on the statistical bases that other authors have offered and that the problem goes well beyond gender—the socioeconomic variables, at least, should also receive heightened constitutional scrutiny. And if such scrutiny is applied, the empirical evidence is not currently strong enough to sustain the instruments, and it likely never will be.

In the criminological literature on the instruments, there is considerable debate over issues of reliability, validity, and precision. Current EBS scholarship often notes these concerns but ultimately advocates the instruments’ use anyway.61 The MPC Commentary is a striking example. It states that “error rates when projecting that a particular person will engage in serious criminality in the future are notoriously high” and that “most projections of future violence are wrong in significant numbers of cases,” and yet concludes:

Although the problem of false positives is an enormous concern—almost paralyzing in its human costs—it cannot rule out, on moral or policy grounds, all use of projections of high risk in the sentencing process. If prediction technology shown to be reasonably accurate is not employed, and crime-preventive terms of confinement are not imposed, the justice system knowingly permits victimizations in the community that could have been avoided.62

In my view, for all their apparent agonizing, the MPC drafters and other EBS advocates are missing the legal import of the methodological concerns: If the instruments don’t work well, their use in sentencing is almost surely unconstitutional, and terribly unwise as well. As I show in Part II, the Supreme Court has warned against disparate treatment based on generalizations about (at least) gender and poverty, even if the generalizations have statistical support. If the statistical support is shoddy, there is simply no defending them. It is curious that the EBS literature has not taken the constitutional concern more seriously. EBS scholars have occasionally asserted that actuarial prediction is obviously

58 E.g., Christopher Slobogin, Risk Assessment and Risk Management in Juvenile Justice, 27-WTR CRIM. JUST. 10, 13-14 (2013); Pari McGarraugh, Note, Up or Out: Why “Sufficiently Reliable” Statistical Risk Assessment is Appropriate at Sentencing and Inappropriate at Parole, 97 Minn. L. Rev. 1079, 1102 (2013). 59 Oleson, supra, at 1388-92; see also Slobogin, supra, at 13-14 (briefly stating that gender discrimination probably survives intermediate scrutiny). 60 Monahan, supra, at 429-432. 61 E.g., Slobogin, supra, at 16-17; McGarraugh, supra, at 1105-07; see also Hannah-Moffat, supra (raising various concerns but reaching an ambivalent conclusion: “Arguably, we should pause to reflect on the complexities of risk- needs assessments and concordant calls for and against evidence-based risk jurisprudence.”). 62 MPC Draft §6B.09, cmt. (e).

constitutional because the Supreme Court has approved, against a due process challenge, admission of even-less-reliable expert clinical predictions of risk in sentencing proceedings.63 This assertion is wrong. The equal protection issue is not presented in those cases, and in general is not presented by individualized clinical assessments of risk per se; it is presented by punishment of group membership, which is explicit in the actuarial instruments. And even assuming actuarial predictions are more accurate than clinical ones, a question to which I return in Part III, the fact that evidence is reliable enough to be admissible does not mean that it establishes a strong enough relationship to an important government interest to withstand heightened scrutiny.64 In the next Part, I show that such scrutiny applies.

II. The Disparate Treatment Concern

The most distinctive feature of EBS is that it formally incorporates discrimination based on socioeconomic status and demographic categories into sentencing. In this Part, I set forth the basic constitutional and policy objections to this practice. I begin with the constitutionality of gender-based sentencing in Section A (setting aside race because the current instruments do not include it).65 Although it is uncontroversial that gender classifications are subject to heightened scrutiny, I examine the gender case law in some detail because it illuminates a core feature of the Supreme Court’s equal protection jurisprudence that will make it very hard for EBS to survive heightened scrutiny: otherwise-unconstitutional discrimination cannot be justified by statistical generalizations about groups, even if the generalizations have empirical support. In Section B, I show that the constitutional concern goes beyond gender: a form of heightened scrutiny (and a similar prohibition on group generalizations) also applies to socioeconomic discrimination in the criminal justice context. And in Section C, I articulate reasons policymakers should take the disparity concern seriously even if courts were to sustain EBS against constitutional challenges. This Part does not complete either the constitutional or the normative analysis; rather, it establishes the seriousness of the disparity concern and the resulting need at least for a very strong empirical justification for EBS. In Part III, I address whether such a justification exists.

Note that I frame my constitutional argument within existing doctrine, and thus do not argue for heightened scrutiny of certain other variables in the model—for instance, age and marital status are routine government classifications that are subject to rational basis review. There is, however, a plausible broader argument for strict scrutiny of group-based sentencing discrimination more generally, grounded in the “fundamental rights” branch of equal protection jurisprudence rather than the “suspect classifications” branch. Incarceration, after all, profoundly interferes with virtually every right the Supreme Court has deemed fundamental, and EBS makes these interferences with rights turn on identity rather than criminal conduct. Although I would be happy to see the Supreme Court adopt such an approach, it is presently foreclosed to lower courts by language the Court used in a case called Chapman v. United States, and I do not focus

63 E.g., Slobogin, supra, at 15; Stephen J. Morse, Mental Disorders and Criminal Law, 101 J. CRIM. L. & CRIMINOLOGY 885, 944 (2011); see Barefoot v. Estelle, 463 U.S. 880 (1983). 64 In Barefoot, the Court made clear that the defects in evidence would have to be extreme before their admission would be barred by the Due Process Clause on the grounds of sheer unreliability. 463 U.S. at 898-99. 65 The instruments do include socioeconomic variables that are highly correlated with race, a point I return to in § C, but they would be hard to challenge constitutionally on that basis. The Supreme Court has consistently held that absent a racially discriminatory purpose, policies that are facially neutral as to race cannot be challenged merely on the grounds of a racially disparate impact. E.g., Washington v. Davis, 426 U.S. 229 (1976).

on it.66 Certain variables used in some models might also merit new recognition as quasi-suspect—particularly variables relating to an offender’s family background or family members’ criminal history, which are closely analogous to illegitimacy, a quasi-suspect classification—but again, I do not rely on this possibility.67 The policy critique in Section C thus applies more broadly, to more variables, than the constitutional arguments in Sections A and B do.

A. Gender Classifications and the Problem with Statistical Discrimination

Virtually every risk prediction instrument in use incorporates gender. Because the coefficient on gender is the same for all defendants, every single male defendant will, due to gender alone, be assigned a higher risk score than an otherwise-identical woman. Gender classifications are subject to heightened constitutional scrutiny, requiring an “exceedingly

66 Chapman v. United States, 500 U.S. 453, 464-65 (1991). In Chapman, the defendant challenged the U.S. Sentencing Guidelines’ method of calculating LSD weight, which included the carrier medium; the claim was that this method created unfair distinctions between people who carried the same amount of actual LSD. The Court rejected the notion that fundamental rights analysis should apply to sentencing distinctions within the statutory sentencing range, reasoning that once convicted, the offender no longer has a fundamental right to any sentence below the statutory maximum. Note that this holding does not preclude a challenge to a sentencing decision based on the nature of the classification; it speaks only to the “fundamental rights” branch. As I show below, both gender and poverty-based discrimination have triggered successful challenges to sentences within the statutory range. Although Chapman’s holding is not entirely surprising (the Court in general is quite reluctant to apply constitutional scrutiny to sentences, see Carissa Byrne Hessick & F. Andrew Hessick, Recognizing Constitutional Rights at Sentencing, 99 Cal. L. Rev. 47, 49 (2011), and presumably worried that doing so in that case would require the extension of strict scrutiny to virtually every sentencing distinction), its reasoning, in my view, fails to take seriously the tremendous stakes of sentencing choices within statutory ranges. Those ranges are often very broad (say, zero to 20 years), and it is hard to imagine any government decision that would have a more drastic impact on a defendant’s exercise of fundamental liberties than the choice between, say, 5 and 20 years’ incarceration. Moreover, the Court’s characterization of the right at issue was unduly narrow; the question is not whether the defendant had a right to a sentence below the statutory maximum. Rather, underlying, clearly established fundamental rights are being taken away (including the defendant’s most basic physical liberty, which is directly and deliberately retracted by the incarceration decision, plus additional rights such as procreation, communication, and voting). Cf. Lawrence v. Texas, 539 U.S. 558, 567 (2003) (critiquing the Court’s past, overly narrow characterization of the right to sexual intimacy as a “right to homosexual sodomy”). The outcome in Chapman is perfectly defensible, but it could have been reached with a different rationale. The drug-weighting rule was a classification of criminal conduct, not persons, and thus (absent evidence of some discriminatory motive) raised no equal protection concern at all; all persons are prospectively subject to the same weighting rules, and have an equal chance to conduct their activities to avoid the rule. Applying fundamental rights analysis to EBS thus would not imply that routine sentencing distinctions between crimes are also subject to strict scrutiny. One could likewise defend sentencing distinctions based on criminal history as also being conduct-based and universally applicable—all persons who commit crimes are subjecting themselves to potential higher sentences for subsequent crimes. But when the state systematically gives different sentences to different groups of people for the same crime, with the same past criminal conduct, the Constitution should demand a compelling justification.
67 Such variables are outside the defendant’s control, unchangeable, generally unrelated to legitimate state policy, and often—especially in the case of familial incarceration or time in foster care—the basis for considerable social stigma and disadvantage. See Miriam J. Aukerman, The Somewhat Suspect Class: Towards a Constitutional Framework for Evaluating Occupational Restrictions for People With Criminal Records, 7 J. L. Society 18, 51 (2005) (reviewing case law and identifying factors that often trigger heightened scrutiny); John Hagan & Ronit Dinovitzer, Collateral Consequences of Imprisonment for Children, Communities, and Prisoners, 26 CRIME & JUST. 121 (1999) (reviewing literature on effects of parental incarceration); United States v. Sprei, 145 F.3d 528, 535 (2d Cir. 1998) (describing stigma and reduced marital prospects as an “inevitable result” of a parent’s incarceration); Daniel Pollack et al., Foster Care as a Mitigating Circumstance in Criminal Proceedings, 22 TEMP. POL. & CIV. RTS. L. REV. 43, 59 (2012) (quoting Sandra Stukes Chipungu & Tricia B. Bent-Goodley, Meeting the Challenges of Contemporary Foster Care, 14 FUTURE CHILD. 75, 85 (2004)).

persuasive justification”—that is, the state must prove “that the classification serves important governmental objectives and that the discriminatory means employed are substantially related to the achievement of those objectives.”68 Given this well-established doctrine, one might have thought that gender’s inclusion in the instruments would have occasioned considerable concern and debate. And yet most scholarship ignores this concern, or else briefly asserts that the state’s interests are important.69 The draft Model Penal Code recommends excluding race, and the Commentary notes that sentencing based on race would be unconstitutional.70 And yet the MPC drafters recommend including gender, and offer no commentary defending this on constitutional grounds, as though its constitutionality is self-evident.71 In the rare cases in which the issue has been presented, modern courts have consistently held (outside the EBS context) that it is unconstitutional to base sentences on gender.72 There is, to be sure, considerable statistical research suggesting that judges (and prosecutors) do on average treat women defendants more leniently than men.73 But it is virtually unheard of for modern judges to say that they are taking gender into account,74 and demonstrating gender bias would usually be challenging. Before the past few decades, explicit consideration of gender as well as race was common, but few today defend that practice.75 The U.S. Sentencing Guidelines, for example, expressly forbid the consideration of both race and sex.76 Outside the literature on EBS, scholars have likewise mostly treated the gender gap as “unwarranted” sentencing disparity.77 Given this widespread consensus against sentencing based on gender, there is a certain surreal quality to the EBS literature’s mostly untroubled embrace of it. The justification offered (if any) is that women in fact pose substantially lower recidivism risk than men do.78 Some scholars add that to fail to account for this fact is unfair to women, essentially punishing them for men’s recidivism risk.79 More generally (referring to “gender, ethnicity, age, and disability”),

68 United States v. Virginia, 518 U.S. 515 (1996). 69 E.g., Slobogin, supra, at 14. McGarraugh, supra at 1102, states that gender should be removed from the instruments to preserve their constitutionality, but does not develop the legal reasoning for this point. 70 Draft MPC, supra, Sec 6B.09 cmt. (i). 71 Id. 72 Carissa Byrne Hessick, Race and Gender as Explicit Sentencing Factors, 14 J. GENDER RACE & JUST. 127, 137 (2010); United States v. Maples, 501 F.2d 985, 989 (4th Cir. 1974); Williams v. Currie, 103 F. Supp. 2d 858, 868 (M.D.N.C. 2000). 73 E.g., Sonja B. Starr, Estimating Gender Disparities in Federal Criminal Cases (under review) (2013) (finding large gender gaps at multiple procedural stages that are unexplained by observable variables, and also reviewing other studies). 74 Hessick, supra, at 128. 75 Id. at 129-36. 76 U.S.S.G. Sec 5H1.10. 77 E.g., Oren Gazal-Ayal, A Global Perspective on Sentencing Reforms, 76 LAW & CONTEMP. PROBS. I, iii-iv (2013); Mona Lynch, Expanding the Empirical Picture of Sentencing: An Invitation, 23 FED. SENT. R. 313 (2011). Some scholars criticize increasing female incarceration rates, but do not generally argue that women should receive lower sentences based on gender per se. Rather, they argue that the system should take more account of certain mitigating factors that are more often present in female defendants’ cases. E.g., Phyllis Goldfarb, Counting the Drug War’s Female Casualties, 6 J. GENDER RACE & JUST. 277, 291-93 (2002); Leslie Acoca & Myrna S. Raeder, Severing Family Ties: The Plight of Nonviolent Female Offenders and their Children, 11 STAN. L. & POL’Y REV. 133 (1999). 78 E.g., Monahan, supra, at 431. 79 See Margareth Etienne, Sentencing Women: Reassessing the Claims of Disparity, 14 J. GENDER, RACE, & JUSTICE 73, 82 (2010).

Judge Michael Marcus states: “We are not treating like offenders alike if we insist on ignoring factors that make them quite unalike in risk.”80 But this argument, which embraces a concept of “actuarial fairness,”81 stands on unsound constitutional footing. The Supreme Court has consistently rejected defenses of gender classifications that are grounded in statistical generalizations about groups—even those with empirical support. In Craig v. Boren, for instance, the Court considered a challenge to a law subjecting men to a higher drinking age for certain alcoholic beverages than women. The state had defended the law with statistical evidence, including a study showing that young men were arrested for drunk driving at more than ten times the rate of young women (2% versus 0.18%). The Court observed that “prior cases have consistently rejected the use of sex as a decisionmaking factor even though the statutes in question certainly rested on far more predictive empirical relationships than this.” That is, what is prohibited is not just “outdated misconceptions” and merely “hypothesized” gender differences.82 What is prohibited is inferring an individual tendency from group statistics. Note that the government’s argument in Craig could easily have been framed in “actuarial fairness” terms: arguably it would have been unfair to bar young women from drinking based on a drunk driving risk that came almost entirely from males. But the Court’s approach to equal protection means that individuals are neither entitled to favorable statistical generalizations based on gender nor subject to unfavorable ones.

Examples of this principle abound. For instance, the Court has repeatedly held that government cannot base benefits policies on the assumption that wives are financially dependent on their husbands—even though, when the cases were decided in the 1970s, that presumption was usually correct.83 The Court explained that “such a gender-based generalization cannot suffice to justify the denigration of the efforts of women who do” support their families.84 Likewise, the Court has struck down gender-based peremptory challenges in jury selection, holding that the state cannot make assumptions about jurors based on gender, “even when some statistical support can be conjured up.”85 And in United States v. Virginia, the Court ordered the Virginia Military Institute to admit women, rejecting its arguments about “typically male or typically female ‘tendencies.’” The Court observed: “The United States does not challenge any expert witness estimation on average capacities or preferences of men and women. … It may be assumed, for purposes of this decision, that most women would not choose VMI's adversative method.” But, the Court emphasized, the point is not what most women would choose. “[W]e have cautioned reviewing courts to take a hard look at generalizations or ‘tendencies’ of the kind pressed by Virginia… [T]he State's great goal [of educating soldiers] is not substantially advanced by women's categorical exclusion, in total disregard of their individual merit, from the State's premier ‘citizen soldier’ corps.”86
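The limits of even a tenfold relative difference as a basis for individual inference can be made concrete with the figures the Craig Court itself cited. The short calculation below uses only the 2% and 0.18% arrest rates reported above; everything else is arithmetic offered purely for illustration.

    # Arrest rates for drunk driving cited in Craig v. Boren:
    # roughly 2% of young men and 0.18% of young women.
    male_rate = 0.02
    female_rate = 0.0018

    print(f"relative difference: {male_rate / female_rate:.1f}x")   # ~11.1x
    print(f"young men never so arrested:   {1 - male_rate:.2%}")    # 98.00%
    print(f"young women never so arrested: {1 - female_rate:.2%}")  # 99.82%

A rule keyed to sex therefore burdens the roughly 98% of young men to whom the generalization does not apply, which is one way of restating the Court’s objection to inferring an individual tendency from group statistics.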

80 Marcus, supra, at 769. 81 This is a concept that has traditionally (although subject to some limitations) dominated insurance law—the idea is that it is fair for insurers to tailor rates to the risks posed by particular groups, and unfair to expect groups to cross-subsidize one another’s risks. See, e.g., Tom Baker, Health Insurance, Risk, and Responsibility After the Patient Protection and Affordable Care Act, 159 U. PA. L. REV. 1577, 1597-1600 (2011). 82 See Monahan, supra, at 432-433 (defending gender-based risk prediction for civil commitment). 83 Frontiero v. Richardson, 411 U.S. 677 (1973); Weinberger v. Wiesenfeld, 420 U.S. 636, 645 (1975). 84 Wiesenfeld, 420 U.S. at 645. 85 J.E.B. v. Alabama ex rel. T.B., 511 U.S. 127, 140 n.11 (1994). 86 In City of Los Angeles Department of Water and Power v. Manhart, 435 U.S. 702 (1978), the Court similarly struck down, on Title VII grounds, a requirement that female employees pay higher pension plan premiums because of their higher actuarial life expectancy. The Court stated:

In short, the Supreme Court has squarely rejected “statistical discrimination”—use of group tendencies as a proxy for individual characteristics—as a permissible justification for otherwise constitutionally forbidden discrimination. Economists often defend statistical discrimination as efficient, arguing that if a decision-maker lacks detailed information about an individual, relying on group-based averages (or even mere stereotypes, if the stereotypes have a grain of truth to them) will produce better decisions in the aggregate. But the Supreme Court has held that this defense of gender and race discrimination offends a core value embodied by the Equal Protection Clause: that people have a right to be treated as individuals.

Individualism, indeed, is at the very heart of the Supreme Court’s equal protection case law.87 Many scholars have criticized this individualist orientation, arguing that it renders the Court’s jurisprudence overly formalistic and too inattentive to substantive inequalities. On this view, the primary purpose of the Equal Protection Clause is to dismantle group-based subordination, not to ensure that government will treat individuals in ways that are blind to group identity; the latter approach may actually undermine the former if it prevents government from recognizing and acting to rectify societally entrenched inequalities.88 I am sympathetic to this view myself, in fact, but I frame this Article within the approach that dominates current doctrine. In any event, an antisubordinationist approach to equal protection law would hardly be friendlier to EBS, a practice that amplifies inequality in the criminal justice system’s impact by inflicting additional criminal punishment on the poor and, via disparate impact, on people of color. In Section C, I explore further EBS’s social and distributive impacts, and explain why (even though men, in general, are not a subordinated class) its inclusion of gender can be expected to exacerbate this social impact on disadvantaged groups.

Thus, although gender discrimination is not wholly constitutionally forbidden, EBS proponents are going to face tough sledding if their defense of it depends on statistical generalizations about men and women. And it does—EBS is all about generalizing based on statistical averages, and its advocates defend it on the basis that the averages are right. At least in the gender context, that probably will not convince courts. The statistical relationship would at the very least have to be so strong that courts could deem the resulting individual predictions noticeably more sound than those the Supreme Court has rejected in the past, and could accordingly hold that an “exceedingly persuasive justification” for the classification was present. But this requirement sets a high bar—in United States v. Virginia, for instance, the Court’s only example of sex differences that the government could (within constraints) consider was the existence of irreducible physical differences between men and women.89 Beyond gender, the Court’s emphasis on individualism and rejection of statistical discrimination should inform our thinking about the constitutionality of other variables as well.
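The mechanical consequence of a fixed gender coefficient, noted at the start of this Section, can also be made explicit. The sketch below is a generic, hypothetical additive instrument; the variables, point values, and cutoff are invented for illustration and are not taken from any particular tool. It simply shows that two defendants identical in every respect other than sex necessarily receive different scores, and can fall on opposite sides of a “high risk” cutoff, because of sex alone.

    # Hypothetical additive risk instrument (weights and cutoff invented purely
    # to illustrate the structure discussed in the text).
    WEIGHTS = {
        "prior_convictions": 2,  # points per prior conviction
        "unemployed": 3,         # points if currently unemployed
        "age_under_25": 2,       # points if under 25
        "male": 3,               # fixed points added for every male defendant
    }
    HIGH_RISK_CUTOFF = 8

    def score(defendant):
        total = WEIGHTS["prior_convictions"] * defendant["prior_convictions"]
        total += WEIGHTS["unemployed"] if defendant["unemployed"] else 0
        total += WEIGHTS["age_under_25"] if defendant["age_under_25"] else 0
        total += WEIGHTS["male"] if defendant["male"] else 0
        return total

    # Two defendants identical on every factor except sex:
    woman = {"prior_convictions": 2, "unemployed": True, "age_under_25": False, "male": False}
    man = dict(woman, male=True)

    for label, d in (("woman", woman), ("man", man)):
        s = score(d)
        print(f"{label}: score {s} -> {'high' if s >= HIGH_RISK_CUTOFF else 'low'} risk")
    # Under these invented weights, the woman scores 7 (low risk) and the
    # otherwise-identical man scores 10 (high risk).

Nothing in this structure lets counsel argue that the particular male defendant before the court lacks whatever risk the gender coefficient is standing in for; the points are assessed categorically, which is exactly the proxy logic the cases above reject.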

This case … involves a generalization that the parties accept as unquestionably true: Women, as a class, do live longer than men…. It is equally true, however, that all individuals in the respective classes do not share the characteristic that differentiates the average class representatives. … [Title VII] precludes treatment of individuals as simply components of a racial, religious, sexual, or national class. … Even a true generalization about the class is an insufficient reason for disqualifying an individual to whom the generalization does not apply. Id. at 707-08; see Metro Broadcasting, Inc. v. F.C.C., 497 U.S. 547, 620 (1990) (O’Connor, J., dissenting) (citing this passage to inform the interpretation of the Equal Protection Clause). 87 See Richard Primus, Equal Protection and Disparate Impact: Round Three, 117 HARV. L. REV. 493, 553 (2002). 88 See id. at 554-59; Jack M. Balkin & Reva B. Siegel, The American Civil Rights Tradition: Anticlassification or Antisubordination?, 58 U. MIAMI L. REV. 9, 9-10 (2004) (reviewing antisubordinationist scholarship). 89 518 U.S. at 533.

To be sure, it is not always forbidden for the government to rely on statistical generalizations; it would be hard to imagine government functioning if it did not, since it would have to tailor every action it takes to every individual. Government sometimes has to draw clear lines that may overgeneralize—for instance, the state sets a maximum blood-alcohol content for driving, rather than requiring each individual’s fitness to drive to be individually assessed. Frederick Schauer has made this point forcefully, offering a fairly broad defense of reliance on statistically supported generalizations.90 But as Schauer emphasizes, this practice properly has limits—certain kinds of generalizations (including those based on gender) are particularly socially harmful, or expressively invidious, even if they have statistical support.91 The practice of applying more demanding equal protection scrutiny to some government classifications than to others is grounded in similar reasoning. Note that the problem with EBS could be framed either as excess generalization (failure to treat people as individuals whose risk varies for reasons particular to them) or as insufficient generalization (failure to treat all those with the same criminal conduct the same way). Schauer, for instance, defended the then-mandatory U.S. Sentencing Guidelines, and particularly their bar on demographic and socioeconomic considerations, along the latter lines: “Ignoring real differences in sentencing -- sentencing socially beneficial heart surgeons to the same period of imprisonment for murder as socially parasitic career criminals -- may well serve the larger purpose of explaining that at a moment of enormous significance … we are all in this together.”92 In my view, the problem with EBS cannot be simply described in terms of generality versus particularity; the problem is not that the instruments generalize, but that they employ particular kinds of generalizations that are insidious, in a context that has huge consequences for individuals and communities.

B. Wealth-Related Classifications in the Criminal Justice System

The constitutional problem with EBS goes beyond gender. In this Section, I show that current doctrine also supports application of heightened scrutiny to variables related to socioeconomic status, such as employment status, education, or income. The Supreme Court’s case law in other contexts has consistently held that similar wealth-related classifications are not constitutionally suspect,93 and perhaps this is why EBS scholars have completely ignored the potential constitutional concerns with these variables. But this case law is not dispositive in the sentencing context. Many criminal defendants have challenged policies and practices that effectively discriminate against the indigent, including discrimination in punishment. These defendants have often succeeded, and the Supreme Court and lower courts have applied to their claims a special form of heightened constitutional scrutiny, citing intertwined equal protection and due process considerations. The treatment of indigent criminal defendants has for more than a half-century been a central focus of the Supreme Court’s criminal procedure jurisprudence. Indeed, the Court has often used very strong language concerning the importance of eradicating wealth-related

90 FREDERICK SCHAUER, PROFILES, PROBABILITIES, AND STEREOTYPES (2003). 91 Id. at 38-41. 92 Id. at 261-62. Although I am uncomfortable with group-based sentencing distinctions, I do not favor mandatory sentencing, because offenses are often defined too broadly to capture real differences in criminal conduct and culpability. Also, mandatory sentencing laws generally do not eliminate individualization of punishment, but shift the power to individualize toward prosecutors (a possibility Schauer acknowledges, id. at 256). 93 E.g., Maher v. Roe, 432 U.S. 464, 471 (1977).

disparities in criminal justice; in Griffin v. Illinois, for instance, it called this objective “the central aim of our entire judicial system.”94 Griffin struck down the requirement that defendants pay court costs before receiving a trial transcript, which they need to prepare an appeal. The Court held that “[i]n criminal trials a State can no more discriminate on account of poverty than on account of religion, race, or color,” and that “[t]here can be no equal justice where the kind of trial a man gets depends on the amount of money he has.”95

Numerous other cases also stand for the principle that both equal protection and due process concerns require that indigent criminal defendants not be subject to special burdens. Principally, these cases have focused on access to the criminal process: “the belief that justice cannot be equal where, simply as a result of his poverty, a defendant is denied the opportunity to participate meaningfully in a judicial proceeding in which his liberty is at stake.”96 Notably, these cases have applied heightened scrutiny even when the wealth-based classification did not deprive the defendant of something to which he otherwise would have had a substantive right—the cases relating to appeal procedures, for instance, reiterated the then-established principle that a State need not provide an appeal as of right at all. Rather, Griffin and its progeny involved a special “equality principle” motivated by “the evil [of] discrimination against the indigent.”97 For this reason, a challenge to EBS need not establish that the defendant has some free-standing constitutional entitlement to a lower sentence than he received.

For our purposes, the most on-point Supreme Court case is Bearden v. Georgia, in which the trial court had revoked the probation of an indigent defendant who had been unable to pay his court-ordered restitution.98 The Court unanimously reversed, holding that incarcerating a defendant merely because he was unable to pay amounted to unconstitutional wealth-based discrimination.99 Importantly, the Court in Bearden squarely rejected the state’s argument that poverty was a recidivism risk factor that justified additional incapacitation:

[T]he State asserts that its interest in rehabilitating the probationer and protecting society requires it to remove him from the temptation of committing other crimes. This is no more than a naked assertion that a probationer's poverty by itself indicates he may commit crimes in the future. …[T]he State cannot justify incarcerating a probationer who has demonstrated sufficient bona fide efforts to repay his debt to society, solely by lumping him together with other poor persons and thereby classifying him as dangerous. This would be little more than punishing a person for his poverty.100

The Court’s resistance to “lumping [the defendant] together with other poor persons” is very similar to its reasoning concerning statistical discrimination in the gender cases. The Court

94 351 U.S. 12, 16 (1956). 95 Id. at 19. Accord Mayer v. City of Chicago, 404 U.S. 189 (1971). 96 Ake v. Oklahoma, 470 U.S. 68, 76 (1985); see also Gideon v. Wainwright, 372 U.S. 335, 344 (1963) (citing the goal of achieving a justice system in which, regardless of finances, “every defendant stands equal before the law”). 97 Lewis v. Casey, 518 U.S. 343, 369-72 & nn. 2-3 (1996) (Thomas, J., concurring) (reviewing case law); see Douglas v. California, 372 U.S. 353, 355-57 (1963); United States v. MacCollom, 426 U.S. 317, 331 (1976) (Brennan, J., dissenting) (referring to the “Griffin equality principle”). 98 461 U.S. 660 (1983). 99 Id. Bearden built on Williams v. Illinois, 399 U.S. 235 (1970), in which the Court had similarly reversed a revocation of probation for failure to pay restitution. In Williams, the resulting incarceration sentence exceeded the statutory maximum for the crime, and the Court stated in dictum that absent that problem, no constitutional concern would have been raised. Id. at 243. In Bearden, however, the incarceration sentence did not exceed the statutory maximum, and the Court nonetheless held it unconstitutional, apparently rejecting the Williams dictum. 100 461 U.S. at 671.

observed that the state had cited “several empirical studies suggesting a correlation between poverty and crime,” but it was not persuaded by this appeal to a statistical generalization.101

Bearden does not establish that financial background is always irrelevant to sentencing. Although the Court decisively rejected the use of poverty to predict crime risk, it took more seriously a different defense of the probation revocation. The Court emphasized one reason it may be permissible to consider ability to pay (and related factors such as employment history) when choosing between incarceration and restitution sentences:

The State, of course, has a fundamental interest in appropriately punishing persons--rich and poor--who violate its criminal laws. A defendant's poverty in no way immunizes him from punishment. Thus…the sentencing court can consider the entire background of the defendant, including his employment history and financial resources.102

That is, the State may consider financial factors as necessary to ensure the poor do not avoid punishment—as they would if sentenced only to pay a fine or restitution that they then cannot pay. But with EBS, poverty is not being considered to enable equal punishment of rich and poor, but to trigger extra, unequal punishment.103

The Court further held that even when probation revocation is necessary to ensure that the poor do not avoid punishment, it is only permitted after an inquiry to determine if there are viable alternatives, such as “a reduced fine or alternate public service…Only if the sentencing court determines that alternatives to imprisonment are not adequate in a particular situation to meet the State's interest in punishment and deterrence may the State imprison a probationer who has made sufficient bona fide efforts to pay.”104 This requirement that less restrictive alternatives be considered is a hallmark of strict scrutiny. However, the Court resisted expressly categorizing its analysis within any particular tier of scrutiny. Indeed, reviewing the case law on indigent criminal defendants, the Court expressed ambivalence as to whether the key constitutional provision was really the Equal Protection Clause at all, as opposed to the Due Process Clause. As the Court explained, these constitutional concerns are intertwined in these cases, and in any event, “[w]hether analyzed in terms of equal protection or due process, the issue cannot be resolved by resort to easy slogans or pigeonhole analysis, but rather requires a careful inquiry into such factors as ‘the nature of the individual interest affected, the extent to which it is affected, the rationality of the connection between legislative means and purpose, [and] the existence of alternative means for effectuating the purpose ....’”105 This language suggests an unconventional, perhaps somewhat flexible balancing test: a stronger legislative purpose and connection to that purpose might be required depending on the individual

101 Id. at 671 n.11. 102 Id. at 669-70. 103 See also Williams v. Illinois, 399 U.S. at 244 (stating that ability to pay can be considered to avoid “inverse discrimination”); United States v. Altamirano, 11 F.3d 52, 53 (5th Cir. 1993) (discussing the circumstances in which courts can consider indigency). A defendant, indeed, is constitutionally entitled to a judicial inquiry into her ability to pay a fine. See, e.g., Powers v. Hamilton County Public Defender Comm’n, 501 F.3d 592, 608 (6th Cir. 2007). 104 461 U.S. at 671-72. Similarly, Justice White wrote that because “poverty does not insulate those who break the law from punishment,” the poor may be imprisoned if they cannot pay fines, but only “if the sentencing court makes a good-faith effort to impose a jail sentence that in terms of the state's sentencing objectives will be roughly equivalent to the fine and restitution that the defendant failed to pay.” That is, the magnitude of the punishment must be the same, even if the means is not. Bearden, 461 U.S. at 675 (White, J., concurring in the judgment). 105 461 U.S. at 666-67; see Evitts v. Lucey, 469 U.S. 387 (1985) (discussing the interrelationship between due process and equal protection concerns in these cases).

interest at stake and the extent to which it is affected. But in requiring a “careful inquiry” into each factor, including the existence of alternatives, it is clear that the Court means to require some form of heightened scrutiny, considerably more assertive than mere rational basis review.

Although Bearden involved revocation of probation, lower courts have treated it as a constraint on initial sentencing decisions. For instance, the Ninth Circuit has cited Bearden to reverse a district court’s decision to treat inability to pay restitution as an aggravating sentencing factor, explaining that “the court improperly injected socioeconomic status into the sentencing calculus” and that “the authority forbidding such an approach is abundant and unambiguous.”106 Conversely, citing the same disparity concern, the Ninth Circuit has also reversed (as “unreasonable” under Gall v. United States) a decision to reduce a defendant’s sentence due to ability to pay restitution, holding: “Rewarding defendants who are able to make restitution in large lump sums…perpetuates class and wealth distinctions that have no place in criminal sentencing.”107 Even before Bearden, several circuits had already held that equal protection entitles an indigent defendant who was unable to make bail to credit against the eventual sentence for time served, to avoid impermissible wealth-based distinctions in sentencing.108

The Supreme Court and lower courts have recognized a divergence between the Supreme Court’s treatment of indigent criminal defendants and its normally deferential review of wealth-based classifications: “legislation which has a disparate impact on the indigent defendant should be subject to a more searching scrutiny than requiring a mere rational relationship.”109 The Supreme Court itself has repeatedly noted this divergence. In United States v. Kerr, a district court reasoned that special scrutiny is justified by a combination of the serious stakes and the nature of the class: “At stake here is not mere economic or social welfare regulations but deprivation of a man's liberty. The courts ‘will squint hard at any legislation that deprives an individual of his liberty—his right to remain free.’ Moreover, the indigent, though not a suspect class, have suffered unfair persecution.”110

Outside the context of inability to pay fines and restitution, there is relatively little case law focusing on use of wealth classifications to determine substantive sentencing outcomes. This dearth should not be taken to suggest judicial approval—the issue likely rarely arises because the practice is rare. The criminal justice system has been rife with procedural obstacles to equal treatment of the indigent, and there are no doubt many subtle or de facto ways in which

106 United States v. Burgum, 633 F.3d 810, 816 (9th Cir. 2011); accord United States v. Parks, 89 F.3d 570, 572 (1996) (“[The defendant] may be receiving an additional eight months on this sentence due to poverty. Such a result is surely anathema to the Constitution.”); see also United States v. Ellis, 907 F.2d 12, 13 (1st Cir. 1990) (stating that “the government cannot keep a person in prison solely because of indigency”); but see State v. Todd, 147 Idaho 321, 323 (2009) (upholding inability to pay as an aggravating factor). 107 United States v. Bragg, 582 F.3d 965, 970 (9th Cir. 2009). 108 See, e.g., King v. Wyrick, 516 F.2d 321, 323 (8th Cir. 1975); Ham v. North Carolina, 471 F.2d 406, 407, 408 (4th Cir. 1973); Johnson v. Prast, 548 F.2d 699, 703 (7th Cir. 1977); but see Vasquez v. Cooper, 862 F.2d 250 (10th Cir. 1988) (finding no constitutional violation because the court considered inability to pay when setting bail). 109 U.S. v. Luster, 889 F.2d 1523 (6th Cir. 1989); see also Maher v. Roe, 432 U.S. 464, 471 n.6 (1977); Kadrmas v. Dickinson Public Schools, 487 U.S. 450, 461 n.* (1988) (rejecting heightened scrutiny in a non-criminal case because “the criminal-sentencing decision at issue in Bearden is not analogous to the user fee … before us”); Dickerson v. Latessa, 872 F.2d 1116, 1119-1120 (1st Cir. 1989) (observing that classifications implicating appeal rights receive heightened scrutiny only if they are wealth-based); United States v. Avendano-Camacho, 786 F.2d 1392, 1394 (9th Cir. 1986) (Kennedy, J.) (same); United States v. Kerr, 686 F. Supp. 1174 (W.D. Pa. 1988). 110 Kerr, 686 F. Supp. at 1178 (quoting Williams v. Illinois, 399 U.S. at 263 (Harlan, J., concurring) and citing Papachristou v. City of Jacksonville, 405 U.S. 156 (1972)).

poverty might influence the sentence. But the practice of actually treating poverty as an aggravating factor in sentencing has not been prevalent (before EBS) and has been considered illegitimate. For instance, the formerly mandatory U.S. Sentencing Guidelines forbid consideration of socioeconomic status.111 It is true that, now that the guidelines are merely advisory, federal courts do occasionally refer to education or employment when discussing the offender’s circumstances (as do state courts—in contrast to gender, which is essentially never cited).112 Such cases might well also be constitutionally problematic, unless such factors are used in service of the “equal punishment” principle discussed above; I do not focus here on the factors that can be considered in individualized judicial assessments of offenders. But at least such cases do not necessarily reflect a generalization that unemployed or uneducated people are categorically more dangerous, in the mechanical way that the EBS instruments do. Instead, the court can assess what each factor means in the context of a particular case—considering, for instance, whether the offender is making an effort to find employment or otherwise pursue rehabilitation, rather than simply blindly adding a given number of points based on current employment status or past educational attainment.

The federal Guidelines do include an enhancement for offenders with a “criminal livelihood,”113 and defendants have occasionally challenged that enhancement as disparately affecting the poor, because the same criminal revenue would constitute a larger share of a low-income person’s livelihood. Soon after the guideline’s adoption, at least one district court held (citing Bearden) that, to avoid this potential constitutional concern, it should be interpreted to focus on the absolute amount of criminal income, rather than the share of total income, and the Sentencing Commission amended the guideline to come closer to this view.114 After the amendment, the Sixth Circuit upheld the new guideline against a similar challenge, holding that although Bearden required heightened scrutiny of sentencing burdens on the poor, the amended guideline appropriately targeted “professional criminals” who have “chosen crime as a livelihood” and that any disproportionate effect on the poor did not reflect disparate treatment, but rather was “an incidental effect of the statute’s objective.”115 This rationale, however, cannot be applied to EBS, in which poverty indicators are themselves treated as recidivism risk factors—exactly the statistical generalization that the Supreme Court squarely condemned in Bearden. As the district court put it in Kerr, even though Bearden recognized “a correlation between poverty and crime,…a person cannot be punished solely for his poverty. As a matter of constitutional belief, the presumption that the indigent will act criminally ‘is too precarious for a rule of law.’”116

It is difficult to see how the socioeconomic variables in EBS can avoid Bearden-like heightened scrutiny. Unemployment and education, the most common such variables, cannot

111 U.S.S.G. 5H1.10; see also Joan Petersilia & Susan Turner, Guideline-Based Justice: Prediction and Racial Minorities, 9 CRIME & JUST. 151, 153-154, 160 (1987) (describing sentencing reformers’ objective of eliminating role of “status” factors like employment). 112 E.g., United States v. Trimble, 2013 WL 1235510 (11th Cir. 2013). 113 U.S.S.G. 4B1.3. 114 United States v. Rivera, 694 F. Supp. 1105 (S.D.N.Y. 1988); see United States v. Luster, 889 F.2d 1523 (6th Cir. 1989) (describing the amendment). The amended guideline’s quantitative inquiry concerns only the amount of criminal income; there is also a qualitative inquiry into whether crime was the defendant’s “primary occupation.” 115 Luster, 889 F.2d at 1530. 116 686 F. Supp. at 1179. Cf. Edwards v. California, 314 U.S. 160, 177 (1941) (striking down a law criminalizing the transport of indigent persons into the state and holding that it could not be “seriously contended that because a person is without employment and without funds he constitutes a ‘moral pestilence’. Poverty and immorality are not synonymous.”).

meaningfully be distinguished from the ability to pay, nor can composite variables like “financial status.” All are proxies for poverty, and the case law in the Bearden-Griffin line makes interchangeable references to “wealth,” “poverty,” “class,” and so forth without fine distinctions. For instance, the Court has always treated “ability to pay” as being equivalent to poverty, even though the two are not identical—ability to pay also depends on what one’s other expenses are, whether one can borrow money from someone, and so forth. Bearden directly addresses, and limits, the circumstances under which courts can consider “employment history and financial resources,” specifically rejecting the consideration of such factors as recidivism predictors.117 Indeed, the argument the Court was rejecting in that passage turned fundamentally on employment status; the empirical studies that Georgia had cited in Bearden to support its recidivism-risk argument were mainly studies of the relationship between unemployment and recidivism, and the state emphasized that the defendant’s recent job loss made him a higher recidivism risk.118 Meanwhile, the point of including education in the recidivism instruments is that it is a proxy for the defendant’s future prospects for employment and legitimate earnings; it would be hard to defend the use of this factor using logic that clearly distinguished it from past, present, or future poverty. Neighborhood characteristics could potentially also be considered socioeconomic variables, since they are also very closely related to poverty, although this example is more disputable because these variables operate at a geographic level and do not draw distinctions among persons within the neighborhood.119

While there are limits to the courts’ efforts to protect indigent defendants, those limits have been found in cases testing what affirmative assistance the state must provide in order to level the criminal justice playing field. EBS, in contrast, is a deliberate effort to unlevel that field. As with gender, its defenders will be fighting an uphill battle to overcome heightened scrutiny, because if, as Bearden holds, one cannot impute individual risk based on the average risk posed by poor defendants, the rationale for EBS disappears.

C. The Social Harm of Demographic and Socioeconomic Sentencing Discrimination

EBS’s use of demographic, socioeconomic, and family-related characteristics is also highly troubling on public policy grounds. As noted above, EBS advocates frequently emphasize its potential to help reduce incarceration rates.120 But what they do not typically emphasize is that the mass incarceration problem in the United States is drastically disparate in its distribution. This unequal distribution is a core driver of its adverse social consequences, because it leaves certain neighborhoods and subpopulations decimated. Black men, for instance, are 52 times as likely to be incarcerated as white women are.121 Young black men are especially at risk: one in nine black men under 35 is currently behind bars,122 and one in three will be at some point in

117 461 U.S. at 671. 118 Brief of the Respondent, Bearden v. Georgia, 1982 U.S. Sup. Ct. Briefs LEXIS 438, at 32-35. 119 Given fairly high levels of residential segregation, see generally U.S. Census Bureau, Racial and Ethnic Residential Segregation in the United States: 1980-2000, available at http://www.census.gov/hhes/www/housing/housing_patterns/pdf/censr-3.pdf, neighborhood might also be a racial proxy, but challengers would likely have trouble proving a racially discriminatory purpose. 120 See supra note 38 and accompanying text. 121 See HEATHER C. WEST, U.S. DEP’T OF JUSTICE, BUREAU OF JUSTICE STATISTICS, PRISON INMATES AT MIDYEAR 2009—STATISTICAL TABLES, 21 tbl.18 (2010). 122 PEW CTR. ON THE STATES, ONE IN 100: BEHIND BARS IN AMERICA 2008, at 3 (2008), available at http://www.pewstates.org/research/reports/one-in-100-85899374411.


their lives.123 And the concentration of incarceration’s effects is even more dramatic when one takes into account socioeconomic and neighborhood-level predictors. High school dropouts, for example, are 47 times as likely to be incarcerated as college graduates are, and young black male dropouts are incarcerated at a rate of approximately 22% at any given time.124 An ample literature documents these disparities and their effects on communities.125 The EBS instruments produce higher risk estimates, other things equal, for the same subgroups that are already disproportionately incarcerated, and so it is reasonable to predict that EBS will exacerbate these disparities. Although we do not know whether EBS will reduce incarceration on balance, the most intuitive expectation is that it will increase incarceration for some people (those deemed high-risk) and reduce it for others (those deemed low-risk). If so, it will further concentrate mass incarceration’s impact demographically. This is likely to include concentrating its racial impact. I have ignored race in my constitutional analysis, because the instruments do not include it. But the socioeconomic, family, and neighborhood variables that they do include are highly correlated with race, as is criminal history, so they are likely to have a racially disparate impact.126 Although the courts have not recognized equal protection claims grounded in disparate impact, policymakers should care about the consequences of their policies, and not just about the facial distinctions that they draw. Ample literature documents mass incarceration’s severe consequences for African-American communities in particular. If EBS exacerbates this problem, it would be particularly hard to defend it as a progressive strategy for responding to the mass incarceration crisis. The demographic concentration problem is one reason to worry about the gender and age variables, in addition to socioeconomic status. In other contexts, discrimination based on young age is often treated as not particularly morally troublesome. Young age is not a significant social disadvantage, nor is it even really a discrete group trait; everyone has it and then loses it. Likewise, many advocates no doubt worry less about gender discrimination that adversely affects men because men, taken as a whole, have dominant political and economic power. But the likely impact of EBS is not centered on “men taken as a whole,” nor on young people generally. Rather, it will principally affect a subgroup of young men—those involved in the criminal justice system, mostly poor men of color—who are highly disadvantaged. The age and gender criteria exacerbate the extent to which incarceration’s impact targets a particular slice of disadvantaged communities, effectively resulting in a substantial part of a generation of men being absent from communities and exacerbating the socially distortive effects of mass incarceration. A broad literature explores the effects of high, demographically concentrated incarceration rates on everything from marriage rates to overall community cohesion.127

123 THOMAS BONCZAR, U.S. DEP’T OF JUSTICE, BUREAU OF JUSTICE STATISTICS, PREVALENCE OF IMPRISONMENT IN THE U.S. POPULATION, 1974-2001 (2003). 124 Center for Labor Market Studies, The Consequences of Dropping Out of School (2009), available at http://hdl.handle.net/2047/d20000596; see also Robert J. Sampson & Charles Loeffler, Punishment’s Place: The Local Concentration of Mass Incarceration, DAEDALUS (Summer 2010) (discussing neighborhood effects). 125 E.g., MICHELLE ALEXANDER, THE NEW JIM CROW: MASS INCARCERATION IN THE AGE OF COLORBLINDNESS (2011); TODD R. CLEAR, IMPRISONING COMMUNITIES: HOW MASS INCARCERATION MAKES DISADVANTAGED NEIGHBORHOODS WORSE (2007); IMPRISONING AMERICA: THE SOCIAL EFFECTS OF MASS INCARCERATION (Mary Patillo et al. eds., 2004). 126 See Harcourt, supra note 55 (arguing that “prior criminal history has become a proxy for race”). 127 Todd R. Clear, supra, at 97; William A. Darity, Jr. & Samuel L. Myers, Jr., Family Structure and the Marginalization of Black Men: Policy Implications, in THE DECLINE IN MARRIAGE OF AFRICAN AMERICANS 263,


Another serious disadvantage is the expressive message sent by state endorsement of sentencing based on group traits. Consider specifically the traits associated with socioeconomic disadvantage. Though many Americans no doubt already suspect that the criminal justice system is biased against the poor, EBS ends any doubt on the matter. It involves the state telling judges explicitly that poor people should get longer sentences because they are poor—and, conversely, that socioeconomic privilege should translate into leniency. That is a message that, I suspect, many state actors would find embarrassing to defend in public. Doing so would require pointing to a justification that hardly improves matters: that the poor are dangerous. Generalizing about groups based on crime risk is a practice with a pernicious social history.128 Dressing up that generalization in scientific language may have succeeded in forestalling public criticism, but mostly because few Americans understand these instruments or even are aware of them. If the instruments were better understood (and as EBS expands, perhaps they will be), they would send a clear message to disadvantaged groups: the system really is rigged. Further, if that message undermines the criminal justice system’s legitimacy in disadvantaged communities, it could undermine EBS’s crime prevention aims.129 Some EBS advocates propose that it should be used only to mitigate sentences, and such proposals have, at first glance, a seductive appeal—reducing incarceration rates is an important objective.130 But there is no persuasive reason to believe access to risk predictions would tend to reduce sentences rather than increase them (or do both in different cases). Some advocates blame a retributivist approach to sentencing for the rise in incarceration, and suggest that EBS would help to make sentencing more moderate by encouraging a practical focus on crime prevention instead.131 This line of argument is curious, however, because much of the political “tough on crime” movement over the past several decades has in fact been accompanied by public safety language, responding to the public’s (oft-exaggerated) perceptions of crime risk.132 One could attempt to force unidirectional use of risk assessments, but it may be difficult. If judges are given the risk assessments before they choose the sentence, even if they are told to only use them for mitigation, it is difficult to expect them to completely ignore high-risk assessments.133 And even if the risk score is not provided until an initial sentence is chosen, judges who know that subsequent mitigation will be available if it turns out that the defendant is

286 (M. Belinda Tucker & Claudia Mitchell-Kernan eds., 1995); Bruce Western et al., Incarceration and the Bonds Between Parents in Fragile Families, in IMPRISONING AMERICA, supra, at 21-45; Elizabeth I. Johnson & Jane Waldfogel, Children of Incarcerated Parents, in IMPRISONING AMERICA, supra, at 98; James P. Lynch & William J. Sabol, Effects of Incarceration on Informal Social Control in Communities, in IMPRISONING AMERICA, supra, at 135-164. 128 For a recent, prominent reflection on the way such generalizations about black men have affected African- American communities, see Barack Obama, Remarks by the President on Trayvon Martin (July 19, 2013), available at http://www.whitehouse.gov/the-press-office/2013/07/19/remarks-president-trayvon-martin. 129 See William Stuntz, Race, Class, and Drugs, 98 COLUM. L. REV. 1795, 1825-30 (1998) (discussing the effects of community perceptions of unfairness on law compliance). 130 E.g., Etienne, supra; J. Richard Couzens, Evidence-Based Practices: Reducing Recidivism to Increase Public Safety: A Cooperative Effort by Courts and Probation 10 (June 27, 2011), available at http://www.courts.ca.gov/documents/EVIDENCE-BASED-PRACTICES-Summary-6-27-11.pdf; Kleiman, supra, at 301 (explaining that Virginia’s EBS program diverts 25% of nonviolent prison-bound offenders to probation). 131 Marcus, supra, at 751. 132 Rachel Barkow, Federalism and the Politics of Sentencing, 105 COLUM. L. REV. 1276, 1278-81 (2005). 133 Analogously, limiting instructions to juries—instructions to consider evidence for one purpose but not another— are “notoriously ineffective” and “may be counterproductive because they call jurors’ attention to the evidence that is supposed to be ignored.” Prescott & Starr, supra, at 323 (citing studies).


low risk might err on the side of higher preliminary sentences. Likewise, the risk scores could affect the parties’ strategies; in particular, prosecutors might push for longer sentences for higher-risk offenders. Even if the scores are withheld at first from the parties, given that the instruments are quite simple, one would expect the parties to calculate the scores themselves and plan accordingly, and not to wait for the official report. But let us hypothesize that it could be guaranteed that risk scores would only reduce sentences. Would such an approach be justified? I am loath to resist strategies for reducing unnecessary incarceration. But the key question here is not whether low-risk defendants should be diverted from incarceration—it is whether those low-risk diversion candidates should be identified based on constitutionally problematic demographic and socioeconomic characteristics (instead of past or present criminal conduct or other personal, behavioral assessments). I conclude that such an approach raises the same problems as does EBS generally. As a constitutional matter, policies that benefit only the lowest-risk offenders may actually be more objectionable because they are less flexible and narrowly tailored—more like quotas than “plus factors.” Those with sufficiently unfavorable demographic and socioeconomic characteristics will never qualify as “low risk,” no matter how favorable their other characteristics. Consider the Missouri instrument described in Part I. A 20-year-old high school dropout with no job loses six points for those characteristics alone, and can never score higher than 1 on the scale (“average” risk), even if he has no criminal history and no other risk factors and has committed a relatively minor offense. Other instruments that consider gender and a wider variety of socioeconomic and family traits could be even more strongly driven by those factors.134 Special exceptions for the privileged cut against the foundational principle that the justice system should treat everyone equally. Moreover, one likely driver of the growth of incarceration is that the relatively privileged majority of the population has been spared its brunt.135 Those who are primarily incarcerated—poor young men of color—are not politically well represented, and most other citizens have little reason to worry about the growth of incarceration. Progressives should hesitate before endorsing policies that give them another reason not to worry, even if those policies will have the immediate effect of somewhat restraining that growth. Merely raising the potential policy concerns associated with discrimination and disparity does not necessarily end the argument, just as the constitutional inquiry is not ended by establishing that EBS merits heightened constitutional scrutiny. One must consider how strongly EBS advances competing state interests. In the next Part, then, I turn to the question whether the studies support EBS advocates’ optimism.

III. Assessing the Evidence for Evidence-Based Sentencing

Protecting society from crime while avoiding excessive incarceration is no doubt an important interest, even a “compelling” one. But the Constitution and good policy also require

134 The mitigation-only approach also would not deprive defendants of standing to challenge EBS; a defendant who would have received diversion to probation had the risk instrument not considered his gender, for instance, is harmed by that consideration. The Supreme Court has often considered equal protection challenges in which the plaintiff claims she was denied a government benefit (such as university admission) on the basis of some improper consideration. E.g., Fisher, __ S.Ct. at __. 135 James Forman, Jr., Why Care About Mass Incarceration?, 108 MICH. L. REV. 993, 1001 (2010); William J. Stuntz, Unequal Justice, 121 HARV. L. REV. 1969, 1974 (2008).


assessing the strength of the relationship between EBS and that interest. When heightened scrutiny applies, it is the state’s burden to provide convincing evidence establishing that relationship. In this Part, I show that the current empirical evidence does not suffice.

A. Precision, Group Averages, and Individual Predictions

The instruments’ first serious limitation is that they do not provide anything even approaching a precise prediction of an individual’s recidivism probability. At best, what they predict with reasonable precision is the average recidivism rate for all offenders who share with the defendant whichever characteristics are included as variables in the model. If the model is well specified and based on a sample that is representative of the population to which the results are extrapolated, then it might perform this task well. But that does not necessarily make it particularly useful for individual predictions. Individuals vary much more than groups do, and even a relatively precisely estimated model will often not do well at predicting individual outcomes in particular cases.136 Social scientists sometimes refer to the broader ranges attached to individual predictions as “prediction intervals” (or sometimes as “forecast” uncertainty or “confidence intervals for a forecast”) to distinguish them from the “confidence intervals” that are estimated for the group mean or for the effect of a given variable. To illustrate this point, let’s start with an example that involves predicting a continuous outcome rather than a binary future event. To simplify, we will consider only one explanatory variable (sex) and one normally distributed outcome variable (height), which are quite strongly related. The height distributions of the U.S. male and female populations look approximately like Figure 1 below, which is based on average heights of 70 inches for males (standard deviation 3 inches) and 65 inches for females (standard deviation 2.5 inches).

Figure 1. Height Distributions by Gender (overlaid density curves for men and women; x-axis: Height in Inches, 55 to 80; y-axis: density, 0 to .15)
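Figure 1 can be reproduced directly from the parameters stated in the text. The snippet below is a minimal sketch, assuming Python with numpy, scipy, and matplotlib (tools not referenced in the article itself); it simply plots the two normal densities with means of 70 and 65 inches and standard deviations of 3 and 2.5 inches.

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Population parameters stated in the text: men ~ N(70, 3^2), women ~ N(65, 2.5^2), in inches.
heights = np.linspace(55, 80, 500)
plt.plot(heights, norm.pdf(heights, loc=70, scale=3.0), label="Men")
plt.plot(heights, norm.pdf(heights, loc=65, scale=2.5), label="Women")
plt.xlabel("Height (Inches)")
plt.ylabel("Density")
plt.title("Height Distributions by Gender")
plt.legend()
plt.show()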

But suppose one did not know the true population distributions, and one had to estimate them by taking a random sample. If one takes a large enough sample, it is easy to obtain quite precise estimates of the average male height and the average female height (as well as the average additional height associated with being male, which is just the difference between the

136 See David J. Cooke & Christine Michie, Limits of Diagnostic Precision and Predictive Utility in the Individual Case, 34 LAW & HUM. BEHAV. 259, 259 (2010) (“It is a statistical truism that the mean of a distribution tells us about everyone, yet no one.”).


group means). This point is illustrated in Table 1. I created simulated data for a “true population” of men and women that has the height distributions shown in Figure 1. Then I drew from that population random samples with sample sizes 20, 200, and 400, regressed height on gender within each sample, and recorded the predicted mean heights for men and women and the confidence intervals for those means.

Table 1. Precision of Predicted Means versus Individual Forecasts: An Illustration

                            Male Height in Inches                              Female Height in Inches
Sample   Mean            95% Conf. Int.    95% Pred. Int. for    Mean            95% Conf. Int.    95% Pred. Int. for
Size     (& Forecast)    for the Mean      Indiv. Forecast       (& Forecast)    for the Mean      Indiv. Forecast
20       69.8            [68.2, 71.4]      [64.4, 75.1]          64.8            [63.2, 66.4]      [59.4, 70.1]
200      69.8            [69.3, 70.4]      [64.3, 75.4]          64.6            [64.0, 65.1]      [59.0, 70.1]
400      70.0            [69.6, 70.4]      [64.6, 75.4]          64.9            [64.5, 65.3]      [59.5, 70.3]

Samples are drawn from a simulated "true population" with population means and standard deviations of 70.0 (3.0) for men and 65.0 (2.5) for women.
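The simulation behind Table 1 is easy to replicate. The sketch below is a minimal illustration, assuming Python with numpy and statsmodels (the article itself reports using Stata postestimation commands, see note 138); because the seed and random draws here are arbitrary, the resulting numbers will differ slightly from Table 1, but the pattern should be the same: confidence intervals for the group mean shrink as the sample grows, while prediction intervals for a single individual do not. Roughly, a confidence interval for a group mean narrows on the order of t * s / sqrt(n), whereas a prediction interval for one new person behaves like t * s * sqrt(1 + 1/n), which is why only the former narrows appreciably with sample size.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)  # arbitrary seed; results will not exactly match Table 1

def draw_sample(n):
    # Simulated "true population": half male with height ~ N(70, 3^2), half female ~ N(65, 2.5^2)
    male = (rng.random(n) < 0.5).astype(float)
    height = np.where(male == 1.0, rng.normal(70, 3.0, n), rng.normal(65, 2.5, n))
    return male, height

for n in (20, 200, 400):
    male, height = draw_sample(n)
    X = sm.add_constant(male)              # regress height on a male indicator
    fit = sm.OLS(height, X).fit()
    for label, indicator in (("male", 1.0), ("female", 0.0)):
        pred = fit.get_prediction(np.array([[1.0, indicator]]))
        mean = pred.predicted_mean[0]
        ci = pred.conf_int()[0]            # 95% confidence interval for the group mean
        pi = pred.conf_int(obs=True)[0]    # 95% prediction interval for one new individual
        print(f"n={n:3d} {label:6s} mean={mean:5.1f} "
              f"CI=[{ci[0]:5.1f}, {ci[1]:5.1f}] PI=[{pi[0]:5.1f}, {pi[1]:5.1f}]")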

Notice that even the smallest sample quite closely approximates the true population means of 70 and 65 inches, while the largest sample comes even closer. Exactly how close each sample comes involves chance (different random samples of the same sizes would have different means), but in general chance plays a smaller role the larger the sample is; as the sample grows the estimates should converge on the true population values. This expectation is captured in the estimation of confidence intervals for the mean, which get narrower as the sample gets larger. Confidence intervals are a way of accounting for chance in sampling. For the 400-person sample, one can express 95% confidence in quite a precise estimate: for males, between 69.6 inches and 70.4 inches, and for females, between 64.5 inches and 65.3 inches.137 If you keep drawing more and more 400-person samples, they don’t tend to differ very much; with that sample size, you will generally do quite a good job approximating the underlying population, which is why the confidence interval is narrow. Meanwhile, the 20-person sample gives you wider 95% confidence intervals, each spanning more than three inches—a much rougher estimate. But what if you wanted to use your 400-person sample not to estimate the averages for the population, but to predict the height of just the next random woman you meet? Your single best guess—the one that is statistically expected to err by the lowest margin—would be the group mean from your sample, which is 64.9. But you wouldn’t be nearly as confident in that prediction as you would in the prediction for the group mean. In fact, within that sample, only 13.5% of women have heights that are between 64.5 inches and 65.3 inches, which was your 95% confidence interval for the group mean. If you wanted to give an individual forecast for that next woman that you could be 95% confident in, it would have to be much less precise—you could predict that she would be somewhere between 59.5 inches and 70.3 inches, the 95% prediction interval for the individual forecast that is shown in Table 1. That’s a range of nearly

137 To describe something as a 95% confidence interval for an estimated group mean is to express confidence that 95% of the time, when one draws a random sample and uses the same estimation procedure, the interval one estimates will contain the true group mean for the underlying population.


eleven inches—in other words, you don’t know much at all about how tall to expect the next woman to be.138 One could make the example much more complicated, with multiple variables and more irregular distributions of outcomes, but the prediction interval for an individual forecast is always wider than the confidence interval for the mean—generally much wider.139 Note that while the confidence intervals for the means get much narrower as the sample gets larger, the prediction interval does not. The underlying uncertainty that it reflects is not mainly the possibility of having gotten an unusual sample; it’s the variability in the underlying population that we saw in Figure 1. One could narrow the prediction interval by adding variables to the regression that help to explain this underlying variability—for example, the heights of the individual’s parents. The same basic intuition also applies to models of binary outcomes, like whether a defendant will recidivate—the expected outcome for an individual is much less certain than the expected rate for a group. Some of the recidivism risk prediction instruments include confidence intervals for the probabilities they predict. Indeed, some scholars have urged that confidence intervals should always be provided (rather than mere point estimates) so that judges can get an idea of how precise the instruments are.140 But given that judges are using the instruments for the purpose of predicting a specific individual’s probability of recidivism, providing them a confidence interval for the group recidivism rate might even be more misleading than not providing any at all. For instance, if judges are told “The estimated probability that Defendant X will recidivate is 30%, and the 95% confidence interval for that prediction is 25% to 35%,” that may well sound to the judge like a reasonably precise individual prediction, but it is not. It is merely a reasonably precise estimate of an average recidivism rate.141 If the underlying study has a large sample size, such a prediction could be very precise even if the model’s variables do not capture much of the variation in individual probabilities at all. With binary outcomes, though, while the confidence interval for the mean may be misleading, the “prediction interval” is not a very useful alternative way of expressing the precision of an individual forecast, because it does not tell you anything that was not already

138 Note that the estimated uncertainties in Table 1 are based on a regression of height on gender using standard Stata postestimation prediction commands. By construction, the uncertainties are the same for men and women. Another way to estimate a 95% prediction interval for the height of the next woman you meet would be to just ignore the men and give the range within which the middle 95% of the women in your sample fall. Because female height has a slightly narrower distribution, your interval would then be a bit narrower (about 10 inches), but this method would produce a wider interval for the next male’s height (about 12 inches). These ranges are marked on Figure 1. 139 See Cooke & Michie, supra, at 271 (illustrating this point using simulated data on violence risk among psychiatric patients, and showing how measurement error for subjective criteria amplifies the uncertainty of individual predictions). 140 E.g., McGarraugh, supra, at 1095-96. 141 This problem has some similarities to the broader problem of assessing scientific evidence of causation in legal contexts, in which “the law is interested not simply in whether a particular variable causes a particular effect [in general], but, ultimately, in whether a particular variable did cause the effect [in the specific case].” David L. Faigman, A Preliminary Exploration of the Problem of Reasoning from General Scientific Data to Individualized Legal Decision-Making, 75 BROOK. L. REV. 1115, 1119 (2010). But this issue is not identical, and my objection here is not that the models cannot establish “individual-level causation,” McGarraugh, supra, at 1101. The models are predictive, and make no causal claims, so their advocates cannot be accused of confusing correlation with causation. And they aim to predict future probabilistic events, not to prove what caused a particular past event. When one’s goal is merely to predict, correlations can be useful, even if the causal pathway is uncertain. For instance, how one voted in the 2012 presidential election is no doubt a very strong predictor of how one will vote in 2016—information campaign strategists can use even if the former does not cause the latter.


made clear by the point estimate itself. Unless the predicted probability is extremely low or extremely high, a 95% confidence interval for an individual prediction will by nature always run from 0 to 1.142 Recidivism is rarely nearly certain or nearly impossible. So even a good recidivism prediction model could produce prediction intervals of [0,1] for essentially every defendant: that is, the only prediction that can be made with 95% confidence about any given individual is that she will either recidivate or not. This fact does not reflect poorly on the design of the prediction instruments or the quality of the underlying research. It reflects the inherent uncertainty of this predictive task and the binary nature of the outcome. In order to assess how well a model predicts recidivism risk for individuals, some other metric is necessary.143 There is no single, agreed-upon method for assessing the individual predictive accuracy of a binary model, but there are several possibilities. One common metric used in the recidivism prediction literature is called the “area under the curve” (AUC) approach.144 This method pairs each person who ended up recidivating with a random person who did not; the score is the fraction of these pairs in which the recidivist had been given the higher predicted risk score. A perfect, omniscient model would rank all eventual recidivists higher than all eventual non-recidivists, and the AUC score would be a 1, while coin flips would on average produce a score of 0.5. The best published scores for recidivism prediction instruments appear to be around 0.75, and these are rich models that include various dynamic risk factors, including detailed psychological assessments, rather than the simple point systems based on objective factors.145 Many studies have reported AUC scores closer to 0.65.146 By comparison, a prominent meta-analysis of studies of psychologists’ clinical (non-actuarial) predictions of violence found a mean AUC score of 0.73, which the author characterized as a “modest, better than chance level of accuracy.”147 As another point of comparison, if one turns height into a binary variable called “tall” (which denotes being above the median height of the sample), our basic, one-variable model does much better at predicting who will be tall than any

142 See R. Karl Hanson & Philip D. Howard, Individual Confidence Intervals Do Not Inform Decision-Makers about the Accuracy of Risk Assessment Evaluations, 34 L. & HUM. BEHAV. 275, 276 (2010) 143 See Hanson & Howard, supra, at 276. Stephen D. Hart et al., Precision of Actuarial Risk Assessment Instruments, 174 BRIT. J. PSYCHIATRY s60 (2007), offer an alternative way of calculating a prediction interval for an individual. They use a traditional method for estimating the confidence interval for a probability prediction given a point estimate for the probability and a sample size, and calculate it for each risk-level category in two common violence prediction instruments, using a sample size of 1. See E.B. Wilson, Probable Inference, The Law of Succession, and Statistical Inference, 22 J. AM. STAT. ASSOC. 209 (1927). The intervals Hart et al. calculate do not always run from 0 to 1, but they are always very wide, ranging between 79 and 89 percentage points in width. The authors conclude that it is “impossible to make accurate predictions about individuals using these tests.” Hart et al. interpret their intervals as follows: “Given an individual with an ARAI score in this particular category, we can state with 95% certainty that the probability he will recidivate lies between the upper and lower limit.'” This is a slightly odd interpretation, given that, as the authors state, Wilson’s confidence intervals are normally interpreted as expressing an interval within which one is confident that the actual observed rate for the new sample (not the ex ante probability) will fall. The actual observed binary outcome for one individual always must be 0 or 1, however, so I agree with Hanson and Howard, supra, that the prediction interval for all but the extreme cases should be 0, 1 (rather than, say, .10 to .94). But either way, it is wide. 144 See Douglas Mossman, Assessing Predictions of Violence: Being Accurate About Accuracy, 62 J. CONSULTING & CLINICAL PSYCH. 783 (1994) (describing the method as well as competing approaches). 145 See Mairead Dolan & Michael Doyle, Violence Risk Prediction, 177 BRIT. J. PSYCH. 303, 305-07 (2000); AOUSC, supra, at 9. 146 Dolan & Doyle, supra, at 305-07. 147 Mossman, supra, at 788.


actuarial model does at predicting who will recidivate—it has an AUC score of 0.825.148 This is despite the fact that, as we saw, that model gives only rather wide bounds for individual predictions of height—gender is actually quite a strong predictor of height (most men are taller than most women), but it still leaves considerable individual variation unexplained.149 Another simple measure of prediction accuracy is the linear correlation between the predicted probabilities and the actual outcomes for offenders; this measure will be 0 if the instrument explains nothing more than chance and 1 if it predicts perfectly.150 In 1994, a prominent meta-analysis of studies comparing several actuarial recidivism prediction instruments found that the LSI-R (the instrument that the Indiana Supreme Court upheld) had the highest reported correlation with outcomes, at 0.35.151 By comparison, the gender-only model of the binary “tall” variable has a correlation coefficient of 0.65 (in the same sample used above). All in all, these metrics suggest that the prediction models do have individual predictive value, but they do not make a resounding case for them. Again, this should not be seen as an indictment of the quality of the science—it is just that even given all the best insights of decades of criminological and psychological research, recidivism remains an extremely difficult outcome to predict at an individual level, much more difficult than height. The models improve considerably on chance, which for some policy purposes (or for the purpose of mental health treatment decisions, which is what many of the models were originally developed for) is no doubt quite valuable. But to justify group-based discrimination in sentencing, both the Constitution and good policy require a much more demanding standard for predictive accuracy. Moreover, note that the accuracy measures discussed here assess the total predictive power of each recidivism model, combining all its variables, and are thus overly generous for the purpose of assessing whether particular variables should be included in the model. The marginal predictive power added by just the constitutionally problematic variables is even less, as discussed in the next Section. The basic difference between individual and group predictions has been pointed out by some scholars in the empirical literature surrounding the risk prediction instruments.152 But it is lost in much of the EBS legal and policy literature, and more importantly, it may be lost on judges and prosecutors, who may have an inflated understanding of the estimates’ precision. Hannah-Moffat explored this issue by interviewing lawyers and probation officers in Canada,

148 This is estimated in the same 400-person sample used above, pairing each “tall” person with one “short” person, scoring the prediction as correct (i.e., 1) if the tall person was male (i.e., predicted to be taller) and the short person was female, incorrect in the reverse case (0), and as 0.5 if the two had the same gender (i.e., predicted to have the same height), following the standard tie-breaking procedure used to calculate AUC scores. Conversely, if one pairs 200 random women with one random man each (eliminating the possibility of “tied” gender), the man is taller 89% of the time—much better than the chance level of 50%. 149 Note that a 95% prediction interval for an individual forecast of the binary variable “tall” would run from 0 to 1 for both men and women—one could not be anywhere close to 95% confident that any given woman would be short, or that any given man would be tall. In the sample, 17.5% of women and 82.5% of men were “tall.” 150 The square of this correlation coefficient is one variant on the “fit” measure “pseudo R-squared.” This and several other variants could be used to assess a model’s ability to explain individual variation, although none should be interpreted as a measure of the overall quality of the model. For a concise summary, see Institute for Digital Research & Education, FAQ: What are pseudo R-squareds?, http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm. 151 Paul Gendreau et al., A Meta-Analysis of the Predictors of Adult Recidivism: What Works!, 34 CRIMINOLOGY 575, tbl. 4 (1994). 152 See, e.g., Hart et al., supra; Cooke & Michie, supra.


where risk instruments are common. She found that even if caveats about the difference between group and individual predictions are provided, the message often does not get through:

[F]ew understand and appropriately interpret probability scores. Despite receiving training on these tools and their interpretation, practitioners tended to struggle with the meaning of the risk score….Rather than understanding that an individual who obtains a high risk score shares characteristics of an aggregate group of high-risk offenders, the individual is likely to become known as a high-risk offender. Instead of being understood as correlations, risk scores are misconstrued in court submissions, pre-sentence reports, and the range of institutional file narratives that ascribe the characteristics of a risk category to the individual.153

Advocates of actuarial methods, in this and other contexts, have often sharply criticized the claim that it is not safe to draw conclusions about individuals based on group averages. Mark Cunningham and Thomas Reidy argue that the “distinction between individualized as opposed to group methods is a false dichotomy,” contending, essentially, that truly individualized methods do not exist; the discipline of psychology, and its sub-discipline of violence prediction, draws its fundamental scientific character from its willingness to draw insights from data collected on groups and apply them to individuals.154 Likewise, EBS advocate Richard Redding quotes Paul Meehl, an early pioneer in actuarial prediction in psychology: “If a clinician says ‘This case is different’ or ‘It’s not like the ones in your [actuarial] table,’…the obvious question is ‘Why should we care whether you think this one is different or whether you are surer?’”155 Jennifer Skeem and John Monahan, quoting Grove and Meehl, argue:

Our view is that group data can be, and in many cases empirically are, highly informative when making decisions about individual cases….[C]onsider the revolver analogy of Grove and Meehl: …Two revolvers are placed on the table, and you are informed that one of them has five live rounds with one empty chamber, the other has five empty chambers and one live cartridge, and you are required to play Russian roulette….Would you seriously think ‘Well, it doesn’t make any difference what the odds are. Inasmuch as I’m only going to do this once, there is no aggregate involved, so I might as well pick either one of these two revolvers; it doesn’t matter which?’156

These responses strike me as off base. I do not argue, nor could anybody, that group averages have nothing to do with individual behavior. Of course group averages will on average predict outcomes for the individuals in the group—that much is a tautology—and thus provide some information that could guide individual decision-making. But that does not always mean that the group average tells us much about what to expect for any given individual. One does not have to be naïve to think that an individual case may be different from the average if it’s a situation in which individual outcomes in fact vary widely. The question is how much individual variation there is in a given population, and how much of that variation the variables in the

153 Hannah-Moffat, supra, at 12-13. 154 Mark D. Cunningham & Thomas J. Reidy, Violence Risk Assessment at Federal Capital Sentencing, 29 CRIM. JUST. & BEHAV. 512, 517 (2002); accord Jessica M. Tanner, “Continuing Threat” to Whom?: Risk Assessment in Virginia Capital Sentencing Hearings, 17 CAP. DEF. J. 381, 402-05 (2005). 155 Redding, supra, at 12 n.52 (quoting Paul E. Meehl, CLINICAL VERSUS STATISTICAL PREDICTION (1954)). 156 Jennifer L. Skeem & John Monahan, Current Directions in Violence Risk Assessment, U. Va. School of Law Public Law and Legal Theory Research Paper No. 2011-13, 9-10.


model explain. In the recidivism context (unlike, for instance, the Russian roulette context), the variables included in the instruments leave most of the variation unexplained.157 One could defend the instruments on the ground that the precision of individual predictions does not matter from an efficiency perspective. If the group average estimates are good, then the model will, averaged across cases, improve judges’ predictions of recidivism, leading to more efficient use overall of the state’s incarceration resources to prevent crime. There are two main problems with this response. First, it almost certainly does not suffice for constitutional purposes, at least with respect to any variable triggering heightened scrutiny. The argument amounts to the claim that it doesn’t matter whether an instrument has any meaningful predictive power for individuals, so long as the group generalizations have some truth to them. But this is exactly the kind of statistical discrimination defense that the Supreme Court has repeatedly rejected. This point is one reason the Russian roulette analogy is inapt. I would, of course, choose the gun with just one bullet. And if the same dictator forced me to choose between driving on a highway on which 2% of the drivers were drunk and one in which 0.18% of the drivers were drunk, I would choose 0.18% every time. But just that disparity did not suffice, in Craig v. Boren, to justify a gender-discriminatory alcohol law. When demographic and socioeconomic characteristics are used to justify the state’s serious adverse treatment of individuals, the Constitution requires more than a statistical generalization. Nobody would worry that choosing the gun with one bullet is unfair or harmful to the gun with five. But it is not harmless to base an individual’s incarceration on a statistical inference that, based on his poverty or gender, treats him as the human equivalent of a loaded gun. Second, the “efficient discrimination” argument is not even necessarily correct in terms of efficiency. It is not true that any model with any improved predictive power over chance will provide efficiency gains, because EBS isn’t replacing chance. If the actuarial instruments don’t capture much of the individual variation in recidivism probability, then there is certainly a possibility that the thing EBS is meant to displace—judges’ “clinical” prediction of risk—might actually be more efficient because it captures more of that variation. This point is explored further in the next Section.

B. Do the Instruments Outperform Clinical Prediction and Other Alternatives?

The Bearden test requires assessment of whether other available and nondiscriminatory (or less discriminatory) alternatives could accomplish the state’s penological objectives. Here, I consider two such alternatives: actuarial methods that do not rely on constitutionally troubling variables; and judges’ exercise of their professional judgment (“clinical” prediction). Even if analysis of alternatives were not constitutionally required, if EBS does not improve at least on the clinical method that it seeks to replace, it does not substantially advance the state’s penological interests, and is also undesirable on policy grounds. EBS advocates have concluded that it is superior to available alternatives, but they have had to stretch the existing evidence quite far to support this claim. J.C. Oleson, for instance, argues that even inclusion of race would be constitutionally permissible, and concludes that it is

157 In the Russian roulette hypothetical, the decision-maker is given the only variable that matters. The number of bullets quite strongly predicts the individual’s probability of dying; it would explain most of the individual variation, with the remaining variation being pure chance. The recidivism models are not in the same ballpark.


“straightforward” to show that no less restrictive means is available.158 To support this conclusion, he cites just a single study from 1987, by Joan Petersilia and Susan Turner, for the proposition that “omitting race-correlated factors from a model to predict recidivism reduced the accuracy of the model by five to twelve percentage points.” Even taking this at face value, it hardly seems obvious that a statistical advantage this modest would justify explicit sentencing discrimination based on race; the Supreme Court has rejected gender discrimination based on stronger statistical evidence than that. And given the Supreme Court’s disparate impact jurisprudence, it is odd to justify including race itself based on the predictive power of race-correlated factors from the model. More importantly for present purposes, the Petersilia and Turner study actually suggests that demographic and socioeconomic factors could be excluded from risk prediction instruments without losing any significant predictive value. The “race-correlated factors” in their study included criminal history and crime characteristics, which accounted for all the additional explanatory value provided by correlates of race (and which no sentencing scheme ignores).159 Once those factors were already included, adding “demographic” and “other” variables—which included employment, education, marital status, substance abuse, and mental health variables—did not significantly improve the model’s predictive power. This is presumably because conduct is generally a better predictor of future conduct than static characteristics are, a point other studies corroborate. For instance, Douglas Mossman’s 1994 meta-analysis of studies concerning violence prediction found that “the average accuracy of predictions based on past behavior is higher” than either mental health professionals’ clinical judgments or actuarial instruments.160 More recent studies of risk prediction instruments have typically not broken down the extent to which adding socioeconomic and demographic variables improves the overall predictive power of the model (a distinct question from the coefficients on those variables). But Petersilia and Turner’s results, at least, suggest that a viable alternative is to base actuarial prediction only on crime characteristics and criminal history. Of course, existing sentencing schemes already incorporate those variables, so perhaps providing judges with risk predictions based on them would be redundant. It would be more sensible to have the sentencing commission or legislature incorporate the instruments’ insights when determining sentencing ranges. But the fact that an instrument like this might not be terribly useful to judges does not mean that the instruments with the additional variables are more useful; the Petersilia and Turner study, at least, suggests that they are not. Even setting aside the possibility of using different actuarial instruments, what about the basic question whether the instruments outperform clinical prediction? It is gospel in the EBS literature that they do. But while scores of studies have found that actuarial prediction methods outperform clinical judgment, this finding is not universal, the average accuracy edge is not drastic, and the vast majority of studies are from wholly different contexts (such as medical diagnosis or business failure prediction). In one widely cited meta-analysis, Grove et al. evaluated all the studies addressing the actuarial versus clinical comparison that were published

158 Oleson, supra, at 1386; see id. at 1387 (also concluding that “[o]nce the constitutional door is open to race, all other sentencing factors can pass through: gender, age, marital status, education, class, and so forth.”). 159 Petersilia & Turner, supra, at 171 (showing, in the table for “All Convicted Defendants,” that 57% of outcomes could be accurately predicted by chance, 60% when racially noncorrelated factors were added, 67% when crime characteristics were added, 70% when criminal history variables were added, and still 70% when demographic and “other” variables were added). 160 Mossman, supra, at 789-90.


between 1945 and 1994 and that met certain quality criteria; just five criminal recidivism studies made the cut, plus 131 other studies.161 Overall, actuarial prediction performed on average about 10% better, but the authors warned: “However, our results qualify overbroad statements in the literature opining that such superiority is completely uniform; it is not. In half of the studies we analyzed, the clinical method is approximately as good as mechanical prediction, and in a few scattered instances, the clinical method was notably more accurate.”162 If the actuarial advantage does not exist in half of studied contexts, then it is obvious that the specifics matter. And the EBS literature often cites research on far more complicated instruments than the simple ones (like Missouri’s, described above) that states actually use. Take, for instance, a study by Grant Harris, Marnie Rice, and Catherine Cormier testing an instrument called the Violence Risk Appraisal Guide, which has been cited by EBS advocates.163 The VRAG consists of twelve variables, the first and most heavily weighted of which is itself a composite of twenty variables: “conning, lying, manipulation, callousness, lack of remorse, proneness to boredom, shallow affect, irresponsibility, impulsivity, poor behavior controls, criminal versatility, juvenile delinquency, sexual promiscuity, and parasitic lifestyle.”164 Assessing these factors requires an elaborate psychological profile, which in the study was carried out by groups of mental health clinicians who “knew the patients well.”165 Nothing like this is typically involved in EBS. Even in the case of sentencing instruments that try to use somewhat nuanced personality characteristics, like the LSI-R, it is not at all obvious that a probation officer filling out a presentence report can carry out a comparable analysis. The VRAG’s success simply says nothing about the potential success of a totally different instrument and assessment process. Moreover, the comparability of the populations is also dubious; the VRAG studies involved Canadian psychiatric patients.166 Indeed, the past success of instruments that rely on elaborate personality profiles may, if anything, suggest a disadvantage of the EBS instruments. The studies show that ideally, after a trained clinician collects all the relevant information and makes the numerous required qualitative assessments, her ultimate predictions will be better informed if she then uses an actuarial model to tell her how much weight to give each factor. This result is unsurprising. But it is a far cry from saying that a different actuarial model that relies on far less overall information (completely ignoring all of the qualitative personality factors) will outperform the judgment of a judge who has had a chance to assess the individual defendant and the complete facts of the case. The relevant comparison, in short, is not just between actuarial versus clinical weighting of variables. It is between actuarial weighting of a few variables versus clinical weighting of a much wider range of variables.167 It is possible that the actuarial instruments would win that comparison, but we cannot conclude that based on existing research.

161 W.M. Grove et al., Clinical vs. Mechanical Prediction: A Meta-analysis, 12 PSYCH. ASSESSMENT 19, 22-24 (2000) (listing studies). 162 Id. at 22-24. 163 Grant T. Harris et al., Prospective Replication of the “Violence Risk Appraisal Guide” in Predicting Violent Recidivism Among Forensic Patients, 26 LAW & HUM. BEHAV. 377 (2002); see Wolff, supra, at n.73. 164 Harris et al., supra, at 378. 165 Id. at 379. 166 Id. at 381. 167 Psychologist Stephen Hart states that similar simplified instruments for predicting sexual violence arguably do not deserve even the label “evidence-based” because “scientific and professional literature would not consider [it] informed, guided, or structured since they only include a relatively small set of risk factors.” Stephen D. Hart, Evidence-Based Assessment of Risk for Sexual Violence, 1 CHAPMAN J. CRIM. JUST. 143, 155, 164 (2009).


A review of each of the five older recidivism studies that Grove et al. included in their meta-analysis likewise does not produce any meaningful support for the modern EBS instruments. Two of the five studies found no discernable advantage for actuarial prediction.168 Glaser (1955), one of two studies that found a substantial advantage, involves an archaic prediction instrument in which the most strongly predictive variable was the offender’s (clinically assessed) “social development pattern”: “Respected Citizen,” “Inadequate,” “Fairly Conventional,” “Ne’er-Do-Well,” “Floater,” “Dissipated,” and “Socially Maladjusted.”169 It also involved very few clinical decisionmakers (four psychiatrists and four sociologists who worked in a parole system in the 1940s), so one possible explanation for the results is that a couple of these people might not have been terribly good at their jobs.170 A study by Wormith and Goldstone (1984) evaluated an instrument with more objective criteria and also found that it predicted recidivism better than did the parole board’s actual (clinical) decisions. But the study relied on a small Canadian sample that the authors warned “should not be construed as being representative of incarcerated offenders either nationally or internationally.”171 The authors also warned that their measures of clinical and actuarial judgment were not really fairly comparable, in that the “clinical prediction” was not actually a risk prediction at all (instead, it was a binary parole decision), whereas the actuarial prediction was.172 Finally, a study by Sacks (1977) includes a brief analysis of the clinical versus actuarial comparison, but the comparison it draws is nonsensical (the clinical measure is a parole decision, but only those granted parole are included in the sample) and the purported actuarial advantage is in any case small and not tested for significance.173 Nor are more recently published studies more compelling. Oleson et al. (2011) purport to compare the accuracy of clinical and actuarial judgment in federal probation officers’ assessment of a probationer’s recidivism risk.174 The study included over a thousand decision-makers (but only one individual’s case) and used a modern instrument recently developed by the Administrative Office of the U.S. Courts, called the Federal Post-Conviction Risk Assessment

168 Terrill L. Holland et al., Comparison and Combination of Statistical and Clinical Predictions of Recidivism Among Adult Offenders, 68 J. APPLIED PSYCH. 203 (1983) (finding that individual decisionmakers better predict violent recidivism, but actuarial prediction better predicts some measures of overall recidivism); James Smith & Richard I. Lanyon, Prediction of Juvenile Probation Violators, 32 J. CONSULTING & CLINICAL PSYCH. 54 (1968) (finding that a juvenile recidivism base expectancy table was slightly more accurate than the predictions of two clinical assessors, but was less accurate than simply predicting that everyone would recidivate would have been). 169 Daniel Glaser, The Efficacy of Alternative Approaches to Parole Prediction, 20 AM. SOC. REV. 283, 285 (1955). 170 Id. Problems like this recur in other actuarial versus clinical studies as well—they state a sample size consisting of the number of subjects, and calculate statistical significance as though all of the observations were independent. This approach is misleading because there are usually a far smaller number of clinical decision-makers involved in the study (standard errors should instead be calculated with clustering on the decision-maker). 171 J. Stephen Wormith & Colin S. Goldstone, The Clinical and Statistical Prediction of Recidivism, 11 CRIM. JUST. & BEHAV. 3 (1984). 172 Id. at 20. A general issue with studies that compare real-world “clinical” parole decisions to recidivism risk prediction instruments is that the predictive value of a prediction is being compared to that of a decision. Wormith et al. explain that it is unsurprising that the parole decision does not predict recidivism as well as an actuarial prediction does, because the parole decision might be affected by factors unrelated to risk prediction, and by the desire to err on the side of caution. Id. 173 Howard R. Sacks, Promises, Performances, and Principles: An Empirical Study of Parole Decisionmaking in Connecticut, 9 CONN. L. REV. 347, 402-403 (1977). 174 J.C. Oleson et al., Training to See Risk: Measuring the Accuracy of Clinical and Actuarial Risk Assessments Among Federal Probation Officers, 75-SEP FED. PROBATION 52 (2011).


(PCRA).175 The researchers asked officers to watch a video about an individual and predict his risk, and then to redo the exercise after being given the individual’s PCRA score and training in the PCRA method. The researchers concluded that the officers were “more accurate” when they had the PCRA.176 But their only evidence for that claim is that officers’ risk scores after being given the PCRA and instructed on its implementation were more consistent with the PCRA. That is, in a study purporting to assess whether the PCRA improved prediction accuracy, the researchers assumed the PCRA was perfectly accurate; there was no other measure of what the “accurate” score was.177 In sum, the shibboleth that “actuarial prediction outperforms clinical prediction” is—like the actuarial risk predictions themselves—a generalization that is not true in every case. Its accuracy depends on the outcome being evaluated, the actuarial prediction instrument, the clinical predictors’ skills, the information on which each is based, and the sample. There is little evidence that the recidivism risk prediction instruments offer any discernable advantage over the status quo, and even if they did, that does not mean particular contested variables need to be included in the model. Alternative models might work as well or better.

C. Do the Risk Prediction Instruments Address the Right Question?

Even if the instruments could identify high-risk offenders, does that mean that using them would substantially advance the state’s interests? EBS’s advocates have typically taken this for granted, but the answer may well be no. The instruments tell us, at best, who is at the highest risk of recidivism. They do not tell us whose risk of recidivism will be the most reduced by incarceration. The two questions are not the same, and only the latter directly pertains to the state’s penological interests. At the outset, let’s precisely identify the state interest that EBS is designed to serve. Its advocates generally refer to crime prevention, to reduction of incarceration, or to both. These can be seen as two sides of the same coin: EBS is meant to help the state balance these interests, which are at least potentially in tension. I agree that this objective is compelling. Crime inflicts great harm on society, and so does excessive incarceration. Striking an appropriate balance between these concerns is an enormous and vital challenge.178 But that does not necessarily mean actuarial prediction of recidivism—even if it were perfect—substantially advances that interest. Suppose a judge is considering whether to sentence a defendant to five years in prison versus three. Assuming that the costs of incarceration are the same across defendants,179 the question is whether the additional two years’ incarceration will reduce enough crime to justify those costs. The EBS prediction instruments do not seek to answer that question. Their predictions are not conditional on the sentence. The

175 This instrument includes qualitative and dynamic factors plus objective factors like age and education. It is in use for planning probation supervision and treatment interventions, not sentencing. Admin. Office of the U.S. Courts, Office of Probation and Pretrial Services, An Overview of the Federal Post Conviction Risk Assessment 1 (Sept. 2011). 176 Oleson et al., supra, at 54-55. 177 The AOUSC’s other validation studies for the PCRA did not compare its effectiveness to clinical prediction, and did not find anything close to perfect accuracy. AOUSC, supra, at 9. 178 One could frame the state interest as being about the efficient use of finite incarceration resources to maximize crime prevention effects. Unless states have reached their prison capacities and cannot expand, though, I assume that the incarceration rate isn’t fixed, so sentencing judges don’t think about incarcerating one defendant as trading off with incarceration of another. Instead, they think about whether that particular sentence is worth its costs. 179 This assumption may not be true. Some defendants have families that are affected, for instance.



samples in the underlying studies include people given all kinds of sentences. They measure recidivism within a particular period, measured from the time of release or (for probationers) from sentencing, but there are no variables relating to the sentence in the regressions. The judge accordingly cannot use the instrument to answer the question “How much crime should I expect this defendant to commit if I incarcerate her for five years?”, or three years, or any other potential length. The judge only knows how “risky” she is in the abstract.180 This point has been ignored by the EBS literature. Bernard Harcourt, however, makes a similar point about the general deterrence consequences of police profiling and criminal history- based sentencing enhancements.181 Some have argued that it is efficient for police to focus on groups that commit crimes at greater rates because it concentrates the deterrent effect of policing on the more dangerous groups. Harcourt responds that the fact that members of a particular group commit more crimes on average does not mean that that group is more readily deterred by policing. In fact, high-risk, socially disadvantaged groups may be less willing to cooperate with police, or less deterred by the marginal increase in detection risk, meaning that policing in their communities may actually deter fewer crimes than policing in other communities. The relevant question, Harcourt argues, is not rate of crime commission; it is “elasticity” to policing.182 Harcourt’s argument focuses on general deterrence effects on community crime rates, but a similar problem arises when one considers the effects of marginal changes in incarceration specifically on the defendant’s own future crime risk—that is, the very thing that the risk prediction instruments are ostensibly there to help judges minimize. If we are going to base incarceration length on group averages with the objective of reducing crime, then surely the relevant group characteristic is how much incarcerating its members reduces crime—its elasticity to incarceration. And that question is not the same as the question of recidivism probability. There is no particular reason to believe that groups that recidivate at higher rates are also more responsive to incarceration. EBS advocates presumably think that point is intuitive: lock up the people who are the riskiest, and you will be preventing more crimes. But that intuition oversimplifies the relationship between incarceration and recidivism. Incarceration’s effect on an individual’s subsequent offending has two components. First, there is an incapacitation effect: while behind bars, he cannot commit crimes that he would have committed outside.183 If the incapacitation effect were the only effect that incarceration has on subsequent crime, then it would be logical to assume that the state’s incarceration resources are best targeted at the highest-risk offenders. But the situation is not that simple, because of the second component: the effect on the defendant’s post-release crimes. I will refer to this as the “specific deterrence” effect, but it is really more complicated—it includes on the one hand specific deterrence (fear of reincarceration) plus any rehabilitative effect of prison programming, and on the other hand potentially criminogenic effects of incarceration (interfering with

180 A related concern is that the length of incarceration may be a confounding variable in the underlying predictive model. If the people who have one set of characteristics tend to get longer sentences than those with other characteristics, then the comparison of their recidivism rates could be apples-to-oranges, because one group’s rate is the average after, say, an average of 3 years of incarceration and the other group’s rate is the average after 5. We thus don’t even know from the models who is the riskiest today, much less who is the riskiest X or Y number of years from now. 181 HARCOURT, supra note 12, at 122-36. 182 Id.; Bernard E. Harcourt, A Reader’s Companion to Against Prediction, 33 LAW & SOCIAL INQUIRY 265, 269 (2008). 183 This incapacitation effect should be discounted for crime in prison, a complication I will bracket for simplicity.


subsequent employability, building criminal networks, and so forth). There is no intuitive reason to assume that the specific deterrence effect is determined by, or even correlated with, the defendant’s recidivism risk level. It is very possible that higher-risk defendants (or some of them, anyway) might be more inelastic to specific deterrence and rehabilitation, and might be more vulnerable to the possible criminogenic effects of incarceration. If so, lengthening high-risk offenders’ sentences might be more likely to increase the risk they pose after they get out, or at least to lower that risk less than locking up some low-risk offenders might. If so, this disadvantage has to be weighed against the incapacitation advantage. Implicitly, the current EBS instruments (by ignoring the elasticity question) embrace the premise that only incapacitation matters, but this is not obvious.

Most incarceration sentences are fairly short: in 2006, the median prison sentence in state courts was 1.7 years (and that is excluding jail sentences, which are shorter).184 Moreover, EBS advocates often emphasize its value in determining whether a person should be incarcerated at all, versus probation; presumably, in cases on the incarceration margin, the incarceration sentence being considered is quite short. So, suppose a judge is considering whether to incarcerate a person for one year, versus zero. In that case the potential incapacitation effect lasts a year—a one-year slice of the defendant’s offending is taken away. But all the other effects of the judge’s choice may last, at least to some degree, the rest of the defendant’s lifetime after that year. There is simply no reason to assume the incapacitation effect is the most important factor, much less the only important factor—and if it is not, then the correspondence between risk prediction and crime-elasticity prediction may well be wholly lost. And this complication arises even if one assumes the relevant state interest only relates to reducing the defendant’s crime risk. If we also consider effects on other individuals’ crime commission, there are many more factors to consider, none of which have any intuitive connection to recidivism risk scores: general deterrence, expressive effects on social norms, future crime risk from the defendant’s family members, substitution effects in criminal markets, and so forth.

While much of the current EBS literature totally ignores the question of responsiveness of recidivism risk to incarceration, some advocates have taken the general position that incarceration increases recidivism risk, citing as evidence simply the fact that persons released from prison recidivate at higher rates than probationers.185 But this reasoning relies on an apples-to-oranges comparison. It is unsurprising that prisoners recidivate more often than probationers, because prisoners are usually more serious offenders with more prior criminal history. Also, the claim that incarceration generally increases recidivism would make the entire premise of EBS dubious: unless one is considering a life sentence, why identify the most dangerous criminals in order to incarcerate them if incarceration will only make them more dangerous? Risk prevention is only a plausible justification for incarceration if the sign on incarceration’s effects goes the other way for at least some offenders—and a truly useful risk prediction instrument would try to identify who those offenders are.
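The arithmetic behind this point can be made concrete. The sketch below is purely illustrative: the offense rates, the assumed post-release shift in offending, and the years at risk are hypothetical numbers chosen for exposition, not estimates drawn from any study discussed in this Part. It simply shows that a small post-release effect, accumulated over many remaining years, can outweigh a one-year incapacitation gain.

```python
# A stylized illustration (all numbers hypothetical) of why the incapacitation
# gain from one extra year of incarceration need not dominate: a small
# criminogenic effect on post-release offending, spread over many years at
# risk, can outweigh it.
def net_crimes_averted(annual_offense_rate, post_release_shift, years_at_risk_after_release):
    incapacitation_gain = annual_offense_rate * 1.0                 # crimes averted during the extra year
    post_release_change = post_release_shift * years_at_risk_after_release
    return incapacitation_gain - post_release_change                # > 0 means the extra year reduces crime on net

# "High-risk" offender: offends often, but is assumed to be made slightly worse
# by the extra year; "low-risk" offender is assumed to be slightly deterred.
print(net_crimes_averted(annual_offense_rate=2.0, post_release_shift=+0.15, years_at_risk_after_release=20))  # -1.0
print(net_crimes_averted(annual_offense_rate=0.5, post_release_shift=-0.05, years_at_risk_after_release=20))  #  1.5
```

On these assumed numbers, the extra year spent incarcerating the "high-risk" offender increases expected crime on net, while the same year spent on the "low-risk" offender reduces it; which pattern actually holds is an empirical question the current instruments do not address.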

184 Bureau of Justice Statistics, Felony Sentences in State Courts, 2006—Statistics tbl. 1.3 (2009), http://www.bjs.gov/content/pub/pdf/fssc06st.pdf. 185 E.g., McGarraugh, supra , at 1107; Roger K. Warren, Evidence-Based Sentencing: The Application of Principles of Evidence-Based Practice to State Sentencing Practice and Policy, 43 U.S.F. L. REV. 585, 594 (2009); Michael A. Wolff, Lock ‘Em Up and Throw Away the Key? Cutting Recidivism by Analyzing Sentencing Outcomes, 20 FED. SENT. R. 320, 320 (2008).



Drawing solid causal inferences in this area is difficult. Some studies have used regression or matching methods to compare recidivism rates after controlling for observed characteristics like crime type and criminal history.186 But while this approach is better than a raw comparison of means, it still does not produce strong causal identification. Causal inference based on regression depends on the assumption that all the important potentially confounding variables have been observed and controlled for. This assumption is often not valid, so one has to be very cautious not to interpret regression results to mean more than they do. A particular concern arises when the treatment variable of interest (here, incarceration) might itself be influenced by a decision-maker’s anticipation of the outcome of interest (here, recidivism). Measuring a statistical association between the two variables provides no way to disentangle which component comes from incarceration causing recidivism, which from anticipated recidivism risk causing incarceration, and which from other confounding variables that affect both sentencing decisions and recidivism outcomes. Regression does not solve the reverse causality problem unless the control variables in the regression account for all the reasons that a judge might think a defendant poses a higher risk. As we have seen already, though, even the best recidivism models do not even come close to accounting for all of the sources of individual variation in risk. They surely do not account for all of the sources of variation in judicial anticipation of risk, either—for instance, judges’ appraisal of the detailed facts of the case or defendants’ courtroom demeanor.

Some recidivism studies have used more rigorous quasi-experimental methods to assess causation, seeking to exploit an exogenous source of variation in incarceration length—that is, a source of variation that is not itself affected by anticipated recidivism risk or by any of the other various factors that affect recidivism risk.187 Several studies take advantage of the random assignment of judges or public defenders. The intuition is that getting randomly assigned to a particularly harsh judge, or to a less capable public defender, will tend to increase a defendant’s sentence in a way unrelated to the defendant’s characteristics—thus, while the sentence is not entirely random, it has an effectively random component. Instrumental variables methods are used to estimate the effect of this exogenous increase in sentences on subsequent recidivism. Other studies take advantage of legal reforms that introduce sentencing variation.188 These studies have fairly consistently found that increased sentence length on average reduces subsequent offending, although the effect seems to be nonlinear—the marginal effect of increasing sentence lengths declines and eventually disappears as sentence lengths get longer.189 Thus, specific deterrence effects on average cut in the same direction as incapacitation effects do.190 Reported incapacitation effects typically appear larger,191 but the results of the two types of studies are hard to compare. Incapacitation studies generally estimate the number of crimes

186 See, e.g., Oregon Dep’t of Corrections, The Effectiveness of Community-Based Sanctions in Reducing Recidivism 18-19 (Sept. 2002). 187 For a useful recent review of this literature, see David A. Abrams, The Imprisoner’s Dilemma: A Cost-Benefit Approach to Incarceration, 98 IOWA L. REV. 905, 929-36 (2013). 188 E.g., Shawn D. Bushway & Emily G. Owens, Framing Punishment: A New Look at Incarceration and Deterrence (Jan. 2010) (unpublished manuscript), www.human.cornell.edu/pam/people/upload/Framing-Jan-2010.pdf; Ilyana Kuziemko, Going Off Parole: How the Elimination of Discretionary Prison Release Affects the Social Cost of Crime 13-22 (Nat’l Bureau of Econ. Research, Working Paper No. 13380, 2007), available at http://www.nber.org/papers/w13380.pdf. 189 See Abrams, supra, at 936. 190 Id. at 936-39 (reviewing incapacitation studies). 191 Id.


avoided during each “person-year” of incarceration,192 measuring incapacitation’s full effect, whereas specific deterrence studies of subsequent recidivism do not estimate the full specific deterrence effect (that is, the change in crime commission over the defendant’s whole remaining lifetime). Instead, such studies mostly have quite short follow-up periods, and generally measure not number of crimes committed but recidivism “survival,” i.e., whether an offender makes it through the study period without being rearrested or reconvicted, and if not, how long he lasts.193 Moreover, incapacitation studies sometimes use reported crime as their measure,194 whereas recidivism studies use the more underinclusive measures of rearrest or reconviction.

Regardless, what the existing research on causal effects has not done is to estimate either specific deterrence or incapacitation elasticities that are conditional on the kinds of characteristics that are included in the EBS risk prediction instruments. Instead, the research has focused on estimating the causal relationship between incarceration and crime at a more general level, perhaps subdivided by broad crime category or by deciles of the sentencing-severity distribution, but not by detailed socioeconomic, demographic, and family characteristics. One Urban Institute study, by Avi Bhati, does estimate incapacitation elasticities that are gender, race, and state-specific, but not specific deterrence elasticities, and not broken down by socioeconomic status. It finds no major differences in the total number of crimes averted by either gender or race.195 Notably, variations by state were far more dramatic, suggesting the need to worry about another problem with the risk prediction instruments: extrapolation from the sample on which they were developed to different offender pools in different jurisdictions. A study by Ilyana Kuziemko on specific deterrence effects finds that incarceration length increases have a “much stronger deterrent effect for older offenders than younger ones, for whom time served actually weakly increases recidivism.” That is, young age—one of the most heavily weighted predictors of increased recidivism risk in the current instruments—actually appears to correspond to a lower effectiveness of incarceration length increases in deterring post-release recidivism. This suggests that the EBS instruments are weighing this factor in the wrong direction.

Perhaps future research will improve matters. To effectively inform the state’s pursuit of its penological objectives, the research underlying future instruments would have to satisfy the following criteria:
(1) the use of valid causal identification methods, e.g., exploiting random assignment of judges;
(2) application of those methods to obtain estimates for incarceration’s effects that are interacted with the variables that the state seeks to include in the instrument;
(3) accounting for nonlinear effects of incarceration length (e.g., the effect of a tenth year of incarceration is probably not the same as the effect of a first);

192 E.g., Rucker Johnson & Steven Raphael, How Much Crime Reduction Does the Marginal Prisoner Buy? 28 (Oct. 2010) (unpublished manuscript), http://ist-socrates.berkeley.edu/~ruckerj/johnson_raphael_crimeincarcJLE.pdf. 193 E.g., Kuziemko, supra, at 22. 194 E.g., Johnson & Raphael, supra, at 24. 195 Avi Bhati, An Information Theoretic Method for Estimating the Number of Crimes Averted by Incapacitation, Urban Institute Research Report 24 tbl. 2 (July 2007) (showing estimated male elasticities that were slightly greater in most states, but not in every state and by very small margins). Expressed as a percentage reduction in crime rate, rather than an absolute number of crimes averted, females were actually more responsive to incarceration in every state studied. Id. at 27 tbl. 4.3.



(4) long enough follow-up periods to allow researchers to meaningfully approximate the change in an individual’s lifetime recidivism risk;196
(5) incorporation of both incapacitation and specific deterrence effects, with comparable outcome measures;
(6) testing of the instrument within the jurisdiction in which it will be used, on a representative sample; and
(7) evidence of substantial additional explanatory power for each constitutionally problematic variable that the state seeks to include.

The current instruments do not do anything like this, and I am not optimistic that this research challenge will be overcome soon. And even if it is, the above-discussed problems concerning the uncertainty of individual predictions would still apply to the prediction of individual elasticities.

Finally, it might also be objected that it would be unfair to treat an individual’s greater expected responsiveness to incarceration as the basis for incarcerating her for longer—offenders might be penalized for not being incorrigible. I am sympathetic to this objection. But once sentencing is based on predicting future actions on the basis of demographic and socioeconomic considerations, “fairness” is no longer a decisive sentencing criterion anyway. I do not really advocate it, but at least an elasticity-prediction sentencing instrument would be connected to the state’s penological interests. The current instruments are not.
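Criterion (1) refers to designs like the judge-assignment studies discussed above. The following sketch runs on synthetic data with made-up magnitudes (the variable names, the 40 hypothetical judges, and the assumed causal effect of -0.10 are all invented for illustration, and this is not code from any cited study). It shows why a naive regression of post-release offending on sentence length is confounded by unobserved risk, while a two-stage estimate that instruments sentence length with randomly assigned judge harshness recovers the assumed effect. It assumes the statsmodels library is available.

```python
# Minimal judge-leniency IV sketch on synthetic data (hypothetical magnitudes).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, n_judges = 5000, 40
judge = rng.integers(0, n_judges, n)
judge_harshness = rng.normal(0, 0.5, n_judges)[judge]   # random assignment creates the instrument

unobserved_risk = rng.normal(0, 1, n)                    # judges partly see this; the researcher does not
sentence_years = 2 + judge_harshness + 0.5 * unobserved_risk + rng.normal(0, 0.5, n)
# Assumed true causal effect of a year of incarceration on post-release offending: -0.10
post_release_crimes = 1.0 - 0.10 * sentence_years + 0.8 * unobserved_risk + rng.normal(0, 0.5, n)

ols = sm.OLS(post_release_crimes, sm.add_constant(sentence_years)).fit()

# Two-stage least squares by hand: instrument sentence length with judge harshness.
stage1 = sm.OLS(sentence_years, sm.add_constant(judge_harshness)).fit()
stage2 = sm.OLS(post_release_crimes, sm.add_constant(stage1.fittedvalues)).fit()

print("OLS estimate (confounded):", round(ols.params[1], 3))     # biased upward by unobserved risk
print("2SLS estimate:            ", round(stage2.params[1], 3))  # close to the assumed -0.10
# A real analysis would use a dedicated IV routine to obtain correct standard errors.
```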

IV. Will Risk Prediction Instruments Really Change Sentencing Practice?

Advocates of EBS sometimes defend it against disparity and retributive justice objections by arguing that it will not really change very much at all. These “defenses” come in two forms. The first is to observe that the risk prediction instruments do not directly determine the sentence; they merely provide information to judges. The second defense is that minimization of the defendant’s future crime risk already plays an important role in sentencing, so perhaps EBS merely replaces judges’ individual judgments of that risk with more accurate actuarial predictions. I address these points in Sections A and B, respectively.

A. Does EBS Merely Provide Information?

One response to disparity concerns is to defend the instruments as innocuous insofar as they only provide information, rather than completely controlling the sentence.197 The judge can take or leave the information, supplement it with her own clinical assessments of risk, and weigh other, non-recidivism-related factors. As a constitutional defense of EBS, this point could be framed in two ways. The strong form of the argument would assert that the state’s adoption of

196 Collecting data on an offender’s entire life is unrealistic, but follow-up periods substantially longer than the typical one or two years are needed. Most people eventually desist from crime, and people who have not recidivated for 7 or 8 years (after release, if they were incarcerated) have quite low subsequent recidivism rates. E.g., Megan C. Kurlychek, Robert Brame, & Shawn D. Bushway, Scarlet Letters and Recidivism: Does an Old Criminal Record Predict Future Offending?, 5 CRIMINOLOGY & PUB. POL’Y 483 (2006). Thus, to study the effect of a first year of incarceration (versus none), eight or ten years of outcome data would probably be fine. The study should simply estimate total crime by each individual over a fixed period of time beginning at sentencing, conditional on (among other things) the share of that time that is spent in prison—that measure would incorporate both incapacitation and specific deterrence effects. 197 E.g., Malenchik v. State, 928 N.E.2d 564, 573 (Ind. 2010); David E. Patton, Guns, Crime Control, and a Systemic Approach to Federal Sentencing, 32 CARDOZO L. REV. 1427, 1456 (2011); Kleiman, supra, at 301.


the risk prediction instrument does not itself amount to disparate “treatment” at all. Rather, it merely provides social scientific information to a government decision-maker, and surely the Constitution does not require sentencing judges to be ill-informed. The problem with this framing, however, is that the point of evidence-based sentencing is for the sentence to be based on the statistical “evidence,” at least in part. The risk score is not calculated for academic purposes. Even if the instrument itself is “only information,” the sentencing process that incorporates it is not. Sentencing law already tells judges to consider recidivism risk,198 and the instrument tells the judge how to calculate that risk. Inescapably, unless judges completely ignore the instruments (rendering them pointless), some defendants will receive longer sentences than they would have but for their group characteristics, such as youth, male gender, or poverty. And that, indeed, is the whole point: if the state did not want unemployed people to be, on average, given longer sentences than otherwise-identical employed people, why put unemployment in the risk prediction instrument? Moreover, arguably even the information provision itself is constitutionally troubling: it represents state endorsement of statistical generalizations like those that, in the gender and poverty contexts, the Supreme Court has condemned.

To be sure, for any individual defendant, each factor included in the risk prediction models is not the only determinant of the sentence—it is merely one determinant of the risk score. If a court were looking for ways to distinguish Bearden, it could seize on this difference. That case involved revocation of probation, and the Court emphasized that because the trial court had initially chosen probation, it was clear that “the State is seeking here to use as the sole justification for imprisonment the poverty of a probationer.”199 This distinction is unpersuasive, however. Anything treated as a sentencing factor will at least sometimes solely trigger a change in the sentence relative to what it would otherwise have been. To give a simple illustration, if a sentence is based on crime severity plus gender, and these factors together produce a 10-year sentence for a male when an otherwise identical woman would have received seven years, male gender is not solely responsible for the sentence; crime severity establishes the baseline of seven years. But male gender is solely responsible for the extra three years. If this point is slightly more obscured in EBS cases than in Bearden itself, it is only because judges won’t routinely state what alternative sentence they would have given if the defendant had had different characteristics. In Bearden the dispositive role of poverty could not be hidden because of the posture of the case: the defendant had been sentenced to probation and restitution until he failed to pay. But surely if a court’s decision-making is unconstitutional in substance, it cannot become constitutional through obscurity of reasoning. In any event, here the use of the discriminatory factor is not obscure, even if its specific consequence for any given defendant is not transparent. A defendant subjected to an unconstitutional decision-making process should be entitled to resentencing.200

Notably, the Supreme Court has often applied heightened constitutional scrutiny to the mere consideration of constitutionally suspect factors. In Fisher v.
University of Texas at Austin, for instance, the Supreme Court applied strict scrutiny to the use of race as one of many factors in university admissions—indeed, as Justice Ginsburg

198 E.g., 18 U.S.C. 3553(a). 199 Bearden, 461 U.S. at 671. 200 See Chapman v. California, 386 U.S. 18 (1967).



characterized it in dissent, as a “factor of a factor of a factor of a factor” that very likely was not the reason that the plaintiff in the case was denied admission.201 The claim that “it’s just information” thus should not enable EBS to avoid heightened equal protection scrutiny.

A weaker, and more persuasive, version of this claim is that it should make it easier for EBS to survive such scrutiny under a “narrow tailoring” requirement. Analogously, in the affirmative action cases, the Court has held that race may be used as a “plus factor” (if there is no race-neutral alternative that will suffice), but it has squarely rejected the use of racial quotas.202 The fact that the risk prediction instruments do not completely displace all other sentencing factors is a point in their favor when assessing narrow tailoring, but it is hardly dispositive, as Fisher suggests. One must also consider the extent to which they advance the state’s interests as well as the availability of alternatives. Moreover, although Fisher made narrow tailoring somewhat challenging to demonstrate even in the affirmative action context, it should be even harder to show in the EBS context. Educational affirmative action involves a state interest that is itself defined in race-conscious terms: student body diversity, of which “racial or ethnic origin” is an “important element,” although not the only one.203 It is more than plausible that considering race as one admissions factor is narrowly tailored to the objective of ensuring racial diversity, and that no totally race-blind alternative will suffice to achieve that objective. In the EBS context, however, the state’s penological interests are not defined in group-conscious terms, and the problematic classifications in the instruments are not so closely linked to those interests.

B. Does EBS Merely Replace One Form of Risk Prediction With Another?

Another response to the disparity concern (and to the retributivist objection raised by other critics) is to say that none of this is new: risk prediction is already part of sentencing.204 If judges are not given statistical risk predictions, many will predict risk on their own, perhaps relying implicitly on many of the same factors that the statistical instruments use, such as gender, age, and poverty; actuarial instruments will merely allow them to do so more accurately.205 One could take this argument further: Conceivably, judges’ current clinical assessments could overweight some of those variables relative to the weights assigned by the actuarial instruments.206 These possibilities have not been empirically tested and cannot be ruled out.

As a constitutional matter, this “substitution” defense is not very persuasive. It is not likely that courts would uphold an across-the-board state policy explicitly endorsing an otherwise impermissible sentencing criterion on the rationale that the same variables might sometimes

201 Fisher v. University of Texas at Austin, 570 U.S.__, ___ (2013) (Ginsburg, J., dissenting). 202 Fisher, 570 U.S. at __. 203 Id. at __. 204 See, e.g., 18 U.S.C. 3553. 205 See, e.g., Oleson, supra, at 1373; Patton, supra, at 1456; Jennifer Skeem, Risk Technology in Sentencing: Testing the Promises and Perils, 30 JUSTICE Q. 297 (2013); Bergstrom & Kern, supra, at 2; Commentary to Draft MPC § 6B.09; Michael H. Marcus, MPC--The Root of the Problem: Just Deserts and Risk Assessment, 61 FLA. L. REV. 751, 757 (2009); Branham, supra, at 169. 206 This is perhaps a particularly realistic possibility with respect to race, because of its absence from the instruments: if judges currently implicitly take race into account in predicting recidivism risk, it is possible that giving them a statistical prediction that is not race-specific could cause them to stop doing so. Thus, even if EBS increases the weight given to socioeconomic variables that are correlated with race, it could reduce the weight given to race itself, offsetting or even reversing its expected effect on racial disparity.


already have been used sub rosa. In general, the difficulty of eradicating subtle unconstitutional discrimination does not justify codifying or formally endorsing it. Moreover, the “substitution” defense depends on a questionable empirical premise. Do the EBS instruments really merely substitute one form of risk prediction for another? Or does providing judges with statistical estimates of recidivism risk increase the salience of recidivism prevention in their decision-making vis-à-vis other punishment objectives? Notably, some EBS advocates affirmatively express the hope that EBS will lead to an expanded emphasis on recidivism prevention.207 If it does, it will almost surely increase the role of the individual demographic and socioeconomic characteristics used in the EBS instruments. Those characteristics are not relevant to retributive motivations for punishment (or may even cut the other direction). There are logical reasons to suspect that EBS might increase the emphasis judges place on risk prediction. Most judges no doubt recognize that predicting recidivism risk is difficult, and that difficulty might well lead many of them to discount this factor. If such a judge is presented with a quantified risk assessment framed as scientifically established, they may well give it more weight.208 In many other legal, policy, and other decision-making contexts, scholars have observed that judges and other decision-makers often defer to scientific models that they do not really understand, and to “expert” viewpoints.209 Moreover, sentencing is high-stakes, complex decision-making that many judges describe as weighing heavily on their emotions,210 rendering the use of a simple, seemingly objective algorithm potentially appealing.211 For elected judges, research has shown that political considerations influence sentences,212 and reliance on risk predictions might provide political cover for release decisions while making it

207 E.g., Hyatt, Bergstrom, & Chanenson, supra, at 266. 208 See Hannah-Moffat, supra, at 7 (“Risk scores impart a moral certainty and legitimacy into the classifications that they produce, ‘allowing people to accept them as normative classifications and therefore as scripts for action.”); Harcourt, supra, at 273 (describing the “pull of prediction”). 209 E.g., Janine Pearson, Construing Crane: Examining How State Courts Have Applied its Lack-of-Control Standard, 160 U. PA. L. REV. 1527, 1550-53 (2012) (discussing jury overreliance on expert testimony of dangerousness in civil commitment hearings); Michael H. Shapiro, Updating Constitutional Doctrine: An Extended Response to the Critique of Compulsory Vaccination, 12 YALE J. HEALTH POL'Y, L. & ETHICS 87, 128-29 (discussing the problem of judicial overreliance on expert claims of causation); Kathryn M. Campbell, Expert Estimates from ‘Social’ Scientists, 16 CAN. CRIM. L. REV. 13 (2011); Robert L. Kane, Creating Practice Guidelines: The Dangers of Over-Reliance on Expert Judgment, 23 J.L. MED. & ETHICS 62, 63 (1995); Robert E. Schween & Steven P. Larson, 32 ROCKY MTN. MINERAL L. INST. PROC. 22 (1986) (describing courts’ and policymakers tendency to overrely on models and perceived expertise in the environmental context); Case, Problems in Judicial Review Arising From the Use of Computer Models and Other Quantitative Methodologies in Environmental Decision Making, 10 B.C. L. REV. 251, 256 (1982) (same). 210 See Oleson, supra, at 1330 & n.2 (citing sources); D. Brock Hornby, Speaking in Sentences, 14 Green Bag 2d 147, 157 (2011); Judge Robert Pratt, The Implications of Padilla v. Kentucky on Practice in United States District Courts, 31 ST. LOUIS UNIV. PUBLIC L. REV. 169, 169 (2011) (“Sentencing is unquestionably the most difficult job of any district court judge.”); Judge Thomas M. Hardiman, Foreword, 49 DUQ. L. REV. 637, 637 (2011) (“Any preconceived notions that a judge may have about sentencing upon taking the bench are quickly dwarfed by the awesome responsibility it entails.”). 211 This point may help to explain the continuing heavy weight federal judges give to the sentencing guidelines that they are not required to follow. 212 Gregory A. Huber & Sanford C. Gordon, Accountability and Coercion: Is Justice Blind When It Runs for Office?, 48 AM. J. POL. SCI. 247, 261 (2004) (finding that judges increase sentences as elections approach).



politically difficult to release offenders rated as high-risk.213 Prosecutors might similarly feel political pressure to push for harsh sentences for offenders rated high-risk, but free to offer better deals to those rated low-risk.214 To be sure, some of the research on clinical versus actuarial prediction has suggested that clinicians may resist reliance on actuarial instruments, but that research comes from medical and mental health diagnosis settings in which the clinician may be much more confident in their professional diagnostic skills than judges are in their ability to foresee a defendant’s future.215 Even if a particular judge does not really trust the instrument, its prediction might influence her thinking through anchoring.216 And presenting the judge with a risk prediction instrument may simply remind her that risk is a central basis on which the state expects her to base punishment. All of this is speculative; no empirical research documents how risk prediction instruments affect judges’ weighting of recidivism risk versus other factors. To provide some suggestive evidence informing the question, I carried out a small experimental study, with 83 law students as subjects. All subjects were given the same fact patterns describing two criminal defendants and told to recommend a sentence for each. The key experimental variation was that for half the subjects, the descriptions also included a paragraph with the defendant’s score on a Recidivism Risk Prediction Instrument (RRPI) and a brief explanation of what the RRPI was. The cases involved the same conviction (grand larceny of $100,000 worth of jewelry) and the same minimal criminal history (one misdemeanor underage-drinking conviction). Both defendants were male, and no race was mentioned. Beyond that, their characteristics varied sharply. Robert was a middle-aged, married, college-educated executive in a jewelry store chain, and was motivated to steal from the chain’s stores by concern about the cost of his daughters’ college education. William was a 21-year-old, single, unemployed, alcoholic high school dropout with incarcerated siblings, recently evicted from his parents’ home, who was visiting a mall looking for retail work when he saw a jewelry display case open and spontaneously grabbed a bunch of items. These fact patterns allowed some possible distinctions between the defendants’ criminal conduct. William’s crime was spontaneous, while Robert’s involved an extended course of conduct, elaborate deceptive behavior (replacement of the jewels with fakes), and arguably more victims (buyers of the fakes). These distinctions allowed subjects primarily motivated by retribution to have a possible basis for distinguishing the two—likely in William’s favor—whereas those inclined to rely on a defendant’s characteristics to assess future dangerousness would likely give William a longer sentence.217 Subjects were given a wide statutory sentencing range (zero to 20 years) and not told what punishment theories to prioritize. All subjects were given all these underlying facts; the difference was whether they were also translated into an RRPI score. Robert’s probability of recidivism was rated “low risk” while William’s was “moderate-to-high risk.” Although the RRPI is fictional, these ratings realistically approximate the difference that one would see using real instruments. For instance, on the Missouri instrument’s -8 to 7 scale, Robert would have a perfect score of 7, while William

213 Hannah-Moffat, supra, at 30 (“The use of risk scores can have considerable cache[t] with ‘elected’ judges and prosecutors who must defend their decisions to an electorate concerned with security.”). 214 Id. 215 E.g., Atul Gawande, THE CHECKLIST MANIFESTO (2009). 216 See Prescott & Starr, supra, at 325-30 (reviewing anchoring research); Cass R. Sunstein et. al., Predictably Incoherent Judgments, 54 STAN. L. REV. 1153, 1170 (2002). 217 Students’ comments after completing the exercise supported this interpretation.


would score -1 (“below average”). Subjects considered the scenarios in a prescribed, randomized order.

The results in Table 2 suggest that the RRPI score sharply affected the relative sentences some subjects gave to Robert and William. Among the 43 students who were not given the RRPI score, 17 gave Robert (the “low-risk” defendant) the higher sentence, 13 gave them the same sentence, and 13 gave William the higher sentence. Among the 40 students who received the RRPI score, only 8 gave Robert the higher sentence, 9 gave them the same sentence, and 23 gave William the higher sentence.

Table 2: An Experiment: Risk Prediction Instruments and Relative Sentence Outcomes

                 (1)              (2)               (3)                        (4)
                 Robert Higher    William Higher    Which Higher?              Sentence
                                                    (William, Same, Robert)
                 (Probit)         (Probit)          (Ordered Probit)           (OLS)
RRPI             -0.603*          0.710*            0.662**                    -0.871
                 (0.305)          (0.284)           (0.257)                    (0.733)
William                                                                        -0.711
                                                                               (0.473)
William*RRPI                                                                    1.67*
                                                                               (0.61)

Cols. 1 & 2 show probit regressions of indicators for giving the “low-risk” or “high-risk” defendant, respectively, a higher sentence. Col. 3 shows an ordered probit regression of a variable valued at 2 if the high-risk defendant’s sentence was higher, 1 if they received the same sentence, and 0 if the low-risk defendant’s sentence was higher. Col. 4 is an OLS regression with sentence in years as the outcome. An indicator for which case the subjects considered first was also included. Standard errors in parentheses. *p<0.05, **p<0.01
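For readers who want to see how estimates of this kind are produced, the following sketch runs the same types of specifications on simulated stand-in data (83 hypothetical subjects, two vignettes each; the effect sizes and variable names are invented for illustration, so the output will not reproduce Table 2). It assumes the pandas and statsmodels libraries.

```python
# Sketch of Table 2-style estimation on simulated data (not the study's actual responses).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_subjects = 83
subjects = pd.DataFrame({
    "subject": np.arange(n_subjects),
    "rrpi": (np.arange(n_subjects) < 40).astype(int),     # 40 hypothetical subjects see the RRPI score
    "william_first": rng.integers(0, 2, n_subjects),       # order indicator
})

# One row per subject-vignette: william = 1 for the "high-risk" defendant.
cases = subjects.loc[subjects.index.repeat(2)].reset_index(drop=True)
cases["william"] = np.tile([0, 1], n_subjects)
cases["sentence_years"] = (5 - 0.9 * cases.rrpi * (1 - cases.william)
                           + 0.8 * cases.rrpi * cases.william
                           + rng.normal(0, 2, len(cases))).clip(0, 20)

# Col. 4-style OLS: sentence on RRPI, defendant, and their interaction, plus the order indicator.
ols = smf.ols("sentence_years ~ rrpi * william + william_first", data=cases).fit()

# Col. 2-style probit at the subject level: did the subject give William the higher sentence?
wide = cases.pivot(index="subject", columns="william", values="sentence_years")
subjects["william_higher"] = (wide[1] > wide[0]).astype(int).values
probit = smf.probit("william_higher ~ rrpi + william_first", data=subjects).fit(disp=False)

print(ols.params[["rrpi", "william", "rrpi:william"]])
print(probit.params["rrpi"])
```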

I assessed the size and statistical significance of this shift toward higher sentences for William in several ways, using different definitions of the outcome variable. First, I used probit regressions to estimate the change in the probabilities that Robert would be given a higher sentence (Col. 1) or that William would (Col. 2.). These two are not just mirror-image inquiries, since there is a third option of giving both the same sentence. I next used an ordinal probit regression to assess change in the relative probability of each of these three possible outcomes (Col. 3). Next, I used the recommended sentence, in years, as the outcome variable, an approach that takes into account the magnitude and not just the direction of the sentencing distinctions (Col. 4). The results are statistically significant, and fairly sizeable, in all specifications. The use of the RRPI instrument is associated with an increase in William’s sentence, relative to Robert’s, of about 1.67 years (that is, 20 months), or about one-third of the overall average sentence (5 years). The average sentence given to William was about 0.8 years higher in the RRPI condition; the average sentence given to Robert was about 0.9 years lower.218 A reasonable interpretation of these results is that receiving the RRPI score caused at least some subjects to emphasize recidivism risk more, relative to other sentencing considerations, than they would have otherwise. Moreover, the instrument’s apparent effect on sentences was not unidirectional—the instrument’s estimated effect on the difference between

218 Subjects who were given William’s case first gave significantly higher sentences to both defendants than those who were given Robert’s case first. But order did not significantly affect the relative sentences given nor the effect of the RRPI.



the two defendants reflected a combination of an increase in the high-risk defendant’s sentence and a reduction in the low-risk defendant’s sentence. These results provide a piece of suggestive evidence that quantified risk assessments might affect the weight placed on different sentencing considerations. However, the study is small, and moreover, although much experimental research on decision-making uses student subjects, one has to be cautious in extrapolating the results of such studies to “real world” settings. A real criminal case is not a four-paragraph vignette, and judges are not law students—their experience and expertise may make them less suggestible. Still, it cannot be assumed that judges are wholly resistant to attempts to influence their sentencing decision-making. After all, judges tend to defer to non-binding sentencing guidelines, and research from other legal settings suggests that courts tend to defer to scientific expertise.219 While it remains an unsettled question, for now there is no empirical evidence pointing the other way, and little reason to believe that EBS will merely substitute one form of risk prediction for another.

CONCLUSION

The inclusion of demographic and socioeconomic variables in risk prediction instruments that are used to shape incarceration sentences is normatively troubling and, at least with respect to gender and socioeconomic variables, very likely unconstitutional. As the EBS movement charges full steam ahead, advocates have minimized the first concern and almost wholly ignored the second. This is a mistake. To be sure, EBS has an understandable appeal to those seeking a politically palatable way to cut back on the United States’ sprawling system of mass incarceration. It is difficult to persuade policymakers to reduce incarceration at the cost of increased crime, and EBS offers a technocratic solution to this normative dilemma: just identify the people who can be released without increasing crime. But this identification is not that easy, and moreover, there is no reason to assume, and no good way to ensure, that EBS will only lead to sentences being reduced. Even if it does, there is something troubling, at best, about using group identity and socioeconomic privilege as a basis for reducing defendants’ sentences.

Note that while I have focused on sentencing, essentially the same arguments apply to use of actuarial instruments in parole-release decisions, which is now routine in thirty states, including almost all of those that have not abolished discretionary parole.220 This practice has been given little attention by legal scholars or the public,221 and has rarely been challenged in court, perhaps because of the absence of counsel in parole proceedings or because parole decision-making is not very transparent. Many prisoners may not even know of the existence of the risk prediction instruments, much less understand how they work or their constitutional infirmities.222 But while risk prediction unquestionably is properly central to the parole decision,223 the use of

219 See supra note 209. 220 HARCOURT, supra, at 78-80. 221 Scholarly criticism has focused on procedural concerns—mainly on the prisoner’s lack of counsel at parole hearings. For this reason, the MPC claims to “‘domesticate[]’ the use of risk assessments by repositioning them in the open forum of the courtroom”—that is, by using them in sentencing instead of in parole (which the MPC seeks to abolish entirely). Draft MPC § 6B.09 cmt. (a). See also McGarraugh, supra (advocating barring the instruments at parole but using them in sentencing). 222 In some states, the basis for the parole decision is confidential by law, so the parole board may refuse the prisoner’s request to see the risk assessment. McGarraugh, supra, at 1079 & n.5. 223 Indeed, risk is arguably the only legitimate parole consideration, because considerations such as retributive justice or general deterrence have already been considered by the sentencing judge. The only reason to leave the


demographic and socioeconomic variables to predict risk raises the same disparate treatment concerns that EBS does.224 Moreover, the parole context may offer additional available alternatives to the constitutionally objectionable variables. For instance, rather than basing parole decisions on a prisoner’s prior education or employment, one could consider his efforts while in prison to improve his future prospects, such as participation in job training or education programs. Such factors would speak to the prisoner’s individual efforts to achieve rehabilitation, rather than to his socioeconomic background. In contrast, it is easier to defend the use of risk prediction instruments in assignment of prisoners, probationers, and parolees to correctional and reentry programming (e.g., job training), and to shape conditions of supervised release (e.g., drug tests).225 In this context, risk assessment is often combined with instruments assessing “criminological needs” and predicting “responsivity” to various such interventions. The empirical merits of such instruments are beyond this paper’s scope, though I note that the responsivity instruments at least address the right question: what can be gained by treating an offender in a certain way? In any event, such uses of actuarial instruments raise less serious constitutional and policy concerns. To be sure, supervision conditions may be burdensome, especially if they affect the likelihood that probation or parole will be revoked, and programming decisions can affect access to valuable services. Still, the stakes are not as high as they are in sentencing, and therefore there is less reason to apply heightened scrutiny to socioeconomic classifications and other traits that are not treated as suspect outside the criminal justice context. Distributing access to correctional programming based on risk, needs, and responsivity assessments is not particularly different from distributing access to non-correctional social services and government benefits to those populations who most need them, which is a routine government function, subject to rational basis review unless suspect or quasi-suspect classifications are involved. In sentencing, however, the defendant’s most fundamental liberties and interests are at stake, as are the interests of families and communities. EBS advocates have not made a persuasive case that this crucial decision should turn on a defendant’s gender, poverty, or other group characteristics. The risk prediction instruments offer little meaningful guidance as to each individual’s recidivism risk, and they do not even attempt to offer guidance as to the way in which sentencing choices affect that risk. The instruments, and the problematic variables, advance the state’s penological interests weakly if at all, and there are alternatives available. Risk prediction is here to stay as part of sentencing, and perhaps actuarial instruments can play a legitimate role. But they should not include these problematic variables, which do not offer much additional predictive value once crime characteristics and criminal history are taken into account. The current instruments simply do not justify the cost of state endorsement of express discrimination in sentencing.

sentence indeterminate is to account for the fact that recidivism risk may evolve over time; those who believe risk prediction is an improper basis for punishment should simply oppose indeterminate sentencing. See, e.g., Christopher Slobogin, Prevention as the Primary Goal of Sentencing: The Modern Case for Indeterminate Sentencing, 48 SAN DIEGO L. REV. 1127, 1128-30 (2011). 224 Note that while the Supreme Court once labeled parole an “act of grace,” the deprivation of which a prisoner could not contest, this theory is now considered “long-discredited.” Samson v. California, 547 U.S. 843, 864 n.5 (2006). States have no obligation to provide a system of parole, but once they do, its operation is constrained by the Constitution. Board of Pardons v. Allen, 482 U.S. 369, 378 (1987); Morrissey v. Brewer, 408 U.S. 471, 482 (1972). 225 See Nat’l Ctr. for State Cts., supra, at 16-20; Warren, supra; Melissa Aubin, The District of Oregon Reentry Court: An Evidence-Based Model, 22 Fed. Sent. R. 39 (2009) (discussing evidence-based practices in federal “reentry courts”).

Kelly Hannah-Moffat, The Uncertainties of Risk Assessment: Partiality, Transparency, and Just Decisions, Federal Sentencing Reporter, Vol. 27, No. 4, The Risk Assessment Era: An Overdue Debate (April 2015), pp. 244-247. Published by University of California Press on behalf of the Vera Institute of Justice. Stable URL: http://www.jstor.org/stable/10.1525/fsr.2015.27.4.244





The Uncertainties of Risk Assessment: Partiality, Transparency, and Just Decisions

Kelly Hannah-Moffat, Centre of Criminology and Sociolegal Studies, Department of Sociology, University of Toronto, Canada

Risk assessment has been used for years in a range of criminal justice contexts. Recent calls for ‘‘evidence-based’’ or ‘‘smart’’ sentencing are positioning risk assessment as a promising path toward more efficient, unbiased, and empirically based sentencing practices that depopulate prisons and treat some prisoners less harshly while at the same time securing public safety. A range of actuarial tools have been promoted as being able to replace practitioners’ discretionary decision making with structured, quantitatively derived decision-making templates that rely on the use of social scientific evidence, particularly about recidivism and correctional interventions that reduce criminal propensities.1 Actuarial risk tools guide practitioners through a logical and simple process to itemize, score, and interpret statistically significant information about offenders. Some scholars have argued that using actuarial risk tools will prevent judges from ‘‘sentencing blindly,’’ that is, will prevent ‘‘over-punishing’’ (sending individuals who present little appreciable risk to public safety to prison) or ‘‘under-punishing’’ (releasing dangerous criminals into communities to reoffend).2

An extensive body of psychological literature has focused on the science behind risk prediction. The trend toward using generic risk tools as an evidence-based practice in sentencing and at other crucial decision-making points needs to be carefully considered, because they can have significant consequences. The use of some types of risk assessment can elevate sentences on the basis of acuteness of disadvantage, reinforce/mask racial and gender disparity, produce false positives, and lead to less transparent decisions.3 I do not dispute the authenticity of much of this literature, the use of risk in a clinical context, or the use of evidence to inform sentences or sanctions. My arguments focus on the use of general risk instruments (such as the Level of Service Inventory or LSI) that are designed to be administered and interpreted by nonclinical or deskilled practitioners in correctional or jurisprudential contexts, and the use of assessment tools that are crafted by local jurisdictions.

Here, I will draw on empirical research to show that actuarial risk tools may be more akin to a Pandora’s Box than a panacea. I will argue that evidence-based approaches that draw on risk assessments do not eliminate discretion or produce more objective outcomes, and that risk assessments are poorly understood and are actually misconstrued probability scores that uncritically prioritize individual characteristics. This paper summarizes some of my research,4 specifically data collected from interviews with more than a hundred Canadian probation/parole practitioners, crown attorneys (prosecutors), defense lawyers, and Canadian judges, as well as from informal interactions during judicial and practitioner training seminars on risk/need assessment and detailed policy analysis in multiple contexts. Although risk assessment techniques are readily available and proliferating, the process of administering and assessing risk remains highly discretionary and is often misunderstood. Most importantly, risk assessment has not remedied systemic problems of discrimination and disparity, nor has it led to more accurate treatment matching or punishments to reduce recidivism, which is itself notoriously difficult to measure.

Accountability, Discretion, and Organizational Risk Management

Many practitioners we interviewed5 feel that the introduc[tion of evidence-based assessment practice has brought] more accountability and produced a sense of perceived accountability. One commented that decisions relying on evidence-based tools are ‘‘better than clinical judgment or any form of subjective judgment. You know, it’s well researched, empirically based, and so I think it gives the organization and the individual users some comfort that their planning and their risk decision making is based on some evidence.’’ Another said, ‘‘it’s more defensible at an inquest or lawsuit than a gut feeling.’’6 Actuarial methods are credited with giving decisions substance and making them more scientific, auditable, and consequently conferring an appearance of legitimacy. Practitioners routinely admitted that risk assessment tools were not ‘‘reliable’’ predictors, but that they preferred them because using them minimized their own risk of being blamed for subsequent consequences, and that of the prosecutor’s office or correctional authorities. The use of these tools suggests the curtailment of discretion and subjective judgments, which are often viewed negatively and associated with errors.

Actuarial tools and their accompanying forms and guidelines require practitioners to collect information considered relevant to the calculation of risk and need.

Federal Sentencing Reporter, Vol. 27, No. 4, pp. 244 247, ISSN 1053 9867, electronic ISSN 1533 8363. © 2015 Vera Institute of Justice. All rights reserved. Please direct requests for permission to photocopy or reproduce article content through the University of California Press’s Rights and Permissions website, http://www.ucpressjournals.com/reprintInfo.asp. DOI: 10.1525/fsr.2015.27.4.244.

244 FEDERAL SENTENCING REPORTER • VOL. 27, NO. 4 • APRIL 2015

This content downloaded from 206.16.244.10 on Fri, 8 May 2015 10:48:51 AM All use subject to JSTOR Terms and Conditions information, focus on seemingly objective, empirically a personal budget? Are you receiving general welfare based variables, and apparent capacity to reduce or elimi assistance, family benefits allowance? nate discretion. One practitioner commented that the cri An emerging body of literature on the conceptual and teria contained in general risk tools like the LSI are methodological problems associated with the use of generic straightforward, which makes ‘‘them simple to score for risk assessments (such as the LSI series and/or those using a variety of practitioners and reduces the need for time standard combinations of static or dynamic variables) is consuming narrative assessments by professionally trained indicating that they fail to adequately control for gender or clinicians the latter discredited as subjective and unem racial disparity and the potential for discriminatory out pirical and as having poor predictive accuracy and high comes.12 Judges and practitioners who administer these discriminatory potential.’’7 Support for the introduction of tools share concerns about bias and the suitability of pri risk tools rests on the premise that such tools enhance oritizing risk assessments. For these reasons, many delib professionalism by improving the defensibility and erately manipulated risk scores by altering the scores and accountability of decisions, generating uniformity across weights of criteria. Most did not include a written expla jurisdictions and regions, and maintaining a perception of nation for these informal overrides because they felt it objective scientific validity. might jeopardize them professionally, even if their choice Nonetheless, many other practitioners actively resist was ethically defensible. Many practitioners have com and/or only selectively embrace actuarial based risk tools, mented that they strategically exercise their discretion when so the introduction of risk tools shapes but does not elim filling out risk assessments. However, this exercise of dis inate discretion.8 Our interviews with developers of risk cretion is rarely visible because risk tools are effectively instruments, tool trainers, and practitioners revealed that ‘‘black boxes’’ that sanitize the subjective input of practi they routinely questioned the objectivity and accuracy of tioners to produce simple ‘‘objective’’ scores. These scores risk/need assessments and raised significant concerns are then used to inform recommendations about sentenc about their ability to improve bias outcomes. In controlled ing, conditions, institutional placements, and releases. research settings, actuarial assessment tends to be more Risk assessment tools structure what evidence is accurate than clinical assessment, but in the real world, recorded and how it is recorded, but practitioners exercise practitioners must assess risk in a politically charged, substantial judgment in determining what additional socially complex context; many argue that in practice, expert information to collect, the case file information or collateral discretion is actually more accurate than risk assessment sources that are deemed relevant or significant, and the tools. Additionally, research conducted in multiple inter facts that are selected as examples of risk. 
My research with national jurisdictions has demonstrated that practitioners’ Paula Maurutto and Sarah Turnbull revealed that the pro accumulated experience, knowledge, and professional cess of gathering and assessing information to formulate training/education affect how they interact with and use recommendations involves subjective decisions that are risk assessments.9 informed by a practitioner’s personal knowledge, experi Although risk assessment tools are characterized as ence, values, and beliefs, meaning that different practi objective, their scoring methods and structures actually tioners may frame the same phenomena differently.13 Our obscure the subjective and arbitrary nature of the ques interviews revealed that some practitioners were very aware tions and judgments they contain. For example, their of gender and racial differences. For example, they knew operational manuals include a series of normative ques that counting and scoring the number of offences of an tions designed to determine whether particular dynamic Aboriginal offender would likely yield a classification of areas are criminogenic.10 The specific questions vary high risk, because many Aboriginals living in conditions of across tools, but are designed to gather information about poverty share high risk characteristics (e.g., social, educa relevant relationships and circumstances, so that practi tional, and family problems, substance use, and histories of tioners may assess risk of recidivism and treatment trauma and abuse). Consequently, they would adapt the needs. This is not a neutral statistical exercise; scoring scoring of less quantifiable criteria to lower the overall score methods require practitioners to ask a series of directed and adjust what they considered to be ‘‘White, middle class questions about criminal histories, leisure activities, edu criteria.’’14 We observed similar practices among practi cation, past sentences, associates, family and relationship, tioners working with women and girls, who felt the tools emotional wellbeing, housing, substance use, family child did not adequately reflect women’s risk/need factors and rearing, attitudes, social assistance, finances, employment, scored them too high or too low (often depending on the and various other issues, depending on the tool. For offence type and attitude), and among practitioners work example, the financial domain of the LSI R interview ing with sex offenders who would otherwise score lower guide11 includes the following questions for offenders that because they were educated, less criminalized, and more are based on the assumption that risk free individuals socially integrated. These interviews clearly revealed that have credit, are not financially over extended or concerned scoring risk criteria requires discretionary evaluation of an about finances, and are capable financial planners who offender’s behavior, character, lifestyle, and choices. These live within their means and follow budgets: Do you have qualitative assessments are not easily quantified in actuarial a bank account? Do you have credit? Do you have scales, so practitioners must apply their own judgment and

FEDERAL SENTENCING REPORTER • VOL. 27, NO. 4 • APRIL 2015 245

This content downloaded from 206.16.244.10 on Fri, 8 May 2015 10:48:51 AM All use subject to JSTOR Terms and Conditions experiential knowledge when evaluating risk.15 We found ascription severs the link between punishment and indi that practitioners generally preferred to adjust the assess vidual action, lacks nuance, and misses the importance of ment of criteria to control the final score, rather than relying a range of motivational, contextual, and structural factors on formal overrides, which were visible in the file upon that contribute to human action. review. Many of the practitioners we interviewed who either My previous research16 has shown that race and gender worked daily with risk assessments or had learned about are complex social constructs and cannot simply be reduced the nuances in their scoring and application stated that the to binary variables and then tested for significance (pre tools helped structure decisions and consolidate informa dictive validity and reliability) in risk instruments. The tion but they said that a good understanding of the person emphasis on validating risk tools for use on diverse popu before them was more important, along with professional lations can produce disparity in sentencing because the experience. This finding suggests that individualized and relationships between race, gender, social inequality, and contextualized knowledge is more important than actuarial specific risk factors involve important theoretical and information. methodological concerns. For example, simple statistical Risk assessment tools may provide some organizational correlations between gender, race, and crime tell us more accountability in terms of how and what types of informa about criminal justice practices than offender behavior. tion are collected, reported, valued, and ultimately used to Actual rates of recidivism are not observable. Instead, inform the allocation of scarce penal resources. Formal recidivism as defined in risk assessment instruments is an risk/need assessment tools are extremely unlikely to help indicator of arrest or convictions. Here actuarial instru mitigate persistent structural and systemic issues such as ments can accentuate biases and various forms of dis income disparity, racial/ethnic discrimination, gender crimination evident in detection, enforcement, and imbalance, social welfare distribution, unemployment, and prosecution.17 Additionally, the science supporting the use limited health and mental health care provisions that con of risk tools is contested and insufficiently advanced to tribute to criminal involvement and severely limit the prove that they do not replicate or reproduce forms of sys choices of large segments of the population. temic discrimination. Many of the variables used in risk scores are the same as those identified by critical scholars as Notes being correlated with race and social inequality, and thus of 1 See Kirk Heilbrun, Risk Assessment in Evidence Based Sentenc being interpreted quite differently in different contexts. ing Context and Promising Uses, 1 Chapman J. Crim. Just. 127 Individuals who are racialized, live in poverty, are unem (2009). Michael Marcus, MPC The Root of the Problem: Just deserts and risk assessment, 61 Florida L. Rev. 751 (2009a); ployed, and/or struggle with mental illness are potentially Michael Marcus, Conversations on Evidence Based Sentencing, disadvantaged by these criteria, and the suitability of these 1 Chapman J. Crim. Just. 61 (2009b); Roger K. 
Warren, tools for women and racialized populations is still hotly Evidence Based Practices and State Sentencing Policy: Ten debated. Policy Initiatives to Reduce Recidivism, 82 Ind. L.J., 1307, The practitioners we interviewed were aware of these 1312 14 (2007). Michael A. Wolff, Evidence Based Judicial Discretion: Promoting Public Safety through State Sentencing methodological problems and the very real possibility of Reform, 83 NYUL. Rev. 1389 (2008). discrimination from their use. In some cases, practitioners 2 J.C. Oleson, Risk in Sentencing: Constitutionally Suspect Vari added more information than needed when scoring vari ables and Evidence Based Sentencing, 64 S.M.U. L. Rev. 1329 ables such as those related to substance abuse or emotional (2011). 3 health. They did this to ensure that the individual scored as Paul R. Falzer, Valuing Structured Professional Judgment: Pre dictive Validity, Decision Making, and the Clinical Actuarial high risk in that domain and thereby justified a recom Conflict, 31 Behav. Sci. & L. 40 (2013); Patricia Van Voorhis, mendation for a program that the offender needed, in their On Behalf of Women Offenders, 11 Criminology & Pub. Pol’y professional opinion. These practices may lead to an 111 (2011); Peter Raynor & Sam Lewis, Risk Need Assess abundance of conditions that may facilitate offender man ment, Sentencing and Minority Ethnic Offenders in Britain, 41 agement by requiring treatment, but are not always Brit. J. Soc. Work 1357 (2011). 4 See Kelly Hannah Moffat, Actuarial Sentencing: An Unsettled empirically justified. Proposition, Justice Q. 1 (2012); Kelly Hannah Moffat, Grid One astute practitioner commented, ‘‘risk assessments lock or Mutability: Reconsidering ‘‘Gender’’ and Risk Assess are fancy probabilities scores that can’t account for human ment, 8 Criminology & Pub. Pol’y 221 (2009); Kelly Hannah nature or the situation.’’18 My work, and that of my collea Moffat et al., Negotiated Risk: Actuarial Illusions in Probation 24 gues, has consistently shown that judges and practitioners Can. J.L. & Soc’y 391 (2009); Paula Maurutto & Kelly Hannah Moffat, Understanding Risk in the Context of the Youth Criminal routinely misapply and misinterpret risk scores. Actuarial Justice Act, 49 Can. J. Criminology 465 (2007); Paula Maur prediction instruments yield probabilistic predictions based utto & Kelly Hannah Moffat, Assembling Risk and the Restruc on data from aggregate (often adult male) populations, and turing of Penal Control, 45 Brit. J. Criminology (2005); Kelly do not always consider how race affects variables. In prac Hannah Moffat, Criminogenic Need and the Transformative Risk tice, correlations are conflated with causation. A person Subject: Hybridizations of Risk/Need in Penality, 7 Punishment & Soc’y 29 (2004). may be scored as high risk because he/she shares the same 5 See Hannah Moffat, Criminogenic Need, supra note 4. characteristics as a group of offenders who are already 6 Hannah Moffat et al., Negotiated Risk, supra note 4. involved in the system and prone to recidivate. This 7 Id. at 399.

246 FEDERAL SENTENCING REPORTER • VOL. 27, NO. 4 • APRIL 2015

This content downloaded from 206.16.244.10 on Fri, 8 May 2015 10:48:51 AM All use subject to JSTOR Terms and Conditions 8 Id.; see Hazel Kemshall, Risk in Probation Practice (1998); Agency Risk Management of Sexual and Violent Offenders,3 Joel Miller & Carrie Maloney, Practitioner Compliance with Punishment & Soc’y 237 (2001); Dale Ballucci, Risk in Action: Risk/Needs Assessment Tools: A Theoretical and Empirical The Practical Effects of the Youth Management Assessment.17 Assessment, 40 Crim. Just. & Behav. 716 (2013); Stephen Soc. Legal Stud. 175 (2008); Hannah Moffat, Gridlock or D. Hart & David J. Cooke, Another Look at the (Im )Precision of Mutability, supra note 4; Hannah Moffat, Criminogenic Need, Individual Risk Estimates Made Using Actuarial Risk Assessment supra note 4; A. Perron & K. Svensson, Shades of Profession Instruments, 31 Behav. Sci. & L. 81 (2013); Falzer, supra note alism: Risk assessment in Presentence Reports in Sweden,9 3; Tim Goddard and R. Myers, Against Evidence Based Eur. J. Criminology 176 (2012); Miller & Maloney, supra note Oppression: Actuarial Youth Justice and Its Potential Future(s), 8. Tim Goddard, The Indeterminacy of the Risk Factor Preven Paper Presented at the American Criminology Conference, tion Paradigm: A Case Study of Community Partnerships Imple San Francisco, California (2014). menting Youth and Gang Violence Prevention Policy, 14 Youth 9 Gwen Robinson, Implementing OASys: Lessons from Research Just. 3 (2014). into LSI R and ACE, 50 Probation J. 33 (2003); see Kerry 10 Hannah Moffat, Criminogenic Need, supra note 4; Hannah Baker, Assessment in Youth Justice: Professional Discretion and Moffat, Actuarial Sentencing, supra note 4. the Use of ASSET, 5 Youth Just. 106 (2005); Diana Wendy 11 D. Andrews & J. Bonta, LSI R: The Level Of Service Inventory Fitzgibbon, Institutional Racism, Pre emptive Criminalisation Revised Users’ Manual (2002). and Risk Analysis, 46 How. L.J. 128 (2007); Diana Wendy 12 Hannah Moffat, Actuarial Sentencing, supra note 4. Fitzgibbon, Fit for Purpose? OASys Assessments and Parole 13 Hannah Moffat et al., Negotiated Risk, supra note 4. Decisions, 55 Probation J. 55 (2008); Anne L. Schneider et al., 14 Id. Further Exploration of the Flight from Discretion: The Role of Risk/ 15 Id. Need Instruments in Probation Supervision Decisions,24J. 16 Hannah Moffat, Actuarial Sentencing, supra note 4, at 19. Crim. Just. 109 (1996); Hazel Kemshall & Mike Maguire, 17 Id. Public Protection, Partnership and Risk Penality: The Multi 18 Interview conducted by author, Department of Justice (date).

FEDERAL SENTENCING REPORTER • VOL. 27, NO. 4 • APRIL 2015 247

Federal Pretrial Risk Assessment Scoring Guide

Office of Probation and Pretrial Services

Revised March 27, 2013

Acknowledgements: OPPS would like to thank its Pretrial Services Working Group and the officers, supervisors, and chiefs who reviewed the risk assessment instrument and provided invaluable suggestions for improving the current version of the instrument.

INTRODUCTION

The Federal Pretrial Risk Assessment (PTRA) was developed by OPPS staff using data collected through the PACTS system and initially analyzed as part of a larger project. The PTRA is an objective, quantifiable instrument that provides a consistent and valid method of predicting risk of failure-to-appear (FTA), new criminal arrest (NCA), and technical violations (TV) leading to revocation while on pretrial release. The instrument is comprised of 11 scored items that are divided into two domains or categories: criminal history and other. The criminal history items are static and will not likely change during the pretrial period. The "other" items are changeable and might show some movement based on pretrial supervision and services.

Scoring the items on the PTRA is a fairly straightforward process. Each scored item is given a number of points (either 0, 1, or 2). The points from the items are then added up to give an overall score. When administered correctly, the PTRA provides a score that allows for classification into a risk category. Those risk categories are then associated with rates of FTA, NCA, and TVs.

This scoring guide is designed in a user-friendly format and provides the purpose behind each of the scoring items, potential prompters and sample questions, and scoring rules/issues. The PTRA is configured as follows:

Item #’s Domain Scored Items 1.1-1.6 Criminal History 6 2.1-2.5 Other 5

TOTAL RISK SCORE

The total risk score is determined by adding up the points for each of the scored items. This score ranges from 0 to 14. The guidelines established based on initial analyses are as follows: 0-4 is Category I, 5-6 is Category II, 7-8 is Category III, 9-10 is Category IV, and 11+ is Category V.
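The score-to-category mapping amounts to a simple threshold lookup. The sketch below is illustrative only and is not part of the official instrument (the function name, range check, and output format are assumptions); it simply encodes the cut-points listed above.

    # Illustrative sketch: map a PTRA total score (0-14) to a risk category
    # using the cut-points in this guide (0-4 = I, 5-6 = II, 7-8 = III,
    # 9-10 = IV, 11+ = V). Not official OPPS software.
    def ptra_category(total_score: int) -> str:
        if not 0 <= total_score <= 14:
            raise ValueError("PTRA total scores range from 0 to 14")
        if total_score <= 4:
            return "Category I"
        if total_score <= 6:
            return "Category II"
        if total_score <= 8:
            return "Category III"
        if total_score <= 10:
            return "Category IV"
        return "Category V"

    print(ptra_category(7))  # -> "Category III"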

The chart below displays the failure rates (FTA/NCA and FTA/NCA/TV) for each category of risk. As can be seen, with each increase in risk category the probability of failure also increases. These failure rates were generated using data from the pretrial supervision period.

[Chart: failure rates by risk category (Category I through Category V) for FTA/NCA and FTA/NCA/TV]

Criminal History Domain

#1.1 Domain: Criminal History

Item #1.1: Number of Felony Convictions (Rate and score)

Purpose of Item: Criminal history is a long-established predictor of future behavior. The versatility, stability & frequency of the criminal patterns are key factors in assessing the risk for recidivism. A more extensive history means a greater likelihood of future criminal behavior. Gathering information regarding how the criminal behavior interrelates with other risk factors will assist in determining interventions needed.

Sample Questions:
This item can be scored from official records and should at least be verified by official records.
How many times have you been arrested for a felony offense?
How many times have you been convicted of a felony?

Interviewing Strategies:
Some self-reported convictions might not show up on a record check. Be sure to gather information on these items and attempt to verify this information for scoring purposes.

Scoring Rules/Issues:

1. This information is based on the number of felony convictions.

2. Multiple counts of a conviction, resulting from one arrest date, should be counted as one conviction.

3. Be sure to accurately record the number of convictions and then code that number properly for scoring.

4. No prior felony convictions is given 0 points, 1 to 4 prior felony convictions is given 1 point, and five or more felony convictions is given 2 points.

5. Do not count juvenile arrests/convictions in this section. #1.2 Domain: Criminal History

Item #1.2: Prior FTAs (Rate and score)

Purpose of Item: One of the best predictors of future behavior is past behavior. The number of prior FTAs allows for some assessment as to how likely this behavior is to occur in the future.

Sample Questions:
This item can be scored from official records. Responses from defendants should be verified by official records.
How many times have you been arrested?
How many times have you had to appear in court?
Have you ever missed any of those court proceedings? Why? Were charges filed?

Interviewing Strategies:
Ask the defendant about court proceedings for criminal cases.
Determine if any criminal proceedings have been missed by the defendant.
Attempt to verify self-reported FTAs.

Scoring Rules/Issues:

1. Count and code the number of times the defendant has failed to appear for a criminal proceeding.

2. Do not include FTAs for traffic summonses; only count FTAs for misdemeanor and felony charges.

3. No prior FTAs is given zero (0) points, 1 prior FTA is given one (1) point, and 2 or more prior FTAs is given two (2) points. FTA scoring does not require a conviction. #1.3 Domain: Criminal History

Item #1.3: Pending Felonies and Misdemeanors (Rate and score)

Purpose of Item: Pending charges might increase a defendant's flight risk and/or be indicative of a person who has difficulties while under criminal justice system supervision.

Sample Questions:
Do you have any criminal cases besides this one that are not closed?
How many are for felony charges?
How many are for misdemeanor charges?
What were you charged with?
Are you on any type of supervision for those cases?

Interviewing Strategies:
Make sure to gather information on non-federal charges as well as pending federal charges. This should not include the instant offense.

Scoring Rules/Issues:

1. No pending cases is given zero (0) points, one or more pending cases is given one (1) point.

2. A probation violation is not to be counted as a pending matter.

3. When the current underlying federal charge is the same as the state/local offense, do not count the state/local offense as a pending matter.

4. If a defendant is convicted it is not considered a pending matter.

5. For traffic violations, only count those at a misdemeanor or higher offense level. #1.4 Domain: Criminal History

Item #1.4: Current offense type (Rate and score)

Purpose of Item: The type of offense was found to have a statistically significant relationship with outcomes of interest during the pretrial services stage.

Sample Questions: This information should be taken directly from official records.

Interviewing Strategies: N/A

Scoring Rules/Issues:

1. If the most serious charge is for a drug offense, firearms offense, or immigration charge assess one (1) point.

2. Theft/fraud, violent, and other types of offenses are assessed zero (0) points.

3. This item is based on the current federal charges only.

Theft: Auto theft; Burglary; Counterfeiting; Embezzlement; Forgery; Fraud; Larceny/theft; Mail fraud; Mail theft; Petty theft; Racketeering; Transportation of stolen property
Drug: Cocaine; Crack; Heroin; Marijuana; Methamphetamine; Opiate; Other drug
Firearms: Firearms
Immigration: Immigration
Violent: Assault; Homicide; Kidnapping; Rape (force); Robbery; Sex offense; Simple assault
Other: Absconded; Drunkenness; DUI; DWI; Escape; Federal statutes; FTA; Gambling; General violation; Income tax; Liquor (tax); Non-payment of financial penalties; Misdemeanor; Misprision of a felony; Miscellaneous offense; Other minor violation; Perjury; Technical violation of supervision conditions; Traffic violation; Cyber crime (non-sexual)

#1.5 Domain: Criminal History

Item #1.5: Offense class (Rate and score)

Purpose of Item: The class of offense was found to have a statistically significant relationship with outcomes of interest during the pretrial services stage.

Sample Questions: This information should be taken directly from official records.

Interviewing Strategies: N/A

Scoring Rules/Issues:

1. Base this item on the current federal charges only.

2. If there is more than one pending federal charge consider the most serious when scoring this item.

3. Those defendant’s charged with a felony are assessed one (1) point. Those charged with a misdemeanor are assessed zero (0) points. #1.6 Domain: Criminal History

Item #1.6: Age at interview (Rate and score)

Purpose of Item: The relationship between age and criminal behavior is well documented. This item assesses points based on the defendant’s age at the time of the interview or when the information to score the assessment is collected.

Sample Questions: This information should be taken directly from official records.

Interviewing Strategies: N/A

Scoring Rules/Issues:

1. Calculate the defendant’s age on the date of interview or when the information to score this assessment is collected.

2. Calculate age in years. If the defendant is in between years begin rounding up at .5 years. So if a defendant is 22.50 years old he would be considered 23 years old and would be assigned two points for this item.

3. If the defendant is 26 or younger then two (2) points are assessed, one (1) point is assessed for defendants age 27 to 46, and zero (0) points are assessed for defendants 47 or older.

Other Domain

Employment, Education, Residence, Substance Abuse, Citizenship

#2.1 Domain: Other-Education

Item #2.1: Highest Education (Rate and score)

Purpose of Item: Overall academic achievement is related to pretrial outcomes.

Sample Questions:
Tell me about your education experiences.
Tell me about any additional job training you've completed.
How far did you go in school?
What was the last grade you finished?
Tell me about the circumstances that contributed to you not finishing school?

Interviewing Strategies:
Establish overall academic achievement.
Explore defendant's motivation for terminating school when they did.

Scoring Rules/Issues:

1. If the defendant has completed less than a high-school degree assess two (2) points.

2. If the defendant has a GED assess two (2) points.

3. If the defendant has a GED plus a further degree, some college, or vocational training certificates assess one (1) point.

4. If the defendant has a high school degree, vocational training , or some college assess one (1) point.

5. If the defendant has a college degree do not assess any points. #2.2 Domain: Other-Employment

Item #2.2: Employment Status (Rate and score)

Purpose of Item: Employment is an important activity for a number of reasons. Not only does it provide us with legitimate means to meet financial needs, it exposes us to prosocial others, provides us rewards for engaging in prosocial behavior, and occupies free time in a prosocial and structured way.

Sample Questions:
Are you currently working?
If no: How do you support yourself while unemployed? What was your reason for leaving your last job?
If yes: Tell me where you are working. How long have you worked there? Tell me about the type of work you do. Where did you work before that?

Interviewing Strategies:
Explore type of job and number of hours per week.
If not employed, check if defendant is a full time student.
Seek information about the quality of the defendant's performance, ability to get along with others in frequent contact or in positions of authority.

Scoring Rules/Issues:

1. Defendant is considered unemployed (score 1 point) if s/he is capable of working but is not working even if he or she is otherwise productively occupied.

2. Students and homemakers are not considered employed or unemployed but are still assessed one (1) point.

3. Defendants that are retired but are capable of employment are assessed one (1) point based on the amount of unstructured free time the defendant has in order to commit an offense.

4. Defendants that are completely disabled and receiving benefits are assessed zero (0) points.

5. Full time and part time (20 hours per week or more) are considered employed and are assessed zero (0) points.

6. Defendants that receive public assistance and are working are considered employed and are assessed zero (0) points.

7. Please circle the item on the scoring sheet. #2.3 Domain: Other-Residence

Item #2.3: Residence (Rate and score)

Purpose of Item: Lack of ties to the area, as measured by residential attachment, might relate to risk of FTA.

Sample Questions:
Where do you live now?
Do you own that home, condo, etc.?

Interviewing Strategies:
Determine if the defendant either owns or is purchasing a home.
This applies to current circumstances and not plans to purchase or buy.

Scoring Rules/Issues:

1. If the defendant does not own his/her home and is not purchasing it then assess one (1) point. #2.4 Domain: Substance Abuse Problems

Item #2.4: Current Substance Abuse Problems (Rate and score)

Purpose of Item: Substance misuse facilitates or instigates criminal behavior. Drug use has been shown to increase risk for pretrial failure.

Sample Questions:
Has your use caused disruptions at work, school, or home?
Have you ever been in treatment?
Do you use in physically hazardous situations: driving a motor vehicle; operating machinery; caring for children?
Do you have legal problems related to use?
Do you continue to use despite social/interpersonal problems related to use?

Interviewing Strategies:
Be sure to keep conversation and questions current.
For defendants that have been in detention assess: treatment; triggers in community setting; ability to remain substance free.

Scoring Rules/Issues:

1. Score based on last 12 months. Consider the responses to items in sample questions.

2. If incarcerated less than 2 years evaluate for last 12 months in the community.

3. If incarcerated more than 2 years evaluate over past year in the institution but consider treatment and ability to deal with triggers and stressors. The defendant should be able to identify, in a concrete way, triggers and the skills they will use to cope with them. #2.5 Domain: Other – Citizenship

Item #2.5: Citizenship (Rate and score)

Purpose of Item: Being a legal or illegal alien might be associated with ties to a foreign country and therefore an increase in FTA risk.

Sample Questions:
What is your citizenship status?

Interviewing Strategies:
Determine if the defendant is a U.S. citizen, legal alien, or illegal alien.

Scoring Rules/Issues:

1. If the defendant is a legal or illegal alien assess one (1) point.

2. If the defendant is a U.S. or naturalized citizen assess zero (0) points.

Missing Items

What to do if there are missing items?

Prorated Scoring: The PTRA score can be prorated based on the table below. Statistical analysis indicated that the PTRA score was still predictive when 4 or fewer scored items were missing. The prorated score is based on a calculation involving the data on the scored/available items. This process is more accurate than either assuming missing items score 0 (which potentially underestimates risk) or assuming a value for missing items (which potentially overestimates risk).

Officers should make every attempt to gather information on all items. If criminal history information is not available this instrument cannot be scored.

Prorated score by raw score and number of missing items:

Raw Score   1 Missing   2 Missing   3 Missing   4 Missing
0           0           0           0           0
1           1           1           1           2
2           2           2           3           3
3           3           4           4           5
4           4           5           6           6
5           6           6           7           8
6           7           7           8           10
7           8           8           10          11+
8           9           10          11+         11+
9           10          11+         11+         11+
10          11+         11+         11+         11+

To use the above chart find the raw score that you obtained in the left most column. Read along that row until you get to the column that indicates the number of items that are missing from the assessment. The number in the cell where the row and column intersect is the prorated score.
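The chart can also be encoded as a simple lookup table. The sketch below is illustrative only (the dictionary and function names are assumptions, not part of the instrument); the values are transcribed from the table above, and the usage line reproduces the worked example that follows.

    # Illustrative sketch: prorated PTRA score lookup for 1-4 missing items.
    # Values transcribed from the proration table in this guide; "11+" is kept
    # as a string because the table caps prorated scores at 11+.
    PRORATED = {
        #  raw: (1 missing, 2 missing, 3 missing, 4 missing)
        0:  ("0", "0", "0", "0"),
        1:  ("1", "1", "1", "2"),
        2:  ("2", "2", "3", "3"),
        3:  ("3", "4", "4", "5"),
        4:  ("4", "5", "6", "6"),
        5:  ("6", "6", "7", "8"),
        6:  ("7", "7", "8", "10"),
        7:  ("8", "8", "10", "11+"),
        8:  ("9", "10", "11+", "11+"),
        9:  ("10", "11+", "11+", "11+"),
        10: ("11+", "11+", "11+", "11+"),
    }

    def prorated_score(raw_score: int, missing_items: int) -> str:
        if missing_items not in (1, 2, 3, 4):
            raise ValueError("proration is defined for 1 to 4 missing items")
        return PRORATED[raw_score][missing_items - 1]

    print(prorated_score(4, 2))  # -> "5"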

For example, an assessment with a raw score of 4 with 2 items missing would equal a prorated score of 5.

FEDERAL PRETRIAL RISK ASSESSMENT INSTRUMENT (PTRA)

DEFENDANT’S NAME: DATE OF ASSESSMENT:

OFFICER: PACTS #: DISTRICT:

1.0 CRIMINAL HISTORY & CURRENT OFFENSE 1.1. NUMBER OF FELONY CONVICTIONS

0=NONE 1=ONE TO FOUR 2=FIVE OR MORE

1.2. PRIOR FTAs

0=NONE 1=ONE 2=TWO OR MORE

1.3. PENDING FELONIES OR MISDEMEANORS

0= NONE 1=ONE OR MORE

1.4. CURRENT OFFENSE TYPE

0= THEFT/FRAUD, VIOLENT, OTHER 1=DRUG, FIREARMS, OR IMMIGRATION

1.5. OFFENSE CLASS

0=MISDEMEANOR 1=FELONY

1.6. AGE AT INTERVIEW

0= 47 OR ABOVE 1=27 TO 46 2=26 OR YOUNGER

TOTAL CRIMINAL HISTORY AND CURRENT OFFENSE

2.0 OTHER FACTORS: 2.1 HIGHEST EDUCATION

0=COLLEGE DEGREE 1=HIGH SCHOOL DEGREE, VOCATIONAL, SOME COLLEGE 2=LESS THAN HIGH SCHOOL OR GED

2.2 EMPLOYMENT STATUS

CIRCLE APPROPRIATE ITEM BELOW AND RECORD SCORE IN BOX 0=EMPLOYED FULL TIME 0=EMPLOYED PART TIME 0=DISABLED AND RECEIVING BENEFITS 1=STUDENT/HOMEMAKER 1=UNEMPLOYED 1=RETIRED, ABLE TO WORK

2.3 RESIDENCE

0=OWN/PURCHASING 1=RENT, NO CONTRIBUTION, OTHER, NO PLACE TO LIVE

2.4 CURRENT DRUG PROBLEMS

1=YES 0=NO

2.5 CITIZENSHIP STATUS

0= US CITIZEN 1=LEGAL OR ILLEGAL ALIEN

TOTAL OTHER FACTORS

TOTAL SCORE [ITEMS 1.1 – 2.5]

Likelihood of outcomes based on event occurring during pretrial period.

Risk Category   N        %    Risk Score   FTA   NCA   FTA/NCA   TV    FTA/NCA/TV
Category 1      52,677   29   0-4          1%    1%    2%        1%    3%
Category 2      52,653   29   5-6          3%    3%    5%        4%    9%
Category 3      49,920   27   7-8          4%    5%    10%       9%    18%
Category 4      21,779   12   9-10         6%    7%    15%       15%   28%
Category 5      4,710    3    11+          6%    10%   20%       19%   35%

Version 3.0 2 March 27, 2013

First Step Act Risk and Needs Assessment FAQ

Background: The First Step Act, signed into law on December 21, 2018, required the Department of Justice (DOJ) to create a new risk and needs assessment tool. The tool is to be used within the Bureau of Prisons (BOP) to periodically assess prisoners’ risk of recidivism and categorize them as having a minimum, low, medium, or high risk of recidivism. Their assessment outcome will determine the rehabilitative programming that individuals are assigned, as well as the incentives they will be offered to complete programs. On July 20, 2019, the DOJ released its tool, named the Prisoner Assessment Tool Targeting Estimated Risk and Needs (PATTERN).

Q: What does the PATTERN tool measure?

A: The PATTERN tool determines two separate categories. The first is the risk of general recidivism, defined as a new arrest or return to BOP custody. The second is the risk of violent recidivism, defined as a new arrest or return to BOP custody due to a violent offense. Each individual is given a score for both general and violent recidivism. These scores help determine whether an individual is at a minimum, low, medium, or high risk of recidivism.

If an individual is found to be at minimum risk of both general and violent recidivism, he or she is classified as having an overall minimum risk of recidivism. An individual who is at a lower than medium risk in both general and violent recidivism (but not minimum in both) is determined to be at a low risk of recidivism overall. Those who receive a high risk score in either general or violent recidivism are given a designation of high risk of recidivism. Finally, those who are found to be at neither minimum, low, nor high risk are classified as being at medium risk.
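A minimal sketch of this combination logic, assuming the four levels are ordered minimum < low < medium < high; the function, variable names, and level strings are illustrative assumptions, not DOJ code.

    # Illustrative sketch of the overall-category rules described in this FAQ.
    LEVELS = ["minimum", "low", "medium", "high"]

    def overall_risk(general: str, violent: str) -> str:
        g, v = LEVELS.index(general), LEVELS.index(violent)
        if general == "minimum" and violent == "minimum":
            return "minimum"      # minimum in both
        if g < LEVELS.index("medium") and v < LEVELS.index("medium"):
            return "low"          # below medium in both, but not minimum in both
        if general == "high" or violent == "high":
            return "high"         # high in either
        return "medium"           # everything else

    print(overall_risk("minimum", "low"))  # -> "low"
    print(overall_risk("low", "high"))     # -> "high"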

Q: How does the PATTERN tool determine an individual’s risk score?

A: The PATTERN tool assesses 17 individual factors. Six of these factors are static, meaning they do not change over time -- for example, “age at first arrest.” The remaining 11 factors are dynamic, meaning they can change over the course of incarceration. The factors are as follows:

Static: Age at first arrest; Instant offense (violent or nonviolent); Sex offender; Criminal history score; History of violence; and Voluntary surrender

Dynamic: Age at time of assessment; Infraction convictions while incarcerated (any); Infraction convictions while incarcerated (serious and violent); Number of programs completed (any); Number of technical or vocational courses; UNICOR employment; Drug treatment while incarcerated; Non-compliance with financial restitution; History of escapes; and Education (GED or HS Degree)

Q: How is each factor weighted and how are points tallied?

A: Each factor is weighted differently, with some factors raising an individual’s scores and others decreasing an individual’s score. A full chart detailing the point totals for each factor can be found at the end of this document.
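A minimal sketch of the tallying idea only: each assessed factor contributes a positive or negative point value, and the values are summed into a total that is then used to assign a risk level. The item names and point values below are hypothetical placeholders, not PATTERN's actual weights (those appear in the chart at the end of this document).

    # Hypothetical, illustrative point values only; NOT PATTERN's real weights.
    example_item_points = {
        "age at time of assessment": 12,       # hypothetical
        "infraction convictions (any)": 3,     # hypothetical
        "number of programs completed": -6,    # hypothetical (score-reducing)
        "voluntary surrender": -1,             # hypothetical (score-reducing)
    }

    total = sum(example_item_points.values())
    print(total)  # -> 8; the total is then translated into a risk level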

Q: How often are individuals assessed?

A: Each individual incarcerated in a BOP facility will be assessed no less than once a year. Per the First Step Act, an individual “determined to be at a medium or high risk of recidivating and who has less than 5 years until his or her projected release date shall receive more frequent risk reassessments.”

Q: Does an individual’s PATTERN score determine where an individual is incarcerated?

A: Not necessarily. A person’s security classification is determined by a separate assessment tool called the Bureau Risk Assessment Verification and Observation (BRAVO) tool. However, the First Step Act states that an individual’s PATTERN score shall help the BOP determine program groupings and housing assignments.

Q: What incentives are available for programming?

A: Eligible individuals can receive up to 15 earned time credits per month (depending on their risk level), which can be redeemed for added time in pre-release custody (halfway house or home confinement). You can read more about eligibility here. Those ineligible for earned time credits can receive alternative incentives, such as an additional 30 minutes per day of phone privileges and more options at the commissary.

Q: When can I start receiving earned time credits?

A: The First Step Act allows for people to start receiving earned time credits now. However, earned time credits require the presence of evidence-based recidivism reduction programming. As of now, it is unclear how many programs have been designated as evidence-based recidivism reduction programming and when more will be put in to begin the accumulation of earned time credits. The First Step Act includes a phase-in period of two years from the date that the PATTERN tool was released (July 20, 2019) to provide evidence-based recidivism reduction programming and productive activity to everyone within a BOP facility.

Q: Is the PATTERN tool final?

A: Not necessarily. The tool is currently under a 45-day review period after which the DOJ will solicit comments from the public. Additionally, the bill allows the Attorney General to revise the PATTERN tool annually to address issues in the tool including racial disparities.

PATTERN Point Totals

Measure Male General Female General Male Violent Female Violent Recidivism Recidivism Recidivism Recidivism Age at Less than 18: 12 Less than 18: 15 Less than 18: 6 Less than 18: 3 conviction 18 – less than 25: 18 – less than 25: 10 18 – less than 25: 4 18 – less than 25: 2 8 25 – less than 35: 5 25 – less than 35: 2 25 – less than 35: 1 25 – less than 35: 35 or older: 0 35 or older: 0 35 or older: 0 4 35 or older: 0 Age at Less than 18 – 25: Less than 18 – 25: Less than 18 – 25: Less than 18 – 25: assessment 30 15 15 5 26 – 29: 24 26 – 29: 12 26 – 29: 12 26 – 29: 4 30 – 40: 18 30 – 40: 9 30 – 40: 9 30 – 40: 3 41 – 50 : 12 41 – 50 : 6 41 – 50 : 6 41 – 50 : 2 51 – 60: 6 51 – 60: 3 51 – 60: 3 51 – 60: 1 60 +: 0 60 +: 0 60 +: 0 60 +: 0 Infraction More than 2: 9 More than 2: 6 More than 2: 6 More than 2: 6 convictions 2 convictions: 6 2 convictions: 4 2 convictions: 4 2 convictions: 4 (any) 1 conviction: 3 1 conviction: 2 1 conviction: 2 1 conviction: 2 0 convictions: 0 0 convictions: 0 0 convictions: 0 0 convictions: 0 Infraction More than 2: 6 More than 2: 6 More than 2: 6 More than 2: 6 convictions 2 convictions: 4 2 convictions: 4 2 convictions: 4 2 convictions: 4 (violent and 1 conviction: 2 1 conviction: 2 1 conviction: 2 1 conviction: 2 serious) 0 convictions: 0 0 convictions: 0 0 convictions: 0 0 convictions: 0 Number of More than 10: -12 More than 10: -8 More than 10: -4 More than 10: -4 programs 4 - 10: -9 4 - 10: -6 4 - 10: -3 4 - 10: -3 completed 1 – 3: -6 1 – 3: -4 1 – 3: -2 1 – 3: -2 (any) 1: -3 1: -2 1: -1 1: -1 0: 0 0: 0 0: 0 0: 0 Number of More than one: 0 More than one: 0 technical or 1 course: -1 1 course: -2 vocational 0 courses: -2 0 courses: -4 courses Federal Yes: -1 Yes: -1 industry No: 0 No: 0 employment (UNICOR) Drug No need indicated: No need indicated: No need indicated: No need indicated: treatment -6 -9 -3 -3

while incarcerated Completed RDAP: Completed RDAP: - Completed RDAP: Completed RDAP: -4 6 -2 -2

Completed drug Completed drug Completed drug Completed drug treatment during treatment during treatment during treatment during incarceration: -2 incarceration: -3 incarceration: -1 incarceration: -1

Need indicated but Need indicated but Need indicated but Need indicated but no treatment no treatment during no treatment during no treatment during during incarceration: 0 incarceration: 0 incarceration: 0 incarceration: 0

Drug No: 0 No: 0 No: 0 education Yes: -1 Yes: -1 Yes: -1 while incarcerated Non- No: 0 compliance Yes: 3 with financial responsibility Instant No: 0 No: 0 No: 0 offense Yes: 4 Yes: 5 Yes: 3 violent Sex offender No: 0 No: 0 Yes: 1 Yes: 1 Criminal 0 – 1 points: 0 0 – 1 points: 0 0 – 1 points: 0 0 – 1 points: 0 history 2 – 3 points: 6 2 – 3 points: 6 2 – 3 points: 2 2 – 3 points: 6 points 4 – 6 points: 12 4 – 6 points: 12 4 – 6 points: 4 4 – 6 points: 12 7 – 9 points: 18 7 – 9 points: 18 7 – 9 points: 6 7 – 9 points: 18 10 – 12 points: 24 10 – 12 points: 24 10 – 12 points: 8 10 – 12 points: 24 13+ points: 30 13+ points: 30 13+ points: 10 13+ points: 30 History of None: 0 None: 0 None: 0 None: 0 violence >10 years minor: 1 >10 years minor: 1 >10 years minor: 1 >10 years minor: 1 >15 years serious: >15 years serious: 2 >15 years serious: >15 years serious: 2 5-10 years minor: 3 2 2 5-10 years minor: 10-15 years serious: 5-10 years minor: 3 5-10 years minor: 3 3 4 10-15 years 10-15 years 10-15 years <5 years minor: 5 serious: 4 serious: 4 serious: 4 5-10 years serious: 6 <5 years minor: 5 <5 years minor: 5 <5 years minor: 5 <5 years serious: 7 5-10 years serious: 5-10 years serious: 5-10 years serious: 6 6 6 <5 years serious: 7 <5 years serious: 7 <5 years serious: 7

History of None: 0 None: 0 None: 0 None: 0 escapes >10 years minor: 2 >10 years minor: 2 >10 years minor: 1 >10 years minor: 1 5-10 years minor: 5-10 years minor: 4 5-10 years minor: 2 5-10 years minor: 2 4 <5 years minor or <5 years minor or <5 years minor or <5 years minor or any serious: 6 any serious: 3 any serious: 3 any serious: 6 Voluntary Yes: -12 Yes: -9 Yes: -1 Yes: -2 surrender No: 0 No: 0 No: 0 No: 0

Federal Public & Community Defenders
Legislative Committee
52 Duane Street, 10th Floor
New York, NY 10007
Tel: (212) 417-8738

Co-Chairs

David Patton Executive Director Federal Defenders of New York

Jon Sands Federal Defender District of Arizona

September 13, 2019

David B. Muhlhausen, Ph.D.
Director, National Institute of Justice
Office of Justice Programs
Department of Justice
810 7th Street NW
Washington, DC 20531

Re: DOJ First Step Act Listening Session on PATTERN

Dear Dr. Muhlhausen:

Thank you for inviting comment from the Federal Public and Community Defenders regarding the Department of Justice's (DOJ) development of the Prisoner Assessment Tool Targeting Estimated Risk and Needs (PATTERN) as part of its obligations under the First Step Act (FSA). The Federal Public and Community Defenders represent the vast majority of defendants in 91 of the 94 federal judicial districts nationwide, and we welcome the opportunity to provide our views.

PATTERN will directly affect how much time many of our clients spend in prison. This makes it a high-stakes tool, and means testing for accuracy and bias is crucial. Indeed, Congress understood the stakes and called for transparency throughout the FSA, including a mandate that the risk and needs assessment system be "developed and released publicly."1 Congress also repeatedly required that the system be monitored for bias.2 The limited information released by the DOJ in its July 19, 2019

1 First Step Act of 2018 (FSA), Pub. L. 115-391, Title I, § 101(a) (Dec. 21, 2018) (codified at 18 U.S.C. § 3632(a)). 2 See, e.g., FSA at Title I, § 103 (requiring the Comptroller General to conduct an audit of the use of the risk and needs assessment system every two years, which must include an analysis of “[t]he rates of recidivism among similarly classified prisoners to identify any unwarranted disparities, including disparities among similarly classified prisoners of different demographic groups, in such rates.”); FSA at Title I, § 107(g) (requiring the Independent Review Committee to submit to Congress a report addressing the demographic percentages of inmates ineligible to receive and apply time credits, including by age, race, and sex); FSA at Title VI, § 610(a)(26) (requiring the Director of the Bureau of Justice Statistics to annually submit to Congress statistics on “[t]he breakdown of


minimum."6 These risk categories determine the number of credits an individual may earn by participating in programs and productive activities, and also eligibility to attribute those credits toward supervised release or prerelease custody.7 In other words, the risk categories will directly affect how much time many individuals spend in prison.

The development of PATTERN, as with all risk assessment tools, necessarily relies on both empirical research and moral choices.8 Based on the DOJ Report, we have concerns, but even more questions, in both areas. Additional information is needed to assess many important issues including: PATTERN's accuracy; its scoring mechanisms; its fairness across age, gender, race and ethnicity; how much it will exacerbate racial disparity in the federal prison population; its impact on privacy interests; and whether it is consistent with the congressional mandate to "ensure" that "all prisoners at each risk level have a meaningful opportunity to reduce their classification during the period of incarceration."9

A. Transparency & Accountability: Development, Validation and Bias Testing

Transparency in the methods for developing, validating and bias testing PATTERN is vital. Full transparency is a primary way (along with accountability and auditability) to create and justify confidence by stakeholders and the public. Indeed, across risk assessments in criminal justice, the secrecy that permeates black box instruments causes significant concerns about how reasonable they are in practice.

1. Dataset

Full transparency requires DOJ to release the same dataset used by Grant Duwe, Ph.D., and Zachary Hamilton, Ph.D., to create PATTERN.10 This is consistent not only with the transparency directives in the FSA,11 but also with the advice of leading organizations such as the National Center for State Courts which recommends that independent evaluators determine whether their independent "research findings support or contradict conclusions drawn by the instrument developers."12

6 DOJ Report at 50. 7 See supra note 4.

8 See Michael Tonry, Legal and Ethical Issues in the Prediction of Recidivism, 26 FED. SENT’G REP. 167, 167 (2014). 9 FSA at Title I § 101(a) (codified at 18 U.S.C. § 3632(a)(5)(A)). 10 See DOJ Report at 42-43. 11 See supra notes 1 & 2. 12 Pamela M. Casey et al., National Center for State Courts, Offender Risk & Needs Assessment Instruments: A Primer for Courts 19 (2014) (stressing that third party audits are valued because “it is always helpful to know whether existing research descriptions about the reliability, validity, and fairness of a tool have been replicated by others.” Any “decisions based on a [risk and needs] tool which grossly



• Access to the full dataset would permit independent researchers to assess validity and algorithmic fairness using a variety of measures and calculations.13

• Despite recognizing the existence of multiple measures and calculations concerning validity,14 the DOJ Report focused mostly on the Area Under the Curve (AUC). The AUC, however, has limited utility as a measure of relative risk.15 Further, when tools are assessed using multiple measures of predictive validity (e.g., correlations, calibration metrics, Somers' D), results for the same tools vary.16

• Access to the dataset would allow interested parties to complete 2 x 2 contingency tables (number of false negatives, false positives, true negatives, true positives) for general and violent recidivism at each cutoff (minimum to low; low to medium; medium to high) by age, gender and race/ethnicity groupings. These contingency tables would provide important information on the degree to which the categorizations created by the cut-points capture true positives and true negatives (in addition to the associated recidivism rates that the DOJ Report included).17

• The dataset would allow independent researchers to compute the algorithmic fairness measures called balance for the positive and negative classes by calculating average scores by recidivists versus non-recidivists across each age, gender, and racial/ethnic groupings.
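To illustrate the kind of contingency-table measures described above, the sketch below computes false positive/negative rates and predictive values from hypothetical counts at a single cut-point; it is not drawn from the DOJ Report or the developers' code, and the counts are invented for demonstration.

    # Illustrative sketch: classification-error metrics from a 2 x 2 table.
    def contingency_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
        return {
            "false_positive_rate": fp / (fp + tn),   # non-recidivists classified high risk
            "false_negative_rate": fn / (fn + tp),   # recidivists classified low risk
            "positive_predictive_value": tp / (tp + fp),
            "negative_predictive_value": tn / (tn + fn),
        }

    # Hypothetical counts for one demographic group at one cut-point
    print(contingency_metrics(tp=400, fp=250, tn=900, fn=150))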

misclassifies the risk levels of offenders may not simply fail to improve outcomes; they may actually do harm to the offender.” As a result, “[i]nstrument validation is not only important to ensure that decision making is informed by data, but to establish stakeholder confidence.”); see also Nathan James, CONG. RESEARCH SERV., Risk and Needs Assessment in the Federal Prison System 11 (July 10, 2018) (Congressional Research Service report concerning risk assessment in the federal prison system positively citing the recommendation of the Council of State Governments that independent third parties should be permitted to validate the tool to assess accuracy by race and gender). 13 For example, release of the full dataset would allow independent researchers to calculate relevant measures such as false positive rates, false negative rates, positive predictive value, negative predictive value, equal calibration, balance for the positive class, balance for the negative class, diagnostic odds ratios, correlations, treatment equality, and demographic parity. The importance of these various measures are discussed and calculated regarding other risk tools in sources cited in the DOJ Report. See DOJ Report at 38-39 nn.20-24. 14 See DOJ Report at 28 (discussing multiple algorithmic measures of racial bias).

15 See Melissa Hamilton, Debating Algorithmic Fairness, 52 UC DAVIS L. REV. ONLINE 261 (2019); Jay P. Singh, Predictive Validity Performance Indicators in Violent Risk Assessment, 31 BEHAV. SCI. & L. 8, 16- 18 (2013). 16 See generally Sarah L. Desmarais et al., Performance of Recidivism Risk Assessment Instruments in U.S. Correctional Settings, 13 PSYCHOL. SCI. 206 (2016).

17 See Richard Berk et al., Fairness in Criminal Justice Settings: The State of the Art, SOC. METHODS & RES. (forthcoming 2019).



• Access to the dataset would allow interested parties to complete the bivariate correlations between predictors and risk outcomes which the DOJ Report indicates were completed by the developers, but are not reported.18

• Access to the dataset would permit independent researchers to test for bias, including comparing each racial/ethnic grouping. As discussed above, the DOJ Report indicates the need for additional inquiry regarding racial disparity and other biases.19 First, DOJ data show that black males are far less likely than white males to fall into the two lower risk categories that receive the full benefits of earned time credit and eligibility to use those credits for supervised release or prerelease custody.20 In addition, the relative rate index (RRI) of 1.54 reported in Table 8, but not discussed in the text, comparing white to non-white males, also shows PATTERN has a racially disparate impact.21 More information is needed, including data on Native-Americans and Asians, which is not included in the DOJ Report.22 Access to the data would allow independent researchers to isolate individual factors and determine which contributed to any disparate impact. For example, research on the Post-Conviction Risk Assessment (PCRA) found that "Black offenders tend to obtain higher scores on the PCRA than do White offenders" and that "most (66 percent) of the racial difference in the PCRA scores is attributable to criminal history."23 Because PATTERN plays a role in determining how much time a person spends in prison, a similar finding of racial difference with PATTERN could "exacerbate racial disparities in prison."24 Identifying

18 See DOJ Report at 65 n.17. 19 See supra note 3 and accompanying text. 20 See id. 21 See DOJ Report at 62, tbl. 8 22 See William Feyerherm et al., Identification and Monitoring in Dept. of Just. Office of Juvenile Justice and Delinquency Prevention, Disproportionate Minority Contact Technical Assistance Manual, 1-1, 1-2, 3 (4th ed. 2009) (recommending the RRI be calculated separately for each minority group that comprises at least 1% of the total population scored); BOP Statistics: Inmate Race, Federal Bureau of Prisons, https://www.bop.gov/about/statistics/statistics_inmate_race.jsp. 23 Jennifer L. Skeem & Christopher T. Lowenkamp, Risk, Race, and Recidivism: Predictive Bias and Disparate Impact, 54 CRIMINOLOGY 680, 700 (2016). 24 Id. at 705; see also id. at 703, 705 (explaining that as assessment of whether a tool produces “inequitable consequences” depends on “what decision they inform” and that “some applications of instruments might exacerbate racial disparities in incarceration”).



which factors generate the disparate impact would open an opportunity to brainstorm with people across disciplines about how to ameliorate such impact.25

• Access to the dataset would allow independent researchers to evaluate test bias employing the hierarchical modeling method considered best practice in the educational testing literature as referred to, but not reported in, the DOJ Report.26

• Access to the dataset would allow interested parties to determine whether there are mistakes in the DOJ Report regarding the recidivism rates by ordinal ranking. Table 5 reports general recidivism rates of 9% (minimum), 31% (low), 51% (medium), and 73% (high). Table 9 reports identical recidivism rates in each of these categories for white males,27 which might either be coincidental or a mistake in reporting.

• Similarly, access to the dataset would allow independent researchers to determine the correct AUC for violent recidivism as defined by the developers. The DOJ Report is inconsistent, reporting in one table the AUCs for violent recidivism as .78 for males and .77 for females.28 In another table, they are reversed, indicating AUCs of .77 for males and .78 for females.29 These differences are not significant in terms of numbers, but flaws such as these (reasonable considering the tight time frame which the PATTERN team faced) call for independent audits to check for other potential errors.

2. Eligibility

Additional information is needed regarding the assumptions behind the assertion that "99% of offenders have the ability to become eligible for early release through the accumulation of earned time credits even though they may not be eligible immediately upon admission to prison. That is . . . nearly all have the ability to reduce their risk score to the low category."30 Without more information it is impossible to test this assertion, but it appears suspect in light of: the percentage of the developmental sample that fell in the medium and high categories (52% of all and 58% of men);31 that high scores are likely driven by static factors such as age of first conviction and criminal history

25 See Richard Berk, Accuracy and Fairness for Juvenile Justice Risks Assessments, 16 J. EMPIRICAL LEG. STUD. 175, 184 (2019). 26 See DOJ Report at 29 (referring implicitly to what is known as the Cleary method). 27 See DOJ Report at 59, tbl. 5 & 62, tbl. 9. 28 See DOJ Report at 57, tbl. 3. 29 See DOJ Report at 60, tbl. 7. 30 DOJ Report at 57-58. 31 See DOJ Report at 59, tbls. 5 & 6.


score; and the limited number of programs/productive activities currently available (with correspondingly far fewer points allocated by the tool).32

3. Developmental Sample

Additional information is needed regarding the developmental sample.

• Additional information is needed regarding the attributes of the developmental sample. The DOJ Report includes apparently contradictory, or at least confusing, information about the composition of the developmental sample.
o The DOJ Report indicates the BOP provided its contractors, Duwe and Hamilton, with a dataset used to “develop and validate”33 PATTERN “containing 278,940 BOP inmates released from BOP facilities between 2009 and 2015,” which included “only those inmates released to the community,” and excluded “released inmates who died” and those “scheduled for deportation.”34 DOJ also reports that developers relied on a smaller “eligible sample size” of 222,970, described as “those who were released from a BOP facility to a location in the United States and had received a BRAVO assessment,” which may mean that 55,970 individuals from the original dataset (20%) were excluded from what became the developmental sample because they had not been scored on BRAVO.35 More information is needed regarding the excluded individuals, including demographic characteristics, and reasons they may have been released but not scored on BRAVO. Such a reduction in the sample size could introduce sample bias.
o It appears that the training sample contained individuals who were released in 2009-2013, and the test (or validation) sample contained individuals who were released in 2014-2015.36 More information is needed about why the training and test samples were drawn from different years. Information is also needed regarding what consideration was given to the possibility that there were risk-relevant differences between the groups. For example, policy changes, such as the retroactive 2014 amendment to the drug guidelines, may have resulted in a different composition of

32 See Emily Tiry & Julie Samuels, How Can the First Step Act’s Risk Assessment Tool Lead to Early Release from Federal Prison?, Urban Wire, Crime and Justice (Sept. 5, 2019), https://www.urban.org/urban-wire/how-can-first-step-acts-risk-assessment-tool-lead-early-release-federal-prison.
33 DOJ Report at 43.
34 DOJ Report at 42-43.
35 DOJ Report at 46.
36 See DOJ Report at 49 & 50.


individuals released in 2015 than in prior years.37 It is important for stakeholders to understand whether the differentials in samples here also embed bias into the tool.
o More information is needed regarding why the size of the developmental sample used in the DOJ Report is significantly lower than the number of federal prisoners released in those years, as indicated from another official database. An online tool for calculating the number of released prisoners offered by the Bureau of Justice Statistics indicates that 385,405 individuals were released from federal correctional institutions from 2009-2015.38 Yet, the DOJ Report specifies that its developmental sample includes only 278,940 released prisoners.39 Specifically, it is important to know whether the reported exclusions for death and deportation40 account for the entire differential or whether there are additional explanations. Similarly, more information is needed about the size of the training and test groups. The DOJ Report indicated the training group as 66% of the total developmental sample, with the test group as 33% of the sample, but also described the training group as including 5 years of releases, with the test sample including only 2 years of releases. Information is needed to explain this apparent discrepancy.41
• Additional information is needed regarding the sample descriptive statistics (including recidivism rates). Table 1 provides data on the entire eligible developmental sample, but such data are also needed separately for (a) the training sample and (b) the test sample.42
• Additional information is needed regarding the sample descriptive statistic on “BRAVO-R Initial: History of Escapes.” The total reported percentage is 86%, but no information is provided regarding whether this means there is 14% missing data on this factor, and if so, how missing data cases were scored.43
• Information is needed regarding the inter-rater reliability scores for the evaluators concerning the development sample, for both the training and the test data. These statistics will provide information relevant to whether PATTERN can be scored consistently, as

37 See Remarks for Public Meeting of the U.S. Sentencing Comm’n, Washington, D.C., at 2 (Jan. 8, 2016) (Honorable Patti B. Saris, Chair) (recognizing that approximately 6,000 offenders were released on or about November 1, 2015 as a result of the 2014 amendment to the drug guidelines).
38 These were calculated using an online tool and narrowing to federal prisoners. See Bureau of Justice Statistics, Corrections Statistical Analysis Tool-Prisoners, https://www.bjs.gov/index.cfm?ty=nps.
39 See DOJ Report at 42.
40 See DOJ Report at 42-43.
41 See DOJ Report at 49-50.
42 See DOJ Report at 46-48, tbl. 1.
43 See DOJ Report at 48, tbl. 1.


recognized by the DOJ Report, but for some reason not reported.44 Low inter-rater reliability outcomes decrease the utility of a tool.

4. Weighting

The DOJ Report indicates that PATTERN involves “analytically weighting assessment items,”45 but more information is needed on whether the weights are assigned solely through the points identified for each of the factors included in Table 2,46 or are somehow reweighted in an algorithm not discussed in the report. The DOJ Report provides so few details on weighting that it is unclear what type(s) of models were used (such as regressions) and/or whether any type of machine learning (supervised or unsupervised) was employed. If the former, more information is needed regarding whether and how step-wise procedures were used, the intercorrelations among the predictors, and whether multicollinearity exists. If the algorithm was developed with any form of machine learning, this more “black box” method has different and profound implications for the transparency of the developmental procedures. (An illustrative sketch of the kind of reanalysis such disclosure would permit appears below, immediately following the introduction to Definitions & Scoring.)

5. Overrides

The DOJ Report does not mention overrides. Information is needed regarding whether PATTERN allows for policy overrides and/or discretionary (also referred to as professional) overrides, and if so, whether there will be a supervisory approval process for discretionary overrides. Information is also needed as to whether any of the final scores in the development sample (training and/or testing) involved overrides of original scores and the reasons for such overrides.

6. Relevant Research

Copies of two governmental papers cited in the DOJ Report, but not readily available to the public, must be made available. Specifically, documents detailing the BRAVO-R, from which “PATTERN builds,”47 and relevant RRI computations are cited as important to understanding PATTERN48 but are not readily available to the public.

7. Definitions & Scoring

More information is needed regarding the definitions of key terms and rules for scoring.
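For illustration only, the short Python sketch below (which relies on the numpy, pandas, and statsmodels libraries, and uses invented item names and data rather than anything from the DOJ Report) shows the kind of reanalysis that disclosure of the weighting method and the developmental dataset would permit: fitting a transparent logistic regression, inspecting the fitted weights, and checking for multicollinearity through variance inflation factors. It is a sketch of an audit approach, not a description of how PATTERN was actually built.

# Illustrative sketch only: invented item-level data, not the PATTERN algorithm.
# Shows the kind of reanalysis that disclosure of the weighting method and the
# developmental dataset would allow: fit a transparent logistic regression,
# inspect the fitted weights, and check for multicollinearity.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical scored items (placeholder names, not PATTERN's actual factors).
items = pd.DataFrame({
    "criminal_history_pts": rng.integers(0, 10, n),
    "infraction_pts":       rng.integers(0, 5, n),
    "program_completions":  rng.integers(0, 4, n),
})

# Hypothetical outcome loosely tied to the items, purely for illustration.
lin_pred = (-1.5 + 0.25 * items["criminal_history_pts"]
            + 0.30 * items["infraction_pts"]
            - 0.40 * items["program_completions"])
y = rng.binomial(1, 1 / (1 + np.exp(-lin_pred)))

X = sm.add_constant(items)
fit = sm.Logit(y, X).fit(disp=0)
print(fit.params)   # fitted "weights" an auditor could compare against the Table 2 points

# Variance inflation factors flag multicollinearity among predictors
# (VIF above roughly 5-10 is a common rule of thumb); this is one of the
# diagnostics the Report does not provide.
for i, col in enumerate(X.columns):
    if col != "const":
        print(col, round(variance_inflation_factor(X.values, i), 2))

If any form of machine learning was used instead, analogous disclosure of the model form, the features, and the training procedure would be needed before independent researchers could replicate or audit the weights.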

• Recidivism. It appears that for purposes of developing and testing PATTERN, “general recidivism” is broadly defined to include “any arrest or return to BOP custody following release.”49 More information is needed to determine whether this is as (unduly) broad as it appears, and includes revocations for minor technical violations such as failure to timely

44 See DOJ Report at 27.
45 DOJ Report at 50.
46 See DOJ Report at 53-56.
47 DOJ Report at 44; 64 nn.8 & 9.
48 See DOJ Report at 66 n.25.
49 DOJ Report at 50.


report a change of residence, purportedly lying in response to queries from a probation officer, or failing to timely notify the probation officer of being questioned by police.50 Similarly, it appears that for purposes of developing and testing PATTERN, “violent recidivism” is defined as “violent arrests following release.”51 More information is needed here, as well, regarding what kinds of arrests are considered “violent.” A separate discussion in the DOJ Report, regarding whether the instant offense was violent, appears to cite a definition of “violent recidivism.”52 More information is needed regarding whether this is also the intended definition of violent recidivism. If so, more information is needed about what is included in “other violent.”53 Defenders are concerned that revocations, arrests, and misdemeanor convictions are poor and biased proxies for the kind of serious re-offenses targeted by the recidivism-reduction programming at the core of the FSA. In addition, more information is needed regarding whether any mechanism was used to exclude pseudo-recidivism (prior offenses that were not detected and pursued—subject to arrest or return to prison as a result—until after the instant offense).
• Age of First Arrest/Conviction. More information is needed regarding whether the first risk factor for purposes of developing, testing and implementing PATTERN is age of first arrest or age of first conviction. The DOJ Report contains contradictory information, referring to both arrest and conviction without explanation for the inconsistency.54 If looking to conviction, is the relevant age determined by the individual’s age on the date of the alleged conduct, date of arrest, or date of conviction? More information is also needed about what is being counted in the “under 18” category. It is unclear whether this factor sweeps in all juvenile adjudications (including status offenses), or is limited to convictions in adult court. Among our many concerns with this factor is the relative unreliability of juvenile

50 See, e.g., USSG §5D1.3(c)(4), (c)(5), (c)(9).
51 DOJ Report at 50.
52 DOJ Report at 46 n.16; 65 n.15.
53 DOJ Report at 65 n.15.
54 Compare DOJ Report at 46, tbl. 1 (age of first arrest) with DOJ Report at 45; 53, tbl. 2; 65, n.14 (age of first conviction).


adjudications55 and that “youth of color—and especially black youth—experience disproportionate court involvement.”56
• Infractions. More information is needed regarding the infraction factors. First, what is meant by an “infraction,” a “conviction” for an infraction, and a “guilty finding” for purposes of these factors? It is unclear whether the infraction factors will count any and all disciplinary misconduct. Second, how are infractions scored? Would multiple acts during a single course of conduct be counted as one or more? Would multiple acts processed at the same time (whether a single course of conduct or not) be considered one or more? Third, what is the empirical basis for treating all 100 and 200 level offenses the same, such that refusing a Breathalyzer and possessing pot are scored the same as killing and taking hostages?57 Fourth, is there any limitation on the reach of this factor? For example, does it look only to infractions in the past year, all infractions while in prison for the instant offense (and whether serving the original sentence or a revocation sentence), all infractions while serving any federal sentence, or for any offense ever, regardless of jurisdiction? We have numerous concerns about counting infractions in any form, and particularly minor infractions, for the purposes of determining eligibility for earned time credits and release under the Act. First, there is minimal due process structure over BOP disciplinary actions. Second, infraction cultures and practices likely vary from one BOP facility to another, meaning the likelihood of attracting an infraction may turn on the luck of the draw in institutional assignment. In addition, we are concerned about ex post facto use of infractions to negatively score defendants on PATTERN when individuals had no notice such infractions would count against them for these purposes, particularly in light of the FSA provisions indicating past participation in programs will not be counted to positively score individuals.58
• Programs & Technical/Vocational Courses. More information is needed on the types and descriptions of the programs and technical or vocational courses for which points were given for these two variables. For example, information is needed on the name of the programs/courses, the providers, the personnel involved, the number of hours required, the length of the programs/courses, the program/course goals, the definition of completion,

55 For example, the vast majority of states do not provide jury trials for juveniles, and “children routinely waive their right to counsel without first consulting with an attorney.” Nat’l Juvenile Defender Ctr. (NJDC), Defend Children: A Blueprint for Effective Juvenile Defender Services 10 (Nov. 2016); NJDC, Juvenile Right to Jury Trial Chart (last rev. July 17, 2014), http://njdc.info/wp-content/uploads/2014/01/Right-to-Jury-Trial-Chart-7-18-14-Final.pdf.
56 Katherine Hunt Federle, The Right to Redemption: Juvenile Dispositions and Sentences, 77 LA. L. REV. 47, 52 (Fall 2016).
57 See Dep’t of Justice, Bureau of Prisons, Inmate Discipline Program, Program Statement 5270.09, tbl. 1 (July 8, 2011).
58 See FSA at Title I, § 101(a) (codified at 18 U.S.C. § 3632(d)(4)(B)).


and the locations where the programs/courses were made available. Information is needed about why the direction of the points for the number of technical/vocational courses is the reverse of what might be expected. Specifically, information is needed on why the tool penalizes an individual for taking a technical/vocational course.59 In addition, information is needed on whether there is an error in the description of the technical/vocational factor when it references the number of courses “created” rather than “completed,” and if not, what is meant by courses “created.”
• Drug Treatment and Drug Education. More information is needed regarding the difference between drug treatment and drug education for purposes of scoring PATTERN. More information is also needed regarding how drug treatment “need” is determined and scored, including whether it is based on self-report. The DOJ Report suggests it is tied to the BRAVO drug/alcohol abuse indicator, but it is not clear what data informs this factor, particularly without access to the BRAVO-R document requested above.
• Instant Offense Violent. More information is needed regarding what constitutes a violent offense. The DOJ Report is unclear on the scope of this factor. The discussion in the text of the DOJ Report points to endnote 16, though it appears the content of the note is actually included under endnote 15.60 But even this is not clear because, in contrast with the “instant” offense discussed in the text, endnote 15 defines “violent recidivism” and looks at the nature of the “arrest.”61 If this definition of violent recidivism is consistent with the definition of instant violent offense, more information is needed regarding whether an instant violent offense requires a conviction in the listed categories, and what is meant by the category of “other violent.”62 In addition, information is needed on the empirical basis for including this factor. It appears to be contrary to DOJ studies of national samples that show lower risk of general recidivism for individuals with an instant violent offense, compared with others.63 Is this factor essentially operating as a policy override for other purposes?
• Sex Offender. Additional information is needed on how this factor is scored, including whether it is limited to convictions for sex offenses, or is broader and informed by arrests, self-report, and hearsay, and whether it includes exonerated charges. As with other factors, additional information is also needed on whether there are any time limits on how recent the

59 See DOJ Report at 54, tbl. 2.
60 See DOJ Report at 46, n.16 & 65, n.15. The numbering of the Chapter Three endnotes is off, such that the content of the notes does not always match the text. It appears that the mismatch begins with endnote 14, which according to the text should have provided information on “non-compliance with fiscal responsibility” but instead discusses “Age at first conviction.”
61 See DOJ Report at 46, n.16 & 65, n.15.
62 See DOJ Report at 46, n.16 & 65, n.15.
63 See Mariel Alper & Matthew R. Durose, 2018 Update on Prisoner Recidivism: A 9-Year Follow-up Period (2006-2014) (2019) (Special Report, U.S. Dep’t of Just.).


conduct must be for it to count. In addition, information is needed regarding the empirical basis for including this factor. It appears contrary to DOJ studies of national samples that show lower risk of recidivism for individuals convicted of sex offenses than for other types of offenses.64 Is this factor essentially operating as a policy override for other purposes?
• Criminal History Score. Information is needed on whether this is a static figure based strictly on the U.S. Sentencing Commission guidelines’ criminal history score at the time of sentencing or whether it can increase at reassessment because of events between sentencing and reassessment. Further, can the criminal history score be reduced at reassessment pursuant to a time decay mechanism?
• History of Violence. Information is needed regarding the definition of violence, and whether it requires a conviction for a violent crime. Specifically, which crimes are considered “violent” for purposes of this factor? If not limited to convictions for violent offenses, more information is needed regarding the sources of information that may be considered when assessing this factor, and whether it permits consideration of arrests, prison disciplinary records, hearsay, and/or self-reports. In addition, information is needed on whether there is any time limit for this factor, or some time decay mechanism, as would be supported by available research on desistance.
• History of Escapes. Information is needed regarding the definition of escape, including, for example, whether it would include failure to appear in a pre-trial context, or walking away from a halfway house. Information is also needed regarding whether there is a time limit for inclusion of old escapes, or a time decay mechanism.
• Education Score. Information is needed regarding the ordinal rankings for the education score for the violent recidivism tool.
• Databases. Several factors rely on past criminal conduct. More information is needed regarding the databases that will be accessed to determine recidivism, and the known gaps and biases in such databases.
• Missing Data. Information is needed regarding what adjustments were made for missing data, and the rate of missing data for each predictor. In addition, information is needed regarding the policy going forward when there is missing data on one or more factors in an individual case. For example, will information about missing data be communicated with the risk score and classification?

8. Double Counting

More information is needed to determine the scope of double counting under PATTERN, and whether any consideration has been given to ways to ameliorate it.

64 See Matthew R. Durose et al., Recidivism of Prisoners Released in 30 States in 2005: Patterns from 2005 to 2010 (2014) (Special Report, U.S. Dep’t of Just.).


• Age. Young age will be counted twice for young first offenders, who will be young at time of first arrest/conviction,65 as well as at time of assessment.
• Infractions. Information is needed on whether a single “infraction conviction” that is deemed “serious and violent” would count as both “any” and then again as a “serious and violent” infraction. In addition, would an “infraction conviction” that resulted in a criminal conviction also count toward a criminal history score if criminal history is not static? And could an “infraction conviction” also result in points under the history of violence and/or “sex offender” factors?
• History of Violence. Information is needed on whether a person with multiple violent priors receives multiple point scores in this single variable. For example, in the male general recidivism tool, if an individual had a minor violent offense < 5 years and a serious violent offense > 15 years, would the individual receive 5 points or 7?
• Violence. Information is needed on whether the same violent offense can be counted multiple times, such as in the criminal history score, infraction convictions, instant offense violent, history of violence and/or sex offender.
• Sex Offense. Information is needed on whether the same sex offense can be counted multiple times, such as in the criminal history score, infraction convictions, instant offense violent, history of violence and/or sex offender.
• Criminal History. Information is needed on whether consideration was given to ameliorating the repeated counting of criminal history, first in the imposition of the sentence based on a guideline calculation that relies heavily on criminal history and then throughout PATTERN, including age of first arrest/conviction, sex offender, criminal history score and history of violence. We are concerned about the inclusion and weight (repeatedly) given to this factor for a number of reasons. Some concerns arise from the unique way in which the guidelines count criminal history, such as including all juvenile adjudications on par with adult convictions (with some difference in decay periods), and using sentence imposed rather than time served as a proxy for seriousness of the offense (affecting the number of points received).66 In addition, as mentioned above, research on other risk tools has shown racial differences in scores with black individuals obtaining higher scores than white individuals, where most of the difference “is attributable to criminal history.”67 Criminal history correlates with race because it reflects prior instances of racial disparity in the criminal justice system or disadvantage earlier in life. Criminal history is not just the product of participation in crime, but of biased practices throughout the criminal justice system. Blacks do not sell

65 See supra note 54 and related text regarding the issue of whether the first predictor looks to age of first conviction or arrest.
66 See USSG §4A1.2(d), (e).
67 Jennifer L. Skeem & Christopher T. Lowenkamp, Risk, Race, and Recidivism: Predictive Bias and Disparate Impact, 54 CRIMINOLOGY 680, 700 (2016).


drugs or possess guns at a greater rate than Whites.68 Studies show that Blacks are stopped and frisked or searched at higher rates than Whites, but that Whites who are frisked or searched are found with contraband at higher rates than Blacks who are frisked or searched.69 And Blacks are arrested more than twice as often as Whites.70 Charging decisions and bail determinations further compound these racial disparities as individuals move through the criminal justice system.71 We urge the DOJ to open a discussion with a multidisciplinary team on methods to ameliorate the overreliance upon, and negative impacts of, criminal history.

9. Protective & Promotive Factors

Additional information is needed on whether there are any plans to incorporate additional protective and promotive factors in PATTERN. Currently, program/course participation and educational attainment appear to be the only proxies for protective factors included in PATTERN. Similarly, additional information is needed on whether there are plans to incorporate a desistance factor into PATTERN that would significantly adjust the risk rating according to the literature on the age-crime curve and the literature on cessation of offending.72 We urge DOJ to engage with a multidisciplinary team to consider incorporating more protective and promotive factors to better meet the goals of the FSA.

10. Policy Decisions

Risk assessments are not simply math. Every risk assessment involves moral choices and tradeoffs. Some of our questions in this area are incorporated above, such as whether consideration has been given to ameliorating the effects of certain factors that are unacceptable regardless of predictive value. In addition, information is needed generally regarding the mechanisms in place to ensure that issues which have distinct policy implications will be resolved by appropriate personnel—ideally a

68 See Amy Baron-Evans & David Patton, A Response to Judge Pryor’s Proposal to “Fix” the Guidelines: A Cure Worse than the Disease, 29 FED. SENT’G. REP. 104, 112 (Dec. 1, 2016-Feb. 1, 2017).
69 See id. at 112-13 (collecting studies); see also Radley Balko, Op-Ed., There’s Overwhelming Evidence that the Criminal-Justice System is Racist. Here’s the Proof, WASH. POST, Updated Apr. 10, 2019 (collecting studies).
70 See Bureau of Justice Statistics, Arrest Data Analysis Tool, 2014 (most recent data available), https://www.bjs.gov/index.cfm?ty=datool&surl=/arrests/index.cfm#.

71 See supra note 69; see also USSC, Application and Impact of 21 U.S.C. § 851: Enhanced Penalties for Federal Drug Trafficking Offenders 7, 33-36, figs. 13-14 (2018).

72 See Cecelia Klingele, Measuring Change: From Rates of Recidivism to Markers of Desistance, 109 J. CRIM. L. & CRIMINOLOGY (forthcoming 2019), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3142405; Ralph C. Serin & Caleb D. Lloyd, Integration of the Risk Need, Responsivity (RNR) Model and Crime Desistance Perspective: Implications for Community Correctional Practice, 7 ADVANCING CORRECTIONS 37, 38 (2019).


multidisciplinary team that includes policymakers and stakeholders73—rather than solely data scientists. For example, the decisions on the cut-points, which necessarily impact fairness measures such as false positive rates and positive predictive values, appear to have been made by the researchers and based on arbitrary fractions or multiples of the recidivism rates.74 Yet, where those decisions affect moral and political outcomes with real-world consequences for individuals, they should instead be made by a multidisciplinary team that has the authority and a direct interest in such consequences. Risk tool developers have a natural incentive to focus on overall accuracy. However, accuracy may need to yield to other important goals, such as differential validity, group fairness, and individual rights. Selecting the right tradeoff among these sometimes competing goals is more properly within the purview of policymakers and stakeholders. Here, it appears the cut-points were established somewhat arbitrarily without regard to such consequences as the false discovery rate and false omission rate (the complements of positive predictive value and negative predictive value) and equal calibration, among other validity and fairness measures discussed above.75 (An illustrative calculation of how these measures shift with the choice of cut-point appears below, preceding the Risk Communication discussion.) Because PATTERN was developed to meet the obligations of the FSA, a preferable method for setting cut-points would be attuned to the goal of maximizing incentives for participation in rehabilitative programs and courses. Increasing the cut-point between low and medium would be more suitable to achieve this goal.

Relatedly, information is needed regarding the process for setting the rules governing the combined (final) RLC, and who was involved in that process. The current rule dictates that the highest risk category from the general and violent scales will be used to set the final RLC. Different choices could have been made that would be more suitable to achieve the FSA’s goal of incentivizing and rewarding more individuals to complete programs and courses. For example, a person who scores low or minimum on one scale and medium on the other should have a final RLC of low. And a person who scores high risk on one scale, yet medium risk on the other, should be classified for purposes of the final RLC as medium.

Additional information is also needed regarding the process for deciding on the definition of “recidivism.” This is a policy decision that requires identifying the scope of conduct that should be included, consistent with the purpose of the FSA to successfully reintegrate individuals in the community. For example, what was the process for deciding to include all revocations, including

73 See Partnership on AI, Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System 31 (2019), https://www.partnershiponai.org/wp-content/uploads/2019/04/Report-on-Algorithmic-Risk-Assessment-Tools.pdf (suggesting an oversight body including “legal, technical, and statistical experts, current and formerly incarcerated individuals, public defenders, public prosecutors, judges, and civil rights organizations”); Danielle Kehl et al., Algorithms in the Criminal Justice System 34 (2017), https://dash.harvard.edu/bitstream/handle/1/33746041/2017-07_responsivecommunities_2.pdf?sequence=1&isAllowed=y.
74 See DOJ Report at 50.
75 See DOJ Report at 50-51.


technical violations, and for looking to arrests, despite the literature showing the serious racially disparate impact of looking to arrests, rather than convictions?76 In light of the FSA’s purpose, a more limited definition of recidivism focused on serious offending would be more appropriate than the broad definition used to develop PATTERN.

B. Transparency & Accountability: Implementation

Transparency and accountability are both mandated and essential in the implementation of PATTERN. While much remains unknown in this area, we already have several questions which warrant the attention of a multidisciplinary team as PATTERN is implemented.

1. Privacy/Confidentiality

It appears that several of the factors in PATTERN, and the yet-to-come needs assessment, may require interviews and be based at least partially on self-reporting. This raises several questions and concerns. Additional information is needed on what protections will be in place to honor an individual’s right against self-incrimination. More information is needed on what protections will be in place to prohibit the use of any interview admissions against an individual, either in a new prosecution or a prison disciplinary proceeding. Information is also needed regarding how scores and information obtained in the scoring process will be maintained and confidentiality protected. And information is needed on the data retention policies for risk scores, needs assessments, and information obtained to complete the tools.

2. Challenges

As discussed above, PATTERN scores and accompanying risk categories will directly affect how much time many individuals spend in prison. Information is needed on the procedures for contesting individual scores and category assignments. Risk assessment is unique enough that treating a challenge like any other grievance is not a sufficient process. Potential concerns include discovering factual errors, contesting judgment calls, challenging an override decision, and correcting a scoring miscalculation. To equip individuals to assess and challenge their PATTERN scores, we expect individuals will be provided not only with their final PATTERN score and related risk category, but also scores on each of the individual factors, and information on the limitations of the scores, including the warnings set forth below. And individuals challenging their PATTERN score and category will need more. Indeed, much of the information individuals will need to challenge their scores tracks the information requested above regarding the development, validation and bias testing of PATTERN. In addition, among other information, individuals will need codebooks and scoring sheets, training materials, and inter-rater reliability scores for those scoring the tool. Additional information is needed regarding the plans to ensure adequate information and processes are provided to individuals challenging their PATTERN scores.

76 See Jennifer Eaglin, Constructing Recidivism Risk, 67 EMORY L.J. 59, 94 (2017).
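For illustration only, the short Python sketch below uses invented numbers (not data from the DOJ Report, and not PATTERN’s actual scoring) to show how the choice of cut-point drives the measures discussed in the Policy Decisions section above: the false positive rate, the false discovery rate (1 - positive predictive value), and the false omission rate (1 - negative predictive value). It is a minimal sketch of the tradeoff, not an implementation of PATTERN.

# Illustrative sketch only: hypothetical scores and outcomes, not PATTERN data.
# Shows how moving the cut-point between "low" and "medium/high" changes the
# false positive rate (FPR), false discovery rate (FDR = 1 - PPV), and
# false omission rate (FOR = 1 - NPV).

import random

random.seed(1)

# Hypothetical population: each person has a risk score (0-30 points) and an
# observed recidivism outcome loosely related to that score.
people = []
for _ in range(10_000):
    score = random.randint(0, 30)
    p_recid = 0.05 + 0.02 * score          # assumed relationship, for illustration
    outcome = 1 if random.random() < p_recid else 0
    people.append((score, outcome))

def classification_measures(cut_point):
    """Treat scores >= cut_point as 'flagged' (medium/high) and report
    the resulting error rates."""
    tp = fp = tn = fn = 0
    for score, outcome in people:
        flagged = score >= cut_point
        if flagged and outcome:       tp += 1
        elif flagged and not outcome: fp += 1
        elif not flagged and outcome: fn += 1
        else:                         tn += 1
    fpr = fp / (fp + tn)      # share of non-recidivists wrongly flagged
    ppv = tp / (tp + fp)      # share of flagged persons who actually recidivated
    npv = tn / (tn + fn)      # share of non-flagged persons who did not recidivate
    return {"cut": cut_point, "FPR": round(fpr, 3),
            "FDR (1-PPV)": round(1 - ppv, 3), "FOR (1-NPV)": round(1 - npv, 3)}

# Raising the low/medium cut-point shrinks the flagged group: fewer people are
# wrongly flagged (lower FPR, lower FDR), at the cost of a higher FOR.
for cut in (10, 15, 20):
    print(classification_measures(cut))

On these hypothetical numbers, raising the cut-point between low and medium enlarges the low category and lowers both the false positive rate and the false discovery rate, while the false omission rate rises. Deciding where that balance should sit is precisely the kind of policy judgment that, in our view, belongs with a multidisciplinary team rather than with tool developers alone.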


3. Risk Communication

Information is needed on the manner in which risk scores and categories will be reported both within and outside the BOP. Studies show that risk communication format matters in how decision-makers understand the results and can be manipulated.77 We are concerned that the scores and categories will not be communicated with sufficient context to make the scoring and results translatable to those who were not deeply involved in the development of the tool. To that end, we recommend reporting risk results as the ordinal bins plus that bin’s relevant observed (a) recidivism rate and (b) success rate (1-recidivism rate). The communication should also include the definition of recidivism to contextualize the meaning of the rates. In addition, we recommend including a set of warnings to ensure users of the scores and categories understand the tool’s limits.78 The following list includes ideas on the warnings we believe appropriate in light of our current understanding of PATTERN:

• PATTERN is based on group statistics and cannot assess an individual’s probability of reoffending;
• (as relevant) PATTERN disproportionately judges minorities at higher risk than whites;
• PATTERN relies on arrest data, which may merely replicate biases in policing practices;
• PATTERN does not include all protective or promotive factors that may reduce the individual’s risk prediction;
• PATTERN does not predict the aspects of risk regarding imminence, frequency, severity, or duration;
• PATTERN’s rankings of risk (minimum, low, medium, high) are merely relative to the population studied;
• PATTERN’s score includes criminal history measures that did not require conviction and thereby may overestimate risk because of faulty data;
• PATTERN’s score may be higher based on evidence of juvenile offending;
• PATTERN may increase risk when the individual does not engage in various types of programming; however, such programs may not have been made available to this individual for reasons not within the individual’s control;
• (as relevant) PATTERN factors can count the same events twice or multiple times;

77 See Ashley B. Batastini et al., Does the Format of the Message Affect What Is Heard? A Two-Part Study on the Communication of Violence Risk Assessment Data, 19 J. FORENSIC PSYCHOL. RES. & PRAC. 44, 46 (2019); Daniel A. Krauss et al., Risk Assessment Communication Difficulties: An Empirical Examination of the Effects of Categorical Versus Probabilistic Risk Communication in Sexually Violent Predator Decisions, 36 BEHAV. SCI. & L. 532, 534 (2018); Nicholas Scurich, The Case Against Categorical Risk Estimates, 36 BEHAV. SCI. & L. 554, 558 (2018).
78 See Wisconsin v. Loomis, 881 N.W.2d 749, 765 (Wis. 2016) (identifying necessary cautions, that may evolve, before considering risk assessment at sentencing).


• (as relevant) this PATTERN score represents an override of the algorithm and the reason for the override.

4. User Buy-In

Research studies and anecdotal evidence indicate that users (e.g., those scoring the tool and relevant decision-makers who receive scores) tend to distrust, and find ways to deviate from, algorithmic risk results if they are not included enough in the process and program.79 Information is needed on the methods planned to achieve sufficient user buy-in to improve compliance and consistency in order to achieve the FSA’s goals in this endeavor.

II. NEEDS ASSESSMENT

A core purpose of Title I of the FSA is to help prisoners succeed in their communities upon release and thereby reduce recidivism. The Act contemplates accomplishing this by providing all individuals in prison evidence-based programming that is designed to help them succeed upon release and that has been shown by empirical evidence to reduce recidivism.80 We are deeply concerned that the DOJ has not yet released the needs assessment required by the FSA. We understand from DOJ’s Report that the needs assessment is in the works, and there will be an opportunity to comment on that aspect of the DOJ’s FSA obligations at a later time. In light of that, we raise only a few critical issues here.

1. Programs

Evidence-based programming is the bedrock of the FSA. Other aspects of the risk and needs assessment system only make sense if there is programming. Assessing (and reassessing) needs and assigning (and reassigning) individuals to programming based on those needs require that appropriate and available programming exist.81 In addition, the incentives and rewards identified in the law are contingent on participation in appropriate and available programming.82 DOJ’s Report, however, suggests there are few programs or courses available, as indicated by the relatively few individuals who were scored on them in the developmental sample.83 This is consistent with other information that waitlists to participate in BOP programs are long: 25,000 inmates are currently

79 See Jean-Pierre Guay & Geneviève Parent, Broken Legs, Clinical Overrides, and Recidivism Risk: An Analysis of Decisions to Adjust Risk Levels with the LS/CMI, 45 CRIM. JUST. & BEHAV. 82, 83-84 (2018).
80 See FSA at Title I, § 101(a) (codified at 18 U.S.C. §§ 3632, 3635(3)) and § 102(a) (codified at 18 U.S.C. § 3621(h)).
81 See FSA at Title I, § 101(a) (codified at 18 U.S.C. § 3632(a)(3)-(4)).
82 See FSA at Title I, § 101(a) (codified at 18 U.S.C. § 3632(a)(6), (a)(7), (d)).
83 See DOJ Report at 47, tbl. 1 (showing almost half (49%) of the developmental sample had completed no programs, a vast majority had no technical/vocational courses (82%) or federal industry employment (92%) and well over half (57%) had not had drug treatment while incarcerated despite indication of need).


waiting to be placed in prison work programs,84 at least 15,000 are waiting for education and vocational training,85 and at least 5,000 are awaiting drug abuse treatment.86 More information is needed on how programming will be expanded to ensure the goals of the FSA are met.

2. BOP Current Needs Assessment

The DOJ Report indicates the BOP is using its current needs assessment until one is developed pursuant to the FSA. More information is needed on BOP’s current needs assessment and processes.

3. Responsivity

Information is needed about how responsivity will be considered in connecting needs to programs. Relatedly, additional information is needed on the availability of culturally-sensitive programming (e.g., programs in Spanish for those with weak English skills and modification of 10 Step-like programs for non-Christians).

III. CONCLUSION

PATTERN is a high-stakes tool that directly affects how much time many people will spend in prison. High levels of transparency, accountability and auditability are both required and critical. We appreciate the opportunity to share our questions and concerns and hope there will be additional opportunities for feedback and dialogue after we have received the information identified above.

Very truly yours,

/s David Patton
Executive Director, Federal Defenders of New York
Co-Chair, Federal Defender Legislative Committee

84 See BOP: UNICOR, Federal Bureau of Prisons, https://www.bop.gov/inmates/custody_and_care/unicor_about.jsp (estimating the participation rate at 8%).
85 See Oversight of the Federal Bureau of Prisons Before the H. Subcomm. on Crime, Terrorism, Homeland Security and Investigations of the H. Comm. on the Judiciary, 115th Cong. 20 (2018) (BOP Director Inch).
86 See Dep’t of Justice, Bureau of Prisons, Drug Abuse Treatment Program, 81 Fed. Reg. 24484, 24488 (Apr. 26, 2016) (“over 5,000 inmates waiting to enter treatment”); Colson Task Force, at 36 (“at the end of FY 2014, more than 12,300 people systemwide were awaiting drug abuse treatment”).
