Human Decisions and Machine Predictions
Jure Leskovec (@jure)
Including joint work with Himabindu Lakkaraju, Sendhil Mullainathan, Jon Kleinberg, Jens Ludwig

Machine Learning

Humans vs. Machines
§ Given the same data/features X, machines tend to beat humans:
§ Games: chess, AlphaGo, poker
§ Image classification
§ Question answering (IBM Watson)
§ But humans tend to see more than machines do. Humans make decisions based on (X, Z)!

Humans Make Decisions
Humans make decisions based on (X, Z)!
§ The machine is trained on P(Y|X)
§ But humans use P(Y|X, Z) to make decisions
§ Could this be a problem? Yes! Why? Because the data is selectively labeled, and cross-validation can severely overestimate model performance.

Plan for the Talk
1) Can machines make better decisions than humans?
2) Why do humans make mistakes?
3) Can machines help humans make better decisions?

Example Problem: Criminal Court Bail Assignment

Example Decision Problem
U.S. police make >12M arrests per year.
Q: Where do people wait for trial?
Release vs. detain is high stakes:
§ Pre-trial detention spells average 2-3 months (and can run 9-12 months)
§ Nearly 750,000 people are held in jails in the US
§ Consequential for jobs and families as well as for crime

Judge's Decision Problem
The judge must decide whether to release (bail) or detain (jail).
Outcomes: a defendant out on bail can behave badly:
§ Fail to appear at trial
§ Commit a non-violent crime
§ Commit a violent crime
The judge is making a prediction!

Bail as a Decision Problem
§ Not an assessment of guilt on the current charge
§ Not a punishment
§ The judge may only consider failure-to-appear risk and crime risk

The ML Task
We want to build a predictor that, based on a defendant's criminal history, predicts the defendant's future bad behavior:
Defendant characteristics → Learning algorithm → Prediction of bad behavior in the future
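To make the prediction task concrete, here is a minimal sketch of training such a predictor, assuming a tabular dataset of released defendants with the administrative features listed on the next slides and a 0/1 misbehavior label. The file name, column names, and the scikit-learn decision tree are illustrative assumptions, not the talk's actual pipeline.

```python
# Minimal sketch of the bail prediction task (illustrative; not the actual
# pipeline from the talk). Assumes a CSV of RELEASED defendants with
# administrative features and a 0/1 label for misbehavior while on bail.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

df = pd.read_csv("released_defendants.csv")     # hypothetical file
X = df.drop(columns=["misbehaved"])             # criminal-history features (X)
y = df["misbehaved"]                            # FTA or new crime while on bail (Y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# The talk uses deep decision trees (typical depth 20-25); the exact
# hyperparameters here are guesses.
model = DecisionTreeClassifier(max_depth=25, min_samples_leaf=50, random_state=0)
model.fit(X_train, y_train)

# Estimated P(Y = 1 | X): probability of bad behavior if released.
p_crime = model.predict_proba(X_test)[:, 1]
print("held-out AUC:", roc_auc_score(y_test, p_crime))
```

Note that this held-out score is computed only on defendants the judges chose to release, which is exactly where the selective-labels problem discussed below enters.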
What Characteristics to Use?
Only administrative features:
§ Common across regions
§ Easy to get
§ No "gaming the system" problem
Only features that are legal to use:
§ No race, gender, religion
Note: Judge predictions may not obey either of these.

Data: Defendant Features
§ Age at first arrest
§ Times sentenced to residential correction
§ Level of charge
§ Number of active warrants
§ Number of misdemeanor cases
§ Number of past revocations
§ Current charge is domestic violence
§ Is first arrest
§ Prior jail sentence
§ Prior prison sentence
§ Employed at first arrest
§ Currently on supervision
§ Arrest for new offense while on supervision or bond
§ Has active warrant
§ Has active misdemeanor warrant
§ Has other pending charge
§ Had previous adult conviction
§ Had previous adult misdemeanor conviction
§ Had previous adult felony conviction
§ Had previous Failure to Appear
§ Prior supervision within 10 years
§ Had previous revocation
(only legal features that are easy to get)

Data: Kentucky & Federal
Jurisdiction | Number of cases | Fraction of people released | Crime | Failure to Appear at trial | Non-violent crime | Violent crime
Kentucky | 362k | 73% | 17% | 10% | 4.2% | 2.8%
Federal Pretrial System | 1.1m | 78% | 19% | 12% | 5.4% | 1.9%

Applying Machine Learning
§ Just apply machine learning:
§ Use the labeled dataset (released defendants)
§ Train a learning model to predict whether a defendant, when out on bail, will misbehave
§ Report accuracy on the held-out test set

Prediction Model
[Figure: top four levels of the decision tree; each node shows the probability of NOT committing a crime. Typical tree depth: 20-25.]
But there is a problem with this…

The Problem
§ Issue: the judge sees factors the machine does not
§ The judge makes decisions based on P(Y|X, Z)
§ X … prior crime history
§ Z … other factors not seen by the machine
§ The machine makes decisions based on P(Y|X)
§ Problem: the judge is selectively labeling the data based on P(Y|X, Z)!

Problem: Selective Labels
The judge is selectively labeling the dataset:
§ If the judge releases, we observe whether a crime occurs (Yes/No)
§ If the judge jails, the outcome is unobserved (?)
§ We can only train on released people
§ By jailing, the judge is selectively hiding labels!

Selection on Unobservables
Selective labels introduce bias (see the sketch after this list):
§ Say young people with no tattoos have no risk of crime. The judge releases them.
§ But the machine does not observe whether defendants have tattoos.
§ So the machine would falsely conclude that all young people commit no crime.
§ And falsely presume that by releasing all young people it does better than the judge!
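The tattoo example can be reproduced in a few lines. Below is a toy simulation (entirely made-up numbers, not the study's data) showing how a model that only ever sees released defendants, and is blind to Z, would look far better than it actually performs:

```python
# Toy simulation of selection on unobservables (illustrative assumption):
# the judge sees an extra signal Z (tattoos) that the machine never observes.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
young = rng.integers(0, 2, n)          # X: observed by judge AND machine
tattoo = rng.integers(0, 2, n)         # Z: observed by judge only

# True crime risk: young defendants WITHOUT tattoos are low risk,
# young defendants WITH tattoos are high risk.
p_crime = np.where(young == 1, np.where(tattoo == 1, 0.6, 0.05), 0.3)
crime = rng.random(n) < p_crime

# Judge releases low-risk people using both X and Z.
released = p_crime < 0.2

# The machine only ever sees outcomes for released defendants, and only X.
print(f"crime rate of released young defendants: "
      f"{crime[released & (young == 1)].mean():.3f}")   # looks tiny

# But the policy "release everyone who is young" would actually yield:
print(f"true crime rate if ALL young defendants were released: "
      f"{crime[young == 1].mean():.3f}")                # much higher
```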
Challenges
(1) Selective labels
(2) Unobservables Z

Solution: 3 Properties
Our solution depends on having multiple judges and three key properties:
§ Natural variability between judges
§ Random assignment of cases to judges
§ One-sidedness of the problem

Solution: Three Observations
1) The problem is one-sided:
§ Releasing a jailed person is a problem (the outcome was never observed)
§ Jailing a released person is not (the outcome is observed)

Judge \ Algorithm | Release | Jail
Judge: Release | ✓ | ✓
Judge: Jail | ✗ | ✓

2) In the Federal system, cases are randomly assigned
§ This means that on average all judges see similar cases

3) Natural variability between judges
§ Due to human nature there will be some variability between judges
§ Some judges are more strict while others are more lenient

Solution: Contraction
Use the most lenient judges!
§ Take the released population of a lenient judge
§ Ask which of those defendants the algorithm would jail to make the judge less lenient
§ Compare the resulting crime rate to a strict human judge
Note: Does not rely on judges having "similar" predictions.

Making lenient judges strict:
[Figure: defendants released by a lenient judge, ordered by predicted crime probability; the algorithm jails the riskiest of them until the release rate matches that of a strict human judge.]
What makes contraction work:
§ 1) Judges have similar cases (random assignment of cases)
§ 2) Selective labels are one-sided

Evaluating the Prediction
Predictions create a new release rule:
§ Parameterized by the fraction released
§ For every defendant, predict P(crime)
§ Sort defendants by increasing P(crime) and "release" the bottom k
§ Note: this is different from AUC
What is the fraction-released vs. crime-rate tradeoff?
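Below is a minimal sketch of the contraction evaluation just described, under assumed variable names and a simplified definition of the crime rate (counted over the lenient judge's entire caseload); it illustrates the logic rather than reproducing the paper's exact procedure.

```python
# Contraction sketch (illustrative): evaluate the algorithm's release rule at a
# stricter release rate using only the MOST LENIENT judge's released cases,
# for whom outcomes are actually observed.
import pandas as pd

def contraction_crime_rate(lenient_released: pd.DataFrame,
                           lenient_release_rate: float,
                           target_release_rate: float) -> float:
    """lenient_released: one row per defendant RELEASED by the most lenient judge,
    with columns 'p_crime' (model score) and 'crime' (observed 0/1 outcome).
    Jails the highest-risk released defendants until only target_release_rate of
    the judge's full caseload remains released, then returns the crime rate
    measured over that full caseload (an assumed normalization)."""
    assert target_release_rate <= lenient_release_rate, \
        "contraction can only make a lenient judge stricter"
    n_cases = len(lenient_released) / lenient_release_rate   # total cases assigned
    n_keep = int(round(target_release_rate * n_cases))       # defendants left released
    keep = lenient_released.nsmallest(n_keep, "p_crime")     # release lowest risk first
    return keep["crime"].sum() / n_cases

# Hypothetical usage: shrink a lenient judge's 90% release rate down to a strict
# judge's 73%, then compare against the strict judge's observed crime rate.
# ml_crime_rate = contraction_crime_rate(lenient_released_df,
#                                        lenient_release_rate=0.90,
#                                        target_release_rate=0.73)
```

Because cases are randomly assigned, the strict judge's caseload is statistically comparable to the lenient judge's, so the two crime rates can be compared directly.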
Overall Predicted Crime
Holding the release rate constant, the crime rate is reduced by 0.102 (a 58.45% decrease).
[Figure: crime rate (includes FTA, NCA, NVCA) vs. release rate. Judges: 17% overall crime rate at a 73% release rate; machine learning (all features + ML algorithm): 11%. Release rates of 73% and 88% are marked. Standard errors too small to display.]
Decision trees beat logistic regression by 30%. Contrast this with AUC/ROC.

Predicted Failure to Appear
Holding the release rate constant, the FTA rate is reduced by 0.06 (a 60% decrease).
[Figure: FTA rate vs. release rate. Judges' decisions: 10% FTA rate at a 73% release rate; ML algorithm: 4.1%.]

Predicted Non-Violent Crime
Holding the release rate constant, the crime rate is reduced by 0.025 (a 58% decrease).
[Figure: NCA rate vs. release rate. Judges' decisions: 4.2% at a 73% release rate; ML algorithm: 1.7%. Standard errors too small to display.]

Predicted Violent Crime
Holding the release rate constant, the crime rate is reduced by 0.014 (a 50% decrease).
[Figure: NVCA rate vs. release rate. Judges' decisions: 2.8% at a 73% release rate; ML algorithm: 1.4%. Standard errors too small to display.]

Effect of Race
ML does not beat the judge by racially discriminating.

Table 7: Racial Fairness (standard errors in parentheses; Black / Hispanic / Minority give the percentage of the jail population)
Release Rule | Crime Rate | Black | Hispanic | Minority | Drop Relative to Judge
Distribution of Defendants (Base Rate) | | .4877 | .3318 | .8195 |
Judge | .1134 (.0010) | .573 (.0029) | .3162 (.0027) | .8892 (.0018) | 0%
Algorithm: Usual Ranking | .0854 (.0008) | .5984 (.0029) | .3023 (.0027) | .9007 (.0017) | -24.68%
Match Judge on Race | .0855 (.0008) | .573 (.0029) | .3162 (.0027) | .8892 (.0018) | -24.64%
Equal Release Rates for All Races | .0873 (.0008) | .4877 (.0029) | .3318 (.0028) | .8195 (.0023) | -23.02%
Match Lower of Base Rate or Judge | .0876 (.0008) | .4877 (.0029) | .3162 (.0027) | .8039 (.0023) | -22.74%
Notes: Table reports the potential gains of the algorithmic release rule relative to the judge at the judges' release rate, with respect to crime reductions and the share of the jail population that is black, Hispanic, or either black or Hispanic. The first row shows the share of the defendant population overall that is black or Hispanic. The second row shows the results of the observed judge decisions. The third row shows the results of the usual algorithmic re-ranking release rule, which does […]