
Predicting Brazilian court decisions

André Lage-Freitas, Héctor Allende-Cid, Orivaldo Santana, and Lívia de Oliveira-Lage

arXiv:1905.10348v1 [cs.SI] 20 Apr 2019

Abstract: Predicting case outcomes is useful but still an extremely hard task for attorneys and other Law professionals. It is not easy to search case information to extract valuable information, as this requires dealing with huge data sets and their complexity. For instance, the complexity of the Brazilian legal system along with its high litigation rates makes this problem even harder.

This paper introduces an approach for predicting Brazilian court decisions which is also able to predict whether the decision will be unanimous. We developed a working prototype which achieves 79% accuracy (F1-score) on a data set composed of 4,043 cases from a Brazilian court. To our knowledge, this is the first study to forecast judge decisions in Brazil.

Index Terms: legal outcome forecast, predictive algorithms

André Lage-Freitas is with Universidade Federal de Alagoas. Héctor Allende-Cid is with Pontificia Universidad Católica de Valparaíso. Orivaldo Santana is with Universidade Federal do Rio Grande do Norte. Lívia de Oliveira-Lage is a Prosecutor at the State Attorney's Office, Procuradoria Geral do Estado de Alagoas.

April 20th, 2019. Authorship: A.L.F. proposed the research problem, supervised this work, and executed the [...]. H.A.C. proposed and developed NLP and algorithms. O.S. verified the methodology. L.O.L. supervised and reviewed the Law aspects of this work. All authors discussed the results and contributed to the final manuscript. The authors declare that they have no competing interests and that they received no funding for this study.

I. INTRODUCTION

Since the Code of Hammurabi¹, we have been trying to improve legal certainty in human relationships by making public the law and the rulings of courts. In addition to publicizing the laws, legal systems usually provide further support to legal certainty through judicial decisions. These decisions are useful not only for judging specific situations, but also for influencing the behavior of society by exposing the legal consequences of our actions. Thereby, predicting legal decisions is fundamental to understanding the legal consequences of our actions as well as to supporting law professionals in improving the quality of their work.

In Brazil, for example, the decisions of lower court judges might be appealed to Brazilian courts (Tribunais de Justiça) to be reviewed by second instance court judges. In an appellate court, judges decide together upon a case and their decisions are compiled in Agreement reports named Acórdãos. Similar to lower court decisions, Acórdãos include Report, Fundamentos, Votes², and further metadata such as judgment date, attorneys, judges, etc. These Agreement documents are very useful for understanding jurisprudence, thus guiding lawyers and court members about the decisions. For instance, attorneys often use these documents to prepare cases, while judges should take them into account, or even use them as guidelines, for subsequent decisions.

In order to understand Acórdão decisions, one has to read the subject in the summary, the decision Report, how each judge voted in the case (Votes), and the final decision, which can be unanimous or not. Moreover, each Acórdão might have more than one decision, regarding one or more appealed case claims, which can increase the complexity of the Acórdão. This problem becomes harder as there usually are hundreds, and sometimes thousands, of Acórdãos related to the case on which a Law professional is working.

A very common and extremely important task for Law professionals is to speculate how a specific court would decide given the ideas and the facts which compose the case. For example, this is useful for preparing and tuning a case to obtain a favourable decision. Hence, attorneys can rely on substantial assumptions on how judges will decide based on their arguments. Although this information can be found in public Acórdãos, the myriad of available documents makes this task very complex and error prone, even for lawyers.

In addition to Brazil, several other legal systems in the world share the very same problem of predicting legal decisions. The challenge is hence generalized as how to predict legal decisions with a satisfactory level of accuracy to support the work of attorneys, judges, and other professionals such as accountants and real estate offices. By satisfactory, we mean that the quality of the prediction in terms of accuracy should be better than that of Law experts.

Nevertheless, it is still very hard to perform any legal decision prediction with satisfactory accuracy, even though computers have been used for this challenge for decades [7]. For instance, Ashley and Brüninghaus [2] propose a method for classifying and predicting cases which reaches 91.8% accuracy; however, the evaluation takes into account only a small data set (146 cases). Katz et al. [6] use historical data for predicting USA Supreme Court decisions by classifying decisions in two and three classes and by presenting judge profiles. That approach reaches 70.2% accuracy for predicting case decisions and is assessed on a data set with 28,000 cases. Also using data from the USA Supreme Court, Ruger et al. [11] show how the predictions of Law experts perform in comparison to a trained statistical model for different Law fields by using fewer than 200 cases. In [1], Aletras et al. use a Support-vector Machine (SVM) for predicting whether cases violate Articles 3, 6, and 8 of the European Convention on Human Rights. Their approach achieved 78% accuracy on 584 European Court cases separated by subject.

Other related work takes advantage of machine learning techniques to support further legal tasks. In [8], the authors propose a framework for automatically judging legal decisions by using Attention neural network models. They applied the approach to divorce decisions in China. In [13], Shulayeva et al. separate legal principles from case facts in legal documents by using a Naive Bayesian Multinomial classifier. In [4], the authors propose to use transfer learning to recognize the same words which have different meanings in different contexts, i.e., the named-entity linking task. In [3], the authors use Bayesian networks to classify legal decisions from a Brazilian Labor court and conclude that both employees and employers are roughly successful in their litigation. Last, Ruhl et al. [12] overview some perspectives on how complex systems are useful for supporting policy-makers on legal-related topics such as appellate jurisprudence and tax policy analysis.

We are also motivated by recent results that show that intelligent systems can perform better than Law experts³. Our hypothesis is that by taking advantage of Natural Language Processing (NLP) and Machine Learning techniques it is possible to build a system that delivers high quality legal decision predictions. Different from the closest related works of this paper [1], [2], [6], which address United States and European courts, we propose an approach for legal decision prediction for Brazilian courts which also predicts whether the court decision will be unanimous. Moreover, in contrast to [1], [2], we trained a model on a thousand-scale data set with 4,043 cases. Finally, in contrast to [1], our approach does not rely on a binary classification problem, since it uses three possible prediction results, nor does it require that the case data set be separated by specific Law articles, hence being a more generic approach.

The remainder of this paper is structured as follows. In Section II, we present details on the aforementioned problem such as the case study and the methodology employed. Section III presents the results while Section IV concludes our investigation and proposes future directions on this subject.

¹ https://en.wikipedia.org/wiki/Code_of_Hammurabi
² Cf. Brazilian Law: Art. 489, Lei 13105/15.
³ https://www.bbc.com/news/technology-41829534

II. MATERIAL AND METHODS

The research question which guides our study is how to predict legal decisions with a satisfactory level of accuracy for Brazilian courts, including the prediction of the unanimous behavior of the court. The next sections provide further information about our assumptions and the proposed methodology.

A. A generic approach

We focus on Brazilian courts as the Brazilian legal system is not trivial. We believe that if we are able to solve this problem for such a complex legal system, our approach would also fit simpler ones, or it could be straightforwardly adapted to more complex legal systems which share similarities. Nevertheless, it is worth stating that other legal systems also rely on similar documents. For instance, in the Indiana Court of Appeals (United States), the Appellate Court decisions are composed by a group of three judges whose decisions (opinions) are divided in Case Summary, Facts, Procedural History, and the court conclusion at the end of that document. In France, the Appellate Court (Cour d'appel) also renders decisions coming from the agreement of three judges. That decision is called an Arrêt, whose structure is also composed of the legal basis for the appeal, the case history, and the final decision.

Further, we share the same assumption as Aletras et al. [1]: "there is enough similarity between (at least) certain chunks of the text of published judgments and applications lodged with the Court and/or briefs submitted by parties with respect to pending cases".

B. Decision labels and data set

Regarding the flow of a Brazilian appeal, when lawyers lodge an appeal at a court it is analyzed by a group composed of three judges to check whether the appeal is able to be judged by the court. If the appeal does not meet the formal requirements, the appeal is identified as not cognized (recurso não conhecido), hence not judged by the court. Otherwise, the appeal is judged and classified in various categories. We therefore assumed that court decisions can be classified by using the following labels:

• not-cognized, when the appeal was not accepted to be judged by the court;
• yes, for fully favourable decisions;
• partial, for partially favourable decisions;
• no, when the appeal was denied;
• prejudicada, to mean that the case could not be judged because of an impediment, for instance when the appellant died or gave up on the case;
• administrative, when the decision refers to a court administrative subject such as a conflict of competence between lower court judges.

In addition to the decision labels, an orthogonal concern of Brazilian court decisions, as well as of other legal systems, refers to their unanimity aspect, being labeled as:

• unanimity, which means that the decision was unanimous among the three judges that voted in the case; and
• not-unanimity, which means that one of the judges disagreed with the decision.

With respect to the data set, we relied on 4,762 decisions (Acórdãos) from a State higher court (appellate court), the Tribunal de Justiça de Alagoas. From this data set, we removed the decisions that had repeated descriptions so as not to bias the sample, thus resulting in 4,332 examples. Repeated decision descriptions occur owing to very similar cases which share the same description. Moreover, for the sake of predictability, we removed all the decisions classified as prejudicada, not-cognized and administrative, as these labels refer to very peculiar situations which are not useful for the prediction purposes addressed by this paper. Finally, the total number of examples was 4,043 cases.

C. Methodology

Figure 1 depicts an overview of our methodology. From the legal decision data set, we extracted and separated the texts which hold information about the case description, the decision, and their unanimity aspect. This Natural Language Pre-processing task includes removing stop-words and word suffixes for improving the capacity of word representation. Then, we took advantage of Term Frequency-inverse Document Frequency (tf-idf) to strongly increase the importance of relevant words while decreasing the importance of general repetitive words not relevant to the addressed problem. Next, we used the texts which refer to decisions and unanimity to classify them into one of the possible labels (cf. Section II-B). As a result, we built a structured training data set depicted by Table I.

Next, we used 80% of the data set to train a Machine Learning model which was then assessed by using the remaining 20% of the data set (cf. Figure 1). To train a model means to automatically find out which parameters are the most suitable for predicting decisions based on the training data set. Because we address decision and unanimity predictions, two models must be trained, one for each prediction. Once trained, the models can be used to predict decision and unanimity given a case description. Last, to evaluate our approach, we used the F1-score metric and performed 5-fold cross-validation to improve the practicability of our approach. Results are thus reported as success accuracy rates in percentage.

Fig. 1. Methodology. Specific data from case outcomes are extracted then classified to different labels. A machine learning model is hence trained and assessed by using 5-fold cross-validation.

Furthermore, in order to assess the methodology, we developed a working prototype in Python. We used the NLTK framework [9] for Natural Language Processing in such a way that our prototype is easily configurable for various languages in addition to Portuguese. The prototype also provides a graphical user interface which can be accessed from any Web browser.

III. RESULTS

Our approach was able to score 78.99% F1-score (σ² = 0.000017) when predicting legal decisions for the Tribunal de Justiça de Alagoas Brazilian court by using 4,043 judge decisions. The number of samples for each label is shown in Table II.

In order to analyze our prediction over a more uniformly distributed data set, we randomly removed 1,549 no-labeled decisions to have the same number as partial-labeled decisions. Table III depicts the distribution of each decision label. The accuracy of case outcome prediction in this situation was 74.07% (σ² = 0.00029) for the F1-score metric. This result is very interesting as it does not validate our previous assumption that the not regularly distributed data set would bias the model.

With respect to predicting the unanimous behaviour of the Tribunal de Justiça de Alagoas Brazilian court, our approach scored 98.46% (σ² = 0.000031) for the F1-score metric. This assessment was performed on a data set with 2,274 cases. From the 4,332-example data set, which had no repeated decision descriptions, we removed the samples that either our classifier did not manage to label or whose decision itself did not have any information about unanimity. The resulting data set had 2,289 samples. Then, we removed from this data set the decisions whose labels were prejudicada, not-cognized and administrative, as these labels are not relevant for the addressed prediction problem, resulting in a data set with 2,274 examples. The distribution of unanimity and not-unanimity labels is depicted by Table IV.

The very high unanimity prediction accuracy of 98.46% is explained by the fact that most decisions are unanimous, therefore the model was biased to this label. We indeed expected that most decisions were unanimous since this is well known by law experts. However, the great difference between unanimous and non-unanimous decisions is a surprising result. In order to understand how our approach would perform when predicting unanimity by using a more uniformly distributed data set, we therefore performed another evaluation by randomly removing decisions whose label was unanimity to meet the same number of not-unanimity decisions. The resulting data set had 90 examples, half of them labeled as unanimity and the other half as not-unanimity. With this configuration, our prototype reached 76.94% (σ² = 0.015) F1-score accuracy.

IV. CONCLUSION

This paper proposes a methodology for predicting Brazilian court legal decisions which is able to reach 79% accuracy when employed on a Brazilian court data set with 4,043 cases. Our approach is able to predict case outcomes by using three different labels: yes, no, partial. Moreover, the proposed method also predicts whether the decision will be unanimous, which fits not only the Brazilian legal system, but also several others whose decisions are judged by more than one judge. The unanimity prediction performance of our approach is 77% accuracy. To our knowledge, this is the first study to predict Brazilian legal decisions.

Moreover, our approach is easy to use as it only requires that users provide the description of their litigation, and the output will be one of the aforementioned case outcome labels along with its predicted unanimity label. This information is very useful for attorneys, judges, and other Law professionals as it provides practical support for their work. Moreover, our contribution also includes a working prototype which can be configured for further languages as well as for different data sets.
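The evaluation protocol described in Section II-C (5-fold cross-validation scored with F1, reported together with σ²) can be sketched as follows. This is a minimal illustration and not the authors' prototype: it assumes a macro-averaged F1 and that σ² denotes the variance of the score across folds, neither of which is stated in the paper. The names `macro_f1`, `k_fold_indices`, `cross_validate`, and the `majority_baseline` stand-in model are hypothetical. With k = 5, each fold trains on 80% of the samples and tests on the remaining 20%.

```python
import random
from statistics import mean, pvariance


def f1_for_label(y_true, y_pred, label):
    """F1-score of a single label, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 (assumption: macro averaging)."""
    labels = sorted(set(y_true))
    return mean(f1_for_label(y_true, y_pred, label) for label in labels)


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(texts, labels, train_and_predict, k=5):
    """Return mean and variance of the F1-score over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(texts), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(texts)) if i not in held_out]
        predictions = train_and_predict(
            [texts[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [texts[i] for i in test_idx],
        )
        scores.append(macro_f1([labels[i] for i in test_idx], predictions))
    return mean(scores), pvariance(scores)


def majority_baseline(train_texts, train_labels, test_texts):
    """Hypothetical stand-in model: always predict the most frequent label."""
    most_common = max(set(train_labels), key=train_labels.count)
    return [most_common] * len(test_texts)


# Toy data with a skewed label distribution (placeholder texts).
toy_labels = ["no"] * 60 + ["partial"] * 20 + ["yes"] * 20
toy_texts = ["case description"] * len(toy_labels)
score, variance = cross_validate(toy_texts, toy_labels, majority_baseline)
print(f"F1 = {score:.4f}, variance = {variance:.6f}")
```

Any real classifier can be plugged in through the `train_and_predict` argument, so the same loop would serve both the decision model and the unanimity model.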

TABLE I
Training data set includes decision texts and labels which were classified according to respective decision texts. E.g., in Sample 1, provido was classified as a favorable (yes) decision and Unanimidade was classified as unanimity.

| Data | Decision description | Decision | Unanimity | Label | Unanimity label |
|---|---|---|---|---|---|
| Sample 1 | Direito Processual Civil... | Recurso conhecido e provido | Unanimidade | yes | unanimity |
| Sample 2 | Apelação criminal... | Recurso conhecido e parcialmente provido | Decisão unânime | partial | unanimity |
| Sample 3 | Apelação Cível em Ação Ordinária... | Recurso conhecido e não provido | Decisão unânime | no | unanimity |
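The mapping from decision texts to labels illustrated in Table I could be approximated with keyword rules, as in the sketch below. The paper does not describe how its classifier assigns labels, so the rule list, the regular expressions, and the function names `label_decision` and `label_unanimity` are all assumptions made for illustration; only the three sample texts and their labels come from Table I.

```python
import re
import unicodedata


def normalize(text):
    """Lower-case and strip accents so that 'não' and 'nao' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical keyword rules, checked in order: the more specific phrases
# must come before the bare "provido" rule.
DECISION_RULES = [
    ("not-cognized", r"\bnao conhecid[oa]\b"),
    ("prejudicada", r"\bprejudicad[oa]\b"),
    ("partial", r"\bparcialmente provid[oa]\b"),
    ("no", r"\bnao provid[oa]\b|\bimprovid[oa]\b"),
    ("yes", r"\bprovid[oa]\b"),
]


def label_decision(decision_text):
    """Return the first matching decision label, or None if no rule fires."""
    text = normalize(decision_text)
    for label, pattern in DECISION_RULES:
        if re.search(pattern, text):
            return label
    return None


def label_unanimity(unanimity_text):
    """Return 'unanimity', 'not-unanimity', or None when nothing matches."""
    text = normalize(unanimity_text)
    if re.search(r"\bmaioria\b|\bnao unanim", text):
        return "not-unanimity"
    if re.search(r"\bunanim", text):
        return "unanimity"
    return None


# The three samples of Table I.
for decision, unanimity in [
    ("Recurso conhecido e provido", "Unanimidade"),
    ("Recurso conhecido e parcialmente provido", "Decisão unânime"),
    ("Recurso conhecido e não provido", "Decisão unânime"),
]:
    print(label_decision(decision), label_unanimity(unanimity))
```

Returning `None` for unmatched texts mirrors the situation mentioned in Section III, where samples that could not be labeled were removed from the unanimity data set.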

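The balancing step used in Section III, where decisions of the majority label are removed at random until that label matches a smaller one, can be sketched as below. The label counts come from the paper; the function name `undersample`, the fixed seed, and the placeholder case texts are assumptions for illustration only.

```python
import random
from collections import Counter


def undersample(samples, label_of, majority_label, target_size, seed=0):
    """Randomly keep only `target_size` samples of `majority_label`."""
    rng = random.Random(seed)
    majority = [s for s in samples if label_of(s) == majority_label]
    others = [s for s in samples if label_of(s) != majority_label]
    kept = rng.sample(majority, target_size)
    balanced = others + kept
    rng.shuffle(balanced)
    return balanced


def get_label(sample):
    """Samples are (text, label) pairs; return the label."""
    return sample[1]


# Decision labels: 2,415 "no" decisions reduced to 866, i.e. 1,549 removed.
decisions = ([("case text", "no")] * 2415
             + [("case text", "partial")] * 866
             + [("case text", "yes")] * 762)
balanced_decisions = undersample(decisions, get_label, "no", target_size=866)
decision_counts = Counter(get_label(s) for s in balanced_decisions)
print(decision_counts)

# Unanimity labels: 2,229 "unanimity" decisions reduced to 45.
votes = ([("case text", "unanimity")] * 2229
         + [("case text", "not-unanimity")] * 45)
balanced_votes = undersample(votes, get_label, "unanimity", target_size=45)
print(len(balanced_votes))
```

Undersampling discards data, which is consistent with the larger variance reported for the 90-example unanimity experiment.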
TABLE II
Distribution of decisions according to their labels.

| Labels | no | partial | yes |
|---|---|---|---|
| N. of decisions | 2,415 | 866 | 762 |

TABLE III
Distribution of decisions according to their labels when randomly removing no-labeled decision samples to create a regular distributed data set.

| Labels | no | partial | yes |
|---|---|---|---|
| N. of decisions | 866 | 866 | 762 |

TABLE IV
Distribution of decisions according to judge unanimous behavior.

| Labels | not-unanimity | unanimity |
|---|---|---|
| N. of decisions | 45 | 2,229 |

Although we believe that our contribution is quite satisfactory given the aforementioned accuracy rate, future investigations might consider comparing our results with those of Law experts, as performed in [11] and by current Lawtech products such as Case Crunch and LawGeex⁴. Other future work includes investigating whether taking advantage of existing named-entity recognition data sets for Brazilian law documents [10] improves the prediction quality. Furthermore, the assessment of the proposed method can be performed on larger and/or different data sets, such as that of the European Court of Human Rights. Ultimately, despite the various directions one might take to leverage our work, we believe that Mireille Hildebrandt's viewpoint on "agnostic machine learning" and its consequences to the Rule of Law [5] should be taken into account when designing and using a legal predictive system.

REFERENCES

[1] N. Aletras, D. Tsarapatsanis, D. Preoţiuc-Pietro, and V. Lampos. Predicting judicial decisions of the European Court of Human Rights: a Natural Language Processing perspective. PeerJ Computer Science, 2:e93, Oct. 2016.
[2] K. D. Ashley and S. Brüninghaus. Automatically classifying case texts and predicting outcomes. Artificial Intelligence and Law, 17(2):125–165, Jun. 2009.
[3] R. Barros, F. Lorenzi, and L. K. Wives. Recent Trends and Future Technology in Applied Intelligence, volume 10868 of Lecture Notes in Computer Science. Springer International Publishing, Cham, 2018.
[4] A. Elnaggar, R. Otto, and F. Matthes. Named-Entity Linking Using Deep Learning For Legal Documents: A Transfer Learning Approach. Technical report, Technische Universität München, 2018.
[5] M. Hildebrandt. Algorithmic regulation and the rule of law. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 376(2128):20170355, Sep. 2018.
[6] D. M. Katz, M. J. Bommarito, and J. Blackman. A general approach for predicting the behavior of the Supreme Court of the United States. PLOS ONE, 12(4):e0174698, Apr. 2017.
[7] L. Loevinger. Jurimetrics: The Methodology of Legal Inquiry. Law and Contemporary Problems, 28(1):5, 1963.
[8] S. Long, C. Tu, Z. Liu, and M. Sun. Automatic Judgment Prediction via Legal Reading Comprehension. Technical report, Tsinghua University, Beijing, Sep. 2018.
[9] E. Loper and S. Bird. NLTK. In Proceedings of the ACL-02 Workshop on Effective tools and methodologies for teaching natural language processing and computational linguistics, volume 1, pages 63–70, Morristown, NJ, USA, 2002. Association for Computational Linguistics.
[10] P. H. Luz de Araujo, T. E. de Campos, R. R. R. de Oliveira, M. Stauffer, S. Couto, and P. Bermejo. LeNER-Br: A Dataset for Named Entity Recognition in Brazilian Legal Text. In Proceedings of the International Conference on the Computational Processing of Portuguese, pages 313–323, Canela, 2018.
[11] T. W. Ruger, P. T. Kim, A. D. Martin, and K. M. Quinn. The Supreme Court Forecasting Project: Legal and Political Science Approaches to Predicting Supreme Court Decisionmaking. Columbia Law Review, 104(4):1150, May 2004.
[12] J. B. Ruhl, D. M. Katz, and M. J. Bommarito. Harnessing legal complexity. Science, 355(6332):1377–1378, Mar. 2017.
[13] O. Shulayeva, A. Siddharthan, and A. Wyner. Recognizing cited facts and principles in legal judgements. Artificial Intelligence and Law, 25(1):107–126, Mar. 2017.

⁴ https://www.artificiallawyer.com/2018/02/26/lawgeex-hits-94-accuracy-in-nda-review-vs-85-for-human-lawyers/