Against Corpus Linguistics

Against Corpus Linguistics JOHN S. EHRETT* Corpus linguistics—the use of large, computerized word databases as tools for discovering linguistic meaning—has increasingly become a topic of interest among scholars of constitutional and statutory interpretation. Some judges and academics have recently argued, across the pages of multiple law journals, that members of the judiciary ought to employ these new technologies when seeking to ascertain the original public meaning of a given text. Corpus linguistics, in the minds of its proponents, is a powerful instrument for rendering constitutional originalism and statutory textualism “scientific” and warding off accusations of interpretive subjectivity. This Article takes the opposite view: on balance, judges should refrain from the use of corpora. Although corpus linguistics analysis may appear highly promising, it carries with it several under-examined dangers—including the collapse of essential distinctions between resource quality, the entrenchment of covert linguistic biases, and a loss of reviewability by higher courts. TABLE OF CONTENTS INTRODUCTION……………………………………………………..…….51 I. THE RISE OF CORPUS LINGUISTICS……………..……………..……54 A. WHAT IS CORPUS LINGUISTICS? …………………………………54 1. Frequency……………………………………………...…54 2. Collocation……………………………………………….55 3. Keywords in Context (KWIC) …………………………...55 B. CORPUS LINGUISTICS IN THE COURTS……………………………56 1. United States v. Costello…………………………………..56 2. State v. Canton……………………………………………58 3. State v. Rasabout………………………………………….59 II. AGAINST “JUDICIALIZING”C ORPUS LINGUISTICS………………….61 A. SUBVERSION OF SOURCE AUTHORITY HIERARCHIES……………..61 * Yale Law School, J.D. 2017. © 2019, John S. Ehrett. 51 THE GEORGETOWN LAW JOURNAL ONLINE [VOL. 108 B. IMPROPER PARAMETRIC OUTSOURCING…………………………65 C. METHODOLOGICAL INACCESSIBILITY…………………………...68 III. THE FUTURE OF JUDGING AND CORPUS LINGUISTICS………………70 INTRODUCTION “Corpus linguistics” may sound like a forensic investigative procedure on CSI or NCIS, but the reality is far less dramatic—though no less important. Summarized briefly, corpus linguistics is the use of large, searchable databases, or corpora, of computer-annotated1 texts to ascertain evolving patterns of word use over time. Technically speaking, corpora are “large collection[s] of naturally occurring texts that are sampled to be representative of a particular type of language variety”; 2 sociologically speaking, corpora are “sample[s] of the speech of a given speech community at a given point in time.”3 Given the potential for corpora to capture a broad “sense” of word meaning drawn from an ever-swelling mass of source material, a growing number of judges and scholars have argued that members of the judiciary should regularly use corpora when seeking to grasp the underlying meaning and relevant connotations of a given legal text. As often proves the case, certain philosophical commitments are at play beneath this enthusiasm for corpus linguistics methodology. Advocates of constitutional originalism and textualism in statutory interpretation have long argued for a return to the “original public meaning” of both the Constitution and state and federal laws.4 Stefan Gries and Brian 1 See Geoffrey Leach, Introducing Corpus Annotation, in CORPUS ANNOTATION: LINGUISTIC INFORMATION FROM COMPUTER TEXT CORPORA 1, 2 (Roger Garside et al. eds., 2013) (explaining that annotation is “the practice of adding interpretive, linguistic information to an electronic corpus of spoken and/or written language data”). 2 Lawrence M. Solan & Tammy A. Gales, Corpus Linguistics as a Tool in Legal Interpretation, 2017 BYU L. REV. 1311, 1337. 3 Stephen C. Mouritsen, Corpus Linguistics in Legal Interpretation—An Evolving Interpretive Framework, 6 INT’L J. LANG.& L. 67, 86 (2017). 4 A full survey of the longstanding debates surrounding the cohesiveness of “original public meaning” as a concept is far beyond the scope of this Article: many authors in many venues have advanced and defended this principle at length. See, e.g., District of Columbia v. Heller, 554 U.S. 570 (2008); ROBERT H. BORK, THE TEMPTING OF AMERICA: THE POLITICAL SEDUCTION OF THE LAW 159 (1990); Lawrence B. Solum, We Are All Originalists Now, in CONSTITUTIONAL ORIGINALISM:AD EBATE 1, 4 (Robert W. Bennett & Lawrence B. Solum eds., 2011); Oliver Wendell Holmes, The Theory of Legal Interpretation, 12 HARV. L. REV. 417 (1899); Richard S. Kay, Original Intention and Public Meaning in Constitutional Interpretation, 103 NW. U. L. REV. 703 (2009); Jack M. Balkin, Abortion and Original Meaning, 24 CONST.C OMMENT. 291 (2007); Eric Berger, 2019] THE GEORGETOWN LAW JOURNAL ONLINE 52 Slocum explain that “[t]he basic premise of the ordinary meaning doctrine is that a legal text is a form of communication that uses natural language in order to accomplish its purposes. Thus, for various reasons including rule of law and notice concerns, textual language should be interpreted in light of the accepted and typical standards of communication that apply outside of the law.”5 In the eyes of its proponents, the doctrine of original public meaning is the only way courts can affirm a consistent reading of the law over time, thus placing the responsibility for legal reform squarely in the hands of legislatures and circumscribing the role of the courts. Viewed thus, courts do not interpret the law so much as apply it (in a somewhat mechanistic) fashion to the disputes before them. And this general philosophy of legal language carries with it broad implications for contemporary disputes over the meaning of law.6 For example, the Second Amendment speaks of the right of the people to “keep and bear arms”—but what did these words actually mean at the time the Amendment was penned? In the late eighteenth century, could an American “bear” a weapon openly in public spaces without being sanctioned by the authorities? What sort of “arms” were contemplated by the Constitution’s framers? These and similar questions have both perpetually vexed courts and filled the pages of law reviews7—and they are precisely the questions advocates of “original public meaning” hope to resolve decisively.8 Originalism’s Pretenses, 16 U. PA. J. CONST. L. 329 (2013); Lawrence B. Solum, Originalism and Constitutional Construction, 82 FORDHAM L. REV. 453, 463–64 (2013). 5 Stephan Th. Gries & Brian G. Slocum, Ordinary Meaning and Corpus Linguistics, 2017 BYU L. REV. 1417, 1424. 6 Some particularly notable constitutional research projects employing corpus linguistics include inquiries into the original meanings of the Commerce Clause, the Second Amendment, the phrase “officers of the United States,” and the Emoluments Clauses. See Randy E. Barnett, New Evidence of the Original Meaning of the Commerce Clause, 55 ARK. L. REV. 847 (2002); Joel W. Hood, The Plain and Ordinary Second Amendment: Heller and Heuristics, SOC. SCI. RES. NETWORK (April 17, 2014), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2425366; Jennifer L. Mascott, Who Are “Officers of the United States”?, 70 STAN. L. REV. 443 (2018); James Cleith Phillips & Sara White, The Meaning of the Three Emoluments Clauses in the U.S. Constitution: A Corpus Linguistic Analysis of American English From 1760–1799, 59 S. TEX. L. REV. 181 (201). 7 See, e.g., CLAYTON E. CRAMER,F OR THE DEFENSE OF THEMSELVES AND THE STATE:T HE ORIGINAL INTENT AND JUDICIAL INTERPRETATION OF THE RIGHT TO KEEP AND BEAR ARMS 8–9 (1994); Don B. Kates, Jr., Handgun Prohibition and the Original Meaning of the Second Amendment, 82 MICH. L. REV. 204 (1983); Don B. Kates & Clayton E. Cramer, Second Amendment Limitations and Criminological Considerations, 60 HASTINGS L.J. 1339 (2008); Dan Terzian, The Right to Bear (Robotic) Arms, 117 PENN ST. L. REV. 755 (2013). 8 Much scholarship in the field of law and corpus linguistics has centered on defending the normativity and intelligibility of “original public meaning” as a framework for ascertaining textual meaning. Those debates are longstanding, and this Article does not engage them; its focus is methodological. This Article largely accepts the premise of corpus linguistics advocates that recovering original public meaning is a laudable—if sometimes evanescent—judicial goal, and its analysis is therefore predominantly concerned with whether corpus-based research can meaningfully achieve what its proponents say it can. 53 THE GEORGETOWN LAW JOURNAL ONLINE [VOL. 108 Practically speaking, originalists and textualists alike operate from the methodological assumption that the “original public meaning” of a text is both meaningful and recoverable—an assumption many scholars have challenged on theoretical and pragmatic grounds. For one thing, methodologies contingent on the use of extrinsic clues to textual meaning are easily accused by their critics of encouraging a “cherry-picking” approach to interpretation. That is to say, despite the claims to objectivity of an original public meaning standard, advocates of this framework may be (and frequently are) charged with making subjective determinations about both the sources to be consulted as guides to original public meaning and the proper resolution of apparent ambiguities. 9 Whither, then, the consistent originalist or textualist? Corpus linguistics offers a novel way to address these persistent difficulties and allegedly make both originalism and textualism more “scientific.” As Lawrence Solan notes, “if scholars want to investigate how the public likely understood the Constitution’s words, then scholars would benefit from examining the data contained

Against Corpus Linguistics

Usage of IT Terminology in Corpus Linguistics Mavlonova Mavluda Davurovna

Linguistic Annotation of the Digital Papyrological Corpus: Sematia

Creating and Using Multilingual Corpora in Translation Studies Claudio Fantinuoli and Federico Zanettin

Corpus Linguistics As a Tool in Legal Interpretation Lawrence M

TEI and the Documentation of Mixtepec-Mixtec Jack Bowers

Corpus Linguistics: a Practical Introduction

A Scaleable Automated Quality Assurance Technique for Semantic Representations and Proposition Banks K

A Corpus-Based Investigation of Lexical Cohesion in En & It

Verb Similarity: Comparing Corpus and Psycholinguistic Data

Helping Language Learners Get Started with Concordancing

Rule-Based Machine Translation for the Italian–Sardinian Language Pair

A Comparative Analysis of Hedging in a Corpus of Two Written Legal Discourse Genres