Recognizing Speech Acts in Presidential E-Records
Total Page:16
File Type:pdf, Size:1020Kb
Recognizing Speech Acts in Presidential E-records William Underwood Technical Report ITTL/CSITD 08-03 October 2008 Computer Science and Information Technology Division Information Technology and Telecommunications Laboratory Georgia Tech Research Institute Georgia Institute of Technology The Army Research Laboratory (ARL) and the National Archives and Records Administration (NARA) sponsor this research under Army Research Office Cooperative Agreement W911NF-06-2-0050. The findings in this paper should not be construed as an official ARL or NARA position unless so indicated by other authorized documentation. Abstract Among the challenges facing contemporary archivists at Presidential Libraries and the National Archives are the tasks of reading, understanding, describing, accessing and reviewing terabyte- and petabyte-sized collections of electronic records. The Advanced Decision Support for Archival Processing of Presidential E-Records Project seeks to apply natural language processing technologies to the support of these archival tasks. This paper discusses the relevance of the science of Diplomatics and of speech act theory to the comprehension of the actions conveyed by records, and to the archival description and review of these records. Speech acts include common acts such as asserting, promising, requesting, ordering, and congratulating. They also include acts that can only be performed by persons with the power and authority to do so such as proclaiming, declaring, directing, pardoning, appointing, nominating, and counseling. Concepts from speech act theory as developed by Searle are reviewed. A corpus of 120 Presidential records was analyzed to determine the occurrence of explicit and implicit speech acts and assertions about the writer‘s or others‘ speech acts. Vanderveken defined 271 speech acts [Meaning and Speech Acts Vol. 1, Chapt. 4]. Instances of 76 of the speech acts defined by Vanderveken were discovered in the corpus. Instances of 32 speech acts that are not defined by Vanderveken were discovered and defined. The report gives examples from the corpus of each of these speech acts. A method is proposed for recognizing the speech acts performed by the sentences of a record and the primary act(s) conveyed by a record. Such a method will facilitate automatic description of individual records as well as aggregations of records. It will also facilitate automatic identification of passages of records that might be subject to restrictions on disclosure prescribed by the Presidential Records Act and Freedom of Information Act. Keywords: speech acts, performative verbs, pronominal co-reference, e-records, computational linguistics ii Copyright © William Underwood 2008 iii Table of Contents 1 INTRODUCTION ..................................................................................................................................... 1 1.1 BACKGROUND ........................................................................................................................................... 1 1.2 PURPOSE .................................................................................................................................................. 2 1.3 SCOPE ..................................................................................................................................................... 2 2 RECORDS, ARCHIVAL PROCESSES AND SPEECH ACTS ............................................................................. 2 2.1 JURIDICAL ACTS AND RECORDS ..................................................................................................................... 2 2.2 ARCHIVAL DESCRIPTION AND REVIEW OF PRESIDENTIAL RECORDS ....................................................................... 4 2.3 SPEECH ACTS ............................................................................................................................................ 6 2.3.1 Searle’s Theory of Illocutionary Acts .............................................................................................. 7 2.3.2 Indirect Speech Acts ..................................................................................................................... 10 3. ANALYSIS OF SPEECH ACTS IN PRESIDENTIAL RECORDS ..................................................................... 11 3.1 EXAMPLES OF SPEECH ACTS EXPRESSED WITH EXPLICIT PERFORMATIVE SENTENCES .............................................. 11 3.1.1 Assertives ..................................................................................................................................... 11 3.1.2 Commissives ................................................................................................................................ 17 3.1.3 Directives ..................................................................................................................................... 18 3.1.4 Declaratives ................................................................................................................................. 25 3.1.5 Expressives ................................................................................................................................... 31 3.2 ILLOCUTIONARY ACTS INDICATED BY DEVICES OTHER THAN PERFORMATIVE SENTENCES ......................................... 36 3.3 SPEECH ACTS THAT ARE IN THE PROPOSITIONS OF OTHER SPEECH ACTS .............................................................. 48 4 A METHOD FOR RECOGNIZING SPEECH ACTS IN E-RECORDS ............................................................... 61 4.1 A METHOD FOR ANNOTATING NAMES OF ENTITIES IN E-RECORDS .................................................................... 62 4.2 A METHOD FOR INTERPRETING THE DOCUMENTARY FORMS OF E-RECORDS ........................................................ 62 4.3 METHOD FOR SPEECH ACT RECOGNITION ..................................................................................................... 62 4.3.1 Orthomatcher .............................................................................................................................. 63 4.3.2 Pronominal Coreference .............................................................................................................. 63 4.3.3 Morphological Analyzer ............................................................................................................... 64 4.3.4 Parsing and Interpreting Sentences ............................................................................................. 64 4.3.5 Speech Act Transducer ................................................................................................................. 65 4.4 TEST AND EVALUATION OF THE METHOD ...................................................................................................... 69 5. RELATED RESEARCH ........................................................................................................................... 69 6 CONCLUSIONS AND FUTURE WORK .................................................................................................... 71 REFERENCES ........................................................................................................................................... 72 APPENDIX A: PRESIDENTIAL RECORDS AND PRMS INCLUDED IN THE ANALYSIS ..................................... 75 APPENDIX B: NOTES ON ANALYSIS OF SPEECH ACTS .............................................................................. 80 iv TABLE OF FIGURES Figure 1. Reporting as a Sequence of Assertions ..........................................................................................37 Figure 2. Reporting Indicated by Document Type and Sequence of Assertions. ..........................................38 Figure 3. Directing Indicated by Document Type and Use of 'should' and 'will' ...........................................39 Figure 4. A Request Indicated Using an Imperative Sentence and 'Please' ...................................................40 Figure 5. A Request Indicated by the Caption ACTION REQUESTED .......................................................41 Figure 6. Recommendation Indicated by a Section Heading .........................................................................44 Figure 7. Recommendation Indicated by 'should consider' ...........................................................................45 Figure 8. Approval or Disapproval Indicated by Captions ............................................................................46 v 1 Introduction 1.1 Background Intellectual control of an archival collection is not achieved until the collection has been fully described. Furthermore, a collection cannot be fully described until archivists have read, understood, and recorded details as to the document types, the creators, the titles, dates, extent and contents of the records in the collection. Archivists and researchers will not have the capability to reliably locate relevant records and understand those records until catalogs, finding aids, and indexes are produced by the process of archival description. Given the increasing volumes of Federal and Presidential electronic records acquired by NARA, it will be decades, if not centuries, before full intellectual control of these records is achieved. This is especially true of Presidential e-records as they must reviewed page- by-page for restrictions on disclosure as provided by the Presidential Records Act (PRA). The Freedom of Information Act provides that citizens