Latent Semantic Analysis, Corpus Stylistics and Machine Learning
Latent Semantic Analysis, Corpus stylistics and Machine Learning Stylometry for Translational and Authorial Style Analysis: The Case of Denys Johnson-Davies’ Translations into English

A dissertation submitted to Kent State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by Mohammed Al Batineh

May, 2015

© Copyright by Mohammed S. Al-Batineh
All Rights Reserved

Dissertation written by
Mohammed Al Batineh
BA., Yarmouk University, Jordan, 2008
MA., Yarmouk University, Jordan, 2010

APPROVED BY
Dr. Françoise Massardier-Kenney (advisor), Chair, Doctoral Dissertation Committee
Dr. Carol Maier, Member, Doctoral Dissertation Committee
Dr. Gregory M. Shreve, Member, Doctoral Dissertation Committee
Dr. Jonathan I. Maletic, Member, Doctoral Dissertation Committee
Dr. Katherine Rawson, Member, Doctoral Dissertation Committee

ACCEPTED BY
Dr. Keiran J Dunne, Interim Chair, Modern and Classical Language Studies
Dr. James L. Blank, Dean, College of Arts and Sciences

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
DEDICATION
ABSTRACT

CHAPTER 1: INTRODUCTION
  1.1. Introduction
  1.2. Denys Johnson-Davies
  1.3. Research Hypotheses
  1.4. Research Method
  1.5. Significance of the Study
  1.6. Summary of Chapters

CHAPTER 2: LITERATURE REVIEW
  2.1. A Brief History of Literary Stylistics
  2.2. Approaches to Style in Translation Studies
  2.3. Text-Oriented Approaches
    2.3.1. Comparative Approach
    2.3.2. Target-Oriented Approach
  2.4. Translator-Oriented Approaches
  2.5. Cognitive-Oriented Approach
  2.6. Conclusion

CHAPTER 3: METHODOLOGY
  3.1. Introduction
  3.2. Data Collection
  3.3. Corpus Database
  3.4. Corpus Compilation and Pre-processing
  3.5. Latent Semantic Analysis
    3.5.1. LSA Similarity Query
    3.5.2. LSA Similarity Cutoff
    3.5.3. LSA Output Evaluation
  3.6. Corpus Stylistics
    3.6.1. Standardized Type-Token Ratio (STTR)
    3.6.2. Mean Sentence Length
    3.6.3. Punctuation marks
  3.7. Statistical Testing
  3.8. Machine Learning Approach
    3.8.1. Character n-grams
    3.8.2. Part of Speech (POS) n-grams
    3.8.3. Word n-grams
  3.9. Tools Used in the Dissertation
  3.10. Conclusion

CHAPTER 4: LATENT SEMANTIC ANALYSIS RESULTS
  4.1. Introduction
  4.2. LSA Similarity Analysis
    4.2.1. LSA Similarity Query on J-D’s Translation before Creative Writing
      4.2.1.1. LSA Results with V=100
    4.2.2. LSA Similarity Query on J-D’s Translation after Creative Writing
      4.2.2.1. LSA Results with V=50
  4.3. Conclusion

CHAPTER 5: CORPUS STYLISTICS AND MACHINE LEARNING ANALYSIS RESULTS
  5.1. Introduction
  5.2. Corpus Analysis
    5.2.1. Textual Analysis
      5.2.1.1. Standardized Type-Token Ratio
      5.2.1.2. Mean Sentence Length
    5.2.2. Punctuation Marks Analysis
      5.2.2.1. Standardized hyphen Analysis
      5.2.2.2. Standardized Comma Analysis
      5.2.2.3. Standardized Semicolon Analysis
    5.2.3. SPSS Statistical Analysis
      5.2.3.1. Textual Analysis
        5.2.3.1.1. Standardized Type-Token Ratios (STTRs)
      5.2.3.2. Mean Sentence Length
      5.2.3.3. Punctuation Marks analysis
        5.2.3.3.1. Standardized Comma analysis
        5.2.3.3.2. Standardized Hyphen analysis
        5.2.3.3.3. Standardized Semicolon analysis
  5.3. Machine Learning Stylometry
    5.3.1. JGAAP Tool
    5.3.2. Corpus Pre-processing
    5.3.3. JGAAP Analysis Method
    5.3.4. Style Markers Analysis
      5.3.4.1. Character n-gram analysis
      5.3.4.2. Part-of-Speech (POS) Analysis
      5.3.4.3. Word n-gram Analysis
    5.3.5. Conclusion

CHAPTER 6: DISCUSSION
  6.1. Introduction
  6.2. Zooming into the Results
  6.3. Thematic analysis
  6.4. Textual Analysis
    6.4.1. STTR
    6.4.2. Mean Sentence length
  6.5. Punctuation Marks
  6.6. Syntactic Analysis
  6.7. Word n-gram Analysis
  6.8. Character n-gram Analysis