Latent Semantic Analysis, Corpus Stylistics and Machine Learning

Latent Semantic Analysis, Corpus Stylistics and Machine Learning Stylometry for Translational and Authorial Style Analysis: The Case of Denys Johnson-Davies’ Translations into English

A dissertation submitted to Kent State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by Mohammed Al Batineh
May, 2015

© Copyright by Mohammed S. Al-Batineh
All Rights Reserved

Dissertation written by
Mohammed Al Batineh
B.A., Yarmouk University, Jordan, 2008
M.A., Yarmouk University, Jordan, 2010

APPROVED BY
Dr. Françoise Massardier-Kenney (advisor), Chair, Doctoral Dissertation Committee
Dr. Carol Maier, Member, Doctoral Dissertation Committee
Dr. Gregory M. Shreve, Member, Doctoral Dissertation Committee
Dr. Jonathan I. Maletic, Member, Doctoral Dissertation Committee
Dr. Katherine Rawson, Member, Doctoral Dissertation Committee

ACCEPTED BY
Dr. Keiran J Dunne, Interim Chair, Modern and Classical Language Studies
Dr. James L. Blank, Dean, College of Arts and Sciences

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
DEDICATION
ABSTRACT

CHAPTER 1: INTRODUCTION
  1.1. Introduction
  1.2. Denys Johnson-Davies
  1.3. Research Hypotheses
  1.4. Research Method
  1.5. Significance of the Study
  1.6. Summary of Chapters

CHAPTER 2: LITERATURE REVIEW
  2.1. A Brief History of Literary Stylistics
  2.2. Approaches to Style in Translation Studies
  2.3. Text-Oriented Approaches
    2.3.1. Comparative Approach
    2.3.2. Target-Oriented Approach
  2.4. Translator-Oriented Approaches
  2.5. Cognitive-Oriented Approach
  2.6. Conclusion

CHAPTER 3: METHODOLOGY
  3.1. Introduction
  3.2. Data Collection
  3.3. Corpus Database
  3.4. Corpus Compilation and Pre-processing
  3.5. Latent Semantic Analysis
    3.5.1. LSA Similarity Query
    3.5.2. LSA Similarity Cutoff
    3.5.3. LSA Output Evaluation
  3.6. Corpus Stylistics
    3.6.1. Standardized Type-Token Ratio (STTR)
    3.6.2. Mean Sentence Length
    3.6.3. Punctuation marks
  3.7. Statistical Testing
  3.8. Machine Learning Approach
    3.8.1. Character n-grams
    3.8.2. Part of Speech (POS) n-grams
    3.8.3. Word n-grams
  3.9. Tools Used in the Dissertation
  3.10. Conclusion

CHAPTER 4: LATENT SEMANTIC ANALYSIS RESULTS
  4.1. Introduction
  4.2. LSA Similarity Analysis
    4.2.1. LSA Similarity Query on J-D’s Translation before Creative Writing
      4.2.1.1. LSA Results with V=100
    4.2.2. LSA Similarity Query on J-D’s Translation after Creative Writing
      4.2.2.1. LSA Results with V=50
  4.3. Conclusion

CHAPTER 5: CORPUS STYLISTICS AND MACHINE LEARNING ANALYSIS RESULTS
  5.1. Introduction
  5.2. Corpus Analysis
    5.2.1. Textual Analysis
      5.2.1.1. Standardized Type-Token Ratio
      5.2.1.2. Mean Sentence Length
    5.2.2. Punctuation Marks Analysis
      5.2.2.1. Standardized hyphen Analysis
      5.2.2.2. Standardized Comma Analysis
      5.2.2.3. Standardized Semicolon Analysis
    5.2.3. SPSS Statistical Analysis
      5.2.3.1. Textual Analysis
        5.2.3.1.1. Standardized Type-Token Ratios (STTRs)
      5.2.3.2. Mean Sentence Length
      5.2.3.3. Punctuation Marks analysis
        5.2.3.3.1. Standardized Comma analysis
        5.2.3.3.2. Standardized Hyphen analysis
        5.2.3.3.3. Standardized Semicolon analysis
  5.3. Machine Learning Stylometry
    5.3.1. JGAAP Tool
    5.3.2. Corpus Pre-processing
    5.3.3. JGAAP Analysis Method
    5.3.4. Style Markers Analysis
      5.3.4.1. Character n-gram analysis
      5.3.4.2. Part-of-Speech (POS) Analysis
      5.3.4.3. Word n-gram Analysis
    5.3.5. Conclusion

CHAPTER 6: DISCUSSION
  6.1. Introduction
  6.2. Zooming into the Results
  6.3. Thematic analysis
  6.4. Textual Analysis
    6.4.1. STTR
    6.4.2. Mean Sentence length
  6.5. Punctuation Marks
  6.6. Syntactic Analysis
  6.7. Word n-gram Analysis
  6.8. Character n-gram Analysis

