Latent Semantic Analysis, Corpus Stylistics and Machine Learning

Latent Semantic Analysis, Corpus stylistics and Machine Learning Stylometry for Translational and Authorial Style Analysis: The Case of Denys Johnson-Davies’ Translations into English

A dissertation submitted to Kent State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by Mohammed Al Batineh
May, 2015

© Copyright by Mohammed S. Al-Batineh
All Rights Reserved

Dissertation written by
Mohammed Al Batineh
BA., Yarmouk University, Jordan, 2008
MA., Yarmouk University, Jordan, 2010

APPROVED BY
Chair, Doctoral Dissertation Committee: Dr. Françoise Massardier-Kenney (advisor)
Members, Doctoral Dissertation Committee: Dr. Carol Maier, Dr. Gregory M. Shreve, Dr. Jonathan I. Maletic, Dr. Katherine Rawson

ACCEPTED BY
Interim Chair, Modern and Classical Language Studies: Dr. Keiran J Dunne
Dean, College of Arts and Sciences: Dr. James L. Blank

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
DEDICATION
ABSTRACT

CHAPTER 1: INTRODUCTION
1.1. Introduction
1.2. Denys Johnson-Davies
1.3. Research Hypotheses
1.4. Research Method
1.5. Significance of the Study
1.6. Summary of Chapters

CHAPTER 2: LITERATURE REVIEW
2.1. A Brief History of Literary Stylistics
2.2. Approaches to Style in Translation Studies
2.3. Text-Oriented Approaches
2.3.1. Comparative Approach
2.3.2. Target-Oriented Approach
2.4. Translator-Oriented Approaches
2.5. Cognitive-Oriented Approach
2.6. Conclusion

CHAPTER 3: METHODOLOGY
3.1. Introduction
3.2. Data Collection
3.3. Corpus Database
3.4. Corpus Compilation and Pre-processing
3.5. Latent Semantic Analysis
3.5.1. LSA Similarity Query
3.5.2. LSA Similarity Cutoff
3.5.3. LSA Output Evaluation
3.6. Corpus Stylistics
3.6.1. Standardized Type-Token Ratio (STTR)
3.6.2. Mean Sentence Length
3.6.3. Punctuation marks
3.7. Statistical Testing
3.8. Machine Learning Approach
3.8.1. Character n-grams
3.8.2. Part of Speech (POS) n-grams
3.8.3. Word n-grams
3.9. Tools Used in the Dissertation
3.10. Conclusion

CHAPTER 4: LATENT SEMANTIC ANALYSIS RESULTS
4.1. Introduction
4.2. LSA Similarity Analysis
4.2.1. LSA Similarity Query on J-D’s Translation before Creative Writing
4.2.1.1. LSA Results with V=100
4.2.2. LSA Similarity Query on J-D’s Translation after Creative Writing
4.2.2.1. LSA Results with V=50
4.3. Conclusion

CHAPTER 5: CORPUS STYLISTICS AND MACHINE LEARNING ANALYSIS RESULTS
5.1. Introduction
5.2. Corpus Analysis
5.2.1. Textual Analysis
5.2.1.1. Standardized Type-Token Ratio
5.2.1.2. Mean Sentence Length
5.2.2. Punctuation Marks Analysis
5.2.2.1. Standardized hyphen Analysis
5.2.2.2. Standardized Comma Analysis
5.2.2.3. Standardized Semicolon Analysis
5.2.3. SPSS Statistical Analysis
5.2.3.1. Textual Analysis
5.2.3.1.1. Standardized Type-Token Ratios (STTRs)
5.2.3.2. Mean Sentence Length
5.2.3.3. Punctuation Marks analysis
5.2.3.3.1. Standardized Comma analysis
5.2.3.3.2. Standardized Hyphen analysis
5.2.3.3.3. Standardized Semicolon analysis
5.3. Machine Learning Stylometry
5.3.1. JGAAP Tool
5.3.2. Corpus Pre-processing
5.3.3. JGAAP Analysis Method
5.3.4. Style Markers Analysis
5.3.4.1. Character n-gram analysis
5.3.4.2. Part-of-Speech (POS) Analysis
5.3.4.3. Word n-gram Analysis
5.3.5. Conclusion

CHAPTER 6: DISCUSSION
6.1. Introduction
6.2. Zooming into the Results
6.3. Thematic analysis
6.4. Textual Analysis
6.4.1. STTR
6.4.2. Mean Sentence length
6.5. Punctuation Marks
6.6. Syntactic Analysis
6.7. Word n-gram Analysis
6.8. Character n-gram Analysis

Genealogies of Feminism: Leftist Feminist Subjectivity in the Wake of the Islamic Revival in Contemporary Morocco

The Role of Social Agents in the Translation Into English of the Novels of Naguib Mahfouz

Ramzi Salti, Ph.D

Unit 1: Introduction

A Study of Short Stories by Assia Djebar and Alifa Rifaat

Egypt & Alifa Rifaat

Spotlight on the Muslim Middle East-Issues of Identity. a Student

Feminism and Religion in Alifa Rifaat's Short Stories Ramzi M. Salti

Cambridge Handbook of English Corpus Linguistics Chapter 2: Computational Tools and Methods for Corpus Compilation and Analysis

Introduction

Significant Concordance and Co-Occurrence in Quantitative