Latent Semantic Analysis, Corpus Stylistics and Machine Learning Stylometry for Translational and Authorial Style Analysis: The Case of Denys Johnson-Davies’ Translations into English

A dissertation submitted to Kent State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by

Mohammed Al Batineh

May, 2015

© Copyright by Mohammed S. Al-Batineh

All Rights Reserved

Dissertation written by

Mohammed Al Batineh

B.A., Yarmouk University, Jordan, 2008

M.A., Yarmouk University, Jordan, 2010

APPROVED BY

______, Chair, Doctoral Dissertation Committee

Dr. Françoise Massardier-Kenney (advisor)

______, Members, Doctoral Dissertation Committee

Dr. Carol Maier

______,

Dr. Gregory M. Shreve

______,

Dr. Jonathan I. Maletic

______,

Dr. Katherine Rawson

ACCEPTED BY

______, Interim Chair, Modern and Classical Language Studies

Dr. Keiran J Dunne

______, Dean, College of Arts and Sciences

Dr. James L. Blank

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
DEDICATION
ABSTRACT
CHAPTER 1: INTRODUCTION
1.1. Introduction
1.2. Denys Johnson-Davies
1.3. Research Hypotheses
1.4. Research Method
1.5. Significance of the Study
1.6. Summary of Chapters
CHAPTER 2: LITERATURE REVIEW
2.1. A Brief History of Literary Stylistics
2.2. Approaches to Style in Translation Studies
2.3. Text-Oriented Approaches
2.3.1. Comparative Approach
2.3.2. Target-Oriented Approach
2.4. Translator-Oriented Approaches
2.5. Cognitive-Oriented Approach
2.6. Conclusion
CHAPTER 3: METHODOLOGY
3.1. Introduction
3.2. Data Collection
3.3. Corpus Database
3.4. Corpus Compilation and Pre-processing
3.5. Latent Semantic Analysis
3.5.1. LSA Similarity Query
3.5.2. LSA Similarity Cutoff
3.5.3. LSA Output Evaluation
3.6. Corpus Stylistics
3.6.1. Standardized Type-Token Ratio (STTR)
3.6.2. Mean Sentence Length
3.6.3. Punctuation marks
3.7. Statistical Testing
3.8. Machine Learning Approach
3.8.1. Character n-grams
3.8.2. Part of Speech (POS) n-grams
3.8.3. Word n-grams
3.9. Tools Used in the Dissertation
3.10. Conclusion
CHAPTER 4: LATENT SEMANTIC ANALYSIS RESULTS
4.1. Introduction
4.2. LSA Similarity Analysis
4.2.1. LSA Similarity Query on J-D’s Translation before Creative Writing
4.2.1.1. LSA Results with V=100
4.2.2. LSA Similarity Query on J-D’s Translation after Creative Writing
4.2.2.1. LSA Results with V=50
4.3. Conclusion
CHAPTER 5: CORPUS STYLISTICS AND MACHINE LEARNING ANALYSIS RESULTS
5.1. Introduction
5.2. Corpus Analysis
5.2.1. Textual Analysis
5.2.1.1. Standardized Type-Token Ratio
5.2.1.2. Mean Sentence Length
5.2.2. Punctuation Marks Analysis
5.2.2.1. Standardized hyphen Analysis
5.2.2.2. Standardized Comma Analysis
5.2.2.3. Standardized Semicolon Analysis
5.2.3. SPSS Statistical Analysis
5.2.3.1. Textual Analysis
5.2.3.1.1. Standardized Type-Token Ratios (STTRs)
5.2.3.2. Mean Sentence Length
5.2.3.3. Punctuation Marks analysis
5.2.3.3.1. Standardized Comma analysis
5.2.3.3.2. Standardized Hyphen analysis
5.2.3.3.3. Standardized Semicolon analysis
5.3. Machine Learning Stylometry
5.3.1. JGAAP Tool
5.3.2. Corpus Pre-processing
5.3.3. JGAAP Analysis Method
5.3.4. Style Markers Analysis
5.3.4.1. Character n-gram analysis
5.3.4.2. Part-of-Speech (POS) Analysis
5.3.4.3. Word n-gram Analysis
5.3.5. Conclusion
CHAPTER 6: DISCUSSION
6.1. Introduction
6.2. Zooming into the Results
6.3. Thematic analysis
6.4. Textual Analysis
6.4.1. STTR
6.4.2. Mean Sentence length
6.5. Punctuation Marks
6.6. Syntactic Analysis
6.7. Word n-gram Analysis
6.8. Character n-gram Analysis
6.9. On the Translating and Writing Activities of J-D
6.10. A Framework for Studying Translator Style
6.10.1. Corpus Compilation and Control
6.10.2. Digital Corpus Preparation
6.10.3. Corpus Preprocessing
6.10.4. Style Markers Selection
6.10.5. Corpus Analysis Method
6.11. Conclusion
CHAPTER 7: CONCLUSION
7.1. Summary of Results
7.2. Limitations of the Study
7.3. Implication of LSA method for Translation Studies
7.4. Future Directions
GLOSSARY OF ACRONYMS
REFERENCES
APPENDIX A: List of Denys Johnson-Davies’ Translated Short Stories
APPENDIX B: List of Denys Johnson-Davies’ Creative Writing Short Stories

LIST OF FIGURES

Figure 1: Matrix of made-up documents
Figure 2: Made-up documents in 2-dimensional space
Figure 3: LSA similarity query
Figure 4: One-to-many similarity query process
Figure 5: LSA Analysis of TBCRW_raw
Figure 6: LSA experiment 1 results (Q1--Q5)
Figure 7: LSA experiment 1 results (Q6--Q10)
Figure 8: LSA experiment 1 results (Q11--Q15)
Figure 9: LSA analysis of TACRW_raw
Figure 10: LSA experiment 2 results (Q1--Q5)
Figure 11: LSA experiment 2 results (Q6--Q10)
Figure 12: LSA experiment 2 results (Q11--Q15)
Figure 13: Machine Learning Translator style detection (Adapted from Efstathios Stamatatos)
Figure 14: JGAAP tool interface
Figure 15: Vectors of made-up documents
Figure 16: Machine Learning Character 3-gram analysis
Figure 17: Machine Learning POS n-gram analysis
Figure 18: Machine Learning Word n-gram analysis
Figure 19: Framework for translator style analysis

LIST OF TABLES

Table 1: The corpora of the present study
Table 2: Penn Treebank tag set (Adapted from Marcus, Santorini, and Marcinkiewicz)
Table 3: Tools used in the dissertation
Table 4: CRW and TBCRW_raw size
Table 5: LSA output of similarity query analysis on TBCRW_raw
Table 6: CRW and TACRW_raw size
Table 7: LSA output of similarity query analysis on TACRW_raw
Table 8: Study corpora from the LSA results
Table 9: STTR score in the three corpora
Table 10: Mean Sentence Length score in the three corpora
Table 11: Standardized Hyphen score in the three corpora
Table 12: Standardized Comma score in the three corpora
Table 13: Standardized Semicolon score in the three corpora

DEDICATION

To my parents, Sabri and Aisha

To my family and friends


ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to my supervisor, Dr. Françoise Massardier-Kenney, for her guidance and insightful feedback throughout the writing process of this dissertation. Dr. Kenney’s support is beyond what words can express. I would also like to thank the committee members: Dr. Carol Maier, Dr. Gregory M. Shreve, Dr. Jonathan Maletic and Dr. Katherine Rawson for their time and for all their comments and suggestions.

Many thanks to the faculty members of the Department of Modern and Classical Language Studies for helping me grow as a graduate student and a researcher in translation. I would particularly like to thank Dr. Isabel Lacruz for her help refining the experimental component of this study. I am also grateful to Dr. Judy Wakabayashi for helping me present some of my research at two conferences and for always being encouraging and supportive.

Special thanks to Dr. Nouh Al-Hindawi, Brian Bartman, Loubna Bilali, Mohammed Al-Rawashdeh, Adriana Di Biase, Bilal Sayaheen and Ibrahim Al- for their help, support and friendship.


ABSTRACT

The analysis of style in translation studies typically relies on methods borrowed from literary studies. Most of the style-related research conducted in translation studies has focused either on the style of the author or on the text type as manifested in the translation, as opposed to the style of the translator. The few studies of translator style that have been carried out using corpus methodologies present methodological limitations related to corpus compilation and control, which affect the analysis of style. To address these limitations, the present study adopts an interdisciplinary approach combining Latent Semantic Analysis (LSA), Corpus Stylistics and Machine Learning Stylometry in order to develop a rigorous framework for studying translator style. The suggested framework is developed through an investigation of the translations and creative writings of Denys Johnson-Davies (J-D), a British creative writer and Arabic-English translator. This study attempts to trace instances where the style of J-D the translator intersects with the style of J-D the author. It investigates the effect of J-D’s translating activity on his own writing, and vice versa, in order to determine the extent to which the two activities influence each other. The computational stylistic (corpus and machine learning) and thematic (LSA) analyses suggest that J-D’s style as a translator influenced his style as a writer. In addition, the analyses show that translation helped J-D develop his writing skills and style; indeed, the translating activity served as a source of inspiration and intertextuality for his creative writing. As for the interaction between J-D’s creative writing and his post-creative-writing translations, the findings show that his creative writing influenced the selection of the short stories he translated after producing it, which revolved around themes he had developed as a creative writer.


CHAPTER 1: INTRODUCTION

1.1. Introduction

This dissertation investigates the interaction between the translation and creative writing activities of translators by focusing on the case of the translator-writer Denys Johnson-Davies (J-D). J-D was the first Arabic-English literary translator of modern Arabic literature and is one of the most influential.

The study focuses on J-D’s style as displayed in his translations and his creative writings in an attempt to determine whether his translating activity influenced, or was influenced by, his creative writing. It discusses the history of stylistics within literary studies and the impact it has had on the study of style in translation studies. It also reviews the literature related to translator style within translation studies. The study highlights the limitations of previous research in this area, establishes the need for more empirical investigations of translator style and proposes an interdisciplinary framework for translator style analysis combining methods from corpus stylistics, computer science and stylometry.

The interdisciplinary nature of translation studies has encouraged translation scholars to adopt methods from literary studies in order to study style. However, the application of such methods in translation studies is still debatable and problematic.

Within literary studies, Geoffrey Leech defines style as “the sum of linguistic features associated with texts or textual samples defined by some set of contextual parameters” (55). In translation, style refers either to the style of the Target Text (TT) as manifested in its type1 or to the style of the translator. The former has been studied by tracing the manifestation of the Source Text (ST) style in the TT (Reiss, Translation Criticism: Potentials and Limitations. Categories and Criteria for Translation Quality Assessment), by determining the extent to which the TT follows the conventions and norms of the target culture (Nord), or by examining the realization of the translation brief in the TT (Vermeer). The latter (translator style) has been studied through the analysis of the stylistic patterns that distinguish a specific translator from other translators.

Kirsten Malmkjær defines translational stylistics as “the study of why, given the source text, the translation has been shaped in such a way that it comes to mean what it does [emphasis in the original]” (39). Works on translational stylistics have focused either on source texts or on target texts as their object of analysis. For instance, Eugene Nida, the leading theoretician of the linguistic school, focuses on the notions of content and form, message and style, as a way of discussing equivalence, placing more emphasis on the style of the source text as a reference for producing the style of the target text. In contrast, the functionalist approach has shifted the focus from source-oriented to target-oriented stylistics. The study of style has become more concerned with the function of the TT and has departed from any stylistic constraints imposed by the style of the ST. Katharina Reiss, for instance, has taken style as a point of departure for her work on translation criticism. She relies on text types and on the way the stylistic features of each text type should be exhibited in the target culture and the TT. According to the functionalist approach, a poem needs to function as a poem in the target literary system. In the same manner, a novel needs to function as a novel based on the norms and conventions of the target culture’s literature (Nord).

1 Typology of texts based on their linguistic and stylistic characteristics. Based on these features, texts are divided into types depending on their function, which could be narrative, descriptive, expository or argumentative.

With the emergence of the cultural turn, which placed more emphasis on translators as cultural mediators and translation as rewriting and manipulation, some scholars have started to argue for the importance of recognizing the translator’s voice and presence in his/her translations and for considering translators as one of the main agents in the translation process. Theo Hermans argues, for example, that the TT always implies more than one voice in the text, more than one discursive presence.

He also indicates that the “illusion of transparency” and the “illusion of one voice” blind the reader to the presence of the other voice (27). In the same vein, Lawrence

Venuti in The Translator’s Invisibility advocates making “the translator more visible so as to resist and change the conditions under which translation is theorized and practiced” (17). These arguments for a more sophisticated understanding of the translator’s voice or visibility have implications for the study of style in translation studies.

Close reading, i.e., the careful reading of passages with an emphasis on the lexical choices, figures of speech and syntax of a specific text, has dominated the study of style or voice in translation. Close reading might be a good tool for analyzing style in isolated texts; however, it does not scale to a large corpus of texts. In such a corpus, close reading becomes “untenable as an exhaustive or definitive method of evidence gathering [; and in following it,] something important will inevitably be missed” (Jockers 9). Tracing the stylistic patterns of a specific translator across a large corpus of different translations through close reading is not only extremely time-consuming but also likely to miss some of those patterns.

However, compiling and analyzing a corpus of texts for the purpose of analyzing translator style is not an easy task. Studies on this topic are rare and mostly draw on methods from literary studies (e.g. Mason and Abdullah). Gabriela Saldanha argues that stylistics in literary studies, as defined by Leech and other scholars, is meant to study the “textual” style of the translation by focusing on the reproduction of the source-text style in the target text, and not the “personal” style of the translator, which is a ‘way of translating’ that “distinguishes one translator’s work from that of others, and is felt to be recognisable across a range of translations by the same translator” (Saldanha 28). To identify this style across different TTs by the same translator, an interdisciplinary approach that goes beyond traditional close reading might allow translation scholars to analyze and reveal translators’ stylistic patterns of choice.

Recently, translation scholars have begun to use computational methods in the analysis of style. One of the first studies to use computational methods to analyze translator style is Mona Baker’s “Towards a Methodology”. In her study, Baker uses corpus stylistics, a method that uses the computer to extract stylistic patterns from a large corpus of digital texts. Another seminal work on translator style is Saldanha’s. Following in Baker’s steps, Saldanha uses corpus stylistics to analyze the style of two translators in an attempt to develop a methodology and to propose a working definition of translator style.

However, the two studies present some methodological problems related to corpus compilation, analysis and control. These issues are discussed in more detail in Chapter Two.

What might be needed is a broader perspective that adopts interdisciplinary approaches to reveal the personal attributes of the translator. Within computer science, non-traditional authorship attribution (AA) and stylometry have focused on developing computational methods that rely on artificial intelligence and statistical analysis to analyze authorial style. AA is a domain aimed at automatically analyzing texts based on their author’s style (Cristani et al.). Stylometry is a related field that uses computers to statistically analyze the style of one author or the variation in style between two or more authors. It builds on the assumption that writers have unique, unconscious writing habits or features. These features are measured to create an author profile against which other texts or authors can be compared (Schulstad et al.). Tony McEnery and Michael Oakes define stylometric analysis as “an attempt to capture the essence of the style of a particular author by reference to a variety of quantitative criteria” (548). Scholars in these fields have shown that such methods are, to a great extent, accurate (Grieve; Houvardas and Stamatatos; McEnery and Oakes) and that they outperform manual analysis of the personal style of text producers.

Methods from authorship attribution and stylometry, having been applied to the personal style of authors, could in turn be applied to the study of translator style. In this regard, Meng Ji argues that stylometry is one of the best-established methodologies for studying the authorial style of any document, “but it has rarely been practiced on translation texts in exploring literary stylistics or authorship attribution. As a result, many scholarly works on individual translators’ style seem to have based their judgments on limited excerpts randomly and irregularly selected from parallel source/target texts” (Ji 79). Thus, the present study draws on the notion of translator style and adopts methods from authorship attribution and stylometry in order to analyze and compare the authorial and the translational style of Denys Johnson-Davies (J-D) and to propose an interdisciplinary framework for translator style analysis.

1.2. Denys Johnson-Davies

Denys Johnson-Davies is one of the most influential Arabic-English translators of modern Arabic literature. J-D was born in 1922 in Canada to English parents. He spent his childhood in , then in Sudan (Johnson-Davies 1). At the age of fourteen, J-D attended the School of Oriental Studies in London. In the summer of that year, he went to Cairo to learn Arabic; this was his first academic encounter with the Arabic language and culture. During his stay, he attempted to


immerse himself in the Arabic Egyptian culture, frequenting traditional cafés in

Cairo, where he made several friends. This gave him the chance to become familiar with Egyptian culture, daily life and dialect, which had a great effect on his translation work and text selection.

The following year, Johnson-Davies went to Cambridge, where he began reading Arabic literary works in addition to extracts from the Qur’an. This experience increased his knowledge of the Arabic language and of its literary style and tradition. After graduation, he was employed by BBC radio. His duties included “checking translations made into Arabic of talks to be broadcasted” (Johnson-Davies 9). This was his first encounter with translation. At that time, he became eager to learn more about Arabic language and literature, and he contacted well-known Arab short story writers, such as Mahmud Teymour, , Tayeb Salih and Tawfiq al-Hakim.

After meeting a number of Arab writers, Johnson-Davies took his first step towards translation in 1946 by translating two of Teymour’s short stories into English. He published these translations in literary magazines with the help of a friend. At that time, Johnson-Davies was the only translator interested in translating modern Arabic literature into English. He continued his career as an Arabic-English translator and was the first to recognize and translate Naguib Mahfouz, the Nobel laureate. J-D has translated more than thirty Arabic volumes, including short stories, novels and biographies, and has won many translation prizes.

After translating for almost forty years, J-D began to write his own short stories. He published a collection in English under the title Fate of a Prisoner and Other Stories in 1999. The collection contains fifteen short stories that discuss themes related to Arab culture and are set in different Arab countries such as , Sudan, Lebanon and the Arabian Gulf. However, J-D the author was not as successful as J-D the translator: his translations received more attention than his creative writing. He mentions in his autobiography that Fate of a Prisoner and Other Stories “received no notices” except for two reviews, one in Al-Ahram Weekly by John Rodenbeck, a friend of J-D’s, and the other in The Literary Review by Francis King.

Few works analyze the notions of creative writing and translating as two activities done by a single author/translator. To the best of my knowledge and based on the literature I consulted that dealt with the notion of translator style, the only study that investigated those two activities performed by a writer-translator is

Walder’s “A Timbre of Its Own: investigating style in translation and original writing”. Thus, J-D’s translating and creative writing activities make a valuable case study for addressing the stylistic and thematic influence of translating on creative writing and vice versa.

1.3. Research Hypotheses

The present study focuses on the translating and the creative writing activities of J-D and their impact on each other. It investigates whether the short stories J-D translated before producing his creative writing had a thematic or stylistic impact on his creative writing short stories, and whether his creative writing short stories affected his style and his selection of the short stories he translated afterwards. To this end, this study tests the following hypotheses:

1- The short stories J-D translated before the production of creative writing are close in theme to his creative writing short stories.

2- The short stories J-D translated after the production of creative writing are close in theme to his creative writing short stories.

3- The short stories J-D translated before the production of creative writing display a number of stylistic characteristics similar to those visible in J-D’s creative writing.

4- The stylistic characteristics of J-D’s creative writing are similar to those visible in the short stories he translated after the production of creative writing.

1.4. Research Method

Three corpora were built to analyze the translational and the authorial style of J-D. The first corpus contains short stories translated by J-D before the production of his creative writing. The second corpus contains his creative writing short stories, and the third corpus contains the short stories J-D translated after the production of his creative writing. Because J-D wrote only short stories, this study analyzes only short stories, both those J-D translated from Arabic into English and those he wrote in English. Including only short stories controls the corpus for genre, ensuring that genre does not interfere with the stylistic analysis. For the data analysis, this study applies three computational methods to analyze the style of J-D in his translations and writings. First, Latent Semantic Analysis (LSA), a fully automated method taken from computer science, is used to conduct thematic similarity or relevancy analysis. The thematic analysis using LSA serves the current study in two ways: 1- it helps control the three corpora for theme, and 2- it helps reveal how the themes of J-D’s translations before and after his creative writing relate to the themes of his creative writing. The second method makes use of corpus stylistics, a sub-field of that applies corpus methods and tools to the analysis of style. Baker defines a corpus as “any collection of running texts, held in electronic form and analyzable automatically or semi-automatically” ("Corpora" 225). Corpus stylistics is used to analyze a set of style markers including the Standardized Type-Token Ratio (STTR), mean sentence length and punctuation marks.
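As an illustration only, the three corpus-stylistic markers named above can be computed with a short Python sketch. It rests on assumptions that are not stated in this section: STTR is taken as the mean type-token ratio over consecutive 1,000-word chunks (the convention popularized by WordSmith Tools), sentences are assumed to end at '.', '!' or '?', and the "standardized" punctuation scores are assumed to be counts per 1,000 words. The tokenizer and all function names are hypothetical and do not reproduce the tools used in this study.

```python
import re
from statistics import mean

# Hypothetical tokenizer: alphabetic words, keeping internal hyphens/apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:['’-][A-Za-z]+)*")


def tokenize(text):
    """Return lower-cased word tokens; punctuation is discarded."""
    return [w.lower() for w in WORD_RE.findall(text)]


def sttr(tokens, chunk=1000):
    """Standardized type-token ratio: mean TTR over equal-sized chunks.

    Assumption: consecutive, non-overlapping chunks of `chunk` tokens;
    a trailing chunk shorter than `chunk` is ignored.
    """
    ratios = [len(set(tokens[i:i + chunk])) / chunk
              for i in range(0, len(tokens) - chunk + 1, chunk)]
    if not ratios:
        raise ValueError("text is shorter than one chunk")
    return mean(ratios)


def mean_sentence_length(text):
    """Average number of word tokens per sentence (naive splitter)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lengths = [len(tokenize(s)) for s in sentences if tokenize(s)]
    return mean(lengths)


def punctuation_per_1000(text, mark):
    """Occurrences of the character `mark` per 1,000 word tokens."""
    return text.count(mark) / len(tokenize(text)) * 1000


sample = "The old man sat; he waited. The boy ran, and the dog followed."
print(mean_sentence_length(sample))                 # 6.5
print(round(punctuation_per_1000(sample, ";"), 1))  # 76.9
```

Chunking matters because the raw type-token ratio falls as a text grows longer, so averaging over fixed-size samples makes corpora of unequal size comparable; the same reasoning motivates normalizing punctuation counts by text length.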

The third method used in this study is the machine learning Profile-Based (PB) approach. The PB approach, adopted from stylometry, is used to build profiles of the authorial and the translational style of J-D in his creative writing and translations. Such profiles contain different lexical and stylistic variables that show the personal stylistic attributes of the text producer. Machine learning PB is used to analyze the following style markers: word n-grams, character n-grams and Part-of-Speech (POS) n-grams. In this study, the style-marker profile analysis attempts to capture the lexical, semantic and syntactic variables in J-D’s writing and translation across the three corpora: short stories translated before creative writing, creative writing short stories and short stories translated after creative writing.
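The profile-based idea can be illustrated with a minimal Python sketch, which is not the JGAAP configuration used in this study: all texts of a corpus are pooled into one profile of its most frequent n-grams (with relative frequencies), and an unseen text is assigned to the corpus whose profile is least dissimilar. As an illustrative stand-in for the distance measure, the sketch uses the common n-gram (CNG) dissimilarity of Kešelj et al. (2003). The profile size `top_k`, the function names and the toy corpora `A` and `B` are hypothetical. POS n-grams could be handled the same way by first tagging the text (for example with Penn Treebank tags) and passing the space-separated tag sequence to `word_ngrams`.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Overlapping character n-grams of a lower-cased, space-normalized text."""
    text = " ".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n=2):
    """Overlapping word n-grams (also usable on a space-separated POS tag string)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def profile(texts, extract, top_k=500):
    """Pool all texts of one corpus and keep its top_k most frequent n-grams.

    Values are relative frequencies over all n-grams in the pooled corpus.
    """
    counts = Counter()
    for t in texts:
        counts.update(extract(t))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.most_common(top_k)}


def cng_dissimilarity(p1, p2):
    """Common n-gram (CNG) dissimilarity (Keselj et al. 2003).

    Sums ((2 * (f1 - f2)) / (f1 + f2)) ** 2 over the union of both profiles;
    an n-gram missing from one profile contributes the maximum value, 4.
    """
    d = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        d += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return d


def attribute(unknown, profiles, extract, top_k=500):
    """Return the name of the corpus profile closest to the unknown text."""
    target = profile([unknown], extract, top_k)
    return min(profiles, key=lambda name: cng_dissimilarity(target, profiles[name]))


# Toy usage with hypothetical corpora "A" and "B".
corpora = {
    "A": ["the cat sat on the mat", "the cat ate the rat"],
    "B": ["quick brown foxes jump", "lazy dogs sleep quietly"],
}
profiles = {name: profile(texts, char_ngrams) for name, texts in corpora.items()}
print(attribute("the cat sat", profiles, char_ngrams))  # A
```

Pooling every training text into a single profile is what distinguishes the profile-based approach from instance-based methods, which keep one vector per text; with a small number of corpora, one profile per corpus is a natural unit of comparison.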

1.5. Significance of the Study

This study addresses translator style as one of the largely ignored topics in translation studies. Saldanha points out that most work in translation studies focuses on the style of translations, as opposed to the style of translators (27). The study of translation style is source-oriented and focuses on the reproduction of the source-text style in the translation, while the study of translator style is target-oriented and focuses on stylistic patterns found across different translations produced by the same translator. In addition, this study empirically investigates the relation between creative writing and translation as two activities carried out by one translator in an attempt to show the extent to which translating affects creative writing and vice versa. No systematic studies have been conducted in this area, and the existing accounts remain anecdotal.

In addition, an important issue in the study of translator style is the lack of a solid methodological framework for this type of analysis. Even corpus-based studies of translator style by translation scholars present methodological issues related to corpus compilation and control (e.g. Baker "Towards a Methodology" and Saldanha). Furthermore, existing definitions of style markers or style variables are problematic. To address these issues, this dissertation adopts an interdisciplinary approach combining computational methods from corpus linguistics, computer science, stylometry and authorship attribution to develop a rigorous framework for studying translator style. In addition, it proposes a style-marker profile for capturing the personal style of text authors; its style variables have been tested and validated by scholars such as Stamatatos, Fakotakis, and Kokkinakis; Kim and Walter; Zhou, Xu, and Tan; Coyotl-Morales et al.; and Zheng et al. Lastly, this study discusses best practices related to corpus compilation and control for translator style analysis purposes.

1.6. Summary of Chapters

The present dissertation includes seven chapters. Chapter one provides a general overview of the research on translator style in the field of translation studies.

It also addresses the significance of the present study and lays down the research hypotheses. Chapter two provides a review of the related literature. It starts with a general overview of stylistics in literary studies, shows the connection between stylistics in literary studies and the study of style in translation studies, and provides an overview of the development of the study of style in translation studies. Chapter three describes the research methodology, data collection and analysis. Chapter four reports the results of the Latent Semantic Analysis method, which reveals the thematic connection between J-D’s translations and creative writing. Chapter five provides the results of the corpus stylistics analysis of Standardized Type-Token Ratio (STTR), sentence length and punctuation marks. This chapter also reports the results of the machine learning analysis, which relied on the Profile-Based approach to analyze word n-grams, Part-of-Speech n-grams and character n-grams in the three corpora. Chapter six discusses the results with reference to the research hypotheses that motivated the study. The discussion is framed within translation theories concerning the relation between translated and non-translated texts and the relation between creative writing and translation, in order to reflect on the patterns discovered and to show the stylistic and thematic relations between creative writing and translation in the case of J-D. It also proposes a framework for translator style analysis and best practices related to corpus compilation and control for translator style analysis purposes. The last chapter, chapter seven, summarizes the dissertation’s findings, discusses its limitations and proposes directions for future research.


CHAPTER 2: LITERATURE REVIEW

2.1. A Brief History of Literary Stylistics

Literary style has attracted the attention of many scholars and thinkers. Ancient Greek and Roman thinkers such as Aristotle, Cicero, Demetrius and Quintilian treated “style as the proper adornment of thought” (Augustyn 1). They focused on the aesthetic function of literary style; style was analyzed to reveal the beauty of the literary piece. In this approach, stylistic analysis concentrated mainly on metaphors, images and symbols. This view of style held for many centuries and shifted only in the early nineteenth century, when style came to mean “the message carried by the frequency-distributions and transitional probabilities of its linguistic features” (Bloch 42). In modern stylistics, style has become a component of literary analysis revealing what a text means (Culler 906). This modern view of style can be traced back to the work of the Russian Formalist Roman Jakobson, who argues that the goal of stylistic analysis is to reveal the “literariness” of a verbal message (Jakobson 360). Jakobson was the first to propose a framework for the stylistic analysis of literary works. His framework analyzes six text-related elements in order to reveal the function of a particular communicative act: the context, addresser, addressee, contact, code and message of the verbal act. Jakobson indicates that these six elements correspond to six functions of language: the referential, emotive, conative, phatic, metalingual and poetic functions. Jakobson’s thoughts on literary stylistics influenced literary schools and scholars in Europe and America.

Following in Jakobson’s steps, Jan Mukařovský, a Czech literary and aesthetic theorist, addressed the notion of style in literature and argued that the style of a literary work differs from the style of “standard language”. His argument is one of the first to address the style of non-literary works. Mukařovský explains that what differentiates literary language from standard language is the use of patterns that go against the norms of standard language (Allen). This deviation in turn creates what Victor Shklovsky calls a “defamiliarizing effect” on the reader and attracts his/her attention (12). Deviation from ordinary language thus forces the reader to perceive the communicative act in a different way.

Building on the ideas of Jakobson and Mukařovský, the New Criticism appeared in the United States during the early twentieth century. It focused on the close reading of texts as a way to analyze their style. New Criticism “disconnect[s] the literary text from its social and historical context” (Jancovich 200); that is, the text itself is the independent unit of meaning, and meaning resides inside the text. Another literary school, Practical Criticism, appeared in Britain in the 1920s. This school adopted a psychological approach to stylistics and focused on the psychological effect of the interaction between the text and the reader (Richards 7). Michael Riffaterre, a member of this school, called for a new reader-oriented theory of style, in which he emphasized the importance of including an analysis of the reader’s response in the stylistic analysis of the ‘act of communication’ (Clayton and Rothstein 24).


In turn, stylistics scholars such as David Crystal and Derek Davy (1969), Nils Enkvist (1973) and Michael Halliday (1978) criticized previous approaches to literary stylistics for not considering social context in the stylistic analysis of texts. Crystal and Davy called for studying social context and its impact in style analysis. They were specifically concerned with how a certain context or social event could restrict the stylistic choices of a writer or speaker (Sharndama and Mohammed). Halliday also emphasized the importance of the social function of discourse. He propounded a theory of stylistics that combines the language system with the social dimension of language. Halliday argued that social structure and language are inseparable in any act of communication (Bawarshi and Reiff 30). Thus, the analysis of any linguistic act of communication should consider the social structure that is embedded in and transmitted through language.

The development and combination of the social and linguistic approaches to stylistics paved the way for a new trend in literary stylistics known as discourse analysis. The discourse-analysis approach treats context as an important component in the stylistic analysis of discourse. Ronald Carter and Paul Simpson argue that issues such as gender, class, sociopolitical determinations and ideology cannot be ignored in the analysis of discourse in any communicative event, written or spoken (14). This approach argues that stylistic analysis should be “concerned not simply with the micro-contexts of the effects of words across sentences or conversational turns but also with the macro-contexts of larger social patterns” (Carter and Simpson 14). The analysis of the micro- and macro-contexts of discourse proposed by this approach takes into consideration the linguistic, social, ideological and political aspects of the communicative act. These developments in the study of literary style have in turn influenced approaches to style in translation studies, as discussed in the following sections.

2.2. Approaches to Style in Translation Studies

Translation scholars who focus on literary translation have adopted approaches from literary studies to discuss style in translation. For instance, the notions of literariness (Jakobson) and the “defamiliarizing effect” (Shklovsky) have shaped the work of translation scholars on stylistics and generated a text-oriented view of style. This approach focuses on the stylistic peculiarities of the source text and on how they should be manifested in the target text. Halliday’s theory of stylistics, which emphasizes the social function of discourse, has also given translational stylistics a functional turn that considers the stylistic function of the TT in its new context, the target culture. Carter and Simpson’s discourse-analysis approach to literary style, which draws on critical, close reading of texts to reveal the hidden intended meaning that lies beyond the physical representation of signs in verbal discourse, has generated a descriptive approach to translational stylistics that focuses on the translator as a cultural agent. The reader-oriented theory of literary stylistics proposed by Michael Riffaterre has paved the way for a reader-oriented, cognitive approach to style in translation studies. This approach, as proposed by Jean Boase-Beier, argues that the style of a text is determined by the cognitive state of the reader. It also argues that the translator, as a reader, attempts to understand the mind in the ST (the attitude of the ST author) in order to reflect this mind style and then recreate it in the translation.

Translation scholars and theorists have followed different approaches to the notion of style in translation, and its study has taken three main forms. The first is text-oriented. This approach focuses either on the style of the ST and its manifestation in the TT (comparative approach) or on the style of the TT and its adaptation to the target culture (functionalist approach). The second approach focuses on the translator’s style or presence in his/her translations, while the third takes a cognitive turn. This latter approach builds on cognitive linguistics to study the relation between text and mind. It views style as a cognitive entity rather than a textual one (Boase-Beier, Stylistic Approaches to Translation). The following sections review and discuss in some detail the literature pertaining to the study of style in translation studies.

2.3. Text-Oriented Approaches

As mentioned earlier, the first approaches to style in literary studies focused on the stylistic features of texts as manifested in the author’s linguistic choices. These approaches ignore the cognitive and ideological dimensions of texts. Similarly, text-oriented approaches to style in translation have focused on the textual attributes of the ST and/or the TT. Scholars who adopt a textual approach to translation stylistics fall into two main groups. The first group adopts a comparative approach and is concerned with comparing the ST style with the TT style. These scholars focus either on the divergence of the TT style from that of the ST (Boase-Beier, “Knowing and Not Knowing”) or on how the ST style should be rendered in the TT (Nida; Vinay and Darbelnet). The second group adopts a functionalist approach, which stresses the importance of adapting the TT style to the norms and conventions of the target culture. This approach places minimal emphasis on the style of the ST (Nord; Vermeer).

2.3.1. Comparative Approach

Text-oriented comparative approaches to style in translation focus on the style of the ST and on the manifestation of this style in the TT. Scholars within this tradition, such as Eugene Nida and Charles Taber, and John Catford, argue that the translator should always take the style of the ST as the point of departure for creating the style of the TT, and that the style of the ST should be rendered as closely as possible in the TT.


Vinay and Darbelnet pioneered the study of comparative stylistics in translation. In their book, Comparative Stylistics of French and English2, they stress the importance of conducting a stylistic analysis of the ST to identify potential difficulties and problems and to propose solutions that ultimately lead to a “good” translation. Vinay and Darbelnet identify some stylistic difficulties that English-French translators face and classify them into categories in order to find systematic solutions to those problems. They also discuss some basic linguistic notions, such as servitude and option, and relate them to stylistic analysis in translation. They indicate that servitude belongs to the grammatical system of a given language, whereas option belongs to the domain of stylistics. They argue that “In the analysis of the SL, translators must pay particular attention to the options which constitutes the style of the ST author. In the TL, translators must pay attention to the servitudes, which limit their freedom of action” (16). Vinay and Darbelnet also distinguish two types of stylistics: internal stylistics, which “seeks to isolate the means of expression of a given language by contrasting the affective with the intellectual elements”, and comparative stylistics, which “seeks to identify the expressive means of two languages by contrasting them” (17). In their model, translators are concerned with comparative stylistics but should not ignore internal stylistics. As for the relation between the ST and TT styles, they point out that translators must preserve the tone of the texts they translate, if possible. For Vinay and Darbelnet, the focal point of stylistic analysis is the ST, and the TT should be produced in light of the stylistic analysis of the ST.

2 Their book was first published in French in 1958; the English translation came out in 1995.

Vinay and Darbelnet’s work on translation and stylistics relies on analyzing the stylistic peculiarities of the ST to produce a stylistically “equivalent” TT. Their approach is purely linguistic and ignores important elements of the translation process, such as the discourse itself, the target culture, the target reader and the translator. They treat the ST as an independent unit and assume that a stylistic analysis of its linguistic features is enough for the translator to arrive at a “good” translation.

This view of style is also discussed by Eugene Nida, a pioneer of the linguistic turn. Nida points out that translators always face a conflict between form (style) and meaning (message). If they attempt to preserve the stylistic qualities of the text closely, translators are likely to sacrifice much of the meaning (2). As for the translation of style, he argues that “the message in the receptor language should match as closely as possible the different elements [including message and style] in the source language” (159). Nida explains his view of style by distinguishing two types of equivalence. The first is dynamic equivalence, in which the original text is translated into a target language in such a way that the response of receptors in the target culture is essentially like that of the original receptors. In this type of translation, “the form of the original text [style] is changed” but still resembles that of the original (200). The second type is formal equivalence, a type of translation in which “the features of the form of the source text have been mechanically reproduced in the receptor language” (201). In both types, the style and the message of the ST are constantly compared with those of the TT to determine standards of accuracy and correctness. In his approach, Nida stresses the importance of the ST as the point of departure for producing a “faithful” translation, which also entails reproducing the same style in the TT, if possible. His discussion of style in translation accounts neither for the stylistic function of the TT in the target culture nor for the stylistic preferences of the target culture.

This view of style is rejected by Katharina Reiss, one of the first translation scholars to discuss a functionalist approach to style. Her approach is also comparative; however, it places more emphasis on the target text and culture than Nida’s and Vinay and Darbelnet’s approaches do. Reiss builds on the stylistic features of texts and on the way the style of each text type should be exhibited in translation. She argues that the style of written language is determined by the function of the text, and she stresses the importance of a stylistic analysis of the ST in order to identify its communicative function and recreate the same function in the TT. Reiss distinguishes three main text types3 according to the function of language in the text. First, in content-focused texts, the function of language is to deliver content or a message. Second, form-focused texts, which have a unique form or style, show the peculiarities of a specific genre or the stylistic characteristics of a specific author. Third, appeal-focused texts represent the persuasive function of language. Reiss believes that equivalence is reached by paying attention to text type and to how each text type should be translated to fulfill its function in the target language. For instance, an absurd play should function as an absurd play in the target culture. Thus, the translation process takes the function of the ST as the point of departure in creating its counterpart in the target culture.

3 The notion of text types, as discussed by Reiss, is borrowed from linguistics.

Reiss proposes a model for translation criticism based on text style. She argues that “the analysis and evaluation of a translated text can serve as the first stage, but it must be followed by the second and indispensable stage of comparison with the source text” (10). Reiss criticizes translation critics and reviewers for not comparing translations with their original texts, and her model sets out rules for this comparison. She argues that “in a content-focused text it is always appropriate to eliminate obvious errors and compensate for stylistic defects. In a form-focused text, on the other hand, a translator’s stylistic or other faults should not be ignored” (64). Reiss places more emphasis on the style of form-focused texts than on that of other text types. She argues that in this type of text, “the translator will not mimic slavishly (adopt) the forms of the source language, but rather appreciate the form of the source language and be inspired by it to discover an analogous form in the target language” (33).

Reiss’s approach to style differs from Nida’s. She looks at the style of the ST as a source of inspiration for the translator in creating the style of the TT. From a functionalist point of view, Reiss argues that the translator should appreciate and be inspired by the ST style to produce a TT that blends the ST style with the stylistic conventions of the target culture and thereby reaches stylistic textual “equivalence”.

Boase-Beier also adopts a comparative approach and discusses the notion of style in poetry. She argues that “style in poetry has the power to create cognitive effects in the reader where content alone can fail to” (“Knowing and Not Knowing” 34). Accordingly, Boase-Beier stresses the importance of understanding and reproducing the poetic effect that is embedded in the style of the source poem. In her discussion, Boase-Beier analyzes the English translation of a German poem about the Holocaust to show how the ambiguity manifested in the style of the source poem is rendered into English. Her discussion emphasizes that style is a very important component of any poem and that translators should preserve the essential characteristics of the original when producing the translation. Her approach to style is comparative; however, Boase-Beier places more emphasis on the translator as the reader of the ST and the creator of the TT than Nida, Vinay and Darbelnet, and Reiss do.

Boase-Beier points out that the style of the ST carries clues to the author’s intention, as well as devices that have a particular effect on the reader. That is, the ST should be the main focus of the translator’s efforts if the translator wants to recreate the poetic effect of the ST. Like Vinay and Darbelnet, Boase-Beier argues that the stylistic analysis of the source poem/text is the first step towards understanding it. She also agrees with Reiss that the translator should seek to create the same effect as the ST in the translation. Boase-Beier points out that a poetic effect close or similar to that of the source poem can be achieved by revealing the author’s intention, which has been a very problematic notion in literary studies.

The source-oriented approach to style in translation places more emphasis on the ST as the main reference for the translation process than on the target text and culture, which may have a considerable effect on the style of the TT. The functionalist turn in translation studies, as proposed by Nord and Vermeer among others, shifted this view by placing more emphasis on the function or purpose of the TT in the target culture. This shift generated a target-oriented approach to the study of style, which argues that the function or purpose of the TT determines the style of the translation and rejects any stylistic or textual constraints imposed by the ST.

2.3.2. Target-Oriented Approach

In the functionalist approach to translation studies, as represented by Christiane Nord and Hans Vermeer among others, translation is viewed as always being carried out with a function and a purpose (skopos) in mind. That is, the function of the TT, whether literary or non-literary, and the skopos4 of the translation are the components that govern the production of any translation. This view has generated a target-oriented, functional view of translation that values the function or purpose of the translation and rejects the ST as the reference for the translation process. In this regard, Christina Schäffner argues that “the starting point for any translation is therefore not the (linguistic surface structure of the) source text (ST), but the purpose of the target text. The Skopos of the ST and the Skopos of the TT can be either identical or different” (133). This target-oriented view has radically changed the long-held view of translation as a process of linguistic matching that seeks “equivalence” and “faithfulness”.

4 Skopos means purpose, aim or goal; it derives from the Greek word “skopós”.

Vermeer discusses skopos theory and its application in translation theory. He argues that translation is a type of human action: an intentional, purposeful behavior that takes place in a given situation, is part of that situation and at the same time modifies it (qtd. in Nord 11). Vermeer also argues that the translator is the expert in the translation project and the one who is responsible for the translation. This implies that the translator, as an expert in both the ST and the TT contexts, chooses the “best” way and the “best” strategy to fulfill the translation skopos, which also determines the quality of the TT (Nord 28). Unlike earlier linguistic approaches that draw on the notion of equivalence, in which the ST is the reference point of the relationship between the ST and the TT, the functionalist approach argues that “the quality and quantity of … [the ST and the TT] relationship are specified by the translation skopos” (Nord 28).

According to Nord, the skopos of the translation can be embedded in the translation brief, and ‘adequacy’ refers to the qualities of a target text with regard to its translation brief (35). As discussed earlier, Reiss’s functional view of style treats the style of the text as a whole. Skopos theory, on the other hand, argues that style applies not only to entire texts but also to text segments (Nord 33).


Target-oriented functionalist scholars link style to the function or purpose of the TT. This has shifted the view of style from an ST component that should be taken into consideration in creating a TT to a component of the TT that needs to adapt to the skopos or function of the TT. Thus, the translation brief, skopos or function determines the style of the TT. However, Nord and Vermeer’s arguments for a functional, target-oriented view of translation have ignored the human agent in the translation process, the translator. Translation scholars such as Harish Trivedi, Susan Bassnett, Lawrence Venuti and Theo Hermans, among others, have called for paying more attention to the presence of translators in their translations. This view has influenced the study of style by paving the way for translator style to emerge as a major current in translation stylistics.

2.4. Translator-Oriented Approaches

The text-oriented approaches to style have placed more emphasis on texts and have ignored the human agent of the translation process, the translator. Recently, more attention has been paid to translators and to their presence and voice in their translations. One of the first works to discuss translators’ presence in their translations is Venuti’s The Translator's Invisibility (1995), which underlines the importance of the translator’s presence in translated texts. Venuti challenges the previous functionalist approach to translation, which stresses the importance of making the target text read fluently, as an original text rather than a translation. He indicates that “the more fluent the translation, the more invisible the translator, and, presumably, the more visible the writer or meaning of the foreign text” (1). He argues that making the TT transparent through domestication5 is an act of violence, because a transparent translation reconstructs the foreign text in accordance with values and beliefs that preexist in the target language (18). Venuti argues against domestication and calls for adopting foreignization6 as a translation strategy to restrain the ethnocentric violence of translation. He stresses the importance of keeping the style of the ST in translation even if that style seems unfamiliar in the target culture, and he rejects any stylistic adaptation that leads to a fluent TT and results in a transparent translation that does not read as a translation. Theo Hermans also discusses the presence of translators in their translations, indicating that the translator’s voice is present in every translation he/she produces. Venuti’s and Hermans’ notions of translator voice and presence can be seen as embedded in the translator’s style in the TT.

Baker took Hermans’ and Venuti's arguments a step further and empirically investigated the translator’s voice and presence in the target text. Her work took the TT as the point of departure for tracing a translator’s style across his/her different translations, which she refers to as the “translator’s fingerprint” (Baker, “Towards a Methodology”). In her study, she used corpus stylistics, a method that uses the computer to extract stylistic patterns from a large corpus of digital texts. Baker built two corpora to analyze the style of two translators into English, Peter Bush and Peter Clark. Peter Bush’s corpus contains the translation of one fiction work from Portuguese, Turbulence (1994) by Chico Buarque, and translations of two fiction works from Spanish, Quarantine (1994) by Juan Goytisolo and Strawberry and Chocolate (1995) by Senel Paz. It also contains translations of two Spanish autobiographies by Juan Goytisolo, Forbidden Territory (1989) and Realms of Strife (1990). Peter Clark’s corpus contains translations of a collection of Arabic tales, Dubai Tales (1991), by al Murr, and of a collection of tales, Grandfather’s Tale (1999), and a novel, Sabriya (1997), both written by Ulfat Idilbi. For the corpus analysis, Baker looked for patterns related to the standardized type/token ratio (STTR)7, average sentence length and the frequency of the verb SAY in all its forms (say, says, said, saying).

5 Domestication is a translation strategy that involves translating and adapting the ST to the values and literary system of the domestic culture, which results in a fluent, “original-like” translated text. Adopting this strategy may result in the loss of the stylistic or linguistic peculiarities of the ST.

6 Foreignization is a translation strategy that involves keeping the foreign stylistic and linguistic peculiarities of the ST in the TT as a way to break the conventions of the target culture and produce a text that sounds foreign.
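To make two of Baker's measures concrete, the following Python sketch computes average sentence length and the frequency of the lemma SAY (say, says, said, saying). It is an illustration under stated assumptions, not Baker's own procedure: the regular-expression tokenizer and sentence splitter, the per-1,000-token normalization and the function names are hypothetical choices made here, and real literary text (abbreviations, dialogue punctuation, curly apostrophes) would need more careful segmentation.

```python
import re
from collections import Counter

# The four forms of the lemma SAY listed in Baker's study.
SAY_FORMS = {"say", "says", "said", "saying"}

def tokenize(text):
    """Lower-cased word tokens (letters with internal apostrophes); naive by design."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace; ignores abbreviations."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def average_sentence_length(text):
    """Mean number of word tokens per sentence."""
    sentences = split_sentences(text)
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)

def say_profile(text):
    """Counts of each SAY form and the lemma's frequency per 1,000 tokens."""
    tokens = tokenize(text)
    counts = Counter(t for t in tokens if t in SAY_FORMS)
    per_thousand = 1000 * sum(counts.values()) / len(tokens)
    return counts, per_thousand

sample = "He said nothing. She says yes, saying it twice. They will say more."
print(average_sentence_length(sample))  # 13 tokens over 3 sentences
counts, rate = say_profile(sample)
print(dict(counts), round(rate, 1))
```

Normalizing per 1,000 tokens is one way to compare the SAY frequency of translations of very different lengths; raw counts would simply favor the larger subcorpus.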

Baker’s model is innovative not only in its approach to style but also in its method of analysis. In fact, Baker’s study has opened the way for further empirical studies that can address its limitations. For instance, the two corpora in Baker’s study were not homogeneous; they included genres ranging from novels and tales to autobiographies and short stories. One could argue that a translator’s style may vary from one genre to another, and this, in turn, affects style analysis. In addition, a given stylistic pattern in the TT may reflect the conventions of a certain genre and is not necessarily related to the translator’s style. Another issue arising from Baker’s corpora is the inclusion of several works by the same author. In this case, there is a higher chance of finding patterns in the translated texts that are related to the ST author’s style rather than to the translator’s.

7 “Type/token ratio is a measure of the range and diversity of vocabulary used by a writer, or in a given corpus. It is the ratio of different words to the overall number of words in a text or collection of texts” (M. Baker, “Towards a Methodology for Investigating the Style of a Literary Translator”). STTR is obtained by calculating the TTR for the first 1,000 running words (tokens), then calculating a fresh TTR for the next 1,000, and so on to the end of the text; the STTR is the average of these TTR values.
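The TTR and STTR definitions quoted in footnote 7 can be sketched in a few lines of Python. This is a minimal sketch, assuming a naive regular-expression tokenizer and assuming that a final segment shorter than the window is discarded; concorder tools may tokenize and handle the remainder differently, and the function names are hypothetical. The toy call uses a 4-token window instead of 1,000 so that the arithmetic can be checked by hand.

```python
import re

def tokenize(text):
    """Lower-cased word tokens; a naive tokenizer chosen for illustration."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def ttr(tokens):
    """Type/token ratio: distinct word forms (types) over running words (tokens)."""
    if not tokens:
        raise ValueError("empty token list")
    return len(set(tokens)) / len(tokens)

def sttr(tokens, chunk=1000):
    """Standardized TTR: mean TTR of consecutive `chunk`-token segments.

    A final segment shorter than `chunk` is discarded here; this is an
    assumption, since tools may treat the remainder differently.
    """
    ratios = [ttr(tokens[i:i + chunk])
              for i in range(0, len(tokens) - chunk + 1, chunk)]
    if not ratios:
        raise ValueError(f"need at least {chunk} tokens")
    return sum(ratios) / len(ratios)

# Toy illustration with a 4-token window instead of 1,000.
toks = tokenize("the cat saw the dog and the dog saw the cat")
print(len(toks), len(set(toks)), ttr(toks))
print(sttr(toks, chunk=4))
```

Here the sentence has 11 tokens and 5 types, so its TTR is 5/11. With a 4-token window, the two full segments ("the cat saw the" and "dog and the dog") each have a TTR of 0.75, so the STTR is 0.75, and the trailing "saw the cat" is dropped.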

Baker’s use of STTR as a style marker raises a methodological question. STTR is taken to capture the degree of an author’s lexical creativity; in translation, it also shows whether the translator has a rich or a limited lexical repertoire. However, to conduct an STTR analysis with a higher level of accuracy, the corpus should be controlled for theme and genre. Some themes offer a more limited range of lexical choices than others, which in turn limits the lexical creativity of the text’s author. In this regard, George Mikros and Eleni Argiri empirically investigated the influence of theme on authorial style analysis, using STTR and other style variables, and reported that text theme has a considerable influence on the analysis of an author’s style. Genre also affects STTR analysis; for instance, biographies differ from tales not only in lexical density but also in style. This is especially true of Arabic tales, which are traditional oral stories that rely on repetition to build the story line (Mohamed and Omer). Any STTR analysis of tales would show this characteristic, and any comparison between the STTR of tales and that of other genres such as novels and short stories would show such a difference in lexical richness. Baker’s corpus also included English translations of texts from different languages, including Arabic, Portuguese and Spanish. This may also affect STTR analysis, or what is called “lexical density”, because some languages, such as Arabic, value repetition as a matter of stylistic elegance, and this might be reflected in the target text.

Another example of a study of translator style that uses corpus analysis is Diva De Camargo’s. She analyzed translator style in an attempt to determine the extent to which the style of the ST author is reflected in the style of the translator and whether the target text shows distinctive, recurring and preferred marks of the translator’s linguistic behavior. De Camargo analyzed one Portuguese literary work, Tocaia Grande: a face obscura (1984) by Jorge Amado (original text (OT)), and its English translation, Showdown (1988), by Gregory Rabassa (TT subcorpus). She used corpus stylistics and analyzed the number of tokens and types, the type/token ratio (TTR) and STTR. De Camargo also used two control corpora: the British National Corpus (BNC)8 with its fiction subcorpus (BNC fn), and the Banco do Português (BP)9. She conducted her experiment in four steps. First, using WordSmith Tools (Scott 1998), she retrieved statistics on the distribution of linguistic patterns, in terms of TTR and STTR, in both texts, the TT and the OT. Second, she conducted TT/OT comparisons by tokens (running words) and types (word forms). Third, she compared the TTR and STTR of the TT to those of the BNC. Finally, she compared the TTR and STTR of the OT with those of the Banco do Português (BP).

De Camargo’s results show that the English translation of Tocaia Grande, Showdown, registers a lower number of tokens and types than its original text. She also found that the translator has a lower TTR and STTR than the original author. In a second stage of analysis, she compared the translator’s language use (in his translation of Tocaia Grande, a corpus of 141,608 words) with the language use in the BNC, a 90,748,880-word corpus of general English texts originally written in English. She also compared the TT to the BNC fn corpus, which contains 19,444,150 words of fiction works from different genres written in English between 1985 and 1994. She found that the translator shows richer and more varied language use, with a higher TTR and a higher STTR, than the BNC and the BNC fn. De Camargo also compared the TTR of the OT (159,440 words) with that of the BP (230,460,560 words). Her analysis shows that the OT has a higher TTR than the BP; the same applies to the STTR analysis. De Camargo concludes that, since the TTR and STTR analysis shows that Rabassa has lower scores than Amado, this can be an indicator of the translator’s divergence from the OT. She also concludes that Rabassa presents a much more diversified use of linguistic patterns and much less vocabulary repetition than what is found in the variety of text types represented in the BNC and the BNC fn.

8 A 100-million-word corpus of written and spoken English texts taken from different domains.

9 A 240-million-word corpus of written and spoken Portuguese texts from different domains.

It is clear from the above discussion that the use of corpus methodology in De Camargo’s study reveals interesting statistical information about the textual and personal attributes of both the TT translator and the OT author. The use of STTR and TTR to trace the authorial style of Amado in Rabassa’s translation shows that the two texts have different scores, which, as De Camargo argues, indicates a difference between the two styles. However, translation is not a process of linguistic matching of words. Thus, style divergence can never be measured simply by statistical linguistic analysis and a comparison of the number of words (tokens) and the number of forms (types) in the ST and the TT. In other words, comparing the numbers of types and tokens in the ST and the TT does not reveal the extent to which the style of the TT diverges from the style of the ST. In addition, only a word-for-word10 translation of the OT would produce close or similar STTR and TTR scores. In this regard, Baker questions the argument that translators should reproduce the style of the ST in their translations. She indicates that “it is as impossible to produce a stretch of language in a totally impersonal way as it is to handle an object without leaving one’s fingerprints on it” (244). Thus, comparing the STTR and TTR scores of the OT and the TT can neither reveal the extent to which the translator reproduces the style of the OT author nor show the translator’s distinctive manner of translating.
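To make the two measures concrete, the sketch below computes TTR (distinct word forms divided by running words) and STTR (the mean TTR over consecutive, non-overlapping chunks of a fixed size, 1,000 words by default). The tokenizer, the handling of an incomplete final chunk (discarded) and the fallback to plain TTR for texts shorter than one chunk are assumptions for illustration, not De Camargo's exact settings.

```python
import re

def tokenize(text):
    # Assumed tokenizer: lowercase word forms, keeping internal apostrophes.
    return re.findall(r"\w+(?:'\w+)*", text.lower())

def ttr(tokens):
    # Type/token ratio: number of distinct forms divided by number of tokens.
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def sttr(tokens, window=1000):
    # Standardized TTR: mean TTR of consecutive, non-overlapping chunks of
    # `window` tokens; an incomplete final chunk is discarded (an assumption).
    chunks = [tokens[i:i + window]
              for i in range(0, len(tokens) - window + 1, window)]
    if not chunks:
        return ttr(tokens)  # text shorter than one window: fall back to TTR
    return sum(ttr(c) for c in chunks) / len(chunks)

sample = tokenize("The cat sat on the mat.")
print(ttr(sample))  # 5 types / 6 tokens
```

Because longer texts repeat function words more often, raw TTR tends to fall as a text grows even when its vocabulary is no less varied; averaging over equal-sized chunks is what makes STTR comparable across texts of different lengths.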

De Camargo’s use of the control corpora (BNC, BNC fn and BP) in her study is also questionable. First, she compared the STTR and TTR of Rabassa’s translation (translational English) to those of the BNC and the BNC fn, which include original English texts. One could argue that comparing the STTR and TTR of a translated text to those of non-translated text (the BNC and the BNC fn in this case) is not a useful method of analysis, since translated and non-translated English have different styles and peculiarities. In addition, the BNC contains different English text types from different domains, and this could affect the STTR and TTR analysis, since some genres draw on a more limited vocabulary than others. De Camargo also makes use of the BP corpus, which contains spoken and written original Portuguese texts from different domains. She compared the STTR and TTR of Amado’s OT to those of the BP and reported that Amado scores a higher STTR and TTR than the BP. This analysis also presents a methodological issue, because comparing the STTR and TTR of a work of fiction (Amado’s OT) with those of a large number of written and spoken texts from different domains (the BP corpus) may not yield accurate results. Genre or text type and corpus size both affect STTR and TTR analysis.

10 A translation strategy that involves a literal translation of texts.

Marion Winters also used a corpus-based methodology to study translator style by comparing two German translations, by Hans-Christian Oeser and Renate Orth-Guttmann, of the novel The Beautiful and Damned (1998) by Francis Scott Fitzgerald. The researcher looked for patterns in the translators’ use of modal particles11. Winters argued that modal particles reveal the micro-level of the translators’ linguistic choices. She relied on two methods of analysis to trace the use of modal particles in the two translations. First, she used the keyword list function to retrieve the eight most frequent modal particles in the two translations. Then, using the concordance12 search function, Winters retrieved concordance lines for those eight modal particles in the two translations in order to explore the individual style of the translators in using them. She then traced the effect of these micro-level linguistic choices on the macro-level of the novel. To do so, she referred back to the ST by running a bilingual concordance search on the two German translations and the ST.

11 In German, modal particles are words used in spoken language and in colloquial registers; these words show the attitude of the speaker or narrator.

12 The concordance search function shows keywords in their immediate context in the corpus.

Winters found that the two translators have individual fingerprints when using modal particles. The difference between the two translators lies in the frequency and the usage of the modal particles. Using the concordance function, she analyzed the instances of the eight modal particles in the two translations and in the ST. She reported that in some instances the two translators use the same modal particle for the same source-text sentence, which she argues is an effect of the ST. In most cases, however, the two translators do not use a modal particle for the same source-text sentence (82). Winters pointed out that the two translators use modal particles differently, which reveals possible differences in their styles.
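A concordance (key word in context, or KWIC) display of the kind Winters relied on can be sketched in a few lines. The function below is a hypothetical illustration, not the software Winters used: it splits the text into words and single punctuation marks and returns every hit of a search word with a fixed number of tokens of left and right context. The German sentence and the particle "doch" are invented for the example.

```python
import re

def kwic(text, keyword, width=4):
    """Return (left context, hit, right context) for each occurrence of keyword."""
    # Tokens are runs of word characters or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

text = "Das ist doch klar. Komm doch mal her, sagte sie."
for left, hit, right in kwic(text, "doch", width=3):
    print(f"{left:>20} [{hit}] {right}")
```

Aligning the hits on the search word, as the formatted print does, is what lets an analyst scan many occurrences at once and compare how each translator uses a given particle.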

Winters analyzed the effect of the translators’ micro-level linguistic choices in using modal particles on the macro-level of the novel. She pointed out that Oeser’s translation is source-text oriented and takes the reader to the ST and its culture. Orth-Guttmann’s translation, on the other hand, creates a more casual or colloquial tone. It “moves the source text and the author’s world closer to the reader, while Oeser expects the reader to move to the source culture/text” (93). Winters concluded that the micro-level linguistic differences in the translators’ styles affect the macro-level of the novel.

As shown, Winters analyzed the style of the translators in two stages. The first stage traces the individual style of each translator by looking for certain linguistic patterns in the two translations and then compares the two translations to trace the differences in style. This stage involves the target texts only. The second stage involves a bilingual comparison of the ST and the two translations. Winters carries out this ST-TT analysis for two reasons: first, to see whether the ST has any effect on the TT and, second, to see the effect of translator style on the overall effect of the novel. Unlike Baker, who sees translator style as a recurring pattern of a translator’s linguistic choices that can be traced in his/her different translations of the same author or of different authors, Winters sees translator style as a divergence from the ST.

Winters also discusses the effect of translator style on the overall translation. In other words, she looks at why certain ST components are translated the way they are and how this translation affects the overall meaning, taking into account the relation between the ST and the TT. This approach takes translator style analysis a step further by revealing the impact of the translator’s individual style on the overall translation of a given text. This model can be useful for a comparative analysis of several translations of one source text to reveal the individual translator style in each translation. However, it might not be useful for analyzing the individual styles of different translators in translations of different texts.

Following Baker’s work, Gabriela Saldanha conducted an interesting study in an attempt to develop a methodology and propose a working definition of translator style. Saldanha makes use of corpus methodology and focuses her approach on looking for consistent patterns in various works translated by the same translator as a way to reveal translator style. To do so, she built three corpora: the Corpus of Translations by Peter Bush (CTPB), including four translations from Spanish (Forbidden Territory: The Memoirs of Juan Goytisolo (1989), Tonight (1991), The Wolf, the Woods, The New Man (1995) and The Old Man Who Read Love Stories (1993)) and one from Portuguese (Turbulence (1992)); the Corpus of Translations by Margaret Jull Costa (CTMJC), including three translations from Spanish (Adventures of the Ingenious Alfanhui (2000), Bedside Manner (1995) and Spring Sonata (1997)) and two from Portuguese (The Mandarin (1993) and Lúcio’s Confession (1993)); and a third corpus of translated works (COMPARA), used as a reference corpus. In her analysis, she searched for the use of emphatic italics13 and of foreign words in the TT, that is, lexical borrowing. She also examined the translators’ use of the connective “that” after the reporting verbs “say” and “tell” as a distinctive feature of translator style.

Saldanha’s study exhibits the limitations noted earlier. The first is related to corpus compilation: her corpora contain different genres, and this, as explained earlier, could affect the translator style analysis. The second limitation is related to corpus control. The CTPB corpus contains translations of works published from 1980 onwards, whereas the other corpus, CTMJC, contains translations of works published between 1880 and 1993. In other words, the first corpus covers works written over a ten-year period and the other covers works written over 113 years. This could also affect the translator style analysis, since some of the patterns that distinguish the CTMJC corpus from the CTPB corpus could be related to the writing style or conventions of a historical period.

13 Emphatic italics refers to italicizing a word, a sentence or a certain part of the translated text to indicate that this part is emphasized.

Saldanha also examined the use of the connective “that” after the reporting verbs “say” and “tell”. She reported that Bush shows an overall preference for the ‘zero connective’, whereas Jull Costa shows a preference for using the optional “that” after “say” and “tell”. One could argue that the use of the connective “that” after these reporting verbs might not be a style marker. This analysis could reveal “distinctive and consistent patterns of choice” (Saldanha 28), but it cannot be used to distinguish one translator’s style from that of others, simply because different translators may use “that” after “say” and “tell” in a similar manner.
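Saldanha's "that" versus zero-connective measure can be roughly approximated with pattern matching. The sketch below is a crude, hypothetical heuristic, not Saldanha's procedure: it counts forms of SAY and TELL followed (optionally after an object pronoun) by "that", and treats the same verbs followed directly by a subject pronoun as a proxy for a zero-connective clause. Clauses that begin with a full noun phrase ("said the man was late") are missed, so in practice such counts would need to be checked against concordance lines by hand.

```python
import re

SAY_TELL = r"(?:say|says|said|saying|tell|tells|told|telling)"
OBJ = r"(?:\s+(?:me|you|him|her|us|them|it))?"        # optional object pronoun
SUBJ = r"(?:i|you|he|she|we|they|it|there)"          # proxy for a clause start

THAT_RE = re.compile(rf"\b{SAY_TELL}{OBJ}\s+that\b", re.IGNORECASE)
ZERO_RE = re.compile(rf"\b{SAY_TELL}{OBJ}\s+{SUBJ}\b", re.IGNORECASE)

def that_vs_zero(text):
    """Return (reporting verbs + 'that', reporting verbs + pronoun subject)."""
    return len(THAT_RE.findall(text)), len(ZERO_RE.findall(text))

sample = ("He said that she left. She told him he was late. "
          "They said it was fine. I told them that I knew.")
print(that_vs_zero(sample))
```

The ratio of the two counts, rather than either raw count, is what would be compared across translators, since corpora of different sizes contain different numbers of reporting verbs.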

In an interesting study on the translator's "fingerprint" or style, Qing Wang and Defeng Li added the notion of the translator’s authorial style to the discussion. They analyzed not only the style of the translator in his/her translation, but also his/her style in his/her creative writing. In their project, they adopted a corpus-based approach to trace the translator’s fingerprint in two Chinese translations of Ulysses, by Qian Xiao (1994) and Di Jin (1997). The researchers built a Bilingual Corpus of Ulysses (BCU) consisting of the two translations and the original text. They also built a comparable subcorpus, the reference corpus (RC), which includes Xiao’s original short stories and novels written in Chinese. The researchers used this corpus to see whether there were any instances where Xiao’s authorial and translational styles intersect. The researchers analyzed lexical idiosyncrasy, that is, the individualized habitual use of words. They argued that lexical idiosyncrasy is a style marker that reveals the translator’s style or his/her unique manner of translating. For this lexical analysis, they compared the keyword lists of the two translations of Ulysses.

The researchers analyzed high- and low-frequency words in the two translations and in the reference corpus (RC).
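Keyword lists of the kind Wang and Li compared are usually built by testing each word's frequency in a study corpus against a reference corpus. The source does not say which keyness statistic they used; the sketch below assumes the log-likelihood (G²) statistic that is common in corpus linguistics. For a word occurring a times in a study corpus of c tokens and b times in a reference corpus of d tokens, the expected frequencies are E1 = c(a + b)/(c + d) and E2 = d(a + b)/(c + d), and LL = 2(a ln(a/E1) + b ln(b/E2)), where a zero frequency contributes nothing.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    # a, b: frequency of a word in the study and reference corpora
    # c, d: total number of tokens in the study and reference corpora
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study_tokens, ref_tokens, top=10):
    """Words relatively more frequent in the study corpus, ranked by LL."""
    f1, f2 = Counter(study_tokens), Counter(ref_tokens)
    c, d = len(study_tokens), len(ref_tokens)
    scored = []
    for w, a in f1.items():
        b = f2.get(w, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((log_likelihood(a, b, c, d), w))
    scored.sort(reverse=True)
    return [(w, round(s, 2)) for s, w in scored[:top]]
```

Low-frequency words can be examined the same way by keeping words with a / c < b / d instead, which gives the "negative" keywords a translator avoids relative to the reference corpus.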

The analysis revealed that Xiao, the literary writer and translator, leaves some traces of lexical idiosyncrasy in his translation. The researchers reported that Xiao tries to make his translation more emotional and as colloquial as possible, which is reflected in his lexical choices. Jin, on the other hand, chooses to use standard Mandarin to make the translation look more neutral and impersonal (85). The researchers also found that Xiao’s habitual wording style in creative writing has an influence on his translation. They argued that the reason might be that the translator consciously or subconsciously reverts to his own language habits and shows a tendency to use preferred expressions over other alternatives (89).

Wang and Li also used the syntactic sequence of sentences (the positions of clauses) as a style marker to trace the individual style of the two translators. The authors reported a similarity in the use of post-positioned adverbial clauses in the two translations, which, after checking the ST, they attribute to the style of the ST. As for post-positioned adverbial clauses in Xiao’s creative writing, the analysis showed that this stylistic feature is more common in Xiao’s translation than in his creative writing. This feature, they argued, distinguishes translated text from non-translated original writing.

Wang and Li’s analysis thus indicated that comparing two translations of one source text would reveal the distinctive manner of translating of the two translators. In addition, comparing two translations of one ST to their original would help reveal any instances where the style of the translators is affected by the style of the ST. Wang and Li’s study is one of the first studies to tackle the notion of the authorial and translational style of one translator. However, their analysis of Xiao’s translational and authorial style is very limited. They analyzed only one translation produced by Xiao (Ulysses) and compared it to his creative writing, consisting of a novel and twenty-three short stories. Wang and Li’s conclusion that the authorial wording style influences the translational wording style needs further investigation. Using more than one method of analysis to study the translational and authorial style of a particular translator would help reach more solid conclusions, which is what the current study attempts to do by drawing on methods used in stylometry and authorship attribution studies.

Like Winters and Wang and Li, Iraklis Pantopoulos analyzed the style of two translators in producing translations of one source text. In his study, Pantopoulos introduced new style markers for the analysis of translator style, including function words and contracted forms. He analyzed translator style in two translations of C.P. Cavafy's “The Canon”, by Rae Dalven (1961) and by Edmund Keeley and Philip Sherrard (1992). He built a corpus of the two translations aligned to their source text. He traced the stylistic patterns used by the translators by retrieving the Type/Token Ratio (TTR), the Standardized Type/Token Ratio (STTR), the number of types and tokens, the number of lexical words and the number of function words in the two translations. In addition, he traced the translators’ use of contractions or contracted forms in the two translations.
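The counts Pantopoulos retrieved (tokens, types, lexical versus function words, contracted forms) can be sketched as follows. The function-word list is a small, hypothetical stand-in for a full closed-class inventory, and the contraction test (a word with an internal apostrophe) also catches possessives such as "Cavafy's", so it overcounts; both are simplifying assumptions rather than Pantopoulos's procedure.

```python
import re

# Hypothetical, deliberately small closed-class list for illustration only.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "but", "or", "of", "in", "on", "at", "to", "for",
    "with", "by", "from", "as", "that", "this", "it", "he", "she", "they",
    "we", "you", "i", "is", "are", "was", "were", "be", "not", "no", "if",
}
# A word with one internal apostrophe (won't, it's; also possessives).
CONTRACTION = re.compile(r"^\w+'\w+$")

def profile(text):
    text = text.lower().replace("\u2019", "'")  # normalise curly apostrophes
    tokens = re.findall(r"\w+(?:'\w+)*", text)
    contractions = [t for t in tokens if CONTRACTION.match(t)]
    function = [t for t in tokens if t in FUNCTION_WORDS]
    # Lexical words: everything that is neither a listed function word
    # nor a contracted form.
    lexical = [t for t in tokens
               if t not in FUNCTION_WORDS and not CONTRACTION.match(t)]
    return {
        "tokens": len(tokens),
        "types": len(set(tokens)),
        "function_tokens": len(function),
        "function_types": len(set(function)),
        "lexical_tokens": len(lexical),
        "lexical_types": len(set(lexical)),
        "contractions": len(contractions),
    }

print(profile("The city won\u2019t leave you; it\u2019s in the streets."))
```

Comparing the type counts of each class between the two translations corresponds to the "greater variety of lexical and function words" finding reported for Dalven.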

Pantopoulos reported that Dalven uses more tokens and more types than Keeley and Sherrard. He also found that both translations exhibit a very similar TTR overall, while Keeley and Sherrard show a higher STTR. As for the analysis of lexical (open-class) and function (closed-class) words, the researcher reported that Dalven uses a greater variety of lexical and function words than Keeley and Sherrard. The analysis of contractions in the two translations revealed that Dalven uses fewer apostrophes in contracted forms in her translation of Cavafy’s “The Canon” than Keeley and Sherrard do in their translation of the same poems (100).

Pantopoulos also carried out a qualitative analysis by comparing the way the translators translated different terms in the poems and how different translations could lead to different meanings. At this stage, he queried the parallel corpus using the ParaConc14 tool, which makes it possible to search for terms in a parallel corpus and retrieve concordance lines of the source aligned with the target text(s). Pantopoulos found that the two translators translated some terms differently, which has an impact on the meaning of some parts of the TTs. He also compared the style of the ST to the style of the two TTs. He reported that Dalven's version is closer to the structure of the ST in both form and effect (102). Keeley and Sherrard, on the other hand, appear to follow the structure of the ST more loosely. Pantopoulos discussed translator style as unique recurring patterns of a translator’s linguistic choices compared to other translator(s) of the same ST, and this provides more pertinent analyses than Baker’s and Saldanha’s.

14 ParaConc is a bilingual or multilingual concordancer that can be used in contrastive analyses, language learning, and translation studies/training (http://www.athel.com/para.html).

Other scholars who discuss translators’ style in their creative writing and translations include Maeve Olohan. She suggested a model to find the extent to which “translated texts display the translator’s linguistic habits” by analyzing “texts written by the translators that are not translations” (150). Following Olohan, Claudia Walder discussed the style of a translator in his translations of different texts and his style in creative writing. She tried to see whether there are stylistic similarities and/or differences between the two types of texts produced by the same translator. Walder used corpus methodology and built three sub-corpora to study the style of Donal McLaughlin, a writer in English and a German-English translator, in his translations and in his creative writing. The first sub-corpus contains McLaughlin’s translations from German into English of 52 texts by 47 source-text authors, totaling 29,672 words. The second sub-corpus includes McLaughlin’s original writing from his collection of short stories An Allergic Reaction to National Anthems and Other Stories. This corpus contains 20 short stories (61,028 words), which the author divides into two sub-corpora of approximately equal length: ow1 (31,731 words) and ow2 (29,297 words). Walder also built a reference corpus consisting of 15 different German texts translated into English by 19 different translators. For the data analysis, she used the standardized type-token ratio15 (STTR) and mean sentence length, where sentence length is the number of words in a sentence. She sets the sentence boundaries as a sequence of characters followed by a space and an initial capital letter (59). She also looked at the use of dashes and at language variation in McLaughlin’s translations and creative writing.
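Walder's boundary rule is easy to reproduce, and so is its weakness with abbreviations. The sketch below assumes the rule means sentence-final punctuation followed by whitespace and a capital letter; the source gives only a prose description, so the exact pattern is an assumption. On "Mr. Sam is happy. He smiles." the rule splits off "Mr." as a one-word sentence, so the mean falls from 3.0 words (two sentences of 4 and 2 words) to 2.0 words (three sentences of 1, 3 and 2 words).

```python
import re

# Assumed boundary: ., ! or ? then whitespace, then an upper-case letter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

def mean_sentence_length(text):
    """Return (mean words per sentence, list of sentences) under the naive rule."""
    sentences = [s for s in BOUNDARY.split(text.strip()) if s]
    lengths = [len(re.findall(r"\w+(?:'\w+)*", s)) for s in sentences]
    return sum(lengths) / len(lengths), sentences

avg, sentences = mean_sentence_length("Mr. Sam is happy. He smiles.")
print(avg, sentences)
```

A common remedy is to exempt a list of known abbreviations (Mr., Mrs., Dr.) from the boundary rule before splitting.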

The researcher reported that the STTR of McLaughlin’s translations is close to that of his creative writing. As for mean sentence length, Walder found that sentences tend to be longer in McLaughlin’s translations than in his creative writing. She also reported that McLaughlin uses more dashes in his creative writing. Walder also traced the occurrence of dashes per 1,000 words and found that the scores of “McLaughlin’s original and translated texts are closer to each other than to that of the control corpus” (62). As for language variation, Walder reported that McLaughlin uses borrowings from the languages used in Switzerland (German, Rumantsch, and Italian) in his translations; this feature is also found in his original writing (63). She identified some instances where untranslated German (and French) expressions and sentences were used in McLaughlin’s creative writing, and she concluded that the translational and authorial styles of the translator influence each other.
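Normalizing counts per 1,000 words, as Walder did for dashes, lets texts of different lengths be compared: 12 dashes in a 4,000-word text is 1,000 × 12 / 4,000 = 3 dashes per 1,000 words. The sketch below treats en dashes, em dashes and spaced double hyphens as dashes; which characters count is an assumption, not Walder's definition.

```python
import re

def per_thousand(count, n_words):
    """Normalised frequency: occurrences per 1,000 running words."""
    return 1000 * count / n_words if n_words else 0.0

def dash_rate(text):
    # Assumed dash inventory: en dash (U+2013), em dash (U+2014) and " -- ".
    dashes = len(re.findall(r"\u2013|\u2014|\s--\s", text))
    words = len(re.findall(r"\w+(?:'\w+)*", text))
    return per_thousand(dashes, words)

print(per_thousand(12, 4000))
```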

Walder's study is the most granular, but some of her parameters may not be useful for the analysis of translator style. For instance, the sentence length analysis may fail because setting the sentence boundaries as a space followed by an initial capital letter would treat “Mr.” in “Mr. Sam is happy.” as a sentence. This would affect the mean sentence length analysis. It is worth mentioning that titles such as Mr. and Mrs. are commonly used for characters in short stories and literature.

15 STTR is usually calculated for every 1,000 words. In her study, Walder lowers the STTR parameter from 1,000 words to 200 and 500 words because the average length of the short stories in the translation sub-corpus is 570 words.

In addition to these text-oriented approaches and translator-oriented approaches to style, the emergence of cognitive approaches to translation studies has led to the inclusion of cognitive processes in the study of style. In this approach, style is considered a cognitive entity that is constructed in the mind of the ST author and in the mind of the TT creator. That is, style is a representation of the writer’s mind.

Consequently, the translator’s task is to capture evidence of the mind in the ST and try to recreate it in the TT. The following section discusses cognitive approaches to the study of style.

2.5. Cognitive-Oriented Approach

The cognitive approach to style, also called cognitive stylistics, has also brought about recent changes in the study of style. This approach builds on the concepts of cognitive linguistics and emphasizes the relation between text and mind. It looks at the style of the source text as a vehicle that conveys and creates a cognitive state or mind style. It argues that the role of the translator is to capture and convey this cognitive state in the TT; otherwise the target text will have less effect on its readers' minds (Ghazala 77). Boase-Beier’s book Stylistic Approaches to Translation is one of the first works to discuss cognitive stylistics in translation studies. She views style as a cognitive entity rather than as a purely textual one. Boase-Beier indicates that style is determined by the cognitive state of the reader, which is shaped by his/her historical, ideological and cultural background. She argues that a cognitive stylistic approach views translators as readers, sees style as a reflection of mind, and tries to grasp and recreate that mind in the TT.

Boase-Beier points out that the mind in the text represents a cognitive state, which may have two aspects. First, “it is influenced by ideology, it takes a particular attitude, or it embodies a particular feeling”. Second, “it carries an attitude conveyed by the style” (79). She argues that style conveys this attitude in the text. Thus, it is very important that translators understand the attitude of mind that appears in the ST; if it is lost or misunderstood, the translation is affected. Boase-Beier argues that the translator should first recognize the cognitive state of the ST in order to recreate it in the TT. She points out that if the “translation fails to capture such a cognitive state, the target text will have less effect on its readers' minds” (77). Boase-Beier also indicates that knowledge of style helps translators understand how style works and helps them interpret the stylistic features of the source text. She points out that stylistically aware translation, which begins with a stylistically aware reading of the ST, “can make a more reasonable case for its interpretation of the source text than any other sort of translation can” (111). She also argues that the TT reader reacts to choices made by the translator, which reflect the translator’s cognitive state at the moment of translating. This makes reading a translation different from reading a piece of creative writing, because a translation reflects two states of mind: the original author’s and its reflection by the translator in the TT.

Cognitive stylistics is also discussed by Hasan Ghazala, who applied it to the translation of metaphors from English into Arabic. Ghazala rejects the traditional view of translating metaphors in terms of creating an equivalent of the ST metaphor in the TT. He argues that metaphor should be “understood as a cognitive process that conceptualizes people’s minds and thoughts linguistically in similar or different ways in languages”. That is, he treats metaphors as a conceptual feature of texts that has two domains: the “target domain (the concept to be described by the metaphor), and the source domain (the concept drawn upon, or used to create the metaphorical construction)” (60). Ghazala argues that all metaphors are reflections and constructions of concepts, attitudes, mentalities and ideologies on the part of the writer/speaker (57). He adds that speakers or writers do not use metaphors only for esthetic purposes; they use them as a vehicle for ideological and cultural concepts, meanings and perceptions of the world. On this basis, Ghazala calls for conceptualizing ST metaphors in their cultural, political, ideological, social and mental environment. Doing so helps translators understand and respond to ST metaphors in their translations.

Ghazala’s cognitive approach to style does not touch upon translation as a mental process. Rather, it takes the conceptualization of certain textual parts as a key component of discourse comprehension, which is a very important step in creating a TT. Both Boase-Beier and Ghazala argue that the mind of the ST author is what the translator seeks to render and recreate in the TT. The mind of the ST author is embedded in the style of the text; the translator should therefore try to reveal this mind style and convey it in the translation.

It is obvious that the cognitive approach to style in translation, as discussed by Boase-Beier and Ghazala, has not yet been well developed. This approach does not rely on any empirical methods, such as keystroke logging16 or eye tracking17, in studying and discussing style in translation. The future of stylistics in translation studies may rely heavily on the empirical study of translators' style using cognitive approaches.

16 Software that records the keys struck on a keyboard by a user while typing.

17 The recording of the eye movements and fixations of a computer user while executing a specific task on a computer.

2.6. Conclusion

The above discussion of the related literature shows that the development of stylistic approaches to translation has been affected by approaches to the study of style in literary studies as well as by the different turns in translation studies. The first approaches to style in literary studies focused on texts and placed minimal emphasis on the ideological and cognitive dimensions of style in a text. This approach generated a text-oriented comparative view of style in translation studies, placing more emphasis on the stylistic peculiarities of the ST and their manifestation in the TT. Scholars within the linguistic turn in translation studies adopted this approach and discussed style in translation as a textual component that does not go beyond the linguistic signs in the text. This view reflects the main approach of the New Criticism school, for which texts are self-referential and self-contained and meaning is always inside the text.

Halliday’s theory of stylistics, which focuses on the importance of the social function of discourse, has also shifted translational stylistics toward a functional turn that considers the stylistic function of the TT in its new context (i.e., the target culture). This view of style is also text-oriented and does not account for the ideological or cognitive components of style; however, it takes the TT culture and its norms as a point of departure in the stylistic analysis of translated texts.

The discourse-analysis approach to literary style, as discussed by Carter and Simpson, draws on the critical and close reading of texts to reveal their hidden intended meaning beyond the physical representation of the signs in the verbal discourse. This approach has generated a descriptive method for translational stylistics with a focus on the translator’s voice and presence in the TT. The focus on translators as cultural agents has paved the way for translator-oriented approaches to style in translation to emerge as a new turn in translational stylistics. In the same manner, the reader-oriented theory of literary stylistics proposed by Michael Riffaterre, as well as the cognitive turn in translation studies, have played a key role in the emergence of the cognitive approach to style in translation studies.

As this literature review makes clear, translation scholars have approached the topic differently. Some scholars analyzed translator style by referring to his/her translations of different authors (Saldanha, Baker "Towards a Methodology"), while others trace translator style by comparing the changes he/she makes in the target text and how these changes could affect the macro-level of the TT (Pantopoulos, Winters).

Another group of scholars compared different translations of one source text as a way to reveal translator style (Pantopoulos, De Camargo), while a more recent group of researchers discussed translator style as a fingerprint that is traceable in the translator’s translations and creative writing (Walder; Wang and Li).

Most of the previous works on translator style have used corpus methodology to carry out the analysis, but their use of corpus methodology presents some limitations related to study design, corpus compilation, control and analysis. As we have seen, some researchers analyze translator style in translations of different genres and ignore the fact that a number of the stylistic patterns might reflect a certain genre18. Other researchers study a translator’s style in his/her translations from different languages. As discussed earlier, including texts from different languages affects style analysis because some stylistic patterns could be related to the stylistic/linguistic constraints of a particular source language. In addition, the design of some of the previous studies is problematic. For instance, comparing translated to non-translated text is one of these issues. Comparing a corpus of written and spoken text to a translational corpus can also be considered a flaw in study design. The gap between what corpus tools can do and what should be considered a style marker is one of the main problems in several of the previous studies. In addition, building on methods used in previous studies without questioning their viability for translator style analysis is another problem. The next chapter attempts to propose a methodology and a profile of style markers for analyzing translator style.

18 See the above discussion of Baker’s use of tales, autobiographies and novels in one corpus to analyze translator style.

CHAPTER 3: METHODOLOGY

3.1. Introduction

This chapter discusses the methods used in this study, as well as the data collection process and corpus compilation and control. As mentioned earlier, the study’s principal purpose is to investigate the thematic and stylistic relation between Denys Johnson-Davies’s (J-D) creative writing and his translations in an attempt to address existing limitations in current style-related research in translation studies. To do so, the study relies on three corpora of short stories produced by him: 1- a corpus of short stories translated by J-D before the production of his own writing; 2- a corpus of J-D’s own short stories; and 3- a corpus of short stories J-D translated after the production of his own writing.

This study uses three computational methods to reveal the thematic and stylistic relation between J-D’s creative writing and translations. First, Latent Semantic Analysis (LSA) is used as a fully automated method to conduct a thematic similarity analysis of J-D’s creative writing and translations. This study makes use of the LSA similarity query, an information retrieval technique that is applied to Natural Language (NL) understanding problems. This technique relies on the latent semantic relations between the concepts and terms in documents to determine the similarity between different documents in a corpus and to classify them accordingly.

This particular method is used to reveal the extent to which the themes in the translated short stories are similar to those in J-D’s creative writing.
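An LSA similarity query can be sketched in a few steps: build a term-document matrix, reduce it with a truncated singular value decomposition (SVD) to k latent dimensions, fold the query into the same space, and rank documents by cosine similarity. This is a minimal illustration under stated assumptions (whitespace tokenization, raw counts with no term weighting, an arbitrary k = 2, NumPy as the only dependency, and invented toy "stories"); it is not the LSA configuration used in this study.

```python
import numpy as np

def lsa_rank(docs, query, k=2):
    """Rank docs by cosine similarity to `query` in a k-dimensional LSA space."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}

    def count_vector(text):
        v = np.zeros(len(vocab))
        for w in text.lower().split():
            if w in index:  # words outside the vocabulary are ignored
                v[index[w]] += 1
        return v

    # Term-document matrix: one row per term, one column per document.
    A = np.column_stack([count_vector(d) for d in docs])
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    Uk, sk = U[:, :k], s[:k]

    # Projecting column j of A onto U_k gives s_k * Vt[:k, j], so the rows of
    # doc_vecs and the folded-in query (U_k^T q) share one latent space.
    doc_vecs = Vt[:k, :].T * sk
    q_vec = count_vector(query) @ Uk

    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom else 0.0

    return sorted(((cosine(q_vec, d), j) for j, d in enumerate(doc_vecs)),
                  reverse=True)

stories = [
    "war soldier battle",
    "soldier battle army",
    "love heart marriage",
    "heart love wedding",
]
for score, j in lsa_rank(stories, "battle army"):
    print(f"doc {j}: {score:.3f}")
```

The query shares the words "battle" and "army" with only one document, yet both war-themed documents score highly, because the reduced space groups terms that co-occur; this is the sense in which the similarity is "latent" rather than based on literal word overlap.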


The second computational method makes use of Corpus Stylistics, a sub-field of corpus linguistics that applies corpus methods and tools to the analysis of style. Baker defines a corpus as “any collection of running texts, held in electronic form and analyzable automatically or semi-automatically” (“Corpora in Translation Studies” 225). Using corpus stylistics, three style markers will be analyzed: Standardized Type-Token Ratio (STTR), Sentence Length and punctuation marks (comma, semicolon and hyphen). The third method of analysis applies a machine-learning stylometry technique. The machine learning approach is one of the most advanced approaches to style analysis in this kind of research. This method is built on the notion of Artificial Intelligence (AI) and on the fact that machines, i.e., computers, can learn from data. Machine learning style analysis is used in this study to analyze three style markers: word n-grams, character n-grams and Part-of-Speech (POS) n-grams.
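The three machine-learning features can be extracted as follows. Word and character n-grams come straight from the text; POS n-grams need a tagger, which this sketch does not include, so it takes pre-tagged (word, tag) pairs, and the Penn-Treebank-style tags in the example are an assumption. The cosine comparison of frequency profiles at the end is a simple illustration of how such features make texts comparable; it is not the classifier used in this study.

```python
import re
from collections import Counter

def word_ngrams(text, n=2):
    # Lowercased word tokens, then every run of n consecutive tokens.
    tokens = re.findall(r"\w+(?:'\w+)*", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def char_ngrams(text, n=3):
    # Character n-grams over the raw string, spaces and punctuation included.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def pos_ngrams(tagged, n=2):
    # `tagged` is a list of (word, tag) pairs produced by some POS tagger.
    tags = [tag for _, tag in tagged]
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

def cosine(p, q):
    # Cosine similarity between two n-gram frequency profiles.
    dot = sum(p[k] * q[k] for k in p if k in q)
    norm = (sum(v * v for v in p.values()) ** 0.5) * \
           (sum(v * v for v in q.values()) ** 0.5)
    return dot / norm if norm else 0.0

print(word_ngrams("the old man and the old sea"))
print(pos_ngrams([("The", "DT"), ("old", "JJ"), ("man", "NN")]))
```

In a stylometric classifier, the relative frequencies of the most common n-grams of each type would typically become the feature vector for each text.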

The present study applies multiple analysis methods to the same set of data, a practice known as data analysis triangulation. Analysis triangulation is defined as “a situation whereby two or more analysis techniques are used for the same data set” (Ziyani, King, and Ehlers 12). In the present study, J-D’s creative writing and translations are analyzed with three different methods to investigate the relation between his translating and creative writing activities and the impact of each on the other. Ashatu Hussein argues that the importance of triangulation stems from two facts: it is used for “increasing the wider and deep understanding of the study phenomenon,” and it is also “used to increase the study accuracy, in this case triangulation is one of the validity measures” (1). Mixing different analysis methods is therefore used in this study both to reach a deeper understanding of the interaction between J-D’s authorial and translational style and to validate the findings of each method. The following sections provide an overview of data collection, corpus compilation and control, and then discuss the three analysis methods used in this study.

3.2. Data Collection

The data of this study include all short stories written and translated by J-D. Only short stories are included, for two reasons. First, short stories are the only genre J-D produced as a creative writer. Second, restricting the data to short stories controls the corpus for genre, which makes the comparison between J-D’s style in his creative writing and in his translations more reliable. Full lists of J-D’s own and translated short stories are provided in Appendixes A and B.

3.3. Corpus Database

The corpus of this study is stored in an OpenOffice database (OpenOffice is an open-source productivity suite). For each short story, the database records the following metadata: translation year, collection title and ID, short story title, publisher, source text author and notes. Each collection of short stories has a unique four-character alphanumeric identifier (ID). The IDs of the translated collections were predefined as follows: collections translated by J-D before he wrote his own collection of short stories are given the prefix TB followed by a number, whereas collections translated after he wrote his short stories are given the prefix TA followed by a number.

Assigning a predefined ID to each translated short story makes it possible to retrieve lists of collections and short stories by translation period. While entering the titles of the short stories in the database, I noticed that some of the translated short stories were published in more than one collection. The database was therefore configured not to allow duplicate entries; when a short story appears in more than one collection, this is recorded in the notes field.
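The ID scheme and the no-duplicates rule described above can be sketched with Python's built-in sqlite3 module. This is a hypothetical stand-in for the OpenOffice database the study actually used: the table layout, column names, sample IDs and the helper functions add_story and stories_by_period are illustrative assumptions, not the study's schema.

```python
import sqlite3

# Hypothetical in-memory stand-in for the study's OpenOffice database.
conn = sqlite3.connect(":memory:")
# The UNIQUE constraint on title rejects a second entry for the same story.
conn.execute(
    "CREATE TABLE stories ("
    " collection_id TEXT NOT NULL,"
    " title TEXT NOT NULL UNIQUE,"
    " notes TEXT NOT NULL DEFAULT '')"
)


def add_story(conn, collection_id, title):
    """Insert a story; if it already exists, note the extra collection instead."""
    try:
        conn.execute(
            "INSERT INTO stories (collection_id, title) VALUES (?, ?)",
            (collection_id, title),
        )
    except sqlite3.IntegrityError:
        conn.execute(
            "UPDATE stories SET notes = notes || ? WHERE title = ?",
            (f"also in {collection_id}; ", title),
        )


def stories_by_period(conn, prefix):
    """Return titles whose collection ID starts with the prefix, e.g. 'TB' or 'TA'."""
    rows = conn.execute(
        "SELECT title FROM stories WHERE collection_id LIKE ? ORDER BY title",
        (prefix + "%",),
    )
    return [title for (title,) in rows]


# Hypothetical sample entries: "Story A" appears in two collections.
add_story(conn, "TB01", "Story A")
add_story(conn, "TB02", "Story A")
add_story(conn, "TA01", "Story B")
```

The second add_story call for "Story A" does not create a duplicate row; it only appends a note, mirroring the practice of recording repeated publication in the notes field.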

3.4. Corpus Compilation and Pre-processing

The first step in corpus compilation is to have all texts in a machine-readable format (MRF), i.e., an electronic format that can be read and analyzed by a computer. Most of the short stories included in this study were scanned and converted into MRF, namely plain text files with the .txt19 extension, using an Optical Character Recognition (OCR) tool, which converts scanned images into text that computers can read. The remaining short stories were converted into plain text from Portable Document Format (PDF) or Electronic Publication (ePUB) files. The .txt files were then pre-processed and cleaned to make sure that the body of each text contains no running heads, page numbers, footnotes or other characters that could affect the analysis results.
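Part of the cleaning step can be automated. The Python sketch below drops lines that contain only a page number and lines that match a known running head; the function name clean_ocr_text, the digits-only rule for page numbers and the example running head are assumptions, and footnotes would still need manual checking. It deliberately leaves hyphens untouched, since the hyphen is one of the style markers counted later.

```python
import re


def clean_ocr_text(raw, running_heads=()):
    """Drop page-number lines and known running heads from OCR'd plain text."""
    heads = set(running_heads)
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"\d{1,4}", stripped):  # a line holding only a page number
            continue
        if stripped in heads:  # a repeated running head
            continue
        kept.append(line)
    return "\n".join(kept)


raw = "THE LAST TRAIN\nHe walked home.\n54\nShe waited."  # hypothetical OCR output
print(clean_ocr_text(raw, running_heads=["THE LAST TRAIN"]))
```

A digits-only rule would also delete a genuine line of dialogue consisting of a bare number, so the output should still be spot-checked.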

19 Such files contain very little or no formatting. This format of text is used to make sure that text formatting has no effect on the corpus analysis.

The second step involves text selection. As Stamatatos argues, “any good evaluation corpus for authorship attribution [author’s personal style] should be controlled for genre and topic” (21). In other words, a corpus built to analyze authorial style should be controlled for both genre and topic. This study therefore controls for genre by considering only short stories translated and written by J-D. It also controls for theme by building three corpora that are, to a certain extent, thematically related to one another, using LSA similarity query as a computerized text selection method. Similarity query is discussed in more detail in Section 3.5.1, LSA Similarity Query. The three initial corpora of this study are:

TABLE 1: THE CORPORA OF THE PRESENT STUDY

Corpus Name   Number of Short Stories   Overall Size (words)   Production Date   Description
TBCRW_raw     216                       554,739                1966–1998         Translational corpus containing all short stories translated from Arabic into English by J-D before writing his own short stories
CRW           15                        50,336                 1999              Creative writing corpus containing the short stories written by J-D
TACRW_raw     107                       223,735                2000–2012         Translational corpus containing all short stories translated from Arabic into English by J-D after writing his own short stories


3.5. Latent Semantic Analysis

As mentioned earlier, similarity query using Latent Semantic Analysis (LSA) is used in this study to reveal the thematic relation between the short stories in the three corpora and to build two translational sub-corpora that are close in theme to the short stories in the creative writing corpus. Before discussing similarity query, it is useful to define LSA and outline the main processes through which it works. LSA is a “theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text” (Landauer and Dumais). Landauer, Foltz, and Laham argue that LSA is an automated text analysis method that “can approximate human judgments of meaning similarity between words and can objectively predict the consequence of overall word-based similarity between passages” (5). LSA is regarded as an advanced information retrieval method that addresses the problems caused by synonymy and polysemy. It goes beyond traditional Information Retrieval (IR) methods that rely on keyword matching and instead “deals with the concept, and carries out a search on this basis” (Antai, Fox, and Kruschwitz 161). In retrieving information from textual data, LSA takes advantage of the conceptual and semantic content of texts20. It uses the relations between words and concepts to reveal latent (hidden) relations between words. For instance, LSA can differentiate Apple the fruit from Apple the company based on the relation of the term Apple to the other terms in the same document. When Apple co-occurs with terms such as iMac and OSX, the document is more likely to be about Apple the company, whereas when it co-occurs with terms such as calories and fruit, the document is more likely to discuss the fruit.

LSA relies on mathematical algorithms that represent texts as matrices, and the analysis proceeds in several steps. The first step is the creation of a term-document matrix containing the terms and documents in a target corpus21. In this matrix, the terms are placed in the rows and the documents in the columns, and each entry is the frequency of a term in the corresponding document. The following figure shows three documents and their terms in such a matrix:

20 The LSA method works by categorizing terms and concepts in a document in what is called a “concept space”. That is, related terms will be mapped onto their concept space and other concepts can be retrieved from a particular concept. LSA uses Singular Value Decomposition (SVD) to create this concept space. (Antai, Fox, and Kruschwitz 161). SVD is a mathematical technique for dimensionality reduction. That is, given a large vector space, SVD attempts to reduce the number of dimensions in the space by combining terms (this process is explained in the following paragraphs). This allows patterns to be revealed between terms and concepts in a corpus of (un)structured documents.

21 A corpus on which LSA is applied.

FIGURE 1: MATRIX OF MADE UP DOCUMENTS

Terms        Doc1  Doc2  Doc3
The          1     1     1
State        2     0     0
Kent         1     2     1
graduate     0     1     0
community    2     0     1
European     0     1     1
UK           0     3     1
education    3     0     0
Canterbury   0     1     1
campus       0     0     1

The first document in the above matrix, Doc1, is more likely to discuss topics related to Kent State, the educational institution, whereas the second document, Doc2, is more likely to discuss topics related to Kent, a county in the United Kingdom.
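Building such a matrix from raw text is straightforward. The following Python sketch, using only the standard library, counts whitespace-separated, lower-cased tokens; the toy documents and the function name term_document_matrix are hypothetical and are not the documents behind Figure 1, and a real pipeline would also handle punctuation and stop words.

```python
from collections import Counter


def term_document_matrix(docs):
    """Rows are terms, columns are documents, cells are raw term frequencies."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))  # the vocabulary of the whole corpus
    matrix = [[c[term] for c in counts] for term in terms]  # Counter gives 0 if absent
    return terms, matrix


terms, matrix = term_document_matrix([
    "Kent State education education",  # hypothetical Doc1
    "Kent Kent UK",                    # hypothetical Doc2
])
for term, row in zip(terms, matrix):
    print(f"{term:<10}{row}")
```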

The matrix creation process is followed by a pre-processing step, which includes stop word removal and the assignment of weights to terms (Antai, Fox, and Kruschwitz 162). An important step in using LSA is creating a stop word list containing prepositions, conjunctions, articles and names. Such a list is important because LSA should capture the relations between meaningful words in order to reach a good level of accuracy. Term weights depend on the frequency of terms across the set of documents. Singular Value Decomposition (SVD)22 is then applied to the term-document matrix as a method of data reduction. SVD captures the strongest relations between terms and discards the weaker patterns that contribute little to the documents. In other words, it approximates the initial matrix using only the strongest associations between terms.

22 SVD takes “a high dimensional, highly variable set of data points and reducing it to a lower dimensional space that exposes the substructure of the original data more clearly and orders it from most variation to the least” (K. Baker 16). To do so, SVD takes the term-document matrix, X, and decomposes it into three matrices: “[o]ne component matrix describes the original row entities [of X] as vectors of derived orthogonal factor values, another describes the original column entities in the same way, and the third is a diagonal matrix containing scaling values such that when the three components are matrix-multiplied, the original matrix is reconstructed” (Landauer, Foltz, and Laham 8).

Each document is then represented as a vector23 in a multi-dimensional space, and the similarity between words and/or documents is computed by measuring the similarity between their vectors (see Figure 2: Made up documents in 2-dimensional space). Usually, the cosine of the angle between the vectors in this reduced space is used to determine the similarity between terms and between documents. The following figure explains this process:

FIGURE 2: MADE UP DOCUMENTS IN 2-DIMENSIONAL SPACE

[Figure: five document vectors (Doc 1 to Doc 5) plotted in a space defined by the axes Dim 1 and Dim 2.]

In the above figure, five documents are represented as vectors (Doc1, Doc2, Doc3, Doc4, Doc5). The distance between vectors represents the distance between the terms and concepts in one document and those in the other documents.

The similarity between documents is measured by the closeness of their vectors: LSA measures the angle between the vectors (which represent documents) in a dimensional space, Dim, to determine their similarity. The closer the vectors are to each other, the more similar the documents. For instance, Doc 1 is more similar to Doc 4 than to Doc 2.

23 A vector is a mathematical representation of the latent topics/themes in a document. The researcher should decide on the vector value, or V value, which is the number of latent topics that a vector should include. The V value varies from one study to another depending on the corpus size and the research questions.
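The reduction and comparison steps can be illustrated with NumPy on the counts from Figure 1. This is a minimal sketch assuming raw counts (no stop-word removal or term weighting) and k = 2 retained dimensions; the function names lsa_document_vectors and cosine are hypothetical. Each document is represented by its row of V_k scaled by the k largest singular values, and similarity is the cosine of the angle between two such vectors.

```python
import numpy as np

# Term-document counts from Figure 1 (rows: terms, columns: Doc1, Doc2, Doc3).
X = np.array([
    [1, 1, 1],  # The
    [2, 0, 0],  # State
    [1, 2, 1],  # Kent
    [0, 1, 0],  # graduate
    [2, 0, 1],  # community
    [0, 1, 1],  # European
    [0, 3, 1],  # UK
    [3, 0, 0],  # education
    [0, 1, 1],  # Canterbury
    [0, 0, 1],  # campus
], dtype=float)


def lsa_document_vectors(X, k):
    """Truncated SVD X ~ U_k S_k V_k^T; return one k-dimensional vector per document."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k, :]).T  # shape: (n_documents, k)


def cosine(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


doc_vectors = lsa_document_vectors(X, k=2)
similarity = np.array([[cosine(u, v) for v in doc_vectors] for u in doc_vectors])
print(np.round(similarity, 3))
```

Keeping all three singular values reproduces X exactly; keeping fewer discards the weakest dimension, which is the data reduction described above.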

3.5.1. LSA Similarity Query

LSA similarity query, or similarity analysis, is an LSA application used to determine the thematic similarity between a query document and a number of other documents in a target corpus or corpora. A query document works like a query term that a user enters in the Google search bar, except that it is a whole document that can be thousands of words long. The following figure explains this process:

FIGURE 3: LSA SIMILARITY QUERY

[Figure: a query document and a target corpus are passed to LSA processing, which returns the query results.]

LSA similarity query is used in this dissertation to retrieve the translated short stories that are most thematically relevant to their counterparts in the creative writing corpus. For each query document (each creative writing short story), the most thematically relevant short stories in the two translational corpora are retrieved to build the sub-corpora TBCRW and TACRW.

The one-to-many24 approach is followed to apply similarity query to the corpora of this study. This approach takes a single document as the query document (each short story in the creative writing corpus, CRW), compares it with a number of other documents in a target corpus (here, TACRW_raw and TBCRW_raw), and then retrieves the document(s) most thematically relevant to the initial query document. See Figure 4 below:

FIGURE 4: ONE-TO-MANY SIMILARITY QUERY PROCESS

[Figure: each query document in CRW (Doc. 1 to Doc. 15) is compared with the documents in the target corpora TACRW and TBCRW (Doc. A, Doc. B, Doc. C, ..., Doc. n).]

This process results in the creation of the two translational sub-corpora (TACRW and TBCRW), which are used in the second and third data analysis methods, namely corpus analysis and machine learning.
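A one-to-many query can be sketched with the standard LSA fold-in: the query's term-frequency vector q is projected into the reduced space as q_k = S_k^-1 U_k^T q and compared by cosine with each target document's row of V_k. This NumPy sketch assumes the query has already been counted over the same vocabulary as the target corpus; the function name lsa_query and the toy matrix are hypothetical. It is not necessarily the exact computation Gensim performs (Gensim exposes this workflow through LsiModel and similarities.MatrixSimilarity, and its projection may scale dimensions differently).

```python
import numpy as np


def lsa_query(X, q, k):
    """Cosine similarity of a query term vector q to every document (column) of X.

    The query is folded into the k-dimensional LSA space as q_k = S_k^-1 U_k^T q;
    documents are represented by the rows of V_k.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T
    q_k = (Uk.T @ q) / sk
    return (Vk @ q_k) / (np.linalg.norm(Vk, axis=1) * np.linalg.norm(q_k))


# Hypothetical toy target corpus: 4 terms x 3 documents.
target = np.array([
    [2, 0, 1],
    [0, 3, 1],
    [1, 1, 0],
    [0, 0, 2],
], dtype=float)
query = np.array([1, 0, 1, 0], dtype=float)  # one query story over the same 4 terms
scores = lsa_query(target, query, k=2)
ranking = np.argsort(-scores)  # indices of target documents, best match first
print(scores, ranking)
```

Running this once per creative writing story (15 queries) against each raw translational corpus corresponds to the one-to-many design. Folding in a target document's own column returns exactly its row of V_k, so a document queried against itself scores 1.0.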

24 One-to-many is one of the LSA applications. It is also used for vocabulary testing and essay grading by some universities, such as the University of Colorado Boulder (see http://lsa.colorado.edu/cgi-bin/LSA-one2many.html).

3.5.2. LSA Similarity Cutoff

The output of the LSA analysis gives the degree of similarity between the query document and each of the other documents in the corpus, expressed as the cosine of the angle between their vectors. The researcher must set a similarity cutoff, i.e., the cosine value at or above which a retrieved document is considered relevant to the query document. Scholars indicate that there is no defined rule for determining this value; however, most studies report a value between 0.65 and 0.70 (Graesser et al.; Penumatsa et al.). The present study sets the cutoff to 0.70 and considers a translated short story similar to a creative writing short story if their vectors have a cosine value of 0.70 or greater (see Table 5, page 84, for an example of LSA output).
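Applying the cutoff is then a simple filter on the cosine scores. In this Python sketch the story IDs and scores are made up, and relevant_stories is a hypothetical helper; the comparison uses >= so that a cosine of exactly 0.70 counts as similar, matching the rule above.

```python
def relevant_stories(scores, cutoff=0.70):
    """Keep (story_id, cosine) pairs at or above the cutoff, highest cosine first."""
    hits = [(story_id, cos) for story_id, cos in scores.items() if cos >= cutoff]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical cosine scores for one query story against four translated stories.
scores = {"TB01-03": 0.82, "TB01-07": 0.55, "TB02-01": 0.70, "TB03-04": 0.91}
selected = relevant_stories(scores)
print(selected)
```

Because the list is ranked, its first five entries are the candidates a reader would check in the manual evaluation of the LSA output.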

3.5.3. LSA Output Evaluation

The LSA output is evaluated manually. This manual evaluation consists of reading each query short story (from the creative writing corpus) together with the first five thematically relevant translated short stories retrieved by the LSA analysis. This allows the researcher to check the LSA results and to become familiar with the themes, characters and settings in the three corpora (TBCRW, CRW and TACRW).

3.6. Corpus Stylistics


The second method used in this study adopts a corpus stylistics approach, a corpus-based approach that uses computational methods to analyze style in a collection of digital texts. Corpus stylistics relies on the statistical analysis of style based on the frequencies of stylistic features, or style markers, such as sentence length and word length. This study traces the stylistic influence of J-D’s translating activity on his creative writing, and vice versa, by analyzing three style markers (Standardized Type-Token Ratio, Mean Sentence Length and Punctuation Marks) in the three corpora produced by J-D (translation before creative writing, creative writing and translation after creative writing). To do so, this study uses WordSmith (WS) (Scott), a corpus analysis tool that retrieves statistical information about style from digital texts. The corpus analysis output of WS is then tested using a one-way independent samples Analysis of Variance (ANOVA) to compare the effect of the text production activity (translation before creative writing, creative writing and translation after creative writing) on the three style markers. The following sections explain the three style markers analyzed in this study as well as the statistical analysis applied to the data.

3.6.1. Standardized Type-Token Ratio (STTR)

Type-Token Ratio (TTR) measures vocabulary richness, also referred to as lexical density, by dividing the number of types (distinct words) by the number of tokens (the overall number of words) in a specific text. In other words, it measures the “diversity of vocabulary used by a writer, or in a given corpus” (250).

TTR is affected by text length: as a text grows, new tokens increasingly repeat types that have already occurred, so the ratio falls. TTR is therefore only meaningful for comparing corpora of equal size and is unsuitable when the corpora in a study differ in size. Standardized Type-Token Ratio (STTR) is used instead in such cases. STTR calculates the average TTR over consecutive chunks of a text; for example, it calculates the TTR of each consecutive 500 or 1000 tokens and averages the results (Kubát and Milička 341).
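The difference between TTR and STTR can be made concrete with a short Python sketch. It assumes the text has already been tokenized into a list of lower-cased words; the default chunk of 1,000 tokens follows the example above, and dropping a final incomplete chunk is an assumption about how leftover tokens are handled, not a documented WordSmith setting. The function names are hypothetical.

```python
def ttr(tokens):
    """Type-token ratio: distinct word forms divided by running words."""
    return len(set(tokens)) / len(tokens)


def sttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive, non-overlapping chunks of chunk_size tokens.

    A final chunk shorter than chunk_size is ignored (an assumption; tools differ).
    """
    starts = range(0, len(tokens) - chunk_size + 1, chunk_size)
    chunks = [tokens[i:i + chunk_size] for i in starts]
    if not chunks:
        raise ValueError("text is shorter than one chunk")
    return sum(ttr(chunk) for chunk in chunks) / len(chunks)


tokens = ["the", "the", "man", "sat", "down"]
print(ttr(tokens), sttr(tokens, chunk_size=2))
```

With chunk_size=2 the chunks are ["the", "the"] (TTR 0.5) and ["man", "sat"] (TTR 1.0), so the STTR is 0.75 and the leftover token "down" is ignored; every corpus is measured on chunks of the same size, which is what makes corpora of different lengths comparable.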

3.6.2. Mean Sentence Length

Mean sentence length is the average number of words per sentence. As Udny Yule argues, “it has been approved in the literature that sentence length or number of words per sentence is a characteristic of an author’s style” (370). Jon Patton and Fazil Can also use sentence length, among other markers, to analyze authorial style; their results show that sentence length is one of the best style markers for distinguishing an author’s style. The WordSmith Tools User Guide defines a sentence as “the full-stop, question-mark or exclamation-mark (.?!) and immediately followed by one or more word separators and then a number or a currency symbol, or a letter in the current language which isn't lower-case” (Scott 317). In this dissertation, the analysis of mean sentence length was preceded by the “find short sentence” method, a WordSmith function that lists all short sentences. Applying this method before measuring mean sentence length ensures that a sentence such as “I like Mr. John.” is not counted as two sentences. The “find short sentence” method makes it possible to trace all the forms that may affect sentence length analysis and to convert them into forms that have no such effect. For example, when the method retrieves a form such as “Mr.” as a sentence, the researcher manually removes the period to make it “Mr”, so that the tool treats “Mr” as a word rather than as the end of a sentence.
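The effect of the “Mr.” correction can be shown in a Python sketch. The sentence-boundary regular expression approximates the WordSmith definition quoted above and is an assumption, not WordSmith's implementation; the abbreviation list and the function names are hypothetical, standing in for the manual correction the study performed.

```python
import re

# Hypothetical abbreviation list; the study fixed such forms manually in WordSmith.
ABBREVIATIONS = ("Mr.", "Mrs.", "Dr.")


def remove_abbreviation_periods(text, abbreviations=ABBREVIATIONS):
    """Turn 'Mr.' into 'Mr' so it is not mistaken for a sentence end."""
    for abbr in abbreviations:
        text = text.replace(abbr, abbr[:-1])
    return text


def mean_sentence_length(text):
    """Average words per sentence. A sentence ends at . ? or ! followed by
    whitespace and then an upper-case letter, a digit or a currency sign
    (an approximation of the WordSmith rule quoted above)."""
    sentences = re.split(r"(?<=[.?!])\s+(?=[A-Z0-9$£])", text.strip())
    word_counts = [len(re.findall(r"[A-Za-z0-9']+", s)) for s in sentences]
    return sum(word_counts) / len(sentences)


raw = "I like Mr. John. He is kind!"
print(mean_sentence_length(raw))                               # "Mr." splits the first sentence
print(mean_sentence_length(remove_abbreviation_periods(raw)))  # two sentences of 4 and 3 words
```

Without the correction the text is split into three “sentences” (7 words, mean of about 2.33); after it, into two (mean 3.5).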

3.6.3. Punctuation Marks

Li, Zheng, and Chen argue that “incorporating punctuation frequency as a feature can improve the performance of authorship identification [style analysis]” (80). Three punctuation marks are analyzed in this dissertation: semicolon, comma and hyphen. Corpus-based punctuation analysis relies on the frequency of punctuation marks in a given corpus. However, since text size could affect this kind of analysis and the three corpora in this study are not of equal size, the frequency of each punctuation mark is calculated per 1000 words (standardized punctuation analysis). This normalization prevents text size from affecting the punctuation analysis.
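Normalizing per 1,000 words amounts to dividing each raw count by the number of words and multiplying by 1,000. The Python sketch below assumes a simple regular-expression tokenizer, which will not match WordSmith's word count exactly; the function name per_thousand_words and the sample sentence are hypothetical.

```python
import re


def per_thousand_words(text, marks=(",", ";", "-")):
    """Frequency of each punctuation mark per 1,000 words of text.

    Words are runs of letters, digits or apostrophes, so 'well-known' counts as
    two words here; only the ASCII hyphen-minus is counted as a hyphen.
    """
    n_words = len(re.findall(r"[A-Za-z0-9']+", text))
    return {mark: text.count(mark) * 1000 / n_words for mark in marks}


sample = "One, two, three four five six seven eight nine ten."
print(per_thousand_words(sample))  # 2 commas in 10 words
```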

3.7. Statistical Testing


Statistical analyses were conducted using the Statistical Package for the Social Sciences (SPSS version 22; SPSS Inc., Chicago, IL) to determine whether there was a significant difference between the mean scores of the three style markers under three conditions (Translation Before Creative Writing, Creative Writing, and Translation After Creative Writing). To do so, a one-way independent samples Analysis of Variance (ANOVA) was used. ANOVA is a statistical test that compares the means of the groups in a study to determine whether there is any statistically significant difference between them. When there is a significant difference between the means of the compared groups, ANOVA does not indicate which group differs from the others; a post-hoc test is then needed to determine which means are significantly different from each other.

The present study applies the Tukey HSD (Honest Significant Difference) test, a post-hoc test applied after ANOVA. Tukey HSD is applied when the ANOVA test reveals a significant difference between the group means, and it shows which groups are significantly different from each other.
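The statistic behind the one-way ANOVA can be computed directly, which shows what the test compares: variation between the group means relative to variation within the groups. This Python sketch returns only the F ratio; the study itself used SPSS, and in Python the p-value and the Tukey HSD comparisons are available from SciPy (scipy.stats.f_oneway, and scipy.stats.tukey_hsd in recent versions). The function name and the example scores are made up.

```python
def one_way_anova_f(*groups):
    """F = (between-group mean square) / (within-group mean square)."""
    values = [x for group in groups for x in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up style-marker scores for three conditions (e.g. TBCRW, CRW, TACRW).
f_ratio = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f_ratio)
```

Here the between-group mean square is 54 / 2 = 27 and the within-group mean square is 6 / 6 = 1, giving F = 27. Whether such an F is significant depends on the F distribution with (df_between, df_within) degrees of freedom.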

3.8. Machine Learning Approach

The third method used in this study to analyze J-D’s translational and authorial style is machine learning stylometry. The machine learning approach is one of the most advanced methods of style analysis. It goes beyond statistical methods of style analysis, such as corpus stylistics, and relies on artificial intelligence and what is called automatic pattern recognition, i.e., the ability of a machine to learn the style of a text producer after being trained on a corpus of texts written by a specific author (Ramyaa, Rasheed, and He). Machine learning stylometric analysis is conducted in five main stages. In the first stage, the machine, i.e., the computer, is fed a training corpus of texts written by a specific author. In the second stage, the machine is trained on the style of the training corpus author by analyzing that author’s style with machine learning algorithms and a predefined set of style markers. This training stage results in the authorial stylistic profile of the training corpus author25. The third stage involves querying the machine as to whether a specific text, written by an unknown author, was written by the training corpus author. In the fourth stage, the machine analyzes the text of unknown authorship and builds a stylistic profile of it. Finally, the machine compares the two stylistic profiles (that of the training corpus author and that of the text of unknown authorship) and decides whether the author of the training data is also the author of the query text.
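The five stages can be compressed into a toy nearest-profile classifier. In this Python sketch, a stylistic profile is the relative frequency of character 3-grams, and the query text is assigned to the training corpus whose profile has the highest cosine similarity. The texts, labels and function names are hypothetical, and this is only one simple way to realize the stages; it is not the algorithm JGAAP runs, which offers many alternative feature sets and classifiers.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Stylistic profile: relative frequency of each overlapping character 3-gram."""
    grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(grams.values())
    return {gram: count / total for gram, count in grams.items()}


def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def closest_profile(training, query):
    """Build a profile per training corpus (stages 1-2), profile the query text
    (stages 3-4) and return the label of the most similar profile (stage 5)."""
    profiles = {label: trigram_profile(text) for label, text in training.items()}
    query_profile = trigram_profile(query)
    return max(profiles, key=lambda label: cosine(profiles[label], query_profile))


training = {
    "A": "the cat sat on the mat with the hat",    # hypothetical author A
    "B": "zebras quickly jumped over lazy foxes",  # hypothetical author B
}
print(closest_profile(training, "the cat and the hat"))
```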

25 This stage is also called automated pattern recognition, which involves training the machine to learn the style of a text producer based on stylistic patterns in the training corpus (Ramyaa, Rasheed, and He).

Machine learning approaches to style analysis have been used and tested in several studies investigating the personal style of authors (Luyckx and Daelemans; Elayidom et al.), and most studies report that this method is viable and very useful for investigating authorial style. Machine learning stylometry is used in this study to determine which of J-D’s text production activities (translation before creative writing, creative writing and translation after creative writing) are stylistically close to one another. In the machine learning experiments, J-D’s translations before creative writing (TBCRW) and after creative writing (TACRW) are used as training corpora. The creative writing short stories (CRW) are used as query documents in order to reveal the extent to which J-D’s authorial stylistic profile in his creative writing is close to his translational stylistic profiles in the translations produced before and after creative writing (see Figure 13 on page 109). The stylometric analysis of J-D’s translations and creative writing is based on three style markers:

3.8.1. Character n-grams

Oakes and Ji (2012) define a character n-gram as “a sub-sequence of n characters from a given word” (153). For example, the character 2-grams of the word happy are “ha”, “ap”, “pp” and “py”. Jack Grieve compared the feasibility of word frequency, punctuation marks and character n-grams for style analysis and reported that character n-grams are among the most effective measures for revealing the style of a document’s author. John Houvardas and Stamatatos indicate that “character n-grams are able to capture complicated stylistic information on the lexical, syntactic, or structural level” (78). They point out that character 3-gram analysis can reveal lexical information such as [/the/ /_to/], word-class information [/ing/, /ed/] or punctuation usage information (/._X/, /_“X/) (Houvardas and Stamatatos 78). The current study uses character n-grams with n = 3 in order to allow a deeper stylistic analysis. Since the character n-gram method can reveal information related to punctuation mark usage, it also helps validate the corpus analysis of punctuation marks.
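Extracting character n-grams is a sliding window over the text. The Python sketch below reproduces the “happy” example and then counts 3-grams over running text; replacing spaces with underscores mirrors the notation quoted from Houvardas and Stamatatos and is a presentational choice, and the function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """All overlapping sub-sequences of n characters."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def char_trigram_counts(text):
    """Character 3-gram frequencies, with spaces shown as '_' as in the
    Houvardas and Stamatatos examples, so punctuation context is kept."""
    return Counter(char_ngrams(text.replace(" ", "_"), 3))


print(char_ngrams("happy", 2))  # the 2-gram example from the text
print(char_trigram_counts("the cat. the dog").most_common(3))
```

A 3-gram such as "t._" records a full stop followed by a space, which is how this marker can pick up punctuation habits.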

3.8.2. Part of Speech (POS) n-grams

Part of Speech (POS) analysis is the analysis of texts in which each word has been tagged with its part of speech. This type of analysis requires two tools. The first is a POS tagger, software that reads texts in a certain language and assigns a part of speech to each word. The second is a corpus analysis tool that analyzes the tagged text according to the research questions. POS analysis tends to capture the syntactic style of the document’s writer. Patrick Juola indicates that there is general agreement in the authorship attribution literature that POS tags are good features to include in any style-related study. He also argues that “methods that do not use syntax in one form or another, either through the use of word n-grams or explicit syntactic coding tend to perform poorly” (320). The importance of POS tags stems from the observation that authors tend to use similar syntactic patterns unintentionally across their different writings. Several researchers, such as Argamon-Engelson, Koppel, and Avneri, and Zhao and Zobel, have used POS tag n-grams to study authorial style and have reported that this style marker gives quite accurate results.

As for the analysis of POS tags in this study, POS tag 2-grams, 3-grams and 4-grams were considered. Since the syntactic conventions of the target language could affect J-D’s syntactic style in both his translations and his creative writing, a fourth control corpus, containing fifteen short stories written originally in English during the period 1940–2012, is used. The control corpus is used to ensure that the syntactic conventions of the target language have no influence on the analysis of J-D’s syntactic style.

In order to perform POS analysis, this study automatically tagged the three initial corpora using the Stanford Log-linear Part-Of-Speech Tagger, which is reported to tag texts with 97.24% accuracy (Toutanova et al.). This tagger uses the Penn Treebank tag set, shown in Table 2 below:

TABLE 2: PENN TREEBANK TAG SET (Adapted from Marcus, Santorini, and Marcinkiewicz)

POS Tag   Description                                 Example
CC        coordinating conjunction                    and
CD        cardinal number                             1, three
DT        determiner                                  the
EX        existential there                           there is
FW        foreign word                                d'oeuvre
IN        preposition/subordinating conjunction       in, of, like, after, that
JJ        adjective                                   green
JJR       adjective, comparative                      greener
JJS       adjective, superlative                      greenest
LS        list marker                                 1)
MD        modal                                       could, will
NN        noun, singular or mass                      table
NNS       noun, plural                                tables
NP        proper noun, singular                       John
NPS       proper noun, plural                         Vikings
PDT       predeterminer                               both the boys
POS       possessive ending                           friend's
PP        personal pronoun                            I, he, it
PP$       possessive pronoun                          my, his
RB        adverb                                      however, usually, here, good
RBR       adverb, comparative                         better
RBS       adverb, superlative                         best
RP        particle                                    give up
SYM       symbol                                      $, %
TO        to                                          to go, to him
UH        interjection                                uhhuhhuhh
VB        verb, base form                             take
VBD       verb, past tense                            took
VBG       verb, gerund/present participle             taking
VBN       verb, past participle                       taken
VBP       verb, singular present, non-3rd person      take
VBZ       verb, 3rd person singular present           takes
WDT       wh-determiner                               which
WP        wh-pronoun                                  who, what
WP$       possessive wh-pronoun                       whose
WRB       wh-adverb                                   where, when
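Once a text has been tagged, POS n-grams are counted over the sequence of tags rather than the words. The Python sketch below assumes the tagger output has already been read into (word, tag) pairs; the example sentence is hand-tagged with tags from Table 2 for illustration and is not output from the Stanford tagger, and the function name pos_ngrams is hypothetical.

```python
from collections import Counter


def pos_ngrams(tagged, n):
    """Count n-grams over the tag sequence of a POS-tagged text."""
    tags = [tag for _word, tag in tagged]
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


# A hand-tagged example using tags from Table 2 (not real tagger output).
tagged = [("The", "DT"), ("old", "JJ"), ("man", "NN"), ("took", "VBD"),
          ("the", "DT"), ("green", "JJ"), ("table", "NN")]
profile = {n: pos_ngrams(tagged, n) for n in (2, 3, 4)}
print(profile[3].most_common(1))
```

Because only the tags are counted, "The old man" and "the green table" both contribute to the same DT JJ NN pattern, which is what lets this marker capture syntactic rather than lexical habits.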


3.8.3. Word n-grams

Word n-gram analysis lists the most frequent sequences of n consecutive words in a document, depending on the size of n. For instance, a bigram list consists of the most frequent pairs of words that occur together. Stamatatos, Fakotakis, and Kokkinakis, as well as Kim and Walter, use word n-grams and report that word n-gram analysis is a good method for studying the style of a document’s author. Similarly, Raghavan, Kovashka, and Mooney apply the n-gram model to authorship attribution and report that the best performing n-gram is the 3-gram model, with an accuracy of 98.34%. Within translation studies, the use of word n-grams to analyze translated texts is discussed by Dorothy Kenny, who refers to word n-grams as word clusters and indicates that retrieving and analyzing two- or three-word clusters helps reveal different types of patterns in translated texts. These patterns can be related to the style of a translated text or to the style of the translator (42, 138). Similarly, Michaela Mahlberg uses the term word cluster to refer to word n-grams in her analysis of Dickens’ style in his fiction. The present study uses the term word n-gram to refer to word cluster analysis. When applying word n-grams, the researcher must decide on the size of n, which varies from one study to another.

As for the present study, n-gram is applied with n=, 3 and 4 to see which n-gram model works better in the case of the authorial and translational style. Word n-grams have also been used for “topic discovery” or “topic modeling”, a natural language processing technique that reveals the topic or theme of a specific set of documents and classifies them accordingly (Wang, McCallum, and Wei). Word n-gram analysis would help reveal lexical patterns in J-D’s translations and creative writing and would also help validate the Latent Semantic Analysis (LSA) results26.
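A word n-gram frequency list can be produced in a few lines of Python. This sketch lower-cases the text and uses a simple regular-expression tokenizer (an assumption; WordSmith's cluster settings may treat sentence boundaries and punctuation differently), and, unlike a bag-of-words model, it keeps word order. The sample text and the function name word_ngrams are hypothetical.

```python
import re
from collections import Counter


def word_ngrams(text, n):
    """Frequency list of n consecutive words. N-grams here may span
    sentence boundaries (e.g. 'her she')."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


sample = "He looked at her. She looked at him."
bigrams = word_ngrams(sample, 2)
print(bigrams.most_common(1))
```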

3.9. Tools Used in the Dissertation

Three tools are used to analyze the data in this study. The first is Gensim, an open-source topic-modeling tool implemented in the Python programming language. The second is WordSmith, a Windows-based tool used in corpus linguistics. The third is JGAAP (Java Graphical Authorship Attribution Program), a free machine-learning stylometry tool. Table 3 below describes these tools along with their main features:

TABLE 3: TOOLS USED IN THE DISSERTATION

Tool            Features                                                         About the tool
WordSmith 6.0   Concord; Key Words list; Word Lists; N-gram lists                Corpus tool created by Mike Scott at the University of Liverpool. The tool is available for sale on the tool’s webpage: http://www.lexically.net/wordsmith/index.html
Gensim          LSA; Topic modeling; Document categorization; Similarity Query   An open-source topic-modeling tool implemented in the Python programming language. The Python code for the tool is available online for researchers on the tool’s website (http://radimrehurek.com/gensim/).
JGAAP           Machine Learning; Style analysis                                 Java Graphical Authorship Attribution Program (JGAAP), a free machine-learning stylometry tool. JGAAP is a Java-based program for stylometric analysis developed by the Evaluating Variation in Language Laboratory (EVL Lab) at Duquesne University, Pennsylvania.

26 It is worth mentioning that LSA analyzes the topic of a text using what is called the “bag of words” technique, an NLP technique in which word order is not taken into account. By contrast, word n-gram analysis relies on word order and the co-occurrence of words in a text. The two techniques thus use different analysis methods to reveal the thematic content of a text.

3.10. Conclusion

This chapter has provided detailed information about data collection and about corpus preprocessing, compilation, control and analysis. It has also discussed the three computational methods used in this study (Latent Semantic Analysis, Corpus Stylistics and Machine Learning Stylometry), the purpose behind using each method, and the set of style markers analyzed with each method. The following chapter discusses the processes and results of applying Latent Semantic Analysis to the three corpora in this study: the corpus of short stories translated before the production of creative writing (TBCRW_raw), the corpus of creative writing short stories (CRW) and the corpus of short stories translated after the production of creative writing (TACRW_raw).

CHAPTER 4: LATENT SEMANTIC ANALYSIS RESULTS

4.1. Introduction

This chapter presents the results of the LSA experiments on the three short-story corpora produced by J-D: the corpus of short stories translated before the production of creative writing (TBCRW_raw), the corpus of creative writing short stories (CRW) and the corpus of short stories translated after the production of creative writing (TACRW_raw). The chapter is divided into three sections. The first section provides a general overview of the LSA experiment applied to the three corpora (TBCRW_raw, CRW and TACRW_raw). The second section shows the results of the first LSA experiment, concerning the thematic similarity of the translated short stories in TBCRW_raw to their counterparts in J-D’s creative writing (CRW), while the third section discusses the thematic similarity of the creative writing short stories (CRW) to the short stories translated after the production of the creative writing (TACRW_raw). This chapter relies on data visualization as a method of data representation: each section provides a number of charts that visualize the LSA experiments, processes and results.

The first LSA experiment described in this chapter is meant to test the third hypothesis, which claims that the short stories J-D translated before the production of creative writing are close in theme to his creative writing short stories.

The second LSA experiment tests the fourth hypothesis, which claims that J-D’s own short stories are close in theme to the short stories he translated after the production of his creative writing. As discussed in the methodology chapter, LSA similarity analysis is used to reveal whether the themes J-D developed in his creative writing were influenced by the themes of the short stories he translated before the production of his creative writing. It is also used to determine whether the themes in J-D’s creative writing influenced his choice of the short stories translated after the production of creative writing.

4.2. LSA Similarity Analysis

As pointed out in the methodology chapter, LSA is a fully automated computational method that applies mathematical algorithms to retrieve and represent the contextual meaning and usage of words. The power of LSA lies in the fact that it “approximate[s] human judgments of meaning similarity between words and can objectively predict the consequence of overall word-based similarity between passages” (Landauer, Foltz, and Laham). Several scholars (Landauer, Foltz, and Laham; Hofmann; T. L. Griffiths and Steyvers; T. Griffiths and Steyvers, “Prediction and Semantic Association”) indicate that automated document similarity analysis is useful in many tasks, such as structuring a large corpus of texts by topic. The applicability of LSA to similarity queries has been demonstrated empirically with high accuracy rates (Deerwester et al.; Ahat, Amor, and Bui). In the study of translator style using computational methods, LSA similarity analysis can be used to build a corpus of translated texts that is controlled for topic. As mentioned earlier, LSA similarity analysis is used in this study to retrieve, from the short stories J-D translated before the production of his creative writing (TBCRW_raw) and those he translated after it (TACRW_raw), the stories most thematically relevant to their counterparts in the creative writing corpus (CRW). The following sections explain this process and report the results of applying this method to the three corpora in the present study.

4.2.1. LSA Similarity Query on J-D’s Translation before Creative Writing

In this experiment, the fifteen short stories written by J-D (CRW corpus) are used as query documents to retrieve the most relevant short stories in the first translational sub-corpus (TBCRW_raw), which contains the short stories J-D translated before writing his own short stories. The five short stories most relevant to each query document (a creative writing short story) were retrieved to build TBCRW, which contains short stories translated by J-D that are relevant in topic to the creative writing corpus. The following figure shows this process:

FIGURE 5: LSA ANALYSIS OF TBCRW_RAW

[Diagram: (1) query documents (Doc1..15 in CRW) and (2) target corpus (TBCRW_raw) → (3) LSA processing → (4) results (TBCRW)]
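The retrieval step shown in Figure 5 can be sketched in Python as follows. The sketch assumes that the cosine similarities between each CRW query and every TBCRW_raw story have already been computed (for instance with Gensim). The helper names are hypothetical, and taking the union of the per-query top-five lists is an assumption about how the retrieved stories are pooled: stories retrieved by several queries are counted only once.

```python
def top_k(similarities, k=5):
    """Indices of the k highest-scoring target documents for one query."""
    ranked = sorted(range(len(similarities)), key=lambda j: similarities[j], reverse=True)
    return ranked[:k]


def build_filtered_corpus(sim_matrix, k=5):
    """Union of the top-k target documents over all queries.

    sim_matrix[q][j] is assumed to hold the cosine similarity between
    query document q (a CRW story) and target document j (a story in the
    raw translational corpus).
    """
    selected = set()
    for row in sim_matrix:
        selected.update(top_k(row, k))
    return sorted(selected)
```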

As mentioned in the methodology chapter, the present study uses the Gensim²⁷ tool to run the LSA experiment. LSA analysis with Gensim requires the corpus to be preprocessed in a specific way: each corpus is saved in one txt file containing as many lines as there are short stories in the corpus, so that each line contains one short story. The first corpus, CRW, which contains fifteen creative writing short stories, is converted into a 15-line txt file. Similarly, TBCRW_raw is converted into one txt file containing two hundred and sixteen lines, which reflects the number of short stories J-D translated before writing his own collection of short stories, as Table 4 below shows:

TABLE 4: CRW AND TBCRW_RAW SIZE

Corpus | CRW | TBCRW_raw
Number of lines/documents | 15 | 216
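The one-story-per-line layout described above can be produced and read back with a few lines of Python. This is a minimal sketch that assumes UTF-8 text files; the function names are hypothetical and are not part of Gensim.

```python
from pathlib import Path


def flatten_story(story):
    """Collapse a story's internal line breaks and extra spaces so it fits on one line."""
    return " ".join(story.split())


def write_corpus(stories, path):
    """Save a corpus as one txt file with exactly one short story per line."""
    Path(path).write_text("\n".join(flatten_story(s) for s in stories) + "\n", encoding="utf-8")


def parse_corpus(text):
    """Split file contents into documents: every non-empty line is one story."""
    return [line for line in text.splitlines() if line.strip()]


def load_corpus(path):
    """Read a one-story-per-line corpus file back into a list of stories."""
    return parse_corpus(Path(path).read_text(encoding="utf-8"))
```

Loading the two files this way should yield 15 documents for CRW and 216 for TBCRW_raw, matching Table 4.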

The second step in the LSA analysis requires pre-processing the corpus by applying a stoplist to the data. The stoplist contains words such as prepositions and articles that carry less lexical meaning than content words such as verbs or nouns. Such a list is important because LSA should capture the relations between meaningful words in order to reach a good level of accuracy. After the corpus is pre-processed with the stoplist, LSA represents the corpus documents in vector format within a dimensional space (see Figure 2, page 59). The size of the vector, that is, the number of latent themes on which the similarity experiment is based, must be determined. The thematic similarity or relevance between documents in a corpus is then determined by calculating the cosine of the angle between their vectors. In the literature, researchers have used different dimensionalities in their experiments (Nakov). Typically, to choose the vector value, one runs experiments with different values (e.g., 50, 100, 200 or 300), depending on the research questions and the corpus size, and then selects the value that gives the most accurate results. In this experiment, LSA was run with V = 75, 100 and 150.²⁸ A manual evaluation of the three LSA outputs showed that the overall results did not change with the V value. Thus, the output of the LSA with V = 100 was selected, since it is the middle value between 75 and 150.

²⁷ An open-source topic-modeling tool implemented in the Python programming language (see the methodology chapter).
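The steps described above (stoplist filtering, projection into a V-dimensional latent space and cosine comparison) can be sketched with NumPy’s singular value decomposition. This is not Gensim’s implementation or API. The sketch assumes raw term counts (no tf-idf weighting), a tiny illustrative stoplist, an SVD fitted on the target corpus, and query documents folded into that space by projecting them onto the first V left singular vectors; `num_topics` plays the role of V.

```python
import numpy as np

# Illustrative stoplist only (an assumption); the study's actual stoplist is not reproduced.
STOPLIST = {"the", "a", "an", "of", "in", "on", "and", "to", "his", "her", "he", "she", "it"}


def tokenize(doc):
    """Lower-case, split on whitespace, trim surrounding punctuation, drop stoplisted words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in doc.split())
    return [w for w in words if w and w not in STOPLIST]


def term_doc_matrix(docs, vocab):
    """Raw term counts: one row per vocabulary term, one column per document."""
    index = {term: i for i, term in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in tokenize(doc):
            if term in index:  # query words unseen in the target corpus are ignored
                X[index[term], j] += 1
    return X


def unit_rows(M):
    """Scale each row to unit length; all-zero rows are left as zeros."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return M / norms


def lsa_similarities(queries, targets, num_topics):
    """Cosine similarity of every query to every target in a num_topics-dimensional LSA space."""
    vocab = sorted({term for doc in targets for term in tokenize(doc)})
    X = term_doc_matrix(targets, vocab)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Uk = U[:, :min(num_topics, len(s))]               # first V left singular vectors
    T = (Uk.T @ X).T                                   # target documents in latent space
    Q = (Uk.T @ term_doc_matrix(queries, vocab)).T     # queries folded into the same space
    return unit_rows(Q) @ unit_rows(T).T               # shape: (len(queries), len(targets))
```

In Gensim itself, the analogous building blocks are a `Dictionary`, an `LsiModel` whose `num_topics` is set to V, and a `MatrixSimilarity` index; the NumPy version only makes the underlying linear algebra visible.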

4.2.1.1. LSA Results with V=100

Table 5 below displays the LSA output of the experiment, listing the five short stories translated before the production of creative writing that are most thematically relevant to each creative writing short story in CRW. The table also shows the cosine of the angle between the vector of each translated short story from TBCRW_raw and the vector of the creative writing short story. The creative writing short stories are represented as Q1-Q15. The Doc column lists the number of the short story, translated by J-D before the production of his creative writing, in the first translational corpus (TBCRW_raw). The similarity column shows the cosine of the angle between the vectors of the translated short story (Doc column) and the creative writing short story (Q1-Q15). For example, Q-1 is short story number one in the creative writing corpus, and Doc number 166 is translated short story number one hundred and sixty-six in the translational corpus (TBCRW_raw). The cosine of the angle between the vector of that translated short story and the vector of short story number one in the creative writing corpus (CRW) is 0.63.

²⁸ There is no consensus on the rules for determining the vector size in LSA experiments. The researcher determines the V value based on the corpus size along with a manual evaluation of the LSA output for the different V values tested.

TABLE 5: LSA OUTPUT OF SIMILARITY QUERY ANALYSIS ON TBCRW_RAW

As explained in the methodology chapter, the similarity cutoff for this experiment is set to 0.70. The similarity cutoff is the cosine value between two vectors at or above which a translated short story is considered thematically similar to a short story in the creative writing corpus. Thus, any short story with a cosine value below 0.70 is considered less thematically relevant to the creative writing short story. Based on this pre-defined cutoff, Table 5 shows that none of the translated short stories that J-D produced before engaging in creative writing has a significant thematic similarity to the themes of his own writing. Because the table format does not make the results easy to read, the following graphs visualize the LSA results as spatial graphs:

FIGURE 6: LSA EXPERIMENT 1 RESULTS (Q1--Q5)

[Spatial graph: short stories translated by J-D from TBCRW_raw most similar to creative writing short stories Q1-Q5, plotted against cosine similarity (0.20-0.80); the similarity cutoff space (≥ 0.70) is marked.]

FIGURE 7: LSA EXPERIMENT 1 RESULTS (Q6--Q10)

[Spatial graph: short stories translated by J-D from TBCRW_raw most similar to creative writing short stories Q6-Q10, plotted against cosine similarity (0.20-0.80); the similarity cutoff space (≥ 0.70) is marked.]

FIGURE 8: LSA EXPERIMENT 1 RESULTS (Q11--Q15)

[Spatial graph: short stories translated by J-D from TBCRW_raw most similar to creative writing short stories Q11-Q15, plotted against cosine similarity (0.20-0.80); the similarity cutoff space (≥ 0.70) is marked.]

The above graphs visualize the results of the LSA queries and demonstrate the relation between the short stories J-D translated before the production of his creative writing (TBCRW_raw) and the creative writing short stories (CRW). Q1-Q15 refer to the fifteen creative writing short stories used as query documents. The blue rectangles represent the translated short stories from TBCRW_raw. For each query, the graphs present the first five results, i.e., the five translated short stories most thematically relevant to that creative writing short story. None of the short stories translated before the production of J-D’s creative writing falls within the similarity cutoff space in the three graphs; however, some translated short stories in TBCRW_raw are more thematically relevant to the creative writing stories in CRW than others.
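Applying the 0.70 cutoff to the retrieved results amounts to a simple filter. The Python sketch below uses two query/document pairs whose cosine values are reported in this chapter (Q1 with D166 at 0.63 in the first experiment, and Q1 with D28 at 0.73 in the second) purely as illustrative input; the helper names are hypothetical.

```python
CUTOFF = 0.70  # similarity cutoff used in this study


def above_cutoff(results, cutoff=CUTOFF):
    """Keep (query, doc, cosine) triples whose cosine is at or above the cutoff."""
    return [(q, d, c) for q, d, c in results if c >= cutoff]


def distinct_docs(results):
    """Distinct translated stories that appear in a list of (query, doc, cosine) triples."""
    return sorted({d for _, d, _ in results})


results = [("Q1", "D166", 0.63), ("Q1", "D28", 0.73)]
kept = above_cutoff(results)
print(kept)                 # [('Q1', 'D28', 0.73)]
print(distinct_docs(kept))  # ['D28']
```

Counting `distinct_docs` over all kept triples is one way to check how many different translated stories fall within the cutoff space across the fifteen queries.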

4.2.2. LSA Similarity Query on J-D’s Translation after Creative Writing

The second LSA experiment reported in this section attempts to reveal the thematic similarity between J-D’s creative writing and the short stories he translated after the production of his creative writing. For each creative writing short story (CRW), the five most thematically similar short stories in the second translational corpus (TACRW_raw), which contains short stories translated after the production of creative writing, are retrieved. The output of this LSA experiment is used to build the TACRW corpus, which contains the fifteen most thematically relevant short stories translated by J-D after the production of his creative writing. The following figure shows this process:

FIGURE 9: LSA ANALYSIS OF TACRW_RAW

[Diagram: (1) query documents (Doc1..15 in CRW) and (2) target corpus (TACRW_raw) → (3) LSA processing → (4) results (TACRW)]

As in the previous experiment, TACRW_raw was saved in a txt file containing one hundred and seven short stories, each presented on a separate line (see Table 6 below). The stoplist was then applied to the two corpora (CRW and TACRW_raw).

TABLE 6: CRW AND TACRW_RAW SIZE

Corpus | CRW | TACRW_raw
Number of lines/short stories | 15 | 107

As for the vector value, the second LSA experiment was run with two vector values, V = 25 and V = 50, considering the size of the corpus. Since the V value depends on the size of the corpus and TACRW_raw is smaller than TBCRW_raw, the V value is logically lower than in the first LSA experiment. After a manual evaluation of the LSA results with V = 50 and V = 100, I noticed that the overall results did not change with the V value. However, the cosine values decreased with the higher V value, although the number of relevant short stories remained the same. For consistency, I selected V = 50, which represents almost half the number of short stories in TACRW_raw. The same principle was applied in the first experiment, where V was set to 100, which is about half the number of short stories in TBCRW_raw.

4.2.2.1. LSA Results with V=50

The following table displays the LSA output of the experiment, listing the five short stories translated after the production of creative writing that are most thematically relevant to each creative writing short story (CRW). The table also shows the cosine of the angle between the vector of each translated short story from TACRW_raw and the vector of the creative writing short story. The creative writing short stories are represented as Q1-Q15. The Doc column lists the number of the short story, translated by J-D after the production of his creative writing, in the second translational corpus (TACRW_raw). The similarity column shows the cosine of the angle between the vectors of the translated short story (Doc column) and the creative writing short story (Q1-Q15). For example, Q-1 represents short story number one in the creative writing corpus, and Doc number 28 represents translated short story number twenty-eight in the translational corpus (TACRW_raw). The cosine of the angle between the vector of that translated short story and the vector of short story number one in the creative writing corpus (CRW) is 0.73.

TABLE 7: LSA OUTPUT OF SIMILARITY QUERY ANALYSIS ON TACRW_RAW


Based on the pre-defined similarity cutoff (cosine value ≥ 0.70), Table 7 above shows that many of the translated short stories that J-D produced after engaging in his creative writing have a significant thematic similarity to themes in his own writing. The following graphs in particular demonstrate the thematic similarity between the creative writing short stories and the translated short stories that were translated after the production of the creative writing short stories (in TACRW_raw):

FIGURE 10: LSA EXPERIMENT 2 RESULTS (Q1--Q5)

[Spatial graph: short stories translated by J-D from TACRW_raw most similar to creative writing short stories Q1-Q5, plotted against cosine similarity (0.20-0.80); the similarity cutoff space (≥ 0.70) is marked.]

FIGURE 11: LSA EXPERIMENT 2 RESULTS (Q6--Q10)

[Spatial graph: short stories translated by J-D from TACRW_raw most similar to creative writing short stories Q6-Q10, plotted against cosine similarity (0.20-0.80); the similarity cutoff space (≥ 0.70) is marked.]

FIGURE 12: LSA EXPERIMENT 2 RESULTS (Q11-Q15)

[Spatial graph: short stories translated by J-D from TACRW_raw most similar to creative writing short stories Q11-Q15, plotted against cosine similarity (0.20-0.80); the similarity cutoff space (≥ 0.70) is marked.]

The above graphs visualize the results of the LSA queries and demonstrate the thematic relation between the short stories in the two corpora (CRW and TACRW_raw). Each graph represents five creative writing short stories (Q1-Q5, Q6-Q10, Q11-Q15) and shows their most thematically relevant counterparts in the translational corpus (TACRW_raw). The triangles represent the translated short stories from TACRW_raw. A number of translated short stories fall within the similarity cutoff space in the three graphs. It can also be observed that only four translated short stories are relevant to the fifteen creative writing short stories.

4.3. Conclusion

According to the findings, the first LSA experiment on the relation between TBCRW_raw and CRW revealed no significant thematic similarity between the short stories translated by J-D before the production of his creative writing and his creative writing short stories. By contrast, the second LSA experiment revealed a significant thematic similarity between the short stories translated after the production of the creative writing (TACRW_raw) and the creative writing short stories in CRW. These findings are further discussed in Chapter 6 in light of the research hypotheses that motivated this study. The following chapter presents a second set of experiments that use corpus stylistics and machine learning stylometry to investigate a selection of style markers specific to J-D’s translations and creative writing.

CHAPTER 5: CORPUS STYLISTICS AND MACHINE LEARNING ANALYSIS RESULTS

5.1. Introduction

This chapter is divided into two parts. The first part reports the corpus analysis results as derived from the WordSmith tool. It also reports the results of the one-way independent samples Analysis of Variance (ANOVA). The second part reports the results of the machine learning experiments. The corpus and the machine learning experiments described in this chapter are meant to test the third hypothesis, which claims that the short stories J-D translated before the production of creative writing display some stylistic markers that are also found in his creative writing.

These two experiments also address the fourth hypothesis, i.e., that J-D’s creative writing short stories display some stylistic markers that are also found in the short stories he translated after the production of creative writing. The corpora analyzed in this chapter are built from the results of the two LSA experiments conducted in the previous chapter. That is, the chapter analyzes a set of short stories translated by J-D before and after the production of his creative writing that are thematically relevant to his creative writing short stories. These include the following corpora:

TABLE 8: STUDY CORPORA FROM THE LSA RESULTS

Corpus | Description | Size in words | Text size range in words
TBCRW | A translational corpus containing the fifteen translated short stories most thematically relevant to J-D’s creative writing, translated before the production of J-D’s creative writing | 61,477 | 2,561-8,820
CRW | Creative writing corpus | 51,280 | 1,245-7,460
TACRW | A translational corpus containing the fifteen translated short stories most thematically relevant to J-D’s creative writing, translated after the production of J-D’s creative writing | 49,366 | 840-6,670

5.2. Corpus Analysis

This section applies a corpus-based approach to translator style and analyzes J-D’s style in the corpus of short stories translated before the production of creative writing (TBCRW), the corpus of creative writing short stories (CRW) and the corpus of short stories translated after the production of creative writing (TACRW).

The stylistic analysis in this section focuses on a set of style markers: Standardized Type-Token Ratio (STTR), mean sentence length and punctuation marks (commas, hyphens and semicolons). The goal of this analysis is to trace the stylistic impact of J-D’s translating activity on his creative writing and, in turn, to see whether his creative writing has any stylistic impact on his translating. The following sections report the corpus analysis results for the three style markers.

5.2.1. Textual Analysis

5.2.1.1. Standardized Type-Token Ratio

STTR reveals the degree of vocabulary diversity in a text, or the vocabulary richness of the text producer. As mentioned earlier, STTR has been used as a style marker to analyze the style of translators and authors. This section provides the STTR results for the three corpora produced by J-D (TBCRW, CRW and TACRW), displayed in the following table. The STTR results in Table 9 below are based on textual chunks of 500 words.²⁹

TABLE 9: STTR SCORE IN THE THREE CORPORA

The above table shows the mean STTR in J-D’s translations before creative writing (TBCRW), in J-D’s creative writing (CRW) and in J-D’s translations after creative writing (TACRW). There is no marked difference between the mean STTR of the translations J-D produced before and after the production of his creative writing (TBCRW and TACRW). However, there is a noticeable difference between the mean STTR of the two translational corpora (TBCRW and TACRW) and that of the creative writing corpus (CRW). The difference between the mean STTR scores of the three corpora is further investigated using statistical significance analysis.

²⁹ The STTR chunk size in this study was set to 500 words because one of the short stories J-D translated after the production of his creative writing contains fewer than 1,000 words, and the WordSmith tool did not provide the STTR for this short story unless the STTR was measured for each 500 words.
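The chunk-based STTR and the reason for the 500-word setting can be illustrated with a short Python sketch: a story of 840 words (the shortest text in TACRW, per Table 8) contains one full 500-word chunk but no full 1,000-word chunk. The function name is hypothetical, and ignoring a final partial chunk is an assumed convention; WordSmith’s exact procedure may differ in details.

```python
def sttr(tokens, chunk_size=500):
    """Standardized type-token ratio (%): the mean TTR of consecutive full chunks.

    A final partial chunk is ignored; this is an assumed convention, and
    WordSmith's exact procedure may differ in details.
    """
    ratios = [
        len(set(tokens[start:start + chunk_size])) / chunk_size * 100
        for start in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not ratios:
        raise ValueError(f"text has fewer than {chunk_size} tokens; no full chunk to measure")
    return sum(ratios) / len(ratios)
```

With 840 tokens, `sttr(tokens, 500)` averages a single chunk, while `sttr(tokens, 1000)` raises an error, mirroring the behaviour described in the footnote.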

5.2.1.2. Mean Sentence Length

As explained in the methodology chapter, mean sentence length analysis calculates the average number of words in a sentence. The end of a sentence is defined as “the full-stop, question-mark or exclamation-mark (.?!) and immediately followed by one or more word separators and then a number or a currency symbol, or a letter in the current language which isn't lower-case” (Scott 317). Table 10 below shows the mean sentence length of each short story in the three corpora along with the overall mean sentence length of each corpus:

TABLE 10: MEAN SENTENCE LENGTH SCORE IN THE THREE CORPORA

TBCRW | Mean Sentence Length | CRW | Mean Sentence Length | TACRW | Mean Sentence Length
15.txt | 25.52 | 001.txt | 17.71 | 1003.txt | 13.26
150.txt | 16.11 | 0010.txt | 22.91 | 1006.txt | 14.69
154.txt | 13.56 | 0011.txt | 14.51 | 1009.txt | 17.89
166.txt | 27.97 | 0012.txt | 19.97 | 1012.txt | 16.75
167.txt | 25.61 | 0013.txt | 23.16 | 1023.txt | 14.99
169.txt | 15.21 | 0014.txt | 21.61 | 1028.txt | 20.29
173.txt | 12.23 | 0015.txt | 18.08 | 1030.txt | 19.39
176.txt | 15.28 | 002.txt | 22.60 | 1031.txt | 21.83
2.txt | 21.63 | 003.txt | 22.81 | 1035.txt | 10.96
24.txt | 31.39 | 004.txt | 23.25 | 1037.txt | 12.17
26.txt | 20.77 | 005.txt | 20.39 | 1069.txt | 42.58
4.txt | 16.04 | 006.txt | 20.79 | 1072.txt | 22.55
68.txt | 18.09 | 007.txt | 19.02 | 1091.txt | 20.00
70.txt | 37.53 | 008.txt | 22.73 | 1100.txt | 15.34
90.txt | 23.60 | 009.txt | 20.63 | 1106.txt | 20.13
Mean ± SD | 21.37 ± 7.23 | Mean ± SD | 20.68 ± 2.50 | Mean ± SD | 18.85 ± 7.45

As the above table shows, the mean sentence length in the first translational corpus, translation before creative writing (TBCRW), ranges from 12.23 to 37.53 words, while the creative writing corpus (CRW) shows a narrower range, from 14.51 to 23.25 words. The overall mean sentence lengths of these two corpora are close (21.37 and 20.68). In the second translational corpus, translation after creative writing (TACRW), the mean sentence length ranges from 10.96 to 42.58 words. The overall mean sentence lengths of TBCRW and CRW are thus closer to each other than to that of TACRW (18.85). However, this does not mean that the mean sentence length in TACRW is significantly different from that in the other two corpora (Before Creative Writing and Creative Writing). The statistical significance of mean sentence length is tested in the second section of this chapter.
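The sentence definition quoted above can be approximated with a regular expression, as in the Python sketch below. It is a simplification, not WordSmith’s implementation: only ASCII capitals and a few currency symbols are recognised, abbreviations such as “Mr.” followed by a capitalised name are wrongly treated as sentence ends, and words are counted as whitespace-separated tokens. The helper names are hypothetical.

```python
import re

# Approximates the quoted rule: ".", "?" or "!", then whitespace, then a digit,
# a currency symbol or a capital letter. Only ASCII capitals and a few currency
# symbols are covered here, which is a simplifying assumption.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.?!])\s+(?=[0-9$£€A-Z])")


def split_sentences(text):
    """Split text at the approximated sentence boundaries."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def mean_sentence_length(text):
    """Average number of whitespace-separated words per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        raise ValueError("no sentences found")
    return sum(len(s.split()) for s in sentences) / len(sentences)


sample = "The cat sat. It ran! Did it? yes it did."
print(split_sentences(sample))  # ['The cat sat.', 'It ran!', 'Did it? yes it did.']
```

Because “yes” is lower-case, the question mark before it does not end a sentence under this rule, so the sample has three sentences and ten words, a mean of about 3.33 words per sentence.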

5.2.2. Punctuation Marks Analysis

Several studies in stylometry consider punctuation marks a viable style marker for authorial style analysis. In this regard, Li, Zheng, and Chen argue that “incorporating punctuation frequency as a feature can improve the performance of authorship identification” (80). As mentioned in the methodology chapter, this study analyzes three punctuation marks: hyphens, semicolons and commas. This analysis relies on the frequency scores of the punctuation marks.

However, since text size could affect this kind of analysis and the three corpora in this study are not equal in size, the frequency of each punctuation mark is calculated per 1,000 words. Therefore, text size as a variable does not affect the punctuation marks analysis.
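The per-1,000-word normalization described above can be sketched in Python as follows. It is a minimal illustration under stated assumptions: words are whitespace-separated tokens (so a hyphenated compound such as “a-b” counts as one word), and marks are counted as raw characters, so a double-hyphen dash would count as two hyphens. WordSmith’s own counting rules may differ, and the helper names are hypothetical.

```python
def per_thousand(text, mark):
    """Occurrences of a punctuation mark per 1,000 whitespace-separated words."""
    words = text.split()
    if not words:
        raise ValueError("empty text")
    return text.count(mark) * 1000 / len(words)


def punctuation_profile(text, marks=(",", "-", ";")):
    """Standardized scores for commas, hyphens and semicolons, rounded to 2 decimals."""
    return {mark: round(per_thousand(text, mark), 2) for mark in marks}
```

For instance, a 2,000-word text containing 4 commas scores 2.0 commas per 1,000 words, the same as a 1,000-word text with 2 commas, which is what makes texts of different lengths comparable.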

5.2.2.1. Standardized Hyphen Analysis

Several studies rely on hyphens as a style marker, either to determine the authorship of a particular text or to analyze the authorial style of the text producer. For instance, Narayanan et al. and Chaski have used the hyphen as a style marker and reported that the hyphen, among other punctuation marks, helps identify authors.

Hyphen analysis was applied to the three corpora. The following table shows the hyphen frequency per 1,000 words³⁰ in each text in the three corpora (TBCRW, CRW and TACRW):

³⁰ By default, the WordSmith tool calculates punctuation marks in textual chunks of 1,000 words. WordSmith provided the punctuation mark ratio for the short story in this study that contains fewer than 1,000 words; however, the tool did not calculate the STTR of the same short story in textual chunks of 1,000 words.

TABLE 11: STANDARDIZED HYPHEN SCORE IN THE THREE CORPORA

TBCRW | Hyphen per 1,000 | CRW | Hyphen per 1,000 | TACRW | Hyphen per 1,000
15.txt | 11.37 | 001.txt | 8.94 | 1003.txt | 8.96
150.txt | 5.36 | 0010.txt | 9.37 | 1006.txt | 5.61
154.txt | 6.86 | 0011.txt | 8.95 | 1009.txt | 8.56
166.txt | 2.60 | 0012.txt | 5.56 | 1012.txt | 6.56
167.txt | 2.97 | 0013.txt | 9.04 | 1023.txt | 4.84
169.txt | 3.19 | 0014.txt | 12.69 | 1028.txt | 2.93
173.txt | 5.55 | 0015.txt | 10.23 | 1030.txt | 2.41
176.txt | 4.12 | 002.txt | 9.64 | 1031.txt | 3.86
2.txt | 3.59 | 003.txt | 9.59 | 1035.txt | 4.50
24.txt | 8.67 | 004.txt | 11.63 | 1037.txt | 6.44
26.txt | 6.00 | 005.txt | 5.27 | 1069.txt | 4.38
4.txt | 14.71 | 006.txt | 14.54 | 1072.txt | 6.97
68.txt | 2.75 | 007.txt | 8.72 | 1091.txt | 2.25
70.txt | 4.55 | 008.txt | 10.74 | 1100.txt | 0.91
90.txt | 4.13 | 009.txt | 5.83 | 1106.txt | 1.97
Mean ± SD | 5.76 ± 3.44 | Mean ± SD | 9.38 ± 2.54 | Mean ± SD | 4.74 ± 2.42

As the above table shows, the standardized hyphen score in the first translational corpus, translation Before Creative Writing (TBCRW), ranges from 2.60 to 14.71. It ranges from 5.27 to 14.54 in the Creative Writing corpus (CRW) and from 0.91 to 8.96 in the translation After Creative Writing corpus (TACRW). The mean standardized hyphen score in the first translational corpus (TBCRW) is 5.76, which is lower than that of the creative writing corpus (CRW), 9.38. The mean standardized hyphen score is thus highest in the Creative Writing corpus, while the scores of the two translational corpora are lower and close to each other (5.76 and 4.74).

5.2.2.2. Standardized Comma Analysis

Another punctuation mark widely used in stylometry as a style marker is the comma (Li, Zheng, and Chen). The following table shows the standardized comma score of each short story in the three corpora (TBCRW, CRW, TACRW) along with the overall mean standardized comma score of each corpus.

TABLE 12: STANDARDIZED COMMA SCORE IN THE THREE CORPORA

TBCRW | Comma per 1,000 | CRW | Comma per 1,000 | TACRW | Comma per 1,000
15.txt | 68.47 | 001.txt | 38.06 | 1003.txt | 42.69
150.txt | 65.50 | 0010.txt | 33.65 | 1006.txt | 53.97
154.txt | 57.81 | 0011.txt | 50.25 | 1009.txt | 49.01
166.txt | 44.21 | 0012.txt | 38.26 | 1012.txt | 60.49
167.txt | 44.85 | 0013.txt | 49.70 | 1023.txt | 41.97
169.txt | 38.79 | 0014.txt | 41.69 | 1028.txt | 54.55
173.txt | 44.66 | 0015.txt | 43.59 | 1030.txt | 50.07
176.txt | 37.89 | 002.txt | 38.99 | 1031.txt | 54.10
2.txt | 64.03 | 003.txt | 33.00 | 1035.txt | 40.04
24.txt | 75.48 | 004.txt | 25.84 | 1037.txt | 48.80
26.txt | 47.29 | 005.txt | 53.42 | 1069.txt | 45.41
4.txt | 49.89 | 006.txt | 44.59 | 1072.txt | 53.57
68.txt | 36.86 | 007.txt | 44.59 | 1091.txt | 48.48
70.txt | 54.53 | 008.txt | 40.09 | 1100.txt | 46.73
90.txt | 36.51 | 009.txt | 44.68 | 1106.txt | 43.69
Mean ± SD | 51.12 ± 12.57 | Mean ± SD | 41.36 ± 7.21 | Mean ± SD | 48.90 ± 5.67

The above table displays the results of the standardized comma analysis in the three corpora. The mean standardized comma score is highest in the first translational corpus, translation Before Creative Writing (TBCRW), at 51.12, followed by the second translational corpus, translation After Creative Writing (TACRW), at 48.90. The mean standardized comma scores of the two translational corpora are thus relatively close to each other compared to the mean score of the Creative Writing corpus (41.36).

5.2.2.3. Standardized Semicolon Analysis

The semicolon as a style marker has been used in many studies investigating authorial style (Hänlein; Ramyaa, Rasheed, and He). Ramyaa, Rasheed, and He pointed out that “semicolons indicate the reluctance of an author to stop a sentence where (s)he could”. That being said, semicolon analysis might reveal the idiosyncratic style of authors in using this specific punctuation mark. Table 13 below provides the standardized semicolon score for each short story along with the overall score of each corpus:

TABLE 13: STANDARDIZED SEMICOLON SCORE IN THE THREE CORPORA

TBCRW | Semicolon per 1,000 | CRW | Semicolon per 1,000 | TACRW | Semicolon per 1,000
15.txt | 4.87 | 001.txt | 1.65 | 1003.txt | 1.49
150.txt | 0.60 | 0010.txt | 1.49 | 1006.txt | 1.40
154.txt | 1.96 | 0011.txt | 0.87 | 1009.txt | 1.18
166.txt | 1.49 | 0012.txt | 0.98 | 1012.txt | 1.43
167.txt | 1.85 | 0013.txt | 1.57 | 1023.txt | 1.21
169.txt | 1.59 | 0014.txt | 4.23 | 1028.txt | 1.56
173.txt | 3.05 | 0015.txt | 2.44 | 1030.txt | 0.96
176.txt | 0.82 | 002.txt | 1.68 | 1031.txt | 1.07
2.txt | 4.57 | 003.txt | 8.44 | 1035.txt | 0.50
24.txt | 3.35 | 004.txt | 0.65 | 1037.txt | 1.02
26.txt | 3.00 | 005.txt | 2.26 | 1069.txt | 1.25
4.txt | 2.68 | 006.txt | 0.79 | 1072.txt | 1.99
68.txt | 1.93 | 007.txt | 0.25 | 1091.txt | 0.00
70.txt | 5.19 | 008.txt | 2.51 | 1100.txt | 0.00
90.txt | 0.00 | 009.txt | 1.94 | 1106.txt | 1.13
Mean ± SD | 2.46 ± 1.55 | Mean ± SD | 2.12 ± 2.00 | Mean ± SD | 1.08 ± 0.54

The above table displays the mean standardized semicolon score in the three corpora. The mean score in the first translational corpus, Translation Before Creative Writing (TBCRW), is the highest, at 2.46. The mean score in the Creative Writing corpus (CRW) is 2.12, which is not very different from that of the first translational corpus. However, the mean score in the second translational corpus, Translation After Creative Writing (TACRW), is 1.08, lower than that of the other two corpora.
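The standardized scores in this section are rates per 1,000, which makes stories of different lengths comparable. The minimal Python sketch below shows one way such a rate can be computed. It assumes the base is 1,000 words and approximates words with a simple regular expression; the helper name `rate_per_1000` and the tokenization rule are illustrative assumptions, not the procedure of the corpus tool used in this study.

```python
import re

def rate_per_1000(text: str, mark: str) -> float:
    """Occurrences of a punctuation mark per 1,000 words (hypothetical helper).

    Assumption: a "word" is a run of letters, digits or apostrophes; the corpus
    tool used in the study may tokenize differently.
    """
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not words:
        return 0.0  # avoid dividing by zero on an empty text
    return text.count(mark) / len(words) * 1000

story = "He waited; the train was late, as usual. She did not come; nobody did."
print(round(rate_per_1000(story, ";"), 2))  # 142.86 (2 semicolons in 14 words)
print(round(rate_per_1000(story, ","), 2))  # 71.43 (1 comma in 14 words)
```

Dividing by the word count before scaling to 1,000 is what allows a 900-word story and a 4,000-word story to be placed in the same column of the tables above.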

5.2.3. SPSS Statistical Analysis

The results of the corpus analysis in the above section were then tested using a one-way ANOVA in order to determine whether there is a significant difference between the mean scores of the style markers discussed above under the three conditions (Translation Before Creative Writing, Creative Writing, and Translation After Creative Writing). When there was a significant difference between the mean scores of a style marker, post hoc group comparisons using the Tukey HSD test were also run to determine which groups (translation before, creative writing, and translation after) differ significantly from each other for that style marker. This analysis helps support any conclusion drawn about the possible effects of J-D’s translating activity on his creative writing activity and vice versa. The following sections report the ANOVA and Tukey HSD test results.
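As an illustration of the one-way ANOVA step (not a substitute for the SPSS output), the following Python sketch recomputes the F statistic for the standardized semicolon scores transcribed from Table 13, using only the standard library. The function name `one_way_anova` is hypothetical. With the rounded table values it yields F(2, 42) of approximately 3.46, close to the F = 3.457 reported in Section 5.2.3.3.3; the small gap comes from rounding in the table.

```python
from statistics import mean

# Standardized semicolon scores per 1,000, transcribed from Table 13.
groups = {
    "TBCRW": [4.87, 0.60, 1.96, 1.49, 1.85, 1.59, 3.05, 0.82,
              4.57, 3.35, 3.00, 2.68, 1.93, 5.19, 0.00],
    "CRW":   [1.65, 1.49, 0.87, 0.98, 1.57, 4.23, 2.44, 1.68,
              8.44, 0.65, 2.26, 0.79, 0.25, 2.51, 1.94],
    "TACRW": [1.49, 1.40, 1.18, 1.43, 1.21, 1.56, 0.96, 1.07,
              0.50, 1.02, 1.25, 1.99, 0.00, 0.00, 1.13],
}

def one_way_anova(samples):
    """Return (F, df_between, df_within, ms_within) for independent samples."""
    all_values = [x for s in samples for x in s]
    grand_mean = mean(all_values)
    k, n_total = len(samples), len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between, df_within = k - 1, n_total - k
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within

f_stat, df_b, df_w, ms_w = one_way_anova(list(groups.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

The p-value then follows from the F distribution with (2, 42) degrees of freedom, which SPSS reports directly.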

5.2.3.1. Textual Analysis

5.2.3.1.1. Standardized Type-Token Ratios (STTRs)

Mean Standardized Type-Token Ratios (STTRs) for the three conditions (Before Creative Writing, Creative Writing, and After Creative Writing) of the independent variable (Text Production Activity) were submitted to a one-way independent-samples Analysis of Variance (ANOVA). There was a significant effect of Text Production Activity on the mean STTRs of the three conditions, F(2, 42) = 4.338, p = .019. Thus, at least two of the mean STTRs, for Before Creative Writing [M = 49.27, SD = 2.62], Creative Writing [M = 51.19, SD = 1.48], and After Creative Writing [M = 49.06, SD = 2.28], were significantly different.


Post hoc comparisons using the Tukey HSD test indicated a significant difference between the mean STTR of Creative Writing and the mean STTR of After Creative Writing (p = .028). The difference between the mean STTR of Before Creative Writing and the mean STTR of Creative Writing approached significance (p = .052). However, there was no significant difference between the mean STTR of Before Creative Writing and the mean STTR of After Creative Writing (p = .963).

5.2.3.2. Mean Sentence Length

Mean sentence lengths for the three conditions (Before Creative Writing, Creative Writing, and After Creative Writing) of the independent variable (Text Production Activity) were submitted to a one-way independent-samples Analysis of Variance (ANOVA). There was no significant effect of Text Production Activity on mean sentence length across the three conditions, F(2, 42) = 0.665, p = .520.

5.2.3.3. Punctuation Marks Analysis

5.2.3.3.1. Standardized Comma Analysis

Mean standardized comma scores for the three conditions (Before Creative Writing, Creative Writing, and After Creative Writing) of the independent variable (Text Production Activity) were submitted to a one-way independent-samples Analysis of Variance (ANOVA). There was a significant effect of Text Production Activity on the mean standardized comma scores of the three conditions, F(2, 42) = 4.86, p = .013. Thus, at least two of the mean standardized comma scores, for Before Creative Writing [M = 51.12, SD = 12.57], Creative Writing [M = 41.36, SD = 7.21], and After Creative Writing [M = 48.90, SD = 5.67], were significantly different.

Post hoc comparisons using the Tukey HSD test indicated a significant difference between the mean standardized comma score of Creative Writing and that of Before Creative Writing (p = .013). However, there was no significant difference between the mean standardized comma score of After Creative Writing and that of Creative Writing (p = .067), nor between the mean standardized comma score of After Creative Writing and that of Before Creative Writing (p = .136).

5.2.3.3.2. Standardized Hyphen Analysis

Mean standardized hyphen ratios for the three conditions (Before Creative Writing, Creative Writing, and After Creative Writing) of the independent variable (Text Production Activity) were submitted to a one-way independent-samples Analysis of Variance (ANOVA). There was a significant effect of Text Production Activity on the mean standardized hyphen ratios of the three conditions, F(2, 42) = 11.03, p < .001. Thus, at least two of the mean standardized hyphen ratios, for Before Creative Writing [M = 5.76, SD = 3.44], Creative Writing [M = 9.38, SD = 2.54], and After Creative Writing [M = 4.74, SD = 2.42], were significantly different.


Post hoc comparisons using the Tukey HSD test indicated a significant difference between the mean standardized hyphen score of Creative Writing and that of Before Creative Writing (p = .003). There was also a significant difference between the mean standardized hyphen score of After Creative Writing and that of Creative Writing (p < .001). However, there was no significant difference between the mean standardized hyphen score of After Creative Writing and that of Before Creative Writing (p = .593).

5.2.3.3.3. Standardized Semicolon Analysis

Mean standardized semicolon scores for the three conditions (Before Creative Writing, Creative Writing, and After Creative Writing) of the independent variable (Text Production Activity) were submitted to a one-way independent-samples Analysis of Variance (ANOVA). There was a significant effect of Text Production Activity on the mean standardized semicolon scores of the three conditions, F(2, 42) = 3.457, p = .041. Thus, at least two of the mean standardized semicolon scores, for Before Creative Writing [M = 2.46, SD = 1.55], Creative Writing [M = 2.12, SD = 2.00], and After Creative Writing [M = 1.08, SD = .54], were significantly different.

Post hoc comparisons using the Tukey HSD test indicated a significant difference between the mean standardized semicolon score of Before Creative Writing and that of After Creative Writing (p = .040). However, there was no significant difference between the mean standardized semicolon score of Before Creative Writing and that of Creative Writing (p = .802), nor between the mean standardized semicolon score of After Creative Writing and that of Creative Writing (p = .153).
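The Tukey HSD comparisons above rest on the studentized range statistic q: the absolute difference between two group means divided by the standard error sqrt(MS_within / n). The Python sketch below computes q for the three semicolon comparisons. The group means come from Table 13, while `MS_WITHIN` (about 2.245) is an assumption recomputed from the Table 13 scores rather than read from the SPSS output, and `Q_CRIT_05` is an approximate tabled critical value for three groups and about 40 error degrees of freedom. Exact p-values would require the studentized range distribution (for example, `scipy.stats.studentized_range` in SciPy 1.7 or later).

```python
from itertools import combinations
from math import sqrt

# Group means of the standardized semicolon scores (Table 13).
MEANS = {"TBCRW": 2.4633, "CRW": 2.1167, "TACRW": 1.0793}
MS_WITHIN = 2.2451  # assumption: within-group mean square recomputed from Table 13
N_PER_GROUP = 15
Q_CRIT_05 = 3.44    # approximate critical q for k = 3 groups, df of about 40

def tukey_q(means, ms_within, n):
    """Studentized range statistic q for every pair of equal-sized groups."""
    se = sqrt(ms_within / n)
    return {(a, b): abs(means[a] - means[b]) / se
            for a, b in combinations(means, 2)}

for (a, b), q in tukey_q(MEANS, MS_WITHIN, N_PER_GROUP).items():
    verdict = "significant" if q > Q_CRIT_05 else "not significant"
    print(f"{a} vs {b}: q = {q:.2f} ({verdict} at alpha = .05)")
```

With these inputs only the TBCRW vs TACRW comparison exceeds the critical value (q of about 3.58), which matches the pattern of p-values reported above.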

5.3. Machine Learning Stylometry

Machine learning stylometry is used in this section to determine which of J-D’s two translational activities (translation Before Creative Writing and translation After Creative Writing) is stylistically closer to his Creative Writing. The stylometric analysis of J-D’s translations and creative writings is based on three style markers: character n-grams31, Part-of-Speech (POS) n-grams, and word n-grams. In the machine learning stylometry experiments reported in this chapter, J-D’s translations before creative writing (TBCRW) and after creative writing (TACRW) are used as training corpora. The creative writing short stories (CRW) are used as query documents in order to reveal the extent to which J-D’s authorial stylistic profile in his creative writing is close to his translational stylistic profile in the translations produced before and after creative writing. Figure 13 below shows how machine-learning stylometric analysis is applied in this study:

31 An n-gram is a contiguous sequence of n textual units. For example, a word 3-gram is a sequence of three words. In the same manner, a character 3-gram is a sequence of three characters; for example, the character 3-grams of the word happy are “hap”, “app”, and “ppy”.


FIGURE 13: MACHINE LEARNING TRANSLATOR STYLE DETECTION (ADAPTED FROM EFSTATHIOS STAMATATOS)

[Figure 13 diagram: training translational corpus TBCRW (TT 1, TT 2, to TT n) yields J-D’s translational profile in TBCRW, S_TB (Step 1); training translational corpus TACRW (TT 1, TT 2, to TT n) yields J-D’s translational profile in TACRW, S_TA (Step 2); creative writing short stories CRW_1 to CRW_15 yield J-D’s authorial profile in CRW, S_CRW (Step 3); the profiles undergo style comparison (Step 4), which produces the result (Step 5).]

First, the machine is trained on J-D’s style in his translations before the production of creative writing (TBCRW) and in his translations after creative writing (TACRW), based on a predefined set of style markers such as character n-grams or word n-grams (steps 1 and 2). Internally, the machine analyzes and recognizes J-D’s stylistic patterns in TBCRW, producing the profile S_TB, where S stands for style (step 1). In the same manner, the machine takes the texts in the second translational corpus (TACRW) as input to learn J-D’s style in those texts, S_TA (step 2). The machine is then given J-D’s creative writing short stories. It analyzes his style using the same methods and the same style markers that were used to analyze the style in the two training corpora, and produces J-D’s authorial stylistic profile, S_CRW (step 3). Finally, the machine compares the three stylistic profiles S_TA, S_TB and S_CRW (step 4) and determines which translational stylistic profile (S_TA or S_TB) is closer to the authorial stylistic profile S_CRW (step 5).
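The five steps above can be approximated by a simple profile-based sketch in Python. It is a simplification under stated assumptions, not JGAAP’s internal implementation: each training corpus is reduced to one aggregated character 3-gram frequency profile (steps 1 and 2), a creative writing story is profiled the same way (step 3), and the story is assigned to the training profile with the highest cosine similarity (steps 4 and 5). The profile names `S_TB` and `S_TA` follow the labels used above; the sample sentences and helper names are invented placeholders.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Frequency counts of overlapping character n-grams."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def build_profile(texts, n=3):
    """Steps 1 and 2: aggregate n-gram counts over a training corpus."""
    profile = Counter()
    for text in texts:
        profile.update(char_ngrams(text, n))
    return profile

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def closest_profile(query, profiles, n=3):
    """Steps 3 to 5: profile the query and return the most similar training profile."""
    q = char_ngrams(query, n)
    return max(profiles, key=lambda name: cosine_similarity(q, profiles[name]))

# Invented placeholder texts standing in for the TBCRW and TACRW corpora.
profiles = {
    "S_TB": build_profile(["the old man sat by the river"]),
    "S_TA": build_profile(["the sea was calm and the night was long"]),
}
print(closest_profile("the night was long and calm", profiles))  # S_TA
```

Repeating `closest_profile` for each of the fifteen creative writing stories and counting the outcomes gives tallies of the kind reported in Section 5.3.4.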

5.3.1. JGAAP Tool

As pointed out in the methodology chapter, the stylometric analysis in this chapter makes use of the Java Graphical Authorship Attribution Program (JGAAP), a free machine-learning stylometry tool. JGAAP is a Java-based program for stylometric analysis developed by the Evaluating Variation in Language Laboratory (EVL Lab) at Duquesne University, Pennsylvania. Figure 14 below shows a screenshot of the JGAAP tool interface:


FIGURE 14: JGAAP TOOL INTERFACE

The above screenshot displays the JGAAP interface and shows that J-D’s creative writing short stories are used as query documents, which reveal which translational stylistic profile (translation before or after creative writing) is closer to J-D’s authorial stylistic profile in his creative writing. The screenshot also shows that the two translational corpora (translation before and translation after creative writing) are used as training corpora. It is worth noting that the term stylistic profile refers to the internal pattern recognition that the machine builds from the style-marker analysis applied to the data. The machine uses these recognized patterns to determine how close the patterns in the other sets of data (other stylistic profiles) are. Consequently, the user cannot see the stylistic profiles (the recognized patterns) of the authors during the processing stage of the analysis.


5.3.2. Corpus Pre-processing

The JGAAP tool allows users to conduct automatic corpus pre-processing based on their input. Before conducting the stylometric analysis, the three corpora in this study (TBCRW, CRW and TACRW) were pre-processed using two canonicalization methods. Canonicalization is a normalization process that converts data with more than one possible representation into a single standard representation. An example of canonicalization is converting all capital letters in a corpus to lowercase letters. The first canonicalization method applied to the three corpora in this study is whitespace normalization, a process that converts all whitespace characters, such as newline, space and tab, to a single space. This ensures that any spacing introduced by the text conversion processes is normalized across texts. The second method is normalizing the textual data based on the American Standard Code for Information Interchange (ASCII)33. This process removes any characters that are not included in the ASCII table, which covers the letters a-z and A-Z, the digits 0-9, punctuation marks, and a limited set of other symbols, and it is intended to ensure that the texts analyzed do not contain non-printing characters.

33 A character-encoding scheme
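The two canonicalization steps can be sketched in Python as follows. This is an illustrative approximation, assuming that whitespace normalization replaces each run of whitespace with one space and that ASCII normalization keeps only printable ASCII characters (code points 32 to 126); JGAAP’s own canonicizers may differ in such details, and the function names are hypothetical.

```python
import re

def normalize_whitespace(text: str) -> str:
    """Replace each run of whitespace (spaces, tabs, newlines) with one space."""
    return re.sub(r"\s+", " ", text)

def strip_non_ascii(text: str) -> str:
    """Keep only printable ASCII characters (code points 32 to 126)."""
    return "".join(ch for ch in text if 32 <= ord(ch) <= 126)

def canonicalize(text: str) -> str:
    # Order matters: whitespace first, otherwise tabs and newlines (ASCII
    # control characters) would be deleted instead of becoming spaces.
    return strip_non_ascii(normalize_whitespace(text))

print(canonicalize("Caf\u00e9\tau\n\n lait\u2019s"))  # Caf au laits
```

The example also shows a side effect worth keeping in mind: accented letters and typographic apostrophes are dropped rather than transliterated, so “Café” becomes “Caf”.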


5.3.3. JGAAP Analysis Method

JGAAP provides several analysis methods (analysis algorithms). The analysis method used in the machine learning experiments reported in this chapter is the Nearest Neighbor Driver with the Cosine Distance metric. Nearest Neighbor is an algorithm that compares document vectors in a multidimensional space by their similarity (or distance) and identifies the training data closest to the query document. Cosine distance is the metric used to measure that closeness: it is computed from the angle between two document vectors, so documents whose vectors point in similar directions are judged similar. Figure 15 below exemplifies this process:

FIGURE 15: VECTORS OF MADE UP DOCUMENTS

In the vector space shown above, Doc 1 and Doc 2 are vectors representing two different documents, and Q Doc is a vector representing a query document. To determine whether Doc 1 or Doc 2 is more similar to Q Doc using the cosine distance approach, we compare the angles between the vectors: Doc 1 is more similar to Q Doc because the angle between their vectors is smaller. The smaller the angle, the closer the vectors and the more similar the documents.
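The comparison in Figure 15 can be reproduced numerically. The Python sketch below defines cosine distance as 1 minus the cosine of the angle between two vectors (one common definition; JGAAP’s exact formula is not reproduced here) and applies it to invented two-dimensional vectors standing in for Doc 1, Doc 2 and Q Doc.

```python
from math import sqrt

def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v (0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)

# Hypothetical vectors in the spirit of Figure 15 (values are invented).
docs = {"Doc 1": (2.0, 1.5), "Doc 2": (0.2, 3.0)}
q_doc = (1.0, 1.0)

nearest = min(docs, key=lambda name: cosine_distance(docs[name], q_doc))
print(nearest)  # Doc 1
```

Because only the angle matters, a long document and a short one with the same proportions of features receive a distance of zero, which is why cosine distance suits texts of unequal length.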

5.3.4. Style Markers Analysis

5.3.4.1. Character n-gram Analysis

As pointed out in the methodology chapter, the current study uses character n-grams with n = 3 in order to allow a fine-grained stylistic analysis. Character 3-grams can capture important stylistic information such as the use of punctuation marks, as well as lexical information such as word class. Figure 16 below shows a screenshot of the first two results of the character 3-gram analysis. It shows creative writing short stories number one and ten and their most stylistically similar short stories in the translational corpora (translation before or after creative writing), along with the canonicalization and analysis methods applied.

FIGURE 16: MACHINE LEARNING CHARACTER 3-GRAM ANALYSIS
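Footnote 31’s definition of n-grams translates directly into code. The Python sketch below (function name hypothetical) extracts contiguous n-grams from any sequence, so the same function covers character n-grams (a string), word n-grams (a list of words) and POS n-grams (a list of tags). The POS tags shown are invented for illustration and are not output of the JGAAP tagger.

```python
def ngrams(items, n):
    """Contiguous n-grams of a sequence; works for strings and lists alike."""
    return [items[i:i + n] for i in range(len(items) - n + 1)]

print(ngrams("happy", 3))
# ['hap', 'app', 'ppy']  (character 3-grams)

words = "the old man sat down".split()
print(ngrams(words, 3))
# [['the', 'old', 'man'], ['old', 'man', 'sat'], ['man', 'sat', 'down']]

tags = ["DT", "JJ", "NN", "VBD", "RP"]  # hypothetical POS tags for the same sentence
print(ngrams(tags, 2))
# [['DT', 'JJ'], ['JJ', 'NN'], ['NN', 'VBD'], ['VBD', 'RP']]
```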


The character n-gram analysis with n = 3 revealed that J-D’s translational stylistic profile34 in the short stories he translated after the production of creative writing (TACRW) is closer to his authorial stylistic profile35 in his creative writing (CRW). In thirteen of his creative writing short stories, J-D’s authorial stylistic profile was closer to his translational stylistic profile in the short stories he translated after the production of creative writing (TACRW). In only two of his creative writing short stories was his authorial stylistic profile closer to his translational stylistic profile in the short stories he translated before the production of creative writing.

5.3.4.2. Part-of-Speech (POS) Analysis

POS n-gram analysis reveals syntactic patterns related to the text’s author. In this experiment, the goal is to investigate whether J-D’s syntactic style in the translations produced before creative writing influenced his syntactic style in his creative writing. To perform this type of analysis, this study uses an automatic POS tagger that is embedded in the JGAAP tool. Using JGAAP, POS n-gram analysis is conducted with n = 2, 3 and 4. Figure 17 below shows a screenshot of the first two results of the POS 3-gram analysis:

34 Translational stylistic profile refers to the machine’s internal pattern recognition of the character n-grams in J-D’s translations.

35 Authorial stylistic profile refers to the machine’s internal pattern recognition of the character n-grams in J-D’s creative writing.


FIGURE 17: MACHINE LEARNING POS N-GRAM ANALYSIS

The POS n-gram analysis with n = 3 revealed that J-D’s translational stylistic profile36 in the short stories he translated before the production of creative writing (TBCRW) is closer to his authorial stylistic profile37 in his creative writing (CRW). In all fifteen creative writing short stories, J-D’s authorial stylistic profile was closer to his translational stylistic profile in the short stories he translated before the production of creative writing (TBCRW). None of the creative writing short stories was closer to his translational stylistic profile in the short stories he translated after the production of creative writing. The same experiment was conducted with different n sizes (n = 2, 3 and 4), and the results did not change. The results also showed that the syntactic style of none of the creative writing short stories is close to the syntactic style of the control corpus. This suggests that the syntactic style of the creative writing short stories was shaped by J-D’s own style in the short stories translated before the production of creative writing. In other words, the syntactic style or conventions of the target language, as represented by the control corpus, do not appear to have affected J-D’s personal syntactic style in his creative writing.

36 Translational stylistic profile refers to the machine’s internal pattern recognition of the POS n-grams in J-D’s translations.

37 Authorial stylistic profile refers to the machine’s internal pattern recognition of the POS n-grams in J-D’s creative writing.

5.3.4.3. Word n-gram Analysis

The last style marker analyzed in this study using machine learning is the word n-gram. As pointed out in the methodology chapter, word n-gram analysis reveals lexical patterns related both to the author’s own lexical choices and to the document’s theme or topic. This study uses n = 3 and n = 4 in the word n-gram analysis. Figure 18 below shows a screenshot of the first two outputs of the word 3-gram analysis:

FIGURE 18: MACHINE LEARNING WORD N-GRAM ANALYSIS


The word n-gram analysis with n = 3 revealed that J-D’s translational stylistic profile38 in the short stories he translated after the production of creative writing (TACRW) is closer to his authorial stylistic profile39 in his creative writing. In twelve of his creative writing short stories, J-D’s authorial stylistic profile was closer to his translational stylistic profile in the short stories he translated after the production of creative writing (TACRW). In only three of his creative writing short stories was his authorial stylistic profile closer to his translational stylistic profile in the short stories he produced before the production of creative writing. The same experiment was run with a larger n size, n = 4. With n = 4, the number of creative writing short stories in which J-D’s authorial stylistic profile is closer to his translational stylistic profile in the short stories he translated after the production of his short stories increased from 12 to 13.

38 Translational stylistic profile refers to the machine’s internal pattern recognition of the word n-grams in J-D’s translations.

39 Authorial stylistic profile refers to the machine’s internal pattern recognition of the word n-grams in J-D’s creative writing.

5.3.5. Conclusion

The first part of this chapter presented the quantitative corpus analysis of three style markers (STTR, mean sentence length, and punctuation marks, namely commas, hyphens and semicolons) in the three corpora in this study: the short stories translated before the production of creative writing (TBCRW), the Creative Writing short stories (CRW), and the short stories translated after the production of creative writing (TACRW). This chapter also reported the results of the one-way ANOVA, with post hoc comparisons using the Tukey HSD test, applied to the quantitative data derived from the corpus analysis. The statistical analysis revealed a significant difference between the STTR of Creative Writing and the STTR of the short stories translated After Creative Writing. It also showed that the difference between the STTR of the short stories translated Before Creative Writing and the STTR of the Creative Writing short stories approached significance (p = .052). There was no significant difference between the STTR of the short stories translated before the production of Creative Writing and the STTR of the short stories translated After Creative Writing. As for the second style marker, mean sentence length, there was no significant difference across the three corpora (TBCRW, TACRW and CRW), which suggests that the sentences in the three corpora are, to a great extent, of similar length.

The punctuation mark analysis, which covered three punctuation marks (commas, semicolons and hyphens), revealed no significant difference between J-D’s use of commas and semicolons in his creative writing and in the short stories he translated after the production of his creative writing. There was also no significant difference between J-D’s use of commas in the short stories he translated before and after the production of his creative writing. However, J-D’s use of commas in his creative writing was significantly different from his use of the same punctuation mark in the short stories translated before the production of creative writing. The semicolon analysis revealed that J-D’s use of semicolons in the short stories translated before creative writing did not differ significantly from his use of the same punctuation mark in the creative writing short stories. However, there was a significant difference between J-D’s use of semicolons in the short stories translated before and after the production of his creative writing. As for the third punctuation mark, the hyphen, the analysis revealed that J-D’s use of hyphens in his creative writing differs significantly from his use of the same punctuation mark in his translations produced both before and after his creative writing.

The second section of this chapter applied a machine-learning approach to style analysis in which three style markers were analyzed in the three corpora in this study. The analysis of the first style marker, character n-grams with n = 3, revealed that J-D’s authorial stylistic profile in his creative writing is closer to his translational stylistic profile in the short stories he translated after the production of creative writing. However, the analysis of POS n-grams with n = 3 revealed that his authorial stylistic profile in his creative writing is closer to his translational stylistic profile in the short stories he translated before the production of creative writing. The analysis of the last style marker, word n-grams with n = 3, showed that his authorial stylistic profile in his creative writing is closer to his translational stylistic profile in the short stories he translated after the production of creative writing. The findings of this chapter will be further discussed in the Discussion Chapter in the light of translation theories and the main arguments on the relation between translation and creative writing.


CHAPTER 6: DISCUSSION

6.1. Introduction

The present chapter discusses the findings reported in the results chapters in an attempt to interpret the impact of J-D’s translating activity before the production of creative writing on his creative writing activity. It also attempts to interpret the interaction between the two activities, and in particular the impact of J-D’s creative writing activity on the translations he produced after creative writing. It discusses the study’s findings in the light of the present research hypotheses:

1- The short stories J-D translated before the production of creative writing are close in theme to his creative writing short stories.

2- The short stories J-D translated after the production of creative writing are close in theme to his creative writing short stories.

3- The short stories J-D translated before the production of creative writing will display some stylistic markers that are also found in his creative writing.

4- J-D’s creative writing short stories will display some stylistic markers that are also found in the short stories he translated after the production of creative writing.

The findings of this study refute the first hypothesis, i.e., that the short stories J-D translated before the production of creative writing are close in theme to his creative writing short stories. However, they confirm the second hypothesis, which claims that the short stories J-D translated after the production of creative writing are close in theme to his creative writing short stories. The findings also confirm the third hypothesis, which claims that the short stories J-D translated before the production of creative writing display some stylistic markers that are also found in his creative writing. Last, the fourth hypothesis, which claims that J-D’s creative writing short stories display some stylistic markers that are also found in the short stories he translated after the production of creative writing, was also confirmed. The last section of this chapter proposes a framework for translator style analysis. The framework also lays down best practices and recommendations for compiling and controlling a corpus and conducting a computational analysis of translator style.

6.2. Zooming into the Results

The following sections discuss the study’s findings in the light of the “translation universals” (Baker, “Corpus Linguistics”) and the main arguments on the relation between creative writing and translation. Before commencing the discussion, it might be beneficial to provide a general overview of the main arguments and studies regarding the notion of translation universals and the relation between translation and creative writing.

Translation universals are broadly defined as textual and stylistic features that typically occur in translated rather than non-translated texts. Shoshana Blum-Kulka was one of the first translation scholars to propose what came to be called “translation universals”.

She indicated that translated texts tend to be more explicit, and their language more redundant, than non-translated texts; this is called the “explicitation hypothesis”. Until recently, research on translation universals was based on manual analysis of the textual features of translations and their original texts, or on the comparison between translated and non-translated texts. In the 1990s, translation scholars started investigating the relation between translated and non-translated texts using the resources of corpus analysis. Baker initiated this kind of research and proposed “translational universals”, which she defines as “features which typically occur in translated text rather than original utterances and which are not the result of interference from specific linguistic systems” (Baker, “Corpus Linguistics” 243).

According to Baker, the language of translated texts is different from the language of non-translated texts. This difference is manifested in tendencies that appear as patterns common to translated texts. These patterns constitute what Baker calls “universal features of translation”, including explicitation (translators tend to spell out the message in the target text rather than leave it implicit), simplification (translators tend to use simplified language in the target text), and normalization (translators tend to conform to the linguistic and stylistic system of the target language). Baker argues that “translated texts record genuine communicative events and as such are neither inferior nor superior to other communicative events in any language. They are however different” (Baker, “Corpus Linguistics” 234). Since the present study investigates some textual features in translated and non-translated texts, it might be beneficial to discuss the data in the light of the “universal features of translation”. Considering the type of data analysis conducted in this study, only simplification will be tested as a universal feature of translation.


The findings of this study will also be discussed by shedding some light on the major arguments on the relation between creative writing and translation, mainly those viewing translation as a means to learn creative writing, as an inspiration for creative writers, and as a source of intertextuality in creative writing (Bassnett; Bush and Boase-Beier and Holman). Janis Forman discusses the relation between creative writing and translation and points out that translation was used as a means to teach rhetoric in ancient Greece and to train young writers in style and invention. She also indicates that, nowadays, several writers use translation to invent style and theme (676). In other words, translation has been used to learn the craft of creative writing and as a source of thematic and stylistic inspiration for creative writers. Some translators are also creative writers, as is the case with J-D. It is therefore worth linking the above theoretical discussion to the results of the empirical investigation of the thematic and stylistic impact of J-D’s translating and creative writing activities on each other, in order to determine whether his translating activity helped him develop his writing skills and his own themes and style in his creative writing.

6.3. Thematic Analysis

The computational thematic analysis using LSA revealed that the themes in the short stories J-D translated before the production of his own creative writing were not significantly relevant to those he wrote about in his creative writing. This finding was also confirmed by the second method of analysis, word n-grams, which are also used in topic discovery (Wang, McCallum, and Wei). The word n-gram analysis showed no significant thematic similarity between the short stories J-D translated before creative writing and his creative writing short stories.

These findings refute the first hypothesis, which claims that the short stories J-D translated before the production of creative writing are close in theme to his creative writing short stories. However, the second hypothesis was confirmed by the second LSA experiment, which investigated the thematic similarity between J-D’s creative writing and the short stories he translated after the production of creative writing. The results suggest that J-D’s creative writing activity influenced his choice of the short stories he translated after the production of his own creative writing. In other words, the themes developed in his creative writing short stories are relatively similar to the themes developed in the short stories he translated after he started his own work. This finding was also supported by the word n-gram analysis results, which show that 87% (13 out of 15, with n = 4) of J-D’s creative writing short stories are thematically relevant to the short stories he translated after the production of his creative writing. At the thematic level, then, J-D’s creative writing activity appears to have influenced his choice of the short stories he translated after the production of creative writing.

6.4. Textual Analysis

6.4.1. STTR

The SPSS analysis of the Standardized Type-Token Ratio (STTR) revealed that J-D’s creative writing has a different degree of lexical variation compared to his translations before and after creative writing, although the difference from the translations produced before creative writing only approached significance (p = .052). J-D’s creative writing short stories have a higher STTR score than his translations. In other words, his creative writing has a wider range of vocabulary, while his translations show a lower degree of vocabulary richness. The STTR analysis also showed that J-D’s translations produced before and after the production of his creative writing were not significantly different from each other. The fact that the degree of vocabulary richness in the short stories he translated before and after his creative writing is similar indicates that, at the lexical level, his style as a translator is less complex than when he writes. Because the translated corpora include short stories by different Arab writers, who presumably differ in lexical complexity, the similarity of the STTR scores across the translations suggests that these scores relate more to J-D’s own degree of vocabulary richness or lexical complexity than to that of the source authors. These findings support the simplification hypothesis proposed by Baker, which holds that translators tend to use simpler language, with less lexical variation.
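To make the STTR measure concrete, the following Python sketch computes a standardized type-token ratio as the mean type-token ratio (in percent) over consecutive, equal-sized chunks of a text. The 1,000-token default chunk size, the decision to ignore a final partial chunk, the simple lowercase tokenizer and the function name `sttr` are assumptions modeled on common corpus-tool practice, not a description of the exact settings used in this study.

```python
import re

def sttr(text: str, chunk_size: int = 1000) -> float:
    """Standardized type-token ratio in percent (hypothetical helper).

    The TTR is computed for each full chunk of `chunk_size` tokens and then
    averaged; a trailing partial chunk is ignored (an assumption).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    ratios = [
        len(set(tokens[start:start + chunk_size])) / chunk_size * 100
        for start in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not ratios:
        raise ValueError("text is shorter than one chunk")
    return sum(ratios) / len(ratios)

# Toy example with 4-token chunks: [a b c d] has TTR 100%, [a a b b] has 50%.
print(sttr("a b c d a a b b", chunk_size=4))  # 75.0
```

Because every chunk has the same size, the measure is much less sensitive to differences in story length than a raw type-token ratio, which falls as a text grows longer.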

6.4.2. Mean Sentence Length

The SPSS analysis of mean sentence length in the three short-story corpora revealed that the mean sentence length in J-D’s creative writing and in the short stories J-D translated before and after the production of creative writing were not significantly different. This means that the sentences in the short stories produced by J-D, whether translations or non-translations, are of relatively similar length. That is, J-D has a stylistic thumbprint manifested in sentence length and traceable through the different texts he produced, whether translated or non-translated. These findings go against the simplification hypothesis proposed by Baker, according to which sentences tend to be shorter in translated texts than in non-translated ones. They also contradict Sara Laviosa’s findings, which suggest that sentences in translated texts tend to be longer than sentences in non-translated texts.
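Mean sentence length is the number of words per sentence averaged over a text. The sketch below assumes a naive splitter that ends a sentence at ".", "!" or "?" followed by whitespace; WordSmith and other tools apply their own sentence-boundary rules, so figures from this sketch would not exactly reproduce those reported in this study. The per-story means it yields are the kind of values a significance test then compares across the three corpora.

```python
import re
from statistics import mean

def sentence_lengths(text):
    """Words per sentence using a naive splitter (assumption): a sentence ends
    at '.', '!' or '?' followed by whitespace. Abbreviations such as 'Mr.'
    would be mis-split, so real corpora need a proper sentence tokenizer."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [len(re.findall(r"\w+(?:'\w+)?", s)) for s in sentences if s]

def mean_sentence_length(text):
    """Average number of words per sentence in one text."""
    return mean(sentence_lengths(text))

story = "He left. She stayed at home! Why?"
print(sentence_lengths(story))                 # [2, 4, 1]
print(round(mean_sentence_length(story), 2))   # 2.33
```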

6.5. Punctuation Marks

In chapter five, three punctuation marks (comma, semicolon and hyphen) were analyzed throughout J-D’s creative writing and translations. The significance analysis using SPSS showed that J-D’s use of commas and semicolons in his creative writing and in the short stories he translated after the production of creative writing was relatively similar. This suggests that J-D’s use of these punctuation marks in his creative writing impacted his use of them in the short stories he translated after the production of his creative writing. The analysis also revealed that J-D’s use of commas in the short stories he translated before the production of creative writing differed from his use of the same punctuation mark in his creative writing. J-D’s use of hyphens, however, differed significantly across the corpora: neither his translations before nor those after creative writing matched his use of hyphens in his creative writing.
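Comparing punctuation across corpora of different sizes requires normalizing raw counts, for instance per 1,000 words. The following sketch computes such rates for the three marks analyzed here; the word pattern, the per-1,000 normalization and the treatment of every "-" character as a hyphen (dashes are not separated out) are assumptions rather than the procedure used in chapter five.

```python
import re

MARKS = {"comma": ",", "semicolon": ";", "hyphen": "-"}

def punctuation_rates(text, per=1000):
    """Occurrences of each mark per `per` words.

    Words are alphabetic runs that may contain internal apostrophes or
    hyphens, so "old-fashioned" counts as one word. Every '-' is counted
    as a hyphen; separating hyphens from dashes is left out (assumption).
    """
    n_words = len(re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text))
    if n_words == 0:
        raise ValueError("no words found")
    return {name: text.count(mark) * per / n_words for name, mark in MARKS.items()}

print(punctuation_rates("Well, it was late; the old-fashioned lamp flickered, then died."))
# {'comma': 200.0, 'semicolon': 100.0, 'hyphen': 100.0}
```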

It seems that J-D’s use of semicolons in the short stories translated before the production of creative writing impacted his use of this punctuation mark in his own short stories. However, J-D’s pattern of using commas and hyphens in the short stories translated before the production of creative writing did not impact his use of those punctuation marks in his own short stories. It can also be observed that J-D’s style in using semicolons and commas in his creative writing impacted his use of them in the translations he produced after the production of creative writing.

J-D’s use of hyphens in his creative writing short stories did not impact his use of hyphens in the short stories translated after creative writing. The empirical investigation of three punctuation marks in J-D’s English translations of Arabic short stories thus showed inconsistent stylistic patterns, which might be related to the fact that the use of punctuation marks in Arabic is a recent phenomenon and is not governed by a set of defined rules40.

This argument is based on the fact that the inconsistency of punctuation use in J-D’s translations appears across a number of translated short stories written by different Arab authors. That is, the inconsistent use of punctuation marks in J-D’s translations is more likely a result of the inconsistent, idiosyncratic punctuation of the ST authors; since English has a defined set of punctuation rules, the inconsistency is more likely related to the source texts than to J-D.

40 Ronak Husni and Daniel Newman point out that “in contemporary Arabic, punctuation is patterned on Western, especially English, principles, albeit with some permutations. The most striking observation with regard to punctuation in contemporary Standard Arabic is, however, its inconsistency, which at times borders on the chaotic as a result of idiosyncratic variations” (Husni and Newman 244).


The analysis has also shown that J-D’s use of punctuation marks in his own short stories and in his translations after the production of creative writing was not significantly different. It seems that J-D’s creative writing activity helped him define a way of using punctuation marks, which significantly impacted the way he used them in the translations produced after creative writing.
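The significance tests behind these comparisons were one-way ANOVAs run in SPSS (see chapter five). As a minimal sketch of what the omnibus test computes, the function below derives the F statistic from the between-group and within-group sums of squares; the three groups in the example are invented toy numbers, not data from this study.

```python
def one_way_anova_f(*groups):
    """F statistic for a one-way between-groups ANOVA.

    Each group is a list of observations, e.g. per-story punctuation rates
    for one corpus (illustrative assumption).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Toy data (invented): three groups with clearly different means
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # 27.0
```

In practice, the p-value and the Tukey HSD post hoc comparisons would come from a statistics package; in Python, `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` (SciPy 1.8 or later) provide both.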

6.6. Syntactic Analysis

The machine-learning analysis of J-D’s translations and creative writing using Part-of-Speech (POS) n-grams revealed that J-D’s syntactic style in the short stories translated before the production of creative writing is close to his syntactic style in 100% of his creative writing short stories. This implies that J-D’s translating activity before the production of his creative writing impacted the syntactic style of his creative writing. To ensure that the analysis was not driven by the Target Language (TL) syntactic style, a fourth corpus containing short stories written originally in English between 1960 and 2012 was used as a control. Even with the control corpus, the result did not change. In addition, the shared syntactic style is more likely J-D’s personal style than a reflection of the Source Language (SL) syntax. Had the syntactic style of the translated and the creative writing short stories differed, the syntactic style of the translational corpus might have reflected the SL, Arabic, whose syntactic system is completely different from that of English, the language of the creative writing. Since the two did not differ, the stylistic patterns found in J-D’s translation and creative writing are more likely his own personal style and are not impacted by either the SL or the TL.
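POS n-gram features are built from a text that has already been tagged. The sketch below assumes the tags are available as a list of Penn Treebank labels (the tag set the Stanford POS tagger produces for English) and turns them into a profile of relative n-gram frequencies, the kind of feature vector a nearest-neighbour classifier compares. The bigram default and the example tags are illustrative assumptions.

```python
from collections import Counter

def pos_ngram_profile(tags, n=2):
    """Relative frequency of each POS n-gram in a sequence of tags."""
    grams = Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))
    total = sum(grams.values())
    return {gram: count / total for gram, count in grams.items()}

# Hypothetical Penn Treebank tags for "The man opened the door"
tags = ["DT", "NN", "VBD", "DT", "NN"]
profile = pos_ngram_profile(tags)
print(profile[("DT", "NN")])  # 0.5
```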

6.7. Word n-gram Analysis

Word n-gram analysis reveals semantic and lexical relations between texts by comparing sequences of words, either content words41 or function words42. Word n-gram analysis was used in this study to retrieve the thematic similarity between the short stories in the three corpora (translation before creative writing, creative writing and translation after creative writing). The machine-learning word n-gram analysis showed that J-D’s short stories are more relevant to the short stories he translated after producing his own short stories than to those he translated before he began his creative writing. This finding also confirms the LSA experiment findings, which revealed that the short stories J-D translated after the production of his creative writing are more thematically relevant to his creative writing short stories than those he translated before the production of his creative writing. This means that J-D’s creative writing activity impacted his choice of texts to translate after he began to write his own short stories.

41 Words that carry lexical meaning, such as nouns and verbs.

42 Words with little lexical meaning, such as prepositions and articles.
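A word n-gram profile counts contiguous word sequences, and two texts can then be compared by the cosine of the angle between their profiles. The sketch assumes bigrams, lower-casing and a simple regular-expression tokenizer; it illustrates the idea and is not the configuration used in this study.

```python
import math
import re
from collections import Counter

def word_ngrams(text, n=2):
    """Counts of contiguous word n-grams (lower-cased, simple tokenizer)."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def cosine_similarity(a, b):
    """Cosine similarity between two n-gram Counters (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

a = word_ngrams("the old man went to the market")
b = word_ngrams("the old man went home")
print(round(cosine_similarity(a, b), 3))  # 0.612
```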


6.8. Character n-gram Analysis

The machine-learning analysis of character n-grams in J-D’s short stories and in the short stories he translated before and after the production of creative writing revealed that the lexical and punctuation preferences of his creative writing are reflected in the short stories he translated after he wrote his own short stories. The analysis of character n-grams showed that 87% of the creative writing short stories are similar to the ones he translated after the production of creative writing. Character n-gram analysis captures stylistic information such as punctuation marks. Based on the character n-gram analysis, J-D’s use of punctuation marks in the short stories translated after the production of creative writing is impacted by his use of punctuation marks in his creative writing. This finding is also confirmed by the corpus analysis of two punctuation marks (semicolons and commas), whose use in the creative writing short stories and in the short stories translated after the production of creative writing was not significantly different (p > .05). Character n-gram analysis can also reveal lexical information, whether related to function words or content words. The analysis revealed that J-D’s use of function words and content words in his creative writing is similar to his use of the same types of words in the short stories he translated after the production of his creative writing. This indicates that J-D’s post-creative-writing translations are impacted by his own development as a writer.
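Because character n-grams are taken over the raw string, spaces and punctuation marks become part of the features, which is why this marker captures punctuation habits as well as lexical ones. A minimal sketch, assuming trigrams and no case folding (both illustrative choices, not the settings of this study):

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Counts of overlapping character n-grams; spaces and punctuation are kept."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

grams = char_ngrams("Yes; no, yes.", n=3)
# Trigrams containing the semicolon carry punctuation information
print([g for g in grams if ";" in g])  # ['es;', 's; ', '; n']
```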


6.9. On the Translating and Writing Activities of J-D

The findings of this study can be interpreted with regard to three notions: translation as inspiration, translation as a means to learn creative writing, and translation as a source of intertextuality in creative writing. Several translation scholars and translators regard translation as a source of inspiration for the translator to write, or to write creatively, especially when writer’s block stifles the translator’s productivity or creativity. Susan Bassnett, a well-known translation scholar and translator, describes her experience as a writer who is inspired by practicing translation. She argues that “translating serves as a way of continuing to write and to shape language creatively, it can act as a regenerative force” (179). For her, translation is a source of inspiration that helps creative writers continue to write. Bassnett gives a famous example of translation serving as an inspiration for creative writers, that of Keats’s sonnet ‘On first looking into Chapman’s Homer’: after reading Homer in translation, Keats was inspired by Homer’s work to write his own sonnets (Bassnett 174). Forman, a writer and translator, argues that translation practice serves creative writing as a source of thematic and stylistic inspiration (676). By the same token, J-D’s translating activity served as an inspiration for his own creative writing. This argument can be supported in two ways. First, J-D’s translation style in the short stories translated before the production of creative writing is also found in his creative writing short stories, as revealed by the POS and sentence length analyses. Second, the short stories he translated before the production of creative writing, which discuss themes related to Arab culture, take place in the Arab world and feature Arab characters, inspired him to use similar themes in his creative writing.

The second notion that could help interpret the results of this study is the idea that translation is a means to learn creative writing. Translating is a writing activity that ends with the production of a (target) text that has never existed as a text in the target language. Writing a target text requires cognitive processes, including choosing the right words to express a specific idea and adhering to, or choosing to depart from, the syntactic and stylistic conventions of the target language depending on the source text. Furthermore, the translating activity goes through the same stages as those involved in creative writing. In this regard, Manuela Perteghella and Eugenia Loftredo argue that:

the shaping of text, in both creative writing and translation, presumes a critical awareness, a critical thinking which pervades this ‘moving inside’ a text and this ‘immediate’ involvement with it. This view, of course, demystifies the notion of original writing as a purely spontaneous activity and translating as willed activity. (Perteghella and Loftredo 5)

Translators can be seen as the writers of the translation, and translated texts can be seen as a form of writing. This act of writing gives translators the chance to practice writing in the target language and to develop as writers. Thus, translation can serve as a way to develop a style in one’s own creative writing. Bassnett also supports this argument and states that “[t]ranslation… can be a means of learning the craft of writing” (172); she also argues that

it has often been noted that periods of intense translation activity in a culture are followed by a great flowering of local writing talent – this is exactly what happened during the English Renaissance of the sixteenth century after the vast amount of translation undertaken during the difficult years of civil war in the fifteenth. (Bassnett 179)

This phenomenon of translation activity followed by intense writing activity also describes the case of J-D. He started translating from Arabic into English in the 1960s; forty years later he started writing his own short stories.

Moreover, J-D’s style, as manifested in the syntax and the themes of the translations he produced before the production of creative writing is also found in his creative writing, which also shows the impact of practicing writing TTs on the writing of his own short stories.

The third notion used to discuss the relation between J-D’s translating and writing activities is translation as a source for intertextuality in creative writing.

Intertextuality, for Boase-Beier and Holman, is “a process of integrating other writing, taking particular elements of another work and making explicit or implicit references to them, building these references into the context of the new work” (Boase-Beier and Holman 4). If we apply the notion of intertextuality to the case of J-D, we find that he translated a great number of works from Arabic into English discussing specific themes related to Arab and Islamic culture and tradition. As the manual evaluation of the LSA results revealed, the settings and the characters in J-D’s creative writing have an obvious connection to the settings and characters in the translations he produced before the production of his creative writing. Both groups of short stories discuss themes related to Arab characters, countries and culture.

As for thematic relevance, the LSA and the word n-gram analysis revealed that J-D’s short stories are thematically more relevant to the works he translated after the production of his own creative writing than to the pre-creative-writing translations. Why J-D’s creative writing themes are less relevant to the themes of the stories he translated before the production of his creative writing might be explained as follows: J-D may not have had the chance to translate short stories revolving around the themes he liked during the pre-creative-writing period. He therefore started writing his own short stories revolving around themes he liked and felt would be well received by English readers. If his translating activity before his creative writing did not satisfy him as a cultural agent or an orientalist, this would explain his decision to assume the role of a cultural advocate, writing short stories about themes related to Arab culture.

The second LSA experiment showed a significant thematic relevance between J-D’s creative writing and the short stories translated after the production of his creative writing. J-D wrote his own short stories about themes he liked, which in turn impacted his choices as a translator in selecting the short stories he translated after the production of his creative writing.

J-D’s creative writing can also be seen as a translation of an imaginary source text. This claim can be supported by two sets of findings. First, J-D’s style as a translator is found in his creative writing, specifically in relation to textual features such as sentence length and syntax. The sentence length analysis revealed no significant difference between the average sentence length of the creative writing short stories (non-translations) and that of the translated short stories. This finding goes against the findings of two major studies on translation universals: Baker reported that sentences in translated texts tend to be shorter, whereas Laviosa found that sentences in translated texts tend to be longer. Neither finding applies to the case of J-D, and since his translating activity started before his creative writing, it can be argued that the former activity is more likely to have impacted the latter. For this reason, J-D’s creative writing can be seen as the translation of an imaginary source text.

Second, the findings of the second LSA experiment, which revealed a close thematic relation between J-D’s creative writing and the stories he translated after the production of creative writing, could also support the argument that J-D’s creative writing can be seen as the translation of an imaginary source text. J-D’s creative writing revolved around themes that appealed to him as a writer. He then translated short stories (in the post-creative-writing period) involving themes that he liked and that were similar to the themes of his creative writing short stories. In this regard, Bassnett addresses the phenomenon of authorial and translational connection and argues that “frequently writers translate other people’s works because those are the works they would have written themselves had they not already have been created by someone else” (175). Bassnett’s statement could be applied to the case of J-D, who translated works that were similar to those he had already written in his creative writing. In other words, J-D might have written the short stories that he translated after creative writing himself, because they revolved around themes he found compelling. The clear thematic patterns that characterize the relation between J-D’s creative writing and the short stories he translated after the production of creative writing can also be attributed to the fact that J-D consciously chose both the topics he wrote about and the short stories he translated, which discuss his preferred topics.

The stylistic similarities and the thematic relevance between J-D’s creative writing and translations indicate that J-D’s creative writing can be considered more a translation than an independent piece of creative writing. Taken together, the findings presented in this section (the stylistic impact of J-D’s translations before creative writing on his creative writing, the intertextuality found in his creative writing, and the thematic relation between his creative writing and his translations after creative writing) might be interpreted as showing that J-D’s creative writing was a translation of an imaginary Arabic source text into English.

6.10. A Framework for Studying Translator Style

In this section, an interdisciplinary framework for conducting translator style analysis is proposed. The framework combines methods from three disciplines: computer science, corpus linguistics and stylometry. It lays down recommendations for compiling and controlling a corpus and for conducting a computational analysis of translator style. The following are the main elements of the proposed framework:

6.10.1. Corpus Compilation and Control

The first step in any translator style corpus study involves compiling a collection of texts produced by a specific translator and converted to a digital format.

The corpus should also be controlled for some important factors that could affect the stylistic analysis such as:

• Theme43: Structuring and collecting texts that belong to one theme or revolve around similar themes can be done either manually or with automated methods such as Latent Semantic Analysis (LSA). LSA would be an easier and faster way to control a large corpus for theme. However, the lack of user-friendly applications that provide this kind of analysis may hinder future studies.

• Genre44: When compiling a corpus for translator style analysis, the researcher should also consider controlling the corpus for genre. In other words, all translated texts in the corpus should belong to one genre.

• Time: The researcher should also control the corpus under study for the production date of the ST. That is, the corpus should include texts produced in the same period, to make sure that the stylistic patterns that appear in the stylistic analysis are not related to the stylistic conventions of a specific period. For instance, Classical Arabic literature was oral and relied on repetition. Thus, a corpus combining works of Modern Arabic literature and Classical Arabic literature translated by one translator may not reveal accurate results about his/her style.

• Source Text Language: When compiling a corpus for translator style analysis, the researcher should control the corpus for source text language. This ensures that the stylistic patterns that appear in the analysis are not related to the conventions of a specific ST language.

• Source Text Authors: The researcher should also make sure that his/her corpus contains translated works written by different authors, to ensure that the stylistic patterns that appear in the stylistic analysis are not attributable to a single ST author.

43 Examples of themes are love and death.

44 Examples of genres are novels and short stories.
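For readers unfamiliar with LSA, the sketch below shows its core steps: build a term-by-document count matrix, reduce it with a truncated singular value decomposition, and compare documents by cosine similarity in the reduced space. It is a toy illustration under stated assumptions (whitespace tokenization, raw counts with no term weighting, k = 2 latent dimensions and an invented mini-corpus), not the LSA setup used in this study; a real application would also apply a weighting scheme and a similarity cutoff chosen by the researcher.

```python
import numpy as np

def lsa_similarity(docs, k=2):
    """Cosine similarities between documents in a k-dimensional LSA space."""
    tokenized = [d.lower().split() for d in docs]  # whitespace tokens (assumption)
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-by-document matrix of raw counts (no weighting, an assumption)
    X = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            X[index[w], j] += 1
    # Truncated SVD: keep the k largest singular values; rows of V_k S_k are documents
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    doc_vecs = vt[:k].T * s[:k]
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    unit = doc_vecs / np.where(norms == 0, 1, norms)
    return unit @ unit.T

# Invented mini-corpus: two "village" stories and two "city" stories
docs = [
    "village mosque prayer sheikh",
    "sheikh village prayer well",
    "london train office rain",
    "rain office london bus",
]
sim = lsa_similarity(docs, k=2)
print(f"{sim[0, 1]:.2f}")       # 1.00: same theme
print(f"{abs(sim[0, 2]):.2f}")  # 0.00: different themes
```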

6.10.2. Digital Corpus Preparation:

One of the best practices for preparing digital texts for corpus analysis is saving them in .txt plain text format with Unicode encoding. Plain text is a type of text that contains little to no formatting. Unicode is a character-encoding standard that covers most of the world’s writing systems; in practice, files are usually saved in UTF-8, its most widely supported encoding. Saving the corpus in plain text format with Unicode encoding ensures that the texts are not affected by any formatting and can be processed by most corpus tools across different operating systems and platforms.
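Where source files arrive in a legacy encoding, they can be converted before analysis. The sketch below assumes the original encoding is known (cp1252, a common Windows encoding, is only a placeholder default) and re-saves the text as UTF-8 with Unix line endings; the function names are hypothetical.

```python
from pathlib import Path

def reencode(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from `source_encoding` and return them as UTF-8,
    with Windows (CRLF) and old Mac (CR) line endings converted to LF."""
    text = raw.decode(source_encoding)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")

def to_utf8_txt(src, dst, source_encoding="cp1252"):
    """Re-save the file at `src` as UTF-8 plain text at `dst` (e.g. a .txt path)."""
    Path(dst).write_bytes(reencode(Path(src).read_bytes(), source_encoding))

print(reencode("café\r\n".encode("cp1252")))  # b'caf\xc3\xa9\n'
```

If the source encoding is guessed wrongly, decoding may fail or silently produce wrong characters, so it is worth spot-checking converted files.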


6.10.3. Corpus Preprocessing

A very important step in corpus studies is pre-processing the corpus and clearing it of any elements that could affect style analysis. These include, but are not limited to:

• Page Numbers: Page numbers add noise to the corpus analysis, especially when conducting character n-gram analysis. Page numbers should therefore be removed from the body of the corpus.

• Running Heads: Running heads, if not removed, affect the analysis of style markers such as word n-grams, POS n-grams and STTR.

• Footnotes: Footnotes are treated by corpus analysis tools, such as WordSmith, as part of the translated text. This affects most style-marker analyses, such as STTR and word n-gram analysis.
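Page numbers and running heads can often be stripped line by line. The sketch below assumes each page number sits alone on its own line and that the researcher supplies the exact running-head strings; it would also drop a legitimate line consisting only of digits (such as a numbered section break), and footnotes usually need manual or layout-aware removal, so they are not handled here. The function name and example lines are hypothetical.

```python
import re

def strip_page_furniture(lines, running_heads=()):
    """Drop lines that contain only a page number or exactly match a running head.

    Assumes page numbers stand alone on a line (Arabic numerals only) and that
    `running_heads` lists the exact header strings found in the source.
    """
    heads = {h.strip() for h in running_heads}
    kept = []
    for line in lines:
        stripped = line.strip()
        if re.fullmatch(r"\d+", stripped):
            continue  # bare page number such as "126"
        if stripped in heads:
            continue  # running head repeated at the top of each page
        kept.append(line)
    return kept

page = ["He walked home.", "127", "SHORT STORIES", "She waited."]
print(strip_page_furniture(page, running_heads=["SHORT STORIES"]))
# ['He walked home.', 'She waited.']
```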

6.10.4. Style Markers Selection:

Selecting a set of style markers for the corpus analysis is a very important step. The selection of style markers depends on the scope of the study and the research questions. The following style markers, which were used in this study, proved reliable for analyzing the style of text producers, whether translators or creative writers:

• Standardized Type-Token Ratio (STTR): STTR is used to reveal the lexical richness of a text producer’s vocabulary.

• Sentence length: Sentence length measures the length of sentences in words. This style marker has been used to reveal the text producer’s preferences related to sentence length.

• Word n-gram: Word n-grams can reveal lexical patterns related to the text producer’s personal style.

• Character n-gram: Character n-grams reveal lexical, word-class and punctuation usage information (Houvardas and Stamatatos 78), which is closely related to the style of the text producer.

• Part-of-Speech (POS) n-gram: POS n-grams reveal patterns related to the syntactic style of the text producer. POS n-gram analysis requires tagging the corpus, either manually or with an automatic tagger such as the Stanford POS tagger.

6.10.5. Corpus Analysis Method

This framework proposes two different methods for conducting the stylistic analysis. The first method is adopted from corpus stylistics and relies on quantitative analysis of style-marker frequencies. The second method applies machine-learning techniques adopted from stylometry.

• Corpus Stylistics: Corpus stylistics can be a reliable method for style analysis when it comes to style markers that do not require complex textual analysis, such as sentence length and STTR. However, corpus stylistics turns out to be a limited method when analyzing complex style markers such as character n-grams. One of the best practices in conducting a corpus analysis of style is applying statistical tests to assess the significance of the corpus output. Applying statistical methods helps ensure that the conclusions reached by the study are reliable and accurate. Selecting the appropriate statistical method depends on the variables of interest and the groups to be compared.

• Machine Learning: Machine-learning stylometry is a well-developed field of study concerned with the stylistic analysis of authors and texts. Different analysis methods/algorithms within machine-learning stylometry can be used for stylistic analysis; based on these algorithms, the style in the corpora is analyzed and results are provided. This study uses one powerful method that has proved its viability: the Nearest Neighbor Driver with Cosine Distance.
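The nearest-neighbour idea can be sketched in a few lines: each document is represented as a feature profile (e.g. character, word or POS n-gram frequencies), and an unknown document is assigned the label of the known document at the smallest cosine distance (1 minus cosine similarity). This is a minimal illustration, not the implementation used in this study; the corpus labels and feature values in the example are invented. In one plausible setup, each creative-writing story would be the unknown document and the translated stories the known ones.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse feature dicts (1.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def nearest_neighbor(unknown, known):
    """Return (label, distance) of the known document closest to `unknown`.

    `known` maps a label to a feature dict (e.g. n-gram frequencies).
    """
    return min(
        ((label, cosine_distance(unknown, feats)) for label, feats in known.items()),
        key=lambda pair: pair[1],
    )

# Invented two-feature profiles for two hypothetical training documents
known = {"pre": {"x": 3, "y": 1}, "post": {"x": 1, "y": 3}}
label, dist = nearest_neighbor({"x": 1, "y": 2}, known)
print(label, round(dist, 3))  # post 0.01
```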


The proposed framework is summarized in Figure 19 below:

FIGURE 19: FRAMEWORK FOR TRANSLATOR STYLE ANALYSIS

[Figure: Corpus control (control for theme/topic using LSA, ST language, ST production date and ST genre; include several ST authors) → Corpus pre-processing (removing page numbers, running heads and footnotes) → Style marker selection (include character n-grams, STTR, sentence length, POS n-grams and word n-grams) → Analysis (machine learning: Nearest Neighbor Driver with cosine distance metric; corpus stylistics: statistical significance testing)]


Based on the scope of the study or the research questions behind it, researchers can decide on the style markers that they want to include in their study.

Combining the two methods of analysis, machine learning and corpus stylistics, remains optional. However, the analysis of some style markers, such as character n-grams and word n-grams, cannot be accurately conducted by merely tracing the frequencies of their occurrences in a corpus using a corpus stylistics approach. These two style markers require a more sophisticated method of analysis that harnesses computing power and complex mathematical algorithms to process the stylistic analysis. This is where the significance of this translator style analysis framework comes into play. The framework not only proposes machine-learning stylometric approaches to analyzing style in translation studies but also calls for the triangulation of methods, applying two different analysis methods to corpus data in order to reach more solid conclusions. This would help fill a methodological gap in translator style research and enhance its quality.

6.11. Conclusion

This chapter has provided a discussion of the main findings of this study on the interaction and the relation between J-D’s creative writing and translating activities. The discussion has shown that J-D’s translating served him as a means to practice writing in the target language and as a source of intertextuality and inspiration for his creative writing. The discussion has also revealed that J-D’s creative writing impacted his choice of the short stories he translated after his creative writing. The last section of this chapter has proposed an interdisciplinary framework for translator style analysis. The framework outlines recommendations for corpus compilation and control as well as for the computational analysis of translator style. The latter relies on two different approaches: corpus stylistics and machine-learning stylometry. The framework also suggests a set of style markers that proved viable for translator style analysis.

The following chapter provides a summary of the dissertation’s findings, discusses the limitations of the present study and proposes directions for future research.


CHAPTER 7: CONCLUSION

7.1. Summary of Results

The motivations that initiated this study were related, firstly, to the lack of a well-defined methodological framework for studying translators’ style and, secondly, to the very limited number of studies that empirically address the relation between creative writings and translations written by the same translator. As mentioned in the literature review chapter, previous studies investigating translator style showed methodological flaws related to corpus compilation, control and analysis. Even major studies that established this area (e.g., Baker, "Towards a Methodology") did not apply statistical significance tests to the results of their quantitative corpus analyses and relied instead on raw statistics of style-marker frequencies, such as Standardized Type-Token Ratio (STTR), sentence length and punctuation marks. As for the second motivation, the relation between the creative writing activity and the translating activity of the same person, and the potential interaction between the two, has rarely been investigated empirically. Most of the studies dealing with creative writing and translation either have methodological issues (e.g., Walder, "Investigating Style in Translation and Original Writing") or are not based on empirical evidence to support their arguments and conclusions (e.g., Bassnett, "The Writer of Translations"). The present study tried to fill this gap by using analysis triangulation to empirically investigate the impact of creative writing on translation, and vice versa, in the case of Denys Johnson-Davies (J-D). To do this, I proposed four hypotheses and conducted an empirical investigation of J-D’s style in his translations and creative writing to support or refute them. The investigation relied on analyzing style markers through three computational methods, namely Latent Semantic Analysis (LSA), corpus linguistics and machine-learning stylometry.

The first hypothesis, according to which J-D’s translations before creative writing are thematically relevant to his creative writing short stories, was refuted. As discussed in chapter four, the LSA experiment revealed no significant thematic relevance between the short stories J-D translated before the production of his creative writing and his own creative writing short stories. However, the second hypothesis, which claims that J-D’s creative writing short stories are thematically relevant to the works he translated after the production of his creative writing, was confirmed by the second LSA experiment reported in chapter four. The importance of the LSA analysis in this study is twofold. First, it revealed the thematic relation between J-D’s creative writing and translations. Second, it allowed for the creation of two translational sub-corpora controlled for theme, which was important to ensure that theme did not affect the style analysis.

Triangulating the thematic analysis of the three corpora with word n-gram analysis also revealed a thematic relevance between J-D’s creative writing and the short stories he translated after the production of his short stories. The thematic analysis findings showed that J-D’s creative writing activity impacted his choice of the short stories he translated after the production of his creative writing. The manual evaluation of the LSA results showed that he wrote his short stories on themes related to Arab culture and traditions, depicting the lives of Arab characters. These were also the main elements that built the story lines of the short stories that J-D translated before the production of his creative writing. That is, this study argues that J-D’s translating activity served as a source of inspiration and intertextuality for his creative writing short stories at the thematic level.

The thematic analysis of J-D’s translations and creative writing was followed by analyses of style markers, including STTR, mean sentence length and punctuation marks, in order to test the third and fourth research hypotheses, which were confirmed. As reported in chapter five, the STTR analysis revealed a significant difference between J-D’s creative writing short stories and the stories he translated both before and after his creative writing. By contrast, the sentence length analysis showed no significant difference between the average sentence length in J-D’s creative writing short stories and that in his translations produced before and after his creative writing. The analysis of the three punctuation marks (comma, semicolon and hyphen) yielded inconsistent results.

The analysis of commas and semicolons showed that J-D’s use of these two punctuation marks in his creative writing was not influenced by his style in the translations he produced before his creative writing. However, his idiosyncratic use of commas and semicolons in his creative writing was also found in the translations he produced after his creative writing. The analysis of the third punctuation mark, the hyphen, showed that J-D’s use of hyphens differed across all three corpora.
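The three style markers discussed above can be computed in a few lines. The following Python sketch is illustrative only and is not the WordSmith implementation used in this study: the tokenizer, the naive sentence splitter, and the rule that a final chunk shorter than 1,000 tokens is ignored when computing STTR are all assumptions, and the function names (`tokens`, `sttr`, `mean_sentence_length`, `punctuation_per_1000`) are hypothetical.

```python
import re
from statistics import mean

# Assumed tokenizer: runs of letters, digits and apostrophes, lowercased.
# WordSmith's own tokenization rules differ in detail.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def tokens(text: str) -> list[str]:
    """Return the lowercase word tokens of `text`."""
    return [t.lower() for t in WORD_RE.findall(text)]


def sttr(words: list[str], chunk: int = 1000) -> float:
    """Standardized type-token ratio (%): the mean TTR of consecutive
    `chunk`-token segments. A final segment shorter than `chunk` is
    ignored (an assumption of this sketch)."""
    ratios = [
        len(set(words[i:i + chunk])) * 100 / chunk
        for i in range(0, len(words) - chunk + 1, chunk)
    ]
    if not ratios:
        raise ValueError("text is shorter than one chunk")
    return mean(ratios)


def mean_sentence_length(text: str) -> float:
    """Mean number of word tokens per sentence, splitting naively on . ! ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if tokens(s)]
    return mean(len(tokens(s)) for s in sentences)


def punctuation_per_1000(text: str, marks: str = ",;-") -> dict[str, float]:
    """Occurrences of each punctuation mark per 1,000 word tokens."""
    n = len(tokens(text))
    return {m: text.count(m) * 1000 / n for m in marks}
```

Normalizing punctuation counts per 1,000 tokens makes corpora of different sizes comparable. Note that in this sketch the hyphen count includes hyphens inside compound words.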

Chapter five applied a one-way independent-samples analysis of variance (ANOVA), with post hoc comparisons using Tukey’s HSD test, to the quantitative output of the corpus analysis of the style markers. Applying significance tests in this chapter showed how important such tests are for corpus studies that compare statistical information across groups of texts. The chapter also showed that the raw statistical information provided by corpus tools such as WordSmith might not be sufficient for drawing conclusions. This was evident in the analysis of mean sentence length in the present study. The corpus analysis showed that the average sentence length was 21.37 words per sentence in J-D’s translations produced before his creative writing, 20.68 words in his creative writing short stories, and a lower 18.85 words in the translations he produced after his creative writing. Judging by these averages alone, one could say that the average sentence length in J-D’s creative writing is close to that of the translations he produced before his creative writing, while the average sentence length in the translations produced after his creative writing is distant from both. However, the ANOVA revealed no significant difference in mean sentence length among the three corpora. Most corpus studies of translator style have looked at raw frequencies and scores without applying any statistical significance tests to the data. As the mean sentence length example in this study shows, studying translator style on the basis of the frequencies or scores of style markers alone, without testing whether the differences are significant, may point to differences that are not statistically significant and thus fail to bring us closer to an understanding of translator style.
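To make the role of the significance test concrete, the following sketch computes the one-way ANOVA F statistic from per-text measurements (for instance, the mean sentence length of each short story in each of the three corpora). It uses only the Python standard library and is illustrative: the function name is hypothetical, the toy values are invented rather than taken from this study, and obtaining a p-value or the Tukey HSD post hoc comparisons would require a statistics library such as SciPy (`scipy.stats.f_oneway`, `scipy.stats.tukey_hsd`).

```python
from statistics import mean


def one_way_anova_f(*groups: list[float]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way independent-samples
    ANOVA, where each group holds one measurement per text."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Between-groups sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of each text around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Invented toy values, one number per text in each of three corpora.
print(one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5]))  # (3.0, 2, 6)
```

For these toy values F(2, 6) = 3.0, which is below the 5% critical value of about 5.14, so the three group means would not be judged significantly different even though their averages (2, 3 and 4) look clearly ordered. This is the same kind of gap between raw averages and significance described above.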

A further investigation of J-D’s style in his creative writing and translations was conducted using machine-learning stylometric methods. Machine-learning stylometry served two goals: (1) to test the research hypotheses and (2) to triangulate the analysis methods. Three types of style markers were analyzed with machine-learning stylometry: Part-of-Speech (POS), word and character n-grams. The POS analysis confirmed the third hypothesis, that J-D’s translating activity has a stylistic impact on his creative writing. The character n-gram analysis confirmed the fourth hypothesis, that J-D’s creative writing has a stylistic impact on the translations he produced after his creative writing. As mentioned earlier, character n-grams capture stylistic information related to punctuation marks; this analysis showed that J-D’s use of punctuation marks in his creative writing is closer to his use in the translations produced after his creative writing than to his use in the pre-creative-writing translations. These findings also support the results of the corpus analysis of the three punctuation marks in J-D’s translations and creative writing.
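The point that character n-grams capture punctuation can be illustrated with a short sketch. It counts overlapping character trigrams (spaces and punctuation included) and compares two frequency profiles with cosine similarity. This is a simplified, hypothetical stand-in for the machine-learning classifiers used in the study: the function names, the choice of n = 3 and the use of cosine similarity are assumptions for illustration only.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, keeping spaces and punctuation."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


profile = char_ngrams("He paused; then, slowly, he spoke.")
# Trigrams such as "d; " and ", s" record how semicolons and commas are used.
print(profile["d; "], profile[", s"])  # 1 1
```

Because such profiles are built from raw character sequences, a writer's habitual comma and semicolon placement shows up directly as features, which is why this marker is sensitive to punctuation style.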

The findings of this study revealed a close thematic, lexical and stylistic connection between J-D’s creative writing and his translations. As for the thematic connection, the translations he produced before his creative writing were found to have weaker thematic connections to his creative writing than the translations he produced afterwards, which showed significant thematic relevance to his creative writing. This thematic connection reflects J-D’s interest in certain themes, which he incorporated into his creative writing and then sought out in the works he chose to translate after his creative writing experience. The stylistic analysis revealed that some of J-D’s style markers in the translations he produced before his creative writing influenced his creative writing, while other style markers in his creative writing influenced the translations he produced afterwards. For still other style markers, no difference was found between J-D’s translations and his creative writing. The interaction between J-D’s style in the translations produced before his creative writing and his style in the creative writing itself indicates that his translating activity served both as a source of inspiration and as a means of practicing creative writing. The short story elements, such as settings and characters, shared by J-D’s creative writing and the translations he produced before it also show that his translating activity served as a source of intertextuality for his creative writing.

Similarly, the interaction between the stylistic characteristics of J-D’s creative writing and his translations before creative writing indicates that his creative writing helped him define and build a style of his own in the use of punctuation marks, a style reflected both in his creative writing and in the translations he produced after it.

The thematic connection between J-D’s creative writing and the translations he produced afterwards revealed that his creative writing activity influenced his choice of the short stories he translated later, which were found to revolve around themes he had explored in his own stories.

Lastly, this study proposed an interdisciplinary framework that combines methods from corpus stylistics, computer science, stylometry and authorship attribution to study style in both translated and non-translated texts. The framework also provides best practices for compiling, controlling and analyzing a corpus for translator style analysis, and it proposes a set of tested style markers that can be used to analyze thematic, syntactic or stylistic features in translated and non-translated texts.

7.2. Limitations of the Study

One limitation of this study is its reliance on the quantitative analysis of a limited number of stylistic features, without a qualitative analysis of J-D’s authorial and translational style. Combining computational quantitative analysis with critical qualitative analysis would help reach more solid conclusions about the relationship between creative writing and translation as two activities carried out by the same person. However, a qualitative analysis of such a large corpus would have been difficult, if not impossible, to carry out.

The present study also has limitations related to the language pair, text type and corpus size. It was limited to a selection of short stories translated from Arabic into English and to creative writing short stories in English, both produced by J-D. The findings of this study therefore apply to the short stories J-D translated from Arabic into English and to his creative writing in English, and further investigations of the interaction between creative writing and translation in the work of other translator-authors are needed. Corpus size is a further limitation: for corpus-control purposes, this study analyzed only a limited number of short stories written and translated by J-D.

7.3. Implications of the LSA Method for Translation Studies

The Latent Semantic Analysis (LSA) method used in this dissertation can also be applied to other areas of translation studies. For instance, corpus-based studies can use LSA as an automated method for sampling corpora and controlling them for theme. This would save researchers time and effort and reduce bias in text selection. LSA can also be used for corpus management and compilation: it can automatically structure a large corpus of millions of texts by theme or topic. LSA makes it possible to identify and extract texts belonging to a single domain and to build large domain-specific corpora for purposes such as teaching specialized language, teaching specialized translation and extracting terminology. It could also help improve machine translation output by supplying large corpora of texts in a specific subject field as training data for machine translation engines.
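As a rough illustration of how LSA can group texts by theme, the sketch below derives document vectors from a term-document matrix with a truncated singular value decomposition (SVD) and compares them by cosine similarity. It assumes NumPy and uses a tiny invented matrix; real applications would add term weighting (for example, log-entropy or tf-idf) and a far larger vocabulary, and the function names are hypothetical.

```python
import numpy as np


def lsa_doc_vectors(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Map each document (column of a terms x documents matrix) to a
    k-dimensional latent vector using the truncated SVD A ≈ U_k S_k V_k^T."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Rows of (S_k V_k^T)^T are the document coordinates in latent space.
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between row vectors (assumes no zero rows)."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


# Invented counts: rows = terms ("desert", "camel", "city", "train"),
# columns = three documents.
A = np.array([[2, 1, 0],
              [1, 2, 0],
              [0, 0, 2],
              [0, 0, 2]], dtype=float)
sims = cosine_matrix(lsa_doc_vectors(A, k=2))
print(np.round(sims, 2))
```

In this toy case the first two documents, which share the "desert"/"camel" vocabulary, come out with a similarity of about 1, while the third is roughly orthogonal to both. Texts whose similarity to a query exceeds a chosen cosine cutoff could then be grouped into the same thematic sub-corpus; choosing that cutoff remains a research decision.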

7.4. Future Directions

The present study has provided a framework for translator style analysis, which needs to be verified and tested with translators working in other language pairs. The proposed framework can also be applied to compare translated and non-translated texts; for example, it might be used to validate previous corpus studies of translated versus non-translated texts, especially those that discussed translation universals. The framework could add value to such studies because it applies statistical significance tests to the output of the corpus analysis. In addition, this study has revealed a stylistic influence of translation on creative writing in the case of J-D. Studying the relations among translated texts, non-translated texts written by non-translators and non-translated texts written by translators may therefore reveal stylistic features specific to each of these three types of text. Finally, the present study used Latent Semantic Analysis as an advanced Information Retrieval (IR) method to control the corpora for topic. Another future direction would be to compare the performance and accuracy of LSA with those of another method for controlling a corpus for topic or theme, such as Latent Dirichlet Allocation (LDA).

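LDA is a probabilistic natural language processing technique used for topic modeling. Like LSA, it represents documents in terms of latent topics; however, whereas LSA applies singular value decomposition (SVD) to a term-document matrix, LDA models each document as a mixture of topics and each topic as a probability distribution over words.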

GLOSSARY OF ACRONYMS

AI Artificial Intelligence

ANOVA Analysis of Variance

ASCII American Standard Code for Information Interchange

CRW Creative Writing

F F Value

ID Identification Number

IR Information Retrieval

J-D Johnson-Davies

JGAAP Java Graphical Authorship Attribution Program

LDA Latent Dirichlet Allocation

LSA Latent Semantic Analysis

M Mean

NLP Natural Language Processing

P P Value

POS Part-of-Speech

Q Query

SD Standard Deviation

STTR Standardized Type-Token Ratio

SVD Singular Value Decomposition

TACRW Translation After Creative Writing

TBCRW Translation Before Creative Writing

TT Target Text

TTR Type-Token Ratio

V Vector

WS WordSmith

REFERENCES

Abdullah, Adnan. “The Translation of Style.” Language, Discourse, and Translation in the West and Middle East. Ed. De Beaugrande, Abdulla Shunnaq, and Mohamed Heliel. Amsterdam: Benjamins, 1992. 65–73. Print.

Ahat, Murat, SB Amor, and Marc Bui. “Document Classification with LSA and Pretopology.” InterLsa 8.1 (2010): 125–144. Web. 8 Feb. 2014.

Allen, Valerie. “Without Style.” On Style: An Atelier. Ed. Eileen Joy and Anna Klosowska. New York: Punctum Books, 2013. 1–15. Print.

Antai, Roseline, Chris Fox, and Udo Kruschwitz. “The Use of Latent Semantic Indexing to Cluster Documents into Their Subject Areas.” Proceedings of the Fifth Language Technology Conference. Springer, 2011. 161–166. Print.

Argamon-Engelson, Shlomo, Moshe Koppel, and Galit Avneri. “Style-Based Text Categorization: What Newspaper Am I Reading?” AAAI/ML Workshop on Text Categorization. Wisconsin: AAAI-98, 1998. 1–4. Print.

Augustyn, Adam. “Stylistics.” The New Encyclopedia Britannica 2013. Web.

Baker, Kirk. “Singular Value Decomposition Tutorial Contents.” 2013. Web.

Baker, Mona. “Corpora in Translation Studies: An Overview and Some Suggestions for Future Research.” Target 7.2 (1995): 223–243. Print.

---. “Corpus Linguistics and Translation Studies — Implications and Applications.” Text and Technology: In Honour of John Sinclair. Ed. Mona Baker, Gill Francis, and Elena Tognini-Bonelli. Amsterdam: John Benjamins, 1993. 233–250. Print.

---. “Towards a Methodology for Investigating the Style of a Literary Translator.” Target 12.2 (2000): 241–266. Web. 15 Feb. 2014.

Bassnett, Susan. “Writing and Translating.” The Translator as Writer. Ed. Susan Bassnett and Peter Bush. London: Continuum, 2006. 173–183. Print.

Bawarshi, Anis, and Mary Reiff. An Introduction to History, Theory, Research, and Pedagogy. Indiana: Parlor Press, 2010. Print.

Bloch, Bernard. “Linguistic Structure and Linguistic Analysis.” Report of 4th Annual Round Table Meeting on Linguistics and Language Teaching. Ed. A Hill. Washington, D.C.: Georgetown University Press, 1953. 40–44. Print.

Blum-Kulka, Shoshana. “Shifts of Cohesion and Coherence in Translation.” Interlingual and Intercultural Communication. Ed. Juliane House and Shoshana Blum-Kulka. Tübingen: Gunter Narr, 1986. 17–35. Print.

Boase-Beier, Jean. “Knowing and Not Knowing: Style, Intention and the Translation of a Holocaust Poem.” Language and Literature 13.1 (2004): 25–35. Web. 17 Sept. 2014.

---. Stylistic Approaches to Translation. Manchester: St. Jerome, 2006. Print.

Boase-Beier, Jean, and Michael Holman. The Practices of Literary Translation: Constraints and Creativity. Manchester: St. Jerome, 1999. Print.

Bush, Peter. “The Writer of Translations.” The Translator as Writer. Ed. Susan Bassnett and Peter Bush. 1st ed. London and New York: Continuum, 2006. 32– 32. Print.

Carter, Ronald, and Paul Simpson. Language, Discourse and Literature: An Introductory Reader in Discourse Analysis. London: Unwin Hyman, 1989. Print.

Catford, John. A Linguistic Theory of Translation. London: Oxford Univ Press, 1969. Print.

Chaski, Carole E. “Empirical Evaluations of Language-Based Author Identification Techniques.” Forensic Linguistics 2001: 1–65. Web.

Clayton, Jay, and Eric Rothstein. “Figures in the Corpus: Theories of Influence and Intertextuality.” Influence and Intertextuality in Literary History. Ed. Jay Clayton and Eric Rothstein. Wisconsin: University of Wisconsin Press, 1991. 3–37. Print.

Coyotl-Morales, R et al. “Authorship Attribution Using Word Sequences.” Progress in Pattern Recognition Image Analysis and Applications 4225 (2006): 844–853. Web.

Cristani, Marco et al. “Conversationally-Inspired Stylometric Features for Authorship Attribution in Instant Messaging.” Proceedings of the 20th ACM International Conference on Multimedia. ACM, 2012. 1121–1124. Web. MM ’12.

Crystal, David, and Derek Davy. Investigating English Style. London: Longman, 1969. Print.

Culler, Jonathan. “Introduction: Critical Paradigms.” PMLA 125.4 (2010): 905–915. Print.

De Camargo, Diva Cardoso. “An Investigation of a Literary Translator’s Style in a Novel Written by Jorge Amado.” Intercâmbio. Revista do Programa de Estudos … 13 (2004): 1–7. Web. 11 Sept. 2014.

Deerwester, S et al. “Indexing by Latent Semantic Analysis.” Journal of the American Society for Information Science 41 (1990): 391–407. Web.

Enkvist, Nils. Linguistic Stylistics. Berlin: Mouton, 1973. Print.

Forman, Janis. “Review: Rethinking Reading and Writing from the Perspective of Translation.” College English 52.2 (1990): 676–682. Print.

Ghazala, Hasan Said. “Translating the Metaphor: A Cognitive Stylistic Conceptualization (English–Arabic).” World Journal of English Language 2.4 (2012): 57–68. Web.

Graesser, Arthur C. et al. “Using Latent Semantic Analysis to Evaluate the Contributions of Students in AutoTutor.” Interactive Learning Environments 8 (2000): 129–147. Web.

Grieve, Jack. “Quantitative Authorship Attribution: An Evaluation of Techniques.” Literary and Linguistic Computing 22.3 (2007): 251–270. Web.

Griffiths, Thomas L, and Mark Steyvers. “Finding Scientific Topics.” Proceedings of the National Academy of Sciences of the United States of America 101 Suppl (2004): 5228–5235. Print.

Griffiths, Thomas, and Mark Steyvers. “Prediction and Semantic Association.” Advances in Neural Information Processing Systems. Cambridge: MIT Press, 2003. Print.

Halliday, Michael. Language as Social Semiotic: The Social Interpretation of Language and Meaning. Baltimore: University Park Press, 1978. Print.

Hänlein, Heike. Studies in Authorship Recognition: A Corpus-Based Approach. Frankfurt: Peter Lang, 1999. Print.

Hermans, Theo. “The Translator’s Voice in Translated Narrative.” Target 8.1 (1996): 23–48. Print.

Hofmann, Thomas. “Probabilistic Latent Semantic Analysis.” Processing 97 (1999): 21. Web.

Houvardas, John, and Efstathios Stamatatos. “N-Gram Feature Selection for Authorship Identification.” Ed. Jérôme Euzenat and John Domingue. Artificial Intelligence Methodology Systems and Applications 4183 (2006): 77–86. Web. Lecture Notes in Computer Science.

Husni, Ronak, and Daniel Newman. A to Z of Arabic-English-Arabic Translation. London: Saqi Books, 2013. Print.

Hussein, Ashatu. “The Use of Triangulation in Social Sciences Research: Can Qualitative and Quantitative Methods Be Combined?” Journal of Comparative Social Work 1 (2009): 1–12. Print.

Jakobson, Roman. “Closing Statement: Linguistics and Poetics.” Style in Language. Ed. Thomas Sebeok. Vol. 350. New York & London: John Wiley & Sons, Inc., 1960. 350–377. Print.

Jancovich, Mark. “The Southern New Critics.” The Cambridge History of Literary Criticism 7: Modernism and the New Criticism. Ed. A. Walton Litz, Louis Menand, and Lawrence Rainey. Cambridge: Cambridge University Press, 2000. 200–219. Print.

Ji, Meng. Phraseology in Corpus-Based Translation Studies. Bern: Peter Lang International Academic Publishers, 2010. Print.

Jockers, Matthew. Macroanalysis: Digital Methods and Literary History. Illinois: University of Illinois Press, 2013. Print.

Johnson-Davies, Denys. Memories In Translation. Cairo and New York: The American University in Cairo Press, 2006. Print.

Juola, Patrick. “Authorship Attribution.” Foundations and Trends in Information Retrieval 1.3 (2006): 233–334. Web.

Kenny, Dorothy. Lexis and Creativity in Translation: Corpus-Based Study. Manchester: St. Jerome, 2001. Print.

Luyckx, Kim, and Walter Daelemans. “Authorship Attribution and Verification with Many Authors and Limited Data.” 22nd International Conference on Computational Linguistics. Manchester: Association for Computational Linguistics, 2008. 513–520. Print.

Kubát, Miroslav, and Jiří Milička. “Vocabulary Richness Measure in Genres.” Journal of Quantitative Linguistics 20 (2013): 339–349. Web.

Landauer, Thomas K, Peter W. Foltz, and Darrell Laham. “An Introduction to Latent Semantic Analysis.” Discourse Processes 1998: 259–284. Web.

Landauer, Thomas K., and Susan T. Dumais. “A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.” Psychological Review 1997: 211–240. Web.

Laviosa, Sara. “Core Patterns of Lexical Use in a Comparable Corpus of English Narrative Prose.” Meta 43.4 (1998): 557–570. Print.

Leech, Geoffrey. Language in Literature: Style and Foregrounding. London & New York: Longman, 2008. Print.

Li, Jiexun, Rong Zheng, and Hsinchun Chen. “From Fingerprint to Writeprint.” Communications of the ACM 49.4 (2006): 76–82. Web. 5 Feb. 2014.

Malmkjær, Kirsten. “Translational Stylistics: Dulcken’s Translations of Hans Christian Andersen.” Language and Literature 13.1 (2004): 13–24. Web.

Marcus, Mitchell P, Beatrice Santorini, and Mary Ann Marcinkiewicz. “Building a Large Annotated Corpus of English: The Penn Treebank.” Computational Linguistics 19.2 (1993): 313–330. Web.

Mason, Ian. “Discourse, Ideology and Translation.” Language, Discourse, and Translation in the West and Middle East. Ed. De Beaugrande, Abdulla Shunnaq, and Mohamed Heliel. Amsterdam: Benjamins, 1992. 23–35. Print.

McEnery, Tony, and Michael Oakes. “Authorship Identification and Computational Stylometry.” Handbook of Natural Language Processing. Ed. Robert Dale, Harold Somers, and Hermann Moisl. New York: Dekker, 2000. Print.

Mikros, George K, and Eleni K Argiri. “Investigating Topic Influence in Authorship Attribution.” Ed. Benno Stein, Moshe Koppel, and Efstathios Stamatatos. Politics 276 (2007): 29–35. Web.

Mohamed, A. H., and M. R. Omer. “Texture and Culture: Cohesion as a Marker of Rhetorical Organisation in Arabic and English Narrative Texts.” RELC Journal 31 (2000): 45–75. Web.

Mukarovsky, Jan. “Standard Language and Poetic Language.” A Prague School Reader on Esthetics, Literary Structure, and Style. Ed. Paul Garvin. Washington, D.C.: Georgetown University Press, 1964. 17–30. Print.

Munday, Jeremy. Style and Ideology in Translation: Latin American Writing in English. London & New York: Routledge Studies in Linguistics, 2007. Print.

Nakov, Preslav. “Latent Semantic Analysis for German Literature Investigation.” International Conference: 7th Fuzzy Days on Computational Intelligence Theory and Applications. London: Springer-Verlag, 2001. 834–841. Print.

Narayanan, Arvind et al. “On the Feasibility of Internet-Scale Author Identification.” Proceedings - IEEE Symposium on Security and Privacy. N.p., 2012. 300–314. Web.

Nida, Eugene. Toward a Science of Translating: With Special Reference to Principles and Procedures Involved in Translating. Leiden: E. J. Brill, 1964. Print.

Nida, Eugene, and Charles Taber. The Theory and Practice of Translation. Leiden: E. J. Brill, 1969. Print.

Nord, Christiane. “A Functional Typology of Translation.” Text Typology and Translation Language. Ed. Anna Trosborg. Philadelphia: John Benjamins Publishing, 1997. Print.

Oakes, Michael, and Meng Ji. Quantitative Methods in Corpus-Based Translation Studies. Amsterdam: John Benjamins Publishing, 2012. Print.

Olohan, Maeve. Introducing Corpora in Translation Studies. London and New York: Routledge, 2004. Print.

Pantopoulos, Iraklis. “Two Different Faces of Cavafy in English: A Corpus-Assisted Approach to Translational Stylistics.” International Journal of English Studies 12.2 (2012): 93–110. Print.

Patton, Jon M., and Fazli Can. “A Stylometric Analysis of Yasar Kemal’s Ince Memed Tetralogy.” Computers and the Humanities 38 (2004): 457–467. Web.

Penumatsa, Phanni et al. “The Right Threshold Value: What Is the Right Threshold of Cosine Measure When Using Latent Semantic Analysis for Evaluating Student Answers?” International Journal on Artificial Intelligence Tools 2006: 767–777. Web.

Perteghella, Manuela, and Eugenia Loffredo. “Introduction.” Translation and Creativity: Perspectives on Creative Writing and Translation Studies. Ed. Manuela Perteghella and Eugenia Loffredo. 1st ed. London and New York: Continuum, 2006. 1–18. Print.

Raghavan, Sindhu, Adriana Kovashka, and Raymond Mooney. “Authorship Attribution Using Probabilistic Context-Free Grammars.” Computational Linguistics 6.July (2010): 38–42. Web.

Ramyaa, CH, Khaled Rasheed, and Congzhou He. “Using Machine Learning Techniques for Stylometry.” Conference on Machine Learning (2012): n. pag. Web. 26 Mar. 2014.

Reiss, Katharina. Translation Criticism- Potentials and Limitations: Categories and Criteria for Translation Quality Assessment. London: Routledge, 1976. Print.

---. Translation Criticism: The Potentials and Limitations. Manchester: St. Jerome Publishing, 2000. Print.

Richards, Ivor Armstrong. Practical Criticism. London: Kegan Paul, Trench, Trubner and Co., 1930. Print.

Riffaterre, Michael. “Criteria for Style Analysis.” Essays on the Language of Literature. Ed. Seymour Chatman and Samuel Levin. Houghton Mifflin, 1959. Print.

Saldanha, Gabriela. “Translator Style: Methodological Considerations.” The Translator 17.1 (2011): 25–50. Print.

Schäffner, Christina. “Political Discourse Analysis from the Point of View of Translation Studies.” Journal of Language and Politics 2004: 117–150. Web.

Schulstad, Ida et al. “Evaluation of a Stylometry System on Various Length Portions of Books.” Proceedings of Student-Faculty Research Day, CSIS, Pace University. New York: Pace University, 2012. 51–58. Print.

Scott, Mike. “WordSmith Tools.” 2010: 379. Web.

Sharndama, Emmanuel, and Ibrahim Mohammed. “Stylistic Analysis of Selected Political Campaign Posters and Slogans in Yola Metropolis of Adamawa State of Nigeria.” Asian Journal of Humanities and Social Sciences 1.3 (2013): 60–68. Print.

Shklovsky, Victor. “Art as Technique.” Modern Criticism and Theory: A Reader (1988) (1917): 16–30. Print.

Stamatatos, E, N Fakotakis, and G Kokkinakis. “Automatic Authorship Attribution.” Proceedings of the Ninth Conference on European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 1999. 158–164. Web.

Stamatatos, Efstathios. “A Survey of Modern Authorship Attribution Methods.” Journal of the American Society for Information Science and Technology 60.3 (2009): 538–556. Web.

Toutanova, Kristina et al. “Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network.” Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - NAACL ’03. Vol. 1. Morristown, NJ, USA: Association for Computational Linguistics, 2003. 173–180. Web. 18 Feb. 2014.

Trivedi, Harish, and Susan Bassnett, eds. Post-Colonial Translation: Theory and Practice. London: Routledge, 1999. Web.

Venuti, Lawrence. The Translator’s Invisibility: A History of Translation. Vol. 28. London and New York: Routledge, 1995. Web.

Vermeer, Hans. “Skopos and Commission in Translational Action.” The Translation Studies Reader. Ed. Lawrence Venuti. London and New York: Routledge, 2000. 221–233. Print.

Vinay, Jean-Paul, and Jean Darbelnet. Comparative Stylistics of French and English: A Methodology for Translation. Trans. Juan Sager and M Hamel. 1st ed. Amsterdam: John Benjamins Publishing, 1995. Print.

Walder, Claudia. “A Timbre of Its Own: Investigating Style in Translation and Original Writing.” New Voices in Translation Studies 9 (2013): 53–68. Print.

Wang, Qing, and Defeng Li. “Looking for Translator’s Fingerprints: A Corpus-Based Study on Chinese Translations of Ulysses.” Literary and Linguistic Computing 27.1 (2012): 81–93. Web.

Wang, Xuerui, Andrew McCallum, and Xing Wei. “Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval.” Proceedings - IEEE International Conference on Data Mining, ICDM. N.p., 2007. 697–702. Web.

Winters, Marion. “Modal Particles Explained: How Modal Particles Creep into Translations and Reveal Translators’ Styles.” Target 21.1 (2009): 74–97. Print.

Yule, G. Udny. “On Sentence Length as a Statistical Characteristic of Style in Prose with Application to Two Cases of Disputed Authorship.” Biometrika 30.3-4 (1938): 363–390. Web.

Zhao, Ying, and Justin Zobel. “Searching with Style: Authorship Attribution in Classic Literature.” Technology 62 (2007): 59–68. Web.

Zheng, Rong et al. “A Framework for Authorship Identification of Online Messages: Writing-Style Features and Classification Techniques.” Journal of the American Society for Information Science and Technology 57.3 (2006): 378–393. Web.

Zhou, Zhi-min, Yu Xu, and Chew Lim Tan. “Predicting Discourse Connectives for Implicit Discourse Relation Recognition.” Coling 2010: Companion Volume: Posters and Demonstrations. N.p., 2010. 1507–1514. Print.

Ziyani, IS, U King, and VJ Ehlers. “Using Triangulation of Research Methods to Investigate Family Planning Practices in Swaziland.” Africa Journal of Nursing and Midwifery 6.1 (2004): 12–17. Print.

APPENDIX A: List of Denys Johnson-Davies’ Translated Short Stories

Year Collection Tittle Short Story Publisher ST Author 1967 Modern Arabic Short Stories ZAABALAWI Three Continants NAGIB MAHFOUZ Press 1967 Modern Arabic Short Stories THE SOUTH WIND Three Continants ABDEL MALIK Press NOURI 1967 Modern Arabic Short Stories THE PICTURE Three Continants LATIFA AL- Press ZAYYAT 1967 Modern Arabic Short Stories THE MAN AND THE FARM Three Continants YUSUF SHAROUNI Press 1967 Modern Arabic Short Stories THE LOST SUITCASE Three Continants ABDEL-MONEIM Press SELIM 1967 Modern Arabic Short Stories THE GRAMOPHONE Three Continants JABRA IBRAHIM Press JABRA 1967 Modern Arabic Short Stories THE ELECTION BUS Three Continants TOUMA AL- Press KHOURI 1967 Modern Arabic Short Stories THE DYING LAMP Three Continants FOUAD TEKERLI Press 1967 Modern Arabic Short Stories THE DREAM_1 Three Continants SALAM AL-UJAILI Press 1967 Modern Arabic Short Stories THE DOUM TREE OF WAD HAMID Three Continants TAYEB SALIH Press 1967 Modern Arabic Short Stories THE DEATH OF BED NUMBER 12 Three Continants GHASSAN Press KANAFANI 1967 Modern Arabic Short Stories THE DEAD AFTERNOON Three Continants Walid Ikhlassi Press 1967 Modern Arabic Short Stories SUNDOWN Three Continants SHUKRI AYYAD Press 1967 Modern Arabic Short Stories SUMMER JOURNEY Three Continants MAHMOUD Press TEYMOUR 1967 Modern Arabic Short Stories SUMMER Three Continants Zakaria Tamer Press

168

1967 | Modern Arabic Short Stories | MOTHER OF THE DESTITUTE | Three Continents Press | YAHYA HAKKI
1967 | Modern Arabic Short Stories | MIRACLES FOR SALE | Three Continents Press | TEWFIK AL-HAKIM
1967 | Modern Arabic Short Stories | FARAHAT’S REPUBLIC | Three Continents Press | YUSUF IDRIS
1967 | Modern Arabic Short Stories | A SPACE SHIP OF TENDERNESS TO THE MOON | Three Continents Press | LAILA BAALABAKI
1967 | Modern Arabic Short Stories | A HOUSE FOR MY CHILDREN | Three Continents Press | MAHMOUD DIAB
1969 | Heinemann African Writers Series | THE WEDDING OF ZEIN | Heinemann |
1978 | Egyptian Short Stories | YUSUF MURAD MORCOS | Three Continents Press | NABIL GORGY
1978 | Egyptian Short Stories | WITHIN THE WALLS | Three Continents Press | EDWARD EL-KHARRAT
1978 | Egyptian Short Stories | THE WHISTLE | Three Continents Press | Abd al-Hakim Qasim
1978 | Egyptian Short Stories | THE SNAKE | Three Continents Press | SONALLAH IBRAHIM
1978 | Egyptian Short Stories | THE PERFORMER | Three Continents Press | IBRAHIM ASLAN
1978 | Egyptian Short Stories | THE MAN WHO SAW THE SOLE OF HIS LEFT FOOT IN A CRACKED MIRROR | Three Continents Press | LUTFI AL-KHOULI
1978 | Egyptian Short Stories | THE HILL OF GYPSIES | Three Continents Press | Said al-Kafrawi
1978 | Egyptian Short Stories | THE CRUSH OF LIFE | Three Continents Press | YUSUF SHAROUNI
1978 | Egyptian Short Stories | THE COUNTRY BOY | Three Continents Press | YUSUF AL-SIBAI
1978 | Egyptian Short Stories | THE CONJURER MADE OFF WITH THE DISH | Three Continents Press | Naguib Mahfouz
1978 | Egyptian Short Stories | THE CLOCK | Three Continents Press | Khairy Shalaby
1978 | Egyptian Short Stories | THE CHILD AND THE KING | Three Continents Press | YAHYA HAKKI
1978 | Egyptian Short Stories | THE ACCUSATION | Three Continents Press | SULEIMAN FAYYAD
1978 | Egyptian Short Stories | SUDDENLY IT RAINED | Three Continents Press | BAHA TAHER
1978 | Egyptian Short Stories | A STORY FROM PRISON | Three Continents Press | Yahya Hakki
1978 | Egyptian Short Stories | A PLACE UNDER THE DOME | Three Continents Press | ABDUL RAHMAN FAHMY
1978 | Egyptian Short Stories | A CONVERSATION FROM THE THIRD FLOOR | Three Continents Press | MOHAMED EL-BISATIE
1983 | The Mountain of Green Tea | WORDS TO THE WINDS | Heinemann | YAHYA TAHER ABDULLAH
1983 | The Mountain of Green Tea | WHO’LL HANG THE BELL? | Heinemann | YAHYA TAHER ABDULLAH
1983 | Arabic Short Stories | VOICES FROM NEAR AND FAR | Quartet Books | Abdul Ilah Abdul Razzak
1983 | Distant View of a Minaret and Other Stories | THURSDAY LUNCH | Waveland Press | Alifa Rifaat
1983 | Arabic Short Stories | THE TRIAL OF THE SMALL BLACK WOMAN | Quartet Books | Abdel-Hakim Kassem
1983 | The Mountain of Green Tea | THE TATTOO | Heinemann | YAHYA TAHER ABDULLAH
1983 | The Mountain of Green Tea | THE STORY OF THE UPPER EGYPTIAN | Heinemann | YAHYA TAHER ABDULLAH
1983 | Arabic Short Stories | THE SLAVE FORT | Quartet Books | Ghassan Kanafani
1983 | Arabic Short Stories | THE PERSIAN CARPET | Quartet Books | Hanan Shaykh
1983 | Arabic Short Stories | THE OLD MAN | Quartet Books | Gamil Atia Ibrahim
1983 | The Mountain of Green Tea | THE MOUNTAIN OF GREEN TEA | Heinemann | YAHYA TAHER ABDULLAH
1983 | Distant View of a Minaret and Other Stories | THE LONG NIGHT OF WINTER | Quartet Books | Alifa Rifaat
1983 | The Mountain of Green Tea | THE LOFTY ONE | Heinemann | YAHYA TAHER ABDULLAH
1983 | Arabic Short Stories | THE LITTLE GIRL IN GREEN | Quartet Books | Ibrahim Aslan
1983 | Distant View of a Minaret and Other Stories | THE KITE | Quartet Books | Alifa Rifaat
1983 | Arabic Short Stories | THE KEROSENE STOVE | Quartet Books | Mahmoud Al-Wardani
1983 | The Mountain of Green Tea | THE INHERITOR | Heinemann | YAHYA TAHER ABDULLAH
1983 | The Mountain of Green Tea | THE GIPSY | Heinemann | YAHYA TAHER ABDULLAH
1983 | Arabic Short Stories | THE GAP IN KALTOUMA’S FENCE | Quartet Books | Ibrahim Ishaq Ibrahim
1983 | The Mountain of Green Tea | THE FREE-FOR-ALL DANCE | Heinemann | YAHYA TAHER ABDULLAH
1983 | Distant View of a Minaret and Other Stories | THE FLAT IN NAKSHABANDI STREET | Quartet Books | Alifa Rifaat
1983 | Arabic Short Stories | THE DRUMMING SANDS | Quartet Books | Ibrahim Al-Kouni
1983 | Arabic Short Stories | THE CYPRIOT MAN | Quartet Books | Tayeb Salih
1983 | Arabic Short Stories | THE CHAIR CARRIER | Quartet Books | Yusuf Idris
1983 | The Mountain of Green Tea | THE BODY | Heinemann | YAHYA TAHER ABDULLAH
1983 | Distant View of a Minaret and Other Stories | TELEPHONE CALL | Quartet Books | Alifa Rifaat
1983 | Arabic Short Stories | SMALL SUN | Quartet Books | Zakaria Tamer
1983 | The Mountain of Green Tea | RHYTHMS IN SLOW TIME | Heinemann | YAHYA TAHER ABDULLAH
1983 | The Mountain of Green Tea | PILGRIM'S RETURN | Heinemann | YAHYA TAHER ABDULLAH
1983 | Distant View of a Minaret and Other Stories | MY WORLD OF THE UNKNOWN | Quartet Books | Alifa Rifaat
1983 | Arabic Short Stories | MY BROTHER | Quartet Books | Mohamed El-Bisatie
1983 | Distant View of a Minaret and Other Stories | ME AND MY SISTER | Quartet Books | Alifa Rifaat
1983 | Distant View of a Minaret and Other Stories | MANSOURA | Quartet Books | Alifa Rifaat
1983 | Arabic Short Stories | LIFE BY INSTALMENTS | Quartet Books | Mohammed Barrada
1983 | Distant View of a Minaret and Other Stories | JUST ANOTHER DAY | Quartet Books | Alifa Rifaat
1983 | The Mountain of Green Tea | GRANDAD HASAN | Heinemann | YAHYA TAHER ABDULLAH
1983 | Arabic Short Stories | GLIMPSES FROM THE LIFE OF MAUGOUD ABDUL MAUGOUD AND TWO POSTSCRIPTS | Quartet Books | Yusuf Sharouni
1983 | Arabic Short Stories | FLOWER CRAZY | Quartet Books | Mohammed Chukri
1983 | Arabic Short Stories | DREAMS SEEN BY A BLIND BOY | Quartet Books | Yusuf Abu Rayya
1983 | Distant View of a Minaret and Other Stories | DISTANT VIEW OF A MINARET | Quartet Books | Alifa Rifaat
1983 | Arabic Short Stories | DISTANT SEAS | Quartet Books | Habib Selmi
1983 | Distant View of a Minaret and Other Stories | DEGREES OF DEATH | Quartet Books | Alifa Rifaat
1983 | Arabic Short Stories | CLOCKS LIKE HORSES | Quartet Books | Mohammed Khudayyir
1983 | Arabic Short Stories | CAIRO IS A SMALL CITY | Quartet Books | Nabil Gorgy
1983 | Arabic Short Stories | BIRDS' FOOTSTEPS IN THE SAND | Quartet Books | Edward El-Kharrat
1983 | Distant View of a Minaret and Other Stories | BAHIYYA'S EYES | Quartet Books | Alifa Rifaat
1983 | Distant View of a Minaret and Other Stories | BADRIYYA AND HER HUSBAND | Quartet Books | Alifa Rifaat
1983 | Distant View of a Minaret and Other Stories | AT THE TIME OF THE JASMINE | Quartet Books | Alifa Rifaat
1983 | Arabic Short Stories | AT A WOMAN'S HOUSE | Quartet Books | Mohammed Ahmed Abdul Wali
1983 | The Mountain of Green Tea | A TALE WITH A MORAL | Heinemann | YAHYA TAHER ABDULLAH
1983 | The Mountain of Green Tea | A TALE TOLD BY A DOG | Heinemann | YAHYA TAHER ABDULLAH
1983 | Arabic Short Stories | ANOTHER EVENING AT THE CLUB | Quartet Books | Alifa Rifaat
1983 | Distant View of a Minaret and Other Stories | AN INCIDENT IN THE GHOBASHI HOUSEHOLD | Quartet Books | Alifa Rifaat
1983 | Arabic Short Stories | ADVICE FROM A SENSIBLE YOUNG MAN | Quartet Books | Bahaa Taher
1985 | Tigers on the Tenth Day | TIGERS ON THE TENTH DAY | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | THE WATER’S CRIME | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | THE STALE LOAF | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | THE SMILE | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | THE FAMILY | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | THE FACE OF THE MOON | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | THE ENEMY | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | THE DAY GENGHIS KHAN BECAME ANGRY | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | THE ANCIENT GATE | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | SUN FOR THE YOUNG | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | SNOW AT THE END OF THE NIGHT | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | SHEEP | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | ROOM WITH TWO BEDS | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | NOTHING | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | NO RAINCLOUD FOR THE TREES, NO WINGS ABOVE THE MOUNTAINS | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | MY FINAL ADVENTURE | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | HASAN AS A KING | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | GENGHIS KHAN | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | DEATH OF THE JASMINE | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | DEATH OF THE BLACK HAIR | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | CITY IN ASHES | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | A SUMMARY OF WHAT HAPPENED TO MOHAMMED AL-MAHMOUDI | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | AN ANGRY MAN | Quartet Books | Zakaria Tamer
1985 | Tigers on the Tenth Day | A LONE WOMAN | | Zakaria Tamer
1991 | The Slave's Dream and Other Stories | ZENODOTUS OF EPHESUS | Quartet Books | Nabil Naoum Gorgy
1991 | The Slave's Dream and Other Stories | THE WELL | Quartet Books | Nabil Naoum Gorgy
1991 | The Time and the Place: And Other Stories | THE WASTELAND | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Slave's Dream and Other Stories | THE VISIT | Quartet Books | Nabil Naoum Gorgy
1991 | The Slave's Dream and Other Stories | THE TOMB | Quartet Books | Nabil Naoum Gorgy
1991 | The Time and the Place: And Other Stories | THE TIME AND THE PLACE | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Slave's Dream and Other Stories | THE THOUSANDTH JOURNEY | Quartet Books | Nabil Naoum Gorgy
1991 | The Time and the Place: And Other Stories | THE TAVERN OF THE BLACK CAT | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Slave's Dream and Other Stories | THE SWEETNESS OF LOVE | Quartet Books | Nabil Naoum Gorgy
1991 | The Slave's Dream and Other Stories | THE SUBSTITUTE | Quartet Books | Nabil Naoum Gorgy
1991 | The Slave's Dream and Other Stories | THE SPOUSE | Quartet Books | Nabil Naoum Gorgy
1991 | The Slave's Dream and Other Stories | THE SLAVE'S DREAM | Quartet Books | Nabil Naoum Gorgy
1991 | The Slave's Dream and Other Stories | THE RIVALRY | Quartet Books | Nabil Naoum Gorgy
1991 | The Time and the Place: And Other Stories | THE NORWEGIAN RAT | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Slave's Dream and Other Stories | THE MIRACLE | Quartet Books | Nabil Naoum Gorgy
1991 | The Slave's Dream and Other Stories | THE MARIA | Quartet Books | Nabil Naoum Gorgy
1991 | The Time and the Place: And Other Stories | THE MAN AND THE OTHER MAN | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Time and the Place: And Other Stories | THE LAWSUIT | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Slave's Dream and Other Stories | THE JOURNEY OF RA | Quartet Books | Nabil Naoum Gorgy
1991 | The Slave's Dream and Other Stories | THE INHERITANCE | Quartet Books | Nabil Naoum Gorgy
1991 | The Time and the Place: And Other Stories | THE EMPTY CAFE | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Time and the Place: And Other Stories | THE DITCH | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Time and the Place: And Other Stories | THE CONJURER MADE OFF WITH THE DISH | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Time and the Place: And Other Stories | THE ANSWER IS NO | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Slave's Dream and Other Stories | MODES OF PLEASURE | Quartet Books | Nabil Naoum Gorgy
1991 | The Slave's Dream and Other Stories | LOVER-NARRATOR | Quartet Books | Nabil Naoum Gorgy
1991 | The Slave's Dream and Other Stories | LOVE LETTERS | Quartet Books | Nabil Naoum Gorgy
1991 | The Time and the Place: And Other Stories | HIS MAJESTY | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Time and the Place: And Other Stories | HALF A DAY | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Time and the Place: And Other Stories | FEAR | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Slave's Dream and Other Stories | DAWANIA | Quartet Books | Nabil Naoum Gorgy
1991 | The Time and the Place: And Other Stories | BY A PERSON UNKNOWN | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Time and the Place: And Other Stories | BLESSED NIGHT | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Slave's Dream and Other Stories | BIRD MOUNTAIN | Quartet Books | Nabil Naoum Gorgy
1991 | The Time and the Place: And Other Stories | AT THE BUS STOP | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Time and the Place: And Other Stories | A LONG-TERM PLAN | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Time and the Place: And Other Stories | A FUGITIVE FROM JUSTICE | American University in Cairo Press | NAGIB MAHFOUZ
1991 | The Time and the Place: And Other Stories | A DAY FOR SAYING GOODBYE | American University in Cairo Press | NAGIB MAHFOUZ
1992 | Wiles of Men and Other Stories | WHAT HAPPENED TO PUSSY | Quartet Books | Salwa Bakr
1992 | Wiles of Men and Other Stories | THIRTY-ONE BEAUTIFUL GREEN TREES | Quartet Books | Salwa Bakr
1992 | Wiles of Men and Other Stories | THE WILES OF MEN | Quartet Books | Salwa Bakr
1992 | Wiles of Men and Other Stories | THE SORROWS OF DESDEMONA | Quartet Books | Salwa Bakr
1992 | Wiles of Men and Other Stories | THE SMILE OF DEATH | Quartet Books | Salwa Bakr
1992 | Wiles of Men and Other Stories | THE MONKEY TRAINER | Quartet Books | Salwa Bakr
1992 | Wiles of Men and Other Stories | THE BIRD IN HIS CAGE | Quartet Books | Salwa Bakr
1992 | Wiles of Men and Other Stories | THAT BEAUTIFUL UNDISCOVERED VOICE | Quartet Books | Salwa Bakr
1992 | Wiles of Men and Other Stories | FILCHING OF A SOUL | Quartet Books | Salwa Bakr
1992 | Wiles of Men and Other Stories | DOTTY NOONA | Quartet Books | Salwa Bakr
1992 | Wiles of Men and Other Stories | A SMALL WHITE MOUSE | Quartet Books | Salwa Bakr
1992 | Wiles of Men and Other Stories | A SHORT BUS RIDE | Quartet Books | Salwa Bakr
1992 | Wiles of Men and Other Stories | AN OCCASION FOR HAPPINESS | Quartet Books | Salwa Bakr
1994 | A Last Glass of Tea and Other Stories | WILD MULBERRIES | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | WAR WIDOWS | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | UNCLE ZEYDAN | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | TIME THE FLOATING SACK | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | THE WASTELANDS | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | THE TRAP | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | THE PRISONER | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | THE HILL | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | THE GIRL WASHES | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | THE CONDEMNED MAN | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | THE BEND OF THE RIVER | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | THAT’S HOW IT WAS | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | ON THE BRINK | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | MY GRANDFATHER | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | MEETING | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | HAGG ABD RABBUH | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | DROUGHT | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | DEATH HAS ITS | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | AT THE ROADSIDE | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | A LAST GLASS OF TEA | Lynne Rienner Publishers | Mohamed El-Bisatie
1994 | A Last Glass of Tea and Other Stories | A CONVERSATION AT NIGHT | Lynne Rienner Publishers | Mohamed El-Bisatie
1995 | Arabian Nights and Days | THE SULTAN | Doubleday | NAGIB MAHFOUZ
1995 | Arabian Nights and Days | THE SHEIKH | Doubleday | NAGIB MAHFOUZ
1995 | Arabian Nights and Days | THE PORTER | Doubleday | NAGIB MAHFOUZ
1995 | Arabian Nights and Days | THE GRIEVERS | Doubleday | NAGIB MAHFOUZ
1995 | Arabian Nights and Days | THE CAP OF INVISIBILITY | Doubleday | NAGIB MAHFOUZ
1995 | Arabian Nights and Days | THE CAFE OF THE EMIRS | Doubleday | NAGIB MAHFOUZ
1995 | Arabian Nights and Days | THE ADVENTURES OF UGR THE BARBER | Doubleday | NAGIB MAHFOUZ
1995 | Arabian Nights and Days | SINDBAD | Doubleday | NAGIB MAHFOUZ
1995 | Arabian Nights and Days | SHAHRZAD | Doubleday | NAGIB MAHFOUZ
1995 | Arabian Nights and Days | SHAHRIYAR | Doubleday | NAGIB MAHFOUZ
1995 | Arabian Nights and Days | SANAAN AL-GAMALI | Doubleday | NAGIB MAHFOUZ
1995 | Arabian Nights and Days | QUT AL-QULOUB | Doubleday | NAGIB MAHFOUZ
1995 | Arabian Nights and Days | NUR AL-DIN AND DUNYAZAD | Doubleday | NAGIB MAHFOUZ
1995 | Arabian Nights and Days | MA’ROUF THE COBBLER | Doubleday | NAGIB MAHFOUZ
1995 | UNKNOWN | HOMECOMING_2 | American University in Cairo Press | Buthaina Al Nasiri
1995 | Arabian Nights and Days | GAMASA AL-BULTI | Doubleday | NAGIB MAHFOUZ
1995 | Arabian Nights and Days | ANEES AL-GALEES | Doubleday | NAGIB MAHFOUZ
1995 | Arabian Nights and Days | ALADDIN WITH THE MOLES ON HIS CHEEKS | Doubleday | NAGIB MAHFOUZ
1998 | The Hill of Gypsies and Other Stories | THE WOLF CUB | American University in Cairo Press | Said al-Kafrawi
1998 | The Hill of Gypsies and Other Stories | THE TRACKER | American University in Cairo Press | Said al-Kafrawi
1998 | The Hill of Gypsies and Other Stories | THE NIGHT BOOK | American University in Cairo Press | Said al-Kafrawi
1998 | The Hill of Gypsies and Other Stories | THE MAN WITH THE TRAPS | American University in Cairo Press | Said al-Kafrawi
1998 | The Hill of Gypsies and Other Stories | THE GAZELLE HUNTER | American University in Cairo Press | Said al-Kafrawi
1998 | The Hill of Gypsies and Other Stories | THE CAMEL, ABDUL MAWLA, THE CAMEL! | American University in Cairo Press | Said al-Kafrawi
1998 | The Hill of Gypsies and Other Stories | THE BOY ON THE BRIDGE | American University in Cairo Press | Said al-Kafrawi
1998 | The Hill of Gypsies and Other Stories | THE BOSS | American University in Cairo Press | Said al-Kafrawi
1998 | The Hill of Gypsies and Other Stories | THE BLIND SHEIKH | American University in Cairo Press | Said al-Kafrawi
1998 | The Hill of Gypsies and Other Stories | A MATTER OF HONOR | American University in Cairo Press | Said al-Kafrawi
1998 | The Hill of Gypsies and Other Stories | ABSENCE | American University in Cairo Press | Said al-Kafrawi
2000 | Under the Naked Sky: Short Stories from the Arab World | WOMEN IN FEAR | American University in Cairo Press | INAAM KACHACHI
2000 | Under the Naked Sky: Short Stories from the Arab World | WHITENESS OF SILVER | American University in Cairo Press | ABDULLAH BAKHSHAWEEN
2000 | Under the Naked Sky: Short Stories from the Arab World | TRAVELER WITH HAND LUGGAGE | American University in Cairo Press | NAGUIB MAHFOUZ
2000 | Under the Naked Sky: Short Stories from the Arab World | TO THE NIGHT'S SHELTER | American University in Cairo Press | MOHAMED KHUDAYIR
2000 | Under the Naked Sky: Short Stories from the Arab World | THE WOMEN'S ROOM | American University in Cairo Press | HANA ATIA
2000 | Under the Naked Sky: Short Stories from the Arab World | THE SOUND OF SINGING | American University in Cairo Press | SALMA MATAR SEIF
2000 | Under the Naked Sky: Short Stories from the Arab World | THE RETURN OF THE PRISONER | American University in Cairo Press | Buthaina Al Nasiri
2000 | Under the Naked Sky: Short Stories from the Arab World | THE PILOT | American University in Cairo Press | Mohamed Makhzangi
2000 | Under the Naked Sky: Short Stories from the Arab World | THE MONEY ORDER | American University in Cairo Press | MOHAMED EL-BISATIE
2000 | Under the Naked Sky: Short Stories from the Arab World | THE MAN SHOULDN'T KNOW OF THIS | American University in Cairo Press | HANAN AL-SHAYKH
2000 | Under the Naked Sky: Short Stories from the Arab World | THE ILL-OMENED GOLDEN BIRD | American University in Cairo Press | Ibrahim al-Koni
2000 | Under the Naked Sky: Short Stories from the Arab World | THE DOG | American University in Cairo Press | ABD AL-AZIZ MISHRI
2000 | Under the Naked Sky: Short Stories from the Arab World | THE DAY GRANDPA CAME | American University in Cairo Press | Mahmoud Al-Wardani
2000 | Under the Naked Sky: Short Stories from the Arab World | THE CLOUD | American University in Cairo Press | FUAD AL-TAKARLI
2000 | Under the Naked Sky: Short Stories from the Arab World | THE CHARGE | American University in Cairo Press | EDWAR AL-KHARRAT
2000 | Under the Naked Sky: Short Stories from the Arab World | THE APPLES OF PARADISE | American University in Cairo Press | BRAHIM DARGOUTHI
2000 | Under the Naked Sky: Short Stories from the Arab World | SNAKE HUNTING | American University in Cairo Press | Mohamed Zefzaf
2000 | Under the Naked Sky: Short Stories from the Arab World | PRESENCE OF THE ABSENT MAN | American University in Cairo Press | Alia Mamdouh
2000 | Under the Naked Sky: Short Stories from the Arab World | OUT IN THE OPEN | American University in Cairo Press | YUSUF ABU RAYYA
2000 | Under the Naked Sky: Short Stories from the Arab World | MY FELLOW PASSENGER | American University in Cairo Press | Ibrahim Samouiel
2000 | Under the Naked Sky: Short Stories from the Arab World | IT HAPPENED SECRETLY | American University in Cairo Press | AMINA ZAYDAN
2000 | Under the Naked Sky: Short Stories from the Arab World | IT'S NOT FAIR | American University in Cairo Press | YUSUF IDRIS
2000 | Under the Naked Sky: Short Stories from the Arab World | HAVE YOU SEEN ALEXANDRIA STATION? | American University in Cairo Press | ABDOU GUBEIR
2000 | Under the Naked Sky: Short Stories from the Arab World | FEAR_1 | American University in Cairo Press | GHALIB HALASA
2000 | Under the Naked Sky: Short Stories from the Arab World | DEATH OF A DAGGER | American University in Cairo Press | Zakaria Tamer
2000 | Under the Naked Sky: Short Stories from the Arab World | CORNCOBS | American University in Cairo Press | SALWA BAKR
2000 | Under the Naked Sky: Short Stories from the Arab World | AN INVITATION | American University in Cairo Press | GAMAL AL-GHITANI
2000 | Under the Naked Sky: Short Stories from the Arab World | A DASH OF LIGHT | American University in Cairo Press | IBRAHIM ASLAN
2000 | Under the Naked Sky: Short Stories from the Arab World | A BOAT ON THE WATER | American University in Cairo Press | SAID AL-KAFRAWI
2002 | Final Night: Short Stories | WHY DON’T WE GO MORE TO THE SEA? | American University in Cairo Press | Buthaina Al Nasiri
2002 | Final Night: Short Stories | THE STORY OF SAMAH | American University in Cairo Press | Buthaina Al Nasiri
2002 | Final Night: Short Stories | THE MAN WHO MADE CHANGES | American University in Cairo Press | Buthaina Al Nasiri
2002 | Final Night: Short Stories | THE MANSION | American University in Cairo Press | Buthaina Al Nasiri
2002 | Final Night: Short Stories | THE BOAT | American University in Cairo Press | Buthaina Al Nasiri
2002 | Final Night: Short Stories | OMAR’S HEN | American University in Cairo Press | Buthaina Al Nasiri
2002 | Final Night: Short Stories | MAN AND WOMAN | American University in Cairo Press | Buthaina Al Nasiri
2002 | Final Night: Short Stories | I’VE BEEN HERE BEFORE | American University in Cairo Press | Buthaina Al Nasiri
2002 | Final Night: Short Stories | FINAL NIGHT | American University in Cairo Press | Buthaina Al Nasiri
2002 | Final Night: Short Stories | DEATH OF THE SEA GOD | American University in Cairo Press | Buthaina Al Nasiri
2002 | Final Night: Short Stories | DAILY REPORT | American University in Cairo Press | Buthaina Al Nasiri
2002 | Final Night: Short Stories | CIRCUS DOG | American University in Cairo Press | Buthaina Al Nasiri
2002 | Final Night: Short Stories | A TIME FOR WAITING | American University in Cairo Press | Buthaina Al Nasiri
2002 | Final Night: Short Stories | ALL THIS LAND | American University in Cairo Press | Buthaina Al Nasiri
2004 | The Lamp of Umm Hashim and Other Stories | THE LAMP OF UMM HASHIM | American University in Cairo Press | YAHYA HAKKI
2004 | The Lamp of Umm Hashim and Other Stories | STORY IN THE FORM OF A PETITION | American University in Cairo Press | YAHYA HAKKI
2005 | Modern Arabic Fiction | THE WOMEN’S SWIMMING POOL | Columbia University Press | Hanan al-Shaykh
2006 | The Anchor Book | THE DREAM | Anchor | Abdel Salam al-Ujaili
2006 | The Anchor Book | THE COMEDY OF DEATH | Anchor | Mahmoud Teymour
2006 | The Anchor Book | SUNSET | Anchor | Fathy Ghanem
2006 | The Anchor Book | HOUSE OF FLESH | Anchor | Yusuf Idris
2006 | The Anchor Book | APPLES OF PARADISE | Anchor | Brahim Dargouthi
2009 | In a Fertile Desert | ZAAIN AND FATIMA | American University in Cairo Press | Mohamed al-Mazrouei
2009 | In a Fertile Desert | TWO NEIGHBORS | American University in Cairo Press | Muhammad al-Murr
2009 | In a Fertile Desert | TOO LATE | American University in Cairo Press | SALEH KARAMA
2009 | In a Fertile Desert | THREADS OF DELUSION | American University in Cairo Press | Sheikha al-Nakhi
2009 | In a Fertile Desert | THE STORY OF IBRAHIM | American University in Cairo Press | Roda al-Baluchi
2009 | In a Fertile Desert | THE PEDDLER | American University in Cairo Press | MUHSIN SOLEIMAN
2009 | In a Fertile Desert | THE OLD WOMAN | American University in Cairo Press | MARYAM AL SAEDI
2009 | In a Fertile Desert | THE LITTLE TREE | American University in Cairo Press | NASSER AL-DHAHERI
2009 | In a Fertile Desert | RIPE DATES AND DATE PALMS | American University in Cairo Press | HAREB AL-DHAHERI
2009 | In a Fertile Desert | GRIEF OF THE NIGHT BIRD | American University in Cairo Press | IBRAHIM MUBARAK
2009 | In a Fertile Desert | FISHHOOKS | American University in Cairo Press | NASSER JUBRAN
2009 | In a Fertile Desert | FEAR WITHOUT WALLS | American University in Cairo Press | A'ISHAA AL-ZA'ABY
2009 | In a Fertile Desert | ENEMIES IN A SINGLE HOUSE | American University in Cairo Press | MARYAM JUMAA FARAJ
2009 | In a Fertile Desert | DEATH | American University in Cairo Press | OMNIYAT SALEM
2009 | In a Fertile Desert | BIRDS OF A FEATHER | American University in Cairo Press | JUMAA AL-FAIRUZ
2009 | In a Fertile Desert | A SLAP IN THE FACE | American University in Cairo Press | ABDUL HAMID AHMED
2009 | In a Fertile Desert | A DIFFERENT SPECIES | American University in Cairo Press | LAMEES FARIS AL-MARZUQI
2009 | In a Fertile Desert | A DECISION | American University in Cairo Press | EBTISAM AL-MUALLA
2009 | In a Fertile Desert | ABU ABBOUD | American University in Cairo Press | ALI ABDUL AZIZ AL-SHARHAN
2012 | Homecoming: Sixty Years of Egyptian Short Stories | USELESS CATS | American University in Cairo Press | BAHAA TAHER
2012 | Homecoming: Sixty Years of Egyptian Short Stories | UNDOING THE SPELL | American University in Cairo Press | SAID AL-KAFRAWI
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THIRST | American University in Cairo Press | MAHMOUD AL-WARDANI
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THE STORY OF BLACK KNIGHT | American University in Cairo Press | HOSAM FAKHR
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THE SON, THE FATHER, AND THE DONKEY | American University in Cairo Press | SABRI MOUSSA
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THE SNOOPER | American University in Cairo Press | MEKKAWI SAID
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THE ROOM NEXT DOOR | American University in Cairo Press | MOHAMED EL-BISATIE
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THE REPORT OF MRS. R. CONCERNING THE LAST DAY OF THE WEEK | American University in Cairo Press | RADWA ASHOUR
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THE PALM TREE MY AUNT LOVED | American University in Cairo Press | HANA ATIA
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THE OLD CLOTHES MAN | American University in Cairo Press | FATHY GHANEM
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THE MOTHER | American University in Cairo Press | IBRAHIM SHUKRALLAH
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THE MAN WITH THE MUSTACHE AND THE BOW TIE | American University in Cairo Press | MOHAMED MAKHZANGI
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THE HOUSE OF THE SPINSTER | American University in Cairo Press | SAHAR TAWFIQ
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THE GUARD’S CHAIR | American University in Cairo Press | MOHAMED MAKHZANGI
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THE GIRLS AND THE ROOSTER | American University in Cairo Press | ABDOU GUBEIR
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THE DENTAL CROWN | American University in Cairo Press |
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THE DAYS OF THE BLACK CAT | American University in Cairo Press | REHAB BASSAM
2012 | Homecoming: Sixty Years of Egyptian Short Stories | THE DANCER | American University in Cairo Press | SHAWQI FAHEEM
2012 | Homecoming: Sixty Years of Egyptian Short Stories | SYLVIA | American University in Cairo Press | AHMED ALAIDY
2012 | Homecoming: Sixty Years of Egyptian Short Stories | SECRETS | American University in Cairo Press | IBRAHIM ABDEL MEGUID
2012 | Homecoming: Sixty Years of Egyptian Short Stories | RED AND WHITE | American University in Cairo Press | IBRAHIM FARGHALI
2012 | Homecoming: Sixty Years of Egyptian Short Stories | PUT OUT THOSE LIGHTS | American University in Cairo Press | YOUSEF GOHAR
2012 | Homecoming: Sixty Years of Egyptian Short Stories | PARADISE ITSELF | American University in Cairo Press | MAHMOUD AL-SAADANI
2012 | Homecoming: Sixty Years of Egyptian Short Stories | NAKED HE WENT OFF | American University in Cairo Press | MOHAMED MUSTAGAB
2012 | Homecoming: Sixty Years of Egyptian Short Stories | IN THE PLACE FOR PRAYERS | American University in Cairo Press | MOHAMMED AFIFI
2012 | Homecoming: Sixty Years of Egyptian Short Stories | HOMECOMING | American University in Cairo Press | YUSUF ABU RAYYA
2012 | Homecoming: Sixty Years of Egyptian Short Stories | HASHISH STEALS THE NIGHT | American University in Cairo Press | SHEHATA AL-ERIAN
2012 | Homecoming: Sixty Years of Egyptian Short Stories | FOR THE LOVE OF GOD | American University in Cairo Press | ABDUL HAMID GOUDA AL-SAHHAR
2012 | Homecoming: Sixty Years of Egyptian Short Stories | EYES STARING INTO SPACE | American University in Cairo Press | MANSOURA EZ ELDIN
2012 | Homecoming: Sixty Years of Egyptian Short Stories | DROPS OF LEMON JUICE | American University in Cairo Press | IBRAHIM ASLAN
2012 | Homecoming: Sixty Years of Egyptian Short Stories | A VISIT | American University in Cairo Press | GAMAL AL-GHITANI
2012 | Homecoming: Sixty Years of Egyptian Short Stories | AT THE LEVEL CROSSING | American University in Cairo Press | ABBAS AHMED
2012 | Homecoming: Sixty Years of Egyptian Short Stories | A MURDER LONG AGO | American University in Cairo Press | NAGUIB MAHFOUZ
2012 | Homecoming: Sixty Years of Egyptian Short Stories | A LOVER | American University in Cairo Press | MOHAMED MAKHZANGI
2012 | Homecoming: Sixty Years of Egyptian Short Stories | A DOG | American University in Cairo Press | HAMDY EL-GAZZAR
2012 | Homecoming: Sixty Years of Egyptian Short Stories | ACROSS THREE BEDS IN THE AFTERNOON | American University in Cairo Press | SONALLAH IBRAHIM
2012 | Homecoming: Sixty Years of Egyptian Short Stories | ABU ARAB | American University in Cairo Press | MAHMOUD TEYMOUR

APPENDIX B: List of Denys Johnson-Davies’ Creative Writing Short Stories

Collection Title | Copyright Year | Publisher | Short Story Title
The Fate of a Prisoner: And Other Stories | 1999 | Quartet Books | A Short Weekend
The Fate of a Prisoner: And Other Stories | 1999 | Quartet Books | A Smile from the Past
The Fate of a Prisoner: And Other Stories | 1999 | Quartet Books | A Taxi to Himself
The Fate of a Prisoner: And Other Stories | 1999 | Quartet Books | Cat
The Fate of a Prisoner: And Other Stories | 1999 | Quartet Books | Coffee at the Marriott
The Fate of a Prisoner: And Other Stories | 1999 | Quartet Books | Deal Concluded
The Fate of a Prisoner: And Other Stories | 1999 | Quartet Books | Fate of a Prisoner
The Fate of a Prisoner: And Other Stories | 1999 | Quartet Books | Garbage Girl
The Fate of a Prisoner: And Other Stories | 1999 | Quartet Books | Mr Pritchard
The Fate of a Prisoner: And Other Stories | 1999 | Quartet Books | Oleanders Pink and White
The Fate of a Prisoner: And Other Stories | 1999 | Quartet Books | Open Season in Beirut
The Fate of a Prisoner: And Other Stories | 1999 | Quartet Books | Slice of the Cake
The Fate of a Prisoner: And Other Stories | 1999 | Quartet Books | The Dream
The Fate of a Prisoner: And Other Stories | 1999 | Quartet Books | The Garden of Sheikh Osman
The Fate of a Prisoner: And Other Stories | 1999 | Quartet Books | Two Worlds