
COHESION IN : A CORPUS STUDY OF HUMAN-TRANSLATED, MACHINE-TRANSLATED, AND NON-TRANSLATED TEXTS (RUSSIAN INTO ENGLISH)

A dissertation submitted

to Kent State University in partial

fulfillment of the requirements for the

degree of Doctor of Philosophy

by

Tatyana Bystrova-McIntyre

December 2012


© Copyright by Tatyana Bystrova-McIntyre 2012

All Rights Reserved


Dissertation written by

Tatyana Bystrova-McIntyre

B.A., Tver State University, Russia, 1999

M.A., Kent State University, OH, 2004

M.A., Kent State University, OH, 2006

Ph.D., Kent State University, OH, 2012

APPROVED BY

______ Brian J. Baer (advisor), Chair, Doctoral Dissertation Committee

______ Gregory M. Shreve, Members, Doctoral Dissertation Committee

______ Erik B. Angelone

______ Andrew S. Barnes

______ Mikhail Nesterenko

ACCEPTED BY

______ Keiran J. Dunne, Interim Chair, Modern and Classical Language Studies

______ Timothy Chandler, Dean, College of Arts and Sciences


TABLE OF CONTENTS

LIST OF FIGURES ...... XIII

LIST OF TABLES ...... XX

DEDICATION ...... XXXVI

ACKNOWLEDGEMENTS ...... XXXVII

CHAPTER 1: INTRODUCTION ...... 1

1.1 Background ...... 2

1.1.1 Global textual features and translation: Text as a -specific phenomenon ...... 2

1.1.2 New realities of the translation industry today ...... 5

1.1.2.1 Machine translation as a reality of modern translation practice ...... 5

1.1.2.2 Editing and post-editing in the translation industry today ...... 9

1.1.2.3 Translation memory tools and their effect on translators' work ...... 10

1.2 Statement of the problem ...... 11

1.3 Goals and components of this study ...... 12

1.4 Research questions ...... 14

1.5 Organization of this study ...... 15


CHAPTER 2: REVIEW ...... 16

2.1 Non-translated vs. translated texts ...... 16

2.1.1 The notion of translationese ...... 16

2.1.2 General laws of translation ...... 18

2.1.3 Interference of the source language vs. translation universals ...... 19

2.1.4 Methodology of studies involving translated and non-translated texts ...... 22

2.1.5 Applicability of studies involving translated and non-translated texts ...... 22

2.2 Cohesion, global textual features, and related terms ...... 23

2.2.1 Text organization and analysis ...... 23

2.2.2 Cohesion as a linguistic phenomenon ...... 26

2.2.3 Cohesion and texture ...... 28

2.2.4 Cohesion vs. coherence ...... 28

2.2.5 Standards of textuality ...... 30

2.2.6 Definition of cohesion in the present study ...... 31

2.2.7 Categories of cohesion ...... 32

2.2.7.1 Grammatical cohesion ...... 33

2.2.7.2 Lexical cohesion ...... 46

2.2.8 Other global textual features ...... 50


2.2.8.1 Nominalization ...... 51

2.2.8.2 Lexical density ...... 52

2.2.8.3 Average word length ...... 52

2.2.8.4 Average sentence length ...... 53

2.2.8.5 Passives ...... 54

2.2.8.6 Prepositional phrases ...... 55

2.2.9 Cohesion and other global textual features as a language- and culture-specific phenomenon ...... 55

2.3 Studies of cohesion and other global textual features ...... 59

2.3.1 Studies of cohesion in the context of teaching and evaluating writing ...... 59

2.3.2 Studies of cohesion in the context of text comprehension ...... 62

2.3.3 Studies of cohesion as a -specific and text-type specific phenomenon ...... 63

2.3.4 Studies of cohesion in the context of translation studies ...... 65

2.3.5 Studies of other global textual features in the context of translation studies ...... 69

2.3.6 Cohesion and other global textual features and translation expertise ...... 71

2.4 Conclusions ...... 73

CHAPTER 3: METHODS AND MATERIALS ...... 76


3.1 Corpus as a methodology to analyze cohesion and other global textual features ...... 76

3.2 The framework for studying cohesion and other global textual features ...... 78

3.3 Materials ...... 83

3.3.1 Corpus compilation ...... 83

3.3.1.1 Literary corpus ...... 89

3.3.1.2 Newspaper corpus ...... 96

3.3.1.3 Scientific corpus ...... 98

3.4 Methods and procedures ...... 101

3.4.1 Corpus database ...... 101

3.4.2 Corpus annotation ...... 102

3.4.3 Data collection ...... 103

3.4.4 Data analysis: General procedures for descriptive statistical analysis and significance testing ...... 106

CHAPTER 4: TEXTUAL CHARACTERISTICS OF LITERARY, NEWSPAPER, AND SCIENTIFIC TEXTS ACROSS NON-TRANSLATED, HUMAN-TRANSLATED, AND MACHINE-TRANSLATED CORPORA ...... 111

4.1 Cohesive and other global textual characteristics selected for the genre description of the corpora ...... 111

4.2 Cohesive characteristics of literary, newspaper, and scientific texts ...... 113


4.2.1 Third-person pronominal cohesive devices in literary, newspaper, and scientific texts ...... 113

4.2.2 Possessive pronouns in literary, newspaper, and scientific texts ...... 116

4.2.3 Demonstrative pronouns in literary, newspaper, and scientific texts ...... 120

4.2.4 Definite article in literary, newspaper, and scientific texts ...... 124

4.2.5 Comparative cohesive devices in literary, newspaper, and scientific texts ...... 128

4.2.6 Reference cohesive devices in literary, newspaper, and scientific texts ...... 131

4.2.7 Conjunction cohesive devices in literary, newspaper, and scientific texts ...... 135

4.3 Textual characteristics of literary, newspaper, and scientific texts ...... 140

4.3.1 Nominalization in literary, newspaper, and scientific texts ...... 140

4.3.2 Lexical density in literary, newspaper, and scientific texts ...... 144

4.3.3 Average word length in literary, newspaper, and scientific texts ...... 148

4.3.4 Average sentence length in literary, newspaper, and scientific texts ...... 152

4.3.5 Passives in literary, newspaper, and scientific texts ...... 156


4.3.6 Prepositional phrases in literary, newspaper, and scientific texts ...... 160

4.4 Conclusions ...... 164

CHAPTER 5: RESULTS AND ANALYSIS: ASSOCIATION OF COHESIVE DEVICES AND OTHER GLOBAL TEXTUAL FEATURES WITH THE METHOD OF TEXT PRODUCTION ...... 171

5.1 Association of cohesive devices and other global textual features with the method of text production: Overview of variables under study ...... 171

5.1.1 Cohesive devices ...... 172

5.1.1.1 Reference cohesive devices ...... 172

5.1.1.2 Conjunction cohesive devices ...... 173

5.1.1.3 Reference and conjunction cohesive devices ...... 174

5.1.2 Global textual features ...... 174

5.2 Association of cohesive devices and other global textual features with the method of text production: Literary corpus ...... 175

5.2.1 Cohesive devices in non-translated, human-translated, and machine-translated texts in the literary corpus ...... 175

5.2.1.1 Reference cohesive devices in the literary corpus ...... 175

5.2.1.2 Conjunction cohesive devices in the literary corpus ...... 202


5.2.1.3 Total reference and conjunction cohesive devices in the literary corpus ...... 208

5.2.2 Global textual features in non-translated, human-translated, and machine-translated texts in the literary corpus ...... 211

5.2.2.1 Nominalization ...... 211

5.2.2.2 Lexical density ...... 212

5.2.2.3 Average word length ...... 216

5.2.2.4 Average sentence length ...... 217

5.2.2.5 Passives ...... 218

5.2.2.6 Prepositional phrases ...... 218

5.3 Association of cohesive devices and other global textual features with the method of text production: Newspaper corpus ...... 223

5.3.1 Cohesive devices in non-translated, human-translated, and machine-translated texts in the newspaper corpus ...... 223

5.3.1.1 Reference cohesive devices in the newspaper corpus ...... 223

5.3.1.2 Conjunction cohesive devices ...... 250

5.3.1.3 Reference and conjunction cohesive devices in the newspaper corpus ...... 259

5.3.2 Global textual features in non-translated, human-translated, and machine-translated texts in the newspaper corpus ...... 263


5.3.2.1 Nominalization ...... 263

5.3.2.2 Lexical density ...... 266

5.3.2.3 Average word length ...... 269

5.3.2.4 Average sentence length ...... 269

5.3.2.5 Passives ...... 271

5.3.2.6 Prepositional phrases ...... 272

5.4 Association of cohesive devices and other global textual features with the method of text production: Scientific corpus ...... 279

5.4.1 Cohesive devices in non-translated, human-translated, and machine-translated texts in the scientific corpus ...... 279

5.4.1.1 Reference cohesive devices in the scientific corpus ...... 279

5.4.1.2 Conjunction cohesive devices in the scientific corpus ...... 304

5.4.1.3 Reference and conjunction cohesive devices in the scientific corpus ...... 311

5.4.2 Global textual features in non-translated, human-translated, and machine-translated texts in the scientific corpus ...... 313

5.4.2.1 Nominalization ...... 313

5.4.2.2 Lexical density ...... 315

5.4.2.3 Average word length ...... 317


5.4.2.4 Average sentence length ...... 320

5.4.2.5 Passives ...... 322

5.4.2.6 Prepositional phrases ...... 325

5.5 Conclusions ...... 336

CHAPTER 6: CONCLUSIONS, RECOMMENDATIONS, AND FUTURE DIRECTIONS ...... 337

6.1 Summary of results ...... 338

6.1.1 Summary of results for the literary corpus ...... 338

6.1.2 Summary of results for the newspaper corpus ...... 349

6.1.3 Summary of results for the scientific corpus ...... 362

6.2 Practical recommendations for linguists working in the translation industry and translator trainers ...... 373

6.3 Limitations of this study ...... 376

6.4 Future research directions ...... 378

REFERENCES ...... 382


LIST OF FIGURES

Fig. 3.1 Example of CLAWS POS-tagged text ...... 103

Fig. 4.1 Means and standard error for 3rd person pronominal cohesive devices for NT, HT, and MT in the literary, newspaper, and scientific corpora ...... 116

Fig. 4.2 Means and standard error for possessive pronouns for NT, HT, and MT in the literary, newspaper, and scientific corpora ...... 120

Fig. 4.3 Means and standard error for demonstratives for NT, HT, and MT in the literary, newspaper, and scientific corpora ...... 123

Fig. 4.4 Means and standard error for the definite article for NT, HT, and MT in the literary, newspaper, and scientific corpora ...... 127

Fig. 4.5 Means and standard error for comparative devices for NT, HT, and MT in the literary, newspaper, and scientific corpora ...... 131

Fig. 4.6 Means and standard error for reference cohesive devices for NT, HT, and MT in the literary, newspaper, and scientific corpora ...... 134

Fig. 4.7 Means and standard error for conjunction cohesive devices for NT, HT, and MT in the literary, newspaper, and scientific corpora ...... 139

Fig. 4.8 Means and standard error for nominalization for NT, HT, and MT in the literary, newspaper, and scientific corpora ...... 143


Fig. 4.9 Means and standard error for STTR for NT, HT, and MT in the literary, newspaper, and scientific corpora ...... 147

Fig. 4.10 Means and standard error for average word length for NT, HT, and MT in the literary, newspaper, and scientific corpora ...... 151

Fig. 4.11 Means and standard error for average sentence length for NT, HT, and MT in the literary, newspaper, and scientific corpora ...... 155

Fig. 4.12 Means and standard error for passives for NT, HT, and MT in the literary, newspaper, and scientific corpora ...... 159

Fig. 4.13 Means and standard error for prepositions for NT, HT, and MT in the literary, newspaper, and scientific corpora ...... 163

Fig. 5.1 Means and standard error (± 1 SE) for 3rd person singular objective pronominal cohesive devices in literary texts ...... 179

Fig. 5.2 Means and standard error (± 1 SE) for 3rd person plural objective pronominal cohesive devices in literary texts ...... 181

Fig. 5.3 Means and standard error (± 1 SE) for 3rd person singular subjective pronominal cohesive devices in literary texts ...... 184

Fig. 5.4 Means and standard error (± 1 SE) for 3rd person plural subjective pronominal cohesive devices in literary texts ...... 186

Fig. 5.5 Means and standard error (± 1 SE) for the total of 3rd person pronominal cohesive devices in literary texts ...... 187

Fig. 5.6 Means and standard error (± 1 SE) for possessive pronouns in literary texts ...... 191


Fig. 5.7 Means and standard error (± 1 SE) for singular demonstratives in literary texts ...... 194

Fig. 5.8 Means and standard error (± 1 SE) for definite article "the" in literary texts ...... 197

Fig. 5.9 Means and standard error (± 1 SE) for the total number of reference cohesive devices in literary texts ...... 201

Fig. 5.10 Means and standard error (± 1 SE) for additive conjunction devices in literary texts ...... 205

Fig. 5.11 Means and standard error (± 1 SE) for adversative conjunction devices in literary texts ...... 206

Fig. 5.12 Means and standard error (± 1 SE) for the total number of conjunction devices in literary texts ...... 207

Fig. 5.13 Means and standard error (± 1 SE) for total reference and conjunction devices in literary texts ...... 210

Fig. 5.14 Means and standard error (± 1 SE) for STTR in literary texts ...... 214

Fig. 5.15 Means and standard error (± 1 SE) for prepositional phrases with "for" in literary texts ...... 221

Fig. 5.16 Means and standard error (± 1 SE) for 3rd person singular neuter pronominal cohesive device ("it") in newspaper texts ...... 227

Fig. 5.17 Means and standard error (± 1 SE) for 3rd person plural subjective pronominal cohesive device ("they") in newspaper texts ...... 231


Fig. 5.18 Means and standard error (± 1 SE) for the total of 3rd person pronominal cohesive devices in newspaper texts ...... 233

Fig. 5.19 Means and standard error (± 1 SE) for possessive pronouns in newspaper texts ...... 236

Fig. 5.20 Means and standard error (± 1 SE) for definite article "the" in newspaper texts ...... 241

Fig. 5.21 Means and standard error (± 1 SE) for general comparative adjectives in newspaper texts ...... 245

Fig. 5.22 Means and standard error (± 1 SE) for superlative degree adverbs in newspaper texts ...... 246

Fig. 5.23 Means and standard error (± 1 SE) for the total number of reference cohesive devices in newspaper texts ...... 249

Fig. 5.24 Means and standard error (± 1 SE) for additive conjunction devices in newspaper texts ...... 253

Fig. 5.25 Means and standard error (± 1 SE) for adversative conjunction devices in newspaper texts ...... 255

Fig. 5.26 Means and standard error (± 1 SE) for causal and continuative conjunction devices in newspaper texts ...... 256

Fig. 5.27 Means and standard error (± 1 SE) for the total number of conjunction devices in newspaper texts ...... 257


Fig. 5.28 Means and standard error (± 1 SE) for total reference and conjunction cohesive devices in newspaper texts ...... 261

Fig. 5.29 Means and standard error (± 1 SE) for nominalization in newspaper texts .... 265

Fig. 5.30 Means and standard error (± 1 SE) for STTR in newspaper texts ...... 268

Fig. 5.31 Means and standard error (± 1 SE) for prepositional phrases with "of" in newspaper texts ...... 275

Fig. 5.32 Means and standard error (± 1 SE) for the total number of prepositional phrases in newspaper texts ...... 276

Fig. 5.33 Means and standard error (± 1 SE) for 3rd person singular neuter pronominal cohesive device ("it") in scientific texts ...... 283

Fig. 5.34 Means and standard error (± 1 SE) for the total of 3rd person pronominal cohesive devices in scientific texts ...... 286

Fig. 5.35 Means and standard error (± 1 SE) for possessive pronouns in scientific texts ...... 289

Fig. 5.36 Means and standard error (± 1 SE) for singular demonstratives in scientific texts ...... 292

Fig. 5.37 Means and standard error (± 1 SE) for definite article "the" in scientific texts ...... 296

Fig. 5.38 Means and standard error (± 1 SE) for comparative devices in scientific texts ...... 301

Fig. 5.39 Means and standard error (± 1 SE) for the total number of reference cohesive devices in scientific texts ...... 303


Fig. 5.40 Means and standard error (± 1 SE) for additive conjunction devices in scientific texts ...... 307

Fig. 5.41 Means and standard error (± 1 SE) for causal and continuative conjunction devices in scientific texts ...... 308

Fig. 5.42 Means and standard error (± 1 SE) for the sum of conjunction devices in scientific texts ...... 309

Fig. 5.43 Means and standard error (± 1 SE) for total reference and conjunction cohesive devices in scientific texts ...... 312

Fig. 5.44 Means and standard error (± 1 SE) for STTR in scientific texts ...... 316

Fig. 5.45 Means and standard error (± 1 SE) for average word length in scientific texts ...... 319

Fig. 5.46 Means and standard error (± 1 SE) for average sentence length in scientific texts ...... 321

Fig. 5.47 Means and standard error (± 1 SE) for passives in scientific texts ...... 323

Fig. 5.48 Means and standard error (± 1 SE) for prepositional phrases with "for" in scientific texts ...... 328

Fig. 5.49 Means and standard error (± 1 SE) for the use of general prepositions in scientific texts ...... 330

Fig. 5.50 Means and standard error (± 1 SE) for the use of the preposition "of" in scientific texts ...... 332


Fig. 5.51 Means and standard error (± 1 SE) for the total number of prepositional phrases in scientific texts ...... 335


LIST OF TABLES

Table 2.1 Personal pronouns in English ...... 35

Table 2.2 Personal pronouns in Russian ...... 36

Table 2.3 Possessive pronouns in English ...... 36

Table 2.4 Possessive pronouns in Russian ...... 37

Table 2.5 Demonstrative cohesive devices in English ...... 38

Table 2.6 Demonstrative cohesive devices in Russian ...... 38

Table 2.7 Comparison in English...... 40

Table 2.8 Comparison in Russian ...... 41

Table 3.1 The combined framework for the study of the use of cohesive devices and other global textual features (adapted from Dong and Lan 2010) ...... 82

Table 3.2 Descriptive statistics for the English-language corpus ...... 88

Table 3.3 Russian source texts and their corresponding human ...... 90

Table 3.4 English non-translated literary texts and their authors ...... 94

Table 3.5 Descriptive statistics for the LC ...... 96

Table 3.6 Descriptive statistics for the NC ...... 98

Table 3.7 Descriptive statistics for the SC ...... 101

Table 3.8 Calculations used to track cohesive devices and textual features in the corpus (adapted from Dong and Lan 2010) ...... 104


Table 3.9 CLAWS conventions for the tags used in the process of data collection ...... 105

Table 4.1 Association of 3rd person pronominal cohesive devices with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA ...... 113

Table 4.2 Pairwise comparisons of 3rd person pronominal cohesive devices across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing ...... 114

Table 4.3 Descriptive statistics for the total numbers of 3rd person pronominal cohesive devices across genres (N = 50) ...... 114

Table 4.4 Association of possessive pronouns with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA ...... 117

Table 4.5 Pairwise comparisons of possessive pronouns across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing ...... 117

Table 4.6 Descriptive statistics for the possessive pronouns across genres (N = 50) ..... 118

Table 4.7 Association of demonstratives with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA ...... 121

Table 4.8 Pairwise comparisons of demonstratives across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing ...... 121

Table 4.9 Descriptive statistics for demonstratives across genres (N = 50) ...... 122

Table 4.10 Association of the definite article with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA ...... 124


Table 4.11 Pairwise comparisons of the definite article across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing ...... 125

Table 4.12 Descriptive statistics for the definite article across genres (N = 50) ...... 125

Table 4.13 Association of comparative devices with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA ...... 129

Table 4.14 Pairwise comparisons of significant comparative devices across genres (Literary, Newspaper, Scientific) for NT by Tukey HSD post hoc testing ...... 129

Table 4.15 Descriptive statistics for comparative devices across genres (N = 50) ...... 129

Table 4.16 Association of reference cohesive devices with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA ...... 132

Table 4.17 Pairwise comparisons of reference cohesive devices across genres (Literary, Newspaper, Scientific) for NT and HT by Tukey HSD post hoc testing ...... 133

Table 4.18 Descriptive statistics for reference cohesive devices across genres (N = 50) ...... 133

Table 4.19 Association of conjunction cohesive devices with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA ...... 136

Table 4.20 Pairwise comparisons of conjunction cohesive devices across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing ...... 137

Table 4.21 Descriptive statistics for conjunction cohesive devices across genres (N = 50) ...... 137


Table 4.22 Association of nominalization with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA ...... 140

Table 4.23 Pairwise comparisons of nominalization across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing ...... 141

Table 4.24 Descriptive statistics for nominalization across genres (N = 50) ...... 142

Table 4.25 Association of STTR with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA ...... 145

Table 4.26 Pairwise comparisons of STTR across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing ...... 145

Table 4.27 Descriptive statistics for STTR across genres (N = 50) ...... 146

Table 4.28 Association of average word length with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA ...... 149

Table 4.29 Pairwise comparisons of average word length across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing ...... 149

Table 4.30 Descriptive statistics for average word length across genres (N = 50) ...... 150

Table 4.31 Association of average sentence length with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA ...... 153

Table 4.32 Pairwise comparisons of average sentence length across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing ...... 153

Table 4.33 Descriptive statistics for average sentence length across genres (N = 50) ... 154


Table 4.34 Association of passives with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA ...... 156

Table 4.35 Pairwise comparisons of passives across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing ...... 157

Table 4.36 Descriptive statistics for passives across genres (N = 50) ...... 157

Table 4.37 Association of prepositions with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA ...... 160

Table 4.38 Pairwise comparisons of prepositions across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing ...... 161

Table 4.40 Differences in mean scores for the cohesive characteristics and global textual features found to be significantly different at the 0.05 level for NT, HT, and MT ...... 165

Table 5.1 Association of 3rd person pronominal cohesive devices with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA ...... 177

Table 5.2 Pairwise comparisons of significant 3rd person pronominal cohesive devices (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing ...... 177

Table 5.3 Descriptive statistics for 3rd person pronominal cohesive devices in literary texts (N = 50) ...... 178

Table 5.4 Association of possessive pronouns with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA ...... 190

Table 5.5 Pairwise comparisons of possessive pronouns (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing ...... 190


Table 5.6 Descriptive statistics for possessive pronouns in literary texts (N = 50) ...... 190


Table 5.7 Association of demonstrative cohesive devices with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA ...... 192

Table 5.8 Pairwise comparisons of singular demonstrative cohesive devices (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing ...... 193

Table 5.9 Descriptive statistics for demonstrative cohesive devices in literary texts (N = 50) ...... 193

Table 5.10 Association of definite article "the" with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA ...... 196

Table 5.11 Pairwise comparisons of definite article "the" (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing ...... 196

Table 5.12 Descriptive statistics for the use of the definite article "the" in literary texts (N = 50) ...... 196

Table 5.13 Association of comparative cohesive devices with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA ...... 198

Table 5.14 Descriptive statistics for comparative devices in literary texts (N = 50) ...... 198

Table 5.15 Association of the total number of reference cohesive devices with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA ...... 199


Table 5.16 Pairwise comparisons of the total number of reference cohesive devices (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing ...... 200

Table 5.17 Descriptive statistics for the total number of reference cohesive devices in literary texts (N = 50) ...... 200

Table 5.18 Association of conjunction cohesive devices with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA ...... 203

Table 5.19 Pairwise comparisons of conjunction cohesive devices (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing ...... 203

Table 5.20 Descriptive statistics for conjunction cohesive devices in literary texts (N = 50) ...... 204

Table 5.21 Association of total reference and conjunction cohesive devices with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA ...... 208

Table 5.22 Pairwise comparisons of total reference and conjunction cohesive devices (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing ...... 209

Table 5.23 Descriptive statistics for total reference and conjunction cohesive devices in literary texts (N = 50) ...... 209

Table 5.24 Association of nominalization with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA ...... 211

Table 5.25 Descriptive statistics for nominalization in literary texts (N = 50) ...... 212


Table 5.26 Association of STTR with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA ...... 212

Table 5.27 Pairwise comparisons of STTR (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing ...... 213

Table 5.28 Descriptive statistics for STTR in literary texts (N = 50) ...... 213

Table 5.29 Association of average word length with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA ...... 216

Table 5.30 Descriptive statistics for average word length in literary texts (N = 50) ...... 216

Table 5.31 Association of average sentence length with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA ...... 217

Table 5.32 Descriptive statistics for average sentence length in literary texts (N = 50) ...... 217

Table 5.33 Association of passives with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA ...... 218

Table 5.34 Descriptive statistics for passives in literary texts (N = 50) ...... 218

Table 5.35 Association of prepositional phrases with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA ...... 219

Table 5.36 Pairwise comparisons of prepositional phrases with "for" (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing ...... 220

Table 5.37 Descriptive statistics for prepositional phrases in literary texts (N = 50) ...... 220

Table 5.38 Association of 3rd person pronominal cohesive devices with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA ...... 225


Table 5.39 Pairwise comparisons of significant 3rd person pronominal cohesive devices (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing ...... 225

Table 5.40 Descriptive statistics for 3rd person pronominal cohesive devices in newspaper texts (N = 50) ...... 226

Table 5.41 Association of possessive pronouns with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA ...... 234

Table 5.42 Pairwise comparisons of possessive pronouns (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing ...... 235

Table 5.43 Descriptive statistics for possessive pronouns in newspaper texts (N = 50) ...... 236

Table 5.44 Association of demonstrative cohesive devices with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA ...... 238

Table 5.45 Descriptive statistics for demonstrative cohesive devices in newspaper texts (N = 50) ...... 238

Table 5.46 Association of definite article "the" with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA ...... 240

Table 5.47 Pairwise comparisons of definite article "the" (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing ...... 240

Table 5.48 Descriptive statistics for the use of the definite article "the" in newspaper texts (N = 50) ...... 240

Table 5.49 Association of comparative cohesive devices with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA ...... 243


Table 5.50 Pairwise comparisons of general comparative adjective cohesive devices (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing ...... 244

Table 5.51 Descriptive statistics for comparative devices in newspaper texts (N = 50) ...... 244

Table 5.52 Association of the total number of reference cohesive devices with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA ...... 248

Table 5.53 Pairwise comparisons of the total number of reference cohesive devices (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing ...... 248

Table 5.54 Descriptive statistics for the total number of reference cohesive devices in newspaper texts (N = 50) ...... 249

Table 5.55 Association of conjunction cohesive devices with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA ...... 251

Table 5.56 Pairwise comparisons of conjunction cohesive devices (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing ...... 251

Table 5.57 Descriptive statistics for conjunction cohesive devices in newspaper texts (N = 50) ...... 252

Table 5.58 Association of the total reference and conjunction cohesive devices with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA ...... 259

Table 5.59 Pairwise comparisons of the total reference and conjunction cohesive devices (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing ...... 260

xxix

Table 5.60 Descriptive statistics for total reference and conjunction cohesive devices in newspaper texts (N = 50) ...... 260

Table 5.61 Association of nominalization with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA...... 263

Table 5.62 Pairwise comparisons of nominalization (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing...... 264

Table 5.63 Descriptive statistics for nominalization in newspaper texts (N = 50) ...... 264

Table 5.64 Association of STTR with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA ...... 267

Table 5.65 Pairwise comparisons of STTR (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing ...... 267

Table 5.66 Descriptive statistics for STTR in newspaper texts (N = 50) ...... 267

Table 5.67 Association of average word length with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA ...... 269

Table 5.68 Descriptive statistics for average word length in newspaper texts (N = 50) ...... 269

Table 5.69 Association of average sentence length with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA ...... 270

Table 5.70 Descriptive statistics for average sentence length in newspaper texts (N = 50) ...... 270

Table 5.71 Association of passives with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA ...... 271

Table 5.72 Descriptive statistics for passives in newspaper texts (N = 50) ...... 271

Table 5.73 Association of prepositional phrases with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA ...... 272

Table 5.74 Pairwise comparisons of prepositional phrases with "of" (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing ...... 273

Table 5.75 Descriptive statistics for prepositions in newspaper texts (N = 50) ...... 274

Table 5.76 Association of 3rd person pronominal cohesive devices with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA ...... 280

Table 5.77 Pairwise comparisons of significant 3rd person pronominal cohesive devices (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing ...... 281

Table 5.78 Descriptive statistics for 3rd person pronominal cohesive devices in scientific texts (N = 50) ...... 282

Table 5.79 Association of possessive pronouns with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA ...... 287

Table 5.80 Pairwise comparisons of possessive pronouns (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing ...... 288

Table 5.81 Descriptive statistics for possessive pronouns in scientific texts (N = 50) ...... 288

Table 5.82 Association of demonstrative cohesive devices with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA ...... 290

Table 5.83 Pairwise comparisons of singular determiners (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing ...... 290

Table 5.84 Descriptive statistics for demonstrative cohesive devices in scientific texts (N = 50) ...... 291

Table 5.85 Association of definite article "the" with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA ...... 295

Table 5.86 Pairwise comparisons of definite article "the" (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing ...... 295

Table 5.87 Descriptive statistics for the use of the definite article "the" in scientific texts (N = 50) ...... 295

Table 5.88 Association of comparative cohesive devices with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA ...... 299

Table 5.89 Pairwise comparisons of comparative cohesive devices (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing ...... 299

Table 5.90 Descriptive statistics for comparative devices in scientific texts (N = 50) ...... 300

Table 5.91 Association of the total number of reference cohesive devices with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA ...... 302

Table 5.92 Pairwise comparisons of the total number of reference cohesive devices (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing ...... 302

Table 5.93 Descriptive statistics for the total number of reference cohesive devices in scientific texts (N = 50) ...... 302

Table 5.94 Association of conjunction cohesive devices with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA ...... 305

Table 5.95 Pairwise comparisons of conjunction cohesive devices (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing ...... 305

Table 5.96 Descriptive statistics for conjunction cohesive devices in scientific texts (N = 50) ...... 306

Table 5.97 Association of reference and conjunction cohesive devices with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA ...... 311

Table 5.98 Pairwise comparisons of reference and conjunction cohesive devices (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing ...... 312

Table 5.99 Descriptive statistics for total reference and conjunction cohesive devices in scientific texts (N = 50) ...... 312

Table 5.100 Association of nominalization with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA ...... 314

Table 5.102 Association of STTR with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA ...... 315

Table 5.103 Pairwise comparisons of STTR (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing ...... 316

Table 5.104 Descriptive statistics for STTR in scientific texts (N = 50) ...... 316

Table 5.105 Association of average word length with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA ...... 317

Table 5.106 Pairwise comparisons of average word length (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing ...... 318

Table 5.107 Descriptive statistics for average word length in scientific texts (N = 50) ...... 318

Table 5.108 Association of average sentence length with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA ...... 320

Table 5.109 Pairwise comparisons of average sentence length (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing ...... 320

Table 5.110 Descriptive statistics for average sentence length in scientific texts (N = 50) ...... 321

Table 5.111 Association of passives with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA ...... 322

Table 5.112 Pairwise comparisons of passives (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing ...... 322

Table 5.113 Descriptive statistics for passives in scientific texts (N = 50) ...... 323

Table 5.114 Association of prepositional phrases with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA ...... 326

Table 5.115 Pairwise comparisons of significant prepositional phrases (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing ...... 326

Table 5.116 Descriptive statistics for prepositional phrases in scientific texts (N = 50) ...... 327

Table 6.1 Differences in mean scores for cohesive characteristics and global textual features in the literary corpus ...... 339

Table 6.2 Pairwise comparisons of significant cohesive and global textual characteristics (NT, HT, and MT) by Tukey HSD post hoc testing in the literary corpus ...... 341

Table 6.3 Differences in mean scores for the cohesive characteristics and global textual features in the newspaper corpus ...... 349

Table 6.4 Pairwise comparisons of significant cohesive and global textual characteristics (NT, HT, and MT) by Tukey HSD post hoc testing in the newspaper corpus ...... 352

Table 6.5 Differences in mean scores for the cohesive characteristics and global textual features in the scientific corpus ...... 362

Table 6.6 Pairwise comparisons of significant cohesive and global textual characteristics (NT, HT, and MT) by Tukey HSD post hoc testing in the scientific corpus ...... 364

DEDICATION

To my son Matthew William McIntyre,

Who was born when this dissertation was being completed.


ACKNOWLEDGEMENTS

I would like to express my deepest gratitude to my long-time advisor, mentor, and colleague Dr. Brian James Baer, whose guidance and encouragement have helped me through the years of the journey called "graduate school," from the beginning of my Master's program in Translation to the end of my doctoral studies. His wise teachings and continuous support, combined with the diverse and stimulating environment of the Institute of Applied Linguistics at Kent State University, have helped me grow as a scholar, translator, teacher, and person.

I would like to thank the members of my dissertation committee, Dr. Gregory Shreve, Dr. Erik Angelone, Dr. Andrew Barnes, and Dr. Mikhail Nesterenko, for their thoughtful review of my dissertation and insightful comments during the defense. Their suggestions helped me to fine-tune this dissertation.

My sincere gratitude goes to the entire faculty and staff of the Department of Modern and Classical Language Studies for their dedication to the success of their students in our relatively young but bold doctoral program.

I would also like to thank Dr. Alexander Alekseenko for providing invaluable advice on statistics, and my supportive friend Nataliya Alekseenko for steering me towards her then-fiancé Alexander when I was in need of such advice.

I would like to thank my parents Yury and Maria Pastushenkov, as well as my grandparents on both sides of my family, for instilling in me the value of lifelong learning. Their great scholarly and pedagogical achievements have inspired my research and teaching career.

A special thank-you goes to my wonderful husband Bill McIntyre, whose love, encouragement, and wisdom have made the successful completion of this dissertation possible.

Finally, I would like to thank everyone else who cared about my research, teaching, and life—my friends all around the world, my academic and translation colleagues, my supportive students, and my up-and-coming linguist brother Dmitry.

Tatyana Bystrova-McIntyre

November 15, 2012

Kent, OH


CHAPTER 1: INTRODUCTION

This dissertation compares the use of cohesive devices and other global textual features across corpora of three groups of texts produced by different methods: texts written in English (non-translated texts), texts translated into English from Russian by human translators (human-translated texts), and texts translated into English from Russian by a machine translation tool (machine-translated texts). This study looks at global features of these texts in corpora of three genres: literary, newspaper, and scientific.

I use the term "global" to refer to textual features that appear throughout an entire text, as opposed to features that appear only at the sub-sentential level.

Cohesive devices are an important subset of such features; they tie textual segments together into a whole by creating "connections among the elements within the discourse" (Campbell 1995: 5-6). The present study looks at reference and conjunction cohesive devices as identified by the pioneers of cohesion studies, Michael Alexander Kirkwood Halliday and Ruqaiya Hasan (1976). Detailed accounts of these devices and of the notion of cohesion are provided in Chapters 2 and 3. Other global textual features included in this study are nominalization, lexical density, average word length, average sentence length, passives, and prepositional phrases. These features are defined and discussed in Chapter 2.
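To make the idea of counting cohesive devices concrete, the following Python sketch tallies a few reference and conjunction items and reports them per 1,000 words. It is a hypothetical illustration rather than the procedure used in this study: the word lists are a tiny sample loosely modeled on Halliday and Hasan's categories, the tokenizer is a simple regular expression, and normalizing per 1,000 words is an assumption made so that texts of different lengths can be compared. A real analysis would also have to disambiguate forms such as "that" or "it," which do not always function cohesively.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists (illustration only); a real study
# needs a full inventory and disambiguation of non-cohesive uses.
DEVICES = {
    "personal_reference": {"he", "she", "it", "they", "him", "her", "them"},
    "demonstrative_reference": {"this", "that", "these", "those"},
    "conjunction": {"and", "but", "however", "so", "therefore", "then"},
}


def device_rates(text, per=1000):
    """Return occurrences of each device category per `per` word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())  # naive tokenizer (assumption)
    if not tokens:
        return {name: 0.0 for name in DEVICES}
    counts = Counter(tokens)
    return {
        # Multiply before dividing to keep simple cases exact.
        name: sum(counts[w] for w in words) * per / len(tokens)
        for name, words in DEVICES.items()
    }


print(device_rates("He saw this. But they left, and it rained."))
```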


This empirical study applies a more comprehensive research approach to analyzing global features of texts than those used previously. It sheds light on global characteristics of translated texts, namely whether and how human and machine translations differ from non-translated texts, thereby contributing to existing studies of translation universals and laws of translation. Furthermore, by including machine-translated texts as a third dimension of research, in addition to the typically studied non-translated and human-translated texts, this study addresses the question of whether translation universals and laws of interference are at work in machine-translated texts to the same extent as in human-translated texts. The results of this study are then used to design practical recommendations for translators, editors of human and machine translations, translation evaluators, and translator trainers.

This chapter presents background for this research, discusses its significance, goals, and research questions, and outlines its structure.

1.1 Background

1.1.1 Global textual features and translation: Text as a culture-specific phenomenon

Text organization, with global textual features at its core, is an important component of successful writing and has long been of interest to linguists. How do writers or speakers represent the scenes and stories they envision in the linear mode of text? How do readers or listeners decode such representations? Additionally, and more importantly for the purposes of this study, how does text organization differ across cultures?

In the 1960s, Robert Kaplan introduced the term "contrastive rhetoric." While working in an English as a Second Language program, he observed that the writing of learners of English as a second language often displayed features of global text organization that seemed flawed to native speakers of English. Kaplan also noticed similarities in global textual features among groups of international students whose native languages were the same or similar (Mohamed-Sayidina 1993: 7).

These observations became the basis of Kaplan's argument that discourse organization is a phenomenon that differs from culture to culture (Mohamed-Sayidina 1993: 7). Kaplan suggested that international students writing in English transfer culture-specific patterns of textual organization from their native languages into their English writing, echoing the Sapir-Whorf hypothesis of linguistic relativity:

[T]he organization of a paragraph, written in any language by any individual who is not a native speaker of that language, will carry the dominant imprint of that individual's culturally-coded orientation to the phenomenological world in which he lives and which he is bound to interpret largely through the avenues available to him in his native language (Kaplan 1972: 1).

Consonant with Kaplan's contrastive rhetoric and other related theories, a number of translation scholars have pointed out that cohesive devices and other global features of text organization are language- and text-type-specific (Hatim and Mason 1990, Baker 1992, Blum-Kulka 2000, Teich and Fankhauser 2004, among others). In their work, translators must deal with these specificities, adjusting cohesive and other global textual features in the target text so as to re-situate the source text in a different culture.

The language- and culture-specificity of cohesive and other global textual features raises a number of questions related to translation. To what extent do translators adjust global textual features in their work? To what extent do translations reflect the unique textual features of their source texts? To what extent does any of this influence readers' comprehension or interfere with how translations are received in the target culture? How does machine translation cope with culture-specific global features of texts? Do beginning translators differ from expert translators in their approaches to translating global features of text organization?

The present study aims to answer some of these questions by illuminating similarities and differences in the use of cohesive and other global features across non-translated, human-translated, and machine-translated texts. It looks at thirty-two cohesive and other textual parameters across these three methods of text production and in corpora of three genres (literary, newspaper, and scientific). The premise of this study is that although cohesive devices and other global textual features have been understudied in empirical translation research, they are vitally important in the practice and training of translators and editors of human-translated and machine-translated output.

1.1.2 New realities of the translation industry today

The translation industry today is very different from that of the recent past. It is changing rapidly in response to the increased demands of globalization and the swift growth of the Internet and information technologies. Research in translation studies must keep up with these changes by focusing on new realities of the translation industry, such as machine translation and computer-assisted translation tools. For this reason, the present study includes machine-translated output as part of its design and looks at similarities and differences in the use of cohesive devices and other global features across three groups of texts: non-translated, human-translated, and machine-translated. In addition, the study covers three genres, two of which are non-literary (newspaper and scientific).

The sub-sections below deal with machine translation, (post-)editing, and translation memory tools as realities of the translation industry that contribute to the need to study cohesive and other global textual features of non-translated, human-translated, and machine-translated texts.

1.1.2.1 Machine translation as a reality of modern translation practice

Machine translation (MT) has been around since the 1950s, and early on it was expected to soon replace human translation. This proved an overly optimistic hope, and research in MT slowed down. Since the late 1980s, with the increased power and decreased price of computers, MT development has once again been on the rise (Hutchins 1995: 433-437).

The growth of online communication, as well as the demands of businesses targeting customers abroad, has contributed to the need for more and faster translation in general and has given a new impetus to MT in particular. For example, Internet users often want new electronic content to be translated instantaneously and at no cost, which is exactly what such MT tools as Google Translate and Yahoo! Babel Fish provide (Gaspari and Hutchins 2007: 200-201). With developments in statistical machine translation and an increased ability to build the very large electronic bilingual corpora it requires, companies such as Google are constantly improving their MT tools.

Translators should not necessarily fear for their fate. In fact, they might celebrate: according to the U.S. Bureau of Labor Statistics, "employment of interpreters and translators is projected to increase 22 percent over the 2008–18 decade, which is much faster than the average for all occupations." However, ignoring MT is no longer an option. MT has become an integral part of many translators' work and is now integrated into major translation tools, such as SDL Trados Studio. Many Language Service Providers (LSPs) and government organizations hire translators to post-edit MT output, which is often considered a faster and cheaper way to provide translations. For instance, SDL, a major LSP, markets its "intelligent machine translation," which combines its customized MT technology, SDL LanguageWeaver, with human post-editing skills and is claimed to significantly reduce companies' translation budgets and production times.

Given the explosive growth of MT in recent years, it seems relevant to supplement studies of global features in human-translated texts with similar studies of MT output. The present study does this: in addition to the more traditional comparison of human-translated and non-translated texts, it includes both machine- and human-translated outputs for the Russian source texts studied.

In fact, the inclusion of machine-translated texts in a study of this kind is relatively novel. While a number of studies have shown that translated texts exhibit certain peculiar characteristics when compared with non-translated texts (Blum-Kulka 1986, Eskola 2004, Mauranen 2000, Olohan and Baker 2000, Puurtinen 1995, Schaeffner and Adab 2001, Tirkkonen-Condit 2004, among others), translations performed by machines have not been included in the majority of these studies. Only recently have researchers started including machine-translated texts in their studies (Khanna et al. 2011, Volansky et al. 2011). With MT on the rise, analyzing machine-translated texts may prove beneficial for training future translators to work with MT and for assessing the quality of MT output. It might also help identify MT output in the first place.

Not surprisingly, recent research does find differences between machine-translated and human-translated texts. In their study of English non-translated texts and Hebrew-into-English human-translated and machine-translated texts, Volansky et al. (2011) find that these three groups of texts differ in such features as average sentence and word length, passive-verb ratio, type-token ratio, and punctuation, among others. They use this information to design a system intended to automatically differentiate between English non-translated texts, texts translated into English by human translators, and texts translated into English by MT tools.
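Several of the surface measures mentioned above are easy to compute. The Python sketch below is a minimal illustration, assuming a naive regular-expression tokenizer and sentence splitter; it calculates average word length, average sentence length in words, the type-token ratio (TTR), and a standardized TTR (STTR) averaged over consecutive fixed-size chunks. The function names, the 1,000-word default chunk size, and the tokenization rules are assumptions, not a reconstruction of the tools used by Volansky et al. (2011) or in this study.

```python
import re
from statistics import mean

WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")  # naive word pattern (assumption)
SENT_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")       # split after ., ! or ?


def tokenize(text):
    """Lower-cased word tokens; punctuation and digits are ignored."""
    return [w.lower() for w in WORD_RE.findall(text)]


def split_sentences(text):
    """Naive splitter: abbreviations such as 'Dr.' will be miscounted."""
    return [s for s in SENT_SPLIT_RE.split(text.strip()) if s]


def type_token_ratio(tokens):
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def standardized_ttr(tokens, chunk=1000):
    """Mean TTR over consecutive full chunks of `chunk` tokens.

    The incomplete final chunk is dropped; texts shorter than one chunk
    fall back to the plain TTR.
    """
    chunks = [tokens[i:i + chunk] for i in range(0, len(tokens) - chunk + 1, chunk)]
    if not chunks:
        return type_token_ratio(tokens)
    return mean(type_token_ratio(c) for c in chunks)


def global_features(text, chunk=1000):
    tokens = tokenize(text)
    sentences = split_sentences(text)
    return {
        "avg_word_length": mean(len(t) for t in tokens) if tokens else 0.0,
        "avg_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
        "ttr": type_token_ratio(tokens),
        "sttr": standardized_ttr(tokens, chunk),
    }


print(global_features("The cat sat. The cat ran away!"))
```

Averaging TTR over equal-sized chunks is the usual motivation for STTR: raw TTR tends to fall as texts get longer, so it cannot fairly compare a short newspaper article with a long scientific paper.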

Volansky et al. (2011: 2) point out that "better awareness of the translation process can improve the quality of statistical machine translation (SMT) systems," citing works by Gamon et al. (2005) and Kurokawa et al. (2009). While the present study does not aim to improve SMT directly, its methodology and results might prove useful for this purpose as well.

Some studies that look at machine-translated texts find that, while they differ from human-translated texts, MT may be a reasonable (and free-of-charge) alternative to human translation, at least for some text-types. For instance, Khanna et al. (2011), who compare medical translations produced by professional translators and by machines, find that while machine-translated sentences had significantly lower fluency (grammatical correctness) scores than professional translations, there were similarities in terms of adequacy (information preservation) and meaning (connotation maintenance) scores. The present study might help illuminate further similarities and differences between non-translated, human-translated, and machine-translated texts with respect to cohesion and other textual features, and provide insight into the work of translators and editors.

1.1.2.2 Editing and post-editing in the translation industry today

Editing is another translation field that may benefit from the results of this study, insofar as cohesion and text organization are vital to editing texts, especially at the macro-textual level. Brian Mossop, in his book Revising and Editing for Translators, talks about smoothness (his own term for cohesion) as one of the twelve main revision parameters to be considered by editors, noting that a lack of smoothness may occur due to "careless of the word order or the connector words … of the source text" (2001: 106).

With the use of MT skyrocketing, the demand for professional translators experienced in post-editing MT output has been growing. Some studies show that, at least for some text-types, post-editing MT output may be faster than translating from scratch. For example, in the study by Carl et al. (2011), translation speeds for MT plus post-editing were, on average, faster than for human translation. The researchers also report a modest increase in translation quality for post-edited output.

Demand for editors and post-editors clearly exists in today's translation market, as indicated by the results of my study of two hundred ProZ.com job postings collected on ten consecutive days in September-October 2010. According to the analysis of the services required, translation was called for in the majority of the postings (61.5%), checking of translations was the second most requested service (22.5% of all postings), and editing (including post-editing and copy-editing) came in third (19%) (Bystrova-McIntyre 2010).

The results of the ProZ.com survey support Mossop's claim that editing skills may help translation graduates find translation-related work in technical writing or editing, edit their own translations, and deal with poorly written source texts (2001: iv), thus making translation students more marketable and versatile in the job market. To the extent that cohesion is among the major revision parameters for editors of translations (Mossop 2001: 107), translation trainees should be familiar with cohesion as it relates to translation and to the editing of one's own or other translators' work. Nonetheless, the role that cohesion and other textual characteristics play in the construction of well-written texts is often underestimated. For instance, of five leading textbooks on monolingual editing (Baskette, Sissors, and Brooks 1992, Berner 1991, Dragga and Gong 1989, Harrigan 1993, Samson 1993), only one, Dragga and Gong's Editing: The Design of Rhetoric (1989), lists cohesion in its index.

1.1.2.3 Translation memory tools and their effect on translators' work

The ability to work with translation memories (TMs), databases that store segments of aligned source and target content, is a must in the translation industry today. Among the main benefits of TMs are improved consistency of translated content, reduced cost (especially for large-scale projects), and increased speed.

One can conjecture that the use of TMs, which forces translators to work with shorter textual segments rather than with the text as a whole, has an effect on the global textual features of translations. While there appear to be no studies that directly measure differences in cohesion and other textual features between translations produced with and without translation tools, multiple studies indicate that translation tools influence the process and product of translation (Dragsted 2005, O'Brien 2006a, 2006b, Guerberof 2008, Bowker 2005, Ribaz López 2007, Wallis 2006, among others). For instance, Dragsted (2004, 2006) finds that the segmentation rules used by TM software "do not correlate well with the way expert human translators 'chunk' a source text" (Garcia 2009: 27), a finding that might influence global-level features of translations. Pym notes, referring to Dragsted's research of 2004, that "the segmentation imposed by the software makes rather more impositions than authorial segmentation ever did" (2007: 7).

The present research adds to the array of studies in this area but emphasizes differences in cohesive and other global textual features across human-translated, machine-translated, and non-translated texts. Its results may help design better policies and training for today's translation industry, where global teams of linguists work extensively with TM and MT tools.

1.2 Statement of the problem

Gideon Toury (1995) argues that the translation process is influenced by interference from the source text. Additionally, scholars claim that translations tend to exhibit distinctive characteristics regardless of the source language, a phenomenon known as "translation universals" (Baker 1996). It may therefore be hypothesized that the method by which a text is produced (non-translated, human-translated, or machine-translated) will be reflected in the use of cohesive devices and other global textual features. It may also be hypothesized that the text-type will influence the use of such devices.

These influences, however, have not been widely studied across large collections of texts. Moreover, few comprehensive methodologies have been applied to the empirical study of such influences. In view of the recent developments in the language industry outlined above, more research and improved methodologies are needed to inform the work of modern translators, editors of human and machine translations, translation evaluators, and translator trainers. The present study proposes and tests a more comprehensive methodology for studying cohesive and other global features of texts, aiming to illuminate such features across genres in English non-translated, human-translated, and machine-translated texts.

1.3 Goals and components of this study

This study aims to illuminate the use of cohesive devices and other global features in texts produced by three different methods:

1. Texts translated by human translators from Russian into English (human-translated texts)

2. Texts translated by MT tools from Russian into English (machine-translated texts)

3. Texts written in English (non-translated texts)

It looks at cohesive and other global textual features across three text-types:

1. Literary texts

2. Newspaper texts

3. Scientific texts

The goals of this study are as follows: (1) to distinguish global features of non-translated, human-translated, and machine-translated texts; (2) to contribute to the arsenal of research methodologies available to translation scholars; and (3) to design recommendations for translators, editors of machine-translated output, and translator trainers.

Consistent with its goals, this study has both a discovery component and a hypothesis-testing component. In addition, its methodology is designed to be more comprehensive than those used in earlier studies.

1. Discovery component. The study produces new data on the use of cohesive devices and other textual features in human-translated, machine-translated, and non-translated texts, as well as across text-types. It uses computer tools to study thirty-two variables related to cohesion and global textual features (outlined in Chapter 3) in a relatively large corpus of texts, revealing phenomena not visible to the naked eye. This new knowledge may serve as a more solid basis for formulating future hypotheses in this area of translation research, and it will inform the work of translators, editors of human and machine translations, translation evaluators, and trainers. These data may also be useful for researchers looking to determine programmatically whether a text is a translation.

2. Hypothesis-testing component. The study aims to test one main hypothesis concerning the groups and types of texts involved:

The method of text production (non-translated, human-translated, or machine-translated) influences the use of cohesive devices and other global textual features in a specific text-type.
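The List of Tables shows that this hypothesis is examined feature by feature with one-way ANOVA, together with Tukey HSD post hoc pairwise comparisons. As a minimal sketch of the first step only, the Python function below computes the one-way ANOVA F statistic and its degrees of freedom for groups such as NT, HT, and MT; it does not compute p-values or post hoc tests. The function name is hypothetical, and the scores in the example are toy numbers, not data from this study. In practice, a statistics library such as SciPy (`scipy.stats.f_oneway`) would supply the p-value as well.

```python
from statistics import mean


def one_way_anova(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    `groups` maps a label (e.g. "NT", "HT", "MT") to per-text scores.
    Assumes at least two groups and some within-group variation.
    """
    samples = [list(scores) for scores in groups.values()]
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = mean(x for s in samples for x in s)
    group_means = [mean(s) for s in samples]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, group_means))
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, group_means) for x in s)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy scores for illustration only (not data from this study).
toy = {"NT": [1, 2, 3], "HT": [4, 5, 6], "MT": [7, 8, 9]}
print(one_way_anova(toy))  # prints (27.0, 2, 6)
```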

3. Improved methodology. The study tests a more comprehensive methodology than has been used in earlier studies for analyzing cohesion and global textual features in translated and non-translated texts. Empirical translation studies need solid methodologies designed and/or tuned specifically to answer their unique research questions, and this study offers one such methodology for future use by other translation scholars.

1.4 Research questions

This study addresses three major research questions:

1. Does the way texts are produced (non-translated, human-translated, or machine-translated) influence the use of cohesive devices and other global textual features in literary texts for the Russian-into-English language pair, and if so, how?

2. Does the way texts are produced (non-translated, human-translated, or machine-translated) influence the use of cohesive devices and other global textual features in newspaper texts for the Russian-into-English language pair, and if so, how?


3. Does the way texts are produced (non-translated, human-translated, or machine-translated) influence the use of cohesive devices and other global textual features in scientific texts for the Russian-into-English language pair, and if so, how?

1.5 Organization of this study

This dissertation consists of six chapters. Chapter One provides a background and introduction to the study, states its research problem, and outlines the purposes and components of this study. The major research questions are also formulated in this chapter. Chapter Two offers a literature review, discussing and defining the key concepts employed in the study and outlining existing research in global textual features in both non-translation and translation disciplines. Chapter Three describes the methods and materials used in this research. Chapter Four provides an overview of text-type specificities for literary, newspaper, and scientific texts based on the data analyzed in this study. Chapter Five reports on the analysis of the cohesive and other textual features included in the study. The data and analysis for each corpus are reported in individual sections of the chapter. Chapter Six provides a summary of the results, discusses implications of the results for those working in the language industry and training future linguists, provides practical recommendations for translators and (post-)editors, and lists limitations of the study. It also outlines future research directions suggested by this study.

CHAPTER 2: LITERATURE REVIEW

This chapter provides an overview of the literature pertaining to studies of cohesion and other global textual features in both non-translation and translation contexts. It outlines the main concepts related to differences between translated and non-translated texts and discusses definitions of cohesion and the other global textual features under study. Since translation studies draws extensively on research in other fields, such as general linguistics, psycholinguistics, and anthropology, to name but a few, the chapter reviews existing research on cohesion and other global features of texts in both monolingual and multilingual settings.

2.1 Non-translated vs. translated texts

This section provides an overview of the research on differences between translated and non-translated texts. It discusses the concept of translation laws and gives an overview of the research into translation universals. Additionally, the section touches upon the common methodology of research into characteristics of translated and non-translated texts. In conclusion, it discusses some potential applications of this research.

2.1.1 The notion of translationese

As noted above, research in translation studies indicates that there are differences between translated and non-translated texts (Gellerstam 1986, Blum-Kulka 1986, Eskola 2004, Mauranen 2000, Olohan and Baker 2000, Puurtinen 1995, Schaeffner & Adab 2001, Hansen 2003, Tirkkonen-Condit 2004, among others). Researchers find differences in vocabulary use, grammatical features, and other textual characteristics, as discussed in more detail below. Gellerstam (1986), whose study finds differences between non-translated Swedish texts and English-into-Swedish translations, uses the term "translationese" to refer to these peculiar characteristics of translated texts. Olohan (2004) defines translationese as "a common description for translated language that appears to be influenced by the source language, usually in an inappropriate way or to an undue extent" (90).

While the term is used by and large pejoratively (Olohan 2004: 90), such influences from the source text do not necessarily have to be viewed as negative. Toury (1995), whose general laws of translation are discussed below, mentions that such discourse transfer can be both negative and positive (Routledge Encyclopedia of Translation Studies 1998: 290). Indeed, as the following overview of pertinent research shows, the discussion of such peculiarities of translated texts typically does not involve their condemnation.

This study looks at differences in the use of cohesive and other global textual features between non-translated, human-translated, and machine-translated texts, and does not employ the term "translationese" to refer to such differences because of its overall pejorative connotation. Rather than referring to a set of criteria to determine quality, I assume in this study that the closer a translated text approaches the style of a non-translated text, the more it conforms to the norms of the target culture. This proximity to target-culture norms is how I understand quality in the context of this study.

2.1.2 General laws of translation

Many translation researchers view differences between translated and non-translated texts as proof of the existence of general laws of translation. In fact, Gideon Toury (1995) maintains that one of the main goals of descriptive translation studies is to discover such general laws of translation and understand the norms involved in the translation process.

Toury sets forth two general laws of translation: the law of interference and the law of growing standardization. The law of interference leads to translationese, or "fingerprints" of the source language in the target text: "in translation, phenomena pertaining to the make-up of the source text tend to be transferred to the target text" (1995: 275). According to the law of growing standardization, "the special textual relations created in the source text are often replaced by conventional relations in the target text, and sometimes they are ignored altogether" (Routledge Encyclopedia of Translation Studies 1998: 290).

According to Volanski et al. (2011), the "combined effect of these laws creates a … text that partly corresponds to the source text and partly to texts written originally in the target language but in fact is neither of them" (2). Frawley (1984) refers to such texts as "hybrids," a term later revisited by Adab and Schaeffner (2001) and other scholars. The hybrid nature of translated texts may affect cohesive and other global textual features.

2.1.3 Interference of the source language vs. translation universals

Some researchers trace unique features of target texts (TT) to their source texts' (ST) language (cf. Toury's law of interference). For instance, Johansson and Hofland (2000), in their study of English and Norwegian modal particles, find that particles used in translation are influenced by the source text. Gellerstam (1996) explains the overuse of the verb tillbringa in texts translated into Swedish by the fact that translators have to render the English verb "spend."

Other researchers claim that translations exhibit certain characteristics regardless of the source language (Baker 1996). These universal characteristics are known as translation universals, "one of the most intriguing and controversial topics in recent translation studies," as the publisher John Benjamins put it in its marketing summary of the book Translation Universals: Do They Exist? Mona Baker (1996) lists four categories of such features of translated texts: explicitation ("the tendency to spell things out in translation"), simplification ("the idea that translators subconsciously simplify the language or message"), normalization ("the tendency to conform to patterns and practices that are typical of the target language, even to the point of exaggerating them"), and leveling out ("the tendency of translated text to gravitate around the centre of any continuum rather than move towards the fringes") (Baker 1996: 176-177).


Explicitation, in particular, is prominent in research on translation universals. For instance, Vanderauwera (1985), in a study of Dutch-into-English translations of novels, finds that translators add cultural information to aid TT reader comprehension, repeat lexical items to avoid ambiguity, and fill in elliptical structures to make the translation clearer. Blum-Kulka's explicitation hypothesis (1986) holds that translations may be more redundant than their source texts. Øverås (1998) finds explicitation shifts in Norwegian-English and English-Norwegian translations, with lexical explicitation occurring more frequently than grammatical explicitation. Olohan and Baker (2000) analyze the use of the optional 'that' in translated and non-translated texts, reporting that it is more frequent in translated texts. They use this finding to support the notion of explicitation in translation.

Simplification is a focus of another group of studies in translation universals. Laviosa, based on her extensive research into differences between translated and non-translated corpora (1998a, 1998b, 2002), finds that translated texts are characterized by a narrower range of vocabulary, a lower ratio of lexical to running words, and a lower average sentence length. In addition, translated texts tend to under-represent linguistic items of the target language that do not have direct equivalents in the source language (Tirkkonen-Condit 2004).
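Laviosa's "ratio of lexical to running words" is usually called lexical density. A minimal sketch of how it might be computed for an English text follows. The short function-word stoplist is a hypothetical stand-in (an assumption, not the list used by Laviosa or in this study); a real analysis would rely on a fuller list or on part-of-speech tagging.

```python
import re

# Hypothetical, deliberately short stoplist of English function words.
# A real study would use a fuller list or a part-of-speech tagger.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "but", "or", "of", "to", "in", "on", "at",
    "for", "with", "by", "from", "he", "she", "it", "they", "his", "her",
    "its", "their", "is", "was", "are", "were", "be", "been", "that",
    "this", "these", "those", "not", "as",
}


def tokenize(text):
    """Lowercase word tokens; an internal apostrophe is kept ("don't")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def lexical_density(text):
    """Proportion of running words (tokens) that are not function words."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    lexical = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(lexical) / len(tokens)


print(lexical_density("The old man took his hat off the table."))
```

In the sample sentence, three of the nine tokens (the, his, the) are on the stoplist, so the function returns 6/9, about 0.67. Laviosa's finding would lead one to expect lower values for translated texts than for comparable non-translated ones.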

Normalization is, in a way, related to simplification, and its existence is supported by several studies. Vanderauwera (1985) suggests that translators may break up sentences or omit certain information in an attempt to meet target-language standards. May (1997), in her study of French and Russian translations of Virginia Woolf and William Faulkner, argues that translators normalize punctuation, simplifying it. Øverås (1998) discovers shifts in collocational and metaphorical patterns from unusual to more usual ones in the target language. Kenny's study of hapax legomena (words that occur only once in a text) finds that translators tend to normalize them (2002).
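Kenny's unit of analysis, the hapax legomenon, is straightforward to operationalize. The sketch below assumes a naive lowercase tokenizer (an illustrative choice, not Kenny's procedure) and returns the word forms that occur exactly once in a text:

```python
import re
from collections import Counter


def hapax_legomena(text):
    """Return the word forms that occur exactly once in the text, sorted."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return sorted(word for word, n in counts.items() if n == 1)


sample = "The cat saw the dog, and the dog saw a bird."
print(hapax_legomena(sample))  # ['a', 'and', 'bird', 'cat']
```

Counting word forms rather than lemmas is itself a choice: with lemmatization, "saw" and "see" would be merged, and the hapax list would change.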

Finally, leveling out has not been given much attention (Baker 1996: 184). As Olohan points out, it also "appears to pose [the] most difficulty for measurement" (2004: 100).

Olohan suggests looking for reduced variance in a corpus of translated and non-translated texts in order to identify the presence of leveling out. Some studies, however, have not found this to be the case. For instance, Laviosa (1998b) reports low variance for both translated and non-translated fictional texts in terms of lexical density and proportion of high-frequency words, and higher variance in translated texts in terms of sentence length.
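Olohan's suggestion amounts to comparing how widely a feature is spread across the texts of each corpus. The minimal sketch below assumes one value of a feature per text and reports the mean and the sample variance for a group. The sentence-length figures are invented for illustration and are not data from this or any cited study.

```python
from statistics import mean, variance


def describe(group):
    """Mean and sample variance of one feature across the texts of a corpus."""
    return mean(group), variance(group)


# Invented average sentence lengths (words per sentence), one value per text.
non_translated = [14.0, 22.0, 18.0, 26.0]
translated = [17.0, 20.0, 19.0, 20.0]

for name, group in [("non-translated", non_translated), ("translated", translated)]:
    m, v = describe(group)
    print(f"{name}: mean={m:.1f}, variance={v:.2f}")
```

In this toy example the translated group has the smaller variance (2.00 against 26.67), which is the pattern leveling out would predict. A real comparison would also need a formal test of equal variances, such as Levene's test, rather than an inspection of two numbers.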

The present study hypothesizes that translation universals and Toury's law of interference may account for differences in the use of cohesive and other global textual features across non-translated, human-translated, and machine-translated texts. The study also raises the question of whether the notion of translation universals is pertinent to machine-translated output, hypothesizing that such universals may be a product of the human brain.


2.1.4 Methodology of studies involving translated and non-translated texts

Most studies dealing with characteristic features of translation employ methods from corpus linguistics and use monolingual translational corpora, compiled from texts translated into a certain language (typically English), and comparable (non-translated) corpora of texts originally written in that language (e.g., Laviosa 1998a and 1998b, Olohan and Baker 2000, among others). With such corpora, researchers investigate the use of certain lexical or grammatical items (e.g., particular words, type-token ratio, nominalization, passive forms, prepositional phrases) or other features of texts (e.g., length of words, sentences, or paragraphs) in translated and non-translated texts. Some researchers (e.g., Tirkkonen-Condit 2002) use such corpora with human participants, with the goal of determining whether readers can differentiate between translated and non-translated texts.
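Several of the surface features listed above can be computed with a few lines of code. The following sketch assumes a naive tokenizer (runs of letters) and a naive sentence splitter (end punctuation followed by whitespace or the end of the text); both are illustrative assumptions rather than the tools used in any of the cited studies.

```python
import re


def text_profile(text):
    """Compute simple surface features of a single English text."""
    # Naive sentence split: one or more of . ! ? followed by space or end.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not words or not sentences:
        raise ValueError("text contains no words")
    types = {w.lower() for w in words}
    return {
        "tokens": len(words),
        "type_token_ratio": len(types) / len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
    }


print(text_profile("The dog barked. The cat ran away."))
```

For the sample, the function reports 7 tokens, a type-token ratio of 6/7 (only "The" repeats), and an average sentence length of 3.5 words. Because a raw type-token ratio falls as texts get longer, it is only comparable across texts of similar length; corpus studies often use a standardized variant for that reason.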

2.1.5 Applicability of studies involving translated and non-translated texts

In addition to shedding light on the nature of translated and non-translated texts, studies such as this one might inform the work of translators, trainers and trainees, and translation evaluators. They may also be useful for researchers seeking to differentiate between translated and non-translated texts (Baroni et al. 2006). Automated methods of identifying translated texts could be a useful (self-)assessment tool for those involved in translation (cf. Baroni et al. 2006), as well as for researchers who extract parallel corpora from the web, when potential parallel texts for inclusion need to be identified and assessed (Resnik et al. 2003). Baroni et al. also note that automated techniques of distinguishing between translated and non-translated texts might be useful in identifying multicultural .

2.2 Cohesion, global textual features, and related terms

This section discusses and defines concepts and terms related to cohesion and other global textual features. It starts off with an overview of text and discourse organization, as well as such related concepts as "genre" and "text-type." Further, it provides a thorough discussion of different definitions of cohesion and its relation to other linguistic terms, such as texture, coherence, and standards of textuality. Also included is a detailed outline of various categories of cohesion as set forth by Halliday and Hasan, as well as other researchers' viewpoints on the issue. In addition to cohesion, other textual features incorporated within the framework of the present study are discussed and defined. These include nominalization, lexical density, average word length, average sentence length, passives, and prepositional phrases.

2.2.1 Text organization and discourse analysis

Cohesion and other global features that occur at the level of text, rather than at the level of isolated words and phrases, are key attributes of text organization. Text organization refers to ways that producers of written and oral communication structure information within a construct that "form[s] a totality … [with] its own characteristics" (Caron 1992: 153). Such a "totality" is often referred to as "discourse," the term that covers communication in its situational and social contexts. The term "organization" can be defined as "the sum of relations which hold between the units of text… and between each unit and the whole" (Goutsos 1997: 138). Studies of text organization fall within the scope of discourse analysis.

Modern discourse analysis includes studies of various levels and dimensions of discourse, as well as of cognitive processes and memory representations related to discourse, and is thus "not a simple enterprise" (van Dijk 1985: 5-10). By its nature, discourse analysis is interdisciplinary, encompassing methods and levels of "analysis of language, cognition, interaction, society, and culture" (van Dijk 1985: 10), which gives it much in common with translation studies, itself an interdisciplinary field. Van Dijk states that discourse is "a manifestation of all these dimensions of society" (1985: 10), a statement that might also be made of translation.

Specific definitions of discourse and discourse analysis vary; for instance, Jaworski and Coupland (1999: 1-3) list ten definitions of discourse drawn from various works. The current study does not aim to provide an overview of such definitions and understands discourse in the broad sense of communication in its situational and social contexts. In this sense, translation studies may appropriately be added to the list of the disciplines that, according to Tannen (1989: 7f), discourse analysis embraces. (In 1989, her list included linguistics, anthropology, sociology, psychology, literature, rhetoric, philology, speech communication, and philosophy.)


As Hoey (1991a: 13) points out, discourse organization is often approached from the viewpoint of genre or text-type 1 (Halliday and Hasan 1985, Martin 1985, Ventola 1987). In fact, as Dooley and Levinsohn emphasize, most analysis that is done on discourse "can only be genre-specific," because "many linguistic observations about a given text cannot be generalized"; they only hold for a specific text-type (2001: 7). As Longacre (1996) puts it, "[t]he linguist who ignores discourse typology can only come to grief" (7). The concept of genre or text-type is central to translation studies: translators train and work within text-types, as do writers, typically specializing in only a few. The text-types selected for the present study include modern , newspaper commentary devoted to international news, and scientific writing.

The present study focuses on cohesive and other global textual features and analyzes them in the context of translation, looking at non-translated, human-translated, and machine-translated texts of different text-types. The framework for the present study includes cohesive and other textual features relevant to translator's competence (Campbell 1998), such as nominalization, average word length, lexical diversity (e.g., type/token ratio), passives, and prepositional phrases. The study limits its scope to written discourse.

1 The two terms are often used interchangeably. In the context of translation studies in the U.S., the term "text-type" seems to be preferred.
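Nominalizations and passives are usually identified with a part-of-speech tagger or a parser. As a rough, hypothetical illustration of what such counts involve, the sketch below uses a suffix list and a regular expression instead. Both heuristics are assumptions made for this example, and their known false positives and negatives are noted in the comments.

```python
import re

# Hypothetical suffix list for likely nominalizations. It misses some
# (e.g., "analysis") and overcounts others (e.g., "station").
NOMINAL_SUFFIXES = ("tion", "sion", "ment", "ness", "ity", "ance", "ence")

# Crude passive pattern: a form of BE, an optional -ly adverb, then a word
# ending in -ed or -en. Irregular participles such as "made" are missed,
# and adjectives such as "green" after BE are false positives.
PASSIVE = re.compile(
    r"\b(?:am|is|are|was|were|be|been|being)\s+(?:\w+ly\s+)?\w+(?:ed|en)\b",
    re.IGNORECASE,
)


def count_nominalizations(text):
    """Count words longer than six letters that end in a nominal suffix."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(1 for w in words if len(w) > 6 and w.endswith(NOMINAL_SUFFIXES))


def count_passives(text):
    """Count matches of the crude BE + participle pattern."""
    return len(PASSIVE.findall(text))


sample = "The translation was carefully edited, and its publication was delayed."
print(count_nominalizations(sample), count_passives(sample))
```

On the sample sentence both counts are 2 (translation and publication; was carefully edited and was delayed). The length threshold of six letters is an arbitrary choice that keeps short words such as "city" out of the nominalization count.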

2.2.2 Cohesion as a linguistic phenomenon

Cohesion is a linguistic category that has interested linguists, especially those working with the English language, for decades, beginning with the pioneering works of Gleason (1968), Hasan (1968), Quirk et al. (1972), Enkvist (1973), Halliday and Hasan (1976), Gutwinski (1976), de Beaugrande (1980), Hoey (1983), and other, more recent, scholars. The notion of cohesion "is easily perceived but not easy to define" (Stoddard 1991: 13). This study adapts a multidimensional model of cohesion based on the definitions discussed below.

Cohesion is a unifying mechanism that ties textual segments, such as phrases, sentences, or paragraphs, into a whole by "connections among the elements within the discourse" (Campbell 1995: 5-6). According to Gutwinski, who was among the first scholars to devote considerable attention to cohesion, these connections, or relations, occur on the grammatical stratum, while being "signaled by certain grammatical and lexical features reflecting discourse structure on a higher … stratum" (1976: 28). These features represent "textual connectivity of sentences and clauses" (Gutwinski 1976: 28).

Another pioneer of cohesion and textuality studies, Robert de Beaugrande, argues that cohesion includes "the procedures whereby surface elements appear as progressive occurrences such that their sequential connectivity is maintained and made recoverable" (1980: 19). In LeTourneau's words, "cohesion … denotes the 'syntax of texts' with respect to the conformity of syntactic structure to the demands of information structure" (2007: 145).

The monumental work on cohesion in English is that of Halliday and Hasan (1976), and many works draw on their extensive description. Halliday and Hasan stress that the concept of cohesion is a semantic one: "it refers to relations of meaning that exist within the text, and that define it as a text" (1976: 4). Cohesion occurs "where the interpretation of some element in the text is dependent on that of another" (Halliday and Hasan 1976: 4). Thus, we cannot successfully decode a cohesive element in a text without awareness of another element that may occur earlier or later.

Following the pioneers in the study of cohesion, The Handbook of Discourse Analysis combines their views and defines cohesion as "the set of resources for constructing relations in discourse which transcend grammatical structure" (2001: 35). It can be seen as "residing in the semantic and grammatical properties of the language" (Bex 1996: 91). Therefore, the term "cohesion" is two-fold: it includes both the observable surface structure markers and their cohesive functions that contribute to creating what is referred to in linguistics as "texture."

Halliday and Hasan are sometimes criticized for restricting the use of the term "cohesion" only to relations that occur across sentence boundaries (Herbst 2010: 284), thus neglecting intra-sentential cohesive ties. However, as Herbst (2010) clarifies, "it is important to realize that they only do this in order to focus on the textual aspect of cohesion" (284). In fact, as Herbst further notes, Halliday and Hasan themselves emphasize that the "parts of a sentence or a clause 'cohere' with each other" and so display texture (1976: 6). Later, Halliday (1985) himself suggests the 'clause complex' as the basic unit for studying cohesion. Following this viewpoint, which is shared by many researchers (notably, de Beaugrande and Dressler 1981: 50), I will use the term "cohesion" to apply to both inter- and intra-sentential relations in a text.

2.2.3 Cohesion and texture

A text is "any passage, spoken or written, of whatever length, that forms a unified whole," and any such passage is characterized by texture (Halliday and Hasan 1976: 1).

Texture, achieved through cohesion, among other things, thus may be defined as "the property of a text of being an interpretable whole (rather than unconnected sentences)" (Teich and Fankhauser 2004: 131). Texture is the process during which the flow of meaning is channeled into a digestible current of discourse "instead of spilling out formlessly in every possible direction" (Halliday 1994: 311). Cohesion is an important element of texture since it helps the reader relate phrases, sentences, and paragraphs within the meaningful whole of a text. As Koch puts it, "cohesion is the text-forming component of the linguistic system" (2001: 2).

2.2.4 Cohesion vs. coherence

Cohesion uses the linguistic level of texts to relate "successive elements which constitute discourse" (Caron 1992: 161). This differentiates cohesion from coherence: the latter uses the conceptual level to relate textual elements and "involves connections between the discourse and the context in which it occurs" (Campbell 1995: 5). Koch's definition is more succinct: "cohesion is described as a textual phenomenon whereas coherence is a mental one" (2001: 2). In Blum-Kulka's words, coherence is "a covert potential meaning relationship among parts of a text, made overt by the reader or listener through processes of interpretation" (1986: 298-299). Neubert & Shreve (1992: 94) describe a coherent text as having "an underlying logical structure that acts to guide the reader through the text."

The two terms are closely related. They are cognates, going back to the Latin cohaerēre, "to cling" or "to stick." As we have already seen, and as Dooley and Levinsohn point out, "cohesion … can be defined briefly as the use of linguistic means to signal coherence," and these linguistic means serve as "clues to assist the hearers [or readers, in our case] in coming up with an adequate mental representation [or coherence]" (2001: 27). Many scholars have used these terms in this way (Grimes 1975, Halliday and Hasan 1976, de Beaugrande and Dressler 1981, Brown and Yule 1983, among others).

One important difference between these two phenomena is that "coherence also works without cohesion but not the other way around" (Koch 2001: 2). Koch's example of a non-coherent text that has several cohesive ties illustrates this difference very well:

"Father was home. Home is here. Here is there. There was mother" (2).
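Koch's example can be checked mechanically. The sketch below assumes naive sentence splitting and lowercase word matching, and lists the word forms shared by each pair of adjacent sentences, which is a simple proxy for repetition ties:

```python
import re


def adjacent_repetition_ties(text):
    """For each pair of adjacent sentences, list the word forms they share."""
    sentences = [s for s in re.split(r"[.!?]\s*", text) if s.strip()]
    bags = [set(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    return [sorted(a & b) for a, b in zip(bags, bags[1:])]


koch = "Father was home. Home is here. Here is there. There was mother."
print(adjacent_repetition_ties(koch))  # [['home'], ['here', 'is'], ['there']]
```

Every adjacent pair is linked by at least one repeated item (including the function word "is"), yet the passage makes no sense. A surface count of this kind detects cohesive ties but says nothing about coherence.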

Since the two terms are interrelated, some scholars consider them confusing; for instance, Mossop (2001) suggests replacing them with more self-explanatory terms, "smoothness" (to refer to cohesion) and "logic" (to refer to coherence). The present research uses the traditional terms, originally borrowed by translation studies from linguistics, to be consistent with the majority of literature on the subject.

2.2.5 Standards of textuality

Discussions of cohesion, coherence, and texture frequently involve the seven standards of textuality set forth by de Beaugrande and Dressler in their seminal work Introduction to Text Linguistics (1981). The seven standards of textuality include cohesion (a network of surface relations), coherence (a network of conceptual relations), intentionality (concerned with text producers and their intentions), acceptability (concerned with the reader's ability to interpret the text), informativity (based on the balance of new and given information), situationality (concerned with the relevance of a text to its context), and intertextuality (the "relationship between a given text and other relevant texts encountered in prior experience" (Neubert and Shreve 1992: 117), which, for instance, allows readers to recognize a poem as a poem).

The standards of textuality are of direct relevance to translation studies, especially when these studies focus on texts as units of translation (rather than smaller units, such as sentences or paragraphs). The present research closely examines one of these textual standards: cohesion.


2.2.6 Definition of cohesion in the present study

When cohesion is discussed in the framework of translation studies, its language- and culture-specificity is of vital importance. This fact is reflected in some definitions of cohesion in translation studies. For instance, Blum-Kulka defines cohesion as "an overt relationship holding between parts of the text, expressed by language specific markers" (1986: 299).

For the purposes of this study, cohesion will be defined as a set of overt, language-specific resources that tie a text together at a global level. As per Halliday and Hasan (1976), such linguistic resources in English consist of five distinct categories: reference, substitution, ellipsis, conjunction, and lexical cohesion (13). These categories are described in more detail in Section 2.2.7. The present study looks at reference and conjunction cohesive devices only, since they are the most amenable to automated corpus analysis. Other global textual features included in this study are nominalization, lexical density, average word length, average sentence length, passives, and prepositional phrases. These features are defined and discussed later in this chapter.
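Since conjunction is one of the two cohesive categories analyzed here, a sketch of a lexicon-based conjunction count may help make the idea concrete. The item lists below follow Halliday and Hasan's four conjunction types (additive, adversative, causal, temporal), but the items themselves are a short illustrative assumption, not the inventory used in this study. Matching is by whole word or phrase, so a form like "so" is counted even when it is an intensifier rather than a conjunction.

```python
import re
from collections import Counter

# Illustrative (not exhaustive) items for Halliday and Hasan's four
# conjunction types; the lists are an assumption for demonstration only.
CONJUNCTIONS = {
    "additive": ["and", "also", "furthermore", "in addition"],
    "adversative": ["but", "however", "yet", "nevertheless"],
    "causal": ["so", "therefore", "because", "consequently"],
    "temporal": ["then", "next", "finally", "afterwards"],
}


def count_conjunctions(text):
    """Count whole-word (or whole-phrase) matches for each conjunction type."""
    lowered = text.lower()
    counts = Counter()
    for ctype, items in CONJUNCTIONS.items():
        for item in items:
            pattern = r"\b" + re.escape(item) + r"\b"
            counts[ctype] += len(re.findall(pattern, lowered))
    return counts


sample = ("The results were weak. However, the method is new, "
          "and in addition it is cheap. Therefore, we continued.")
print(dict(count_conjunctions(sample)))
```

For the sample, the additive count is 2 ("and", "in addition"), the adversative and causal counts are 1 each, and the temporal count is 0. In a real analysis these raw counts would be normalized by text length before comparing corpora.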

Following the path set forth by Kaplan's contrastive rhetoric, this dissertation assumes that cohesion differs across languages and cultures. As Dooley and Levinsohn point out, "[e]ach language will, of course, have its own range of devices which can be used for cohesion" (2001: 27). Halliday and Hasan based their description on English-language writing, and their framework for the study of cohesion may not be as well suited to other languages. However, since the present study deals only with texts translated into English or originally written in English, the use of Halliday and Hasan's categorization is appropriate.

It may be important to note that the present study does not deal with possible expansions of the definition of cohesion, e.g., the inclusion of such features as syntactic parallelism, functional sentence perspective (which deals with information arrangement), or graphic devices (such as typography, enumeration, or chart types), as suggested by Campbell (1995: 7) and other scholars. Such an expansion may ultimately be useful, since Halliday and Hasan's arguments were based mostly on their analysis of written literary works; however, it is outside the realm of this research.

2.2.7 Categories of cohesion

Halliday and Hasan's categorization has been used and expanded by other scholars. For example, Dooley and Levinsohn suggest six types of cohesion categories that they consider to be cross-linguistic (2001: 27). They include identity, lexical relations, morphosyntactic patterns, signals of relations between propositions (conjunctions), and intonation patterns. The categories of identity, lexical relations, and conjunctions overlap with Halliday and Hasan's categories of lexical repetition and replacement, reference (pronouns), substitution, ellipsis, and conjunction. Intonation is a category that is difficult to study in written texts. For these reasons, Halliday and Hasan's categorization, tested by time and practice, was chosen for the current research.


Halliday and Hasan distinguish between two main types of cohesion: grammatical cohesion and lexical cohesion. These two main types are further broken down into concrete groups of cohesive resources.

2.2.7.1 Grammatical cohesion

Grammatical cohesion is realized through overt grammatical means of a language. Halliday and Hasan name three of their five categories as purely grammatical cohesive ties: reference, ellipsis, and substitution. Conjunction combines both grammatical and lexical features.

2.2.7.1.1 Reference

Reference is a property in which an item in a text (a presupposing element), "instead of being interpreted semantically in [its] own right," requires recourse to another item (a presupposed element) (Halliday and Hasan 1976: 31). It is "the most applied cohesive device in texts" (Koch 2001: 6). In English, reference is achieved through pronouns, demonstratives, the definite article, comparatives, and such adverbs as "here," "there," "now," and "then."

Depending on whether the presupposed element occurs within the text or outside of it, reference is endophoric (textual) or exophoric (situational). As Thompson puts it, exophoric references point "outwards to the world," while endophoric references point "inwards to the text" (1996: 149). Exophoric references can refer to the context of the situation or discourse (e.g., the use of first- and second-person pronouns), to assumed cultural knowledge (e.g., "the current president of the United States"), or to another artistic work. Exophoric (situational) references are excluded from the scope of the present work, since this study uses automated corpus analysis tools and thus can only focus on overt grammatical relationships within a given text that can be detected by a computer program.

Endophoric, or textual, references are among the main cohesive phenomena included in the present research. They may be anaphoric or cataphoric (Halliday and Hasan 1976: 33): in the case of an anaphoric reference, the presupposed item precedes the presupposing item, while in the case of a cataphoric reference, it follows the presupposing item. Koch observes that in written discourse, "anaphoric reference is more often used than cataphoric reference" (2001: 4).

Halliday and Hasan further distinguish reference as being personal, demonstrative, or comparative (1976: 37). Personal reference is "reference by means of function in the speech situation" (37) and is created with the help of personal pronouns, possessive determiners, and possessive pronouns. As mentioned above, the first and the second person personal pronouns are often exophoric (situational) and refer to discourse participants outside of the text (speaker, addressee(s), writer, or reader(s)) (51). Third person pronouns tend to be endophoric and refer to something immediately within the text, and are thus cohesive (51). Thus, only third-person pronouns are included in the scope of this research.
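The restriction to third-person forms can be expressed directly in a frequency count. The sketch below counts reference items by Halliday and Hasan's three types and normalizes per 1,000 words; the item lists are abbreviated assumptions. The definite article, which Halliday and Hasan also treat under demonstrative reference, is left out to keep the example small. A count like this cannot tell endophoric from exophoric "it," or demonstrative "that" from relative "that," so automated counts of this kind need manual checking.

```python
import re
from collections import Counter

# Abbreviated item lists for Halliday and Hasan's three reference types.
# Personal reference is limited to third-person forms, as in this study;
# the demonstrative and comparative lists are illustrative assumptions.
REFERENCE_ITEMS = {
    "personal": {"he", "him", "his", "she", "her", "hers",
                 "it", "its", "they", "them", "their", "theirs"},
    "demonstrative": {"this", "these", "that", "those", "here", "there"},
    "comparative": {"same", "similar", "different", "other", "such"},
}


def reference_per_thousand(text):
    """Frequency of each reference type per 1,000 running words."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return {rtype: 0.0 for rtype in REFERENCE_ITEMS}
    counts = Counter()
    for w in words:
        for rtype, items in REFERENCE_ITEMS.items():
            if w in items:
                counts[rtype] += 1
    return {rtype: counts[rtype] * 1000 / len(words) for rtype in REFERENCE_ITEMS}


print(reference_per_thousand("He took his hat. It was the same hat that she wore."))
```

In the sample sentence, "that" is a relative pronoun rather than a demonstrative, so the demonstrative figure of about 83.3 per 1,000 words is an overcount, exactly the kind of error the caveat above describes.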

Personal pronouns seem to be common cohesive devices in both English and Russian. They help writers avoid unnecessary repetition. However, Russian is characterized by grammatical gender: even inanimate singular nouns carry a gender distinction (masculine, feminine, or neuter). For example, the noun цветок ('flower') is masculine and thus requires a masculine personal pronoun (он) when the writer resorts to reference. English inanimate objects do not carry a gender distinction (with very few exceptions, such as "ship," which is often referred to as "she"), and thus can be referenced by only one pronominal form when singular: "it."

Tables 2.1 and 2.2 illustrate nominative forms of personal pronouns for English and Russian. As mentioned above, only third person personal pronouns are included in this study.

The use of possessive pronouns is notably different in English and Russian. Possessive pronouns or other modifiers are often obligatory in English (e.g., "He took his hat off"; without "his," the sentence would be ungrammatical in English). In Russian, however, it is not only permissible but also more typical to have a noun without any explicit modifier (compare the Russian translation of the same sentence, "Он снял шляпу," with no modifier before the noun шляпа ('hat')).

Table 2.1 Personal pronouns in English

Person | Singular | Plural
First person | I | we
Second person | you | you
Third person | he, she, it | they

Table 2.2 Personal pronouns in Russian

Person | Singular | Plural
First person | я | мы
Second person | ты | вы
Third person | он, она, оно | они

Moreover, Russian has a special possessive pronoun, свой, which has no direct equivalent in English and can be rendered approximately as "(one's) own." It may be used with all persons but is especially common (and often obligatory) with the third person. Using another possessive pronoun in place of свой may change the meaning. For instance, "Он взял со стола свою шляпу" may be rendered in English as "He took his (his own) hat from the table" (note the use of свой in Russian where English has "his"). If свой is replaced with the Russian possessive pronoun его, the meaning changes: the Russian sentence then indicates that "he took his (someone else's) hat from the table."

Tables 2.3 and 2.4 illustrate nominative forms of English and Russian possessive pronouns. For Russian, only masculine forms are provided as examples.

Table 2.3 Possessive pronouns in English

Person | Singular | Plural
First person | my | our
Second person | your | your
Third person | his / her / its | their

Table 2.4 Possessive pronouns in Russian

Person | Singular | Plural
First person | мой | наш
Second person | твой | ваш
Third person | его / её / его | их
Reflexive (all persons) | свой | свои

Demonstrative reference is realized "by means of location, on a scale of proximity" in relation to the speaker (space) or the moment of speech (time) (Halliday and Hasan 1976: 37-38) and is "a form of verbal pointing" (57). In most cases, according to Koch, "demonstratives signal that something was just mentioned" (2001: 5). In terms of the semantics of demonstrative reference, Halliday and Hasan distinguish between the dimension of "near" and the dimension of "not near" (1976: 38). The adverbial demonstratives of place and time ("here," "there," "now," and "then") typically function as adjuncts in a clause and occur outside of a noun phrase (or a "nominal group," per Halliday and Hasan), while nominal demonstratives ("this," "these," "that," "those," and "the") tend to occur as elements of a nominal group (57-58). The English definite article may be described as "the most neutral item amongst the demonstratives" (Thompson 1996: 150); it "merely indicates that the item in question is specific and identifiable; that somewhere the information necessary for identifying it is recoverable" (Halliday and Hasan 1976: 71).


English demonstratives (excluding the definite article "the," which does not exist

in Russian) are summarized in Table 2.5, and Russian demonstratives are summarized in

Table 2.6.

Table 2.5 Demonstrative cohesive devices in English

Category | Pronoun, singular | Pronoun, plural | Adverb of place | Adverb of time
Near | this | these | here | now
Not-near | that | those | there | then

Table 2.6 Demonstrative cohesive devices in Russian

Category | Pronoun, singular (masculine / feminine / neuter) | Pronoun, plural | Adverb of place | Adverb of time
Near | этот / эта / это | эти | тут | сейчас
Not-near | тот / та / то | те | там | потом

The major differences between the English and Russian demonstratives include the following:

1) Russian demonstratives can be masculine, feminine, or neuter (этот мужчина ('this man'), эта женщина ('this woman'), этот цветок ('this flower'), это окно ('this window')).

2) Russian demonstrative pronouns are declined and have different forms in different grammatical cases.

3) Russian adverbs of place have different forms for location and direction (тут, 'here' as location, vs. сюда, 'here' as direction; там, 'there' as location, vs. туда, 'there' as direction).

4) Russian has no definite or indefinite articles; thus, definiteness has to be expressed by other means, such as word order or the explicit demonstrative pronouns listed above. This difference may influence translators working from Russian into English, since they have to introduce articles absent from the source text or replace Russian demonstratives with English definite articles.
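A word-list count of the English items in Table 2.5 can be sketched in the same way. The following Python sketch (the function name demonstrative_profile is hypothetical) tallies near and not-near demonstratives and, as in the table, leaves out the definite article. It assumes simple regular-expression tokenization and makes no attempt to distinguish demonstrative "that" from the relative or complementizer "that," or demonstrative "there" from existential "there," so it should be read as an upper bound rather than a count of cohesive ties.

```python
import re

# English demonstratives from Table 2.5, grouped by proximity.
# The definite article "the" is excluded, as in the table.
NEAR = {"this", "these", "here", "now"}
NOT_NEAR = {"that", "those", "there", "then"}


def demonstrative_profile(text):
    """Count near vs. not-near demonstratives (word-list matching only)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    near = sum(1 for t in tokens if t in NEAR)
    not_near = sum(1 for t in tokens if t in NOT_NEAR)
    return near, not_near


print(demonstrative_profile("This book is better than that one. Put it here, not there."))
# (2, 2)
```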

Comparative reference is realized with the help of identity, similarity, or difference (in the case of general comparison), or numerative and epithetic expressions

(in the case of particular comparison) (Halliday and Hasan 1976: 76-77). Any comparison "includes two things that are being compared; and any comparative attached to one entity or concept thus implies the existence of the other entity or concept"

(Thompson 1996: 149). Some examples of general comparison include the use of "same," "equal," "such," "similar," "other," "different," etc. Particular comparison is realized with the help of numerative comparative devices (e.g., "more," "fewer," "less") or epithetic comparatives (e.g., comparative adjectives, such as "better," "worse," "larger," "smaller," "more difficult," "less difficult," etc.).

Both Russian and English have morphological and analytical comparatives. In English, whether a morphological comparative is possible often depends on the number of syllables in the adjective (e.g., "nicer" is possible, while "beautifuller" is not); in Russian, this is not the case. English morphological comparatives can be used as attributes ("a nicer dress"), whereas Russian morphological comparatives cannot: Russian does not allow such expressions as "Это красивее платье" ('This is a prettier dress'), and the analytical form (Это более красивое платье) is used instead.

Although a detailed analysis of the differences and similarities between English and Russian is outside the scope of the present research, such examples illustrate that the differences are substantial enough to pose a potential challenge for translators.

General and particular comparison in English and Russian are illustrated in Tables

2.7 and 2.8. These tables represent re-arranged versions of the corresponding tables in

Halliday and Hasan (1976: 76) and Simmons (1981: 66), respectively. For Russian, only the singular masculine form is provided for adjectives and determiners that change their form according to the number and gender of the corresponding nouns. Both adjectival and adverbial forms are provided in the same cell, in keeping with Halliday and Hasan's approach.

Table 2.7 Comparison in English

General comparison
Identity | same, equal, identical; identically
Similarity | such, similar; so, similarly, likewise
Difference | other, different, else; differently, otherwise

Particular comparison
Numerative (quantity) | more, fewer, less, further, additional; so-, as-, equally- + quantifier (e.g., so many)
Epithet (quality) | comparative adjectives & adverbs (e.g., better); so-, as-, more-, less-, equally- + comparative adj. & adverbs (e.g., equally good)


Table 2.8 Comparison in Russian

General comparison
Identity | тот же самый, одинаковый; равно, то же самое
Similarity | похожий, подобный, сходный; так, подобно, похоже, сходно
Difference | разный, различный, другой, иной; разнообразно, различно, по-другому, иначе

Particular comparison
Numerative (quantity) | больше, меньше, еще; так + quantifier (e.g., так много)
Epithet (quality) | лучший, худший, compound comparative adjectives, такой + adjective; так + adverb; лучше, хуже, заранее, позже, etc.

2.2.7.1.2 Substitution

Substitution is the second source of grammatical cohesion distinguished by Halliday and

Hasan. It is defined as "the replacement of one item by another" (1976: 88). Halliday and

Hasan devote a considerable amount of attention to various instances of substitution in

English (88-141). They also point out differences between substitution and other sources

of cohesion, such as reference and ellipsis (see next section). They emphasize that

substitution is "a relation between linguistic items, such as words and phrases; whereas

reference is a relation between meanings" (89).

Substitution in English can be nominal (achieved by the use of "one/ones" or "the same" in place of a noun phrase, as in "We have no coal fires; only wood ones"), verbal (realized with the help of "do"/"did" in place of a verb, as in "Never a woman in Windsor knows more of Anne's mind than I do," a line from Shakespeare's The Merry Wives of Windsor quoted by Halliday and Hasan), and clausal (realized through the use of "so" and "not" when they replace an entire clause, as in "'Is there going to be an earthquake?' 'It says so'") (Halliday and Hasan 1976: 91-130).

According to Simmons, substitution is "not a viable category of cohesion in Russian" (1981: 64). Moreover, because this study centers on the usefulness of automated tools and substitution is not readily tractable with such tools, substitution is outside the scope of this study.

2.2.7.1.3 Ellipsis

Ellipsis, the third source of grammatical cohesion in English per Halliday and Hasan, is similar to substitution: "it can be defined as substitution by zero" (1976: 89). It occurs when an item is omitted and nothing is put in its place. In the case of ellipsis, "there is a presupposition, in the structure, that something is to be supplied, or 'understood,'" and a sense of incompleteness is present (144). In most cases, according to Halliday and Hasan, "the presupposed item is present in the preceding text" (144).

Halliday and Hasan distinguish between three types of ellipsis: nominal, verbal, and clausal (cf. the types of substitution). Nominal ellipsis occurs within nominal groups, when a modifier in such a group replaces the noun and starts functioning as the head

(Halliday and Hasan's example from Lewis Carroll is "Four other Oysters followed them,

and yet another four" (148)). Verbal ellipsis occurs within a verbal group (e.g., "'Have you been swimming?' 'Yes, I have,'" in which "been swimming," part of the present perfect continuous form "have been swimming," is elided (167)). Finally, clausal ellipsis occurs when a modal or propositional element of a clause is elided.

According to Halliday and Hasan, a modal element of a clause includes the subject and

the finite element of the verb phrase, while a propositional element includes the

remaining parts of the verb phrase, as well as objects, complements, or adjuncts (197).

For instance, in "'What was the Duke going to do?' 'Plant a row of poplars in the park,'"

the modal element is omitted; while in "'Who was going to plant a row of poplars in the

park?' 'The Duke was,'" the propositional element is omitted (197-198).

While Halliday and Hasan emphasize the difference between ellipsis and substitution, some researchers treat them as a single category. For instance, Thompson describes two types of ellipsis: ellipsis proper ("a gap") and substitution (where a gap is filled with "a substitute form") (1996: 153). This difference in categorization has little bearing on the present research; it merely illustrates the perennial tendency of scholars either to "lump" or to "split" categories and subcategories.

Russian is rich in ellipsis. Comparing the use of ellipsis in Russian-to-English translations (or the reverse), however, lies outside the goals of this study. While this study examines the applicability of a computer-based approach to the study of many cohesive devices, the study of ellipsis would likely require considerable qualitative analysis.

2.2.7.1.4 Conjunction

Conjunction is yet another cohesive device discussed by Halliday and Hasan, and is quite different in nature from reference, substitution, or ellipsis. According to Halliday and

Hasan, the cohesive function of conjunctions is indirect and is realized "by virtue of their

specific meanings," which "presupposes the presence of other components in the

discourse" (226). Conjunctions relate "linguistic elements that occur together in

succession," thus creating ties between segments of text (Halliday and Hasan 1976: 227).

They combine "any two textual elements into a potentially coherent complex semantic unit" (Thompson 1996: 156).

While reference, substitution, and ellipsis are "clearly grammatical" because they involve closed systems (e.g., such systems as those of person, number, proximity, degree of comparison, or presence/absence), conjunction is "on the border-line of the grammatical and lexical" (303). As Halliday and Hasan point out, "the set of conjunctive elements can probably be interpreted grammatically in terms of systems, but … some conjunctive expressions involve lexical selection as well, e.g., 'moment' in 'from that moment on'" (303-304).

Hoey (1991a), among others, suggests discounting conjunction as a cohesive tie on the grounds of "its quite different function in text formation" (9), pointing out that it is "better treated as part of a larger system of semantic relations between clauses" (5). Nevertheless, conjunction is included in the present research, since it is still part of textual organization and may yield interesting results.

In Halliday and Hasan's Cohesion in English, four types of conjunction are identified: additive, adversative, causal, and temporal (238). Additive conjunction is "a generalized semantic relationship in the text-forming component of the semantic system" (234), more commonly known as coordinating conjunction, such as that realized with "and" or "besides." Adversative conjunction expresses a relation "contrary to expectation" (250); examples include "but," "yet," "however," "(al)though," and "nevertheless." Causal conjunction expresses a cause-effect relation (of reason, result, or purpose) and is realized by such items as "so," "thus," "hence," and "therefore" (256). Lastly, temporal conjunction is a relation of sequence in time, in which one element in a text "is subsequent to the other" (261); examples include "then," "thereupon," and "later."
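As a rough illustration of how these four categories could be operationalized in an automated count, the Python sketch below assigns tokens to types using only the example items listed above. The dictionary and function name are hypothetical; multi-word connectives are ignored, and the count separates neither sentence-linking uses (which Halliday and Hasan treat as cohesive) from structural coordination within a sentence, nor the connective from the non-connective senses of ambiguous items such as "so" or "yet."

```python
import re

# Example items for Halliday and Hasan's four types, taken from the text above.
# Single words only; multi-word connectives are not handled (assumption).
CONJUNCTIONS = {
    "additive": {"and", "besides"},
    "adversative": {"but", "yet", "however", "though", "although", "nevertheless"},
    "causal": {"so", "thus", "hence", "therefore"},
    "temporal": {"then", "thereupon", "later"},
}


def conjunction_profile(text):
    """Count tokens belonging to each conjunction type."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {kind: sum(1 for t in tokens if t in items)
            for kind, items in CONJUNCTIONS.items()}


print(conjunction_profile("He was tired, but he kept going. Later, he rested, and then he slept."))
# {'additive': 1, 'adversative': 1, 'causal': 0, 'temporal': 2}
```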

Conjunctions are abundant in both English and Russian. Simmons suggests accepting Halliday and Hasan's categorization of conjunction for the Russian language

(1981: 69). As examples, Simmons lists the Russian и ('and') as additive conjunction,

однако ('however') as adversative conjunction, так ('so') as causal conjunction, and

потом ('then') as temporal conjunction. Since the present study concentrates on English

texts only, further analysis of conjunction in Russian is not relevant.


2.2.7.2 Lexical cohesion

Unlike grammatical cohesion, lexical cohesion is realized through lexis, or vocabulary

(Halliday and Hasan 1976: 318). Since the present study focuses on the analysis of grammatical cohesion (including conjunction), the following overview of lexical cohesion is tangential, and therefore will be kept brief. That said, lexical cohesion is a rich topic worthy of its own study, and may be tractable with corpus tools. It is a tempting target for future studies.

Halliday and Hasan differentiate between two aspects of lexical cohesion— reiteration ("the repetition of a lexical item, or the occurrence of a synonym of some kind, in the context of reference; that is, where the two occurrences have the same referent") and collocation (the use of "a word that is in some way associated with another word in the preceding text, because it is a direct repetition of it, or is in some sense synonymous with it, or tends to occur in the same lexical environment") (1976: 318-319). In Halliday and Hasan's framework, collocations point to some semantic relationships and include, for example, superordinates, hyponyms, and antonyms.

Cohesive repetitions do not necessarily have to be lexical. Repetitions occur at the clause/sentence level as well. Some scholars (e.g., Gutwinski 1976 and Gleason 1965) use the term "enation" for the repetitions of syntactically similar sentences. Gleason states that two sentences are "enate" if "they have identical structures, that is, if the elements

(say, words) at equivalent places in the sentences are of the same classes, and if constructions in which they occur are the same" (1965: 199). Enkvist, for his part, terms the repetition of syntactically and phonologically similar clauses and sentences "iconic linkage" (1973). Other scholars (e.g., Quirk et al. 1972 and James 1983) use such terms as "formal parallelism" or "structural parallelism."

Gutwinski points out that the cohesive function of enation (or iconic linkage, or

structural parallelism) may be reinforced by lexical cohesion, as well as other features of

grammatical cohesion (1976). He also points out that syntactic similarity, or enation, may

be complete or partial.
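Gleason's definition lends itself to a simple operational test if each sentence is represented as a sequence of word-class labels. The Python sketch below is one possible reading, under stated assumptions: the tag labels are hypothetical and would have to come from a part-of-speech tagger, only positions are compared (Gleason's requirement that the constructions also be the same is not checked), and Gutwinski's notion of partial enation is approximated by the share of matching positions.

```python
def enation(classes_a, classes_b):
    """Compare two sentences represented as sequences of word-class labels.

    Returns ("complete", 1.0) when both sequences are identical,
    ("partial", share) when only some positions match, and ("none", 0.0)
    otherwise. `share` is the number of matching positions divided by the
    length of the longer sentence.
    """
    longest = max(len(classes_a), len(classes_b))
    if longest == 0:
        return ("none", 0.0)
    matches = sum(1 for a, b in zip(classes_a, classes_b) if a == b)
    if matches == longest:
        return ("complete", 1.0)
    if matches == 0:
        return ("none", 0.0)
    return ("partial", matches / longest)


# "The committee approved the plan." / "The board rejected the proposal."
print(enation(["DET", "NOUN", "VERB", "DET", "NOUN"],
              ["DET", "NOUN", "VERB", "DET", "NOUN"]))
# ('complete', 1.0)
```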

It should be noted that the concept of collocation defined by Halliday and Hasan

in their 1976 work differs from the one widely used in corpus linguistics, which dates

back to the works of Firth (1957) and Sinclair (1966), with Firth's famous quote "You

shall know a word by the company it keeps" (11). In corpus linguistics, collocations are

defined as "characteristic co-occurrence patterns of words," or, simply put, "words that

'go together' or words that are often 'found in each other's company'" (Bowker and

Pearson 2002: 32). In this interpretation, they are of major interest to corpus linguistics, since they represent "one type of word behavior that can be identified with the help of a corpus" (32). As Hoey (1999) puts it, collocation "as the relationship a lexical item has with items that appear with greater than random probability in its (textual) context … is in principle statistically demonstrable (as long as one processes enough text)" (7-8).
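Hoey's point that collocation is "in principle statistically demonstrable" can be illustrated with pointwise mutual information (PMI), one widely used association measure, swapped in here for illustration rather than taken from Hoey or from Bowker and Pearson. The Python sketch counts words within a fixed window around a node word; the function name, the window size, and the minimum-frequency cutoff (needed because PMI overrates rare pairs) are all assumptions.

```python
import math
from collections import Counter


def pmi_collocates(tokens, node, window=2, min_count=2):
    """Rank words co-occurring with `node` within +/- `window` tokens by PMI.

    Uses the simplified form PMI = log2(f(node, w) * N / (f(node) * f(w))),
    where N is the number of tokens; some tools also adjust for window size.
    """
    n = len(tokens)
    freq = Counter(tokens)
    co = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                co[tokens[j]] += 1
    scores = [(w, math.log2(c * n / (freq[node] * freq[w])))
              for w, c in co.items() if c >= min_count]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


tokens = "strong tea and strong tea and weak coffee".split()
print(pmi_collocates(tokens, "strong", window=1))
# [('tea', 2.0)]
```

On a real corpus, the counts would of course be drawn from far more text, in line with Hoey's caveat "as long as one processes enough text."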

In her chapter in Understanding Reading Comprehension (1984), Hasan re-works the earlier model of lexical cohesion from 1976, developing such sources as repetition, synonymy, antonymy, hyponymy, and meronymy. Among other researchers who devoted


attention to lexical cohesion were Winter (1977), Francis (1985), Hoey (1991a),

McCarthy (1991), Martin (1992, 2001), and Matthiessen (together with Halliday, 1999).

Francis (1985) focused her attention on anaphoric nouns; Hoey (1991a) related "lexical

patterning to how lexical cohesion operates over larger stretches of text"; McCarthy

(1991) discussed lexical cohesion and discourse-organizing words; Winter (1977) focused on the anaphoric function of lexis, while Halliday and Matthiessen (1999) continued developing an ideational semantics (Lexical Cohesion and Corpus Linguistics

2006: 2). Martin combined Hasan's categories with Halliday and Hasan's earlier model and integrated the ideas of some of the other scholars mentioned above, proposing a

modular perspective for analyzing cohesion within a broader framework for analyzing discourse (Martin's contribution to The Handbook of Discourse Analysis, 2001). In addition, collocation was "factored out into various kinds of 'nuclear' relation, involving elaboration, extension, and enhancement (as developed by Halliday 1994 for the clause complex)" (2001: 38). Martin's term for lexical relations deployed "to construe institutional activity" was "ideation" (38).

Halliday and Hasan's treatment of lexical cohesion is considered insufficient by some linguists. For instance, in their introduction to Lexical Cohesion and Corpus

Linguistics, Flowerdew and Mahlberg (2009) point out that in Halliday and Hasan's

detailed description of different sources of cohesion in English, they give the shortest

treatment to lexical cohesion ("less than twenty pages," as Hoey (1991a) notes). In her

chapter in that volume, Mahlberg states that lexical cohesion should be assigned a more central role in linguistic research (2009: 105). She is not the first to take this stance.

For instance, Stotsky (1983) asserts that Halliday and Hasan's analysis does not

adequately describe the types of lexical ties found in written texts and proposes dividing

lexical cohesion into semantically-related words (e.g., words related by repetition,

synonymy, inclusion into an enumeration or a set) and collocationally-related words

(related through co-occurrence in contexts).

Hoey, who devoted a considerable amount of attention to lexical cohesion, observes that lexical cohesion "is the single most important form of cohesive tie" (1991a:

9). He illustrates this point with Halliday and Hasan's own sample analysis, in which lexical cohesion accounts for over forty percent of the ties they identify (1991a: 9). According to

Hoey, "[l]exical cohesion is the only type of cohesion that regularly forms multiple relationships" between textual elements (1991a: 10). Hoey argues that the study of cohesion in texts is much more about lexis than previously believed: "the study of cohesion in text is to a considerable degree the study of patterns of lexis in text" (1991a:

10). In his Patterns of Lexis in Texts (1991a), Hoey provides a more detailed treatment of lexical repetition than do Halliday and Hasan.

Some scholars, such as Mahlberg (2009), argue for a different approach to studying cohesion categories, on the grounds that we should not assume that "lexical and grammatical phenomena can be clearly distinguished" (103). Mahlberg calls for "a corpus theoretical approach" to the description of cohesion, in which cohesion is seen in a new light: "cohesion is created by interlocking lexico-grammatical patterns and overlapping lexical items" (103). However, since the classification established by Halliday and Hasan in 1976 has been far more widely applied, the present study uses it as its research framework.

2.2.8 Other global textual features

In addition to studying cohesive devices in translated and non-translated texts, the present

study looks at other features that characterize texts globally. The additional features are

included following Dong and Lan's model for translation evaluation (2010). In their

model, Dong and Lan combine the categories of grammatical cohesion set forth by

Halliday and Hasan (1976) with Campbell's (1998) categories for evaluating textual

competence of translators. In his research on translation into a second language,

Campbell suggests that textual competence of translators may be empirically studied by

measuring such textual features of target texts as nominalization, average word length,

lexical diversity, passives, and prepositional phrases. The additional parameter of average

sentence length is also included in the framework, since some scholars have found that

translated and non-translated texts exhibit significant differences in average sentence

length (Laviosa 1998a, 1998b).

The rationale for including other textual features in the present analysis, combining them with Halliday and Hasan's grammatical cohesion categories, is threefold. First, it makes the research framework more comprehensive, eliciting additional empirical data from the corpus. Second, this inclusion may allow the findings to be more closely related to studies in translation competence and translation pedagogy. The results of this study may benefit researchers in these fields by providing benchmark data for translated and non-translated texts, especially for the Russian into English language pair, and by informing the design of translation pedagogy and evaluation tools. Third, Campbell himself termed his exclusion of cohesion from his translator competence model "a notable omission" (1998: 159).

2.2.8.1 Nominalization

Halliday (1994) broadly defines nominalization as a process "whereby any element or group of elements is made to function as a nominal group in the clause" (41). Following Dong and Lan's research, the present study examines a narrower aspect of nominalization that is tractable with Writer's Workbench: its nominalization count includes only noun forms of verbs (Writer's Workbench Manual).
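Writer's Workbench's own rules are not reproduced here. As a stand-in, the following Python sketch flags candidate noun forms of verbs by suffix alone; the suffix list, the length cutoff, and the function name are assumptions. Such a heuristic both misses nominalizations without these endings (e.g., "failure") and admits nouns that are not derived from verbs (e.g., "station"), so it only approximates the measure described above.

```python
import re

# Suffixes that often mark deverbal nouns (assumption; not Writer's Workbench's rules).
SUFFIXES = ("tion", "sion", "ment", "ance", "ence")


def nominalization_candidates(text, min_len=7):
    """Return tokens that look like noun forms of verbs, judged by suffix alone.

    The length cutoff skips short words such as "nation" or "moment".
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) >= min_len and t.endswith(SUFFIXES)]


print(nominalization_candidates(
    "The implementation of the agreement required careful consideration."))
# ['implementation', 'agreement', 'consideration']
```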

When translating between languages, especially when they are as structurally different as English and Russian (with English being an analytical language, and Russian, a synthetic one), translators might use nominalization or de-nominalization as a strategy.

Differences in the use of nominalization across languages might influence the process and product of translation. In addition, the use of nominalization is predicted to vary across genres.

Some linguists study nominalization as a feature that can be exploited ideologically by writers and editors. For instance, Fowler (1991) suggests that


nominalization "offers substantial ideological opportunities" (80), since it allows for

more information to go unexpressed and thus permits the "habits of concealment,

particularly in the areas of power-relations and writers' attitudes."

2.2.8.2 Lexical density

Texts can be described in terms of their lexical density. Type-token ratio is a measure of lexical density calculated by dividing the number of unique word forms in a text (types) by the total number of words (tokens). A high type-token ratio indicates that a text contains many different lexical items, so a higher proportion of its words have a specific meaning (Westin 2002: 77). A low ratio "shows that few specific words are used while the more general ones are frequent" (ibid.).
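The calculation is straightforward, as the following Python sketch shows. Because raw type-token ratio tends to fall as a text grows longer, texts of different lengths are often compared on a fixed-size basis; the standardized variant below averages the ratio over consecutive 1,000-token chunks. The function names, the regular-expression tokenizer, and the decision to drop a trailing partial chunk are assumptions made for illustration.

```python
import re


def tokens_of(text):
    """Lowercase word tokens (a simple regex tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def standardized_ttr(tokens, chunk=1000):
    """Mean TTR over consecutive full chunks of `chunk` tokens.

    A trailing partial chunk is dropped; texts shorter than one chunk
    fall back to the raw ratio.
    """
    ratios = [type_token_ratio(tokens[i:i + chunk])
              for i in range(0, len(tokens) - chunk + 1, chunk)]
    return sum(ratios) / len(ratios) if ratios else type_token_ratio(tokens)


toks = tokens_of("The cat saw the dog and the dog saw the cat")
print(len(set(toks)), len(toks), round(type_token_ratio(toks), 3))
# 5 11 0.455
```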

Westin notes that type-token ratio was frequently used by psycholinguists from the 1960s and 1970s onward (e.g., Biber 1988) in studies of differences between oral and written genres (77). Written genres tend to have a higher type-token ratio than spoken genres, possibly due to differences in the circumstances of production (Chafe and Danielewicz 1987).

2.2.8.3 Average word length

Average word length is another textual feature that has been used as a measure of lexical

diversity, and "is included into several norm-based writing tests and in readability

formulas" (Troia 2009: 365). According to Troia, "[t]he measure of word length,


operationalized as the number of letters in a word, capitalizes on the fact that word length

and frequency are inversely related; as length increases, frequency decreases" (365). For

this reason, he notes the use of "lower frequency words as a developmental measure of

more mature writing" (365).
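Operationalized as Troia describes, average word length is simply the mean number of letters per word. The short Python sketch below assumes that words are runs of Latin letters, so digits and punctuation are ignored and a contraction such as "don't" is split into two words; the function name is hypothetical.

```python
import re


def average_word_length(text):
    """Mean number of letters per word; digits and punctuation are ignored."""
    words = re.findall(r"[A-Za-z]+", text)
    return sum(len(w) for w in words) / len(words) if words else 0.0


print(average_word_length("A big elephant."))
# 4.0
```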

Average word length is a component of common readability formulas, including the Flesch Readability Test, which counts word length in syllables, and the Bormuth Mean Cloze formula adopted by the College Entrance Examination Board, which counts it in letters. A comprehensive overview of readability formulas can be found in DuBay's 2004 white paper.

2.2.8.4 Average sentence length

Sentence length is a measure that is often included in style guides and readability formulas (Flesch Readability Test, Fog Index, the Clear River Test, etc.). Flesch incorporated sentence length into his original "Reading Ease" readability index in the late 1940s. The present study employs average sentence length as an additional textual parameter since it has been shown to exhibit significant differences between translated and non-translated texts (Laviosa 1998a and 1998b).
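Both measures reduce to simple arithmetic over counts. The Python sketch below computes average sentence length by splitting at terminal punctuation (an assumption that fails on abbreviations such as "Dr.") and applies the standard Flesch Reading Ease formula, 206.835 - 1.015 × (words/sentences) - 84.6 × (syllables/words), to counts supplied by the caller, since automatic syllable counting is itself approximate. The function names are hypothetical.

```python
import re


def average_sentence_length(text):
    """Mean words per sentence, splitting sentences at '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    return len(words) / len(sentences) if sentences else 0.0


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease from raw counts; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


print(average_sentence_length("The study ended. Results were mixed, but clear."))
# 4.0
print(round(flesch_reading_ease(words=100, sentences=5, syllables=150), 3))
# 59.635
```

The sentence-length term enters the Flesch formula with a negative weight, so longer average sentences lower the score, consistent with the findings on sentence length and difficulty reported below.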

Studies of sentence length date back to the early 20th century. For instance, Kitson (1921) finds that average sentence length differs across English-language newspapers (e.g., sentences were shorter in the Chicago American than in the Post of his day). Vogel and Washburne (1928) include sentence length and prepositional phrases among the ten structural characteristics used in their empirical study of readability. In Gray and Leary's comprehensive study of readability (1935), longer sentences were found to be more difficult for readers; in fact, average sentence length was among the best predictors of textual difficulty. The study also showed a negative correlation between the number of prepositional phrases and readability.

2.2.8.5 Passives

Passives are an important characteristic of genres. The passive has received considerable attention in linguistics, and translation studies might benefit from looking at this category as well.

The basic discourse function of passives is "to switch the focus of action in the active

clause from subject/actor to object/patient" (Westin 2002: 118). Research suggests that

the frequency of passives varies between text-types. For instance, Svartvik (1966: 152-

154) finds that texts pertaining to science have the highest number of passives, followed by texts pertaining to news and arts, while advertising has the fewest. Westin finds that the use of agentless passives in English newspaper editorials has decreased over time

(2002: 120), which might indicate "a drift away from a language that Biber characterizes

as 'abstract, technical, and formal' (Biber 1988: 112-113)."
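Automated passive counts are usually heuristic. The Python sketch below, offered only as an illustration, flags a form of "be" followed (allowing one intervening word such as an adverb) by a word ending in "-ed" or a listed irregular participle, and labels the passive agentive if "by" follows shortly afterwards; this makes it possible to separate out agentless passives of the kind Westin examines. The word lists, window sizes, and function names are assumptions, and adjectival uses ("was tired") or words like "red" will produce false positives.

```python
import re

BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}
# A handful of irregular participles; a usable list would be much longer.
IRREGULAR = {"done", "made", "given", "taken", "written", "seen", "known",
             "found", "held", "shown", "built", "sent"}


def looks_like_participle(word):
    return word.endswith("ed") or word in IRREGULAR


def find_passives(text, max_gap=1, agent_window=4):
    """Flag 'be + (up to max_gap words) + participle' within each sentence.

    A passive counts as agentive if "by" follows within `agent_window` words.
    """
    found = []
    for sentence in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        used = set()  # participle positions already counted ("was being built")
        for i, tok in enumerate(tokens):
            if tok not in BE_FORMS:
                continue
            for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
                if j not in used and looks_like_participle(tokens[j]):
                    used.add(j)
                    has_agent = "by" in tokens[j + 1:j + 1 + agent_window]
                    found.append((tokens[j], "agentive" if has_agent else "agentless"))
                    break
    return found


print(find_passives("The report was written by the committee. The decision was made "
                    "quickly. Prices were raised. The house was being built."))
# [('written', 'agentive'), ('made', 'agentless'), ('raised', 'agentless'), ('built', 'agentless')]
```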

The use of passives has also been linked to ideological purposes. For example,

Kress (1983: 127-128) emphasizes that passives allow for focusing on the affected entity and representing the action as an attribute of the affected entity while omitting the agent

of the sentence.


2.2.8.6 Prepositional phrases

Westin (2002) suggests that the use of prepositional phrases is "an effective way of

packing high amounts of information into idea units and of expanding the size of them,"

and so this feature is often found in informational discourse (71). In Russian, many

relations that would require a preposition in English may be expressed synthetically (e.g.,

"United States of America" is rendered in Russian as "Соединенные Штаты Америки,"

with the of-relation expressed synthetically with the help of the genitive case). For this

reason, translators working from Russian into English often have to introduce

prepositional phrases where there were none in the source text, which might be reflected

in the number of prepositional phrases in translated texts.
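Since each prepositional phrase is headed by a preposition, a preposition count per 1,000 words can serve as a rough proxy for the density of prepositional phrases when comparing translated and non-translated texts. The Python sketch below uses a small, hypothetical list of prepositions and omits "to" because it also marks infinitives; particles and subordinating uses (e.g., "for" as a conjunction) will still inflate the count.

```python
import re

# Illustrative subset of single-word English prepositions (assumption).
PREPOSITIONS = {"of", "in", "on", "at", "by", "for", "with", "from",
                "about", "into", "over", "under", "between", "through", "during"}


def prepositions_per_1000(text):
    """Prepositions per 1,000 tokens, as a proxy for prepositional phrases."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = sum(1 for t in tokens if t in PREPOSITIONS)
    return 1000 * hits / len(tokens) if tokens else 0.0


print(prepositions_per_1000("The capital of the United States of America is in the east."))
# 250.0
```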

The frequency of prepositional phrases varies across text-types. In their study of

conversations, lectures, personal letters, and academic writing, Chafe and Danielewicz

(1987) find that prepositional phrases are frequent in all of these genres, while being

unusually frequent in academic writing. Biber's corpus study (1988) supports these

results.

2.2.9 Cohesion and other global textual features as a language- and culture-specific phenomenon

One of the main reasons why cohesion and other textual features pose problems for translators is their language- and culture-specificity (e.g., Hatim and Mason 1990, Baker

1992, Blum-Kulka 2000, Teich and Fankhauser 2004). As Blum-Kulka points out,


cohesive relationships in texts are linked to a language's grammatical system, as well as

to stylistic preferences for types of cohesive markers in each language involved in

translation (299-300).

In different cultures and languages, readers may have different background

knowledge of the world and particular subjects, and " … force us to draw upon

all we know about our culture, language, and world" (Everett 1992: 19). For this reason,

as Mona Baker points out, even a "simple cohesive relation of co-reference cannot be

recognized … if it does not fit in with a reader's prior knowledge of the world" (1992:

220). Baker analyzes an extract from A from Zero, pointing out that in it, there are no explicit cohesive ties between "Harrods" and "the splendid Knightsbridge store."

While a British reader, in this case, would not need an explicit cohesive tie between the two phrases due to his/her background knowledge (most British readers are familiar with the famous store and know that it is in Knightsbridge), a reader in a different culture might not be able to make such a connection without the translator explicitating cohesive ties between "Harrods" and "the splendid Knightsbridge store" (Baker 1992: 220-221).

Cross-cultural and cross-linguistic differences also lie in the structures of the languages involved in translation. In a small study of cohesion and coherence in one

Chinese text and its English translation, Yeh (2004) finds that "cohesion of a text in

English is constituted by reference items, such as 'he' or 'they,' while … in Chinese, [it] might be realized by the existence of a topic chain" (249). Lonsdale observes that because English makes very few overt distinctions in terms of gender, number, or subject-verb agreement, translators should pay special attention to making references clear in an English-language target text (1996: 219). In her study of Bulgarian

and English legal texts, Yankova (2006) finds a higher incidence of lexical cohesion in English texts than in Bulgarian texts, a difference that might be expected to influence translators' decision-making. She also reports a higher number of synonyms and collocations in Bulgarian texts (although this finding was not statistically significant). In

her study of English and Norwegian originals and translations, Hasselgård finds evidence

suggesting that English needs explicit markers of cohesion more often than Norwegian,

which may be explained by the fact that "the word order in Norwegian is more flexible

than that of English, particularly in sentence-initial position" (1997: 16).

The language- and culture-specificity of cohesion and other features of text organization inevitably influences the process and product of translation (Johansson and

Hofland 2000). Toury, when discussing his translation law of interference, mentions that

“phenomena pertaining to the make-up of the source text tend to be transferred to the target text” (1995: 275). Pym, in his article on Toury's translation laws, explains this

"make-up" as "a set of segmentational and macrostructural features" (2007: 7), which would include cohesive devices and other features of text organization that are often transferred from the source to the target text. In European languages, Pym notes, there is

"a default norm by which one translates sentence for sentence, paragraph for paragraph, text section for text section" (2007: 7). This default behavior, Pym posits, may explain what Toury describes as "interference."


Non-interference, in its turn, "necessitates special conditions and/or special efforts

on the translator's part" (Toury 1995: 275). These special efforts would be targeted at

changing global textual features, such as cohesive devices, sentencing, or paragraphing.

According to Pym, translators usually "need a very good reason to change a

paragraph break" (2007: 7). Elisabeth Le notes that "[t]ranslators are torn between the

apparent need to respect sentence and paragraph boundaries and the risk of sounding

unnatural in the target language" (2004: 267). Some empirical studies, for example, Baer

and Bystrova-McIntyre (2009), suggest that such global textual features as sentencing

and paragraphing are different across languages, as well as across the text-types included in their studies.

All this, again, stresses that translation inevitably involves changes, or shifts (to use Blum-Kulka's term), at a global level that should take into account the cultural and linguistic specificity of cohesion and other features of text organization. Lonsdale, in her textbook on teaching translation from Spanish into English, stresses that "different languages use cohesive devices (reference, substitution, conjunction, lexical and syntactic cohesion, chunking of information in sentences and paragraphs) differently," and that translators must consider adjusting cohesive devices from the source language to the target language (1996: 215). Without such adjustments, translations may sound unnatural to the target reader.

The present study seeks to illuminate textual shifts relevant to English translators who translate Russian texts themselves, edit translations done by others, or post-edit MT output. It establishes target-language benchmarks by analyzing English non-translated texts, and then compares them with the results of analyses of texts translated from Russian into English by humans and by machines. It also addresses the text-type specificity of cohesion and other features of text organization.

2.3 Studies of cohesion and other global textual features

This section starts with an overview of linguistic studies of cohesion in monolingual settings, including the contexts of writing evaluation and pedagogy, text comprehension, and genre and text-type specificity of cohesion categories. These studies are included even though they do not address translation directly, because translation studies can benefit greatly from employing the findings and methodologies of linguistics, sociolinguistics, discourse analysis, and other disciplines. The section then proceeds to cover studies of cohesion in the context of translation, discussing the difficulties of such research and reporting on its achievements.

2.3.1 Studies of cohesion in the context of teaching and evaluating writing

Cohesion is an important and fascinating research topic in the context of teaching and evaluating writing. As discussed above, cohesion is an integral part of creating texts.

Still, some scholars point out the need to incorporate cohesion and coherence into writing textbooks in a more detailed and systematic way. For instance, Kolln informally examined fifteen randomly selected books on writing (five for high school juniors and seniors, ten for college freshmen) and found that three (all college level) had no index entry for "coherence," and only one (also college level) included the word "cohesion" as an entry (1999: 94). Kolln also points out that textbooks on writing often cover only parts of the concept of cohesion. For example, while transition devices are mentioned in all fifteen books she looked at, only a few present parallelism and repetition in a positive light, and only one includes a detailed framework of cohesion as outlined by Halliday and Hasan (95). While

Kolln's analysis may not be representative in a scholarly sense, it suggests that more attention should be given to cohesion and coherence in writing (and, consequently, in translation, since the latter involves writing as its major component).

Meanwhile, Chiang (2003) finds that the features of cohesion are considered an important predictor of writing quality by both native and non-native writers of English who rated foreign language writing samples using 20 discourse and grammatical features.

In fact, "all except three of the 30 raters based their perceptions of overall quality [of writing samples] primarily on either of the two discourse features: coherence and cohesion" (471). Of the four areas of evaluation used in this study (coherence, cohesion, syntax, and morphology), "regression analysis showed that … cohesion was the best predictor of writing quality of all the four areas" (471).

In the context of writing, cohesion is often examined in relation to the perceived quality of writing. For instance, Witte and Faigley (1981) examine cohesion in low- and high-rated essays and report a significant difference in the frequency of cohesive ties, finding that in low-graded essays, 20.4 percent of all the words contributed to cohesion, while in high-graded essays, the number was significantly higher—31 percent.


The number of ties, however, may not be the most representative feature of successful cohesion: "Quantity is not necessarily quality" (a piece of common wisdom cited in Kolln 1999: 96). Some researchers find no significant differences in the number of cohesive ties across differently rated writing samples. For instance, Tierney and

Mosenthal (1983) report that there is no causal relationship between propositional measures of cohesive ties within topic and coherence rankings within topic (215-229).

Hasan (1984), when counting the number of cohesive ties in children's writing and measuring coherence using readers' responses, also finds no easy correlation between the two. Irwin (1986), in her study of cohesion factors in children's textbooks, finds no significant differences in the total numbers of cohesive ties and connective concepts across the selected grade levels (64).

The number of cohesive ties proves more useful when examined across multiple dimensions. Many researchers report significant differences in the number of ties across text modes, as well as when more granular types of cohesive ties are differentiated. Crowhurst (1987), in her study of cohesive ties across grade levels and modes (narration and argument), finds that mode makes more difference than grade, with the highest proportion of ties (25.5 percent) found in narratives produced by sixth graders. In addition, Crowhurst's results indicate significant differences in the use of specific cohesive devices (e.g., the use of synonymy and collocation increased with age, which may be related to children's vocabularies expanding as they grow up) (1987: 192).

Irwin (1986), mentioned above, does find significant differences in terms of the frequency of shared arguments across main clauses, with lower-level textbooks displaying more cohesion than upper-level textbooks (1986: 64). This points to the usefulness of a more granular analysis of cohesive ties. For this reason, the present study examines cohesive ties in specific, granular categories, in texts of different production modes (translations vs. non-translated texts), and across several text-types.

Last but not least, a significant body of research on cohesion exists for experiments involving impaired participants (Cherney et al. 1998, Fine 1994, etc.).

2.3.2 Studies of cohesion in the context of text comprehension

Cohesion is closely linked to readability and comprehension. Horning (1991) emphasizes that "[s]tudies of cohesion in reading show that cohesion makes a substantial contribution to readability" (136). In one such study, Irwin (1980) measures reading time and recall for two versions of a passage that differ in the number of cohesive ties, with one version containing about twice as many ties as the other, and finds that an increased number of cohesive ties improves readers' comprehension.

Of course, comprehension depends not only on the properties of the text itself.

McCutchen and Perfetti (1982) point out that the ability of readers to create the sense of coherence in a text depends on their knowledge of the topic and textual conventions.

Charolles goes further and states that "[i]n the end, it all depends on the receiver, and his

[sic] ability to interpret indications present in the discourse" (1983: 95). Comprehension


also develops with practice and experience on the part of the reader. Chapman (1987), for

instance, finds that readers' awareness of cohesive ties improves over time.

The translation scholar Mona Baker brings all this together, stating that textual

coherence is "a result of the interaction between knowledge presented in the text and the

reader's own knowledge and experience of the world" (1992: 219). Many theorists, such as Barthes (1975), Eco (1979), Iser (1974), and Steiner (1975), have argued that, in the process of reading, the reader of a text becomes its producer rather than its receiver

(Bassnett-McGuire 1980: 79). The reader's experience and knowledge may be influenced by a number of factors, including age, sex, race, nationality, education, occupation, and political or religious affiliation (Baker 1992: 219).

Translators have to deal with text comprehension both as readers of the source text and as producers of the target text for readers whose background knowledge and experience may differ from those of source-text readers. Because cohesion is an important factor in text comprehension, it is highly relevant to translation studies.

2.3.3 Studies of cohesion as a genre-specific and text-type specific phenomenon

As noted earlier, research on cohesive ties suggests that their usage differs across genres and text-types. As Mahlberg emphasizes, "[c]ohesive links are genre-specific"

(2006: 107). Mahlberg goes on to explain that, for instance, narrative texts that "deal with a central character … can provide many examples of reference and chains of reference items," while newspaper articles, in contrast, may be "more likely candidates to illustrate


lexical relationships where sentences share three or more lexical links" of the type

discussed in Hoey (1991a), who describes different categories of lexical repetition (e.g.,

complex repetition involving words that share a lexical morpheme but are not identical,

such as "argue" and "argument") (2006: 107).

The genre-specificity of cohesive ties is supported by research. For example,

Smith and Frawley (1983) study the use of conjunctions in fiction, religious texts, journalism, and science. Their findings suggest that conjunctions used in fiction are more similar to those used in religious texts than to those used in journalism and science.

The conjunctions in science tend to be additives and hypotheticals that indicate causes

and logical succession. Verikaitė (2005) analyzes conjunctive discourse markers

(additive, adversative, causal, and temporal) in the genres of textbooks and scientific

research articles. According to her results, "the frequency of occurrence of a particular conjunction varied across the genres"; for example, causal and temporal conjunctions were used more frequently in the research articles, suggesting that the variance may be due to genre constraints (68).

It should be noted that cohesion and other textual features also differ across

media, or modes, of production—oral vs. written. For example, in oral discourse, we find

more repetition than in written discourse. Aaron (1998) explains this as follows: "in

written language there is a limit to how much repetition can be tolerated by readers" (3).

Dooley and Levinsohn (2001) point out differences in organization between written and oral discourse: for example, written texts tend to be more tightly organized, to introduce information at a faster pace, and to have longer groupings of sentences (16-17).

Tanskanen (2006) finds differences in lexical cohesion (reiteration and collocation) across genres on a spoken-written continuum, ranging from face-to-face conversations through prepared speeches and mailing-list language to academic writing. Since the present study involves written texts only, further discussion of differences between oral and written media is outside its scope.

The present study does involve methods of text production in the context of translation studies—it looks at English non-translated texts, texts translated from Russian into English by human translators (human-translated texts), and texts translated from

Russian into English by MT tools (machine-translated texts).

2.3.4 Studies of cohesion in the context of translation studies

In translation studies, cohesion has gained increased attention relatively recently, with its rise marked by the "textual turn" in translation studies, when global features, such as cohesion, were recognized to be of central importance (Neubert and Shreve 1992). Still, few studies have yet been done to isolate cohesive features of translated texts; as Chau

Hu points out, the issue of cohesion "in the theory and pedagogy of translation … has not been adequately studied thus far" (1999: 33). Not surprisingly, then, until recently the American Translators Association (ATA) error marking framework had no explicit category of cohesion and few other categories that recognized phenomena above the level of the sentence (American Translators Association website). So while we know that


expert translator behavior is marked by a global, top-down approach to text creation

(Jääskeläinen 1990; Tirkkonen-Condit and Jääskeläinen 1991; Kussmaul 1995), the

world of professional translation has not fully benefited from this knowledge. Part of

the translation profession continues to concentrate, like novice translators themselves, on

the sub-sentential level, overlooking cohesion as a vital feature of translation.

This situation can "in part be explained by the fact that the qualities that constitute

cohesion are generally difficult to pinpoint and isolate" (Baer and Bystrova 2009: 163).

As Wilson (quoted in Callow 1974: 10-11) noted regarding the first translation of the Bible into Dagbani:

For a native speaker it was difficult to express what was wrong with the earlier version, except that it was ‘foreign.’ Now, however, a comparison … has made clear that what the older version mainly suffers from are considerable deficiencies in ‘discourse structure,’ i.e., in the way the sentences are combined into well-integrated paragraphs, and these in turn into a well-constructed whole.

In addition, as Baker (1992: 125) points out, the construction of cohesion in translated texts may be complicated by a tension between syntax and thematic patterning, which may require recasting a sentence not for the sake of semantics, understood in a limited sense, but for the sake of cohesion. Blum-Kulka refers to such language-specific recastings as "shifts"

(299).


Another possible reason cohesion has been understudied in translation is that, unlike "unique items" (Tirkkonen-Condit 2004) or hapax legomena (Kenny 2001), it is rather difficult to investigate with automated tools. The number of features studied with automated tools is typically limited. For instance, while Baker (1992) discusses automated studies of lexical density and type-token ratio as indicators of cohesion in texts, such measures leave out other cohesive devices.
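To illustrate why a measure such as type-token ratio is easy to automate, the following minimal Python sketch computes a raw type-token ratio (distinct word forms divided by running words) and a standardized variant averaged over fixed-size chunks. The regular-expression tokenizer, the function names, and the 1,000-token chunk size are illustrative assumptions, not the procedure of any study cited here. Because lexical repetition, itself a cohesive device, lowers the ratio, the measure registers only one narrow aspect of cohesion; it does not identify conjunctive or referential ties as such.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens from a deliberately simple regex (an assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Distinct word forms (types) divided by running words (tokens)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def standardized_ttr(tokens: list[str], chunk: int = 1000) -> float:
    """Mean TTR over consecutive full chunks of `chunk` tokens.

    Raw TTR falls as texts get longer, so averaging over equal-sized chunks
    makes texts of different lengths more comparable. Texts shorter than one
    chunk fall back to the raw TTR; a trailing partial chunk is ignored.
    """
    chunks = [tokens[i:i + chunk] for i in range(0, len(tokens) - chunk + 1, chunk)]
    if not chunks:
        return type_token_ratio(tokens)
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)


tokens = tokenize("The store opened. The store closed.")
print(round(type_token_ratio(tokens), 3))  # 4 types / 6 tokens -> 0.667
```

In the usage line, repeating "the store" is what pulls the ratio below 1.0.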

When studying cohesive devices in translation, many researchers perform their text analyses and annotation manually, which is labor-intensive (e.g., studies by Hoey

1991a, Hu 2004, Yeh 2004, and Zhao et al. 2009) and limits the number of texts included in such studies. For instance, the study by Zhao et al. (2009) includes only 15

English originals and their Chinese translations, manually annotated using Halliday and

Hasan's categories of reference, substitution, ellipsis, conjunction, and lexical cohesion.

Manual annotation has its advantages and disadvantages. As Teich and

Fankhauser point out, inter-rater reliability in manual annotation is often less than satisfactory (2004: 132). In many studies, inter-rater reliability is not even checked. Further, smaller corpora may be less representative of a broader population. However, manual annotation may allow for a more detailed analysis of the few selected texts, as shown in the studies by Øverås (1998) and Zhao et al. (2009).

Corpus studies, the advantages of which are discussed in more detail in Chapter 2, provide opportunities for automated studies that do not involve human annotation. A famous example of a corpus-based study of the optional "that" in translated English texts


was done by Olohan and Baker (2000). The authors analyze the inclusion or omission of

the optional 'that' with the reporting verbs 'say' and 'tell' using well-known corpora—the

Translational English Corpus (TEC) and the British National Corpus (BNC). Olohan and

Baker's data show that the 'that'-connective "is far more frequent in TEC than in BNC,

and conversely that the 'zero'-connective is more frequent for all forms of both verbs in

the BNC corpus than in TEC" (141). These results are interpreted as providing evidence

for explicitation as a feature of translated texts. The present study aims to expand the range of cohesion features examined through corpus methods.
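Comparisons such as Olohan and Baker's rest on two simple calculations: the share of reporting-verb complement clauses that use an explicit "that" rather than the zero-connective, and, when raw frequencies from corpora of different sizes are compared, normalization to a common base such as occurrences per million words. The Python sketch below shows both. The counts in the usage lines are hypothetical placeholders, not data from TEC or the BNC, and in practice the two kinds of clause must first be identified (for example, by inspecting concordance lines), which this sketch does not attempt.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Normalize a raw frequency to occurrences per million running words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


def that_share(that_count: int, zero_count: int) -> float:
    """Proportion of complement clauses introduced by an explicit 'that'."""
    total = that_count + zero_count
    return that_count / total if total else 0.0


# Hypothetical counts for one reporting-verb form in two corpora.
translated = that_share(that_count=120, zero_count=80)      # 0.6
non_translated = that_share(that_count=90, zero_count=210)  # 0.3
print(translated > non_translated)  # True
print(per_million(250, 5_000_000))  # 50.0
```

Using proportions within each corpus sidesteps differences in corpus size; per-million normalization serves the same purpose when only raw frequencies are available.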

Some researchers study cohesion in translation within the framework of translation universals, searching for evidence supporting their existence (see Section 2.1.3). For

instance, Øverås (1998), who studies cohesion in an English-Norwegian parallel corpus,

finds that translated texts exhibit an overall rise in the level of cohesion, which suggests explicitation in translation. Her findings point to "shifts in conjunctive and reference cohesion through addition and expansion in the specification of nouns" by means of determiners, substitution, and shifts in lexical cohesion (Kruger 2002: 90). Blum-Kulka's explicitation hypothesis is also based on the use of cohesive markers in translation

(1986). She observes that a target text may be more redundant than a source text, and this

"redundancy can be expressed by a rise in the level of cohesive explicitness" in the target text (2004: 300).

Studies of cohesion in translation often include a limited number of text-types.

For instance, Zhao et al. (2009), mentioned above, look at cohesive devices in medical


texts only. The researchers find more similarities than differences in the use of cohesive

devices in medical originals and their translations, with the only difference found "in the

employment of reference in terms of occurrence frequencies" (2009: 313). The authors

conclude that this is due to the need to maintain "precision, clarity, and logicality" in

translated medical texts. To develop a broader model of cohesion in translation, more

text-types need to be included.

It seems useful to compare the use of cohesive devices in translated texts to their

use in non-translated texts of the same text-type, as was done in Olohan and Baker's analysis (2000). Such an empirical comparison might help illuminate differences between translated and non-translated texts. As Kruger points out, "[c]orpus-based research in translation is concerned with revealing both the universal and the specific features of translation" (2004: 70).

2.3.5 Studies of other global textual features in the context of translation studies

Studies of translation product and process have also employed other features of text organization to characterize texts and aid translators' work and training. For instance,

Byrne (2006) devotes an entire book to improving the quality of technical translations with the help of usability testing, which includes readability tests and other measures of text characteristics. He emphasizes how important it is for technical translators to be familiar with best practices of document design and testing procedures for the variety of technical text-types, knowledge that today's technical writers spend years acquiring


but translators often lack (2006: ix). Among other things, Byrne mentions such categories

as average sentence length, average word length, nominalization, and passive

constructions. Related to studies of cohesion in translation, Byrne, based on the results of

his empirical study, suggests that iconic linkage has a positive effect on the usability of

software user guides. 'Iconic linkage' is Juliane House's term for "the repetition or reuse of target language translations for source language sentences which have the same meaning but different surface properties" (164).

Translation researchers have used the study of textual features to illuminate differences between translated and non-translated texts. For example, studies done by

Laviosa (1998a, 1998b) look at lexical density and sentence length in translational and non-translational corpora in the genres of newspaper articles and narrative prose. For newspaper articles, Laviosa's findings suggest that translated texts have a lower lexical density (a lower percentage of content words versus grammatical words, in Laviosa's terms), and display a lower average sentence length (1998a). For narrative prose, Laviosa

finds that the lexical density is significantly lower in translated texts, while the mean

sentence length, contrary to the findings for the genre of newspaper articles, is

significantly higher (1998b: 561).
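The two measures in Laviosa's studies can be sketched as follows, assuming lexical density is computed as the percentage of running words that are lexical (content) words, as described above, and mean sentence length as words per sentence. The short function-word list and the punctuation-based sentence splitter are hypothetical simplifications for illustration; a real study would rely on a complete closed-class word list or part-of-speech tagging and on a more robust sentence segmenter, applied identically to translated and non-translated corpora.

```python
import re

# Hypothetical, deliberately short list of grammatical (function) words.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to",
    "for", "with", "by", "from", "is", "are", "was", "were", "be", "been",
    "it", "he", "she", "they", "we", "i", "you", "this", "that", "not",
}


def words(text: str) -> list[str]:
    """Lowercase word tokens from a simple regex."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def sentences(text: str) -> list[str]:
    """Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def lexical_density(text: str) -> float:
    """Percentage of running words that are not in FUNCTION_WORDS."""
    tokens = words(text)
    if not tokens:
        return 0.0
    lexical = [t for t in tokens if t not in FUNCTION_WORDS]
    return 100 * len(lexical) / len(tokens)


def mean_sentence_length(text: str) -> float:
    """Average number of word tokens per sentence."""
    sents = sentences(text)
    if not sents:
        return 0.0
    return sum(len(words(s)) for s in sents) / len(sents)


sample = "The translator revised the text. It reads well."
print(lexical_density(sample))       # 5 lexical / 8 words -> 62.5
print(mean_sentence_length(sample))  # (5 + 3) / 2 -> 4.0
```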

Laviosa's translational corpora include translated texts from a variety of source

languages, since her research aims at pinpointing features that translated texts exhibit

regardless of their source language. However, it is of interest to look at particular

language pairs; as Laviosa herself cautiously hypothesizes, "the average sentence length


may be particularly sensitive, in the narrative subject domain, to the influence of different

source languages, as well as the author's particular style" (1998b: 565).

To give another example of how textual features are used in translation research,

Khanna et al. (2011) consider sentence length and textual complexity when studying translations of sentences from a medical pamphlet performed by professional (human) translators and a machine translation tool. They find that while sentence length is not associated with scores given by graders to human and machine translations, complexity is associated with the graders' preference scores. In their study, the graders preferred more complex English sentences produced by professional translators.

Interestingly, some textual features have also been studied from an ideological perspective. For instance, some scholars view nominalization as a feature that can be

exploited ideologically by translators and editors. Kuo and Nakamura (2005), who study

ideological changes in English-Vietnamese translations, find that nominalization is one of the devices that may be used "to mask the 'reality' presented in the original news text"

(411). On the other hand, attempts to "de-vilify" nominalizations have also been made

(Martin 2008).

2.3.6 Cohesion and other global textual features and translation expertise

The notions of cohesion and other textual features that characterize texts globally are relevant to studies of expertise in translation. The ability to function at a macro-textual level, which involves dealing with the question of cohesion and other global features of


texts, is often cited as a trait of translation experts. Being able to function at a macro-level

and take global aspects of a task into account are characteristic of experts in other fields

as well. In their overview chapter in The Nature of Expertise, Glaser and Chi (1988: xviii) report

that "experts see and represent a problem in their domain at a deeper (more principled)

level than novices; novices tend to represent a problem at a superficial level." The famous

1981 study on the categorization of physics problems among experts and novices by Chi,

Feltovich, and Glaser demonstrated that experts used underlying physics principles to

categorize problems, while novices used surface features as the basis for problem

categorization.

Translation researchers have indicated that novices in translation tend to

concentrate on the most straightforward and superficial level of words and phrases. For

instance, Shreve (2002) mentions the tendency of novice translators to “view the

translation as a sequence of exclusively lexical problems” (164). Shreve notes that

novices often ignore more complex structures because they do not recognize them or do not think of them as part of the scope of the translation activity (165). Expert translators, on the other hand, have an ability to “recognize that the sentence might not be the appropriate level to work with as a unit” (Séguinot 1999: 92). Differences in dictionary use and overall approach to translation (microcontext vs. macrocontext), reported in

Krings (1986), support this claim as well. The results reported in Pouget (1998),

Tirkkonen-Condit and Jääskeläinen (1991), Kussmaul (1995), and Angelone (2010) also suggest that professional translators approach their task more globally.


For this reason, the present study of cohesion and other global textual features

might facilitate the development of curricula aimed at helping translation students shorten their path to expertise by addressing the global feature of

textual cohesion. Calls for introducing cohesion and coherence into translation instruction

have already been made (e.g., Chau Hu 1999). Based on the results of this study, data-

informed pedagogical interventions can be developed to explicitly teach issues of

cohesion and textual organization in translation. Such interventions would encourage

student translators "to consider the target text globally, as a product involving a variety of

features above and beyond lexis, for which they are professionally responsible" (Baer and

Bystrova 2009: 163).

2.4 Conclusions

This chapter provided an overview of literature pertaining to cohesion and other global

features of texts in both monolingual and multilingual settings. It also outlined the main concepts and theories related to differences across translated and non-translated texts, as well as across different genres of texts. In particular, it discussed Toury's law of interference and the concept of translation universals, which maintain that translated texts possess characteristics different from non-translated texts in the same language. Toury's law of interference attributes these differences to the influence of source texts, while the concept of translation universals presupposes that unique characteristics of translations exist regardless of the source language.


This study focuses on cohesion as one of the central global features of texts.

Consequently, this chapter described in detail developments in the concept of cohesion in monolingual and multilingual studies, from the pioneering works of Halliday and Hasan (1976), Gutwinski (1976), and others to the more recent views of such scholars as Hoey (1983, 1991a, 1991b), Campbell (1995), Dooley and Levinsohn (2001), and

Mahlberg (2009), to name but a few. The chapter included definitions and descriptions of grammatical and lexical categories of cohesion in both English and Russian. Further, it looked at the concept of cohesion in translation (Blum-Kulka 2004, Baker 1996, etc.).

The chapter also gave considerable attention to other global features of texts and their importance in monolingual and translation research. Such features included nominalization, passives, prepositional phrases, average sentence and word length, and lexical density.

The chapter also provided an overview of studies of other global textual features in monolingual settings, including the contexts of writing evaluation and pedagogy, text comprehension, and genre and text-type specificity of global textual categories. Then, it covered studies of cohesion and other global features in the context of translation, discussing their difficulties and reporting on their achievements.

The chapter then introduced methodologies used to study global features of texts, which often involve methods of corpus linguistics. Studies of electronic corpora permit analyses of global text features in a more objective way and at a larger scope than manual analyses, helping illuminate linguistic features not visible to the naked eye. The chapter


also outlined the practical applicability of studies such as this one in designing ways to distinguish between human-translated, machine-translated, and non-translated texts and in developing pedagogical interventions for training translators and (post-)editors.

In translation studies, global features of texts have gained increased attention relatively recently—with a "textual turn" in translation studies that recognized such global features to be of vital importance (Neubert and Shreve 1992). Still, few studies have yet been done to isolate global features of translation; as Chau Hu points out, the issue of cohesion "in the theory and pedagogy of translation … has not been adequately studied thus far" (1999: 33). In the current practice of translation, where content is translated by global translation teams using a plethora of translation technologies, translators' attention is often focused on segments below the textual level, threatening the maintenance of a global textual orientation. These realities of the industry make it vitally important to conduct empirical studies of such global features. As discussed in this chapter, studies of global features should include larger corpora, more text-types, and more comprehensive methodologies—all of which this dissertation seeks to address.

CHAPTER 3: METHODS AND MATERIALS

This chapter begins with a general overview of corpus linguistics methods as tools to analyze texts. It then outlines the framework used in this research to study cohesion and other features of texts, and presents the methods, tools, procedures, and materials employed in this dissertation. In addition, it describes the process of compiling the three-dimensional, multi-genre corpus used for this study.

3.1 Corpus linguistics as a methodology to analyze cohesion and other global textual features

The present study employs automated methods of corpus linguistics to investigate the use of cohesive devices and other global textual features in a number of different text-types and across translated and non-translated texts. The corpus-based automated approach allows for comparison among substantial numbers of texts, including non-translated texts and texts translated into English by humans and machines.

Defined as a "collection of electronic texts assembled according to explicit design criteria," a corpus is usually compiled with the goal of "representing a larger textual population" (Zanettin 2002: 11), providing linguists with a "much more solid empirical basis than had ever been previously available" (Granger 2003: 18) (Baer, Bystrova-McIntyre 2009: 161). Prior to corpus studies, linguists were limited to "what a single individual could experience and remember" (Sinclair 1991: 1). As Sinclair (1991) notes, enormous developments in computer technology over the last several decades made corpus linguistics possible and allowed for "the emergence of a new view of language," as much more data, and much more powerful tools to analyze it, became available to linguists (1).

Translation studies adopted methods and tools of corpus linguistics somewhat later than linguistics itself did. In her preface to the book Introducing Corpora in Translation Studies, Maeve Olohan (2004: 1) remarks, referencing Graeme Kennedy (1998: 1), that, while linguistics has employed electronic corpora for more than three decades, "the use of corpora […] in translation studies has a short history, spanning no more than ten years." This makes corpus-based translation studies a fruitful area of development (Baer, Bystrova-McIntyre 2009: 161). The use of corpora allows researchers to compile databases of unprecedented size and to analyze data from them within seconds (Sinclair 1991: 1).

Corpus linguistics as a methodology involves "large text corpora … [and] derives its results from the analysis of such corpora" (Herbst 2010: 34). According to Bowker (2001), the value of using a corpus-based approach in translation studies lies "in its broad scope, the authenticity of texts, and in the fact that data are in machine-readable form" (Baer, Bystrova-McIntyre 2009: 161). Moreover, modern computational tools and methods for quantitative and qualitative analyses (word frequencies, concordances, etc.) allow for more empirical and objective research (Bowker 2001: 346).

Applying the techniques of corpus analysis to the study of cohesion may address certain criticisms of Halliday and Hasan's cohesion theory and the "perceived lack of utility of quantitative analysis" of cohesive devices (Campbell 1995: 8). A presumed lack of utility, for instance, is noted by Hendricks (1998: 104), who deems the mass of data that arises from applying Halliday and Hasan's framework manually to longer texts "practically useless." As noted earlier, corpus analysis techniques allow for easier handling of masses of data, helping to detect patterns and assess the statistical significance of analyses.

The corpus-based approach also allows researchers to study a more comprehensive list of cohesive devices and other textual features, especially with the relatively new framework that I employ in this study. This framework, developed by Dong and Lan (2010), includes a variety of cohesive markers identified by Halliday and Hasan (1976), as well as several features of text organization suggested by Campbell (1998). It thus extends coverage to cohesive and textual features that most previous automated analyses lacked; those analyses focused on such features as type-token ratio, paragraph length, lexical density, readability indices, or individual words.

3.2 The framework for studying cohesion and other global textual features

The present study employs the framework for studying cohesive markers and other global textual features designed by Da-Hui Dong & Yu-Su Lan (2010). Dong and Lan applied their framework to translations produced by different groups of Chinese-into-English translators (novices, native experts, and non-native experts). The results of their study were published by a major translation publisher, St. Jerome Publishing, in 2010. The present study uses a different application of Dong and Lan's framework, focusing on differences and similarities that might exist between human-translated, machine-translated, and non-translated texts, with English as the target language and Russian as the source language.

Dong & Lan combine parts of Halliday & Hasan's (1976) framework with one part of Stuart Campbell's (1998) model of translation competence: textual competence. Below, I outline the parts of both models that were used by Dong and Lan.

As described in Chapter 2, Halliday and Hasan (1976) differentiate two main components of cohesion: grammatical and lexical. The cohesive devices that constitute Halliday and Hasan's framework include reference, substitution, ellipsis, conjunction, and lexical cohesion. For their framework, Dong & Lan choose two of these five categories: reference and conjunction (50). For the term "reference," I use the definition offered by Dong & Lan: "[r]eference is a relation between an element of the text and another element that it is linked to which gives it meaning" (50). The term covers the following types of reference devices (50):

− Pronominal

− Demonstrative

− Definite article


− Comparative

Conjunction is defined as "the semantic linkages between related sentences"; it includes coordinating as well as subordinating conjunctions "used to express assistive, adversative, causal, and temporal relationships between and within sentences" (Dong & Lan 2010: 50-51):

− Additive ("and," "also," etc.)

− Adversative ("however," "yet," "but," etc.)

− Causal ("so," "therefore," "consequently," etc.)

− Temporal ("then," "next," "finally," etc.)

− Continuative ("anyway," "nonetheless," etc.)
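To make the two lists above concrete, the following minimal Python sketch counts reference and conjunction markers and normalizes them per 1,000 words, so that texts of different lengths can be compared. The word lists are hypothetical and deliberately short, built only from the examples given above and from the restriction in Table 3.1 to third-person personal pronouns; they are not the lists or tools used in this study. Purely lexical matching also cannot distinguish a demonstrative "that" from a complementizer "that," or a causal "so" from an intensifier, which is one reason an actual analysis would need part-of-speech tagging or manual checking.

```python
import re
from collections import Counter

# Hypothetical, deliberately short marker lists built from the examples in
# the text; they are NOT the lists used in this study. "her" is left out
# because it is both a personal and a possessive pronoun.
MARKERS = {
    "personal_3rd": {"he", "him", "she", "it", "they", "them"},
    "demonstrative": {"this", "that", "these", "those"},
    "definite_article": {"the"},
    "additive": {"and", "also"},
    "adversative": {"however", "yet", "but"},
    "causal": {"so", "therefore", "consequently"},
    "temporal": {"then", "next", "finally"},
    "continuative": {"anyway", "nonetheless"},
}

# A simple word pattern: letters, optionally with one internal apostrophe.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def marker_rates(text, per=1000):
    """Return occurrences of each marker category per `per` words of `text`.

    Matching is purely lexical, so ambiguous forms such as "that" or "so"
    are counted wherever they occur, regardless of their function.
    """
    tokens = [t.lower() for t in WORD_RE.findall(text)]
    if not tokens:
        return {category: 0.0 for category in MARKERS}
    counts = Counter(tokens)
    return {
        category: per * sum(counts[w] for w in words) / len(tokens)
        for category, words in MARKERS.items()
    }


rates = marker_rates("The cat sat. Then it slept, but the dog barked.")
print(rates["definite_article"], rates["temporal"])
```

For the ten-word example sentence, the definite article occurs twice (200 per 1,000 words) and "then" once (100 per 1,000 words).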

Halliday & Hasan's model is supplemented with a component of Campbell's model of translation competence (1998). The idea of combining parts of the two models belongs to Dong & Lan and, in my opinion, makes the resulting model more comprehensive.

Campbell's model was originally designed for assessing translation into a second language and is now well accepted in studies of translation expertise. It includes three types of competence: textual competence, translators' disposition, and monitoring competence. Campbell's textual competence is based on Bachman's definition of textual competence as "the knowledge of the conventions for joining utterances together to form a text" (1990: 88). In Campbell's terms, it is "the capacity to deploy grammar and lexis above the level of the sentence" (1998: 60). It also involves mastery of genre conventions and "sensitivity to naturalness" (61). Disposition is a personal competence of a translator: Campbell claims that translators' vocabulary choices "reveal tendencies of risk-taking versus prudence and persistence versus capitulation" (3). Finally, the term "monitoring competence" describes translators' ability to self-edit and to assess their own work (126-127).

Of these three competences, textual competence yields most easily to corpus studies. In fact, to assess textual competence, Campbell uses a model designed by Biber to describe structural genre norms in corpora. Biber's model is quite extensive (1988: 223-245) and, among a wide variety of textual features, includes nominalization, average word length, lexical diversity (e.g., type/token ratio), passives, and prepositional phrases. These features, as well as sentence length, are included in this study.
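Three of these features can be computed from plain text alone. The Python sketch below illustrates them using naive regular-expression tokenization and sentence splitting; these are assumptions for illustration, not the tools used in this study. Nominalizations, passives, and prepositional phrases would additionally require a part-of-speech tagger. Note also that the raw type-token ratio falls as texts get longer, so it is comparable only across samples of similar length.

```python
import re

# Words: letters with optional internal hyphens or apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")
# Naive sentence boundary: whitespace that follows ".", "!" or "?".
SENT_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def surface_features(text):
    """Compute type-token ratio, average word length, and average
    sentence length (in words) for a single text."""
    words = WORD_RE.findall(text)
    if not words:
        raise ValueError("text contains no words")
    # Keep only sentence chunks that actually contain a word.
    sentences = [s for s in SENT_SPLIT_RE.split(text.strip()) if WORD_RE.search(s)]
    return {
        # Distinct (case-folded) word forms divided by running words.
        "type_token_ratio": len({w.lower() for w in words}) / len(words),
        # Mean number of characters per word.
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # Mean number of words per sentence.
        "avg_sentence_length": len(words) / len(sentences),
    }


features = surface_features("The dog ran. The dog slept.")
print(features)
```

In the two-sentence example, six words contain four distinct forms, so the type-token ratio is 4/6 and the average sentence length is three words.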

As a result of combining Halliday & Hasan's cohesion model with Campbell's components of textual competence, Dong & Lan suggested the framework outlined in Table 3.1. For this study, their framework was supplemented by an additional textual feature, sentence length, and slightly modified to address possible oversights by the original researchers.

While Dong & Lan used their framework to study translation competence, I use it to study characteristics of texts translated from Russian into English by humans and machines, compared with non-translated texts, in an attempt to pinpoint aspects of cohesion in translation in the Russian-English language pair. Table 3.1 provides an overview of the modified framework, adapted from Dong and Lan (2010: 59).

Table 3.1 The combined framework for the study of the use of cohesive devices and other global textual features (adapted from Dong and Lan 2010)

Components of previous models | Dong and Lan's model (2010)
Cohesive devices (Halliday and Hasan 1976)
  Reference devices
    Pronominal devices (personal) | Included (3rd-person pronouns only, since other pronouns are less likely to be cohesive devices)
    Pronominal devices (possessive) | Added by the researcher
    Demonstratives | Included
    Definite article | Included
    Comparative devices | Included
  Conjunction devices
    Additive devices | Included
    Adversative devices | Included
    Causal devices | Replaced by subordinate conjunction
    Temporal devices | Included
    Continuative devices | Replaced by subordinate conjunction
Global textual features (Campbell 1998)
  Nominalization | Included
  Lexical density (type-token ratio) | Included
  Word length (average) | Included
  Passives | Included
  Prepositional phrases | Included
  Sentence length (average) | Added by the researcher

3.3 Materials

This section outlines criteria and procedures used in compiling the corpus, and describes in detail the sub-corpora comprising it.

3.3.1 Corpus compilation

At the broadest level, monolingual corpora can be divided into two types (Zanettin 2011: 15). A corpus can be general, attempting to represent a "national variety" of a language, or specialized, focusing on a particular variety of language: text type/genre (e.g., news, scientific articles), domain/topic (e.g., international events, physics), production method (e.g., oral language, written language), or a combination of these.

Corpora used in translation studies, by contrast, are almost always specialized, and the opportunities for specialization are numerous. They may be parallel (consisting of source texts and their translations) or comparable (consisting of translated and non-translated texts in a single language that meet similar criteria). Comparable corpora should "cover a similar domain, variety of language and time span, and be of comparable length" (Baker 1995: 233). Translational corpora may be unidirectional (consisting of translations into one language only) or multidirectional (consisting of translations to and from a certain language) (Zanettin 2008). In addition, translational corpora might include texts produced by different groups (such as professionals or learners) or by different methods (such as human or machine translation). For this study, I built a set of specialized corpora based on the combined criteria of text-type/genre and production method.

For the purposes of this study and future use, a monolingual comparable corpus of English texts and a unidirectional parallel corpus of Russian source texts and their English translations were compiled. The monolingual comparable corpus includes three major categories of texts: texts translated from Russian into English by human translators, texts translated from Russian into English by the Google machine translation tool, and English non-translated texts. In the tables, figures, and graphs, these three methods of text production will be referred to as follows:

Human-translated texts—HT;

Machine-translated texts—MT;

Non-translated texts—NT.

Quantitative analysis was performed on the monolingual English corpus only. The corresponding parallel corpus of Russian source texts and their English translations was compiled to produce the machine-translated texts for the monolingual English corpus and to support further qualitative analysis of specific textual features identified by the quantitative analysis.

The corpus consists of three text-types: literary texts (modern prose), newspaper articles (international news/commentary), and scientific articles (physics). These sub-corpora will be referred to as follows:

Literary corpus—LC;

Newspaper corpus—NC;


Scientific corpus—SC.

All texts in the corpus were produced by professionals and were published in books, literary magazines, scientific journals, and online resources of major Russian and English newspapers that report professional editing of their articles. The corpus is synchronic: as of 2012, all the texts can be considered modern (published within the past 10 years for the newspaper and scientific corpora, and within the past 20 years for the literary corpus).

Any corpus design entails considerations of representativeness and involves important decisions regarding the corpus size, text selection criteria, balance of texts, length of individual samples, mark-up, etc. (Baker 1995). These decisions depend on the researcher's goals, his/her approach, and practical considerations (available time, funding, etc.).

Typically, it is assumed that "the linguistic patterns which are observed can be generalized beyond that corpus to the textual universe that corpus stands for" (Zanettin 2011: 15). However, as Leech points out, "difficulties of determining the size of the textual universe and its sub-universes from which a corpus is to be sampled are formidable" (2007: 139), and, as Zanettin puts it, "representativeness remains the 'holy grail' of corpus linguistics, something to strive for rather than something that can reasonably be attained" (2011: 15). In the end, it is the researcher's call to make compilation decisions based on best practices and practical considerations. As Ahmad (2008) notes, corpus design "will always carry the unintended influence of the designer(s)" (61).

Even the best corpora compiled by large institutions and groups of researchers can be criticized. Leech (2007) points out some flaws in the world-famous 100-million-word British National Corpus (BNC): for instance, spoken language is underrepresented in it, and its genre proportionality may be flawed (136). Such flaws do not undermine the benefits of corpus linguistics outlined in 2.1. As with many methodologies, corpus studies work best when viewed as part of a larger effort to describe linguistic phenomena, with quantitative and qualitative methods used together.

For specialized corpora, representativeness is improved by the fact that such corpora cover much narrower text categories than general corpora do. For this study, the sub-corpora were compiled using strict criteria, such as text-types, publication types, publication years, authors, translators, lengths of samples, etc. (outlined in more detail in the following sections). The compilation criteria and practices used in this study were piloted and improved over several years of corpus compilation and research undertaken by this researcher and her advisor, documented in several articles and professional conference presentations (Bystrova-McIntyre 2006, 2007; Baer, Bystrova-McIntyre 2009). The present corpus represents substantial improvements over these prior efforts. In my opinion, these three sub-corpora are sufficiently representative of their specific language varieties for the goals of this study.

Still, the inclusion of a larger number of texts is an ever-present goal for corpus researchers (the "bigger, the better" principle). However, largeness is more realistically attainable by funded research groups than by individuals. The compilation process for the present corpus took several months of full-time work for one researcher, with unpaid assistance from a database designer. Immediate tasks included locating source texts, their translations, and comparable non-translated texts (constrained by availability within the researcher's budget, i.e., free or at minimal cost); scanning and running optical character recognition for sources not available electronically; designing and maintaining a database to store and analyze the texts and metadata; running texts through machine translation and part-of-speech tagging software; collecting the data from the texts; and solving many unexpected problems arising in the process.

The comparable and parallel corpora compiled for the purposes of this study include 600 texts. Each of the three text-type sub-corpora includes 200 texts: 50 Russian source texts, their corresponding English human and machine translations, and 50 English non-translated texts of comparable nature. Of these, the 450 English texts were used for the quantitative analysis.

For the literary and scientific corpora, only texts of at least 1,000 words were included. When a text ran longer than 2,000 words, the sample for inclusion was cut off at the end of the paragraph following the 2,000-word mark. This was done to make the corpus more manageable and to keep texts of comparable length. For the newspaper corpus, the minimum 1,000-word requirement was not applied, since the genre of a newspaper article presupposes shorter texts.
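The length rules above can be expressed as a short Python function. This is a sketch under two assumptions not spelled out in the text: words are counted by splitting on whitespace, and "the paragraph following the 2,000-word mark" is read as the paragraph in which the 2,000th word falls. The function names are hypothetical.

```python
def word_count(paragraphs):
    """Whitespace-delimited word count over a list of paragraphs."""
    return sum(len(p.split()) for p in paragraphs)


def make_sample(paragraphs, min_words=1000, cap=2000):
    """Return the paragraphs to include, or None if the text is too short.

    Paragraphs are kept whole: once the running total reaches `cap`, the
    paragraph in which that mark falls is the last one kept (an assumed
    reading of the cut-off rule). Use min_words=0 for newspaper texts,
    where no minimum length applies.
    """
    if word_count(paragraphs) < min_words:
        return None
    sample, total = [], 0
    for para in paragraphs:
        sample.append(para)
        total += len(para.split())
        if total >= cap:
            break
    return sample


# Four hypothetical 800-word paragraphs: the 2,000-word mark falls in the
# third, so three paragraphs (2,400 words) are kept.
paragraphs = [" ".join(["word"] * 800)] * 4
sample = make_sample(paragraphs)
print(len(sample), word_count(sample))
```

Keeping whole paragraphs means samples slightly exceed 2,000 words, but no sample ends mid-paragraph, which matters for a study of cohesion across sentences.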


The Russian segments in the parallel corpus were trimmed to match the English samples obtained as described above. For the machine-translated part of the corpus, these Russian samples were run through Google Translate. The MT was performed on March 20, 2012 (a potentially important point, since Google Translate is known to be under rapid development). Google Translate is a free translation tool, currently working for 58 different languages. It is a statistical MT system that analyzes truly monumental amounts of linguistic data: "it looks for patterns in hundreds of millions of documents" (Google Translate's webpage). It supplements this process with the use of translation memories (TMs), which help fine-tune some translation solutions.

In fact, users can upload their own TMs into the Google Translator Toolkit to improve the quality of the output and tailor it to the domain of interest. For the purposes of the current project, no additional TMs were used.

The entire English-language corpus comprises 691,414 words. The descriptive statistics for the corpus are shown in Table 3.2.

Table 3.2 Descriptive statistics for the English-language corpus

Descriptive parameters | # of texts | # of words | # of sentences
Human-translated corpora (English)* | 150 | 229,100 | 12,177
Machine-translated corpora (English)* | 150 | 219,429 | 11,577
Non-translated comparable corpora (English) | 150 | 242,885 | 13,174
Totals for the English corpus | 450 | 691,414 | 36,928

*These texts are translations of the texts in the Russian source-text corpora.

Further compilation criteria were fine-tuned for each of the three corpora individually and are described in more detail in the following subsections.

3.3.1.1 Literary corpus

The literary corpus focuses on contemporary prose writing. For the LC, translations of contemporary Russian prose were scanned from individual publications (e.g., Ludmila Ulitskaya's Sonechka), The New Yorker magazine (e.g., Ludmila Petrushevskaya's The Fountain House), and especially from collections of translated Russian stories, all published in English during the last 20 years. After scanning, optical character recognition was performed. Below is the list of the collections used for the literary corpus:

− The Penguin Book of New Russian Writing. 1996. Eds. Victor Erofeyev & Andrew Reynolds

− Rasskazy. New Fiction from a New Russia. 2009. Eds. Mikhail Iossel & Jeff Parker

− Life Stories: Original Works by Russian Writers. 2009. Ed. Paul E. Richardson

− Russian Love Stories. An Anthology of Contemporary Prose. 2009. Ed. Nadya L. Peterson

Since the corpus "should be representative in terms of the range of original authors and of translators" (Baker 1995: 233), each author in the LC occurs only once, which minimizes influences related to authors' individual styles. However, by necessity, individual translators do occur multiple times in the corpus: there are fewer Russian-into-English translators than authors, and a single translator often works with a number of Russian authors. For this corpus, each translator occurs no more than three times, to avoid major influences of individual translators' styles.

The source texts for the corpus of Russian contemporary prose were mainly obtained from the Internet. The Russian texts included in the corpus were available on websites of Russian online libraries or of individual authors. The process of corpus compilation was not easy, due to the limited availability of source texts and translations, as well as to the constraints of corpus compilation parameters (such as the inclusion of each author only once, the inclusion of each translator only three times, and the minimum length of the pieces).
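Occurrence limits of this kind are easy to verify mechanically once the metadata are in a table. The Python sketch below is illustrative only (the function name is hypothetical); it assumes that each co-translator of a text counts as one occurrence, which the text does not state explicitly. The sample records are a few rows from Table 3.3, with one hypothetical record added to show a violation.

```python
from collections import Counter


def over_limit(records, max_per_author=1, max_per_translator=3):
    """Return (authors, translators) whose occurrences exceed the limits.

    `records` is a list of (author, translators) pairs. Each co-translator
    of a text is counted as one occurrence (an assumption).
    """
    authors = Counter(author for author, _ in records)
    translators = Counter(t for _, names in records for t in names)
    return (
        {a: n for a, n in authors.items() if n > max_per_author},
        {t: n for t, n in translators.items() if n > max_per_translator},
    )


# A few rows from Table 3.3.
records = [
    ("Kharitonov, Evgeny", ["Tait, Arch"]),
    ("Terts, Abram", ["Tait, Arch"]),
    ("Ulitskaya, Ludmila", ["Tait, Arch"]),
    ("Prilepin, Zahar", ["Ilinskaya, Svetlana", "Robinson, Douglas"]),
]
print(over_limit(records))  # ({}, {})

# A hypothetical fourth text by the same translator would break the limit.
print(over_limit(records + [("Hypothetical Author", ["Tait, Arch"])]))
```

The second call reports Arch Tait with four occurrences, one more than the three allowed for the LC.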

The list of the Russian source texts with their corresponding human translations is presented in Table 3.3.

Table 3.3 Russian source texts and their corresponding human translations

# | Author | Translator(s) | Title (Russian)
1 | Alyokhin, Evgeni | Mesopir, Victoria | Ядерная весна
2 | Astafyev, Victor | Reynolds, Andrew | Людочка
3 | Babchenko, Arkady | Allen, Nick | Дизелятник
4 | Boteva, Maria | Mesopir, Victoria | Дело в том кому веришь (sic.)
5 | Danilov, Dmitry | Robinson, Douglas | Более пожилой человек
6 | Dovlatov, Sergey | Frydman, Anne | Компромисс пятый
7 | Epikhin, Nikolai | Gusev, Mariya | Рычева
8 | Erofeev, Venedikt | Mulrine, Stephen | Василий Розанов глазами эксцентрика
9 | Gallego, Ruben | Schwartz, Marian | Белое на черном
10 | Geide, Marianna | Gunin, Anna | Чудовище Плещеева озера
11 | Gelasimov, Andrei | Bayer, Alexei | Жанна
12 | Goralik, Linor | Iossel, Mikhail | Говорит: (sic.)
13 | Gorenshtein, Friedrikh | Bromfield, Andrew | С кошелочкой
14 | Grishkovets, Yevgeny | Richardson, Paul E. | Спокойствие
15 | Kabakov, Alexander | Seluyanova, Anna | Убежище
16 | Kalinin, Vadim | Gusev, Mariya | Невероятная и печальная история Миши Штрыкова и его жестокосердной жены
17 | Kamenetskaya, Maria | Mikhailova, Julia | Между летом и осенью
18 | Kharitonov, Evgeny | Tait, Arch | Духовка
19 | Khazanov, Boris | Maizell, Sylvia | Праматерь
20 | Khurgin, Alexander | Fisher, Anne O. | Беруши
21 | Kochergin, Ilya | Gunin, Anna | Потенциальный покупатель
22 | Kozlov, Vladimir | Gregovich, Andrea & Iossel, Mikhail | Праздник строя и песни
23 | Lipskerov, Dmitry | Bayer, Alexei | Эдипов комплекс
24 | Lukyanenko, Sergei | Bliss, Liv | Сердце Снарка
25 | Makanin, Vladimir | Shayevich, Bela | Однодневная война
26 | Mamleev, Yury | Mulrine, Stephen | Тетрадь индивидуалиста
27 | Moskvina, Marina | Fisher, Anne O. | Мусорная корзина для алмазной сутры
28 | Muliarova, Elena | Peterson, Nadya L. | Хроника двух дней жизни Жени Д.
29 | Pelevin, Viktor | Favorov, Nora Seligman | Онтология детства
30 | Petrushevskaya, Ludmilla | Gessen, Keith & Summers, Anna | Дом с фонтаном
31 | Popov, Evgeny | Porter, Robert | Как съели петуха
32 | Popov, Valery | Doyle, James | Любовь тигра
33 | Prigov, Dmitry | Shuttleworth, Mark | Описание предметов
34 | Prilepin, Zahar | Ilinskaya, Svetlana & Robinson, Douglas | Убийца и его маленький друг
35 | Rubina, Dina | Katz, Michael R. & Komarov, Denis | Туман
36 | Sadulaev, German | Gunin, German | Почему не падает небо
37 | Senchin, Roman | Mesopir, Victoria | Тоже история
38 | Shcherbakova, Galina | Osorio, Rachel | Три любви Маши Передреевой
39 | Shklovsky, Evgeny | Lindsey, Byron | Бабель в Париже
40 | Sokolov, Sasha | Pobedinskaya, Olga & Reynolds, Andrew | Тревожная куколка
41 | Sorokin, Vladimir | Hoffman, Deborah | Черная лошадь с белым глазом
42 | Starobinets, Anna | Litman, Ellen | Правила
43 | Terts, Abram (Andrew Sinyavsky) | Tait, Arch | Золотой шнурок
44 | Tolstaya, Tatyana | Gambrell, Jamey | Кысь
45 | Ulitskaya, Ludmila | Tait, Arch | Сонечка
46 | Voinovich, Vladimir | Morley, Peter | Роман (Трагедия)
47 | Yerofeyev, Viktor | Berdy, Michele A. | Реабилитация Дантеса
48 | Yuzefovich, Leonid | Schwartz, Marian | Гроза
49 | Zobern, Oleg | Gessen, Keith | Шестая дорожка Бреговича
50 | Zonberg, Olga | Ilinskaya, Svetlana & Robinson, Douglas | Смилуйся, государыня рыбка

The comparable non-translated literary sub-corpus consists of modern English-language prose harvested from Kindle and paper editions of collections of contemporary American prose, as well as from the online archives of The New Yorker magazine's Fiction section for 2010-2012. Each author occurs only once. The list of the contemporary American prose collections is below:

− Henry Prize Stories 2011. Random House, Inc. Kindle Edition.

− The Best American Short Stories 2011: The Best American Series. Eds. Geraldine Brooks & Heidi Pitlor. Houghton Mifflin Company. Kindle Edition.

− The Best American Sampler: The Best American Series 2011. Houghton Mifflin Company. Kindle Edition.

− The Best American Short Stories 2010: The Best American Series. Eds. Richard Russo & Heidi Pitlor. Houghton Mifflin Company.

− The Best American Short Stories 2009: The Best American Series. Eds. Alice Sebold & Heidi Pitlor. Houghton Mifflin Company.

− The Best American Short Stories 2008: The Best American Series. Eds. Salman Rushdie & Heidi Pitlor. Houghton Mifflin Company.

− The Best American Short Stories 2005: The Best American Series. Eds. Michael Chabon & Heidi Pitlor. Houghton Mifflin Company.

The list of the English non-translated texts and their authors is presented in Table 3.4.

Table 3.4 English non-translated literary texts and their authors

# | Author | Title
1 | McGuane, Thomas | A Prairie Girl
2 | McDermott, Alice | Someone
3 | Chabon, Michael | Citizen Conn
4 | Sayrafiezadeh, Said | A Brief Encounter with the Enemy
5 | Lanchester, John | Expectations
6 | Bergman, Megan Mayhew | Housewifely Arts
7 | Millhauser, Steven | Phantoms
8 | Bissle, Tim | A Bridge Under Water
9 | Adichie, Chimamanda Ngozi | Ceiling
10 | Egan, Jennifer | Out of Body
11 | Englander, Nathan | Free Fruit for Young Widows
12 | Keegan, Claire | Foster
13 | Goodman, Allegra | La Vita Nuova
14 | Havazelet, Ehud | Gurov in Manhattan
15 | Horrocks, Caitlin | The Sleep
16 | Johnston, Bret Anthony | Soldier of Fortune
17 | Lipsyte, Sam | The Dungeon Master
18 | Makkai, Rebecca | Peter Torrelli Falling Apart
19 | McCracken, Elizabeth | Property
20 | Nuila, Ricardo | Dog Bites
21 | Oates, Joyce Carol | ID
22 | Powers, Richard | To the Measures Fall
23 | Row, Jess | The Call of Blood
24 | Saunders, George | Escape from Spiderhead
25 | Slouka, Mark | The Hare's Mask
26 | Calhoun, Kenneth | Night Blooming
27 | Doenges, Judy | Melinda
28 | Means, David | The Junction
29 | Shepard, Jim | Your Face Hurtles Down at You
30 | Tuck, Lily | Ice
31 | Watson, Brad | Alamo Plaza
32 | Barnes, Julian | The Limner
33 | Lethem, Jonathan | Procedure in the Air
34 | Simpson, Helen | Diary of an Interesting Year
35 | Akpan, Uwem | Baptizing the Gun
36 | Barry, Kevin | Fjord of Killary
37 | Bynum, Sarah | The Erlking
38 | Frame, Janet | Gavin Highly
39 | Lennon, Robert | Eight Pieces for the Left Hand
40 | Link, Kelly | Stone Animals
41 | Munro, Alice | Silence
42 | Chase, Katie | Man and Wife
43 | Krauss, Nicole | From the Desk of Daniel Varsky
44 | Tice, Bradford | Missionaries
45 | McCorkle, Jill | Magic Words
46 | Moffett, Kevin | One Dog Year
47 | Rash, Ron | Into the Gorge
48 | Groff, Lauren | Delicate Edible Birds
49 | Shipstead, Maggie | The Cowboy Tango
50 | Tower, Wells | Raw Water

The entire English literary corpus includes 297,866 words. Descriptive statistics for the LC are presented in Table 3.5.

Table 3.5 Descriptive statistics for the LC

Descriptive parameters | Non-translated | Human-translated | Machine-translated
# of texts | 50 | 50 | 50
# of words | 102,983 | 102,395 | 92,488
# of sentences | 6,972 | 7,017 | 6,456
Average sentence length (words) | 17.2 | 16.2 | 16.3
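Average sentence length can be defined in two common ways, which give different results whenever texts differ in length. The averages in Table 3.5 do not equal the pooled ratio of total words to total sentences (for the non-translated texts, 102,983 / 6,972 ≈ 14.8 rather than 17.2), so they were presumably computed in some other way, for example as a mean of per-text averages or with a different tokenization; the text does not specify which. The Python sketch below, using hypothetical toy numbers rather than corpus data, shows how far the two definitions can diverge.

```python
from statistics import mean


def pooled_avg(texts):
    """Total words divided by total sentences across all texts."""
    return sum(w for w, _ in texts) / sum(s for _, s in texts)


def mean_of_text_avgs(texts):
    """Average sentence length computed per text, then averaged over texts."""
    return mean(w / s for w, s in texts)


# Hypothetical (words, sentences) pairs for three texts; not corpus data.
toy = [(1000, 100), (1000, 50), (400, 10)]
print(pooled_avg(toy))         # 15.0
print(mean_of_text_avgs(toy))  # about 23.33
```

The pooled ratio weights every sentence equally, whereas the mean of per-text averages weights every text equally, so a few short texts with long sentences can raise the second figure considerably.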

3.3.1.2 Newspaper corpus

The newspaper corpus is drawn from a genre of commentary on topics of international news. Russian source texts, their translations, and non-translated English articles were harvested from online sources. The articles were published between 2008 and 2012. The original Russian publications include five major news sources, each represented by 10 articles in the 50-text sub-corpus in an effort to make the corpus as balanced and representative as possible. The list of the publications is provided below:

− Izvestia

− Nezavisimaia gazeta

− Pravda

− Rossiiskaia gazeta

− Vzgliad


The corresponding translations were obtained from the publications' own English-language versions, where available, and, in other cases, from the website watchingamerica.com. All translations on watchingamerica.com undergo professional editing. Notably, each translation lists not only its translator but also its editor, a rare practice that may be viewed as a sign of respect for, and understanding of, the profession of translation.

The non-translated English comparable texts were taken from five major English-language news sources. Each source is represented by 10 articles in an effort to make the corpus as balanced and representative as possible. The list of the publications is provided below:

− Chicago Tribune

− Los Angeles Times

− The New York Times

− Washington Post

− The Wall Street Journal

In all parts of the NC, each author occurs no more than twice. The occurrence of individual translators was harder to control, since in some cases their names were not listed (for instance, in the English version of Pravda).


In the process of compiling the NC, headlines, by-lines, captions, and acknowledgements were removed from the corpus samples, since they are likely to display text characteristics of their own (Perfetti 1987).

The entire English newspaper corpus includes 121,112 words. Descriptive statistics for the NC are presented in Table 3.6.

Table 3.6 Descriptive statistics for the NC

Descriptive parameters | Non-translated | Human-translated | Machine-translated
# of texts | 50 | 50 | 50
# of words | 39,893 | 40,440 | 40,779
# of sentences | 1,985 | 2,037 | 2,079
Average sentence length (words) | 20.7 | 20.2 | 20.1

3.3.1.3 Scientific corpus

The scientific corpus consists of texts on the subject of physics. This subject was selected because my father is a Russian physicist who has, over the years, enlisted my help as a translator for his publications. My father's familiarity, and my own, with the field made it easier to select and access publications to include in the SC.

The original Russian publications include five major physics journals that are routinely translated into English; each journal is represented by 10 articles, published between 2008 and 2012, within the 50-text sub-corpus. To minimize the influence of individual writing styles, main authors were included only once. For the purposes of this study, a main author is defined as the sole author of a single-author work, either of the two authors of a two-author work, or the first author of a work with more than two authors. The list of the journals, with their transliterated Russian and translated English titles, is provided below:

− Fizika metallov i metallovedenie (The Physics of Metals and Metallography)

− Fizika tverdogo tela (Physics of the Solid State)

− Izvestiia akademii nauk (seria fizicheskaia) (Bulletin of the Russian Academy of Sciences (Physics))

− Zhurnal tekhnicheskoi fiziki (Technical Physics)

− Pis'ma v zhurnal tekhnicheskoi fiziki (Technical Physics Letters)
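The main-author definition and the once-only constraint stated above can be made precise in a few lines of Python. The sketch is illustrative: the function names, paper titles, and author names are hypothetical, and the greedy, first-come selection order is an assumption rather than a procedure described in the text.

```python
def main_authors(authors):
    """Main author(s) of a paper under the definition given above."""
    if not authors:
        raise ValueError("a paper must have at least one author")
    if len(authors) <= 2:
        return list(authors)  # the sole author, or both of two authors
    return [authors[0]]  # first author of a paper with more than two


def select_papers(papers):
    """Keep papers in order, skipping any whose main author was already used.

    `papers` is a list of (title, author_list) pairs; the first-come order
    is an assumption made for illustration.
    """
    used, selected = set(), []
    for title, authors in papers:
        mains = main_authors(authors)
        if used.intersection(mains):
            continue
        used.update(mains)
        selected.append(title)
    return selected


# Hypothetical papers and author names.
papers = [
    ("P1", ["Ivanov", "Petrov", "Sidorov"]),
    ("P2", ["Petrov", "Ivanov"]),
    ("P3", ["Smirnov"]),
    ("P4", ["Petrov", "Kuznetsov", "Ivanov"]),
]
print(select_papers(papers))  # ['P1', 'P3', 'P4']
```

P2 is skipped because Ivanov, already a main author of P1, is also one of its two main authors; P4 is kept because Petrov was only a non-first co-author of P1.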

The comparable English-language sub-corpus includes five prominent physics journals published in English, each represented by 10 articles in this 50-text sub-corpus. Below is the list of these journals:

− Journal of Magnetism and Magnetic Materials

− Journal of Nanomaterials

− Journal of Physics (Condensed Matter)

− Physical Review B

− Physical Review Letters

For this sub-corpus, only articles written by authors affiliated with U.S. universities and colleges were included, in order to keep the English writing as close as possible to that of native speakers. As with the Russian sub-corpus, each main author was included only once to minimize the influence of individual writing styles.

It should be noted that international scientific English is itself a distinct linguistic phenomenon whose writing conventions differ significantly from those of literary prose and journalism, even when the writer is a native speaker. What reads well to a physicist in a scientific journal may not be intelligible to a reader of The New Yorker. As David Bellos points out in his non-scholarly but entertaining book Is That a Fish in Your Ear?, "much of the English now written by natural and social scientists whose native language is other is almost impenetrable to non-specialist readers who believe that because they are native English speakers they should be able to understand whatever is written in English" (2011: 17). The prevalence of such English in the sciences may, in turn, influence how native speakers write in this domain.

In compiling the SC, titles, abstracts, captions, and acknowledgements were excluded from the corpus samples, since these are genres in their own right and are likely to display text characteristics of their own (Moon 1998). Lists of works cited were also excluded. When complex formulas and symbols caused problems for the OCR or part-of-speech tagging software, they were removed from the texts or replaced with a placeholder. Because formulas and symbols are not analyzed in this study, their exclusion does not affect the features under investigation.


Interestingly, some of the Russian journals translated into English do name the translators of some of their articles, among them Fizika tverdogo tela (Physics of the Solid State), Pis'ma v zhurnal tekhnicheskoi fiziki (Technical Physics Letters), and Zhurnal tekhnicheskoi fiziki (Technical Physics). However, translator names were not given consistently, so this information was insufficient to control for the recurrence of individual translators' work in the corpus.

The entire English scientific corpus includes 272,436 words. The descriptive statistics for the SC are presented in Table 3.7.

Table 3.7 Descriptive statistics for the SC

| Descriptive parameter | Non-translated | Human-translated | Machine-translated |
| Number of texts | 50 | 50 | 50 |
| Number of words | 100,009 | 86,265 | 86,162 |
| Number of sentences | 4,217 | 3,123 | 3,042 |
| Average sentence length (words) | 24.3 | 28.2 | 28.8 |

3.4 Methods and procedures

3.4.1 Corpus database

The compiled corpus is stored in a Microsoft Access database (named Disserata), modified from its previous version (CorpusCopia) based on the results of the pilot study conducted in January–April 2011. In addition to the text samples, Disserata stores the following metadata for each text: author(s), translator(s) (when applicable), work title, year of publication, publication venue, method of text production (non-translated, human-translated, machine-translated), relevant links, and researcher's notes.
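The record structure described above can be pictured as a simple data type. The sketch below is a hypothetical Python model of one Disserata entry, not the actual Access schema: the field names, the added `text_id` and `body` fields, and the `by_production` helper are illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Production(Enum):
    """Method of text production recorded for each text."""
    NON_TRANSLATED = "non-translated"
    HUMAN_TRANSLATED = "human-translated"
    MACHINE_TRANSLATED = "machine-translated"


@dataclass
class TextRecord:
    """Hypothetical model of one corpus entry and its metadata."""
    text_id: int                      # assumed surrogate key
    authors: List[str]
    title: str
    year: int
    venue: str
    production: Production
    body: str                         # the text sample itself (assumed field name)
    translators: List[str] = field(default_factory=list)  # empty when not applicable
    links: List[str] = field(default_factory=list)
    notes: Optional[str] = None       # researcher's notes


def by_production(records: List[TextRecord], production: Production) -> List[TextRecord]:
    """Select the records produced by one method (e.g., all machine translations)."""
    return [r for r in records if r.production is production]
```

Keeping the production method as an enumeration rather than free text guards against spelling variants when texts are later grouped for comparison.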

3.4.2 Corpus annotation

Corpora are typically annotated according to the goals of specific research projects. Annotation is "the practice of adding interpretative, linguistic information to an electronic corpus of spoken and/or written language data" (Garside, Leech, & McEnery 1997: 2). During annotation, this interpretive, linguistic information is tagged using existing or newly designed conventions. For the purposes of this study, all English-language samples were tagged for part of speech, and the tagged samples are also stored in Disserata.

To tag the corpus with part-of-speech information, the latest version of the CLAWS part-of-speech (POS) tagger developed at Lancaster University (http://ucrel.lancs.ac.uk/claws/) was purchased under an academic license. In addition to parts of speech, CLAWS tags punctuation marks and sentence boundaries. The tool's credibility is supported by its use in annotating one of the largest and best-known corpora of English, the British National Corpus. According to the official CLAWS website, the tagger has been substantially improved over the years and now consistently achieves 96–97% accuracy. An example of POS-tagged text is provided in Fig. 3.1.

Fig. 3.1 Example of CLAWS POS-tagged text

3.4.3 Data collection

This POS information was used to collect data on the use of cohesive devices and prepositional phrases; the data were extracted using Access database queries. Two more tools were used to analyze other textual features such as nominalization, lexical density (type-token ratio), passives, and word and sentence length: WordSmith Tools (Scott 2009) and Writer's Workbench. Academic licenses for both tools were purchased by the researcher.
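WordSmith Tools reports lexical density as a standardized type/token ratio, which, as WordSmith documents it, averages the type/token ratio over consecutive fixed-size chunks of running words (1,000 by default) so that texts of different lengths can be compared. The following is a minimal Python sketch of that idea under stated assumptions: the regular-expression tokenizer is a crude stand-in for WordSmith's own word definition, and a final chunk shorter than `chunk_size` is simply ignored, so its values will not match WordSmith output exactly.

```python
import re


def sttr(text: str, chunk_size: int = 1000) -> float:
    """Standardized type/token ratio: mean TTR (as a percentage) over consecutive chunks.

    Assumptions: lowercase alphabetic tokens (apostrophes kept) and a trailing
    chunk shorter than chunk_size is discarded.
    """
    tokens = re.findall(r"[a-z']+", text.lower())  # crude tokenizer (assumption)
    ratios = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        ratios.append(100 * len(set(chunk)) / chunk_size)  # types / tokens in this chunk
    if not ratios:
        raise ValueError("text is shorter than one chunk")
    return sum(ratios) / len(ratios)
```

Chunking is what makes the measure "standardized": a raw type/token ratio falls as a text grows longer, whereas averaging over equal-sized chunks removes most of that length effect.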

Table 3.8 gives further details on data collection, specifying the calculations and tools used to track cohesive devices and other textual features in this study. The system of calculations was adapted from Dong and Lan (2010). The tags referenced in Table 3.8 are explained in Table 3.9. Note that the CLAWS "AT" tag covers both "the" and "no" as articles; for this reason, Disserata also counted the word "the" on its own.

Table 3.8 Calculations used to track cohesive devices and textual features in the corpus (adapted from Dong and Lan 2010)

| Dong and Lan's model (2010) | Calculations |
| Cohesive devices: Reference devices | |
| Pronominal devices (3rd person pronouns only) | CLAWS (PPH1+PPHO1+PPHO2+PPHS1+PPHS2) |
| Pronominal devices (possessive pronouns) | CLAWS (APPGE) |
| Demonstratives | CLAWS (DD1+DD2) |
| Definite article | CLAWS (AT) or Disserata |
| Comparative devices | CLAWS (JJR+JJT*+RGR+RGT) |
| Cohesive devices: Conjunction devices | |
| Additive devices | CLAWS (CC) |
| Adversative devices | CLAWS (CCB) |
| Causal devices (subordinate conjunction) | CLAWS (CS) |
| Temporal devices | CLAWS (RT) |
| Continuative devices (subordinate conjunction) | CLAWS (CS) |
| Global textual features | |
| Nominalization | Writer's Workbench program |
| Lexical density (standardized type/token ratio) | WordSmith Tools |
| Word length (average) | Writer's Workbench program |
| Passives | Writer's Workbench program |
| Prepositional phrases | CLAWS (IF+II+IO+IW) |
| Sentence length (average) | Disserata |

* Note: In Dong and Lan's model, this tag is different (JK). However, in their explanation, they discuss superlative comparative devices (the CLAWS tag "JJT"). The "JK" tag refers to catenative adjectives ("able" in "be able to," "willing" in "be willing to"), which are not relevant for this study. For this reason, the appropriate substitution was made.

Table 3.9 CLAWS conventions for the tags used in the process of data collection

| CLAWS tag | Explanation |
| Cohesive devices: Reference devices | |
| PPH1 | 3rd person singular neuter personal pronoun ("it") |
| PPHO1 | 3rd person singular objective personal pronoun ("him," "her") |
| PPHO2 | 3rd person plural objective personal pronoun ("them") |
| PPHS1 | 3rd person singular subjective personal pronoun ("he," "she") |
| PPHS2 | 3rd person plural subjective personal pronoun ("they") |
| APPGE | possessive pronoun, pre-nominal (e.g., "my," "your," "our") |
| DD1 | singular determiner (e.g., "this," "that," "another") |
| DD2 | plural determiner (e.g., "these," "those") |
| AT | article ("the," "no") |
| JJR | general comparative adjective (e.g., "older," "better," "stronger") |
| JJT | general superlative adjective (e.g., "oldest," "best," "strongest") |
| RGR | comparative degree adverb ("more," "less") |
| RGT | superlative degree adverb ("most," "least") |
| Cohesive devices: Conjunction devices | |
| CC | coordinating conjunction (e.g., "and," "or") |
| CCB | adversative coordinating conjunction (e.g., "but") |
| CS | subordinating conjunction (e.g., "if," "because," "unless," "so," "for") |
| RT | quasi-nominal adverb of time (e.g., "now," "tomorrow") |
| Textual features (prepositional phrases) | |
| IF | "for" (as preposition) |
| II | general preposition |
| IO | "of" (as preposition) |
| IW | "with," "without" (as prepositions) |
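To make the calculations in Table 3.8 concrete, the sketch below counts the tag groups in CLAWS output. It assumes CLAWS's horizontal output format, in which each token is written as `word_TAG`; the `TAG_GROUPS` names and the sample sentence are illustrative, and in the study itself these counts were obtained with Access queries rather than with code like this. The definite article is counted as AT-tagged "the" only, since AT also covers "no". Multiword ditto tags (such as `II21`/`II22`) are not handled in this simplified version.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Tag groups following Table 3.8 (CLAWS tag names); group labels are illustrative
TAG_GROUPS = {
    "third_person_pronouns": {"PPH1", "PPHO1", "PPHO2", "PPHS1", "PPHS2"},
    "possessive_pronouns": {"APPGE"},
    "demonstratives": {"DD1", "DD2"},
    "comparatives": {"JJR", "JJT", "RGR", "RGT"},
    "additive": {"CC"},
    "adversative": {"CCB"},
    "subordinating": {"CS"},
    "temporal": {"RT"},
    "prepositions": {"IF", "II", "IO", "IW"},
}


def parse_horizontal(text: str) -> List[Tuple[str, str]]:
    """Split horizontal CLAWS output ("word_TAG word_TAG ...") into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("_")
        if sep:  # skip tokens that carry no tag
            pairs.append((word, tag))
    return pairs


def count_devices(text: str) -> Dict[str, int]:
    """Raw counts per device group; normalization happens in a later step."""
    pairs = parse_horizontal(text)
    tag_counts = Counter(tag for _, tag in pairs)
    counts = {name: sum(tag_counts[t] for t in tags) for name, tags in TAG_GROUPS.items()}
    # AT also covers "no", so the definite article is counted as AT-tagged "the" only
    counts["definite_article"] = sum(1 for w, t in pairs if t == "AT" and w.lower() == "the")
    return counts


sample = "He_PPHS1 saw_VVD the_AT dog_NN1 and_CC no_AT cat_NN1 ,_, but_CCB it_PPH1 was_VBDZ bigger_JJR ._."
counts = count_devices(sample)
```

In the sample, "no" is tagged AT but is not counted as a definite article, which is exactly the distinction the note on the AT tag describes.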

3.4.4 Data analysis: General procedures for descriptive statistical analysis and significance testing

After extraction, the data were analyzed with the statistical software SPSS. Each variable was first post-processed to allow comparison across texts: since the texts were of variable length, the value of each variable was normed per 1,000 words by multiplying the raw count by 1,000 and dividing by the number of running words in the text. Descriptive statistics were then calculated for each variable, and significance testing was performed to determine whether the three populations differ significantly, both for the genre comparison in Chapter 4 and for the main research questions, which are addressed in Chapter 5. This study addresses three main research questions:

1. Does the way texts are produced (non-translated, human-translated, or machine-translated) influence the use of cohesive devices and other global textual features in literary texts for the Russian-into-English language pair, and if so, how?

2. Does the way texts are produced (non-translated, human-translated, or machine-translated) influence the use of cohesive devices and other global textual features in newspaper texts for the Russian-into-English language pair, and if so, how?

3. Does the way texts are produced (non-translated, human-translated, or machine-translated) influence the use of cohesive devices and other global textual features in scientific texts for the Russian-into-English language pair, and if so, how?
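The norming step described above (raw count × 1,000 ÷ running words) is simple arithmetic; a minimal Python version follows, applied to a hypothetical text of 812 running words containing 37 third-person pronouns.

```python
def per_thousand(raw_count: int, running_words: int) -> float:
    """Norm a raw frequency to a rate per 1,000 running words."""
    if running_words <= 0:
        raise ValueError("running_words must be positive")
    return raw_count * 1000 / running_words


# Hypothetical text: 37 third-person pronouns in 812 running words
rate = per_thousand(37, 812)  # about 45.57 per 1,000 words
```

Because every text is expressed on the same per-1,000-word scale, a short newspaper article and a long scientific article can be compared directly.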

For the descriptive statistical analysis, the following measures of central tendency and variability are reported:

Mean: "the score located at the exact mathematical center of a distribution" (Heiman 2001: 170). The mean takes every score into account and is used "to describe interval or ratio data, especially when the variable is continuous" (ibid.: 171), which is the case with the data in this study. However, since the mean can be distorted by extreme scores, it is useful to supplement it with the median.

Median: "another name for the score at the 50th percentile," which will usually be "around where most of the distribution is located" (Heiman 2001: 168). Put differently, if the scores are ranked from high to low, the median is the score at the midpoint of the ranking; with an even number of scores (such as n = 50 in this study), it is the average of the two middle scores. This measure is especially useful for very skewed distributions.

Standard deviation (SD): a measure of variability calculated as "the square root of the average squared deviation of scores around the mean" (Heiman 2001: 200). It describes how much, on average, the scores deviate from the mean, and is used "to gauge how consistently close together the scores are" (ibid.: 201). The SD therefore indicates how accurately the scores are summarized by the mean: a large SD indicates that "a relatively large proportion of the scores are rather far from the mean" (ibid.). In a perfect normal distribution, "approximately 34% of the scores … are always between the mean and the score that is one SD from the mean," so approximately 68% of the scores lie between minus one SD and plus one SD from the mean (ibid.: 202-203). A further 13.5% or so fall between one and two SDs on each side of the mean, so approximately 95% (68% + 27%) of the scores lie within ±2 SD of the mean.
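The percentages quoted above follow from the standard normal distribution and can be checked with the error function: the share of scores within ±k SD of the mean is erf(k/√2). The short check below uses only Python's standard library.

```python
import math


def proportion_within(k: float) -> float:
    """Share of a normal distribution lying within +/- k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


within_1 = proportion_within(1)                 # about 0.683 (the "68%")
within_2 = proportion_within(2)                 # about 0.954 (the "95%")
between_1_and_2 = (within_2 - within_1) / 2     # about 0.136 on each side (the "13.5%")
```

The unrounded values (roughly 68.3%, 95.4%, and 13.6%) show that the figures in the text are convenient approximations.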

The mode, "the most frequently occurring score" (Heiman 2001: 166), is not reported, since it is most useful for nominal-scale data, that is, qualitative variables (e.g., green, red, and blue as favorite colors), which is not the case in this study.
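As an illustration of the reported measures, Python's standard `statistics` module computes them directly; the scores below are hypothetical. Note that `statistics.pstdev` divides by n, matching the "average squared deviation" definition quoted above, whereas `statistics.stdev` divides by n − 1 (the sample estimate); which of the two a given SPSS procedure reports should be confirmed in its documentation.

```python
import statistics

# Hypothetical per-1,000-word rates for eight texts
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)              # 5
median = statistics.median(scores)          # 4.5: even n, so the average of the two middle scores
sd_population = statistics.pstdev(scores)   # 2.0: square root of the average squared deviation
sd_sample = statistics.stdev(scores)        # about 2.14: n - 1 in the denominator
```

With n = 50 per group, as in this study, the two SD formulas differ by only about 1%, so the choice rarely affects interpretation.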


Further, analysis of variance (ANOVA) was used to determine "whether significant differences exist in an experiment that contains two or more conditions" (Heiman 2001: 457). Since there is one independent variable with three conditions (or levels), a one-way between-subjects ANOVA was employed. For the main research questions, the independent variable (factor) is the method of text production, with three levels: human-translated, machine-translated, and non-translated. For the genre comparison in Chapter 4, the factor is genre, with three levels: literary, newspaper, and scientific.

ANOVA is used instead of multiple t-tests for experiments with more than two conditions in order to reduce the probability of making a Type I error (rejecting a true null hypothesis). Running a separate t-test for every pair of conditions would inflate this probability across the experiment as a whole; ANOVA keeps the probability of making a Type I error anywhere in the experiment, referred to as the experiment-wise error rate, equal to the selected α (Heiman 2001: 460).
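A short worked equation shows why separate t-tests are avoided. Under the simplifying assumption that the three pairwise tests are independent, each run at α = .05, the experiment-wise error rate for c comparisons would be:

```latex
\alpha_{EW} = 1 - (1 - \alpha)^{c} = 1 - (1 - .05)^{3} = 1 - .857375 \approx .143
```

That is roughly a 14% chance of at least one false positive, nearly three times the intended 5%; a single omnibus ANOVA followed by a controlled post hoc test avoids this inflation.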

In the design of the present study, the null hypothesis and the alternative hypothesis are formulated as follows:

$H_0: \mu_1 = \mu_2 = \mu_3$

$H_a$: not all $\mu$s are equal

The null hypothesis ($H_0$) is that "there are no differences among the populations represented by the conditions," and so "any differences between our sample means occur because they poorly represent that one μ that would be found for all conditions" (Heiman 2001: 461). The alternative hypothesis ($H_a$) implies that "there is a relationship in the population such that the population means represented by one of our levels will be different from the population mean represented by at least one other level" (ibid.).

For each variable, the F statistic, which is the basis for ANOVA significance testing, is reported. We can determine whether "two or more sample means represent different μs" (Heiman 2001: 461) by comparing $F_{obt}$ with the critical value $F_{crit}$ for the corresponding $df_{bn}$ and $df_{wn}$ at the desired α level (typically α = .05). A significant $F_{obt}$ indicates that at least two means are likely to represent different populations. Because SPSS reports exact significance levels, there is no need to compare $F_{obt}$ manually with the critical value $F_{crit}$ found in F-tables.
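To show where the obtained F and its degrees of freedom come from, here is a minimal from-scratch one-way between-subjects ANOVA in Python on hypothetical toy data. It reproduces only the F ratio; obtaining the p-value additionally requires the F distribution (for example, SciPy's `scipy.stats.f.sf`), which SPSS handles internally.

```python
from typing import List, Sequence, Tuple


def one_way_anova(groups: List[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F_obt, df_bn, df_wn) for a one-way between-subjects ANOVA."""
    all_scores = [x for g in groups for x in g]
    n_total, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups SS: distance of each group mean from the grand mean, weighted by group size
    ss_bn = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups SS: distance of each score from its own group mean
    ss_wn = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_bn, df_wn = k - 1, n_total - k
    f_obt = (ss_bn / df_bn) / (ss_wn / df_wn)  # MS_bn / MS_wn
    return f_obt, df_bn, df_wn


# Hypothetical scores for three conditions
f_obt, df_bn, df_wn = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

With 50 texts in each of three groups, as in this study, the same formulas give df_bn = 2 and df_wn = 147, for a total df of 149.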

ANOVA indicates whether there is a statistically significant difference somewhere among the group means. To find out exactly which level means differ significantly, post hoc comparisons are needed (Heiman 2001: 462). In this study, Tukey's HSD (Honestly Significant Difference) multiple comparisons test was performed as the post hoc comparison. Tukey's HSD is preferred "when the ns in all levels of the factor are equal" (ibid.: 476), which is the case in the present study (n = 50 for all levels).

For the variables with at least one statistically significant difference among the three groups of texts, bar graphs of means and standard error (SE) are included to help visualize the results. When the SE bars of two means overlap, the difference between them is generally not statistically significant. When SE bars do not overlap, the difference may be statistically significant, and Tukey's HSD is used to determine its significance.
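The standard error plotted in the bar graphs is the SD divided by the square root of n. Assuming the bars span one SE above and below each mean, the sketch below derives them from the non-translated means and SDs for 3rd person pronominal devices reported in Table 4.3 (n = 50 per genre) and checks whether the literary and newspaper bars overlap.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


def overlap(a: tuple, b: tuple) -> bool:
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


N = 50
# (mean, SD) per 1,000 words, non-translated texts, from Table 4.3
nt = {"Literary": (52.32, 17.84), "Newspaper": (21.99, 10.36), "Scientific": (3.63, 2.07)}

# Lower and upper ends of a +/- 1 SE bar around each mean
bars = {g: (m - standard_error(sd, N), m + standard_error(sd, N)) for g, (m, sd) in nt.items()}
```

The literary bar spans roughly 49.8 to 54.8 and the newspaper bar roughly 20.5 to 23.5, so they do not overlap, consistent with the significant Tukey result for that pair.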

For each variable, possible implications of the findings are discussed. When appropriate, textual examples from the corpora are included.

CHAPTER 4: TEXTUAL CHARACTERISTICS OF LITERARY, NEWSPAPER, AND SCIENTIFIC TEXTS ACROSS NON-TRANSLATED, HUMAN-TRANSLATED, AND MACHINE-TRANSLATED CORPORA

This chapter presents an overview of the global textual characteristics of the three genres included in the study: literary, newspaper, and scientific. These characteristics were selected from the variables studied in the within-genre analyses of differences between non-translated, human-translated, and machine-translated texts (Chapter 5).

4.1 Cohesive and other global textual characteristics selected for the genre description of the corpora

Prior to reporting results for the use of cohesive devices and other textual features across non-translated, human-translated, and machine-translated texts, it is useful to provide a general overview of the genre-specific characteristics of the corpora under study, based on the data extracted from them (described in detail in Chapter 5). Such a description illuminates major differences across the literary, newspaper, and scientific corpora, and may contribute to discussions of genre specificity in translation studies and other linguistic fields. In addition to non-translated English texts, which are a common object of genre studies in English, this research extends genre analysis to texts translated into English by humans and machines.

Seven cohesive features were selected for the description of genre specificities.

These features include:

− Third-person pronominal cohesive devices

− Possessive pronouns

− Demonstrative pronouns

− Definite articles

− Comparative cohesive devices

− Reference cohesive devices

− Conjunction cohesive devices

In addition, the following six global textual features were used to describe genre specificities:

− Nominalization

− Lexical density

− Average word length

− Average sentence length

− Passives

− Prepositional phrases


4.2 Cohesive characteristics of literary, newspaper, and scientific texts

4.2.1 Third-person pronominal cohesive devices in literary, newspaper, and scientific texts

Third-person pronouns (e.g., "he," "she") belong to the group of reference cohesive devices and require recourse to another item in the text in order to be interpreted (Halliday and Hasan 1976: 31). The analysis below shows that their use in English varies across genres.

According to one-way ANOVA, statistically significant differences in the total number of 3rd person pronominal cohesive devices across genres were found for all three methods of text production: non-translated texts (NT), human-translated texts (HT), and machine-translated texts (MT) (p < .0001) (see Table 4.1).

Table 4.1 Association of 3rd person pronominal cohesive devices with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA

| Method | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig. |
| NT | 81521.037 | 60460.330 | 2 | 149 | 211.001 | < 0.0001* |
| HT | 59383.642 | 43861.433 | 2 | 149 | 207.691 | < 0.0001* |
| MT | 33393.750 | 23827.895 | 2 | 149 | 183.084 | < 0.0001* |

Note: * indicates p-values significant at the 0.05 alpha level.

Tukey's HSD multiple comparisons testing revealed statistically significant differences in the total number of 3rd person pronominal devices for all pairs (Literary-Newspaper, Literary-Scientific, and Newspaper-Scientific) in non-translated, human-translated, and machine-translated texts. The results of Tukey's HSD testing are presented in Table 4.2. Descriptive statistics for 3rd person pronominal cohesive devices are presented in Table 4.3.

Table 4.2 Pairwise comparisons of 3rd person pronominal cohesive devices across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing

| Method | Genre comparison | Difference in means | Std. Error | Sig. |
| NT | Lit-News | 30.32940* | 2.39391 | < 0.0001 |
| NT | Lit-Sci | 48.68940* | 2.39391 | < 0.0001 |
| NT | News-Sci | 18.36000* | 2.39391 | < 0.0001 |
| HT | Lit-News | 28.16480* | 2.05517 | < 0.0001 |
| HT | Lit-Sci | 40.93200* | 2.05517 | < 0.0001 |
| HT | News-Sci | 12.76720* | 2.05517 | < 0.0001 |
| MT | Lit-News | 19.01680* | 1.61337 | < 0.0001 |
| MT | Lit-Sci | 30.57040* | 1.61337 | < 0.0001 |
| MT | News-Sci | 11.55360* | 1.61337 | < 0.0001 |

Notes: * indicates that the mean difference is significant at the 0.05 level.

Table 4.3 Descriptive statistics for the total numbers of 3rd person pronominal cohesive devices across genres (N = 50)

| Genre | Statistic | NT | HT | MT |
| Literary | Mean | 52.32 | 45.84 | 35.05 |
| Literary | Std. Deviation | 17.84 | 14.74 | 11.59 |
| Literary | Median | 51.83 | 44.99 | 35.04 |
| Newspaper | Mean | 21.99 | 17.67 | 16.03 |
| Newspaper | Std. Deviation | 10.36 | 9.61 | 7.42 |
| Newspaper | Median | 21.63 | 15.68 | 14.64 |
| Scientific | Mean | 3.63 | 4.90 | 4.48 |
| Scientific | Std. Deviation | 2.07 | 2.64 | 2.40 |
| Scientific | Median | 3.20 | 4.62 | 4.08 |
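The quantities in Tables 4.1 and 4.2 are internally related, which offers a useful consistency check. With three genres of 50 texts each, df_bn = 2 and df_wn = 149 − 2 = 147; the within-groups sum of squares is the total minus the between-groups value; F is MS_bn / MS_wn; and, with equal group sizes, the standard error shown for each Tukey comparison equals √(MS_wn × (1/50 + 1/50)). The Python sketch below recomputes F and that standard error from the sums of squares reported in Table 4.1.

```python
import math

N_PER_GENRE = 50
K = 3                               # literary, newspaper, scientific
DF_BN = K - 1                       # 2
DF_WN = K * N_PER_GENRE - K         # 147 (total df 149 minus 2)


def f_and_tukey_se(ss_total: float, ss_between: float) -> tuple:
    """Recompute F and the Tukey HSD standard error from reported sums of squares."""
    ss_within = ss_total - ss_between
    ms_between = ss_between / DF_BN
    ms_within = ss_within / DF_WN
    f_obt = ms_between / ms_within
    # Equal n in every group, so the same SE applies to all three pairwise comparisons
    se = math.sqrt(ms_within * (1 / N_PER_GENRE + 1 / N_PER_GENRE))
    return f_obt, se


# (Total SS, Between-groups SS) from Table 4.1
table_4_1 = {
    "NT": (81521.037, 60460.330),
    "HT": (59383.642, 43861.433),
    "MT": (33393.750, 23827.895),
}
results = {method: f_and_tukey_se(*ss) for method, ss in table_4_1.items()}
```

The recomputed values agree, up to rounding, with the F values in Table 4.1 (211.001, 207.691, 183.084) and the standard errors in Table 4.2 (2.39391, 2.05517, 1.61337).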

For non-translated texts, literary texts contained significantly more 3rd person pronominal cohesive devices (mean ± SD per 1,000 words: 52.32 ± 17.84) than newspaper texts (21.99 ± 10.36, p < .0001) and scientific texts (3.63 ± 2.07, p < .0001). For human-translated texts, literary texts also contained significantly more 3rd person pronominal cohesive devices (45.84 ± 14.74) than newspaper texts (17.67 ± 9.61, p < .0001) and scientific texts (4.90 ± 2.64, p < .0001). The same held for machine-translated texts: literary texts contained significantly more 3rd person pronominal cohesive devices (35.05 ± 11.59) than newspaper texts (16.03 ± 7.42, p < .0001) and scientific texts (4.48 ± 2.40, p < .0001). These results are presented visually in the bar charts in Fig. 4.1.

As Fig. 4.1 and the statistics in Tables 4.2 and 4.3 show, literary texts rely most heavily on 3rd person pronominal devices: they contained more than twice as many of these devices as newspaper texts, regardless of the method of text production. Literary non-translated texts contained more than 14 times as many 3rd person pronominal cohesive devices as scientific texts (52.32 vs. 3.63 per 1,000 words). Thus, the use of 3rd person pronominal cohesive devices is a significant genre distinguisher for literary, newspaper, and scientific texts.

Fig. 4.1 Means and standard error for 3rd person pronominal cohesive devices for NT, HT, and MT in the literary, newspaper, and scientific corpora

4.2.2 Possessive pronouns in literary, newspaper, and scientific texts

Possessive pronouns (e.g., "his," "her") are a subgroup of personal pronouns. In Halliday and Hasan's framework (1976), they belong to reference cohesive devices, like the 3rd person pronouns described above. English texts often use possessive pronouns as obligatory noun modifiers. As shown below, the frequency of possessive pronouns varies across genres regardless of the method by which the texts were produced.

According to one-way ANOVA, statistically significant differences in the total number of possessive pronouns across genres were found for all three methods of text production (p < .0001) (see Table 4.4).

Table 4.4 Association of possessive pronouns with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA

| Method | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig. |
| NT | 21049.529 | 15369.096 | 2 | 149 | 198.863 | < 0.0001* |
| HT | 19678.936 | 12946.310 | 2 | 149 | 141.335 | < 0.0001* |
| MT | 10271.870 | 5947.699 | 2 | 149 | 101.096 | < 0.0001* |

Note: * indicates p-values significant at the 0.05 alpha level.

Tukey's HSD multiple comparisons test revealed statistically significant differences in the total number of possessive pronouns for all pairs of genres (Literary-Newspaper, Literary-Scientific, and Newspaper-Scientific) in non-translated, human-translated, and machine-translated texts. The results of Tukey's HSD testing are presented in Table 4.5. Descriptive statistics for possessive pronouns are presented in Table 4.6.

Table 4.5 Pairwise comparisons of possessive pronouns across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing

| Method | Genre comparison | Difference in means | Std. Error | Sig. |
| NT | Lit-News | 14.64080* | 1.24326 | < 0.0001 |
| NT | Lit-Sci | 24.64980* | 1.24326 | < 0.0001 |
| NT | News-Sci | 10.00900* | 1.24326 | < 0.0001 |
| HT | Lit-News | 13.37360* | 1.35352 | < 0.0001 |
| HT | Lit-Sci | 22.63200* | 1.35352 | < 0.0001 |
| HT | News-Sci | 9.25840* | 1.35352 | < 0.0001 |
| MT | Lit-News | 7.67320* | 1.08473 | < 0.0001 |
| MT | Lit-Sci | 15.42420* | 1.08473 | < 0.0001 |
| MT | News-Sci | 7.75100* | 1.08473 | < 0.0001 |

Notes: * indicates that the mean difference is significant at the 0.05 level.

Table 4.6 Descriptive statistics for the possessive pronouns across genres (N = 50)

| Genre | Statistic | NT | HT | MT |
| Literary | Mean | 28.27 | 25.55 | 17.96 |
| Literary | Std. Deviation | 8.85 | 8.99 | 7.45 |
| Literary | Median | 26.38 | 23.70 | 16.58 |
| Newspaper | Mean | 13.63 | 12.18 | 10.28 |
| Newspaper | Std. Deviation | 5.61 | 7.30 | 5.55 |
| Newspaper | Median | 12.21 | 11.42 | 10.02 |
| Scientific | Mean | 3.62 | 2.92 | 2.53 |
| Scientific | Std. Deviation | 2.48 | 1.79 | 1.42 |
| Scientific | Median | 2.88 | 2.62 | 2.66 |

For non-translated writing, literary texts contained significantly more possessive pronouns (28.27 ± 8.85) than newspaper texts (13.63 ± 5.61, p < .0001) and scientific texts (3.62 ± 2.48, p < .0001). For the human-translated sub-corpora, literary texts also contained significantly more possessive pronouns (25.55 ± 8.99) than newspaper texts (12.18 ± 7.30, p < .0001) and scientific texts (2.92 ± 1.79, p < .0001). For the machine-translated sub-corpora, literary texts again contained significantly more possessive pronouns (17.96 ± 7.45) than newspaper texts (10.28 ± 5.55, p < .0001) and scientific texts (2.53 ± 1.42, p < .0001). These results are presented visually in the bar charts in Fig. 4.2.

As Fig. 4.2 and the descriptive statistics in Table 4.6 show, of the three genres, literary texts rely most on possessive pronouns, regardless of the method of text production. They are followed by newspaper texts. The scientific corpus contained by far the fewest possessive pronouns: for non-translated texts, the rounded average was 4 for scientific texts, 14 for newspaper texts, and 28 for literary texts. In other words, literary non-translated texts contained nearly eight times as many possessive pronouns as scientific texts (28.27 vs. 3.62 per 1,000 words) and about twice as many as newspaper texts. Thus, the use of possessive pronouns is another significant genre distinguisher for literary, newspaper, and scientific texts.
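The multiples cited above can be checked against the unrounded means, since rounding the means before dividing can shift the result noticeably. A short Python check using the non-translated means for possessive pronouns in Table 4.6:

```python
# Non-translated means per 1,000 words, possessive pronouns (Table 4.6)
means = {"Literary": 28.27, "Newspaper": 13.63, "Scientific": 3.62}

lit_vs_sci = means["Literary"] / means["Scientific"]   # about 7.8
lit_vs_news = means["Literary"] / means["Newspaper"]   # about 2.1
```

Dividing the rounded values (28 and 4) instead gives exactly 7, which understates the ratio computed from the reported means.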


Fig. 4.2 Means and standard error for possessive pronouns for NT, HT, and MT in the literary, newspaper, and scientific corpora

4.2.3 Demonstrative pronouns in literary, newspaper, and scientific texts

Demonstrative pronouns (e.g., "this," "that," "these," "those") belong to the group of reference cohesive devices, and realize textual cohesion "by means of location, on a scale of proximity" in relation to the speaker (space) or the moment of speech (time) (Halliday and Hasan 1976: 37-38). The use of demonstratives across genres varied, although not as consistently as the use of personal and possessive pronouns.


According to one-way ANOVA, statistically significant differences in the total number of demonstrative pronouns across genres were found for all three methods of text production (p < .05) (see Table 4.7).

Table 4.7 Association of demonstratives with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA

| Method | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig. |
| NT | 2791.247 | 120.470 | 2 | 149 | 3.315 | 0.0391* |
| HT | 2820.700 | 249.622 | 2 | 149 | 7.136 | 0.0011* |
| MT | 2175.011 | 177.971 | 2 | 149 | 6.550 | 0.0019* |

Note: * indicates p-values significant at the 0.05 alpha level.

Tukey's HSD multiple comparisons test revealed statistically significant differences in the use of demonstrative pronouns for some of the pairs. The results of Tukey's HSD testing are presented in Table 4.8. Descriptive statistics for demonstrative pronouns are presented in Table 4.9.

Table 4.8 Pairwise comparisons of demonstratives across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing

| Method | Genre comparison | Difference in means | Std. Error | Sig. |
| NT | Lit-News | -2.16380* | .85249 | 0.0325 |
| NT | Lit-Sci | -1.40220 | .85249 | 0.2302 |
| NT | News-Sci | .76160 | .85249 | 0.6453 |
| HT | Lit-News | .67540 | .83643 | 0.6990 |
| HT | Lit-Sci | 3.01100* | .83643 | 0.0013 |
| HT | News-Sci | 2.33560* | .83643 | 0.0162 |
| MT | Lit-News | -.90620 | .73717 | 0.4378 |
| MT | Lit-Sci | 1.72020 | .73717 | 0.0543 |
| MT | News-Sci | 2.62640* | .73717 | 0.0014 |

Notes: * indicates that the mean difference is significant at the 0.05 level.

Table 4.9 Descriptive statistics for demonstratives across genres (N = 50)

| Genre | Statistic | NT | HT | MT |
| Literary | Mean | 9.35 | 12.12 | 9.12 |
| Literary | Std. Deviation | 3.81 | 4.32 | 3.53 |
| Literary | Median | 8.99 | 11.59 | 9.30 |
| Newspaper | Mean | 11.51 | 11.45 | 10.03 |
| Newspaper | Std. Deviation | 4.43 | 4.92 | 4.68 |
| Newspaper | Median | 11.20 | 10.24 | 9.34 |
| Scientific | Mean | 10.75 | 9.11 | 7.40 |
| Scientific | Std. Deviation | 4.51 | 3.10 | 2.53 |
| Scientific | Median | 10.82 | 8.78 | 7.22 |

Overall, all three genres make use of demonstratives, with relatively few differences among them. Literary non-translated texts contained significantly fewer demonstrative pronouns (9.35 ± 3.81) than newspaper texts (11.51 ± 4.43, p = .0325).

No statistically significant differences were found for the other two pairs (literary-scientific and newspaper-scientific). Both human-translated literary texts and newspaper texts contained significantly more demonstratives (12.12 ± 4.32 and 11.45 ± 4.92, respectively) than scientific texts (9.11 ± 3.10, p = .0013 and p = .0162, respectively), while no statistically significant difference was found between human-translated literary and newspaper texts. For machine-translated texts, newspaper texts contained significantly more demonstratives (10.03 ± 4.68) than scientific texts (7.40 ± 2.53, p = .0014), while no statistically significant differences were found for the other two pairs. These results are presented visually in the bar charts in Fig. 4.3.

Fig. 4.3 Means and standard error for demonstratives for NT, HT, and MT in the literary, newspaper, and scientific corpora


4.2.4 Definite article in literary, newspaper, and scientific texts

The definite article ("the") is a specific, non-pronominal demonstrative cohesive device. In English, most nouns are used in combination with a determiner: an article (definite, indefinite, or zero), a demonstrative or possessive pronoun, or another noun in the genitive case.

For the definite article, the genre comparison is notable: scientific texts contain the most definite articles of the three genres, while literary texts contain the fewest (Table 4.12). According to one-way ANOVA, statistically significant differences in the total number of definite articles across genres were found for all three methods of text production (p < .0001) (see Table 4.10).

Table 4.10 Association of the definite article with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA

| Method | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig. |
| NT | 43027.738 | 18226.095 | 2 | 149 | 54.013 | < 0.0001* |
| HT | 103553.223 | 52774.219 | 2 | 149 | 76.388 | < 0.0001* |
| MT | 78527.729 | 48552.243 | 2 | 149 | 119.050 | < 0.0001* |

Note: * indicates p-values significant at the 0.05 alpha level.

Tukey's HSD multiple comparisons test revealed statistically significant differences in the total number of definite articles for all pairs of genres (Literary-Newspaper, Literary-Scientific, and Newspaper-Scientific) in non-translated, human-translated, and machine-translated texts. The results of Tukey's HSD testing are presented in Table 4.11. Descriptive statistics for the definite article are presented in Table 4.12.

Table 4.11 Pairwise comparisons of the definite article across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing

| Method | Genre comparison | Difference in means | Std. Error | Sig. |
| NT | Lit-News | -7.49920* | 2.59783 | 0.0124 |
| NT | Lit-Sci | -26.21300* | 2.59783 | < 0.0001 |
| NT | News-Sci | -18.71380* | 2.59783 | < 0.0001 |
| HT | Lit-News | -27.40620* | 3.71718 | < 0.0001 |
| HT | Lit-Sci | -45.63900* | 3.71718 | < 0.0001 |
| HT | News-Sci | -18.23280* | 3.71718 | < 0.0001 |
| MT | Lit-News | -25.49680* | 2.85598 | < 0.0001 |
| MT | Lit-Sci | -43.87720* | 2.85598 | < 0.0001 |
| MT | News-Sci | -18.38040* | 2.85598 | < 0.0001 |

Notes: * indicates that the mean difference is significant at the 0.05 level.

Table 4.12 Descriptive statistics for the definite article across genres (N = 50)

| Genre | Statistic | NT | HT | MT |
| Literary | Mean | 54.06 | 56.35 | 63.41 |
| Literary | Std. Deviation | 12.91 | 17.49 | 13.30 |
| Literary | Median | 54.23 | 55.37 | 65.45 |
| Newspaper | Mean | 61.56 | 83.76 | 88.91 |
| Newspaper | Std. Deviation | 10.99 | 18.54 | 14.21 |
| Newspaper | Median | 61.62 | 83.06 | 92.13 |
| Scientific | Mean | 80.28 | 101.99 | 107.29 |
| Scientific | Std. Deviation | 14.80 | 19.67 | 15.26 |
| Scientific | Median | 78.33 | 99.38 | 106.02 |

For non-translated writing, literary texts used definite articles significantly less

frequently (54.06 ± 12.91) than newspaper texts (61.56 ± 10.99, p = .0124) and scientific

texts (80.28 ± 14.80, p < .0001). For human-translated corpora, literary texts displayed a

significantly lower number of definite articles (56.35 ± 17.49) than newspaper texts

(83.76 ± 18.54, p < .0001) and scientific texts (101.99 ± 19.67, p < .0001). For machine-

translated corpora, literary texts displayed a significantly lower number of definite

articles (63.41 ± 13.30) than newspaper texts (88.91 ± 14.21, p < .0001) and scientific texts (107.29 ± 15.26, p < .0001). These results are visually presented in the bar charts in

Fig. 4.4.
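The "mean ± standard deviation" notation used throughout this section pairs each genre mean with its standard deviation, while the tables also report medians. The minimal sketch below uses Python's `statistics` module and assumes sample (n - 1) standard deviations, as produced by common statistical packages. The five values are invented for illustration and are not data from this study.

```python
import statistics


def describe(values):
    """Mean, sample standard deviation (n - 1 denominator), and median."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),      # sample SD, not population SD
        "median": statistics.median(values),
    }


def fmt(d):
    """Format as 'mean ± SD' with two decimals, the convention used in the text."""
    return f"{d['mean']:.2f} ± {d['sd']:.2f}"


# Hypothetical per-text frequencies for five illustrative texts.
toy_rates = [50.0, 55.0, 60.0, 45.0, 40.0]
d = describe(toy_rates)
print(fmt(d), d["median"])
```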

As can be seen from Fig. 4.4 and from the statistics in Table 4.12, of the three genres, scientific texts use definite articles most frequently, regardless of the method of text production. For scientific texts, the rounded average was 80 for non-translated texts, 102 for human translations, and 107 for machine translations. Newspaper texts come next, with rounded averages of 62 for non-translated texts, 84 for human translations, and 89 for machine translations. The literary corpus contained by far the fewest definite articles, with rounded averages of 54 for non-translated texts, 56 for human translations, and 63 for machine translations. In other words, scientific non-translated texts used 1.5 times as many definite articles as literary texts and 1.3 times as many as newspaper texts. Thus, the use of definite articles is yet another significant genre distinguisher for literary, newspaper, and scientific texts, along with 3rd person pronominal cohesive devices and possessive pronouns.
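The ratios cited above can be checked directly from the non-translated means in Table 4.12 with a short Python sketch:

```python
# Non-translated means for the definite article (Table 4.12).
nt_means = {"Literary": 54.06, "Newspaper": 61.56, "Scientific": 80.28}

sci_vs_lit = nt_means["Scientific"] / nt_means["Literary"]     # scientific vs. literary
sci_vs_news = nt_means["Scientific"] / nt_means["Newspaper"]   # scientific vs. newspaper
print(f"{sci_vs_lit:.2f}x vs literary, {sci_vs_news:.2f}x vs newspaper")
```

Rounded to one decimal place, the two ratios give the 1.5 and 1.3 reported in the text.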

Fig. 4.4 Means and standard error for the definite article for NT, HT, and MT in the literary, newspaper, and scientific corpora


4.2.5 Comparative cohesive devices in literary, newspaper, and scientific texts

Comparative cohesive devices are referential. In general comparison, they express identity, similarity, or difference (e.g., "same" or "different"); in particular comparison, they take the form of numerative and epithetic expressions (e.g., "more" or "better") (Halliday and Hasan 1976: 76-77). This study covers four types of comparative cohesive devices:

− General comparative adjective (e.g., "older," "better," "stronger")

− General superlative adjective (e.g., "oldest," "best," "strongest")

− Comparative degree adverb ("more," "less")

− Superlative degree adverb ("most," "least")
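Because the counts in this section are reported per 1,000 words, each raw count has to be normalized by text length. The sketch below is illustrative only. It counts the four degree adverbs listed above with a plain word match and normalizes the result. It does not detect comparative or superlative adjectives, which would need a part-of-speech tagger. It also cannot tell cohesive uses, whose point of comparison lies in the preceding text, from clause-internal comparisons such as "more robust than X". The study's actual counting procedure is not reproduced here, and the function names are hypothetical.

```python
import re

# The closed-class degree adverbs listed above; comparative and superlative
# adjectives ("older", "oldest") would additionally require a POS tagger.
DEGREE_ADVERBS = {"more", "less", "most", "least"}


def per_thousand(count: int, n_tokens: int) -> float:
    """Normalize a raw count to a rate per 1,000 running words."""
    return count / n_tokens * 1000 if n_tokens else 0.0


def degree_adverb_rate(text: str) -> float:
    """Naive rate of degree adverbs per 1,000 words (no disambiguation)."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    hits = sum(1 for t in tokens if t in DEGREE_ADVERBS)
    return per_thousand(hits, len(tokens))


sample = "The second method is more robust and less costly, but the first is the most common."
print(degree_adverb_rate(sample))
```

The sample sentence has 16 tokens and 3 matches, so its rate is 3 / 16 * 1000 = 187.5 per 1,000 words. Short texts inflate such rates, which is one reason the study compares texts of substantial length.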

For the sum of these four types of comparative cohesive devices, the genre comparison proves interesting for non-translated texts. According to one-way ANOVA, statistically significant differences in the total number of comparatives across genres were found for non-translated texts (p < .0001) (see Table 4.13). In human translations, the differences across genres were not statistically significant (p = .1254). In machine translations, the p-value (p = .0453) lies close to the 0.05 alpha level, and post hoc testing was performed for non-translated texts only (Table 4.14).

Table 4.13 Association of comparative devices with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA

Method | Total Sum of Squares | Between-Groups Sum of Squares | Between-Groups df | Total df | F | Sig.
NT | 1180.939 | 158.871 | 2 | 149 | 11.425 | < 0.0001*
HT | 989.058 | 27.551 | 2 | 149 | 2.106 | 0.1254
MT | 1080.116 | 44.537 | 2 | 149 | 3.161 | 0.0453

Note: * indicates p-values significant at the 0.05 alpha level.
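Because every ANOVA in this section compares three genres, the F statistic has 2 numerator degrees of freedom. For that case the upper-tail probability has the closed form p = (1 + 2F/d)^(-d/2), where d is the within-groups degrees of freedom (147 here). The sketch below uses this identity to recompute the p-values in Table 4.13 without a statistics library. The function name is illustrative.

```python
def f_pvalue_two_numerator_df(f_stat: float, df_within: int) -> float:
    """Upper-tail p-value of an F(2, df_within) statistic.

    With 2 numerator degrees of freedom the F survival function reduces to
    (1 + 2F/df_within) ** (-df_within / 2), so no special functions are needed.
    """
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


# Comparative devices (Table 4.13): 3 genres x 50 texts -> df = (2, 147).
for method, f in [("NT", 11.425), ("HT", 2.106), ("MT", 3.161)]:
    print(method, f"{f_pvalue_two_numerator_df(f, 147):.4f}")
```

The printed values (0.0000, 0.1254, and 0.0453) match Table 4.13, where 0.0000 corresponds to the reported p < 0.0001.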

For non-translated texts, Tukey's HSD multiple comparisons test revealed statistically significant differences in the total number of comparatives between literary and newspaper texts and between literary and scientific texts. The results of Tukey's HSD testing are presented in Table 4.14. Descriptive statistics for comparative cohesive devices are presented in Table 4.15.

Table 4.14 Pairwise comparisons of comparative devices across genres (Literary, Newspaper, Scientific) for NT by Tukey HSD post hoc testing

Method | Genre Comparison | Difference in Means | Std. Error | Sig.
NT | Lit-News | -2.29340* | .52736 | 0.0001
NT | Lit-Sci | -2.05300* | .52736 | 0.0004
NT | News-Sci | .24040 | .52736 | 0.8919

Note: * indicates that the mean difference is significant at the 0.05 level.

Table 4.15 Descriptive statistics for comparative devices across genres (N = 50)

Genre | Statistic | NT | HT | MT
Literary | Mean | 2.93 | 3.45 | 3.35
Literary | Std. Deviation | 1.73 | 2.16 | 2.23
Literary | Median | 2.93 | 3.24 | 2.94
Newspaper | Mean | 5.23 | 4.32 | 4.45
Newspaper | Std. Deviation | 2.73 | 3.39 | 3.48
Newspaper | Median | 4.93 | 3.29 | 3.59
Scientific | Mean | 4.99 | 3.38 | 3.25
Scientific | Std. Deviation | 3.23 | 1.86 | 2.01
Scientific | Median | 4.43 | 2.98 | 2.79

Literary non-translated texts had a significantly lower number of comparatives

(2.93 ± 1.73) than newspaper texts (5.23 ± 2.73, p = .0001) and scientific texts (4.99 ±

3.23, p = .0004). No statistically significant difference was found between non-translated

newspaper and scientific texts (p = .8919). The means and standard errors for all groups are

visually presented in the bar charts in Fig. 4.5.

As can be seen from Fig. 4.5 and from the statistics in Table 4.15, literary non-

translated writing in English uses fewer comparatives, with a rounded mean of 3, than the

non-translated newspaper and scientific texts, with a rounded mean of 5 (per 1,000 words). This is not surprising, since scientific writers tend to compare results and findings, and newspaper texts tend to compare various phenomena. These findings seem to confirm our general assumptions about comparatives in these three genres.


Fig. 4.5 Means and standard error for comparative devices for NT, HT, and MT in the literary, newspaper, and scientific corpora

4.2.6 Reference cohesive devices in literary, newspaper, and scientific texts

For the sum of the reference cohesive devices included in this study (3rd person pronouns

+ possessive pronouns + demonstratives + definite article + comparatives), the genre

comparison also proves interesting. Of the three genres, scientific texts use the fewest reference cohesive devices regardless of the method of text production, while literary texts use the most in non-translated texts and human translations (Table 4.18 and Fig. 4.6).

According to one-way ANOVA, statistically significant differences in the total

number of reference cohesive devices across genres were found for non-translated texts

and human translations only (p < .0001) (see Table 4.16). Interestingly, no statistically

significant differences were found for machine translations (p = .1647), which may

suggest that MT tools are not as genre-sensitive in terms of using reference cohesive

devices as human writers and translators.

Table 4.16 Association of reference cohesive devices with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA

Method | Total Sum of Squares | Between-Groups Sum of Squares | Between-Groups df | Total df | F | Sig.
NT | 109167.454 | 51840.559 | 2 | 149 | 66.466 | < 0.0001*
HT | 54412.827 | 11418.735 | 2 | 149 | 19.521 | < 0.0001*
MT | 26651.601 | 646.100 | 2 | 149 | 1.826 | 0.1647

Note: * indicates p-values significant at the 0.05 alpha level.

Tukey's HSD multiple comparisons test revealed statistically significant differences in the total number of reference devices for all pairs of genres (Literary-Newspaper, Literary-Scientific, and Newspaper-Scientific) in non-translated texts. In human translations, literary texts differed significantly from both newspaper and scientific texts. The results of Tukey's HSD testing are presented in Table 4.17. Descriptive statistics for reference cohesive devices are presented in Table 4.18.

Table 4.17 Pairwise comparisons of reference cohesive devices across genres (Literary, Newspaper, Scientific) for NT and HT by Tukey HSD post hoc testing

Method | Genre Comparison | Difference in Means | Std. Error | Sig.
NT | Lit-News | 33.01300* | 3.94958 | < 0.0001
NT | Lit-Sci | 43.66940* | 3.94958 | < 0.0001
NT | News-Sci | 10.65640* | 3.94958 | 0.0211
HT | Lit-News | 13.93360* | 3.42039 | 0.0002
HT | Lit-Sci | 21.00080* | 3.42039 | < 0.0001
HT | News-Sci | 7.06720 | 3.42039 | 0.1005

Note: * indicates that the mean difference is significant at the 0.05 level.

Table 4.18 Descriptive statistics for reference cohesive devices across genres (N = 50)

Genre | Statistic | NT | HT | MT
Literary | Mean | 146.93 | 143.31 | 128.89
Literary | Std. Deviation | 26.74 | 17.53 | 13.51
Literary | Median | 148.24 | 141.07 | 129.12
Newspaper | Mean | 113.92 | 129.37 | 129.70
Newspaper | Std. Deviation | 13.03 | 14.99 | 11.61
Newspaper | Median | 113.04 | 128.53 | 129.52
Scientific | Mean | 103.26 | 122.31 | 124.95
Scientific | Std. Deviation | 16.88 | 18.58 | 14.61
Scientific | Median | 102.69 | 122.33 | 122.64

Literary non-translated texts used reference devices significantly more often (146.93 ± 26.74) than non-translated newspaper texts (113.92 ± 13.03, p < .0001) and non-translated scientific texts (103.26 ± 16.88, p < .0001). The difference between non-translated newspaper and scientific texts was also statistically significant (p = .0211). Human-translated literary texts displayed a significantly higher number of reference devices (143.31 ± 17.53) than human-translated newspaper texts (129.37 ± 14.99, p = .0002) and human-translated scientific texts (122.31 ± 18.58, p < .0001), whereas the difference between human-translated newspaper and scientific texts was not statistically significant (p = .1005). These results are visually presented in the bar charts in Fig. 4.6.

Fig. 4.6 Means and standard error for reference cohesive devices for NT, HT, and MT in the literary, newspaper, and scientific corpora

As can be seen from Fig. 4.6 and from the statistics in Table 4.18, of the three genres, scientific non-translated texts use reference cohesive devices least frequently. For non-translated texts, the rounded averages were 103 per 1,000 words for scientific texts, 114 per 1,000 words for newspaper texts, and 147 per 1,000 words for literary texts. Scientific human translations likewise use significantly fewer reference cohesive devices (122 per 1,000 words) than literary human translations (143 per 1,000 words). Machine translations, by contrast, do not display statistically significant differences in the use of reference cohesive devices across genres: MT texts used similar numbers of reference cohesive devices in all three genres (129 per 1,000 words for literary texts, 130 for newspaper texts, and 125 for scientific texts). This may suggest that MT tools are not as genre-sensitive in their use of reference cohesive devices as human translators are.

These findings also suggest that the use of reference cohesive devices is a significant genre distinguisher for non-translated literary, newspaper, and scientific texts, along with 3rd person pronominal cohesive devices, possessive pronouns, and definite articles.

4.2.7 Conjunction cohesive devices in literary, newspaper, and scientific texts

Conjunction cohesive devices included in the study belong to four groups:

− Additive devices (e.g., "and," "or")


− Adversative devices (e.g., "but")

− Causal and continuative devices represented by subordinating conjunctions (e.g.,

"if," "because," "unless," "so," "for")

− Temporal devices (e.g., "now," "tomorrow")
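A dictionary-based tally is one way to group candidate conjunction devices into the four categories above. In this sketch the lexicon contains only the example items listed in the text, so it is a hypothetical stand-in for the study's full inventory. It also performs no disambiguation, so "for" used as a preposition, or "now" used non-conjunctively, would be counted as well.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon built only from the examples listed above.
CONJUNCTION_CLASSES = {
    "and": "additive", "or": "additive",
    "but": "adversative",
    "if": "causal/continuative", "because": "causal/continuative",
    "unless": "causal/continuative", "so": "causal/continuative",
    "for": "causal/continuative",
    "now": "temporal", "tomorrow": "temporal",
}


def conjunction_profile(text: str) -> Counter:
    """Count candidate conjunction devices by category (no disambiguation)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(CONJUNCTION_CLASSES[t] for t in tokens if t in CONJUNCTION_CLASSES)


profile = conjunction_profile("We left early, but it rained. So we waited, and now we are late.")
print(profile)
```

Summing the category counts gives the total number of conjunction devices for a text, which would then be normalized by text length as with the other cohesive devices.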

For the use of conjunction cohesive devices across genres, statistically significant

differences were found regardless of the method of text production. Of the three genres,

scientific texts use the lowest number of conjunction cohesive devices, while literary

texts use the highest number of such devices (Table 4.21 and Fig. 4.7). According to one-

way ANOVA, statistically significant differences in the total number of conjunction

cohesive devices across genres were found for all three methods of text production (p <

.0001) (see Table 4.19).

Table 4.19 Association of conjunction cohesive devices with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA

Method | Total Sum of Squares | Between-Groups Sum of Squares | Between-Groups df | Total df | F | Sig.
NT | 19663.789 | 6681.764 | 2 | 149 | 37.830 | < 0.0001*
HT | 35306.205 | 19078.848 | 2 | 149 | 86.416 | < 0.0001*
MT | 53563.610 | 31540.858 | 2 | 149 | 105.266 | < 0.0001*

Note: * indicates p-values significant at the 0.05 alpha level.

Tukey's HSD multiple comparisons test revealed statistically significant differences in the total number of conjunction devices for all pairs of genres (Literary-Newspaper, Literary-Scientific, and Newspaper-Scientific) in non-translated, human-translated, and machine-translated texts, with the sole exception of the literary-newspaper comparison for non-translated texts (p = .0555). The results of Tukey's HSD testing are presented in Table 4.20. Descriptive statistics for conjunction cohesive devices are presented in Table 4.21.

Table 4.20 Pairwise comparisons of conjunction cohesive devices across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing

Method | Genre Comparison | Difference in Means | Std. Error | Sig.
NT | Lit-News | 4.36940 | 1.87950 | 0.0555
NT | Lit-Sci | 15.82780* | 1.87950 | < 0.0001
NT | News-Sci | 11.45840* | 1.87950 | < 0.0001
HT | Lit-News | 20.33920* | 2.10133 | < 0.0001
HT | Lit-Sci | 26.35920* | 2.10133 | < 0.0001
HT | News-Sci | 6.02000* | 2.10133 | 0.0132
MT | Lit-News | 22.25140* | 2.44798 | < 0.0001
MT | Lit-Sci | 35.10240* | 2.44798 | < 0.0001
MT | News-Sci | 12.85100* | 2.44798 | < 0.0001

Note: * indicates that the mean difference is significant at the 0.05 level.

Table 4.21 Descriptive statistics for conjunction cohesive devices across genres (N = 50)

Genre | Statistic | NT | HT | MT
Literary | Mean | 52.40 | 56.82 | 63.70
Literary | Std. Deviation | 11.39 | 12.27 | 15.72
Literary | Median | 52.48 | 56.18 | 62.72
Newspaper | Mean | 48.03 | 36.49 | 41.45
Newspaper | Std. Deviation | 8.43 | 11.29 | 12.81
Newspaper | Median | 48.28 | 34.76 | 39.45
Scientific | Mean | 36.57 | 30.46 | 28.60
Scientific | Std. Deviation | 8.00 | 7.30 | 6.19
Scientific | Median | 35.94 | 30.04 | 28.23

With the exception of the literary-newspaper pair in the corpus of non-translated

texts, literary texts used conjunction devices more often than newspaper and scientific

texts. Scientific texts used the fewest conjunction devices of the three genres, regardless

of the method of text production. Non-translated literary texts used conjunction devices significantly more often (52.40 ± 11.39) than non-translated scientific texts (36.57 ± 8.00,

p < .0001). The difference between newspaper texts (48.03 ± 8.43) and scientific texts

(36.57 ± 8.00) was also statistically significant (p < .0001). Human-translated literary

texts displayed a significantly higher number of conjunction devices (56.82 ± 12.27) than

did human-translated newspaper texts (36.49 ± 11.29, p < .0001) and human-translated

scientific texts (30.46 ± 7.30, p < .0001). The difference between human-translated

newspaper and scientific genres was also statistically significant (p = .0132). Machine-translated literary texts also displayed a significantly higher number of conjunction

devices (63.70 ± 15.72) than machine-translated newspaper texts (41.45 ± 12.81, p <

.0001) and machine-translated scientific texts (28.60 ± 6.19, p < .0001). The difference

between machine-translated newspaper and scientific genres was also statistically

significant (p < .0001). These results are visually presented in the bar charts in Fig. 4.7.


Fig. 4.7 Means and standard error for conjunction cohesive devices for NT, HT, and MT in the literary, newspaper, and scientific corpora

Scientific texts seem to rely least on conjunction devices. This may reflect a tendency of scientific writers to report their results in a more straightforward manner. Literary and newspaper writers, on the other hand, may use conjunction devices for stylistic and artistic effect.

These findings suggest that the use of conjunction cohesive devices may be a significant genre distinguisher for non-translated literary, newspaper, and scientific texts, along with 3rd person pronominal cohesive devices, possessive pronouns, definite articles, and reference devices discussed above.

4.3 Textual characteristics of literary, newspaper, and scientific texts

4.3.1 Nominalization in literary, newspaper, and scientific texts

Halliday (1994) broadly defines nominalization as a process "whereby any element or group of elements is made to function as a nominal group in the clause" (41). It is expected that different genres would employ nominalization with different frequencies.

This is supported by the statistical analysis presented below.
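Halliday's definition covers any element functioning as a nominal group, so no simple rule captures it. Purely as an illustration of how nominalization frequencies might be approximated automatically, the sketch below flags words ending in a few common derivational suffixes and reports a rate per 1,000 words. The suffix list and the `min_len` threshold are assumptions, and the heuristic is not the procedure used in this study. It misses nominalizations such as "analysis" in the example, as well as zero-derived forms such as "a change", and it over-counts nouns that merely share a suffix, such as "community".

```python
import re

# Hypothetical suffix heuristic; NOT the study's procedure.
NOMINAL_SUFFIXES = ("tion", "sion", "ment", "ness", "ity", "ance", "ence")


def nominalization_rate(text: str, min_len: int = 6) -> float:
    """Rough rate of suffix-marked nominalizations per 1,000 words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = [t for t in tokens if len(t) >= min_len and t.endswith(NOMINAL_SUFFIXES)]
    return len(hits) / len(tokens) * 1000 if tokens else 0.0


text = "The analysis of the measurement revealed a difference in the distribution."
print(round(nominalization_rate(text), 1))
```

Here "measurement", "difference", and "distribution" are matched among 11 tokens, while "analysis" is missed, which illustrates why manual or parser-based identification is preferable for research counts.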

The use of nominalization was found to be significantly different across genres regardless of the method of text production. Of the three genres, scientific texts use nominalization most frequently, while literary texts contain the fewest nominalizations (Table 4.24 and Fig. 4.8). According to one-way ANOVA, statistically significant differences in the total number of nominalizations across genres were found for all three methods of text production (p < .0001) (see Table 4.22).

Table 4.22 Association of nominalization with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA

Method | Total Sum of Squares | Between-Groups Sum of Squares | Between-Groups df | Total df | F | Sig.
NT | 33857.692 | 24286.850 | 2 | 149 | 186.513 | < 0.0001*
HT | 41708.251 | 28275.381 | 2 | 149 | 154.713 | < 0.0001*
MT | 39524.980 | 25534.291 | 2 | 149 | 134.144 | < 0.0001*

Note: * indicates p-values significant at the 0.05 alpha level.

Tukey's HSD multiple comparisons test revealed statistically significant differences in nominalization for all pairs of genres (Literary-Newspaper, Literary-Scientific, and Newspaper-Scientific) in non-translated, human-translated, and machine-translated texts. The results of Tukey's HSD testing are presented in Table 4.23. Descriptive statistics for nominalization are presented in Table 4.24.

Table 4.23 Pairwise comparisons of nominalization across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing

Method | Genre Comparison | Difference in Means | Std. Error | Sig.
NT | Lit-News | -14.93560* | 1.61379 | < 0.0001
NT | Lit-Sci | -31.15960* | 1.61379 | < 0.0001
NT | News-Sci | -16.22400* | 1.61379 | < 0.0001
HT | Lit-News | -19.65640* | 1.91186 | < 0.0001
HT | Lit-Sci | -33.46040* | 1.91186 | < 0.0001
HT | News-Sci | -13.80400* | 1.91186 | < 0.0001
MT | Lit-News | -19.10800* | 1.95115 | < 0.0001
MT | Lit-Sci | -31.73940* | 1.95115 | < 0.0001
MT | News-Sci | -12.63140* | 1.95115 | < 0.0001

Note: * indicates that the mean difference is significant at the 0.05 level.

Table 4.24 Descriptive statistics for nominalization across genres (N = 50)

Genre | Statistic | NT | HT | MT
Literary | Mean | 5.84 | 6.88 | 7.40
Literary | Std. Deviation | 3.14 | 3.91 | 4.07
Literary | Median | 5.85 | 6.20 | 6.78
Newspaper | Mean | 20.77 | 26.54 | 26.50
Newspaper | Std. Deviation | 8.28 | 12.03 | 11.07
Newspaper | Median | 19.42 | 23.90 | 24.26
Scientific | Mean | 37.00 | 40.34 | 39.13
Scientific | Std. Deviation | 10.81 | 10.68 | 12.10
Scientific | Median | 35.83 | 41.19 | 37.34

For the non-translated corpora, scientific texts used nominalization significantly more often (37.00 ± 10.81) than newspaper texts (20.77 ± 8.28, p < .0001) and literary texts (5.84 ± 3.14, p < .0001). The difference between newspaper texts and literary texts was also statistically significant (p < .0001). For the human-translated corpora, scientific texts also displayed a significantly higher number of nominalizations (40.34 ± 10.68) than newspaper texts (26.54 ± 12.03, p < .0001) and literary texts (6.88 ± 3.91, p < .0001). Human-translated newspaper texts also used significantly more nominalizations than did literary texts (p < .0001). For the machine-translated corpora, scientific texts likewise displayed significantly more instances of nominalization (39.13 ± 12.10) than newspaper texts (26.50 ± 11.07, p < .0001) and literary texts (7.40 ± 4.07, p < .0001). Continuing the pattern, machine-translated newspaper texts used significantly more nominalizations than literary texts (p < .0001). These results are presented in the bar charts in Fig. 4.8.

Fig. 4.8 Means and standard error for nominalization for NT, HT, and MT in the literary, newspaper, and scientific corpora

As seen from these data, regardless of the method of text production, scientific texts use nominalization most frequently, and literary texts least frequently.

According to Halliday (1994: 265-267), nominalization is related to impersonal or abstract voicing of ideas; thus, it is not surprising that scientific texts use nominalization most. Literary and editorial writers, on the other hand, may use it less frequently, since they strive more for artistic and stylistic effects. The low number of nominalizations in literary texts (6 per 1,000 words for non-translated texts and 7 per 1,000 words for human and machine translations) supports this point. As we see later in this chapter, scientific writing also uses significantly more passives than do literary and newspaper writing, thus focusing on the process rather than on the actor/agent. Both nominalizations and passives contribute to the abstractness of scientific texts. Newspaper texts were found to fall between the scientific and literary genres in terms of both nominalization and passives, which is consistent with the nature of these text types.

Thus, the use of nominalization may also be considered a significant genre distinguisher, regardless of the method of text production, along with 3rd person pronominal cohesive devices, possessive pronouns, definite articles, reference devices, and conjunction devices discussed above.

4.3.2 Lexical density in literary, newspaper, and scientific texts

As a measure of lexical density, this study employs the standardized type-token ratio (STTR) calculated by WordSmith Tools. As described earlier, type-token ratio (TTR) is a measure of vocabulary variation within a written text, with tokens being the total number of running words in a text and types being the number of distinct words (WordSmith Tools Manual). Because raw TTR falls as a text grows longer and its most frequent words keep recurring, TTR values are not directly comparable across texts of different lengths. Standardized TTR is a measure designed by WordSmith Tools to correct for differences in the lengths of individual texts comprising a corpus.
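According to the WordSmith Tools Manual, STTR is obtained by calculating the TTR afresh for each consecutive chunk of running words (1,000 by default) and averaging the results. The Python sketch below assumes that procedure. Its tokenizer, its handling of a final partial chunk (ignored), and its fallback to plain TTR for texts shorter than one chunk are this example's own choices and may differ from WordSmith's. A chunk size of 5 is used only to keep the example short.

```python
import re


def ttr(tokens):
    """Type-token ratio as a percentage: distinct word forms / running words."""
    return len(set(tokens)) / len(tokens) * 100


def sttr(tokens, chunk_size=1000):
    """Standardized TTR: mean TTR over consecutive full chunks of chunk_size tokens.

    Assumptions (not WordSmith's documented behavior): a trailing partial chunk is
    ignored, and texts shorter than one chunk fall back to plain TTR.
    """
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]
    if not chunks:
        return ttr(tokens)
    return sum(ttr(c) for c in chunks) / len(chunks)


words = re.findall(r"[a-z']+", "the cat saw the dog and the dog saw the cat".lower())
print(round(ttr(words), 2), round(sttr(words, chunk_size=5), 2))
```

For the 11-token sample, plain TTR is 5 types out of 11 tokens (45.45%), whereas each of the two full five-token chunks contains 4 types (80%). This gap illustrates why STTR values, such as the 60 to 75 range reported below, are not comparable with raw TTR values.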


According to one-way ANOVA, statistically significant differences in STTR were

found across genres (p < .0001) (see Table 4.25).

Table 4.25 Association of STTR with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA

Method | Total Sum of Squares | Between-Groups Sum of Squares | Between-Groups df | Total df | F | Sig.
NT | 4101.161 | 3229.133 | 2 | 149 | 272.172 | < 0.0001*
HT | 6297.795 | 4819.794 | 2 | 149 | 239.685 | < 0.0001*
MT | 6153.044 | 4843.472 | 2 | 149 | 271.841 | < 0.0001*

Note: * indicates p-values significant at the 0.05 alpha level.

Tukey's HSD multiple comparisons test revealed statistically significant differences in STTR for the Literary-Scientific and Newspaper-Scientific pairs, regardless of the method of text production. Literary and newspaper texts did not differ significantly in STTR in any of the three groups of texts (non-translated, human-translated, and machine-translated). The results of Tukey's HSD testing are presented in Table 4.26. Descriptive statistics for STTR are presented in Table 4.27.

Table 4.26 Pairwise comparisons of STTR across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing

Method | Genre Comparison | Difference in Means | Std. Error | Sig.
NT | Lit-News | -1.09940 | .48712 | 0.0653
NT | Lit-Sci | 9.24660* | .48712 | < 0.0001
NT | News-Sci | 10.34600* | .48712 | < 0.0001
HT | Lit-News | .02340 | .63417 | 0.9992
HT | Lit-Sci | 12.03640* | .63417 | < 0.0001
HT | News-Sci | 12.01300* | .63417 | < 0.0001
MT | Lit-News | -.32160 | .59695 | 0.8524
MT | Lit-Sci | 11.89020* | .59695 | < 0.0001
MT | News-Sci | 12.21180* | .59695 | < 0.0001

Note: * indicates that the mean difference is significant at the 0.05 level.

Table 4.27 Descriptive statistics for STTR across genres (N = 50)

Genre | Statistic | NT | HT | MT
Literary | Mean | 74.51 | 72.82 | 72.17
Literary | Std. Deviation | 2.37 | 3.11 | 3.44
Literary | Median | 74.48 | 73.29 | 73.13
Newspaper | Mean | 75.61 | 72.80 | 72.49
Newspaper | Std. Deviation | 1.97 | 2.91 | 2.84
Newspaper | Median | 75.60 | 73.00 | 72.92
Scientific | Mean | 65.27 | 60.78 | 60.28
Scientific | Std. Deviation | 2.88 | 3.47 | 2.62
Scientific | Median | 65.14 | 61.19 | 60.43

For the non-translated corpora, scientific texts had a significantly lower STTR (65.27 ± 2.88) than literary texts (74.51 ± 2.37, p < .0001) and newspaper texts (75.61 ± 1.97, p < .0001). No statistically significant difference was found between non-translated literary and newspaper texts (p = .0653). For the human-translated corpora, scientific texts also had a significantly lower STTR (60.78 ± 3.47) than literary texts (72.82 ± 3.11, p < .0001) and newspaper texts (72.80 ± 2.91, p < .0001). No statistically significant difference was found between human-translated literary and newspaper texts (p = .9992). For the machine-translated corpora, scientific texts likewise had a significantly lower STTR (60.28 ± 2.62) than literary texts (72.17 ± 3.44, p < .0001) and newspaper texts (72.49 ± 2.84, p < .0001). No statistically significant difference was found between machine-translated literary and newspaper texts (p = .8524). These results are presented in the bar charts in Fig. 4.9.

Fig. 4.9 Means and standard error for STTR for NT, HT, and MT in the literary, newspaper, and scientific corpora


As we see from the findings, literary and newspaper texts have a higher STTR

than scientific texts, regardless of the method of text production. These findings may

support the general intuition that scientific texts tend to use a narrow range of vocabulary

since their goal is clarity and not artistic or stylistic effects. It is worth noting the

similarity in STTR between literary texts and newspaper editorials, regardless of the method of text production.

These findings suggest that STTR may also be considered a genre distinguisher for comparing literary and newspaper texts with scientific texts.

4.3.3 Average word length in literary, newspaper, and scientific texts

Average word length is a component of common readability formulas, and may be predicted to be different across genres. This is supported by the statistics presented below.
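Readability formulas differ in how they operationalize word length: the Automated Readability Index uses characters per word, whereas the Flesch Reading Ease formula uses syllables per word. The minimal character-based sketch below assumes that a word is any run of letters. The tool used in this study may tokenize differently, for instance in its handling of hyphens, numbers, and apostrophes.

```python
import re


def average_word_length(text: str) -> float:
    """Mean number of letters per word; digits and punctuation are excluded.

    Assumption: a 'word' is a maximal run of ASCII letters, so apostrophes and
    hyphens split words.
    """
    words = re.findall(r"[A-Za-z]+", text)
    return sum(len(w) for w in words) / len(words) if words else 0.0


print(average_word_length("Scientific texts favor nominalization."))
```

The four words contain 10, 5, 5, and 14 letters, giving a mean of 8.5.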

The average word length was found to be significantly different across genres regardless of the method of text production, with the only exceptions being in the comparison of newspaper and scientific texts in human and machine translations. Of the three genres, scientific texts have the highest average word length, while literary texts have the lowest (Table 4.30 and Fig. 4.10).

According to one-way ANOVA, statistically significant differences in the average word length across genres were found for all three methods of text production (p < .0001)

(see Table 4.28).


Table 4.28 Association of average word length with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA

Method | Total Sum of Squares | Between-Groups Sum of Squares | Between-Groups df | Total df | F | Sig.
NT | 30.828 | 18.979 | 2 | 149 | 117.727 | < 0.0001*
HT | 22.085 | 12.997 | 2 | 149 | 105.118 | < 0.0001*
MT | 17.960 | 10.840 | 2 | 149 | 111.917 | < 0.0001*

Note: * indicates p-values significant at the 0.05 alpha level.

Tukey's HSD multiple comparisons test revealed statistically significant differences in average word length for all pairs of genres (Literary-Newspaper, Literary-Scientific, and Newspaper-Scientific) in non-translated texts, and for the Literary-Newspaper and Literary-Scientific pairs in human and machine translations. The results of Tukey's HSD testing are presented in Table 4.29. Descriptive statistics for average word length are presented in Table 4.30.

Table 4.29 Pairwise comparisons of average word length across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing

Method | Genre Comparison | Difference in Means | Std. Error | Sig.
NT | Lit-News | -.64500* | .05678 | < 0.0001
NT | Lit-Sci | -.82980* | .05678 | < 0.0001
NT | News-Sci | -.18480* | .05678 | 0.0040
HT | Lit-News | -.60160* | .04973 | < 0.0001
HT | Lit-Sci | -.64500* | .04973 | < 0.0001
HT | News-Sci | -.04340 | .04973 | 0.6583
MT | Lit-News | -.55360* | .04401 | < 0.0001
MT | Lit-Sci | -.58560* | .04401 | < 0.0001
MT | News-Sci | -.03200 | .04401 | 0.7479

Note: * indicates that the mean difference is significant at the 0.05 level.

Table 4.30 Descriptive statistics for average word length across genres (N = 50)

Genre | Statistic | NT | HT | MT
Literary | Mean | 4.25 | 4.35 | 4.33
Literary | Std. Deviation | 0.33 | 0.24 | 0.22
Literary | Median | 4.27 | 4.34 | 4.31
Newspaper | Mean | 4.90 | 4.95 | 4.89
Newspaper | Std. Deviation | 0.25 | 0.25 | 0.21
Newspaper | Median | 4.91 | 4.98 | 4.90
Scientific | Mean | 5.08 | 4.99 | 4.92
Scientific | Std. Deviation | 0.26 | 0.26 | 0.23
Scientific | Median | 5.08 | 4.99 | 4.90

For the non-translated corpora, literary texts had a significantly lower average word length (4.25 ± 0.33) than newspaper texts (4.90 ± 0.25, p < .0001) and scientific texts (5.08 ± 0.26, p < .0001). The difference between non-translated newspaper and scientific texts was also statistically significant (p = .0040). For the human-translated corpora, literary texts displayed a significantly lower average word length (4.35 ± 0.24) than newspaper texts (4.95 ± 0.25, p < .0001) and scientific texts (4.99 ± 0.26, p < .0001). The difference in average word length between human-translated newspaper and scientific texts was not statistically significant (p = .6583). For the machine-translated corpora, literary texts also displayed a significantly lower average word length (4.33 ± 0.22) than newspaper texts (4.89 ± 0.21, p < .0001) and scientific texts (4.92 ± 0.23, p < .0001). The difference in average word length between machine-translated newspaper and scientific texts was not statistically significant (p = .7479). These results are presented in the bar charts in Fig. 4.10.

Fig. 4.10 Means and standard error for average word length for NT, HT, and MT in the literary, newspaper, and scientific corpora


These findings are not unexpected, since scientific texts are often perceived to be

drier and more difficult to read due to longer words that may appear to be more complex.

As seen earlier, scientific texts were found to have a higher number of nominalizations,

which tend to be lengthy.

Thus, average word length seems to be another parameter that differentiates genres. The most pronounced differences were found for literary texts when compared with scientific and newspaper texts, regardless of the method of text production.

4.3.4 Average sentence length in literary, newspaper, and scientific texts

Like average word length, average sentence length is a component of common readability formulas and may be expected to differ across genres. In Gray and Leary's comprehensive study of readability (1935), longer sentences were found to be more difficult for readers; in fact, the study found that average sentence length is among the best predictors of textual difficulty. The statistics presented below confirm that average sentence length differs across the three genres.
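Average sentence length is the number of words divided by the number of sentences, a quantity that enters both the Flesch Reading Ease formula and the Automated Readability Index; the practical difficulty lies in sentence segmentation. The sketch below assumes a naive rule in which a sentence ends at ".", "!", or "?" followed by whitespace or the end of the text. This rule will mis-split abbreviations such as "Fig." or "e.g.", and the tools used in corpus studies generally rely on more elaborate segmenters.

```python
import re


def average_sentence_length(text: str) -> float:
    """Mean number of words per sentence.

    Naive assumption: a sentence ends at '.', '!' or '?' followed by whitespace
    or the end of the text; abbreviations will cause false splits.
    """
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    word_counts = [len(re.findall(r"[A-Za-z0-9']+", s)) for s in sentences]
    return sum(word_counts) / len(word_counts) if word_counts else 0.0


sample = "Short one. This sentence is a little longer! Is it?"
print(average_sentence_length(sample))
```

The sample is split into sentences of 2, 6, and 2 words, so its average sentence length is 10 / 3, or about 3.33 words.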

The average sentence length was found to be significantly different across genres regardless of the method of text production. Of the three genres, scientific texts have the highest average sentence length, while literary texts have the lowest (Table 4.33 and Fig.

4.11).


According to one-way ANOVA, statistically significant differences in the average

sentence length across genres were found for all three methods of text production (p <

.0001) (see Table 4.31).

Table 4.31 Association of average sentence length with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA Total Between Between Total F Sig. Sum of Groups Sum Groups df Squares of Squares df NT 6924.718 1266.508 2 149 16.452 < 0.0001* HT 6887.851 3714.722 2 149 86.045 < 0.0001* MT 7569.700 4087.414 2 149 86.272 < 0.0001* Note: * indicates p-values significant at 0.05 alpha-level Tukey's HSD multiple comparisons test revealed statistically significant differences in average sentence length for all pairs of genres (Literary-Newspaper,

Literary-Scientific, and Newspaper-Scientific), regardless of the method of text production. The results of Tukey's HSD testing are presented in Table 4.32. Descriptive statistics for average sentence length are presented in Table 4.33.

Table 4.32 Pairwise comparisons of average sentence length across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing

Method | Comparison | Difference in Means | Std. Error | Sig.
NT | Lit-News | -3.55000* | 1.24083 | 0.0133
NT | Lit-Sci | -7.11760* | 1.24083 | < 0.0001
NT | News-Sci | -3.56760* | 1.24083 | 0.0128
HT | Lit-News | -4.00380* | .92921 | 0.0001
HT | Lit-Sci | -11.97280* | .92921 | < 0.0001
HT | News-Sci | -7.96900* | .92921 | < 0.0001
MT | Lit-News | -3.85080* | .97343 | 0.0003
MT | Lit-Sci | -12.48480* | .97343 | < 0.0001
MT | News-Sci | -8.63400* | .97343 | < 0.0001

Notes: * indicates that the mean difference is significant at the 0.05 level.

Table 4.33 Descriptive statistics for average sentence length across genres (N = 50)

Genre | Statistic | NT | HT | MT
Literary | Mean | 17.16 | 16.24 | 16.28
Literary | Std. Deviation | 9.40 | 5.98 | 6.41
Literary | Median | 15.04 | 14.43 | 14.60
Newspaper | Mean | 20.71 | 20.24 | 20.13
Newspaper | Std. Deviation | 3.45 | 3.21 | 3.75
Newspaper | Median | 20.22 | 19.91 | 19.50
Scientific | Mean | 24.27 | 28.21 | 28.76
Scientific | Std. Deviation | 3.91 | 4.32 | 3.99
Scientific | Median | 23.64 | 27.02 | 28.58
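With equal group sizes, the standard error of a pairwise difference in Tukey's HSD test depends only on the within-groups mean square (MSW) and the group sizes: SE = sqrt(MSW × (1/n1 + 1/n2)). The sketch below assumes n = 50 texts per genre (consistent with the 149 total df in Table 4.31) and derives MSW from that table; under those assumptions it reproduces the standard errors in Table 4.32. It also computes the studentized range statistic q = |difference| / sqrt(MSW / n) on which the test rests. Turning q into the p-values shown in Table 4.32 requires the studentized range distribution (for example `scipy.stats.studentized_range` in recent SciPy releases), which is omitted here to keep the example dependency-free. The helper names are hypothetical.

```python
import math

def tukey_se(ms_within, n1, n2):
    """Standard error of a pairwise mean difference (Tukey-Kramer form)."""
    return math.sqrt(ms_within * (1 / n1 + 1 / n2))

def tukey_q(mean_diff, ms_within, n):
    """Studentized range statistic for two groups of equal size n."""
    return abs(mean_diff) / math.sqrt(ms_within / n)

# MSW for non-translated texts, derived from Table 4.31: (SS_total - SS_between) / df_within.
ms_within_nt = (6924.718 - 1266.508) / (149 - 2)
print(round(tukey_se(ms_within_nt, 50, 50), 5))   # compare with 1.24083 in Table 4.32
print(round(tukey_q(-3.55, ms_within_nt, 50), 2)) # NT literary vs. newspaper comparison
```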

For the non-translated corpora, scientific texts had a statistically significantly higher average sentence length (24.27 ± 3.91) than newspaper texts (20.71 ± 3.45, p = .0128) and literary texts (17.16 ± 9.40, p < .0001). The difference between non-translated newspaper and literary texts was also found to be statistically significant (p = .0133). For the human-translated corpora, scientific texts also displayed a significantly higher average sentence length (28.21 ± 4.32) than newspaper texts (20.24 ± 3.21, p < .0001) and literary texts (16.24 ± 5.98, p < .0001). The difference in average sentence length between human-translated newspaper and literary texts was also statistically significant (p = .0001). For the machine-translated corpora, scientific texts likewise displayed a statistically significantly higher average sentence length (28.76 ± 3.99) than newspaper texts (20.13 ± 3.75, p < .0001) and literary texts (16.28 ± 6.41, p < .0001). The difference in average sentence length between machine-translated newspaper and literary texts was also statistically significant (p = .0003). These results are presented in Fig. 4.11.

Fig. 4.11 Means and standard error for average sentence length for NT, HT, and MT in the literary, newspaper, and scientific corpora
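The figures reported in the form "mean ± value" in this section match the means and standard deviations in Table 4.33. A minimal sketch of how such a summary might be produced for one group of per-text scores follows. It assumes the sample standard deviation (n − 1 denominator), which statistical packages such as SPSS report by default; the scores are made-up illustration values, not data from the corpus.

```python
import statistics

def describe(values):
    """Mean, sample standard deviation (n - 1), and median of per-text scores."""
    return statistics.mean(values), statistics.stdev(values), statistics.median(values)

def report(values):
    """Format scores the way they are reported in the text: mean ± SD."""
    mean, sd, median = describe(values)
    return f"{mean:.2f} ± {sd:.2f} (median {median:.2f})"

# Hypothetical per-text average sentence lengths for five texts.
print(report([12.0, 15.5, 18.0, 20.5, 24.0]))
```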


These findings are not unexpected, since scientific texts are often perceived to be more difficult to read because of their longer sentences. The average sentence length is thus another parameter that may be used to differentiate among genres, in addition to 3rd person pronominal cohesive devices, possessive pronouns, definite articles, reference devices, conjunction devices, and average word length discussed above.

4.3.5 Passives in literary, newspaper, and scientific texts

Passives are an important characteristic of genres. The basic discourse function of passives is "to switch the focus of action in the active clause from subject/actor to object/patient" (Westin 2002: 118). Thus, we may expect literary and newspaper texts to use fewer passives than scientific texts, since scientific writers tend to write in a depersonalized style.
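One way to operationalize passive counts in a tagged corpus is to look for a form of BE followed by a past participle. The sketch below is a hypothetical heuristic, not the procedure used in this study. It assumes CLAWS7-style tags (forms of BE beginning with "VB", past participles of lexical verbs tagged "VVN", "not" tagged "XX", adverbs beginning with "RR"), allows up to two intervening negators or adverbs, and normalizes the count per 1,000 words. It misses passive-like constructions without a form of BE, such as get-passives.

```python
# Hypothetical heuristic (not this study's procedure): a passive is counted when a
# form of BE (tags starting with "VB") is followed by a past participle (VVN),
# optionally separated by up to `max_gap` negators (XX) or adverbs (RR*).
def count_passives(tagged, max_gap=2):
    count = 0
    for i, (_, tag) in enumerate(tagged):
        if not tag.startswith("VB"):
            continue
        j, skipped = i + 1, 0
        while (j < len(tagged) and skipped < max_gap
               and (tagged[j][1] == "XX" or tagged[j][1].startswith("RR"))):
            j += 1
            skipped += 1
        if j < len(tagged) and tagged[j][1] == "VVN":
            count += 1
    return count

def count_words(tagged):
    """Count tokens containing at least one letter or digit (punctuation excluded)."""
    return sum(1 for word, _ in tagged if any(ch.isalnum() for ch in word))

def per_thousand_words(count, n_words):
    return 1000 * count / n_words

tagged = [("Texts", "NN2"), ("were", "VBDR"), ("translated", "VVN"), ("by", "II"),
          ("humans", "NN2"), (".", "."), ("It", "PPH1"), ("was", "VBDZ"),
          ("not", "XX"), ("fully", "RR"), ("edited", "VVN"), (".", ".")]
print(per_thousand_words(count_passives(tagged), count_words(tagged)))
```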

According to one-way ANOVA, statistically significant differences in the use of passives across genres were found for all three methods of text production (p < .0001) (see Table 4.34).

Table 4.34 Association of passives with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA

Method | Total Sum of Squares | Between-Groups Sum of Squares | Between-Groups df | Total df | F | Sig.
NT | 10976.555 | 6905.019 | 2 | 149 | 124.650 | < 0.0001*
HT | 8164.949 | 5525.896 | 2 | 149 | 153.901 | < 0.0001*
MT | 4078.861 | 2284.856 | 2 | 149 | 93.610 | < 0.0001*

Note: * indicates p-values significant at the 0.05 alpha level.


Tukey's HSD multiple comparisons test revealed statistically significant differences in the use of passives for all pairs of genres (Literary-Newspaper, Literary-Scientific, and Newspaper-Scientific), regardless of the method of text production. The results of Tukey's HSD testing are presented in Table 4.35. Descriptive statistics for the use of passives are presented in Table 4.36.

Table 4.35 Pairwise comparisons of passives across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing

Method | Comparison | Difference in Means | Std. Error | Sig.
NT | Lit-News | -3.42100* | 1.05257 | 0.0041
NT | Lit-Sci | -15.79500* | 1.05257 | < 0.0001
NT | News-Sci | -12.37400* | 1.05257 | < 0.0001
HT | Lit-News | -3.71900* | .84741 | 0.0001
HT | Lit-Sci | -14.32560* | .84741 | < 0.0001
HT | News-Sci | -10.60660* | .84741 | < 0.0001
MT | Lit-News | -1.82900* | .69869 | 0.0263
MT | Lit-Sci | -9.04080* | .69869 | < 0.0001
MT | News-Sci | -7.21180* | .69869 | < 0.0001

Notes: * indicates that the mean difference is significant at the 0.05 level.

Table 4.36 Descriptive statistics for passives across genres (N = 50)

Genre | Statistic | NT | HT | MT
Literary | Mean | 5.10 | 5.97 | 6.34
Literary | Std. Deviation | 2.93 | 3.66 | 2.53
Literary | Median | 4.62 | 5.46 | 6.25
Newspaper | Mean | 8.52 | 9.69 | 8.17
Newspaper | Std. Deviation | 4.40 | 4.43 | 4.18
Newspaper | Median | 8.48 | 9.02 | 8.00
Scientific | Mean | 20.89 | 20.30 | 15.39
Scientific | Std. Deviation | 7.43 | 4.56 | 3.57
Scientific | Median | 18.84 | 20.67 | 15.52

For the non-translated corpora, scientific texts had a statistically significantly higher number of passives per 1,000 words (20.89 ± 7.43) than newspaper texts (8.52 ± 4.40, p < .0001) and literary texts (5.10 ± 2.93, p < .0001). The difference between non-translated newspaper and literary texts was also found to be statistically significant (p = .0041). For the human-translated corpora, scientific texts also had a significantly higher number of passives (20.30 ± 4.56) than newspaper texts (9.69 ± 4.43, p < .0001) and literary texts (5.97 ± 3.66, p < .0001). The difference in the use of passives between newspaper and literary human translations was also statistically significant (p = .0001). For the machine-translated corpora, scientific texts were likewise found to have a significantly higher number of passives (15.39 ± 3.57) than newspaper texts (8.17 ± 4.18, p < .0001) and literary texts (6.34 ± 2.53, p < .0001). The difference in the number of passives per 1,000 words between machine-translated newspaper and literary texts was also statistically significant (p = .0263). These results are presented in the bar charts in Fig. 4.12.


Fig. 4.12 Means and standard error for passives for NT, HT, and MT in the literary, newspaper, and scientific corpora

These findings support the general intuition that scientific texts favor passives as a means of depersonalizing discourse, since scientific writers may want to avoid subjectivity in their narrative and to focus on the process rather than the actor/agent. Literary texts, by contrast, use the fewest passives of the three genres under study. This aligns with the opinion that passives should be avoided in literary writing unless called for by specific artistic needs. This opinion, while debatable, still holds when literary and scientific writing are compared.


The results also suggest that the use of passives is another parameter that may be used to differentiate among genres, in addition to 3rd person pronominal cohesive devices, possessive pronouns, definite articles, reference devices, conjunction devices, and average word and sentence length.

4.3.6 Prepositional phrases in literary, newspaper, and scientific texts

Prepositional phrases are a common feature of informational discourse. As Westin (2002) points out, the use of prepositional phrases is "an effective way of packing high amounts of information into idea units and of expanding the size of them" (71).

According to one-way ANOVA, statistically significant differences in the use of prepositions across genres were found for all three methods of text production (p < .0001) (see Table 4.37).

Table 4.37 Association of prepositions with the genre (Literary, Newspaper, Scientific) for NT, HT, and MT by one-way ANOVA

Method | Total Sum of Squares | Between-Groups Sum of Squares | Between-Groups df | Total df | F | Sig.
NT | 56337.842 | 27949.523 | 2 | 149 | 72.364 | < 0.0001*
HT | 129758.624 | 76509.720 | 2 | 149 | 105.607 | < 0.0001*
MT | 149529.113 | 103225.023 | 2 | 149 | 163.852 | < 0.0001*

Note: * indicates p-values significant at the 0.05 alpha level.

Tukey's HSD multiple comparisons test revealed statistically significant differences in the use of prepositions for all pairs of genres (Literary-Newspaper, Literary-Scientific, and Newspaper-Scientific), regardless of the method of text production. The results of Tukey's HSD testing are presented in Table 4.38. Descriptive statistics for prepositions are presented in Table 4.39.

Table 4.38 Pairwise comparisons of prepositions across genres (Literary, Newspaper, Scientific) for NT, HT, and MT by Tukey HSD post hoc testing

Method | Comparison | Difference in Means | Std. Error | Sig.
NT | Lit-News | -16.38320* | 2.77934 | < 0.0001
NT | Lit-Sci | -33.43400* | 2.77934 | < 0.0001
NT | News-Sci | -17.05080* | 2.77934 | < 0.0001
HT | Lit-News | -26.63240* | 3.80651 | < 0.0001
HT | Lit-Sci | -55.30820* | 3.80651 | < 0.0001
HT | News-Sci | -28.67580* | 3.80651 | < 0.0001
MT | Lit-News | -30.41620* | 3.54961 | < 0.0001
MT | Lit-Sci | -64.22740* | 3.54961 | < 0.0001
MT | News-Sci | -33.81120* | 3.54961 | < 0.0001

Notes: * indicates that the mean difference is significant at the 0.05 level.

Table 4.39 Descriptive statistics for prepositions across genres (N = 50)

Genre | Statistic | NT | HT | MT
Literary | Mean | 93.39 | 98.50 | 96.04
Literary | Std. Deviation | 15.36 | 16.31 | 19.63
Literary | Median | 93.63 | 98.20 | 95.24
Newspaper | Mean | 109.77 | 125.14 | 126.45
Newspaper | Std. Deviation | 12.47 | 22.08 | 16.48
Newspaper | Median | 112.26 | 125.72 | 127.30
Scientific | Mean | 126.82 | 153.81 | 160.26
Scientific | Std. Deviation | 13.71 | 18.25 | 16.98
Scientific | Median | 130.33 | 152.64 | 162.56

For the non-translated corpora, scientific texts had a statistically significantly higher number of prepositions per 1,000 words (126.82 ± 13.71) than newspaper texts (109.77 ± 12.47, p < .0001) and literary texts (93.39 ± 15.36, p < .0001). The difference between non-translated newspaper and literary texts was also found to be statistically significant (p < .0001). For the human-translated corpora, scientific texts also had a statistically significantly higher number of prepositions (153.81 ± 18.25) than newspaper texts (125.14 ± 22.08, p < .0001) and literary texts (98.50 ± 16.31, p < .0001). The difference in the use of prepositions between human-translated newspaper and literary texts was also statistically significant (p < .0001). For the machine-translated corpora, scientific texts were likewise found to have a significantly higher number of prepositions (160.26 ± 16.98) than newspaper texts (126.45 ± 16.48, p < .0001) and literary texts (96.04 ± 19.63, p < .0001). The difference in the number of prepositions per 1,000 words between machine-translated newspaper and literary texts was found to be statistically significant as well (p < .0001). These results are presented in the bar charts in Fig. 4.13.


Fig. 4.13 Means and standard error for prepositions for NT, HT, and MT in the literary, newspaper, and scientific corpora

These findings support Westin's (2002) claim that prepositional phrases are typical of informational discourse. Of the three genres under study, scientific articles are perhaps the most informative genre, followed by newspaper editorials.

Literary texts are typically seen as the least informative genre of the three. This is reflected in the findings, with the scientific texts having the highest number of prepositions in all three groups of texts (non-translated, human-translated, and machine-translated texts), and the literary texts having the fewest (see Fig. 4.13).


The results also suggest that the use of prepositions is a textual characteristic that may be used to differentiate among genres, along with 3rd person pronominal cohesive devices, possessive pronouns, definite articles, reference devices, conjunction devices, average word and sentence length, and the use of passives.

4.4 Conclusions

The genre comparison performed for non-translated texts, human-translated texts, and machine-translated texts suggests that most of the textual features included in this analysis differ for all or some pairs of conditions across all or some methods of text production.

Of all these features, seven textual characteristics were found to differ across literary, newspaper, and scientific texts regardless of the method of text production. Table 4.40 summarizes the differences in the means for these seven characteristics. All these differences were found to be statistically significant at the 0.05 level for all pairs of conditions (literary vs. newspaper, literary vs. scientific, and newspaper vs. scientific) in non-translated, human-translated, and machine-translated texts.


Table 4.40 Differences in mean scores for the cohesive characteristics and global textual features found to be significantly different at the 0.05 level for NT, HT, and MT

Feature | Literary | Newspaper | Scientific
Cohesive characteristics | | |
Third-person pronominal cohesive devices | Highest* | | Lowest*
Possessive pronouns | Highest* | | Lowest*
Definite article | Lowest* | | Highest*
Global textual characteristics | | |
Nominalization | Lowest* | | Highest*
Average sentence length | Lowest* | | Highest*
Passives | Lowest* | | Highest*
Prepositional phrases | Lowest* | | Highest*

*Significant at the 0.05 level for all pairs of conditions (literary vs. newspaper, literary vs. scientific, and newspaper vs. scientific) regardless of the method of text production.
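For three of the global features in Table 4.40, the "Lowest"/"Highest" pattern can be read directly off the genre means in Tables 4.33, 4.36, and 4.39. The check below is a sketch that only confirms the ordering of the means (literary below newspaper below scientific) for each method of text production. It says nothing about significance, which rests on the Tukey tests reported above.

```python
# Genre means (literary, newspaper, scientific) copied from Tables 4.33, 4.36, and 4.39.
GENRE_MEANS = {
    "Average sentence length": {
        "NT": (17.16, 20.71, 24.27), "HT": (16.24, 20.24, 28.21), "MT": (16.28, 20.13, 28.76)},
    "Passives per 1,000 words": {
        "NT": (5.10, 8.52, 20.89), "HT": (5.97, 9.69, 20.30), "MT": (6.34, 8.17, 15.39)},
    "Prepositions per 1,000 words": {
        "NT": (93.39, 109.77, 126.82), "HT": (98.50, 125.14, 153.81), "MT": (96.04, 126.45, 160.26)},
}

def literary_lowest_scientific_highest(means):
    """True if literary < newspaper < scientific."""
    literary, newspaper, scientific = means
    return literary < newspaper < scientific

for feature, by_method in GENRE_MEANS.items():
    ok = all(literary_lowest_scientific_highest(m) for m in by_method.values())
    print(f"{feature}: {'consistent' if ok else 'inconsistent'} with Table 4.40")
```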

The findings reveal that, of the three genres, the non-translated, human-translated, and machine-translated literary texts are characterized by the highest number of 3rd person cohesive devices and possessive pronouns, and the scientific texts by the lowest. This may be related to the more personalized nature of literary texts compared to scientific writing, which may lead to a higher number of personal pronouns in the literary genre. The scientific genre, by contrast, tends to be devoid of personal descriptions and strives for objective, impersonal accounts of scientific procedures, observations, and findings. Newspaper editorial writing stands between the literary and scientific genres in terms of the use of personal and possessive pronouns, probably because of its middle-ground nature as journalistic reporting combined with personalized opinions.

For the use of definite articles, passives, nominalizations, and prepositions, as well as for average sentence length, the literary texts of all three methods of text production showed the lowest mean scores, and the scientific texts the highest (Table 4.40). Again, newspaper editorials fell between the other two genres. Some explanations for these findings are ventured below.

The lower use of definite articles in the literary texts may be explained by the previously mentioned finding that literary texts rely more heavily on personal pronouns than do other genres. The combination of these findings may suggest that the literary texts tended to use a variety of noun modifiers—possessive pronouns, definite articles, demonstratives, etc., while the scientific texts tended to rely more on definite articles as noun modifiers. The newspaper texts, again, take the middle position in between the two other genres.

The lower average sentence length of the literary texts may be due to literary writing being less uniform and more artistically motivated in terms of sentence length, compared to scientific writing, which is expected to have relatively uniform, stylistically neutral sentences without artistic effects. In addition, literary texts often contain dialogues, which tend to have short or incomplete sentences.


As discussed earlier, the frequent use of nominalizations and passives in scientific writing may be related to the dry and abstract nature of its reporting. Nominalization is related to impersonal or abstract voicing of ideas (Halliday 1994: 265-267), while passives switch the attention of the reader from the subject or actor to the object (Westin 2002: 118). Thus, nominalizations and passives appear more suited to scientific texts than to the genres of literature and editorial writing. Newspaper editorials use a higher number of nominalizations and passives than literary texts, which may be related to their goal of reporting events and opinions.

Prepositional phrases were found to be most frequent in the scientific texts and least frequent in the literary texts, which may be explained by their function of packing high amounts of information (Westin 2002: 71). Packing high amounts of information seems more suited to the scientific and newspaper genres than to the genre of literature.

In addition to the seven features mentioned above, which were found to be statistically significantly different for all three pairs of conditions (literary vs. newspaper, literary vs. scientific, and newspaper vs. scientific) in the non-translated, human-translated, and machine-translated corpora, several other textual features were found to be significantly different for most of the conditions in some methods of text production. These features include conjunction devices, comparatives, the sum of the reference devices, lexical density (as standardized type-token ratio), and average word length. The findings for these features are briefly summarized below.


For conjunction cohesive devices, the scientific texts used significantly fewer such devices than the literary and newspaper texts, while the literary texts used the most (statistically significant differences were found for all pairs of conditions across all three methods of text production, the only exception being the literary-newspaper pair of non-translated texts). This may be linked to the nature of scientific writing, which seeks to establish various logical links between ideas more explicitly than literary and newspaper writing.

The use of comparatives was found to be significantly lower in non-translated literary texts compared to non-translated scientific and newspaper writing. This may be explained by the tendency of scientific and newspaper texts to compare results and other phenomena more frequently than literary texts. Interestingly, no statistically significant differences were found for the use of comparatives in human- and machine-translated texts. This may be related to the nature of translated texts in general or to the influences of the source language genre conventions on the translations.

For the sum of all reference cohesive devices included in this study, the non-translated literary texts contained the significantly highest number of such devices, and the non-translated scientific texts the lowest (with statistically significant differences found for all three pairs of conditions). Among the human-translated texts, literary texts contained a significantly higher number of reference devices than newspaper and scientific texts. However, no statistically significant differences in the sum of reference devices were found in the machine-translated corpus. This finding may suggest that MT tools are less sensitive to the use of reference cohesive devices than human translators and original authors.

For lexical density, measured as standardized type-token ratio (STTR), statistically significant differences were found for the literary vs. scientific and newspaper vs. scientific comparisons in non-translated, human-translated, and machine-translated texts. No statistically significant differences were found between the literary and newspaper texts regardless of the method of text production, which may suggest their similarity in terms of STTR. It is not surprising that scientific texts displayed the lowest STTR for all three methods of text production: scientific writers are expected to use a narrower range of vocabulary, since their goal is clarity rather than stylistic effect. In addition, scientific articles tend to focus on a specific topic, which may limit vocabulary choices.
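Standardized type-token ratio reduces the dependence of the raw type-token ratio on text length by computing the ratio over successive chunks of equal size and averaging the results. The sketch below assumes 1,000-token chunks (the default basis in WordSmith Tools), lowercases tokens before counting types, discards any final incomplete chunk, and reports the result as a percentage. These settings are assumptions for illustration and may differ from those used in this study.

```python
def sttr(tokens, chunk_size=1000):
    """Standardized type-token ratio (%): mean TTR over consecutive full chunks."""
    ratios = []
    # Step through the text in non-overlapping chunks; an incomplete tail is dropped.
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = [token.lower() for token in tokens[start:start + chunk_size]]
        ratios.append(len(set(chunk)) / chunk_size)
    if not ratios:
        raise ValueError("text is shorter than one chunk")
    return 100 * sum(ratios) / len(ratios)

# Toy example with a chunk size of 4 instead of 1,000.
print(sttr("the cat saw the dog the cat ran".split(), chunk_size=4))
```

Averaging over fixed-size chunks is what makes the measure comparable between texts of different lengths; a raw type-token ratio would fall steadily as a text grows, penalizing longer articles.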

For average word length, statistically significant differences were found for all pairs of conditions with the exception of the newspaper vs. scientific comparison in both human and machine translations. The newspaper and scientific texts were found to have a higher average word length than the literary texts, which may contribute to the general perception that scientific texts are harder to read. One factor that may have contributed to this finding is the higher number of nominalizations (which tend to be longer words) in the scientific and newspaper texts as compared with the literary texts.

This comparison of non-translated, human-translated, and machine-translated literary, newspaper, and scientific texts shows that these genres differ significantly in most of the features included in this study, often regardless of the method of text production. Thus, the reported findings confirm that studies of global textual features should be performed within genres. Last but not least, the genre comparison outlined in this chapter provides an informative background for the discussion of the findings on the association of cohesive devices and other textual features with the method of text production within each genre (Chapter 5).

CHAPTER 5: RESULTS AND ANALYSIS: ASSOCIATION OF COHESIVE DEVICES AND OTHER GLOBAL TEXTUAL FEATURES WITH THE METHOD OF TEXT PRODUCTION

This chapter reports on the analysis of the data collected in this corpus study, covering descriptive statistics for cohesive and other global textual parameters, as well as significance testing for differences in these parameters across populations. The general procedures for data handling, as well as the statistical measures for descriptive statistics and significance testing, are outlined in Chapter 3. This chapter addresses the main research question of this study. It reports on the comparison of three text groups distinguished by their method of production: translated by humans, translated by machines, and non-translated. This comparison is made separately within each of the three genres included in the study (literary, newspaper, and scientific).

5.1 Association of cohesive devices and other global textual features with the method of text production: Overview of variables under study

This section includes a brief outline of the variables studied across three conditions of production (non-translated, human-translated, and machine-translated) in three separate corpora based on genre/text-type (literary, newspaper, and scientific).


5.1.1 Cohesive devices

Cohesive devices included in this study are subdivided into two major types: reference devices and conjunction devices. Each group of devices is discussed separately.

5.1.1.1 Reference cohesive devices

Reference cohesive devices included in this study are represented by the following groups:

− Pronominal cohesive devices—3rd person pronouns
  − 3rd person singular neuter personal pronoun ("it")—tagged as PPH1
  − 3rd person singular objective personal pronoun ("him," "her")—tagged as PPHO1
  − 3rd person plural objective personal pronoun ("them")—tagged as PPHO2
  − 3rd person singular subjective personal pronoun ("he," "she")—tagged as PPHS1
  − 3rd person plural subjective personal pronoun ("they")—tagged as PPHS2
  − The total for 3rd person pronominal cohesive devices—PPH1+PPHO1+PPHO2+PPHS1+PPHS2
− Pronominal cohesive devices—possessive pronouns ("his," "her," "their")—tagged as APPGE
− Demonstratives
  − Singular determiners ("this," "that," "another")—tagged as DD1
  − Plural determiners ("these," "those")—tagged as DD2
− Definite article ("the")—tagged as THE
− Comparative cohesive devices
  − General comparative adjective (e.g., "older," "better," "stronger")—tagged as JJR
  − General superlative adjective (e.g., "oldest," "best," "strongest")—tagged as JJT
  − Comparative degree adverb ("more," "less")—tagged as RGR
  − Superlative degree adverb ("most," "least")—tagged as RGT
  − The total for comparative cohesive devices—JJR+JJT+RGR+RGT
− The total number of reference cohesive devices—PPH1+PPHO1+PPHO2+PPHS1+PPHS2+APPGE+DD1+DD2+THE+JJR+JJT+RGR+RGT

5.1.1.2 Conjunction cohesive devices

Conjunction cohesive devices included in this study are represented by the following types:

− Additive devices (e.g., "and," "or")—tagged as CC

− Adversative devices (e.g., "but")—tagged as CCB

− Causal and continuative devices represented by subordinating conjunctions (e.g., "if," "because," "unless," "so," "for")—tagged as CS

− Temporal devices (e.g., "now," "tomorrow")—tagged as RT

− The total number of conjunction cohesive devices—CC+CCB+CS+RT


5.1.1.3 Reference and conjunction cohesive devices

The total number of reference and conjunction devices included in this study: PPH1+PPHO1+PPHO2+PPHS1+PPHS2+APPGE+DD1+DD2+THE+JJR+JJT+RGR+RGT+CC+CCB+CS+RT
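Because every device above is identified by a part-of-speech tag, the group totals reduce to summing tag counts. Below is a minimal sketch that assumes the text has already been tagged and that counts are normalized per 1,000 words, as the passive and preposition figures in Chapter 4 are. The tag labels are those listed in Section 5.1.1; the function name and the toy input are hypothetical.

```python
from collections import Counter

# Tag groups as listed in Section 5.1.1 (labels copied from the lists above).
PRONOMINAL_3RD = ["PPH1", "PPHO1", "PPHO2", "PPHS1", "PPHS2"]
COMPARATIVE = ["JJR", "JJT", "RGR", "RGT"]
REFERENCE = PRONOMINAL_3RD + ["APPGE", "DD1", "DD2", "THE"] + COMPARATIVE
CONJUNCTION = ["CC", "CCB", "CS", "RT"]

def group_rates(tags, n_words):
    """Per-1,000-word rates for each device group (hypothetical helper)."""
    counts = Counter(tags)

    def rate(group):
        # Counter returns 0 for tags that never occur.
        return 1000 * sum(counts[t] for t in group) / n_words

    return {
        "3rd person pronominal": rate(PRONOMINAL_3RD),
        "comparative": rate(COMPARATIVE),
        "reference": rate(REFERENCE),
        "conjunction": rate(CONJUNCTION),
        "reference + conjunction": rate(REFERENCE + CONJUNCTION),
    }

# Toy input: one tag per word for a ten-word stretch of text.
tags = ["PPHS1", "VVD", "THE", "NN1", "CC", "PPHO1", "JJR", "CCB", "APPGE", "NN1"]
print(group_rates(tags, n_words=len(tags)))
```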

5.1.2 Global textual features

The study focuses on six features that characterize texts globally:

− Nominalization

− Lexical density (as standardized type-token ratio—STTR)

− Average word length

− Average sentence length

− Passives

− Prepositional phrases, which include the following:
  − IF—preposition "for"
  − II—general preposition
  − IO—preposition "of"
  − IW—prepositions "with" and "without"
  − Total number of prepositional phrases included in this study—IF+II+IO+IW

The results for each feature are outlined in separate subsections for (1) literary texts, then (2) newspaper texts, and finally (3) scientific texts.


5.2 Association of cohesive devices and other global textual features with the method of text production: Literary corpus

This section reports the results of the analysis of the 32 textual features outlined above [1-PPH1, 2-PPHO1, 3-PPHO2, 4-PPHS1, 5-PPHS2, 6-Total Pronominal Cohesive Devices, 7-APPGE, 8-DD1, 9-DD2, 10-THE, 11-JJR, 12-JJT, 13-RGR, 14-RGT, 15-Total for Comparative Cohesive Devices, 16-Total for Reference Cohesive Devices, 17-CC, 18-CCB, 19-CS, 20-RT, 21-Total for Conjunction Cohesive Devices, 22-Total for All Reference + Conjunction Devices, 23-Nominalization, 24-STTR, 25-Average Word Length, 26-Average Sentence Length, 27-Passives, 28-IF, 29-II, 30-IO, 31-IW, 32-Total Prepositional Phrases] in the literary corpus across three conditions for the independent variable of text production method (non-translated, human-translated, and machine-translated).

5.2.1 Cohesive devices in non-translated, human-translated, and machine-translated texts in the literary corpus

5.2.1.1 Reference cohesive devices in the literary corpus

5.2.1.1.1 Pronominal cohesive devices represented by 3rd person pronouns

Pronominal cohesive devices expressed by 3rd person pronouns encompass the following categories:

− PPH1—3rd person singular neuter personal pronoun ("it")

− PPHO1—3rd person singular objective personal pronoun ("him," "her")

− PPHO2—3rd person plural objective personal pronoun ("them")

− PPHS1—3rd person singular subjective personal pronoun ("he," "she")

− PPHS2—3rd person plural subjective personal pronoun ("they")

− The total of all pronominal cohesive devices expressed by 3rd person pronouns is included as a separate variable.

For 3rd person pronouns in literary texts, one-way ANOVA indicated that at a significance level of at least α = 0.05 (95% confidence level), there are statistically significant differences for all variables except for the 3rd person singular neuter personal pronoun "it" (see Table 5.1).

As described earlier, the use of personal pronouns in English and Russian is different due to the fact that the Russian language is characterized by grammatical gender, while the English language is not. This fact is expected to influence translators' work, consistent with Toury's translation law of interference. This might explain the number of statistically significant differences found for 3rd person pronominal cohesive devices across literary non-translated texts and texts translated into English from Russian by humans and machines.

Interestingly, no significant differences were found for the pronoun "it," even though its use in the two languages varies due to the need to express grammatical gender in Russian for both animate and inanimate objects represented by this pronoun. It might be informative to see if any statistically significant differences are found for this pronoun in the newspaper and scientific corpora, as well as in further research on larger collections of texts and additional genres.

Table 5.1 Association of 3rd person pronominal cohesive devices with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between-Groups Sum of Squares | Between-Groups df | Total df | F | Sig.
PPH1 | 3022.28 | 30.50 | 2 | 149 | .749 | 0.4745
PPHO1 | 2114.96 | 144.60 | 2 | 149 | 5.394 | 0.0055*
PPHO2 | 501.55 | 40.68 | 2 | 149 | 6.488 | 0.0020*
PPHS1 | 19673.91 | 5468.20 | 2 | 149 | 28.292 | < 0.0001*
PPHS2 | 2155.00 | 107.98 | 2 | 149 | 3.877 | 0.0229*
Personal Pronouns | 40444.56 | 7614.54 | 2 | 149 | 17.047 | < 0.0001*

Note: * indicates p-values significant at the 0.05 alpha level or lower.

To determine the pairs of conditions for which these statistically significant differences exist, post hoc Tukey's HSD multiple comparisons testing was performed for the variables that displayed statistically significant differences in the one-way ANOVA. The results of Tukey's HSD testing are presented in Table 5.2. Descriptive statistics for all pronominal cohesive devices are presented in Table 5.3.

Table 5.2 Pairwise comparisons of significant 3rd person pronominal cohesive devices (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
PPHO1 | NT-HT | .03200 | .73222 | 0.9989
PPHO1 | NT-MT | 2.09860* | .73222 | 0.0132
PPHO1 | HT-MT | 2.06660* | .73222 | 0.0149
PPHO2 | NT-HT | -.89860* | .35413 | 0.0325
PPHO2 | NT-MT | .33480 | .35413 | 0.6125
PPHO2 | HT-MT | 1.23340* | .35413 | 0.0019
PPHS1 | NT-HT | 9.51340* | 1.96609 | < 0.0001
PPHS1 | NT-MT | 14.56320* | 1.96609 | < 0.0001
PPHS1 | HT-MT | 5.04980* | 1.96609 | 0.0300
PPHS2 | NT-HT | -1.07720 | .74633 | 0.3215
PPHS2 | NT-MT | 1.00060 | .74633 | 0.3750
PPHS2 | HT-MT | 2.07780* | .74633 | 0.0166
Personal Pronouns | NT-HT | 6.48360 | 2.98887 | 0.0799
Personal Pronouns | NT-MT | 17.27420* | 2.98887 | < 0.0001
Personal Pronouns | HT-MT | 10.79060* | 2.98887 | 0.0012

Notes: * indicates that the mean difference is significant at the 0.05 level.

Table 5.3 Descriptive statistics for 3rd person pronominal cohesive devices in literary texts (N = 50)

Method | Statistic | PPH1 | PPHO1 | PPHO2 | PPHS1 | PPHS2 | Personal Pronouns
NT | Mean | 11.20 | 6.34 | 2.39 | 26.88 | 5.50 | 52.32
NT | Std. Deviation | 4.28 | 3.79 | 1.60 | 12.42 | 4.17 | 17.84
NT | Median | 10.88 | 6.23 | 2.38 | 26.72 | 5.06 | 51.83
HT | Mean | 12.29 | 6.31 | 3.29 | 17.37 | 6.58 | 45.84
HT | Std. Deviation | 4.48 | 4.43 | 2.13 | 9.74 | 3.98 | 14.74
HT | Median | 11.66 | 5.24 | 3.08 | 17.50 | 5.88 | 44.99
MT | Mean | 11.93 | 4.24 | 2.06 | 12.32 | 4.50 | 35.05
MT | Std. Deviation | 4.76 | 2.50 | 1.52 | 6.39 | 2.93 | 11.59
MT | Median | 11.58 | 4.06 | 1.69 | 11.17 | 3.80 | 35.04

For 3rd person singular objective personal pronouns ("him," "her") (PPHO1), Tukey's HSD post hoc test revealed that their use was significantly lower in machine-translated literary texts (4.24 ± 2.50) than in both non-translated literary texts (6.34 ± 3.79, p = .0132) and human-translated literary texts (6.31 ± 4.43, p = .0149) (see Tables 5.2 and 5.3). These results are visualized in Fig. 5.1.

Fig. 5.1 Means and standard error (± 1 SE) for 3rd person singular objective pronominal cohesive devices in literary texts

These results suggest that literary texts produced by humans (whether non-translated or translated by humans) are similar in their use of 3rd person singular objective personal pronouns, while machine-translated literary texts show a lower frequency of such pronouns. A possible explanation is that Google Translate, the automated translation tool used in this study, does not make as many adjustments in the use of 3rd person singular objective personal pronouns as human translators do. Since cohesion is created in a text through a network of cohesive devices, which are interpreted by the reader/translator, such interpretations may differ between human translators and MT algorithms. It would be interesting to see whether humans interpret cohesive networks created with the help of 3rd person singular objective pronouns more accurately than machines do, and thus make use of them more frequently. Such studies are likely to be carried out by the developers of major MT tools. The following example illustrates how a human translator (Victoria Mesopir) interprets the source text to incorporate the 3rd person singular objective pronoun "him," while the MT tool (incorrectly) omits it:

Russian source text: — Дай какой-нибудь листик, — сказала Элина Егору.

Потому что это была его квартира, а, ясно, не потому что он ей мог нравиться. Это

было исключено.

Human translation: —Give me some kinda paper, Elina said to Egor—because

we were at his pad, you see, not because she might find him attractive in any way. That

was out of the question.

Translated by Victoria Mesopir


Machine translation: - Give me some leaf, - Elina said Yegor. Because it was his apartment, and, clearly, not because she might like. This was ruled out.

(From Nuclear Spring/Ядерная весна by Evgeni Alyokhin)

For 3rd person plural objective personal pronouns ("them") (PPHO2), human-translated literary texts contained a significantly higher number of these cohesive devices (3.29 ± 2.13) than both non-translated (2.39 ± 1.60, p = .0325) and machine-translated literary texts (2.06 ± 1.52, p = .0019) (see Tables 5.2 and 5.3). No statistically significant difference was found between non-translated and machine-translated texts. The graph in Fig. 5.2 presents these results visually.

Fig. 5.2 Means and standard error (± 1 SE) for 3rd person plural objective pronominal cohesive devices in literary texts


As with 3rd person singular objective pronouns, machine-translated texts used fewer 3rd person plural objective pronouns than human-translated texts. This, again, might be related to differences in the ways humans and machines interpret cohesive networks created with 3rd person objective pronouns. Compared to human translators, MT tools might tend to use fewer pronominal cohesive devices. The following example illustrates this point. Here, the translator Victoria Mesopir uses the 3rd person plural objective pronoun "them" to render the corresponding Russian pronoun ним. The MT tool, however, misinterprets the source text and omits this pronoun. The MT tool also misinterprets the Russian reflexive possessive pronoun свои, which refers to the subject of the sentence, я ('I'), and thus should be translated as "my." This further illustrates the difficulties MT tools may have in interpreting cohesive ties realized through Russian personal pronouns.

Russian source text: Я трогал свои ватные ноги, по ним бегали большие

теплые мурашки.

Human translation: I was touching my feet. They had big warm goose bumps

running up and down them.

Translated by Victoria Mesopir

Machine translation: I touched his padded feet, it ran great for warm shivers.

(From Nuclear Spring/Ядерная весна by Evgeni Alyokhin)


Interestingly, human-translated texts used the highest number of 3rd person plural objective personal pronouns of the three groups, exceeding both machine-translated and non-translated texts. This might be related to differences in the use of these cohesive devices in Russian and English, or to translators' attempts to clarify the intent of the Russian text through these pronouns. Studies comparing pronominal cohesive networks in English and Russian texts might shed more light on such differences.

For 3rd person singular subjective personal pronouns ("he," "she") (PPHS1), Tukey's HSD post hoc test found all three pairwise differences among non-translated, human-translated, and machine-translated texts to be statistically significant (see Tables 5.2 and 5.3). Literary non-translated texts contained significantly more 3rd person singular subjective pronominal devices (26.88 ± 12.42) than human-translated texts (17.37 ± 9.74, p < .0001) and machine-translated texts (12.32 ± 6.39, p < .0001). In addition, human-translated texts contained significantly more 3rd person singular subjective pronouns (17.37 ± 9.74) than machine-translated texts (12.32 ± 6.39, p = .03). These results are visualized in Fig. 5.3.
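The p-values in Table 5.2 can be related to the mean differences and standard errors through the studentized range distribution on which Tukey's HSD is based. The sketch below assumes that the "Std. Error" column is the standard error of the difference between two group means, sqrt(2 · MSE / n) for equal group sizes, and that the error term has 150 - 3 = 147 degrees of freedom (matching F(2,147)). It uses `scipy.stats.studentized_range` (SciPy 1.7 or later); the helper name `tukey_p_value` is hypothetical.

```python
import math

from scipy.stats import studentized_range


def tukey_p_value(mean_diff: float, se_diff: float, k: int, df: int) -> float:
    """Tukey HSD p-value for one pairwise comparison (hypothetical helper).

    Assumes se_diff is the standard error of the difference between two means,
    sqrt(2 * MSE / n) for equal group sizes, so that the studentized range
    statistic is q = |diff| / sqrt(MSE / n) = sqrt(2) * |diff| / se_diff.
    """
    q = math.sqrt(2) * abs(mean_diff) / se_diff
    # Upper-tail probability of the studentized range distribution with
    # k groups and df error degrees of freedom.
    return float(studentized_range.sf(q, k, df))


# HT-MT comparison for PPHS1 (Table 5.2): difference 5.04980, SE 1.96609.
p = tukey_p_value(5.04980, 1.96609, k=3, df=147)
print(f"q-based p-value = {p:.4f}")
```

With these inputs the computed p-value should fall close to the 0.0300 reported for the HT-MT comparison of PPHS1; small discrepancies would reflect rounding in the published table.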


Fig. 5.3 Means and standard error (± 1 SE) for 3rd person singular subjective pronominal cohesive devices in literary texts

As with 3rd person singular and plural objective pronouns, machine-translated literary texts contained significantly fewer 3rd person singular subjective pronouns. This might indicate a tendency of MT tools to avoid 3rd person cohesive pronouns compared to their use in non-translated and human-translated texts. In the following example, the translator Nick Allen introduces the 3rd person singular subjective pronoun "he" twice, which seems to create more explicit cohesive ties; the MT tool does not do this. Allen also adjusts punctuation to match English norms, which the MT output does not. Not surprisingly, the cultural reference to the popular Russian children's TV show "АБВГДейка" is merely transliterated in the MT output.


Russian source text: —Вот. Протокол о твоем задержании милицией.

Подписывай. [new paragraph begins] Больше всего следователь походил на клоуна

Клепу из "АБВГДейки". Маленькое треугольное лицо под копной различить было

уже сложно.

Human translation: "Right, this is the form about your detention by the militia.

Sign it," he said, his triangular face dwarfed by the shock of hair on his head. He

reminded me of Klepa the Clown from our ABC kids' show.

Translated by Nick Allen

Machine translation: - Here. Minutes of your arrest by the police. Sign. [new

paragraph begins] Most of the investigator looked like a clown Kljopov from

"ABVGDeyki." A small triangular face under a shock it was too difficult to discern.

(From The Diesel Stop/Дизелятник by Arkady Babchenko)

The significantly higher use of 3rd person singular subjective pronominal cohesive

devices in non-translated texts compared to translations is interesting. It might indicate a

higher tolerance of the English language for the repetition of such pronominal devices in

literary texts. This conjecture, however, would require further research comparing uses of

such pronominal devices in English and Russian literary texts.

For 3rd person plural subjective personal pronouns ("they") (PPHS2), the only pair that showed a statistically significant difference in Tukey's HSD post hoc testing was human translations vs. machine translations (see Tables 5.2 and 5.3). Human-translated literary texts used significantly more 3rd person plural subjective pronominal cohesive devices (6.58 ± 3.98) than machine-translated literary texts (4.50 ± 2.93, p = .0166). No statistically significant differences were found for non-translated vs. human-translated texts or for non-translated vs. machine-translated texts. The bar chart in Fig. 5.4 represents these results graphically.

Fig. 5.4 Means and standard error (± 1 SE) for 3rd person plural subjective pronominal cohesive devices in literary texts

Once again, machine-translated texts used significantly fewer 3rd person pronominal cohesive devices than human translations. As noted above, this might reflect differences in how successfully humans and machines interpret pronominal cohesive networks. If MT tools have more difficulty interpreting pronominal cohesive devices than humans do, they may be set to avoid such devices in order to reduce the risk of introducing interpretation errors into machine-produced translations.

Finally, for the total number of 3rd person cohesive pronominal devices, machine-

translated texts show a statistically significantly lower mean compared to the other two

groups (see Tables 5.2 and 5.3). The mean for machine-translated texts (35.05 ± 11.59)

was statistically significantly lower than for human-translated texts (45.84 ± 14.74, p =

.0012) and non-translated texts (52.32 ± 17.84, p < .0001). No statistically significant differences were detected for the non-translated—human-translated pair. The bar chart in

Fig. 5.5 represents these differences graphically.

Fig. 5.5 Means and standard error (± 1 SE) for the total of 3rd person pronominal cohesive devices in literary texts


Machine-translated literary texts consistently showed fewer 3rd person pronominal cohesive devices in most categories, as well as in the total. For 3rd person singular subjective pronouns, 3rd person singular objective pronouns, and the total of all 3rd person pronominal cohesive devices, machine-translated texts showed a significantly lower frequency than both human-translated and non-translated texts. For 3rd person plural subjective and 3rd person plural objective pronouns, machine-translated literary texts showed a significantly lower frequency than human-translated literary texts. This apparent tendency of MT tools to avoid pronominal devices suggests that translators editing MT output (post-editing) should pay special attention to pronominal cohesive devices. To bring MT output closer to non-translated texts, post-editors might introduce more pronominal cohesive devices, relying on the human ability to interpret complex pronominal cohesive networks.

Other significant differences in the use of different categories of 3rd person pronominal cohesive devices may suggest that cohesion via such devices is realized differently in Russian and English and that, in line with Toury's law of interference, this difference makes its way into target texts. Specifically, statistically significant differences between non-translated and human-translated texts were found for 3rd person plural objective pronouns and 3rd person singular subjective pronouns. For machine translations, differences from non-translated texts were found for 3rd person singular objective and subjective pronouns, as well as for the total number of 3rd person pronouns.


5.2.1.1.2 Pronominal cohesive devices represented by possessive pronouns

Possessive pronouns represent a large class of cohesive devices. In English, singular count nouns generally require a determiner, and possessive pronouns often fill this slot. The CLAWS program used to tag the corpora in this study assigns a single tag, APPGE, to all pre-nominal possessive pronouns. It covers 1st person possessive pronouns ("my," "our"), 2nd person possessive pronouns ("your"), and 3rd person possessive pronouns ("his," "her," "its," "their"). As discussed in Chapter 2, 1st and 2nd person pronouns are often exophoric (referring to the context of situation or discourse), pointing to the world outside the text itself (e.g., to the author or readers). For this reason, not all of them can be classified as cohesive devices in the strict sense of Halliday and Hasan, who understand cohesion as a phenomenon within a text and not outside of it.

Since English and Russian have different grammatical rules for the use of possessive pronouns, with Russian allowing nouns to appear without any explicit modifiers, it seemed worth examining whether this difference influences translations from Russian into English. Possessive pronouns are considered more of a general textual characteristic than a purely cohesive one; they are included in the analysis to broaden the overall picture of differences between translated and non-translated texts.

For possessive pronouns in literary texts, one-way ANOVA indicated statistically

significant differences (F(2,147) = 19.955, p < 0.0001) (see Table 5.4).


Table 5.4 Association of possessive pronouns with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
APPGE | 13375.783 | 2856.073 | 2 | 149 | 19.955 | < 0.0001*

Note: * indicates p-values significant at 0.05 alpha-level

A Tukey's HSD post hoc test revealed that the use of possessive pronouns in machine-translated literary texts (17.96 ± 7.45) is statistically significantly lower than in human-translated texts (25.55 ± 8.99, p < .0001) and non-translated texts (28.27 ± 8.85, p < .0001) (see Tables 5.5 and 5.6). No statistically significant difference was found between human-translated and non-translated texts. Fig. 5.6 graphically shows means and standard errors for possessive pronouns in texts produced by different methods.
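The F statistic in Table 5.4 follows from the sums of squares and degrees of freedom reported there: the within-groups sum of squares is the total minus the between-groups sum, and F is the ratio of the two mean squares. A minimal sketch, using `scipy.stats.f` for the upper-tail probability (the function name `anova_from_table` is illustrative):

```python
from scipy.stats import f


def anova_from_table(ss_total: float, ss_between: float,
                     df_between: int, df_total: int) -> tuple[float, float, float]:
    """Return (F, p, MSE) from the quantities reported in a one-way ANOVA table."""
    ss_within = ss_total - ss_between   # within-groups (error) sum of squares
    df_within = df_total - df_between   # 149 - 2 = 147 for 150 texts in 3 groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within   # mean squared error (MSE)
    f_stat = ms_between / ms_within
    # Upper-tail probability of F(df_between, df_within).
    p = float(f.sf(f_stat, df_between, df_within))
    return f_stat, p, ms_within


# APPGE row of Table 5.4.
f_stat, p, mse = anova_from_table(13375.783, 2856.073, 2, 149)
print(f"F(2, 147) = {f_stat:.3f}, p = {p:.2e}, MSE = {mse:.3f}")
```

The same function applies to the other one-way ANOVA tables in this section; for example, the DD1 row of Table 5.7 gives F ≈ 9.541.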

Table 5.5 Pairwise comparisons of possessive pronouns (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
APPGE | NT-HT | 2.71660 | 1.69189 | 0.2465
APPGE | NT-MT | 10.31080* | 1.69189 | < 0.0001
APPGE | HT-MT | 7.59420* | 1.69189 | < 0.0001

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.6 Descriptive statistics for possessive pronouns in literary texts (N = 50)

Method | Mean | Std. Deviation | Median
NT | 28.27 | 8.85 | 26.39
HT | 25.55 | 8.99 | 23.70
MT | 17.96 | 7.45 | 16.58
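All three comparisons in Table 5.5 share the same standard error (1.69189). With equal group sizes this is expected if the standard error of a pairwise difference is sqrt(MSE · (1/n + 1/n)), where MSE is the within-groups mean square implied by Table 5.4 and n = 50. The short check below is a sketch under that assumption:

```python
import math

N_PER_GROUP = 50  # texts per method of production

# Within-groups mean square (MSE) for APPGE, derived from Table 5.4:
# (total SS - between-groups SS) / (total df - between-groups df)
mse = (13375.783 - 2856.073) / (149 - 2)

# Standard error of the difference between two group means of equal size n.
se_diff = math.sqrt(mse * (1 / N_PER_GROUP + 1 / N_PER_GROUP))
print(f"SE of each pairwise difference = {se_diff:.5f}")
```

If the group sizes were unequal, each pair would have its own standard error; the identical values in the table are a consequence of the balanced design (50 texts per group).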

Fig. 5.6 Means and standard error (± 1 SE) for possessive pronouns in literary texts

As with 3rd person pronominal cohesive devices, machine-translated texts contain significantly fewer possessive pronouns than both non-translated and human-translated texts. This might be related to the fact that nouns in Russian often have no modifiers (such as possessive pronouns), so their interpretation depends heavily on the reader's sensitivity to context and coherence in a text. Developers of MT tools might be wary of relying on machine algorithms for such interpretations and thus prevent MT tools from introducing possessives where there are none in the Russian text. As Section 5.2.1.1.4 shows, MT tools might instead favor definite articles as noun modifiers, which carry less cohesive information than the more specific possessive pronouns. For post-editors of machine translations, this suggests that increasing the number of possessive pronouns in MT output may be desirable, since introducing them may increase the cohesiveness of target texts.

5.2.1.1.3 Demonstratives

Demonstratives are represented by two groups:

− DD1—singular determiners ("this," "that," "another")

− DD2—plural determiners ("these," "those")

For the literary texts, there was a statistically significant difference between

groups for singular determiners as established by one-way ANOVA (F(2,147) = 9.541, p

= .0001) (see Table 5.7). For plural determiners, no statistically significant difference was

found (p = 0.2312).

Table 5.7 Association of demonstrative cohesive devices with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
DD1 | 1989.489 | 228.590 | 2 | 149 | 9.541 | 0.0001*
DD2 | 224.481 | 4.429 | 2 | 149 | 1.479 | 0.2312

Note: * indicates p-values significant at 0.05 alpha-level

A Tukey's HSD post hoc test revealed that human-translated literary texts contained a statistically significantly higher number of singular demonstratives (10.41 ± 4.06) than non-translated texts (8.06 ± 3.38, p = .0025) and machine-translated texts (7.59 ± 2.84, p = .0002) (see Tables 5.8 and 5.9). The graphical representation of means and standard error for singular demonstratives across different groups of texts is given in Fig. 5.7.
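For readers who wish to run a comparable analysis on their own counts, one possible workflow in Python is a one-way ANOVA followed by Tukey's HSD. The sketch below uses `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` (SciPy 1.8 or later) on synthetic data drawn to loosely resemble the DD1 means and standard deviations in Table 5.9. It is not the software or data used in this study, and its output is not a study result.

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

# Synthetic per-text frequencies (NOT the study's data), drawn to loosely
# resemble the DD1 means and standard deviations in Table 5.9.
rng = np.random.default_rng(seed=0)
groups = {
    "NT": rng.normal(8.06, 3.38, size=50),
    "HT": rng.normal(10.41, 4.06, size=50),
    "MT": rng.normal(7.59, 2.84, size=50),
}
labels = list(groups)
samples = [groups[label] for label in labels]

# Omnibus test: one-way ANOVA across the three methods of text production.
anova = f_oneway(*samples)
print(f"F(2, 147) = {anova.statistic:.3f}, p = {anova.pvalue:.4f}")

# Post hoc test: Tukey's HSD for every pair of groups.
posthoc = tukey_hsd(*samples)
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        print(f"{labels[i]}-{labels[j]}: "
              f"diff = {posthoc.statistic[i, j]:.3f}, p = {posthoc.pvalue[i, j]:.4f}")
```

Running the omnibus ANOVA first and the post hoc test second mirrors the order of reporting in this chapter: pairwise comparisons are interpreted only for variables whose ANOVA is significant.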

Table 5.8 Pairwise comparisons of singular demonstrative cohesive devices (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
DD1 | NT-HT | -2.35380* | .69221 | 0.0025
DD1 | NT-MT | .46700 | .69221 | 0.7786
DD1 | HT-MT | 2.82080* | .69221 | 0.0002

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.9 Descriptive statistics for demonstrative cohesive devices in literary texts (N = 50)

Method | Statistic | DD1 | DD2
NT | Mean | 8.0582 | 1.2924
NT | Std. Deviation | 3.37837 | .97063
NT | Median | 7.9350 | .9800
HT | Mean | 10.4120 | 1.7122
HT | Std. Deviation | 4.05769 | 1.28840
HT | Median | 10.4950 | 1.5500
MT | Mean | 7.5912 | 1.5286
MT | Std. Deviation | 2.83874 | 1.37432
MT | Median | 7.5250 | 1.1200

Fig. 5.7 Means and standard error (± 1 SE) for singular demonstratives in literary texts

The increased number of demonstratives in human-translated texts compared with non-translated texts could be interpreted as a sign of explicitation in translation. Blum-Kulka (2001) considers explicitation inherent to the process of translation, stating that "interpretations performed by the translator on the source text might lead to a target language text which is more redundant than the source text"; this redundancy may be expressed by a higher level of cohesive explicitness in translations (300).

It is interesting to note that machine-translated texts did not contain more demonstratives than non-translated texts. This might suggest that explicitation is a phenomenon pertinent to human translation only. This would be consistent with the history of translation universals: from the outset, such universals have been linked to explicitations made by human translators, as well as to their tendency to simplify or normalize target texts. These tendencies arise from interpretations performed by human translators, in line with Blum-Kulka's reasoning above.

Professional human translators rarely doubt their understanding of the source text and are thus not afraid to explicitate or simplify texts. MT tools are not yet capable of this kind of interpretation; extracting meaning from source texts remains one of the main problems MT faces today. Adding information to target texts (explicitation) or changing them (simplification and normalization) may increase the error rate in MT output and is therefore best avoided. This might be one explanation for why machine-translated literary texts did not differ from non-translated texts in the number of singular demonstratives.

5.2.1.1.4 Definite article "the" as cohesive device

For the literary corpus, there was a statistically significant difference in the use of the definite article "the" across non-translated, human-translated, and machine-translated texts, as determined by one-way ANOVA (F(2,147) = 5.491, p = .0050) (see Table 5.10). A Tukey's HSD post hoc test revealed that machine-translated literary texts contained a significantly higher number of definite articles (63.41 ± 13.30) than non-translated texts (54.06 ± 12.91, p = .0051) and human-translated texts (56.35 ± 17.49, p = .0461) (see Tables 5.11 and 5.12). The graphical representation of means and standard errors for the use of the definite article "the" across different groups of texts is given in Fig. 5.8.


Table 5.10 Association of definite article "the" with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
THE | 34185.868 | 2376.564 | 2 | 149 | 5.491 | 0.0050*

Note: * indicates p-values significant at 0.05 alpha-level

Table 5.11 Pairwise comparisons of definite article "the" (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
THE | NT-HT | -2.28880 | 2.94204 | 0.7171
THE | NT-MT | -9.35220* | 2.94204 | 0.0051
THE | HT-MT | -7.06340* | 2.94204 | 0.0461

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.12 Descriptive statistics for the use of the definite article "the" in literary texts (N = 50)

Method | Mean | Std. Deviation | Median
NT | 54.0624 | 12.90558 | 54.2300
HT | 56.3512 | 17.48834 | 55.3700
MT | 63.4146 | 13.29561 | 65.4500

Fig. 5.8 Means and standard error (± 1 SE) for definite article "the" in literary texts

The significantly higher number of definite articles in machine-translated texts might be related to the lower number of possessive pronouns in such texts. MT tools might be programmed to use definite articles as noun modifiers instead of possessive pronouns, since the latter carry more cohesive information and thus require more accurate interpretation in order to be used correctly. Editors of MT output might use this knowledge to justify replacing some definite articles with possessive pronouns, which might improve the cohesiveness of the text.

5.2.1.1.5 Comparative cohesive devices

Comparative cohesive devices are represented by the following groups of comparative

adjectives and adverbs:


− JJR—general comparative adjective (e.g., "older," "better," "stronger")

− JJT—general superlative adjective (e.g., "oldest," "best," "strongest")

− RGR—comparative degree adverb ("more," "less")

− RGT—superlative degree adverb ("most," "least")

One-way ANOVA of literary texts showed no statistically significant differences between non-translated, human-translated, and machine-translated texts for any type of comparative cohesive device (all p > .05) (see Table 5.13). The descriptive statistics for comparative cohesive devices are presented in Table 5.14.

Table 5.13 Association of comparative cohesive devices with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
JJR | 251.120 | 1.381 | 2 | 149 | .406 | 0.6668
JJT | 64.547 | .231 | 2 | 149 | .264 | 0.7684
RGR | 129.285 | 4.176 | 2 | 149 | 2.453 | 0.0895
RGT | 69.250 | 2.536 | 2 | 149 | 2.795 | 0.0644
All Comparatives | 628.234 | 7.438 | 2 | 149 | .881 | 0.4167

Table 5.14 Descriptive statistics for comparative devices in literary texts (N = 50)

Method | Statistic | JJR | JJT | RGR | RGT | Comparative Devices
NT | Mean | 1.5466 | .6214 | .5548 | .2110 | 2.9340
NT | Std. Deviation | 1.34677 | .66832 | .63909 | .33467 | 1.73276
NT | Median | 1.2300 | .4900 | .4800 | < .0001 | 2.9300
HT | Mean | 1.3120 | .7168 | .9612 | .4556 | 3.4466
HT | Std. Deviation | 1.26626 | .59191 | 1.17708 | .78638 | 2.16490
HT | Median | .9800 | .5150 | .9000 | < .0001 | 3.2450
MT | Mean | 1.4414 | .6792 | .7204 | .5100 | 3.3518
MT | Std. Deviation | 1.29595 | .71803 | .87136 | .79442 | 2.23160
MT | Median | 1.0200 | .5500 | .5150 | .4250 | 2.9350

This result is interesting, since it might indicate that the use of comparative devices in literary texts is similar in Russian and English, and that translators neither introduce new comparatives nor reduce their number in target texts. The same appears to hold for MT output. Note, however, that the absence of a statistically significant difference does not by itself demonstrate that the groups are the same. It would be interesting to see whether other genres show any statistically significant differences in comparative cohesive devices.

5.2.1.1.6 Total number of reference cohesive devices

For the total number of reference cohesive devices in the literary texts, there was a

statistically significant difference between groups as determined by one-way ANOVA

(F(2,147) = 11.344, p < .0001) (see Table 5.15).

Table 5.15 Association of the total number of reference cohesive devices with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
Reference Devices | 68152.802 | 9112.301 | 2 | 149 | 11.344 | < 0.0001*

Note: * indicates p-values significant at 0.05 alpha-level

A Tukey's HSD post hoc test revealed that machine-translated literary texts

contained a significantly lower number of reference cohesive devices (128.89 ± 13.5)

than non-translated texts (146.93 ± 26.74, p < .0001) and human-translated texts (143.31

± 17.53, p = .0013) (see Tables 5.16 and 5.17). The difference between non-translated and human-translated texts was not significant. These results are graphically presented in

Fig. 5.9.

Table 5.16 Pairwise comparisons of the total number of reference cohesive devices (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
Reference Devices | NT-HT | 3.62620 | 4.00817 | 0.6382
Reference Devices | NT-MT | 18.04600* | 4.00817 | < 0.0001
Reference Devices | HT-MT | 14.41980* | 4.00817 | 0.0013

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.17 Descriptive statistics for the total number of reference cohesive devices in literary texts (N = 50)

Method | Mean | Std. Deviation | Median
NT | 146.9334 | 26.74287 | 148.2450
HT | 143.3072 | 17.52879 | 141.0700
MT | 128.8874 | 13.50810 | 129.1200

Fig. 5.9 Means and standard error (± 1 SE) for the total number of reference cohesive devices in literary texts

Machine-translated texts might use fewer reference cohesive devices than texts written or translated by humans because using cohesive devices appropriately requires a full understanding of textual meaning. MT tools might not yet be able to construct such a complete and adequate understanding of meaning, and thus might be set to avoid many reference cohesive devices to reduce the possibility of error. This has direct implications for editors of MT output, who should look for opportunities to introduce reference cohesive devices when post-editing; doing so might improve the cohesiveness of post-edited target texts.
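The total in Table 5.17 appears to be the sum of the reference subcategories analyzed above (3rd person personal pronouns, possessive pronouns, singular and plural demonstratives, the definite article, and comparatives). Assuming this is so, the component group means in Tables 5.3, 5.6, 5.9, 5.12, and 5.14 should add up to the reported totals, up to rounding. The check below makes this explicit and also shows that the higher MT count of "the" offsets only part of the MT shortfall in pronouns and possessives. The variable names are illustrative.

```python
# Group means (literary corpus) for each reference subcategory.
COMPONENT_MEANS = {
    "NT": {"3rd person pronouns": 52.32, "possessives (APPGE)": 28.27,
           "DD1": 8.0582, "DD2": 1.2924, "the": 54.0624, "comparatives": 2.9340},
    "HT": {"3rd person pronouns": 45.84, "possessives (APPGE)": 25.55,
           "DD1": 10.4120, "DD2": 1.7122, "the": 56.3512, "comparatives": 3.4466},
    "MT": {"3rd person pronouns": 35.05, "possessives (APPGE)": 17.96,
           "DD1": 7.5912, "DD2": 1.5286, "the": 63.4146, "comparatives": 3.3518},
}
REPORTED_TOTALS = {"NT": 146.9334, "HT": 143.3072, "MT": 128.8874}  # Table 5.17


def summed_totals(means: dict[str, dict[str, float]]) -> dict[str, float]:
    """Sum the component means for each method of text production."""
    return {method: sum(parts.values()) for method, parts in means.items()}


totals = summed_totals(COMPONENT_MEANS)
for method, total in totals.items():
    print(f"{method}: summed {total:.2f} vs. reported {REPORTED_TOTALS[method]:.2f}")

# Difference between MT and NT, component by component.
for name in COMPONENT_MEANS["NT"]:
    diff = COMPONENT_MEANS["MT"][name] - COMPONENT_MEANS["NT"][name]
    print(f"MT - NT, {name}: {diff:+.2f}")
```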


5.2.1.2 Conjunction cohesive devices in the literary corpus

This section reviews the data and analysis of the literary corpus for all four types of conjunction cohesive devices and then examines the total number of such devices across the groups of texts. For convenience, the types of conjunction cohesive devices are listed again below:

− Additive devices (e.g., "and," "or")—tagged as CC

− Adversative devices (e.g., "but")—tagged as CCB

− Causal and continuative devices represented by subordinating conjunctions (e.g.,

"if," "because," "unless," "so," "for")—tagged as CS

− Temporal devices (e.g., "now," "tomorrow")—tagged as RT

The sum of the conjunction cohesive devices included in the study is analyzed as a separate variable [CC+CCB+CS+RT].
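The conjunction variables above are counts of CLAWS tags. As an illustration only, the sketch below counts CC, CCB, CS, and RT in CLAWS-style horizontal output (tokens written as word_TAG) and sums them into the [CC+CCB+CS+RT] variable. The normalization base (per 1,000 tokens, punctuation tokens included), the function name, and the sample sentence are assumptions for this example, not the study's documented procedure; multiword ("ditto") tags such as CS21 are not counted.

```python
from collections import Counter

CONJUNCTION_TAGS = ("CC", "CCB", "CS", "RT")


def conjunction_rates(tagged_text: str, per: int = 1000) -> dict[str, float]:
    """Frequencies of conjunction tags per `per` tokens in word_TAG text.

    The normalization base is an assumption of this sketch.
    """
    # Split each token on its last underscore into [word, tag].
    pairs = [tok.rsplit("_", 1) for tok in tagged_text.split() if "_" in tok]
    if not pairs:
        raise ValueError("no word_TAG tokens found")
    tag_counts = Counter(tag for _word, tag in pairs)
    rates = {tag: tag_counts[tag] * per / len(pairs) for tag in CONJUNCTION_TAGS}
    # The summed variable [CC+CCB+CS+RT].
    rates["CC+CCB+CS+RT"] = sum(rates[tag] for tag in CONJUNCTION_TAGS)
    return rates


sample = ("but_CCB now_RT she_PPHS1 stayed_VVD home_RL "
          "because_CS it_PPH1 rained_VVD and_CC snowed_VVD")
print(conjunction_rates(sample))
```

In the ten-token sample each of the four tags occurs once, so each rate is 100 per 1,000 tokens and the summed variable is 400.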

One-way ANOVA indicated statistically significant differences for additive devices (F(2,147) = 14.300, p < .0001), adversative devices (F(2,147) = 3.186, p = .0442), and the total number of conjunction cohesive devices (F(2,147) = 9.227, p = .0002) (see Table 5.18). Differences for causal and continuative devices (p = .0641) and temporal devices (p = .0607) approached but did not reach significance at the .05 level.


Table 5.18 Association of conjunction cohesive devices with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
CC | 16430.885 | 2676.117 | 2 | 149 | 14.300 | < 0.0001*
CCB | 873.155 | 36.278 | 2 | 149 | 3.186 | 0.0442*
CS | 1984.969 | 72.820 | 2 | 149 | 2.799 | 0.0641
RT | 1342.173 | 50.208 | 2 | 149 | 2.856 | 0.0607
Total Conj. | 29092.755 | 3244.767 | 2 | 149 | 9.227 | 0.0002*

Note: * indicates p-values significant at 0.05 alpha-level

For additive cohesive devices, Tukey's HSD post hoc test revealed that machine-translated literary texts contained a significantly higher number of additive cohesive devices (41.12 ± 11.99) than non-translated texts (30.84 ± 7.87, p < .0001) and human-translated texts (34.97 ± 8.66, p = .0051) (see Tables 5.19 and 5.20). There was no statistically significant difference in additive devices between non-translated and human-translated texts. These results are presented graphically in Fig. 5.10.

Table 5.19 Pairwise comparisons of conjunction cohesive devices (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
CC | NT-HT | -4.13260 | 1.93463 | 0.0861
CC | NT-MT | -10.28060* | 1.93463 | < 0.0001
CC | HT-MT | -6.14800* | 1.93463 | 0.0051
CCB | NT-HT | -.74220 | .47720 | 0.2684
CCB | NT-MT | -1.19280* | .47720 | 0.0359
CCB | HT-MT | -.45060 | .47720 | 0.6132
Total Conj. | NT-HT | -4.42560 | 2.65207 | 0.2207
Total Conj. | NT-MT | -11.30420* | 2.65207 | 0.0001
Total Conj. | HT-MT | -6.87860* | 2.65207 | 0.0280

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.20 Descriptive statistics for conjunction cohesive devices in literary texts (N = 50)

Method | Statistic | CC | CCB | CS | RT | Conj. Devices
NT | Mean | 30.8392 | 5.2732 | 10.3678 | 5.9178 | 52.3986
NT | Std. Deviation | 7.86744 | 2.03956 | 3.53163 | 2.81862 | 11.39462
NT | Median | 31.0050 | 5.2050 | 10.3450 | 5.8450 | 52.4800
HT | Mean | 34.9718 | 6.0154 | 8.9886 | 6.8480 | 56.8242
HT | Std. Deviation | 8.66450 | 2.26149 | 3.56639 | 2.96469 | 12.26819
HT | Median | 33.6850 | 6.1900 | 8.5950 | 6.8700 | 56.1750
MT | Mean | 41.1198 | 6.4660 | 8.8076 | 7.3088 | 63.7028
MT | Std. Deviation | 11.98914 | 2.79374 | 3.71913 | 3.10365 | 15.72145
MT | Median | 41.2150 | 6.4350 | 8.6950 | 7.2050 | 62.7200

Fig. 5.10 Means and standard error (± 1 SE) for additive conjunction devices in literary texts

For adversative cohesive devices in literary texts, Tukey's HSD post hoc test revealed only one statistically significant difference: machine-translated texts contained more adversative devices (6.47 ± 2.79) than non-translated texts (5.27 ± 2.04, p = .0359). No statistically significant difference was found for the other pairs (p > .05) (see Tables 5.19 and 5.20). These results are visually presented in Fig. 5.11.


Fig. 5.11 Means and standard error (± 1 SE) for adversative conjunction devices in literary texts

For the total of all conjunction cohesive devices, Tukey's HSD post hoc test showed that machine-translated literary texts contained a significantly higher number of conjunction cohesive devices (63.7 ± 15.72) than non-translated texts (52.4 ± 11.39, p =

.0001) and human-translated texts (56.82 ± 12.27, p = .0280) (see Tables 5.19 and 5.20).

There was no statistically significant difference in the total number of conjunction devices between non-translated and human-translated texts. These results are presented in a bar graph in Fig. 5.12.


Fig. 5.12 Means and standard error (± 1 SE) for the total number of conjunction devices in literary texts

The means in Figures 5.10, 5.11, and 5.12 show that for additive and adversative conjunction devices, as well as for the total number of conjunction devices, machine-translated literary texts had the highest means of the three groups and non-translated texts the lowest (even though some of these differences were not statistically significant). It might be hypothesized that the higher number of conjunction devices in both human-translated and machine-translated texts results from interference from Russian. If so, Russian may tend to use conjunction devices more often than English does. This conjecture requires further comparative research on Russian and English. If it turns out to be true, however, the statistical results above may suggest that human translators attempt to level out such differences by reducing the number of conjunction cohesive devices in translations, while MT tools tend to retain the conjunction devices of the source language. This presents an interesting topic for future research in comparative linguistics and translation studies.

5.2.1.3 Total reference and conjunction cohesive devices in the literary corpus

For the total of reference and conjunction cohesive devices included in the literary corpus for this study, one-way ANOVA indicated a statistically significant difference (F(2,147) = 13.112, p < .0001) (see Table 5.21). It should be noted that this total does not represent the number of all cohesive devices in the texts, since the study does not include cohesive devices that are best studied by manual analysis (e.g., lexical cohesion). This total is included in the study to show that a "lump" statistical analysis of cohesive devices might not yield results as informative as individual statistical tests for specific types of cohesive devices.
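The F value above can be checked against the sums of squares in Table 5.21. The following Python sketch assumes a balanced design of three groups of 50 texts each (as the descriptive tables indicate), which is why the tables list a total df of 149 while F is reported with 2 and 147 degrees of freedom. The function name is illustrative only and is not part of the software used in the study.

```python
def f_from_table(ss_total, ss_between, k=3, n_per_group=50):
    """Return (F, df_between, df_within) for a balanced one-way ANOVA."""
    n_total = k * n_per_group          # 3 groups x 50 texts = 150 texts
    df_between = k - 1                 # 2
    df_within = n_total - k            # 147; df_total = n_total - 1 = 149
    ss_within = ss_total - ss_between  # SS_total = SS_between + SS_within
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Sums of squares from Table 5.21 (total reference and conjunction devices)
f, df1, df2 = f_from_table(18644.412, 2822.554)
print(f"F({df1},{df2}) = {f:.3f}")
```

The same function applied to any other ANOVA table in this chapter (for example, Table 5.26 for STTR) should reproduce the reported F to rounding.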

Table 5.21 Association of total reference and conjunction cohesive devices with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
Total Ref. & Conj. Devices | 18644.412 | 2822.554 | 2 | 149 | 13.112 | < 0.0001*

Note: * indicates p-values significant at 0.05 alpha-level

Tukey's HSD post hoc test revealed that machine-translated literary texts contained a significantly higher number of total reference and conjunction cohesive devices (48.17 ± 12.86) than non-translated texts (37.59 ± 7.9, p < .0001) and human-translated texts (42.07 ± 9.75, p = .0106) (see Tables 5.22 and 5.23). There was no statistically significant difference in total reference and conjunction devices between non-translated and human-translated texts. These results are presented in a bar graph in Fig. 5.13.

Table 5.22 Pairwise comparisons of total reference and conjunction cohesive devices (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
Total Ref. & Conj. Devices | NT-HT | -4.48320 | 2.07491 | 0.0815
Total Ref. & Conj. Devices | NT-MT | -10.58440* | 2.07491 | < 0.0001
Total Ref. & Conj. Devices | HT-MT | -6.10120* | 2.07491 | 0.0106

Note: * indicates that the mean difference is significant at the 0.05 level

Table 5.23 Descriptive statistics for total reference and conjunction cohesive devices in literary texts (N = 50)

Method | Mean | Std. Deviation | Median
NT | 37.5862 | 7.89952 | 37.5600
HT | 42.0694 | 9.74650 | 41.6300
MT | 48.1706 | 12.86462 | 48.1800
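Table 5.22 can be partly reproduced from Tables 5.21 and 5.23. The sketch below is a simplified illustration, not the statistical package's routine: for each pair it computes the mean difference, the standard error of a difference for equal group sizes, sqrt(MS_within × 2/n), and the studentized range statistic q, and compares q with an approximate critical value of 3.36. That value is an assumption taken from standard Tukey tables for three groups and 120 error df (slightly conservative for the 147 df here), so the sketch yields only a significant/non-significant decision, not the exact p-values in Table 5.22. The helper name is hypothetical.

```python
import math

# Assumed approximate critical value of the studentized range for alpha = 0.05,
# k = 3 groups, 120 error df (standard Tukey tables; conservative for df = 147).
Q_CRIT_05 = 3.36


def tukey_pairs(means, ms_within, n):
    """Hypothetical helper: Tukey HSD decisions for equal group sizes n.

    Returns a list of (pair, mean difference, SE of the difference, q, significant).
    """
    se_diff = math.sqrt(ms_within * 2 / n)  # corresponds to the "Std. Error" column
    se_mean = math.sqrt(ms_within / n)      # denominator of the q statistic
    names = list(means)
    results = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diff = means[a] - means[b]
            q = abs(diff) / se_mean
            results.append((f"{a}-{b}", diff, se_diff, q, q > Q_CRIT_05))
    return results


ms_within = (18644.412 - 2822.554) / 147                # from Table 5.21
means = {"NT": 37.5862, "HT": 42.0694, "MT": 48.1706}  # from Table 5.23
for pair, diff, se, q, sig in tukey_pairs(means, ms_within, 50):
    print(f"{pair}: diff = {diff:.4f}, SE = {se:.5f}, q = {q:.2f}, significant: {sig}")
```

With these inputs the decisions (NT-HT not significant; NT-MT and HT-MT significant) agree with Table 5.22, and the standard error matches the tabled 2.07491.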


Fig. 5.13 Means and standard error (± 1 SE) for total reference and conjunction devices in literary texts

The statistically significant differences in the total number of reference and conjunction devices for the non-translated vs. machine-translated and human-translated vs. machine-translated pairs in literary texts suggest that translators and editors should pay special attention to such devices. Like other "smaller" linguistic items, such as punctuation marks, these devices need to be "translated" and adjusted at the global textual level. However, the detailed statistical analysis that breaks the total of reference and conjunction devices down into specific types seems more interesting and informative for developing pedagogical interventions and recommendations for translators and post-editors (as reported in 5.2.1.1.1 and 5.2.1.1.2).

211

5.2.2 Global textual features in non-translated, human-translated, and machine-translated texts in the literary corpus

The study focuses on six features that characterize texts globally—nominalization, lexical density (as standardized type-token ratio), average word length, average sentence length, passives, and prepositional phrases.

5.2.2.1 Nominalization

There was no statistically significant difference between groups in the literary corpus as determined by one-way ANOVA (F(2,147) = 2.264, p = .1075) (see Table 5.24). It is interesting to note that the means increase gradually from the lowest for non-translated texts (5.84 ± 3.14), to the middle value for human-translated texts (6.88 ± 3.91), to the highest for machine-translated texts (7.40 ± 4.07) (see Table 5.25). Had these differences been statistically significant, they might have been interpreted as the influence of the source language, since Russian seems to favor nominalization more than English does. Longer text samples and larger corpora might be required to investigate nominalization in translation further. Manual analyses might also prove interesting.

Table 5.24 Association of nominalization with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
Nom. | 2106.842 | 62.966 | 2 | 149 | 2.264 | 0.1075


Table 5.25 Descriptive statistics for nominalization in literary texts (N = 50)

Method | Mean | Std. Deviation | Median
NT | 5.8378 | 3.14324 | 5.8450
HT | 6.8808 | 3.91105 | 6.2000
MT | 7.3952 | 4.06638 | 6.7850

5.2.2.2 Lexical density

As a measure of lexical density, this study of literary texts used the standardized type-token ratio (STTR) calculated by WordSmith Tools. Type-token ratio (TTR) is a measure of vocabulary variation within a written text, with tokens being the total number of words in a text and types being the distinct words (WordSmith Tools Manual). Standardized TTR is a measure designed by WordSmith Tools to adjust for differences in the lengths of individual texts.
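To make the idea of standardization concrete, here is a minimal Python sketch of an STTR. It assumes the procedure described in the WordSmith Tools documentation: TTR is computed for consecutive, non-overlapping 1,000-word segments and the segment TTRs are averaged, expressed as a percentage. The tokenizer, the treatment of a leftover partial segment, and the fallback for short texts are simplifying assumptions of this sketch, so its values will not match WordSmith's output exactly.

```python
import re


def sttr(text, chunk=1000):
    """Standardized type-token ratio in percent (simplified sketch).

    Computes the TTR of each consecutive, non-overlapping run of `chunk` tokens
    and averages them. Assumptions: lowercase regex tokenization, a trailing
    partial chunk is ignored, and texts shorter than one chunk fall back to
    the plain TTR.
    """
    tokens = re.findall(r"\w+(?:'\w+)?", text.lower())
    if not tokens:
        return 0.0
    if len(tokens) < chunk:
        return 100 * len(set(tokens)) / len(tokens)
    ratios = [
        100 * len(set(tokens[start:start + chunk])) / chunk
        for start in range(0, len(tokens) - chunk + 1, chunk)
    ]
    return sum(ratios) / len(ratios)


# A 2,000-token text that cycles through the same 500 word forms
demo = " ".join(f"w{i % 500}" for i in range(2000))
print(sttr(demo))  # -> 50.0
```

The plain TTR of this demo text is only 25% (500 types over 2,000 tokens), while its STTR is 50.0; this illustrates why raw TTR would penalize longer texts and why a standardized measure is used for texts of different lengths.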

As determined by one-way ANOVA, there was a statistically significant difference between groups (F(2,147) = 8.089, p = .0005) (see Table 5.26).

Table 5.26 Association of STTR with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
STTR | 1472.714 | 146.009 | 2 | 149 | 8.089 | 0.0005*

Note: * indicates p-values significant at 0.05 alpha-level

Tukey's HSD post hoc test revealed that non-translated literary texts are characterized by a significantly higher STTR (74.51 ± 2.37) than human-translated texts (72.82 ± 3.1, p = .0151) and machine-translated texts (72.17 ± 3.44, p = .0004) (see Tables 5.27 and 5.28). There was no statistically significant difference in STTR between human- and machine-translated texts. These results are presented in a bar graph in Fig. 5.14.

Table 5.27 Pairwise comparisons of STTR (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
STTR | NT-HT | 1.69360* | .60084 | 0.0151
STTR | NT-MT | 2.33980* | .60084 | 0.0004
STTR | HT-MT | .64620 | .60084 | 0.5308

Note: * indicates that the mean difference is significant at the 0.05 level

Table 5.28 Descriptive statistics for STTR in literary texts (N = 50)

Method | Mean | Std. Deviation | Median
NT | 74.5124 | 2.36940 | 74.4750
HT | 72.8188 | 3.10600 | 73.2900
MT | 72.1726 | 3.43720 | 73.1350


Fig. 5.14 Means and standard error (± 1 SE) for STTR in literary texts
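Note that the values in parentheses in the text (e.g., 74.51 ± 2.37) are means ± 1 standard deviation, as in Table 5.28, whereas the error bars in the figures show ± 1 standard error. Assuming the usual definition SE = SD/√n with n = 50 texts per group, the bar lengths in Fig. 5.14 can be recovered from the descriptive table, as in this short Python sketch:

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


# STTR means and standard deviations from Table 5.28 (n = 50 texts per group)
table_5_28 = {"NT": (74.5124, 2.36940), "HT": (72.8188, 3.10600), "MT": (72.1726, 3.43720)}
for group, (mean, sd) in table_5_28.items():
    se = standard_error(sd, 50)
    print(f"{group}: mean {mean:.2f}, error bar {mean - se:.2f} to {mean + se:.2f}")
```

Because SE shrinks with √n, the error bars are roughly seven times shorter than the standard deviations quoted in the text.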

This finding is in agreement with the results of Laviosa (1998a, 1998b), who reports that English translated texts are characterized by a narrower range of vocabulary than English non-translated texts. As Laviosa suggests, this might be viewed as evidence in support of simplification as a universal of translation. Laviosa investigated human-translated texts only. The present study includes machine-translated texts as well and finds that both human-translated and machine-translated literary texts display a statistically significantly lower standardized type-token ratio than non-translated literary texts. This suggests that the concept of simplification might be extended to texts translated by machines. Since this study focuses on the Russian-English pair only, it would be beneficial to test whether similar results hold for other language pairs.


The fact that these differences were studied and confirmed only for translations from Russian might point to a different possible explanation, namely, that Russian non-translated literary texts have a narrower range of vocabulary than English non-translated literary texts do. This may be supported by vocabulary statistics for English and Russian. The Second Edition of the 20-volume Oxford English Dictionary contains entries for 171,476 words in current use, 47,156 obsolete words, and around 9,500 derivative words included as subentries (http://oxforddictionaries.com/), which makes 228,132 words altogether. For Russian, various sources cite a figure of around 150,000 words. A larger English vocabulary is also plausible historically, since English absorbed a large body of French and Latin vocabulary in the centuries following the Norman Conquest. While it is difficult, perhaps even impossible, to calculate the exact number of words in a language (for various reasons, e.g., differing definitions of what constitutes a word), the total number of words seems to be higher for English than for Russian. This would align with the finding that texts translated from Russian into English by humans and machines have statistically significantly lower vocabulary variation than English non-translated texts. This conjecture, however, requires further comparative research.


5.2.2.3 Average word length

For average word length, there was no statistically significant difference between groups for the genre of literary texts as determined by one-way ANOVA (F(2,147) = 1.842, p = .1621) (see Table 5.29). The means, standard deviations, and medians for the three groups are presented in Table 5.30.
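With three groups, the numerator of every F test in this chapter has 2 degrees of freedom, and in that case the upper-tail probability of the F distribution has a simple closed form, p = (1 + 2F/df_within)^(−df_within/2). The Python sketch below is a checking aid added here, not the software used in the study; it applies the formula to the non-significant F values reported in Tables 5.24, 5.29, 5.31, and 5.33, and the results agree closely with the tabled Sig. values (small discrepancies come from F being reported to three decimals).

```python
def f_upper_tail_df1_2(f, df_within):
    """Exact p-value P(F > f) for an F(2, df_within) distribution.

    Closed form valid only when the numerator has 2 df (three groups):
    P(F > f) = (1 + 2 * f / df_within) ** (-df_within / 2).
    """
    return (1 + 2 * f / df_within) ** (-df_within / 2)


# Non-significant F values reported for the literary corpus, all with df = (2, 147)
reported = {"Nominalization": 2.264, "Word length": 1.842,
            "Sentence length": 0.244, "Passives": 2.161}
for feature, f in reported.items():
    print(f"{feature}: F(2,147) = {f}, p = {f_upper_tail_df1_2(f, 147):.4f}")
```

The shortcut does not apply to tests with more than three groups; for those, a general F-distribution routine from a statistics library would be needed.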

Table 5.29 Association of average word length with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
Word Length | 10.896 | 0.266 | 2 | 149 | 1.842 | 0.1621

Table 5.30 Descriptive statistics for average word length in literary texts (N = 50)

Method | Mean | Std. Deviation | Median
NT | 4.2512 | 0.33415 | 4.2700
HT | 4.3470 | 0.23624 | 4.3400
MT | 4.3324 | 0.22243 | 4.3150

Given the size of the corpus, these results suggest that average word length is unlikely to be a characteristic that differentiates between non-translated, human-translated, and machine-translated literary texts.


5.2.2.4 Average sentence length

For average sentence length, there was no statistically significant difference between groups for literary texts as determined by one-way ANOVA (F(2,147) = .244, p = .7839) (see Table 5.31). The means, standard deviations, and medians for the three groups are presented in Table 5.32.

Table 5.31 Association of average sentence length with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
Sentence Length | 8119.486 | 26.854 | 2 | 149 | .244 | 0.7839

Table 5.32 Descriptive statistics for average sentence length in literary texts (N = 50)

Method | Mean | Std. Deviation | Median
NT | 17.1554 | 9.39545 | 15.0450
HT | 16.2386 | 5.97924 | 14.4300
MT | 16.2784 | 6.41327 | 14.6050

As with average word length, average sentence length does not appear to differ between non-translated, human-translated, and machine-translated literary texts.


5.2.2.5 Passives

For passives, there was no statistically significant difference between groups for literary texts as determined by one-way ANOVA (F(2,147) = 2.161, p = .1189) (see Table 5.33). The means, standard deviations, and medians for the three groups are presented in Table 5.34.

Table 5.33 Association of passives with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
Passives | 1432.131 | 40.902 | 2 | 149 | 2.161 | 0.1189

Table 5.34 Descriptive statistics for passives in literary texts (N = 50)

Method | Mean | Std. Deviation | Median
NT | 5.0980 | 2.92934 | 4.6250
HT | 5.9694 | 3.66388 | 5.4600
MT | 6.3446 | 2.52733 | 6.2500

Passives appear not to differ across non-translated, human-translated, and machine-translated literary texts.

5.2.2.6 Prepositional phrases

Prepositional phrases in this study include four types of prepositions that can be identified with the CLAWS part-of-speech tagger:

− IF—preposition "for"

− II—general preposition

− IO—preposition "of"

− IW—prepositions "with" and "without"

− Total prepositions [IF+II+IO+IW]
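As an illustration of how such counts can be extracted, the following Python sketch tallies the four preposition tags in CLAWS output. It assumes the "horizontal" output format, in which each token appears as word_TAG; the function name is hypothetical, it returns raw counts (any normalization by text length used in the study is not reproduced), and it ignores the CLAWS "ditto" tags for multi-word prepositions (e.g., II21/II22), whose handling in the study is not described here.

```python
from collections import Counter

# CLAWS C7 preposition tags counted in this study
PREP_TAGS = ("IF", "II", "IO", "IW")


def count_tags(tagged_text, tags=PREP_TAGS):
    """Count selected CLAWS tags in 'word_TAG' output (assumed horizontal format).

    Returns raw counts per tag plus their sum under the key "Total".
    Multi-word 'ditto' tags such as II21/II22 are not matched.
    """
    counts = Counter()
    for item in tagged_text.split():
        _word, sep, tag = item.rpartition("_")
        if sep and tag in tags:
            counts[tag] += 1
    counts["Total"] = sum(counts[t] for t in tags)
    return counts


sample = ("He_PPHS1 waited_VVD for_IF them_PPHO2 with_IW the_AT key_NN1 "
          "of_IO the_AT house_NN1 in_II town_NN1 ._.")
print(count_tags(sample))
```

Splitting on the last underscore (rpartition) keeps tokens that themselves contain underscores intact, and tags outside the selected set (PPHS1, VVD, AT, and so on) are simply skipped.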

The results of one-way ANOVA for the four types of prepositions, as well as for the sum of these types, are presented in Table 5.35. For the preposition "for," there was a statistically significant difference between groups for literary texts as determined by one-way ANOVA (F(2,147) = 5.434, p = .0053). No statistically significant differences were found for the other types of prepositions or for the total of all prepositions included in this study (p > .05). The means, standard deviations, and medians for the three groups are presented in Table 5.37.

Table 5.35 Association of prepositional phrases with the method of text production (NT, HT, and MT) in the corpus of literary texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
IF | 740.377 | 50.966 | 2 | 149 | 5.434 | 0.0053*
II | 14220.527 | 93.400 | 2 | 149 | .486 | 0.6161
IO | 15446.033 | 251.670 | 2 | 149 | 1.217 | 0.2990
IW | 1237.813 | 41.749 | 2 | 149 | 2.566 | 0.0803
Prep. Phrases | 44130.270 | 654.603 | 2 | 149 | 1.107 | 0.3334

Note: * indicates p-values significant at 0.05 alpha-level


Tukey's HSD post hoc test revealed that machine-translated literary texts are characterized by a significantly lower number of prepositional phrases with "for" (5.40 ± 1.95) than non-translated texts (6.52 ± 2.12, p = .0294) and human-translated texts (6.73 ± 2.40, p = .0072) (see Tables 5.36 and 5.37). There was no statistically significant difference in prepositional phrases with "for" between non-translated and human-translated texts (p = .8747). These results are presented in a bar graph in Fig. 5.15.

Table 5.36 Pairwise comparisons of prepositional phrases with "for" (NT, HT, and MT) for the corpus of literary texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
IF | NT-HT | -.21360 | .43312 | 0.8747
IF | NT-MT | 1.11580* | .43312 | 0.0294
IF | HT-MT | 1.32940* | .43312 | 0.0072

Note: * indicates that the mean difference is significant at the 0.05 level

Table 5.37 Descriptive statistics for prepositional phrases in literary texts (N = 50)

Method | Statistic | IF | II | IO | IW | Prepositions
NT | Mean | 6.5176 | 60.6120 | 18.2928 | 7.9664 | 93.3876
NT | Std. Deviation | 2.11601 | 9.87818 | 6.06014 | 2.62802 | 15.35747
NT | Median | 6.7200 | 61.1200 | 18.3400 | 8.5050 | 93.6350
HT | Mean | 6.7312 | 62.3116 | 20.2060 | 9.2552 | 98.5036
HT | Std. Deviation | 2.40309 | 9.77259 | 10.52150 | 2.72462 | 16.31160
HT | Median | 6.8650 | 62.8100 | 18.5100 | 9.4100 | 98.1950
MT | Mean | 5.4018 | 60.6646 | 21.4414 | 8.5288 | 96.0350
MT | Std. Deviation | 1.95378 | 9.75842 | 12.75390 | 3.17481 | 19.63004
MT | Median | 5.2450 | 61.2800 | 19.5950 | 8.2900 | 95.2350

Fig. 5.15 Means and standard error (± 1 SE) for prepositional phrases with "for" in literary texts

Differences in the use of the preposition "for" may be related to differences between the English and Russian linguistic systems. Human translators appear to use this preposition more frequently than machines do, possibly because its use may require a level of interpretation that MT has not yet reached. In this respect, human translators bring their texts closer to English non-translated writing than MT tools do.


Interestingly, no statistically significant differences were found in the use of the preposition "of" in the literary corpus. This may be related to the fact that Russian literary texts do not favor genitive noun chains as much as other text types do. Still, the average number of prepositional phrases with "of" was slightly higher in human and machine translations than in non-translated texts (although these differences were not statistically significant).

As the following example shows, MT tools might have trouble deciphering long noun chains in Russian literary texts and sometimes use prepositions incorrectly, which is useful for editors of MT output to know:

Russian source text: Переминание с ноги на ногу более молодого человека. Опухшесть лица, общая нечеткость, расплывчатость облика более молодого человека.

Human translation: The shifting from one foot to the other of the younger person. The bloating of the face, the general, fuzzy, unfocused appearance of the younger person.

Translated by Douglas Robinson

Machine translation: Pereminanie from one foot to the younger man. Opuhshest person, the general lack of clarity, vagueness appearance of a younger man.

(From More Elderly Person by Dmitry Danilov)

In this example, the MT tool had trouble with the genitive construction in "переминание с ноги на ногу более молодого человека" and rendered it with the preposition "to." The MT tool then chose an inappropriate meaning for the noun лица (which can mean both "face" and "person") and found no translation for the word опухшесть ('bloating'), which resulted in the omission of another genitive noun chain in the MT output "Opuhshest person."

Additionally, MT tools might be programmed to avoid long noun chains with the preposition "of," as can be seen in the MT translation of "расплывчатость облика более молодого человека" (two genitive constructions: "расплывчатость облика" and "облик … человека"). The MT tool avoided using two of-constructions and (incorrectly) lumped one of them into a noun chain in English: "vagueness appearance" instead of "vagueness of appearance." This example also supports the idea that human translators are more successful at avoiding long noun chains with the preposition "of": instead of translating both genitive constructions with of-phrases, the human translator avoided one of them ("unfocused appearance of the younger person"). Further research is needed, however, before these anecdotal observations can be generalized.

5.3 Association of cohesive devices and other global textual features with the method of text production: Newspaper corpus

5.3.1 Cohesive devices in non-translated, human-translated, and machine-translated texts in the newspaper corpus

5.3.1.1 Reference cohesive devices in the newspaper corpus


5.3.1.1.1 Pronominal cohesive devices represented by 3rd person pronouns

Pronominal cohesive devices expressed by 3rd person pronouns encompass the following categories:

− PPH1—3rd person singular neuter personal pronoun ("it")

− PPHO1—3rd person singular objective personal pronoun ("him," "her")

− PPHO2—3rd person plural objective personal pronoun ("them")

− PPHS1—3rd person singular subjective personal pronoun ("he," "she")

− PPHS2—3rd person plural subjective personal pronoun ("they")

− Total of all pronominal cohesive devices expressed by 3rd person pronouns [PPH1+PPHO1+PPHO2+PPHS1+PPHS2]

For 3rd person pronouns in newspaper texts, one-way ANOVA indicated statistically significant differences at a significance level of α = 0.05 for the singular neuter personal pronoun ("it"), the plural subjective personal pronoun ("they"), and the total number of all 3rd person pronouns (see Table 5.38). This is somewhat different from literary texts, which displayed no statistically significant differences across the modes of production for the neuter personal pronoun "it." Both literary and newspaper texts showed statistically significant differences across the modes of production for the plural subjective pronoun "they" and for the total number of 3rd person pronouns.


Table 5.38 Association of 3rd person pronominal cohesive devices with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
PPH1 | 4067.903 | 238.131 | 2 | 149 | 4.570 | 0.0119*
PPHO1 | 251.864 | 3.834 | 2 | 149 | 1.136 | 0.3238
PPHO2 | 428.485 | 1.233 | 2 | 149 | .212 | 0.8092
PPHS1 | 4056.013 | 50.977 | 2 | 149 | .936 | 0.3947
PPHS2 | 1561.738 | 100.314 | 2 | 149 | 5.045 | 0.0076*
Total Personal Pronouns | 13435.034 | 948.210 | 2 | 149 | 5.581 | 0.0046*

Note: * indicates p-values significant at 0.05 alpha-level

To determine the pairs of conditions for which these statistically significant differences exist, Tukey's HSD post hoc multiple comparisons testing was performed for the variables that displayed statistically significant differences in the one-way ANOVA testing. The results of Tukey's HSD testing are presented in Table 5.39. Descriptive statistics for all pronominal cohesive devices are presented in Table 5.40.

Table 5.39 Pairwise comparisons of significant 3rd person pronominal cohesive devices (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
PPH1 | NT-HT | 2.66840* | 1.02084 | 0.0266
PPH1 | NT-MT | 2.67720* | 1.02084 | 0.0260
PPH1 | HT-MT | .00880 | 1.02084 | 1.0000
PPHS2 | NT-HT | 1.18320 | .63061 | 0.1492
PPHS2 | NT-MT | 1.99140* | .63061 | 0.0054
PPHS2 | HT-MT | .80820 | .63061 | 0.4078
Total Personal Pronouns | NT-HT | 4.31900 | 1.84331 | 0.0531
Total Personal Pronouns | NT-MT | 5.96160* | 1.84331 | 0.0043
Total Personal Pronouns | HT-MT | 1.64260 | 1.84331 | 0.6468

Note: * indicates that the mean difference is significant at the 0.05 level

Table 5.40 Descriptive statistics for 3rd person pronominal cohesive devices in newspaper texts (N = 50)

Method | Statistic | PPH1 | PPHO1 | PPHO2 | PPHS1 | PPHS2 | Total Personal Pronouns
NT | Mean | 10.65 | 0.58 | 1.36 | 4.73 | 4.68 | 21.99
NT | Std. Deviation | 6.06 | 1.11 | 1.52 | 6.57 | 3.48 | 10.36
NT | Median | 10.34 | 0.00 | 1.13 | 2.24 | 4.12 | 21.63
HT | Mean | 7.98 | 0.97 | 1.51 | 3.72 | 3.50 | 17.67
HT | Std. Deviation | 4.99 | 1.68 | 1.90 | 5.21 | 3.17 | 9.61
HT | Median | 6.82 | 0.00 | 1.24 | 1.68 | 2.83 | 15.68
MT | Mean | 7.97 | 0.73 | 1.29 | 3.35 | 2.69 | 16.03
MT | Std. Deviation | 4.06 | 0.99 | 1.68 | 3.38 | 2.77 | 7.42
MT | Median | 7.33 | 0.00 | 1.08 | 2.26 | 2.38 | 14.64

For the 3rd person singular neuter personal pronoun "it" (PPH1), Tukey's HSD post hoc test revealed that its use was significantly more frequent in non-translated newspaper texts (10.65 ± 6.06) than in both human-translated texts (7.98 ± 4.99, p = .0266) and machine-translated texts (7.97 ± 4.06, p = .0260) (see Tables 5.39 and 5.40). These results are visualized in Fig. 5.16.

Fig. 5.16 Means and standard error (± 1 SE) for 3rd person singular neuter pronominal cohesive device ("it") in newspaper texts

As we can see from the graph in Fig. 5.16 and the descriptive statistics in Table 5.40, HT and MT newspaper texts are very close in their use of the 3rd person singular neuter pronoun "it." The first possible explanation that comes to mind is that human translators and MT tools treat this pronoun similarly. However, this may not be the reason the numbers turned out to be so close. As noted above, MT tools might have trouble interpreting pronominal relationships in a text, as the example below illustrates:

Russian source text: Прозападные демократы слишком слабы и непопулярны и не смогут удержать власть, даже если она сама упадет им в руки, …

Human translation: Pro-Western democrats are too weak and unpopular and will not be able to retain power, even if it lands in their hands.

Machine translation: Pro-Western Democrats are too weak and unpopular, and will not be able to retain power, even if she falls into their hands, …

(From The Honeymoon’s Over, But No Divorce Yet, by Kiril Zubkov, Izvestia)

In this example, the MT tool referred to the word "power" as "she," since the Russian source text uses the pronoun она (which means "she" or "it," depending on whether the noun referred to is animate or inanimate).
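The error can be stated as a simple rule. In the toy Python function below (a hypothetical illustration, not a description of how any MT system works), the English pronoun depends not only on the grammatical gender of он/она/оно but also on whether the antecedent denotes a person. A system that maps она to "she" on the basis of grammatical gender alone produces exactly the "she falls into their hands" error above. Treating every non-person antecedent as "it" is a simplification of this sketch.

```python
# Hypothetical illustration: choosing an English subject pronoun for a Russian
# 3rd person singular pronoun once its antecedent has been resolved.
PERSONAL = {"он": "he", "она": "she"}


def english_pronoun(ru_pronoun, antecedent_is_person):
    """Return 'he', 'she', or 'it' for он/она/оно.

    Simplification: every non-person antecedent (objects, abstractions,
    animals) maps to 'it', and оно always maps to 'it'.
    """
    if ru_pronoun == "оно" or not antecedent_is_person:
        return "it"
    return PERSONAL[ru_pronoun]


# власть ('power') is grammatically feminine but does not denote a person
print(english_pronoun("она", antecedent_is_person=False))  # -> it
print(english_pronoun("она", antecedent_is_person=True))   # -> she
```

The point of the sketch is that the decisive input, whether the antecedent is a person, is only available after the pronominal tie has been correctly interpreted.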

The statistically significant differences between non-translated newspaper texts and both human- and machine-translated newspaper texts might stem from the fact that "it" is used differently in English and Russian. In Russian, the corresponding pronoun expresses grammatical gender for both animate and inanimate referents, and its form (он/она/оно) depends on the gender of the noun it refers to. This is not the case in English. Additionally, English uses "it" as a formal subject in constructions where Russian needs no subject at all, as illustrated in the example below.


Russian source text: Неверно сравнивать нынешний размер долга с обязательствами США в 1945-1947 гг.: тогда речь шла о золотых долларах, а сейчас - о бумажных; тогда отношение фондовых рынков к ВВП составляло 30-40%, а сегодня - 130-160%; тогда международный финансовый рынок был пигмеем по сравнению с тем, каким он стал в наши дни.

Human translation: It is incorrect to compare the current debt value with the U.S. liability in 1945-1947. Then it was about the gold dollar and now it is about the paper dollar; then the stock ratio was 30-40 percent of GDP, now it is 130-160 percent and then the global financial market pales in comparison with what it is today.

Machine translation: Incorrect to compare the current size of the debt obligations of the United States in 1945-1947.: Then it was a gold dollar, and now - on paper, then the ratio of stock market to GDP ratio was 30-40%, and today - 130-160%, while international financial market was a pygmy compared to what it has become today.

(From Diet for the Dollar by Vladislav Inozemscev, Izvestia)

In this example, the Russian subjectless clause неверно сравнивать is rendered by the human translator as "It is incorrect to compare" and by the MT tool as "Incorrect to compare." The MT tool seems to have failed to introduce "it" as a subject in this clause. The same example also illustrates a similarity between human and machine translations in the use of the pronoun "it": the pronominal cohesive device он (in "каким он стал в наши дни"), which refers to the term "международный финансовый рынок," is (correctly) rendered as "it" in both the human and the machine translation.

These examples suggest that comparative research is needed to find out why we observe such differences and similarities in the use of the pronoun "it." The clear differences between the English and Russian grammatical systems are likely to influence the use of the pronominal cohesive device "it" in translated and non-translated English texts.

Interestingly, in literary texts, no statistically significant differences were found for the use of the pronominal cohesive device "it." This might be related to the fact that literary texts are less likely to use constructions requiring "it" as a formal subject.

For 3rd person plural subjective personal pronouns ("they"), the only pair that showed a statistically significant difference in Tukey's HSD post hoc testing was non-translated vs. machine-translated newspaper texts (see Tables 5.39 and 5.40). Non-translated newspaper texts used a significantly higher number of 3rd person plural subjective pronominal cohesive devices (4.68 ± 3.48) than machine-translated newspaper texts (2.69 ± 2.77, p = .0054). No statistically significant differences were found for the non-translated vs. human-translated and human-translated vs. machine-translated pairs. The bar chart in Fig. 5.17 represents these results graphically.


Fig. 5.17 Means and standard error (± 1 SE) for 3rd person plural subjective pronominal cohesive device ("they") in newspaper texts

Machine-translated newspaper texts displayed a statistically significant tendency to use fewer 3rd person plural subjective pronouns than non-translated newspaper texts. The graph in Fig. 5.17 shows that, of the three groups of newspaper texts, machine-translated texts used the pronoun "they" least often (even though the difference between human and machine translation was not statistically significant). This might be related to an overall tendency of MT tools to avoid pronominal cohesive devices in order to reduce the possibility of introducing an error, since the use of such devices depends heavily on the appropriate interpretation of cohesive ties in a text.

In the example below, the MT tool reproduces the Russian они (referring to "гуманитарные проблемы"/'humanitarian problems') where it is used explicitly in the source text but does not introduce it where it is omitted in Russian, as the human translator does (see the second "they" in the human translation).

Russian source text: Должны ли мы предположить, что в этих странах нет гуманитарных проблем или они освобождаются от критики, поскольку [___] рассматриваются как важные рычаги уменьшения влияния России на постсоветском пространстве или, возможно, для других геополитических целей?

Human translation: Should we assume that in these countries there are no humanitarian problems or that they are free from criticism, because they could be important levers to reduce Russian influence in the former Soviet Union, or possibly, for other geopolitical ends?

Machine translation: Are we to assume that in these countries there is no humanitarian problems, or they are exempt from criticism, because [____] as important levers to reduce Russia's influence in the former Soviet Union, or perhaps for other geopolitical objectives?

(From Can We Save the “Reset”? by Edward Lozansky, Nezavisimaia Gazeta)

Finally, for the total number of 3rd person cohesive pronominal devices, the mean for machine-translated newspaper texts (16.03 ± 7.42) was statistically significantly lower than for non-translated newspaper texts (21.99 ± 10.36, p = .0043) (see Tables 5.39 and 5.40). No statistically significant differences were detected for the non-translated vs. human-translated and human-translated vs. machine-translated pairs. The bar chart in Fig. 5.18 represents these differences graphically.

Fig. 5.18 Means and standard error (± 1 SE) for the total of 3rd person pronominal cohesive devices in newspaper texts

This result is similar to the findings for literary texts and supports the idea that MT output contains significantly fewer 3rd person pronominal cohesive devices than non-translated texts.

As noted above, the tendency of MT tools to avoid pronominal devices might be related to the need to interpret pronominal cohesive ties in a text precisely, which may still be somewhat problematic for MT tools. For translators and editors working with MT output, this might mean that MT output could benefit from the introduction of additional pronominal cohesive devices. The human ability to interpret complex networks of pronominal cohesion, and thus to introduce more pronominal cohesive devices, might bring MT output closer to non-translated texts.


5.3.1.1.2 Pronominal cohesive devices represented by possessive pronouns

As mentioned earlier, possessive pronouns represent a large class of cohesive devices. English uses them quite frequently, since English singular count nouns generally require a determiner, and possessive pronouns often fill that slot. It should be noted that the CLAWS part-of-speech tagger marks all possessive pronouns with one tag, APPGE, and does not differentiate between first-, second-, and third-person possessive pronouns or between singular and plural. As discussed in Chapter 2, 1st and 2nd person pronouns are often exophoric and refer to the world outside the text (e.g., to the author or readers). For this reason, not all of them can be classified as pure cohesive devices in the sense of Halliday and Hasan, who understand cohesion as a phenomenon within a text.

The category of possessive pronouns is included in this study, since it is expected to show statistically significant differences due to the fact that English and Russian have different grammatical rules for the use of these pronouns. The findings on these pronouns might add to the overall picture of differences between translated and non-translated texts, as well as to the evidence pointing to Toury's law of interference.

For possessive pronouns in newspaper texts, one-way ANOVA indicated statistically significant differences (F(2,147) = 3.650, p = .0284) (see Table 5.41).

Table 5.41 Association of possessive pronouns with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
APPGE | 5940.635 | 281.070 | 2 | 149 | 3.650 | 0.0284*

Note: * indicates p-values significant at 0.05 alpha-level
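As a check on Table 5.41, the F statistic can be recomputed from the reported sums of squares: the within-groups sum of squares is the total minus the between-groups sum, and F is the ratio of the between-groups and within-groups mean squares. The Python sketch below is a minimal illustration, not the procedure used to produce the table, and the function names are hypothetical. It also relies on a standard property of the F distribution: when the numerator has exactly 2 degrees of freedom (three groups), the upper-tail probability has the closed form (1 + 2F/df_within)^(-df_within/2), so no statistics library is needed.

```python
def anova_f_from_table(total_ss, between_ss, df_between, df_total):
    """Recompute a one-way ANOVA F statistic from summary-table values."""
    df_within = df_total - df_between        # 149 - 2 = 147 for 3 groups of 50 texts
    within_ss = total_ss - between_ss        # residual (within-groups) sum of squares
    ms_between = between_ss / df_between
    ms_within = within_ss / df_within        # mean squared error (MSE)
    return ms_between / ms_within, df_within


def f_upper_tail_df2(f, df_within):
    """P(F > f) for an F(2, df_within) distribution; exact only when the numerator df is 2."""
    return (1.0 + 2.0 * f / df_within) ** (-df_within / 2.0)


# Sums of squares for APPGE (possessive pronouns) from Table 5.41
f_stat, df_w = anova_f_from_table(5940.635, 281.070, df_between=2, df_total=149)
p_value = f_upper_tail_df2(f_stat, df_w)
print(f"F(2,{df_w}) = {f_stat:.3f}, p = {p_value:.4f}")
```

The same functions reproduce the other three-group tests in this section; for example, the sums of squares in Table 5.46 give F = 47.537 for the definite article.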


A Tukey's HSD post hoc test revealed that the use of possessive pronouns in machine-translated newspaper texts (10.28 ± 5.55) is significantly lower than in non-translated newspaper texts (13.63 ± 5.61, p = .0214) (see Tables 5.42 and 5.43). No statistically significant difference was found for the other two pairs of conditions. Fig. 5.19 graphically shows means and standard errors for possessive pronouns in texts produced by different methods.

Even though no statistically significant difference was found between human-translated and machine-translated texts, the machine-translated texts displayed the lowest mean for possessive pronouns across the three groups. This is similar to the results for literary texts and might support the idea that MT tools refrain from introducing new possessives into translations, thereby avoiding errors related to incorrect interpretations of cohesive ties in a text.

Table 5.42 Pairwise comparisons of possessive pronouns (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
APPGE | NT-HT | 1.44940 | 1.24097 | 0.4742
APPGE | NT-MT | 3.34320* | 1.24097 | 0.0214
APPGE | HT-MT | 1.89380 | 1.24097 | 0.2817

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level


Table 5.43 Descriptive statistics for possessive pronouns in newspaper texts (N = 50)

Method | Mean | Std. Deviation | Median
NT | 13.63 | 5.61 | 12.21
HT | 12.18 | 7.30 | 11.42
MT | 10.28 | 5.55 | 10.02
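The standard error reported for every pair in Table 5.42 (1.24097) follows from the ANOVA mean squared error (MSE) and the equal group sizes: SE = sqrt(MSE × (1/n_i + 1/n_j)). Tukey's HSD then compares the studentized range statistic q = |mean difference| / sqrt(MSE / n) with the studentized range distribution for 3 groups and 147 error degrees of freedom. The Python sketch below, with a hypothetical helper name, only recomputes SE and q from the published figures; converting q into a p-value requires a studentized range routine (for example, `scipy.stats.studentized_range` in recent SciPy releases), which is not shown here.

```python
import math


def tukey_se_and_q(total_ss, between_ss, n_per_group, k_groups, mean_diff):
    """Recompute the pairwise SE and Tukey's q statistic for equal-sized groups."""
    df_within = k_groups * n_per_group - k_groups        # 3 * 50 - 3 = 147
    mse = (total_ss - between_ss) / df_within            # within-groups mean square
    se = math.sqrt(mse * (2.0 / n_per_group))            # SE of a difference between two means
    q = abs(mean_diff) / math.sqrt(mse / n_per_group)    # studentized range statistic
    return se, q


# APPGE, NT vs. MT: sums of squares from Table 5.41, mean difference from Table 5.42
se, q = tukey_se_and_q(5940.635, 281.070, n_per_group=50, k_groups=3, mean_diff=3.34320)
print(f"SE = {se:.5f}, q = {q:.3f}")
```

With equal group sizes, q is simply the mean difference divided by SE/sqrt(2), which is why a single SE value serves all three comparisons in the table.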

Fig. 5.19 Means and standard error (± 1 SE) for possessive pronouns in newspaper texts

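The example below illustrates how the MT tool uses the definite article "the" where the human translator preferred "his"; the blank (___) in the Russian source marks the position of the possessive, which the Russian text leaves unexpressed.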

Russian source text: Не случайно Барак Обама незадолго до объявления о начале ___ предвыборной кампании, потратив на операцию свыше 600 миллионов долларов (более 110 миллионов ушло только на крылатые ракеты Tomohawk), отказался от прежнего участия в агрессии.

Human translation: It’s no coincidence that Barack Obama, having spent over $600 million (more than $110 million gone only to Tomahawk cruise missiles) on the operation, declined more involvement in aggression shortly before the announcement of his election campaign.

Machine translation: Barack Obama is no accident shortly before the announcement of the election campaign, spending on the operation of more than $ 600 million (more than 110 million spent only on cruise missiles Tomohawk), declined from the previous involvement in aggression.

(From NATO Is Ready for a Land Operation in Libya by Sergei Balmasov, Pravda)

While using the definite article "the" instead of the pronoun "his" is a valid strategy on the part of the MT tool, editors of MT output might keep in mind that adding possessive pronouns, and replacing some definite articles with them, might make translated texts sound more natural in English. Further research is needed on whether readers are affected by differences in the use of possessive pronouns across non-translated, human-translated, and machine-translated texts.

5.3.1.1.3 Demonstratives

Demonstratives are represented by two types:

− DD1—singular determiners ("this," "that," "another")

− DD2—plural determiners ("these," "those")


For the newspaper corpus, no statistically significant differences between groups for either type of demonstratives were found, with p > .05 (see Table 5.44). The descriptive statistics for the use of demonstratives in the newspaper corpus are presented in Table 5.45.

Table 5.44 Association of demonstrative cohesive devices with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
DD1 | 2486.704 | 59.101 | 2 | 149 | 1.789 | 0.1707
DD2 | 511.276 | 1.565 | 2 | 149 | 0.226 | 0.7982

Table 5.45 Descriptive statistics for demonstrative cohesive devices in newspaper texts (N = 50)

Method | Statistic | DD1 | DD2
NT | Mean | 9.47 | 2.05
NT | Std. Deviation | 3.74 | 1.83
NT | Median | 9.76 | 1.40
HT | Mean | 9.21 | 2.24
HT | Std. Deviation | 4.30 | 2.01
HT | Median | 8.99 | 1.63
MT | Mean | 8.03 | 2.00
MT | Std. Deviation | 4.13 | 1.74
MT | Median | 7.50 | 1.64


This finding differs from the results for literary texts, which displayed statistically significant differences for singular determiners, with human-translated texts characterized by a higher number of singular demonstratives than non-translated and machine-translated texts. It might indicate that the use of determiners is more similar across the three groups in newspaper texts than in literary texts.

In addition, the absence of statistically significant differences for determiners and other variables in this study might be related to the fact that newspaper articles are generally much shorter. The average length of the samples in the newspaper corpus was 807 words, considerably shorter than in the literary corpus (1,986 words on average) and in the scientific corpus (1,816 words on average). Because shorter samples yield less stable frequency estimates, a larger newspaper corpus might have provided greater statistical power. In the future, it might be interesting to include only longer newspaper articles in the newspaper corpus and see whether the results replicate (although, on the downside, longer articles would be less representative of the genre, since newspaper writing tends to be brief).
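The argument about text length can be made concrete with a simple counting model. Assuming, for illustration only, that each device occurs independently at a fixed rate (a Poisson model) and that frequencies are normalized per 1,000 words (an assumption, since the normalization unit is not restated in this section), the sampling error of a single text's observed rate shrinks with the square root of its length. The Python sketch below uses a hypothetical rate of 9 per 1,000 words, roughly the size of the DD1 means in Table 5.45, together with the average sample lengths given above; the function name is hypothetical.

```python
import math


def poisson_rate_se(rate_per_1000, n_words):
    """SE of an observed per-1,000-word rate if device counts follow a Poisson distribution."""
    expected_count = rate_per_1000 * n_words / 1000.0   # Poisson mean (and variance)
    return math.sqrt(expected_count) / (n_words / 1000.0)


# Average sample lengths reported for the three corpora
AVERAGE_LENGTHS = {"newspaper": 807, "scientific": 1816, "literary": 1986}
HYPOTHETICAL_RATE = 9.0  # illustrative rate per 1,000 words, not a value measured in this study

for corpus, length in AVERAGE_LENGTHS.items():
    print(f"{corpus:>10}: SE = {poisson_rate_se(HYPOTHETICAL_RATE, length):.2f} per 1,000 words")
```

Under these assumptions, a newspaper sample of average length carries roughly 1.57 times (the square root of 1,986/807) the per-text sampling noise of an average literary sample, which is one way in which a corpus of short texts can lose statistical power.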

5.3.1.1.4 Definite article "the" as cohesive device

For the newspaper corpus, there was a statistically significant difference between groups for the use of the definite article "the" as determined by one-way ANOVA (F(2,147) = 47.537, p < .0001) (see Table 5.46). A Tukey's HSD post hoc test revealed that non-translated newspaper texts contained a significantly lower number of definite articles (61.56 ± 10.99) than human-translated newspaper texts (83.76 ± 18.54, p < .0001) and machine-translated newspaper texts (88.91 ± 14.21, p < .0001). There was no statistically significant difference between human-translated and machine-translated newspaper texts in their use of "the." The graphical representation of means and standard errors for the use of the definite article "the" across different groups of newspaper texts is given in Fig. 5.20.

Table 5.46 Association of definite article "the" with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
THE | 53776.063 | 21120.480 | 2 | 149 | 47.537 | < 0.0001*

Note: * indicates p-values significant at 0.05 alpha-level

Table 5.47 Pairwise comparisons of definite article "the" (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
THE | NT-HT | -22.19580* | 2.98092 | < 0.0001
THE | NT-MT | -27.34980* | 2.98092 | < 0.0001
THE | HT-MT | -5.15400 | 2.98092 | 0.1979

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.48 Descriptive statistics for the use of the definite article "the" in newspaper texts (N = 50)

Method | Mean | Std. Deviation | Median
NT | 61.56 | 10.99 | 61.62
HT | 83.76 | 18.54 | 83.06
MT | 88.91 | 14.21 | 92.13


Fig. 5.20 Means and standard error (± 1 SE) for definite article "the" in newspaper texts
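A p-value indicates whether a difference between groups is unlikely to be due to chance, not how large that difference is. One common effect-size measure for one-way ANOVA, eta squared (η² = SS_between / SS_total), can be read directly off the ANOVA tables in this section. It is not reported in the tables of this section, so the Python sketch below is an added illustration using the sums of squares from Tables 5.41 and 5.46; the helper name is hypothetical.

```python
def eta_squared(between_ss, total_ss):
    """Proportion of total variance attributable to the method of text production."""
    return between_ss / total_ss


# (between-groups SS, total SS) from Tables 5.41 and 5.46
ANOVA_SUMS = {
    "APPGE (possessive pronouns)": (281.070, 5940.635),
    "THE (definite article)": (21120.480, 53776.063),
}

for variable, (ss_between, ss_total) in ANOVA_SUMS.items():
    print(f"{variable}: eta squared = {eta_squared(ss_between, ss_total):.3f}")
```

On this measure, the method of text production accounts for about 39% of the variance in the use of "the" but under 5% of the variance in possessive pronouns, even though both ANOVAs are significant at the .05 level.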

As discussed in the section on the literary corpus above, the significantly higher number of definite articles in machine-translated texts compared to non-translated texts may be related to the lower number of possessive pronouns in these texts. MT tools might use definite articles as noun modifiers more often than possessive pronouns, since the latter require a more accurate interpretation of textual cohesion in order to be used correctly. Editors of MT output might keep this in mind and reduce the number of definite articles in MT output by replacing some of them with possessive pronouns, demonstratives, nouns in the possessive case, or the zero article. This might improve the cohesiveness of machine-translated texts.

In the example below, the MT tool introduced a definite article while misinterpreting the source text sentence. The human translator interpreted the source text more appropriately and used the noun in the possessive case ("Obama's family" for "семья Обамы").

Russian source text: Все знают, что в семье Обамы именно Мишель играет первую скрипку.

Human translation: Everyone knows that in Obama’s family, it is Michelle who plays first fiddle.

Machine translation: Everyone knows that the family is Michelle Obama plays first violin.

(From Obama Is Not Only the President, He Is Also Michelle’s Hubby by Nikolai Zlobin, Rossiiskaia Gazeta)

Interestingly, for the newspaper corpus, human-translated texts also used a significantly higher number of definite articles than non-translated texts. This finding differs from the results for the literary corpus, where no statistically significant difference was found between non-translated and human-translated texts. It may be related to the possible use of MT tools by newspaper translators, or to newspaper texts undergoing less editing than literary texts: newspaper translations have to be produced quickly, which may result in less checking and editing.

In addition, the finding that non-translated newspaper texts differed significantly in their use of the definite article, using fewer definite articles (an average of 61.56 for non-translated texts vs. 83.76 and 88.91 for human and machine translations, respectively), might point to the universal tendency of translations toward explicitation.

5.3.1.1.5 Comparative cohesive devices

Comparative cohesive devices are represented by the following groups of comparative adjectives and adverbs:

− JJR—general comparative adjective (e.g., "older," "better," "stronger")

− JJT—general superlative adjective (e.g., "oldest," "best," "strongest")

− RGR—comparative degree adverb ("more," "less")

− RGT—superlative degree adverb ("most," "least")

− Total of all comparatives [JJR+JJT+RGR+RGT]

For the corpus of newspaper articles, one-way ANOVA showed statistically significant differences between non-translated, human-translated, and machine-translated texts for general comparative adjectives and superlative degree adverbs (see Table 5.49).

Table 5.49 Association of comparative cohesive devices with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
JJR | 422.769 | 28.439 | 2 | 149 | 5.301 | 0.0060*
JJT | 278.170 | 1.780 | 2 | 149 | 0.473 | 0.6238
RGR | 308.225 | 5.523 | 2 | 149 | 1.341 | 0.2648
RGT | 374.840 | 16.869 | 2 | 149 | 3.464 | 0.0339*
Total Comparatives | 1545.394 | 23.970 | 2 | 149 | 1.158 | 0.3170

Note: * indicates p-values significant at 0.05 alpha-level


For general comparative adjectives, Tukey's HSD post hoc test revealed that non-translated newspaper texts contained a significantly higher number of comparative adjectives (2.09 ± 1.95) than human-translated newspaper texts (1.12 ± 1.61, p = .0097) and machine-translated newspaper texts (1.23 ± 1.28, p = .0246) (see Tables 5.50 and 5.51). These results are graphically presented in Fig. 5.21.

Table 5.50 Pairwise comparisons of general comparative adjectives and superlative degree adverbs (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
JJR | NT-HT | 0.97260* | 0.32757 | 0.0097
JJR | NT-MT | 0.86540* | 0.32757 | 0.0246
JJR | HT-MT | -0.10720 | 0.32757 | 0.9427
RGT | NT-HT | -0.75860* | 0.31210 | 0.0428
RGT | NT-MT | -0.65220 | 0.31210 | 0.0955
RGT | HT-MT | 0.10640 | 0.31210 | 0.9380

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.51 Descriptive statistics for comparative devices in newspaper texts (N = 50)

Method | Statistic | JJR | JJT | RGR | RGT | Total Comparative Devices
NT | Mean | 2.09 | 1.21 | 1.51 | 0.42 | 5.23
NT | Std. Deviation | 1.95 | 1.37 | 1.37 | 0.74 | 2.73
NT | Median | 1.40 | 1.12 | 1.31 | 0.00 | 4.93
HT | Mean | 1.12 | 0.97 | 1.05 | 1.18 | 4.32
HT | Std. Deviation | 1.61 | 1.29 | 1.56 | 1.81 | 3.39
HT | Median | 0.71 | 0.38 | 0.00 | 0.38 | 3.29
MT | Mean | 1.23 | 0.98 | 1.17 | 1.07 | 4.45
MT | Std. Deviation | 1.28 | 1.45 | 1.36 | 1.86 | 3.48
MT | Median | 1.14 | 0.00 | 0.88 | 0.00 | 3.59
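Because each group contains 50 texts, the ANOVA in Table 5.49 can be approximately reconstructed from the descriptive statistics in Table 5.51: the between-groups sum of squares is n × Σ(group mean - grand mean)², and the within-groups sum of squares is (n - 1) × Σ(group SD)². The Python sketch below, using a hypothetical helper and assuming equal group sizes, applies this to JJR and RGT. Since the published means and standard deviations are rounded to two decimals, the results match Table 5.49 only approximately.

```python
def anova_from_summary(means, sds, n):
    """One-way ANOVA from group means and SDs, assuming every group has size n."""
    k = len(means)
    grand_mean = sum(means) / k                       # valid because group sizes are equal
    ss_between = n * sum((m - grand_mean) ** 2 for m in means)
    ss_within = (n - 1) * sum(s ** 2 for s in sds)    # pooled within-group variation
    df_between, df_within = k - 1, k * n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, f


# NT, HT, MT means and standard deviations from Table 5.51
jjr_ssb, jjr_f = anova_from_summary([2.09, 1.12, 1.23], [1.95, 1.61, 1.28], n=50)
rgt_ssb, rgt_f = anova_from_summary([0.42, 1.18, 1.07], [0.74, 1.81, 1.86], n=50)
print(f"JJR: SS_between = {jjr_ssb:.2f}, F = {jjr_f:.2f}")  # Table 5.49: 28.439, 5.301
print(f"RGT: SS_between = {rgt_ssb:.2f}, F = {rgt_f:.2f}")  # Table 5.49: 16.869, 3.464
```

A reconstruction of this kind is a quick consistency check between a descriptive table and its ANOVA table; it cannot replace the analysis of the per-text values.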

Fig. 5.21 Means and standard error (± 1 SE) for general comparative adjectives in newspaper texts

This result is interesting since it might suggest that non-translated English newspaper writing tends to use comparative adjectives more frequently than Russian newspaper writing (if we assume that the significantly lower averages for both groups of translated texts reflect the situation in the Russian language, in agreement with Toury's law of interference). This could be interpreted as a sign that non-translated English writing for editorials and commentary favors comparisons. However, it should be noted that the averages for all three groups were rather low (rounded, the average is 2 for non-translated texts and 1 for human-translated and machine-translated texts).

For superlative degree adverbs, Tukey's HSD post hoc test revealed that non-translated newspaper texts contained a significantly lower number of superlative adverbs (0.42 ± 0.74) than human-translated newspaper texts (1.18 ± 1.81, p = .0428) (see Tables 5.50 and 5.51). These results are graphically presented in Fig. 5.22.

Fig. 5.22 Means and standard error (± 1 SE) for superlative degree adverbs in newspaper texts


No statistically significant difference at the .05 level was found between non-translated and machine-translated newspaper texts (p = .0955), even though the mean for machine-translated texts was close to that for human-translated texts (1.18 ± 1.81 for human-translated and 1.07 ± 1.86 for machine-translated).

Taken together, the two statistically significant results for comparatives might suggest that, compared with newspaper articles translated from Russian, non-translated English texts tend to avoid superlative adverbs, relying instead on comparative adjectives. Still, it is worth noting that the usage of comparative adjectives and adverbs in all newspaper text types (non-translated, human-translated, and machine-translated) was low. In fact, the low averages might suggest that comparatives are not frequently used in newspaper writing, as newspaper articles normally concentrate on reporting facts rather than comparing them.

5.3.1.1.6 Total number of reference cohesive devices

For the total number of reference cohesive devices in newspaper texts, there was a statistically significant difference between groups as determined by one-way ANOVA (F(2,147) = 23.043, p < .0001) (see Table 5.52).


Table 5.52 Association of the total number of reference cohesive devices with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
Total Reference Cohesive Devices | 34077.170 | 8133.662 | 2 | 149 | 23.043 | < 0.0001*

Note: * indicates p-values significant at 0.05 alpha-level

A Tukey's HSD post hoc test revealed that non-translated newspaper texts contained a significantly lower number of reference cohesive devices (113.92 ± 13.03) than human-translated newspaper texts (129.37 ± 14.99, p < .0001) and machine-translated newspaper texts (129.70 ± 11.61, p < .0001) (see Table 5.53). The descriptive statistics for the total number of reference cohesive devices are presented in Table 5.54, and Fig. 5.23 graphically presents these findings.

Table 5.53 Pairwise comparisons of the total number of reference cohesive devices (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
Total Reference Cohesive Devices | NT-HT | -15.45320* | 2.65696 | < 0.0001
Total Reference Cohesive Devices | NT-MT | -15.78320* | 2.65696 | < 0.0001
Total Reference Cohesive Devices | HT-MT | -0.33000 | 2.65696 | 0.9915

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.54 Descriptive statistics for the total number of reference cohesive devices in newspaper texts (N = 50)

Method | Mean | Std. Deviation | Median
NT | 113.92 | 13.03 | 113.04
HT | 129.37 | 14.99 | 128.53
MT | 129.70 | 11.61 | 129.52

Fig. 5.23 Means and standard error (± 1 SE) for the total number of reference cohesive devices in newspaper texts

This finding differs from the results for the literary corpus, in which the machine-translated texts displayed a significantly lower number of reference cohesive devices than the other two groups. This may be related to differences in genre conventions for non-translated texts: non-translated literary texts were found to use a significantly higher number of reference devices than non-translated newspaper and scientific writing. However, this difference was not found in machine-translated texts, which may suggest that MT tools are less sensitive to genre conventions concerning the use of reference devices. Since this variable represents a sum of all reference cohesive devices, the individual categories of reference cohesive devices examined above give a clearer picture.

5.3.1.2 Conjunction cohesive devices

This section presents data and analysis for the four types of conjunction cohesive devices, and then looks at the total number of such devices across the groups of texts in the newspaper corpus. For convenience, the four types of conjunction cohesive devices are listed below:

− Additive devices (e.g., "and," "or"), tagged as CC

− Adversative devices (e.g., "but"), tagged as CCB

− Causal and continuative devices represented by subordinating conjunctions (e.g., "if," "because," "unless," "so," "for"), tagged as CS

− Temporal devices (e.g., "now," "tomorrow"), tagged as RT

− Total of all conjunction cohesive devices [CC+CCB+CS+RT]
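The conjunction variables are counts of CLAWS tags, and the total is simply the sum of the four tag counts. The Python sketch below shows one way such a tally might be computed; it is not the study's own procedure. It assumes, for illustration, CLAWS output in a horizontal word_TAG format, frequencies normalized per 1,000 words, and that tokens whose tag does not begin with a letter (punctuation) are not counted as words. None of these details is specified in this section, the function name is hypothetical, and multi-word conjunctions (which CLAWS marks with ditto tags) would need extra handling.

```python
from collections import Counter

CONJUNCTION_TAGS = ("CC", "CCB", "CS", "RT")


def conjunction_rates(tagged_text, per=1000):
    """Per-1,000-word rates of conjunction tags in word_TAG text (hypothetical helper)."""
    pairs = [tok.rsplit("_", 1) for tok in tagged_text.split() if "_" in tok]
    words = [(w, tag) for w, tag in pairs if tag[:1].isalpha()]   # drop punctuation tokens
    if not words:
        raise ValueError("no word tokens found")
    counts = Counter(tag for _, tag in words if tag in CONJUNCTION_TAGS)
    rates = {tag: counts[tag] * per / len(words) for tag in CONJUNCTION_TAGS}
    rates["Total"] = sum(rates[tag] for tag in CONJUNCTION_TAGS)  # CC + CCB + CS + RT
    return rates


sample = "He_PPHS1 came_VVD but_CCB she_PPHS1 left_VVD and_CC now_RT waits_VVZ ._."
print(conjunction_rates(sample))
```

Because the total is defined as a sum, the group means for the total also equal the sums of the component means; for the non-translated texts in Table 5.57, for example, 28.67 + 6.69 + 8.29 + 4.38 = 48.03.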

For newspaper texts, one-way ANOVA revealed statistically significant differences in additive devices (CC) (F(2,147) = 5.868, p = .0035), adversative devices (CCB) (F(2,147) = 8.606, p = .0003), subordinating conjunctions (CS) (F(2,147) = 3.845, p = .0236), and the total number of conjunction cohesive devices (F(2,147) = 13.880, p < .0001) (see Table 5.55). No statistically significant differences were found for temporal devices (p = .4776).

Table 5.55 Association of conjunction cohesive devices with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
CC | 13287.747 | 982.432 | 2 | 149 | 5.868 | 0.0035*
CCB | 1662.896 | 174.300 | 2 | 149 | 8.606 | 0.0003*
CS | 2678.760 | 133.151 | 2 | 149 | 3.845 | 0.0236*
RT | 1306.570 | 13.071 | 2 | 149 | 0.743 | 0.4776
Total Conj. | 21110.036 | 3353.352 | 2 | 149 | 13.880 | < 0.0001*

Note: * indicates p-values significant at 0.05 alpha-level

For additive conjunction cohesive devices (CC) in newspaper texts, Tukey's HSD post hoc test revealed that non-translated texts contained a significantly higher number of additive cohesive devices (28.67 ± 8.75) than human-translated texts (22.43 ± 8.65, p = .0024) (see Tables 5.56 and 5.57). There were no statistically significant differences in additive conjunction devices for the other pairs. These results are presented graphically in Fig. 5.24.

Table 5.56 Pairwise comparisons of conjunction cohesive devices (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
CC | NT-HT | 6.23840* | 1.82986 | 0.0024
CC | NT-MT | 2.58560 | 1.82986 | 0.3368
CC | HT-MT | -3.65280 | 1.82986 | 0.1167
CCB | NT-HT | 2.61300* | 0.63644 | 0.0002
CCB | NT-MT | 1.63540* | 0.63644 | 0.0299
CCB | HT-MT | -0.97760 | 0.63644 | 0.2771
CS | NT-HT | 1.97140* | 0.83228 | 0.0499
CS | NT-MT | 2.02480* | 0.83228 | 0.0425
CS | HT-MT | 0.05340 | 0.83228 | 0.9977
Total Conj. | NT-HT | 11.54420* | 2.19812 | < 0.0001
Total Conj. | NT-MT | 6.57780* | 2.19812 | 0.0091
Total Conj. | HT-MT | -4.96640 | 2.19812 | 0.0649

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.57 Descriptive statistics for conjunction cohesive devices in newspaper texts (N = 50)

Method | Statistic | CC | CCB | CS | RT | Total Conjunctive Devices
NT | Mean | 28.67 | 6.69 | 8.29 | 4.38 | 48.03
NT | Std. Deviation | 8.75 | 3.82 | 3.62 | 2.66 | 8.43
NT | Median | 27.22 | 5.90 | 7.88 | 4.14 | 48.28
HT | Mean | 22.43 | 4.08 | 6.32 | 3.66 | 36.49
HT | Std. Deviation | 8.65 | 2.76 | 4.39 | 3.13 | 11.29
HT | Median | 21.26 | 3.66 | 5.58 | 3.20 | 34.76
MT | Mean | 26.08 | 5.05 | 6.27 | 4.05 | 41.45
MT | Std. Deviation | 9.99 | 2.86 | 4.43 | 3.09 | 12.81
MT | Median | 25.01 | 4.82 | 5.68 | 3.70 | 39.45


Fig. 5.24 Means and standard error (± 1 SE) for additive conjunction devices in newspaper texts

This result differs from the data for literary texts, where machine-translated texts contained a significantly higher number of additive cohesive devices. This, again, might point to genre differences between literary and newspaper texts in English, as well as to differences between Russian and English. A lower number of additive devices may also point to the universal of simplification in translation.

Another possibility is that human translators avoid using additive cohesive devices and instead further interpret the meaning of the source text, reflecting their interpretations in their translations. This might be viewed as evidence in support of explicitation in translation. The following example, in which the Russian additive conjunction и ('and') is replaced in the human-translated text by "yet," illustrates this possibility:

Russian source text: …этот договор уникален в ряду разоруженческих договоров тем, что совершенно невыгоден США и очень выгоден России… .

Human translation: This treaty is unique for disarmament treaties in that it is entirely unbeneficial for the United States yet very beneficial for Russia… .

Machine translation: …this agreement is unique in a number of disarmament treaties that the U.S. completely unprofitable and very profitable Russia… .

(From The Chinese Secret of American Civility by Alexsandr Khramchikhin, Izvestiia)

For adversative cohesive devices in newspaper texts, Tukey's HSD post hoc test revealed that non-translated texts contained a significantly higher number of these devices (6.69 ± 3.82) than human-translated (4.08 ± 2.76, p = .0002) and machine-translated texts (5.05 ± 2.86, p = .0299). No statistically significant difference was found for human-translated and machine-translated texts (p = .2771) (see Tables 5.56 and 5.57). These results are visually presented in Fig. 5.25.


Fig. 5.25 Means and standard error (± 1 SE) for adversative conjunction devices in newspaper texts

These results may suggest differences in the use of adversative conjunction devices in English and Russian newspaper texts, if we assume that both human and machine translations are influenced by what is found in Russian source texts. It is interesting to note that for literary texts the situation was the opposite: non-translated English literary texts contained a significantly lower number of adversative devices than machine-translated literary texts. This may point to genre differences between literary and newspaper texts in both English and Russian.

For causal and continuative cohesive devices realized through subordinating conjunctions, Tukey's HSD post hoc test revealed that non-translated newspaper texts contained a significantly higher number of these devices (8.29 ± 3.62) than human-translated (6.32 ± 4.39, p = .0499) and machine-translated newspaper texts (6.27 ± 4.43, p = .0425). No statistically significant difference was found for human-translated vs. machine-translated newspaper texts (p = .9977) (see Tables 5.56 and 5.57). These results are visually presented in Fig. 5.26.

Fig. 5.26 Means and standard error (± 1 SE) for causal and continuative conjunction devices in newspaper texts

If Toury's law of interference is assumed to be at work, these differences may be viewed as reflecting differences between English and Russian in how cohesion is expressed with causal and continuative conjunction devices. Russian newspaper writers might rely more on other types of conjunctions or on different means of cohesion, which might be reflected in the English translations of newspaper articles. Further research is needed to pinpoint the causes.


For the total number of conjunctive cohesive devices in newspaper texts, Tukey's HSD post hoc test revealed that non-translated newspaper texts contained a significantly higher number of these devices (48.03 ± 8.43) than human-translated (36.49 ± 11.29, p < .0001) and machine-translated newspaper texts (41.45 ± 12.81, p = .0091). The difference between human and machine translations narrowly missed significance at the .05 level (p = .0649) (see Tables 5.56 and 5.57). These results are visually presented in Fig. 5.27.

Fig. 5.27 Means and standard error (± 1 SE) for the total number of conjunction devices in newspaper texts

This result is not surprising, since the means for additive, adversative, and subordinating conjunctive devices were all found to be significantly higher in non-translated texts than in human-translated texts, and the means for adversative and subordinating devices were also significantly higher than in machine-translated texts. As mentioned above, this might suggest differences in the use of conjunction devices in English and Russian, with Russian relying more on other means of cohesion.

This finding might also support the translation universal of simplification, if further research finds that constructions with conjunctions are simplified in translation. This, however, may not be the case for machine translation, which usually sticks closer to the source text in terms of such textual elements as conjunctions. In the following example, the conjunction но ('but') is omitted in the human-translated text:

Russian source text: Но пока треть респондентов уверены, что политика Обамы только ухудшает состояние дел в экономике.

Human translation: ___ For the time being, one third of respondents believe that Obama's policy worsens the state of affairs in the economy.

Machine translation: But while one-third of respondents believe that Obama's policy only affects the state of the economy.

(From Obama Is Losing Voters by Andrey Terekhov, Nezavisimaia Gazeta)

In the following example, both human and machine translation use the construction "which is" in place of the Russian conjunction а ('but'):

Russian source text: Растет разочарование политикой демократов в сфере экономики, а это главный вопрос кампании.

Human translation: There is growing frustration with Democratic policies in the economic sphere, which is at the heart of the Democratic campaign and the national elections.

Machine translation: There is growing frustration Democratic policies in the economic sphere, which is the main issue campaign.

(From Obama Is Losing Voters by Andrey Terekhov, Nezavisimaia Gazeta)

5.3.1.3 Reference and conjunction cohesive devices in the newspaper corpus

For the total number of reference and conjunction cohesive devices included in this study, one-way ANOVA indicated a statistically significant difference (F(2,147) = 10.286, p < .0001) for newspaper texts (see Table 5.58). This result is similar to the findings for the literary corpus. As noted earlier, this total does not represent all cohesive devices in the texts, since the study does not include cohesive devices that are best studied by manual analysis (e.g., lexical cohesion). While the finding that this parameter displays statistically significant differences is interesting, the individual statistical tests for specific types of cohesive devices are more informative.

Table 5.58 Association of the total reference and conjunction cohesive devices with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
Total Reference & Conj. Devices | 16841.392 | 2067.570 | 2 | 149 | 10.286 | 0.0001*

Note: * indicates p-values significant at 0.05 alpha-level
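Table 5.58 lists the significance as 0.0001, while the text reports p < .0001. The two are consistent if the table value is rounded to four decimals. With three groups, the F statistic has 2 numerator degrees of freedom, and in that case the upper-tail probability of the F distribution is exactly (1 + 2F/df_within)^(-df_within/2). The short Python check below is an added illustration, not part of the original analysis, and the function name is hypothetical; it evaluates this expression for F = 10.286 and 147 within-groups degrees of freedom.

```python
def f_upper_tail_df2(f, df_within):
    """P(F > f) for an F(2, df_within) distribution; exact only when the numerator df is 2."""
    return (1.0 + 2.0 * f / df_within) ** (-df_within / 2.0)


p_total = f_upper_tail_df2(10.286, 147)   # F and within-groups df from Table 5.58
print(f"p = {p_total:.6f}")
print("below .0001" if p_total < 0.0001 else "not below .0001")
```

The unrounded p-value is roughly .00007, which rounds to 0.0001 at four decimal places (the table value) while still being below .0001 (the value reported in the text).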


Tukey's HSD post hoc test revealed that non-translated newspaper texts contained a significantly higher number of total reference and conjunction cohesive devices (37.09 ± 9.07) than human-translated (28.02 ± 9.89, p < .0001) and machine-translated newspaper texts (31.98 ± 11.02, p = .0315) (see Tables 5.59 and 5.60). There was no statistically significant difference in the total for reference and conjunction devices between human- and machine-translated newspaper texts. These results are presented in a bar graph in Fig. 5.28.

Table 5.59 Pairwise comparisons of the total reference and conjunction cohesive devices (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
Total Reference & Conjunction Devices | NT-HT | 9.06960* | 2.00502 | < 0.0001
Total Reference & Conjunction Devices | NT-MT | 5.11260* | 2.00502 | 0.0315
Total Reference & Conjunction Devices | HT-MT | -3.95700 | 2.00502 | 0.1224

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.60 Descriptive statistics for total reference and conjunction cohesive devices in newspaper texts (N = 50)

Method | Mean | Std. Deviation | Median
NT | 37.09 | 9.07 | 35.85
HT | 28.02 | 9.89 | 26.18
MT | 31.98 | 11.02 | 29.90


Fig. 5.28 Means and standard error (± 1 SE) for total reference and conjunction cohesive devices in newspaper texts

In agreement with Toury's law of interference, these findings may point to differences in the use of cohesive devices in English and Russian. To the extent that translations are influenced by their source texts, the findings may indicate that Russian newspaper texts use fewer reference and conjunction devices than English newspaper texts. One possible contributing factor is that, instead of verbally expressed cohesive devices, Russian newspaper writers make use of punctuation marks.

For instance, Bystrova-McIntyre (2007: 146) finds that Russian editorials use significantly more colons and dashes than English editorials. Indeed, in Russian, colons and dashes often imply causal relationships, as in the example below:


Russian source text: Да и само слово Палестина вызывает в Израиле ужас и отвращение – там, отказывая палестинцам в праве на существование, хотят называть их просто «арабами».

Close English translation: And the very word "Palestine" causes in Israel horror and disgust, because there (in Israel), refusing the Palestinians the right to exist, they just want to call them (the Palestinians) "Arabs."

(From Why Do They Want Blood? by Maksim Shevchenko, Vzgliad)

It should be noted that the findings for the newspaper corpus differ from the previously described findings for the literary corpus. Interestingly, non-translated literary and newspaper texts contained a similar number of total reference and conjunction devices (37.59 and 37.09, respectively). However, for the literary corpus, the totals for reference and conjunction devices were higher in human-translated and machine-translated texts than in non-translated texts, while for the newspaper corpus the pattern was reversed, with human- and machine-translated newspaper texts having lower totals than non-translated newspaper texts. This might point to greater differences between literary and newspaper genres in Russian, which may be an interesting research topic for scholars working with Russian writing genres. It also has implications for translators working from Russian into English, who might want to use more reference and conjunction devices in their translations of newspaper texts.


The findings presented in this section may be used to design various research projects related to translation and genre studies.

5.3.2 Global textual features in non-translated, human-translated, and machine-translated texts in the newspaper corpus

The study covers six features that characterize texts globally—nominalization, lexical density (as standardized type-token ratio), average word length, average sentence length, passives, and prepositional phrases. This section reviews their usage in the corpus of newspaper texts.

5.3.2.1 Nominalization

As determined by one-way ANOVA, there was a statistically significant difference in nominalization between groups for the newspaper corpus (F(2,147) = 4.916, p = .0086) (see Table 5.61).
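The F statistics reported in this section can be recovered from the sums of squares in the ANOVA tables. The following is a minimal Python sketch, assuming 50 texts per group (150 in total, consistent with the total df of 149) and using the values from Table 5.61. The function name anova_f is illustrative and not part of any tool used in this study.

```python
def anova_f(ss_total: float, ss_between: float, k: int, n_total: int) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA from its sums of squares."""
    df_between = k - 1                  # 3 groups -> 2
    df_within = n_total - k             # 150 texts - 3 groups -> 147
    ss_within = ss_total - ss_between   # error (within-groups) sum of squares
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Values copied from Table 5.61 (nominalization, newspaper corpus).
f, df_b, df_w = anova_f(ss_total=17559.722, ss_between=1100.886, k=3, n_total=150)
print(f"F({df_b},{df_w}) = {f:.3f}")
```

The same call applied to the other ANOVA tables in this section (for example, STTR in Table 5.64) should reproduce their F values to rounding precision.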

Table 5.61 Association of nominalization with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
Nominalization | 17559.722 | 1100.886 | 2 | 149 | 4.916 | 0.0086*

Note: * indicates p-values significant at the 0.05 alpha level

Tukey's HSD post hoc test revealed that non-translated newspaper texts contain significantly fewer nominalizations (20.77 ± 8.28) than human-translated (26.54 ± 12.03, p = .0197) and machine-translated newspaper texts (26.50 ± 11.07, p = .0206) (see Tables 5.62 and 5.63). There was no statistically significant difference in nominalization between human-translated and machine-translated newspaper texts; in fact, the means were very close (26.54 and 26.50, respectively). These results are presented in a bar graph in Fig. 5.29.

Table 5.62 Pairwise comparisons of nominalization (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
Nominalization | NT-HT | -5.76380* | 2.11627 | 0.0197
Nominalization | NT-MT | -5.72980* | 2.11627 | 0.0206
Nominalization | HT-MT | .03400 | 2.11627 | 0.9999

Note: * indicates that the mean difference is significant at the 0.05 level
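The Std. Error column in the Tukey HSD tables appears to correspond to the standard error of a difference between two group means with equal group sizes, sqrt(2 · MS_within / n). The sketch below assumes n = 50 per group and derives MS_within from Table 5.61. It reproduces the value 2.11627 in Table 5.62 and computes the studentized range statistic q for the NT-HT comparison. Turning q into a p-value requires the studentized range distribution (k = 3 groups, 147 error df), which the sketch does not implement. SciPy, for example, provides this distribution as scipy.stats.studentized_range in version 1.7 and later. The helper names are hypothetical.

```python
import math

# Derived from Table 5.61 (nominalization, newspaper corpus); n = 50 texts per group assumed.
SS_TOTAL, SS_BETWEEN = 17559.722, 1100.886
K, N_PER_GROUP = 3, 50
DF_WITHIN = K * N_PER_GROUP - K                    # 147 error degrees of freedom
MS_WITHIN = (SS_TOTAL - SS_BETWEEN) / DF_WITHIN    # pooled within-group variance


def tukey_se(ms_within: float, n: int) -> float:
    """Standard error of a difference between two group means (equal group sizes)."""
    return math.sqrt(2 * ms_within / n)


def tukey_q(mean_diff: float, ms_within: float, n: int) -> float:
    """Studentized range statistic for one pairwise comparison (equal group sizes)."""
    return abs(mean_diff) / math.sqrt(ms_within / n)


se = tukey_se(MS_WITHIN, N_PER_GROUP)
q_nt_ht = tukey_q(-5.76380, MS_WITHIN, N_PER_GROUP)   # NT-HT difference from Table 5.62
print(f"SE = {se:.5f}, q(NT-HT) = {q_nt_ht:.3f}")
```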

Table 5.63 Descriptive statistics for nominalization in newspaper texts (N = 50)

Method Mean Std. Deviation Median

NT 20.77 8.28 19.42

HT 26.54 12.03 23.90

MT 26.50 11.07 24.26


Fig. 5.29 Means and standard error (± 1 SE) for nominalization in newspaper texts
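The figures in this section plot group means with error bars of ±1 standard error. The sketch below assumes the conventional definition SE = SD / √n with n = 50 texts per group and shows how the bar heights and error bars for Fig. 5.29 could be derived from Table 5.63. The plotting step is omitted, and the function name is illustrative.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


# Means and standard deviations from Table 5.63 (nominalization, newspaper texts).
for method, mean, sd in [("NT", 20.77, 8.28), ("HT", 26.54, 12.03), ("MT", 26.50, 11.07)]:
    se = standard_error(sd, 50)
    print(f"{method}: {mean:.2f} ± {se:.2f} (error bar {mean - se:.2f} to {mean + se:.2f})")
```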

Nominalization is among the issues commonly mentioned by Russian-into-English translators, and these findings confirm its relevance. Russian favors nominalization, and translators often have to find ways to recast Russian sentences to make them more suitable for English readers. The finding that both human and machine translations of Russian newspaper texts contain significantly more nominalizations than non-translated English texts may be explained by Toury's law of interference: in the translated newspaper corpus, nominalizations may have "infiltrated" the translations from the Russian source texts.

Based on these findings, it might be suggested that translators and post-editors working from Russian into English in the genre of newspaper editorials reduce the occurrence of nominalization in target texts. One strategy is to use verbal forms, which are more typical of English, as the human translator does in the example below:

Russian source text: Опросы же показывают, что в случае проведения выборов завтра Обама проиграл бы Кейну 41 против 43 процентов.

Human translation: Polls also show that if the election were held tomorrow, Obama would lose to Cain 41 to 43 percent.

Machine translation: Polls also show that in the case of an election tomorrow, Obama would have lost Kane 41 to 43 percent.

(From The President’s Image Was Stolen by Vladislav Vorobiev, Rossiiskaia Gazeta)

In terms of genre, Russian literary texts may use nominalization less than Russian newspaper texts: human and machine translations of literary texts contained far fewer nominalizations (6.88 and 7.40, respectively) than human and machine translations of newspaper texts (26.54 and 26.50, respectively).

5.3.2.2 Lexical density

To measure lexical density, this study used the standardized type-token ratio (STTR) calculated by WordSmith Tools. As determined by one-way ANOVA, there was a statistically significant difference in STTR between groups (F(2,147) = 21.759, p < .0001) (see Table 5.64).
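STTR offsets the dependence of the plain type-token ratio on text length by computing the ratio over fixed-size chunks of running text and averaging the results. The following is a minimal sketch of the idea. It assumes non-overlapping chunks (to our understanding, WordSmith Tools' default chunk size is 1,000 words), discards a trailing partial chunk, and uses naive lowercase whitespace tokenization, so it will not reproduce WordSmith's exact figures. The function name is illustrative.

```python
def sttr(tokens: list[str], chunk_size: int = 1000) -> float:
    """Standardized type-token ratio, as a percentage.

    Computes the type-token ratio of each consecutive, non-overlapping chunk of
    `chunk_size` tokens and returns their mean. A trailing chunk shorter than
    `chunk_size` is ignored (an assumption of this sketch).
    """
    ratios = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        ratios.append(100 * len(set(chunk)) / chunk_size)
    if not ratios:
        raise ValueError("text is shorter than one chunk")
    return sum(ratios) / len(ratios)


# Tiny illustration with a 4-token chunk; real use would keep the 1,000-token default.
tokens = "the cat sat on the mat and the dog sat on the rug".lower().split()
print(sttr(tokens, chunk_size=4))
```

Here the three full chunks have type-token ratios of 100, 75, and 100, and the final token ("rug") is dropped.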


Table 5.64 Association of STTR with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
STTR | 1294.580 | 295.704 | 2 | 149 | 21.759 | < 0.0001*

Note: * indicates p-values significant at the 0.05 alpha level

Tukey's HSD post hoc test revealed that non-translated newspaper texts are characterized by a significantly higher STTR (75.61 ± 1.97) than human-translated (72.80 ± 2.91, p < .0001) and machine-translated newspaper texts (72.49 ± 2.84, p < .0001) (see Tables 5.65 and 5.66). There was no statistically significant difference in STTR between human-translated and machine-translated newspaper texts (p = .8322). These results are presented in a bar graph in Fig. 5.30.

Table 5.65 Pairwise comparisons of STTR (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
STTR | NT-HT | 2.81640* | .52135 | < 0.0001
STTR | NT-MT | 3.11760* | .52135 | < 0.0001
STTR | HT-MT | .30120 | .52135 | 0.8322

Note: * indicates that the mean difference is significant at the 0.05 level

Table 5.66 Descriptive statistics for STTR in newspaper texts (N = 50)

Method Mean Std. Deviation Median
NT 75.61 1.97 75.60
HT 72.80 2.91 73.00
MT 72.49 2.84 72.92


Fig. 5.30 Means and standard error (± 1 SE) for STTR in newspaper texts

These findings mirror those for the literary corpus, where non-translated texts also displayed a significantly higher STTR. As described above, this is consistent with the results of Laviosa (1998a, 1998b), who reports that English translated texts are characterized by a narrower range of vocabulary than non-translated texts. Laviosa views this as evidence of simplification as a translation universal. Unlike Laviosa's studies, this study also includes machine-translated texts and thus suggests the possibility of extending the concept of simplification to machine-translated texts. Because this study focuses on the Russian-English language pair only, it would be interesting to include MT in studies of other language pairs.

Another possible explanation for the lower STTR in translations from Russian into English is a likely narrower range of vocabulary in Russian, as discussed in 4.3.2.2.


5.3.2.3 Average word length

For average word length, there was no statistically significant difference between groups for newspaper texts as determined by one-way ANOVA (F(2,147) = 1.007, p = .3678) (see Table 5.67). The means, standard deviations, and medians for the three groups are presented in Table 5.68.

Table 5.67 Association of average word length with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
Word Length | 8.346 | .113 | 2 | 149 | 1.007 | 0.3678

Table 5.68 Descriptive statistics for average word length in newspaper texts (N = 50)

Method Mean Std. Deviation Median

NT 4.90 0.25 4.91

HT 4.95 0.25 4.98

MT 4.89 0.21 4.90

These results are similar to those for the corpus of literary texts, where no statistically significant differences between groups were found.

5.3.2.4 Average sentence length

For average sentence length, there was no statistically significant difference between groups for newspaper texts as determined by one-way ANOVA (F(2,147) = .386, p = .6808) (see Table 5.69). The means, standard deviations, and medians for the three groups are presented in Table 5.70.
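Average word length (characters per word) and average sentence length (words per sentence) are simple ratios. The rough sketch below uses naive regular-expression tokenization: sentences end at '.', '!', or '?', and words are runs of word characters, apostrophes, and hyphens. It is not the tokenization used by WordSmith Tools, and abbreviations or decimal numbers would be mis-segmented. The function name is illustrative.

```python
import re


def average_lengths(text: str) -> tuple[float, float]:
    """Return (average word length in characters, average sentence length in words)."""
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenization: word characters plus apostrophes and hyphens.
    words = re.findall(r"[\w'-]+", text)
    avg_word = sum(len(w) for w in words) / len(words)
    avg_sentence = len(words) / len(sentences)
    return avg_word, avg_sentence


word_len, sent_len = average_lengths("Polls also show that. Obama would lose.")
print(f"{word_len:.2f} characters per word, {sent_len:.2f} words per sentence")
```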

Table 5.69 Association of average sentence length with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
Sentence Length | 1786.205 | 9.320 | 2 | 149 | .386 | 0.6808

Table 5.70 Descriptive statistics for average sentence length in newspaper texts (N = 50)

Method Mean Std. Deviation Median

NT 20.71 3.45 20.22

HT 20.24 3.21 19.91

MT 20.13 3.75 19.50

These results are similar to those for the corpus of literary texts, where no statistically significant differences between groups were found. This finding differs from Laviosa's finding of a lower average sentence length in translated newspaper texts, although she did not confirm this hypothesis for literary texts (1998a, 1998b). Laviosa suggests that her findings for newspaper texts may support the translation universal of explicitation. Interestingly, average sentence length was higher for newspaper texts than for literary texts across all methods of text production (over 20 words per sentence for newspaper texts vs. 16-17 words per sentence for literary texts), and these differences were statistically significant.


5.3.2.5 Passives

For passives, there was no statistically significant difference between groups for newspaper texts as determined by one-way ANOVA (F(2,147) = 1.675, p = .1909) (see Table 5.71). The means, standard deviations, and medians for the three groups are presented in Table 5.72. These results are similar to those for the corpus of literary texts, where no statistically significant differences between groups were found.

Table 5.71 Association of passives with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
Passives | 2828.462 | 63.024 | 2 | 149 | 1.675 | 0.1909

Table 5.72 Descriptive statistics for passives in newspaper texts (N = 50)

Method Mean Std. Deviation Median

NT 8.52 4.40 8.48

HT 9.69 4.43 9.02

MT 8.17 4.18 8.00

This study showed no statistically significant difference in the frequency of passives in newspaper texts across the three methods of text production. Newspaper reporters are taught to avoid passive constructions, such as "the Red Sox were defeated by the Tigers," in favor of active ones, such as "the Tigers defeated the Red Sox," when reporting on events.


5.3.2.6 Prepositional phrases

This study includes four types of prepositions that lend themselves to automated analysis, as well as their total:

− IF—preposition "for"

− II—general preposition

− IO—preposition "of"

− IW—prepositions "with" and "without"

− Total of all prepositions [IF+II+IO+IW]
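The preposition counts rely on CLAWS part-of-speech tags. The following sketch counts the four preposition tags in CLAWS output written in the horizontal word_TAG format. Two assumptions apply. First, frequencies are normalized per 1,000 tagged tokens, since the normalization base is not restated in this section. Second, multiword prepositions carrying CLAWS ditto tags (e.g., II21 II22) are ignored. The function name and the sample sentence are hypothetical.

```python
from collections import Counter

PREPOSITION_TAGS = ("IF", "II", "IO", "IW")


def preposition_rates(tagged_text: str, per: int = 1000) -> dict[str, float]:
    """Count CLAWS preposition tags in word_TAG output, normalized per `per` tokens."""
    # Take the tag after the last underscore of each token, e.g. "of_IO" -> "IO".
    tags = [token.rsplit("_", 1)[1] for token in tagged_text.split() if "_" in token]
    counts = Counter(tags)
    rates = {tag: per * counts[tag] / len(tags) for tag in PREPOSITION_TAGS}
    rates["Total"] = sum(rates[tag] for tag in PREPOSITION_TAGS)   # IF+II+IO+IW
    return rates


sample = "cup_NN1 of_IO tea_NN1 for_IF you_PPY with_IW milk_NN1 in_II"
print(preposition_rates(sample))
```

Each tag occurs once among eight tokens in the sample, so each rate is 125 per 1,000 tokens and the total is 500.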

The results of one-way ANOVA for the four types of prepositions, as well as for the sum of these types, are presented in Table 5.73. For the preposition "of," there was a statistically significant difference between groups for newspaper texts as determined by one-way ANOVA (F(2,147) = 27.093, p < .0001). The one-way ANOVA also showed a statistically significant difference for the total number of prepositional phrases (F(2,147) = 14.107, p < .0001). No statistically significant differences were found for the other types of prepositions (p > .05). The means, standard deviations, and medians for the three groups are presented in Table 5.75.

Table 5.73 Association of prepositional phrases with the method of text production (NT, HT, and MT) in the corpus of newspaper texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
IF | 2805.866 | 26.025 | 2 | 149 | .688 | 0.5041
II | 19596.586 | 381.040 | 2 | 149 | 1.457 | 0.2362
IO | 20288.890 | 5464.453 | 2 | 149 | 27.093 | < 0.0001*
IW | 1499.258 | 36.001 | 2 | 149 | 1.808 | 0.1675
Total Prepositional Phrases | 53412.647 | 8600.914 | 2 | 149 | 14.107 | < 0.0001*

Note: * indicates p-values significant at the 0.05 alpha level

Tukey's HSD post hoc test revealed that non-translated newspaper texts contain significantly fewer prepositional phrases with "of" (29.14 ± 8.23) than human-translated (39.03 ± 11.53, p < .0001) and machine-translated newspaper texts (43.60 ± 10.09, p < .0001) (see Tables 5.74 and 5.75). The difference in prepositional phrases with "of" between human-translated and machine-translated newspaper texts (p = .0622) fell just outside the p < .05 criterion for significance. These results are presented in a bar graph in Fig. 5.31.

Table 5.74 Pairwise comparisons of prepositional phrases with "of" (NT, HT, and MT) for the corpus of newspaper texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
IO | NT-HT | -9.88880* | 2.00845 | < 0.0001
IO | NT-MT | -14.46240* | 2.00845 | < 0.0001
IO | HT-MT | -4.57360 | 2.00845 | 0.0622
Total Prepositional Phrases | NT-HT | -15.36520* | 3.49194 | 0.0001
Total Prepositional Phrases | NT-MT | -16.68040* | 3.49194 | < 0.0001
Total Prepositional Phrases | HT-MT | -1.31520 | 3.49194 | 0.9248

Note: * indicates that the mean difference is significant at the 0.05 level


Table 5.75 Descriptive statistics for prepositions in newspaper texts (N = 50)

Method | Statistic | IF | II | IO | IW | Total Prepositions
NT | Mean | 9.26 | 65.23 | 29.14 | 6.13 | 109.77
NT | Std. Deviation | 4.92 | 9.16 | 8.23 | 3.00 | 12.47
NT | Median | 8.66 | 65.32 | 28.94 | 5.50 | 112.26
HT | Mean | 9.67 | 69.11 | 39.03 | 7.33 | 125.14
HT | Std. Deviation | 3.90 | 13.62 | 11.53 | 3.23 | 22.08
HT | Median | 9.32 | 69.59 | 37.97 | 7.14 | 125.72
MT | Mean | 8.65 | 67.57 | 43.60 | 6.62 | 126.45
MT | Std. Deviation | 4.17 | 11.08 | 10.09 | 3.23 | 16.48
MT | Median | 8.16 | 67.88 | 44.13 | 6.32 | 127.30


Fig. 5.31 Means and standard error (± 1 SE) for prepositional phrases with "of" in newspaper texts

For the total number of prepositional phrases in newspaper texts, Tukey's HSD post hoc test revealed that non-translated newspaper texts contain significantly fewer prepositional phrases (109.77 ± 12.47) than human-translated (125.14 ± 22.08, p = .0001) and machine-translated newspaper texts (126.45 ± 16.48, p < .0001) (see Tables 5.74 and 5.75). There was no statistically significant difference in prepositional phrases between human and machine translations (p = .9248). These results are presented in a bar graph in Fig. 5.32.


Fig. 5.32 Means and standard error (± 1 SE) for the total number of prepositional phrases in newspaper texts

These findings are likely related to the differences found for the preposition "of" (possible explanations are discussed below). This is a good illustration of how a granular analysis typically yields more concrete, and thus more informative, results than a "lumped" analysis.

As discussed earlier, Russian allows for noun chains with genitive meaning. Because Russian is a synthetic language, such chains do not require prepositions. Since English is an analytic language, genitive noun chains may not work without a preposition, typically "of." Translators, especially novices, often struggle when rendering long Russian noun chains into English, because repeated use of the preposition "of" can produce clumsy constructions in English.


For this reason, it comes as little surprise that human and machine translations from Russian contained significantly more prepositional phrases with "of" than non-translated newspaper writing. For machine-translated newspaper texts, the number was slightly higher than for human-translated newspaper texts, although the difference was not statistically significant (p = .0622).

These findings suggest that linguists working with newspaper texts should try to rephrase long Russian noun phrases in a way that avoids repeating the preposition "of" frequently. According to the findings, this advice seems to be relevant for both human translators and editors of MT output. The following example illustrates how a machine translation repeated the preposition "of" but a human translator avoided that repetition:

Russian source text: — Если иранские власти решатся на эмбарго, то их страна сразу потеряет порядка $600 млн в день при цене $100 за баррель, ведь в ЕС ежесуточно идет 500–600 тыс. баррелей, — подсчитал иранист, эксперт Института востоковедения РАН Владимир Сажин.

Human translation: If the Iranian authorities decide to impose an embargo, it will immediately lose around $600 million a day at the price of $100 per barrel because the EU receives a daily shipment 500-600,000 barrels, according to Vladimir Sazhin, an expert from the Institute of Oriental Studies.

Machine translation: - If the Iranian authorities have solved the embargo, the country once they lose about $ 600 million a day at a cost of $ 100 per barrel, because the EU is a daily 500-600 thousand barrels - calculated Iranologist, expert of the Institute of Oriental Studies, Vladimir Sazhin.

(From Tehran Threatens to Deprive Europe of Oil for a Long Time by Konstantin Volkov, Izvestia)

In this example, MT uses the preposition "of" twice in "expert of the Institute of Oriental Studies," while the human translator avoids repeating "of": "an expert from the Institute of Oriental Studies." The latter seems to be a better rendering of the Russian noun chain "эксперт Института востоковедения."

In addition, editors of MT output should be on the lookout for potential difficulties MT tools might encounter with Russian noun chains. The following example from the newspaper corpus illustrates this point:

Russian source text: Япония, приобретающая сейчас в Иране 10% нефти от общего числа своих закупок, сокращает этот объем из-за давления со стороны США.

Human translation: Japan, whose total imports include ten percent of oil from Iran, has reduced this amount due to pressure from the United States.

Machine translation: Japan is now in Iran acquiring 10% of total oil of its purchases, reduce this amount because of pressure from the United States.

(From Tehran Threatens to Deprive Europe of Oil for a Long Time by Konstantin Volkov, Izvestia)


In this example, the MT tool had difficulty with the genitive noun chains in "10% нефти от общего числа своих закупок" (genitive constructions: "10% нефти" and "общего числа … закупок") and rendered the phrase as "10% of total oil of its purchases," which may confuse the target reader. The MT tool also repeated the preposition "of" three times in one relatively short sentence. The human translator reworked the sentence and used the preposition "of" only once (see the sentence above).

It would be interesting to see if any statistically significant differences in the use of the preposition "of" are found for the scientific texts, which are expected to favor long noun chains in Russian as well.

5.4 Association of cohesive devices and other global textual features with the method of text production: Scientific corpus

5.4.1 Cohesive devices in non-translated, human-translated, and machine-translated texts in the scientific corpus

5.4.1.1 Reference cohesive devices in the scientific corpus

5.4.1.1.1 Pronominal cohesive devices represented by 3rd person pronouns

This study examines pronominal cohesive devices expressed by 3rd person pronouns in the following categories:

− PPH1—3rd person singular neuter personal pronoun ("it")

− PPHO1—3rd person singular objective personal pronoun ("him," "her")


− PPHO2—3rd person plural objective personal pronoun ("them")

− PPHS1—3rd person singular subjective personal pronoun ("he," "she")

− PPHS2—3rd person plural subjective personal pronoun ("they")

− The total of all pronominal cohesive devices expressed by 3rd person pronouns [PPH1+PPHO1+PPHO2+PPHS1+PPHS2]

In the scientific corpus, one-way ANOVA indicated that, at a significance level of α = 0.05 (95% confidence level), there are statistically significant differences between non-translated, human-translated, and machine-translated texts for the singular neuter personal pronoun ("it"), singular objective personal pronouns ("him," "her"), and the total number of 3rd person pronouns (see Table 5.76). Newspaper texts also displayed a statistically significant difference in the use of the singular neuter personal pronoun "it," whereas literary texts displayed no such difference. The statistically significant difference for singular objective personal pronouns ("him," "her") is less conclusive, since the scientific texts contain almost none of these pronouns. All three corpora (literary, newspaper, and scientific) displayed statistically significant differences in the total number of 3rd person pronouns between non-translated, human-translated, and machine-translated texts.

Table 5.76 Association of 3rd person pronominal cohesive devices with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
PPH1 | 666.187 | 50.627 | 2 | 149 | 6.045 | 0.0030*
PPHO2 | 28.578 | 0.780 | 2 | 149 | 2.061 | 0.1309
PPHS1 | 14.369 | 0.179 | 2 | 149 | 0.927 | 0.3982
PPHS2 | 85.180 | 0.315 | 2 | 149 | 0.273 | 0.7616
Personal Pronouns | 873.926 | 42.005 | 2 | 149 | 3.711 | 0.0268*

Note: * indicates p-values significant at the 0.05 alpha level

To determine the pairs of conditions for which these statistically significant differences exist, post hoc Tukey's HSD multiple comparisons testing was performed for the variables that displayed statistically significant differences in the one-way ANOVA testing. The results of Tukey's HSD testing are presented in Table 5.77. Descriptive statistics for all pronominal cohesive devices are presented in Table 5.78.

Table 5.77 Pairwise comparisons of significant 3rd person pronominal cohesive devices (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
PPH1 | NT-HT | -1.42180* | 0.40927 | 0.0019
PPH1 | NT-MT | -.76260 | 0.40927 | 0.1531
PPH1 | HT-MT | 0.65920 | 0.40927 | 0.2443
Personal Pronouns | NT-HT | -1.27380* | 0.47579 | 0.0224
Personal Pronouns | NT-MT | -.84480 | 0.47579 | 0.1814
Personal Pronouns | HT-MT | 0.42900 | 0.47579 | 0.6401

Note: * indicates that the mean difference is significant at the 0.05 level


Table 5.78 Descriptive statistics for 3rd person pronominal cohesive devices in scientific texts (N = 50)

Method | Statistic | PPH1 | PPHO1 | PPHO2 | PPHS1 | PPHS2 | Personal Pronouns
NT | Mean | 2.68 | 0.00 | 0.24 | 0.08 | 0.63 | 3.63
NT | Std. Deviation | 1.82 | 0.00 | 0.41 | 0.36 | 0.73 | 2.07
NT | Median | 2.26 | 0.00 | 0.00 | 0.00 | 0.48 | 3.20
HT | Mean | 4.10 | 0.00 | 0.20 | 0.03 | 0.58 | 4.90
HT | Std. Deviation | 2.29 | 0.00 | 0.38 | 0.20 | 0.79 | 2.64
HT | Median | 3.48 | 0.00 | 0.00 | 0.00 | 0.00 | 4.62
MT | Mean | 3.44 | 0.04 | 0.37 | 0.11 | 0.52 | 4.48
MT | Std. Deviation | 2.00 | 0.13 | 0.51 | 0.35 | 0.76 | 2.40
MT | Median | 2.98 | 0.00 | 0.00 | 0.00 | 0.00 | 4.08

For the 3rd person singular neuter personal pronoun "it" (PPH1), Tukey's HSD post hoc test revealed significantly lower use in non-translated scientific texts (2.68 ± 1.82) than in human-translated scientific articles (4.10 ± 2.29, p = .0019) (see Tables 5.77 and 5.78). No statistically significant differences were found between non-translated and machine-translated texts (p = .1531) or between human-translated and machine-translated texts (p = .2443). These results are visualized in Fig. 5.33.


Fig. 5.33 Means and standard error (± 1 SE) for 3rd person singular neuter pronominal cohesive device ("it") in scientific texts

These findings might point to Toury's law of interference, whereby the influence of Russian is reflected in the translations. Some Russian scientific writers favor subjectless passive and introductory constructions (such as обнаружено, 'it was found'; требуется, 'it is required'; or очевидно, 'it is obvious'), which might be transferred into English in translation. The following example illustrates this tendency.

Russian source text: При этом важно обнаружить в кристаллах области низких внутренних напряжений и их упрочнить. Оказалось, что эту задачу удается решить, если при отжиге материала или конструкции производить изменение нагрузки.

Human translation: As this takes place, it is important to detect in crystals regions of low internal stresses and to strengthen them. It turned out that this problem can be solved by changing the load upon annealing of the material or the construction.

Machine translation: It is important to detect the crystals at low internal stresses and to cement them. It turned out that this problem can be solved if the annealing of the material or design changes to make the load.

(From Programmed Strengthening of Crystalline Materials Using Copper and Aluminum as an Example by I. M. Neklyudov et al., Fizika metallov i metallovedenie/The Physics of Metals and Metallography)

In this example, two Russian subjectless constructions are rendered into English with the pronoun "it" in both the human and the machine translation. Such constructions are not uncommon in non-translated English scientific writing; however, they may be used less frequently there. An example of such a construction is presented below:

It has already been reported that nanostructures can enhance the sensitivity of a biosensor by one to two orders of magnitude, due to the large surface area per unit volume ratio, which allows the immobilization of a larger amount of the enzyme.

(From Fabrication and Characterization of ZnO Nanowire Arrays with an Investigation into Electrochemical Sensing Capabilities by Jessica Weber et al., Journal of Nanomaterials)


Based on these findings, it might be recommended that translators of scientific articles try to reduce the number of constructions with "it" when translating or post-editing and use the corresponding noun instead.

For 3rd person singular objective personal pronouns ("him," "her") (PPHO1), usage in scientific texts was found to be very low. Because this variable was highly sparse (most of the values for individual texts were 0), it was decided not to analyze it further. High sparsity of these pronouns is expected for texts of this genre because they do not typically discuss human beings. For this reason, this category of pronouns does not seem critical for the work of scientific translators.
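The decision to set aside PPHO1 rests on its sparsity. One way to make such a criterion explicit is to compute the share of texts in which the feature never occurs. The sketch below does this and flags a variable as sparse when most texts have a zero count. Both the 0.5 threshold and the sample values are hypothetical and are not data from this study.

```python
def zero_share(values: list[float]) -> float:
    """Fraction of texts in which the feature does not occur at all."""
    return sum(1 for v in values if v == 0) / len(values)


def is_sparse(values: list[float], threshold: float = 0.5) -> bool:
    """Flag a variable as too sparse to analyze when most texts have a zero count.

    The 0.5 ("most texts") threshold is a hypothetical choice for illustration.
    """
    return zero_share(values) > threshold


# Hypothetical per-text frequencies for a rare pronoun tag (not data from this study).
ppho1 = [0.0, 0.0, 0.0, 0.0, 0.61, 0.0, 0.0, 0.0, 0.0, 0.0]
print(zero_share(ppho1), is_sparse(ppho1))
```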

Finally, for the total number of 3rd person pronominal cohesive devices, the mean for non-translated scientific texts (3.63 ± 2.07) was significantly lower than for human-translated scientific texts (4.90 ± 2.64, p = .0224) (see Tables 5.77 and 5.78). No statistically significant differences were detected for the non-translated/machine-translated and human-translated/machine-translated pairs. The bar chart in Fig. 5.34 represents these differences visually.


Fig. 5.34 Means and standard error (± 1 SE) for the total of 3rd person pronominal cohesive devices in scientific texts

The findings for the total number of 3rd person pronominal cohesive devices parallel the findings for the singular neuter pronoun "it." Comparing the graphs in Figs. 5.33 and 5.34 shows that the differences in the total number of 3rd person cohesive pronouns were most likely due to the differences found for the pronoun "it." This supports the benefits of a granular approach to analyzing cohesion. The pronoun "it" is discussed above.
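The total of 3rd person pronouns is defined as the sum PPH1 + PPHO1 + PPHO2 + PPHS1 + PPHS2. Because of this, the difference between two group means for the total equals the sum of the per-tag differences. The sketch below decomposes the HT-NT difference using the rounded means from Table 5.78. With rounding, it gives +1.28, compared with the unrounded 1.27380 in Table 5.77. The PPH1 term alone (+1.42) exceeds the whole difference, which is consistent with the interpretation above.

```python
# Rounded group means per 3rd person pronoun tag, copied from Table 5.78 (scientific texts).
NT = {"PPH1": 2.68, "PPHO1": 0.00, "PPHO2": 0.24, "PPHS1": 0.08, "PPHS2": 0.63}
HT = {"PPH1": 4.10, "PPHO1": 0.00, "PPHO2": 0.20, "PPHS1": 0.03, "PPHS2": 0.58}

# The mean of a sum equals the sum of the means, so the total's difference decomposes by tag.
contributions = {tag: HT[tag] - NT[tag] for tag in NT}
total_diff = sum(contributions.values())
largest = max(contributions, key=lambda tag: abs(contributions[tag]))

for tag, diff in contributions.items():
    print(f"{tag}: {diff:+.2f}")
print(f"Total (HT - NT): {total_diff:+.2f}; largest contributor: {largest}")
```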

5.4.1.1.2 Pronominal cohesive devices represented by possessive pronouns

This section covers the usage in scientific texts of all possessive pronouns, because the CLAWS software tags them with a single tag: APPGE. The CLAWS software does not differentiate between first-, second-, and third-person possessive pronouns. As discussed in Chapter 2, 1st and 2nd person pronouns are often exophoric, referring to the world outside the text (e.g., to the author or readers). For this reason, not all of them can be classified as pure cohesive devices in the sense of Halliday and Hasan, who understand cohesion as a phenomenon within a text. The category of possessive pronouns is nevertheless included in this study because English and Russian have different grammatical rules for the use of these pronouns, which are expected to produce statistically significant differences.

As reported in Chapter 4, scientific texts use such pronouns relatively infrequently. Still, significant differences in possessive pronouns in scientific texts may be of interest to linguists.

For possessive pronouns in scientific texts, one-way ANOVA indicated statistically significant differences (F(2,147) = 3.985, p = .0206) (see Table 5.79).

Table 5.79 Association of possessive pronouns with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
APPGE | 588.212 | 30.255 | 2 | 149 | 3.985 | 0.0206*

Note: * indicates p-values significant at the 0.05 alpha level

A Tukey's HSD post hoc test revealed that the use of possessive pronouns in machine-translated scientific texts (2.53 ± 1.42) is significantly lower than in non-translated texts (3.62 ± 2.48, p = .0166) (see Tables 5.80 and 5.81). No statistically significant difference was found for the other two pairs of conditions. Fig. 5.35 graphically shows means and standard errors for possessive pronouns in texts produced by the different methods.

Table 5.80 Pairwise comparisons of possessive pronouns (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.
APPGE | NT-HT | .69880 | .38965 | 0.1753
APPGE | NT-MT | 1.08520* | .38965 | 0.0166
APPGE | HT-MT | .38640 | .38965 | 0.5833

Note: * indicates that the mean difference is significant at the 0.05 level

Table 5.81 Descriptive statistics for possessive pronouns in scientific texts (N = 50)

Method Mean Std. Deviation Median

NT 3.62 2.48 2.88

HT 2.92 1.79 2.62

MT 2.53 1.42 2.66


Fig. 5.35 Means and standard error (± 1 SE) for possessive pronouns in scientific texts

Even though no statistically significant differences were found between human-translated and machine-translated texts, the machine-translated texts displayed the lowest mean for possessive pronouns of the three groups. This is similar to the results found for the literary and newspaper texts and might support the idea that MT tools tend not to introduce possessives that are absent from the source text, thereby avoiding errors related to incorrect interpretations of cohesive ties in a text. Since the total number of possessive pronouns in scientific texts is relatively small, no further discussion is provided.

5.4.1.1.3 Demonstratives

Demonstratives are represented by two types:

− DD1—singular determiners ("this," "that," "another")


− DD2—plural determiners ("these," "those")

For the scientific corpus, statistically significant differences between groups were found for singular determiners (F(2,147) = 12.777, p < .0001) (see Table 5.82). No statistically significant difference was found for plural determiners (p > .05).

Table 5.82 Association of demonstrative cohesive devices with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between Groups Sum of Squares | Between Groups df | Total df | F | Sig.
DD1 | 1399.233 | 207.218 | 2 | 149 | 12.777 | < 0.0001*
DD2 | 347.403 | 6.440 | 2 | 149 | 1.388 | 0.2528

Note: * indicates p-values significant at the 0.05 alpha level

A Tukey's HSD post hoc test revealed that the use of singular determiners in the non-translated scientific articles (8.40 ± 3.63) was significantly higher than in both human-translated (6.85 ± 2.58, p = .0196) and machine-translated scientific articles (5.53 ± 2.11, p < .0001) (see Tables 5.83 and 5.84). No statistically significant difference was found between human and machine translations (p = .0555). The descriptive statistics for the use of demonstratives in the scientific corpus are presented in Table 5.84. Fig. 5.36 graphically shows means and standard errors for singular demonstratives in scientific texts produced by the different methods.

Table 5.83 Pairwise comparisons of singular determiners (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing

Dependent Variable  Method Comparison  Difference in Means  Std. Error  Sig.
DD1  NT-HT  1.55200*  0.56952  0.0196
DD1  NT-MT  2.87600*  0.56952  < 0.0001
DD1  HT-MT  1.32400  0.56952  0.0555

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.84 Descriptive statistics for demonstrative cohesive devices in scientific texts (N = 50)

Method  Statistic  DD1  DD2
NT  Mean  8.40  2.35
NT  Std. Deviation  3.63  1.76
NT  Median  8.34  2.16
HT  Mean  6.85  2.26
HT  Std. Deviation  2.58  1.42
HT  Median  6.40  2.26
MT  Mean  5.53  1.87
MT  Std. Deviation  2.11  1.36
MT  Median  5.14  1.66


Fig. 5.36 Means and standard error (± 1 SE) for singular demonstratives in scientific texts

These findings differ from those for the literary texts. There, human-translated texts contained more singular determiners than non-translated and machine-translated texts. That result might indicate that human literary translators make more frequent use of determiners, and it might support the translation universal of explicitation. For newspaper texts, no statistically significant differences were found for either singular or plural determiners.

The findings for the scientific corpus suggest that translators of scientific texts and editors of MT output may want to use singular demonstratives more often. The example below illustrates how a singular determiner may be introduced in translation.

Russian source text: Используемый в работе сплав Х20Н80 имеет следующий

химический состав… .


Human translation: The Kh20N80 alloy used in this work has the following

chemical composition… .

Machine translation: As used in the fusion of Cr20Ni80 has the following

chemical composition… .

(From Short-Range Ordering and the Abnormal Mechanical Properties of a Ni-

20% Cr Alloy by N. R. Dudova et al., Fizika metallov i metallovedenie/The Physics of

Metals and Metallography)

In this example, the human translator renders the Russian phrase "в работе" ("in the paper/work") with a determiner: "in this work." The MT tool had difficulty interpreting the phrase "используемый в работе сплав" ("the alloy used in the paper/work"). It rendered the phrase as "as used in the fusion of," mistranslating сплав ('alloy') as 'fusion,' and introduced no determiner.

It should be noted that the means for singular determiners in the scientific corpus look rather close to each other to the naked eye, even though they differ significantly for the non-translated vs. human-translated and non-translated vs. machine-translated pairs. Rounded, the mean is 8 per 1,000 words for non-translated texts, 7 for human translations, and 6 for machine translations. An examination of individual singular determiners in the scientific translations is likely to show that many of them render forms of the Russian этот ('this') or данный ('given,' 'present,' as in "the present paper"), as happens in the following example.


Russian source text: Оказалось, что эту задачу удается решить, если при

отжиге материала или конструкции производить изменение нагрузки.

Использование данного метода упрочнения связано с трудностями определения

технологии нагружения.

Human translation: It turned out that this problem can be solved by changing

the load upon annealing of the material or the construction. The employment of this

method of strengthening is connected with difficulties of determining the precise details

of the loading process.

Machine translation: It turned out that this problem can be solved if the

annealing of the material or design changes to make the load. Using this method of hardening due to the difficulties in determining the loading technology.

(From Programmed Strengthening of Crystalline Materials Using Copper and

Aluminum as an Example by I. M. Neklyudov et al., Fizika metallov i metallovedenie/The

Physics of Metals and Metallography)

5.4.1.1.4 Definite article "the" as cohesive device

As with the literary and newspaper corpora, one-way ANOVA found a statistically significant difference between groups in the scientific corpus for the use of the definite article "the" (F(2,147) = 36.654, p < .0001) (see Table 5.85).

A Tukey HSD post hoc test showed that non-translated scientific texts contained significantly fewer definite articles (80.28 ± 14.80) than did machine-translated scientific texts (107.29 ± 15.26, p < .0001) and human-translated scientific texts (101.99 ± 19.67, p < .0001) (see Tables 5.86 and 5.87). There was no significant difference in the use of "the" between human and machine translations (p = .2551). Fig. 5.37 shows the means and standard errors for the definite article "the" across the three groups of scientific texts.
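Within each Tukey HSD table, all three comparisons for a variable share the same standard error. This is because every group contains the same number of texts (n = 50). The minimal Python sketch below assumes the equal-n form of the test. It derives that standard error, sqrt(2 × MSE / n), and the studentized range statistic q = |difference| / sqrt(MSE / n) for the definite article (Tables 5.85 and 5.86). Turning q into a p-value requires the studentized range distribution. SciPy provides it as scipy.stats.studentized_range (version 1.7 and later, for example), but it is left out here so that the sketch relies only on the standard library. The function name tukey_pair is hypothetical.

```python
import math


def tukey_pair(mean_diff, mse, n_per_group):
    """Return (standard error, q statistic) for one equal-n Tukey HSD comparison."""
    # Standard error of the difference between two group means; this is the
    # value listed in the "Std. Error" column of the pairwise-comparison tables.
    se_diff = math.sqrt(2 * mse / n_per_group)
    # Studentized range statistic, to be compared with the critical value for
    # k = 3 groups and 147 within-groups degrees of freedom.
    q = abs(mean_diff) / math.sqrt(mse / n_per_group)
    return se_diff, q


# Mean square error for "the": within-groups SS / within-groups df (Table 5.85).
mse_the = (61583.337 - 20492.090) / 147

# Mean differences copied from Table 5.86.
for label, diff in [("NT-HT", -21.71480), ("NT-MT", -27.01640), ("HT-MT", -5.30160)]:
    se, q = tukey_pair(diff, mse_the, n_per_group=50)
    print(f"{label}: SE = {se:.5f}, q = {q:.2f}")
```

Each comparison yields a standard error of about 3.34384, which matches Table 5.86.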

Table 5.85 Association of definite article "the" with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA

Dependent Variable  Total Sum of Squares  Between Groups Sum of Squares  Between Groups df  Total df  F  Sig.
THE  61583.337  20492.090  2  149  36.654  < 0.0001*

Note: * indicates p-values significant at 0.05 alpha-level

Table 5.86 Pairwise comparisons of definite article "the" (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing

Dependent Variable  Method Comparison  Difference in Means  Std. Error  Sig.
THE  NT-HT  -21.71480*  3.34384  < 0.0001
THE  NT-MT  -27.01640*  3.34384  < 0.0001
THE  HT-MT  -5.30160  3.34384  0.2551

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.87 Descriptive statistics for the use of the definite article "the" in scientific texts (N = 50)

Method  Mean  Std. Deviation  Median
NT  80.28  14.80  78.33
HT  101.99  19.67  99.38
MT  107.29  15.26  106.02


Fig. 5.37 Means and standard error (± 1 SE) for definite article "the" in scientific texts

These findings are noteworthy because Russian has no articles. Translators must therefore introduce English articles based on their interpretation of the source text. The lower number of definite articles in non-translated scientific texts might indicate that English scientific writers resort to other determiners more frequently. The non-translated texts also had higher mean frequencies of singular demonstratives and possessive pronouns (Tables 5.81 and 5.84). In addition, English scientific writers may feel more confident leaving certain nouns without any determiner. The higher use of definite articles by human translators may be viewed as evidence of normalization as a translation universal. Normalization is the tendency of translators to overuse a common feature of the target language, especially one that does not exist in the source language.

As mentioned earlier, MT tools may be programmed to use the definite article as a noun modifier, since the definite article is more likely to be accurate than other determiners (e.g., possessive pronouns or demonstratives). Editors of MT output might keep this in mind and reduce the number of definite articles in machine-translated scientific texts. They could replace some definite articles with demonstratives, zero articles, or other grammatical means. This might improve the cohesiveness of machine-translated texts. The same advice may apply to human translators of scientific texts.

The following example illustrates how this may be done. The suggested edit uses fewer definite articles and fewer constructions with "of" than the human and machine translations.

Russian source text: Исследование микропластичности кристаллов

позволяет выяснить природу источников и закономерности образования скоплений

дислокаций, релаксации напряжений в результате взаимодействия друг с другом и

переход их в более выгодное энергетическое состояние.

Human translation: Investigations of the microplasticity of crystals make it

possible to clarify the nature of sources and laws of the formation of dislocation pile-ups,

stress relaxation as a result of their interaction with each other, and their transition into a more advantageous energetic state.

Machine translation: The study microplasticity crystals allows us to determine

the nature and sources of the formation of clusters of dislocations, stress relaxation due to

interaction with each other and their transition into a more favorable energy state.


Suggested edit: Studies in crystal microplasticity enable researchers to discover sources and laws for the formation of dislocation clusters, stress relaxation resulting from their interaction, and their transition to a more favorable energy state.

(From Programmed Strengthening of Crystalline Materials Using Copper and

Aluminum as an Example by I. M. Neklyudov et al., Fizika metallov i metallovedenie/The

Physics of Metals and Metallography)
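The effect of the suggested edit can be checked by simple counting. The short Python sketch below tallies whole-word occurrences of "the" and "of" in the three English versions quoted above. Its letter-only tokenizer is a simplifying assumption and is not the part-of-speech tagging used to build the corpus counts in this study. The names VERSIONS and count_words are hypothetical.

```python
import re
from collections import Counter

# The three English versions quoted in the example above.
VERSIONS = {
    "human translation": (
        "Investigations of the microplasticity of crystals make it possible to "
        "clarify the nature of sources and laws of the formation of dislocation "
        "pile-ups, stress relaxation as a result of their interaction with each "
        "other, and their transition into a more advantageous energetic state."
    ),
    "machine translation": (
        "The study microplasticity crystals allows us to determine the nature "
        "and sources of the formation of clusters of dislocations, stress "
        "relaxation due to interaction with each other and their transition "
        "into a more favorable energy state."
    ),
    "suggested edit": (
        "Studies in crystal microplasticity enable researchers to discover "
        "sources and laws for the formation of dislocation clusters, stress "
        "relaxation resulting from their interaction, and their transition to "
        "a more favorable energy state."
    ),
}


def count_words(text, targets=("the", "of")):
    """Count case-insensitive whole-word occurrences of each target word."""
    tokens = re.findall(r"[a-z]+", text.lower())  # "their" stays one token
    counts = Counter(tokens)
    return {word: counts[word] for word in targets}


for name, text in VERSIONS.items():
    print(name, count_words(text))
```

For these passages, the counts of "the" and "of" are 3 and 6 for the human translation, 3 and 3 for the machine translation, and 1 and 1 for the suggested edit.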

5.4.1.1.5 Comparative cohesive devices

Comparative cohesive devices are represented by the following groups of comparative adjectives and adverbs:

− JJR—general comparative adjective (e.g., "older," "better," "stronger")

− JJT—general superlative adjective (e.g., "oldest," "best," "strongest")

− RGR—comparative degree adverb ("more," "less")

− RGT—superlative degree adverb ("most," "least")

− Total of all comparative cohesive devices [JJR+JJT+RGR+RGT]

For the scientific corpus, one-way ANOVA showed statistically significant differences between non-translated, human-translated, and machine-translated articles for two variables (see Table 5.88). These were general superlative adjectives (F(2,147) = 4.860, p = .0090) and the total of comparative devices (F(2,147) = 7.841, p = .0006).


Table 5.88 Association of comparative cohesive devices with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA

Dependent Variable  Total Sum of Squares  Between Groups Sum of Squares  Between Groups df  Total df  F  Sig.
JJR  555.776  16.213  2  149  2.209  0.1135
JJT  79.902  4.955  2  149  4.860  0.0090*
RGR  109.904  4.584  2  149  3.199  0.0436
RGT  47.233  1.840  2  149  2.979  0.0539
All Comp.  970.481  93.546  2  149  7.841  0.0006*

Note: * indicates p-values significant at 0.05 alpha-level

For general superlative adjectives, Tukey's HSD post hoc test showed that non-translated scientific articles contained significantly more superlative adjectives (0.65 ± 0.93) than human-translated (0.26 ± 0.62, p = .0195) and machine-translated scientific articles (0.27 ± 0.54, p = .0227) (see Tables 5.89 and 5.90). These means are low. Rounded, they amount to 1 superlative per 1,000 words in non-translated texts and 0 per 1,000 words in both human- and machine-translated texts. This suggests that scientific writers use superlatives rather infrequently, possibly because superlatives express extreme degrees of comparison, which are rare in scientific analysis.
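The frequencies of cohesive devices in this section are expressed per 1,000 words. The figures plot each group mean with ± 1 standard error, that is, the standard deviation divided by the square root of the number of texts. The Python sketch below illustrates both steps. The helper names are hypothetical, and the 3-in-4,600-words article is invented for illustration rather than taken from the corpus. Only the standard deviation and n come from Table 5.90.

```python
import math
import statistics


def per_thousand(count, total_words):
    """Normalize a raw count of a device to a frequency per 1,000 words."""
    return count / total_words * 1000


def mean_and_se(rates):
    """Return the mean of per-text rates and its standard error (SD / sqrt(n))."""
    return statistics.mean(rates), statistics.stdev(rates) / math.sqrt(len(rates))


# Hypothetical article: 3 superlative adjectives in 4,600 running words.
print(round(per_thousand(3, 4600), 2))  # 0.65

# Error-bar length (above or below the mean) implied by Table 5.90 for JJT in
# non-translated texts: SD = 0.93 across n = 50 texts.
print(round(0.93 / math.sqrt(50), 3))  # 0.132
```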

Table 5.89 Pairwise comparisons of comparative cohesive devices (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing

Dependent Variable  Method Comparison  Difference in Means  Std. Error  Sig.
JJT  NT-HT  .38940*  .14281  0.0195
JJT  NT-MT  .38160*  .14281  0.0227
JJT  HT-MT  -.00780  .14281  0.9984
All Comparatives  NT-HT  1.60700*  .48849  0.0036
All Comparatives  NT-MT  1.73600*  .48849  0.0015
All Comparatives  HT-MT  .12900  .48849  0.9623

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.90 Descriptive statistics for comparative devices in scientific texts (N = 50)

Method  Statistic  JJR  JJT  RGR  RGT  Comparative Devices
NT  Mean  2.89  0.65  0.96  0.49  4.99
NT  Std. Deviation  2.51  0.93  1.02  0.80  3.23
NT  Median  2.17  0.49  0.95  0.22  4.43
HT  Mean  2.29  0.26  0.57  0.26  3.38
HT  Std. Deviation  1.57  0.62  0.76  0.39  1.86
HT  Median  2.18  0.00  0.23  0.00  2.98
MT  Mean  2.12  0.27  0.61  0.26  3.25
MT  Std. Deviation  1.49  0.54  0.73  0.38  2.01
MT  Median  1.63  0.00  0.46  0.00  2.79

For the total of all comparative devices, Tukey's HSD post hoc test revealed that non-translated scientific articles contained a significantly higher number of comparative devices (4.99 ± 3.23) than human translations (3.38 ± 1.86, p = .0036) and machine translations (3.25 ± 2.01, p = .0015) (see Tables 5.89 and 5.90). These results are graphically presented in Fig. 5.38.


Fig. 5.38 Means and standard error (± 1 SE) for comparative devices in scientific texts

The human and machine translations showed no statistically significant differences in the use of comparative devices. This may indicate that human translators and MT tools tend not to change comparatives in translation. Non-translated scientific articles contained significantly more comparatives (about 5 per 1,000 words, compared to about 3 per 1,000 words for human and machine translations). This finding is interesting and may point to differences between English and Russian scientific writing styles.

5.4.1.1.6 Total number of reference cohesive devices

For the scientific corpus, the total number of reference cohesive devices was found to be

statistically significantly different between groups as determined by one-way ANOVA

(F(2,147) = 24.886, p < .0001) (see Table 5.91).


Table 5.91 Association of the total number of reference cohesive devices with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA

Dependent Variable  Total Sum of Squares  Between Groups Sum of Squares  Between Groups df  Total df  F  Sig.
Reference Devices  55340.548  13998.070  2  149  24.886  < 0.0001*

Note: * indicates p-values significant at 0.05 alpha-level

A Tukey HSD post hoc test showed that non-translated scientific articles contained significantly fewer reference cohesive devices (103.26 ± 16.88) than human translations (122.31 ± 18.58, p < .0001) and machine translations (124.95 ± 14.61, p < .0001) (see Tables 5.92 and 5.93). The difference between human and machine translations was not significant (p = .7109). The descriptive statistics for the total number of reference cohesive devices are presented in Table 5.93. Figure 5.39 presents these findings graphically.

Table 5.92 Pairwise comparisons of the total number of reference cohesive devices (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing

Dependent Variable  Method Comparison  Difference in Means  Std. Error  Sig.
Reference Devices  NT-HT  -19.04240*  3.35405  < 0.0001
Reference Devices  NT-MT  -21.68600*  3.35405  < 0.0001
Reference Devices  HT-MT  -2.64360  3.35405  0.7109

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.93 Descriptive statistics for the total number of reference cohesive devices in scientific texts (N = 50)

Method  Mean  Std. Deviation  Median
NT  103.26  16.88  102.69
HT  122.31  18.58  122.33
MT  124.95  14.61  122.64


Fig. 5.39 Means and standard error (± 1 SE) for the total number of reference cohesive devices in scientific texts

These findings are similar to those for the newspaper corpus, where non-translated texts also had significantly fewer reference cohesive devices. The literary results were different: machine-translated literary texts had significantly fewer reference cohesive devices than the other two groups. The lower number of reference cohesive devices in non-translated scientific and newspaper articles might indicate that genre conventions for English scientific and newspaper writing differ from those for English literary texts. However, the human-translated texts do not seem to reflect this conjecture. Note that this variable is the sum of all reference cohesive devices studied in this project. Analyzing each category of reference cohesive devices separately, as above, gives a clearer picture.


5.4.1.2 Conjunction cohesive devices in the scientific corpus

This section presents data and analysis for the four types of conjunction cohesive devices, and then looks at the total number of such devices across the groups of texts in the scientific corpus. Below is the list of conjunctive devices included in this study:

− Additive devices (e.g., "and," "or")—tagged as CC

− Adversative devices (e.g., "but")—tagged as CCB

− Causal and continuative devices represented by subordinating conjunctions (e.g., "if," "because," "unless," "so," "for")—tagged as CS

− Temporal devices (e.g., "now," "tomorrow")—tagged as RT

− Total conjunction cohesive devices [CC+CCB+CS+RT]

For scientific texts, one-way ANOVA found statistically significant differences in four variables (see Table 5.94):

− additive devices (F(2,147) = 6.492, p = .0020)
− subordinating conjunctions, i.e., causal and continuative devices (F(2,147) = 13.834, p < .0001)
− temporal devices (F(2,147) = 3.279, p = .0404)
− the total number of conjunction cohesive devices (F(2,147) = 16.749, p < .0001)

No statistically significant difference was found for adversative devices (F(2,147) = 2.502, p = .0854), unlike in the literary and newspaper texts. Additive devices and the total number of conjunction devices differed significantly in all three genres.


Table 5.94 Association of conjunction cohesive devices with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA

Dependent Variable  Total Sum of Squares  Between Groups Sum of Squares  Between Groups df  Total df  F  Sig.
CC  6757.255  548.384  2  149  6.492  0.0020*
CCB  139.016  4.576  2  149  2.502  0.0854
CS  1415.837  224.279  2  149  13.834  < 0.0001*
RT  298.731  12.757  2  149  3.279  0.0404*
Total Conj.  9365.541  1738.080  2  149  16.749  < 0.0001*

Note: * indicates p-values significant at 0.05 alpha-level

For additive cohesive devices in scientific texts, Tukey's HSD post hoc test showed that non-translated scientific texts contained significantly more additive cohesive devices (26.71 ± 7.60) than machine-translated texts (22.03 ± 5.56, p = .0013) (see Tables 5.95 and 5.96). There was no statistically significant difference in additive devices between non-translated and human-translated texts (p = .1231) or between human- and machine-translated texts (p = .2376). These results are presented in Fig. 5.40.

Table 5.95 Pairwise comparisons of conjunction cohesive devices (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing

Dependent Variable  Method Comparison  Difference in Means  Std. Error  Sig.
CC  NT-HT  2.56180  1.29980  0.1231
CC  NT-MT  4.67640*  1.29980  0.0013
CC  HT-MT  2.11460  1.29980  0.2376
CS  NT-HT  2.54900*  .56942  < 0.0001
CS  NT-MT  2.63660*  .56942  < 0.0001
CS  HT-MT  .08760  .56942  0.9870
RT  NT-HT  .61100  .27896  0.0762
RT  NT-MT  .62600  .27896  0.0673
RT  HT-MT  .01500  .27896  0.9984
Total Conjunctions  NT-HT  6.10580*  1.44066  0.0001
Total Conjunctions  NT-MT  7.97040*  1.44066  < 0.0001
Total Conjunctions  HT-MT  1.86460  1.44066  0.4007

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.96 Descriptive statistics for conjunction cohesive devices in scientific texts (N = 50)

Method  Statistic  CC  CCB  CS  RT  Conjunctive Devices
NT  Mean  26.71  0.95  7.25  1.66  36.57
NT  Std. Deviation  7.60  1.10  3.32  1.49  8.00
NT  Median  24.86  0.49  6.72  1.38  35.94
HT  Mean  24.15  0.56  4.71  1.05  30.46
HT  Std. Deviation  6.16  0.81  2.62  1.26  7.30
HT  Median  24.24  0.00  5.12  0.90  30.04
MT  Mean  22.03  0.91  4.62  1.04  28.60
MT  Std. Deviation  5.56  0.93  2.54  1.43  6.19
MT  Median  21.84  0.84  4.02  0.78  28.23


Fig. 5.40 Means and standard error (± 1 SE) for additive conjunction devices in scientific texts

These results differ from the data for literary texts, where machine-translated texts contained significantly more additive cohesive devices. This, again, might point to genre differences between literary and scientific texts in English, as well as between literary and scientific texts written in Russian. The lower number of additive devices in translated scientific texts may also point to the universal of simplification in translation. Further research is needed to investigate these conjectures.

For causal and continuative devices in scientific texts, Tukey's HSD post hoc test showed that non-translated scientific texts contained significantly more such cohesive devices (7.25 ± 3.32) than human-translated texts (4.71 ± 2.62, p < .0001) and machine-translated texts (4.62 ± 2.54, p < .0001) (see Tables 5.95 and 5.96). There was no statistically significant difference in causal and continuative devices between translations performed by humans and by machines (p = .9870). These results are presented graphically in Fig. 5.41.

Fig. 5.41 Means and standard error (± 1 SE) for causal and continuative conjunction devices in scientific texts

If Toury's law of interference is at work, these findings may reflect differences in how causal and continuative conjunctive cohesion is expressed in Russian and English. Non-translated texts seem to rely on this type of cohesion more often than translated texts. Human and machine translations used causal and continuative devices at similar rates. Because translation universals are most likely products of the human brain, this similarity makes it unlikely that the translation universal of simplification is at play. However, further research on translation universals in human and machine translations is clearly needed.

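For temporal conjunction devices, Tukey's HSD post hoc test found no statistically significant pairwise differences (NT-HT p = .0762, NT-MT p = .0673, HT-MT p = .9984), even though the one-way ANOVA result was significant (p = .0404).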

For the total of all conjunction devices included in this study, Tukey's HSD post hoc test showed that non-translated scientific articles contained significantly more of these devices (36.57 ± 8.00) than human translations (30.46 ± 7.30, p = .0001) and machine translations (28.60 ± 6.19, p < .0001). No statistically significant difference was found between human and machine translations (p = .4007) (see Tables 5.95 and 5.96). These results are presented visually in Fig. 5.42.

Fig. 5.42 Means and standard error (± 1 SE) for the sum of conjunction devices in scientific texts


This result is not surprising, since the means for additive and subordinating (causal and continuative) conjunctive devices were higher in non-translated scientific texts. For subordinating conjunctions the difference was significant against both human and machine translations; for additive devices it was significant against machine translations only. As mentioned in the discussions of conjunction devices in literary and newspaper texts, this might suggest differences in the use of conjunction in English and Russian scientific writing. Russian may rely more on other means of cohesion or be more implicit in terms of cohesive links.

This finding may also support the translation universal of simplification, if further research shows that constructions with conjunctions are simplified in translation. Such research is needed because it is questionable whether translation universals work similarly in human and machine translations. Machine translations use conjunction devices with approximately the same frequency as human translations. This suggests that the differences in means between non-translated texts and translations are more likely due to Toury's law of interference.

In terms of practical applications, the findings for additive and subordinating conjunctions, as well as for the sum of the conjunction devices covered in this research, suggest that human translators and editors of MT output may want to introduce more explicit conjunctive cohesive ties into their final products. Further research on the readability of non-translated and Russian-into-English translated scientific articles is required to determine whether the differences in the number of conjunctions across such texts influence readability.


5.4.1.3 Reference and conjunction cohesive devices in the scientific corpus

For the total number of reference and conjunction cohesive devices included in this study, one-way ANOVA indicated a statistically significant difference for scientific texts (F(2,147) = 10.562, p = .0001) (see Table 5.97). As noted earlier, this variable does not represent the total number of all cohesive devices in the texts. The study does not include cohesive devices that are best studied by manual analysis (e.g., lexical cohesion). While the significant result for this parameter is interesting, individual statistical tests for specific types of cohesive devices are more informative.

Table 5.97 Association of reference and conjunction cohesive devices with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA

Dependent Variable  Total Sum of Squares  Between Groups Sum of Squares  Between Groups df  Total df  F  Sig.
Reference & Conjunction Devices  47230.275  5934.504  2  149  10.562  0.0001*

Note: * indicates p-values significant at 0.05 alpha-level

Tukey's HSD post hoc test showed that non-translated scientific articles contained significantly fewer reference and conjunction cohesive devices (139.83 ± 17.53) than human translations (152.77 ± 17.91, p = .0005) and machine translations (153.55 ± 14.65, p = .0002) (see Tables 5.98 and 5.99). There was no statistically significant difference in the total for reference and conjunction devices between human and machine translations (p = .9705). These results are presented in a bar graph in Fig. 5.43.


Table 5.98 Pairwise comparisons of reference and conjunction cohesive devices (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing

Dependent Variable  Method Comparison  Difference in Means  Std. Error  Sig.
Total Reference & Conj. Devices  NT-HT  -12.93500*  3.35216  0.0005
Total Reference & Conj. Devices  NT-MT  -13.71660*  3.35216  0.0002
Total Reference & Conj. Devices  HT-MT  -.78160  3.35216  0.9705

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.99 Descriptive statistics for total reference and conjunction cohesive devices in scientific texts (N = 50)

Method  Mean  Std. Deviation  Median
NT  139.83  17.53  140.77
HT  152.77  17.91  152.65
MT  153.55  14.65  151.32

Fig. 5.43 Means and standard error (± 1 SE) for total reference and conjunction cohesive devices in scientific texts


At a more granular level, the total of reference cohesive devices appears to drive these results for the sum of reference and conjunction devices. Non-translated texts contained significantly fewer reference devices but significantly more conjunction devices than both groups of translations. The larger gap for reference devices outweighs the opposite gap for conjunction devices. These findings are similar to the results for literary texts, where MT texts contained significantly more reference and conjunction devices than non-translated texts. However, the literary corpus showed no statistically significant difference between non-translated and human-translated texts. In the scientific corpus, by contrast, non-translated texts contained significantly fewer reference and conjunction devices than both human and machine translations. This may be because literary translators pay more attention to the stylistic features of texts. Scientific translators, on the other hand, may care more about an accurate account of the source text than about the target text's stylistic features. Still, this is only a conjecture, and further research is needed.
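Simple arithmetic can check this explanation. The combined variable is the sum of reference and conjunction devices, so each pairwise difference in Table 5.98 splits into a reference part and a conjunction part. The minimal Python sketch below uses the rounded group means from Tables 5.93, 5.96, and 5.99. The function name decompose is hypothetical.

```python
# Group means copied from Tables 5.93 (reference), 5.96 (conjunction),
# and 5.99 (reference and conjunction combined).
REFERENCE = {"NT": 103.26, "HT": 122.31, "MT": 124.95}
CONJUNCTION = {"NT": 36.57, "HT": 30.46, "MT": 28.60}
COMBINED = {"NT": 139.83, "HT": 152.77, "MT": 153.55}

# The combined variable is the sum of its two components, so the group means
# should add up, up to rounding of the two-decimal table values.
for group, combined in COMBINED.items():
    assert abs(REFERENCE[group] + CONJUNCTION[group] - combined) < 0.02


def decompose(a, b):
    """Split the a-minus-b difference in combined means into its two parts."""
    ref_part = REFERENCE[a] - REFERENCE[b]
    conj_part = CONJUNCTION[a] - CONJUNCTION[b]
    return ref_part, conj_part, ref_part + conj_part


for a, b in [("NT", "HT"), ("NT", "MT"), ("HT", "MT")]:
    ref_part, conj_part, total = decompose(a, b)
    print(f"{a}-{b}: reference {ref_part:+.2f}, "
          f"conjunction {conj_part:+.2f}, combined {total:+.2f}")
```

In the NT-HT comparison, the reference component (about -19.05) outweighs the conjunction component (about +6.11). Together they give the combined difference of about -12.94 reported in Table 5.98.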

5.4.2 Global textual features in non-translated, human-translated, and machine-translated texts in the scientific corpus

For the corpus of scientific texts, the study covers six features that characterize texts globally: nominalization, lexical density (measured as the standardized type-token ratio), average word length, average sentence length, passives, and prepositional phrases.

5.4.2.1 Nominalization

As determined by one-way ANOVA, there was no statistically significant difference in nominalization between groups for the scientific corpus (F(2,147) = 1.140, p = .3227) (see Table 5.100), so no Tukey's HSD post hoc test was performed. The descriptive statistics for nominalization in scientific texts are presented in Table 5.101.

Table 5.100 Association of nominalization with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA

Dependent Variable  Total Sum of Squares  Between Groups Sum of Squares  Between Groups df  Total df  F  Sig.
Nominalization  18778.431  286.742  2  149  1.140  0.3227

Note: * indicates p-values significant at 0.05 alpha-level

Table 5.101 Descriptive statistics for nominalization in scientific texts (N = 50)

Method  Mean  Std. Deviation  Median
NT  37.00  10.81  35.83
HT  40.34  10.68  41.19
MT  39.13  12.10  37.34

While no statistically significant differences were found, the number of nominalizations is slightly lower in non-translated scientific articles (37 per 1,000 words) than in human-translated (40 per 1,000 words) and machine-translated scientific articles (39 per 1,000 words). In general, Russian seems to favor nominalization more than English does. If Toury's law of interference is assumed, this may explain the slightly higher number of nominalizations in the translated texts. However, the absence of statistically significant differences may suggest that English and Russian scientific texts use nominalization similarly, owing to the specificities of the scientific genre. As the genre comparison for nominalization showed, scientific texts in English favor nominalization regardless of the method of text production (see Chapter 4).

5.4.2.2 Lexical density

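As a measure of lexical density, this study used the standardized type-token ratio (STTR) calculated by WordSmith Tools. The type-token ratio (TTR) measures vocabulary variation within a written text. It is the number of distinct word forms (types) divided by the total number of running words (tokens). Raw TTR falls as a text grows longer. To adjust for differences in the lengths of individual texts, WordSmith Tools computes a standardized TTR, which averages the TTR over consecutive chunks of running words (1,000 words by default).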
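The standardization step can be sketched as follows. The Python function below is a minimal approximation of the STTR procedure described in the WordSmith Tools documentation. It computes TTR (types divided by tokens, as a percentage) for each consecutive chunk of running words and then averages the chunk values. The tokenizer and the handling of a trailing partial chunk are assumptions, so the function will not reproduce WordSmith's figures exactly. The name standardized_ttr is hypothetical.

```python
import re


def standardized_ttr(text, chunk_size=1000):
    """Return a standardized type-token ratio (percentage) for text.

    TTR (distinct types / running tokens x 100) is computed for each
    consecutive chunk of chunk_size tokens, and the chunk values are averaged.
    Assumptions: a simple lowercase word tokenizer, and a trailing partial
    chunk is ignored.
    """
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    ratios = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        ratios.append(len(set(chunk)) / chunk_size * 100)
    if not ratios:
        raise ValueError("text is shorter than one chunk")
    return sum(ratios) / len(ratios)


# Toy example with 4-word chunks: "a b a b" -> 50%, "c d e f" -> 100%;
# the leftover "g" does not fill a chunk and is dropped.
print(standardized_ttr("a b a b c d e f g", chunk_size=4))  # 75.0
```

Averaging over fixed-size chunks means a long article is not penalized for its length. A single TTR over the whole text would be, because common words keep recurring as a text grows.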

As determined by one-way ANOVA, there was a statistically significant difference between groups (F(2,147) = 41.493, p < .0001) (see Table 5.102).

Table 5.102 Association of STTR with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA

Dependent Variable  Total Sum of Squares  Between Groups Sum of Squares  Between Groups df  Total df  F  Sig.
STTR  2087.105  753.086  2  149  41.493  < 0.0001*

Note: * indicates p-values significant at 0.05 alpha-level

Tukey's HSD post hoc test showed that non-translated scientific texts have a significantly higher STTR (65.27 ± 2.88) than human-translated (60.78 ± 3.47, p < .0001) and machine-translated texts (60.28 ± 2.62, p < .0001) (see Tables 5.103 and 5.104). There was no statistically significant difference in STTR between human and machine translations (p = .6851). These results are presented in a bar graph in Fig. 5.44.


Table 5.103 Pairwise comparisons of STTR (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing

Dependent Variable  Method Comparison  Difference in Means  Std. Error  Sig.
STTR  NT-HT  4.48340*  .60249  < 0.0001
STTR  NT-MT  4.98340*  .60249  < 0.0001
STTR  HT-MT  .50000  .60249  0.6851

Notes: * indicates that the mean difference is significant at the 0.05 level; significance values in bold are significant at the 0.05 level

Table 5.104 Descriptive statistics for STTR in scientific texts (N = 50)

Method  Mean  Std. Deviation  Median
NT  65.27  2.88  65.14
HT  60.78  3.47  61.19
MT  60.28  2.62  60.43

Fig. 5.44 Means and standard error (± 1 SE) for STTR in scientific texts


These findings mirror the results for the literary and scientific corpora, where

non-translated texts also displayed a significantly higher STTR. As discussed earlier, this is consonant with Laviosa's results (1998a, 1998b). In Laviosa's study, English translated texts were characterized by a narrower range of vocabulary compared to non-translated texts. Laviosa views this as evidence of simplification as a translation universal. Unlike

Laviosa's studies, this study includes machine-translated texts as well, and thus suggests the possibility of expanding the concept of simplification to texts translated by machines.

This study focuses on the Russian-English language pair only; it would therefore be worthwhile to include MT output in studies of other language pairs.

Another possible explanation for the lower STTR in translations from Russian into English is that Russian likely has a narrower range of vocabulary, as discussed above.
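Because a raw type-token ratio falls as a text grows longer, STTR averages the ratio over fixed-size stretches of text so that texts of different lengths can be compared. The Python sketch below shows one common way of computing it. The 1,000-token chunk size (a common default in corpus tools such as WordSmith Tools), the simplified English tokenizer, and the function name `sttr` are assumptions for illustration, not a description of the exact settings used in this study.

```python
import re


def sttr(text: str, chunk_size: int = 1000) -> float:
    """Standardized type-token ratio of `text`, as a percentage.

    Assumed procedure: lower-case the text, split it into word tokens,
    compute the type-token ratio of each consecutive, non-overlapping
    chunk of `chunk_size` tokens, and average those ratios. A trailing
    chunk shorter than `chunk_size` is ignored.
    """
    # Simplified English tokenizer: letter runs with optional internal ' or -.
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    n_chunks = len(tokens) // chunk_size
    if n_chunks == 0:
        raise ValueError("text is shorter than one chunk")
    ratios = []
    for i in range(n_chunks):
        chunk = tokens[i * chunk_size:(i + 1) * chunk_size]
        ratios.append(len(set(chunk)) / chunk_size)  # types / tokens in this chunk
    return 100 * sum(ratios) / n_chunks
```

For example, `sttr("the the cat dog", chunk_size=4)` returns 75.0, since the single four-token chunk contains three types.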

5.4.2.3 Average word length

For average word length, one-way ANOVA indicated statistically significant differences between groups for the genre of scientific texts (F(2,147) = 5.326, p = .0058) (see Table 5.105).

Table 5.105 Association of average word length with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between-Groups Sum of Squares | Between-Groups df | Total df | F | Sig.

Word Length | 9.859 | .666 | 2 | 149 | 5.326 | 0.0058*

Note: * indicates p-values significant at the 0.05 alpha level
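The ANOVA tables report only the total and between-groups sums of squares and degrees of freedom; the within-groups terms, and hence F, follow by subtraction. The Python sketch below makes that arithmetic explicit (the function name is hypothetical). The within-groups mean square it computes is also the error term that Tukey's HSD uses for the pairwise comparisons.

```python
def f_from_sums_of_squares(ss_total: float, ss_between: float,
                           df_between: int, df_total: int) -> tuple[float, int]:
    """One-way ANOVA F statistic from the columns reported in the tables.

    SS_within = SS_total - SS_between and df_within = df_total - df_between;
    F = (SS_between / df_between) / (SS_within / df_within).
    Returns (F, df_within).
    """
    ss_within = ss_total - ss_between
    df_within = df_total - df_between
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within  # also the error term for Tukey's HSD
    return ms_between / ms_within, df_within


# Table 5.105: average word length in the scientific corpus
f, df_w = f_from_sums_of_squares(9.859, 0.666, 2, 149)
print(f"F(2,{df_w}) = {f:.3f}")
```

The printed value agrees with the reported F = 5.326 up to the rounding of the sums of squares; with the Table 5.108 inputs (3043.731 and 599.623), the same function gives F ≈ 18.03.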


Tukey's HSD post hoc test revealed that non-translated scientific texts are characterized by a significantly higher average word length (5.08 ± 0.26) than machine-translated scientific texts (4.92 ± 0.23, p = .0039) (see Tables 5.106 and 5.107). There was no statistically significant difference in average word length between non-translated and human-translated scientific texts, or between human- and machine-translated scientific texts. These results are presented in a bar graph in Fig. 5.45.

Table 5.106 Pairwise comparisons of average word length (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.

Word Length | NT-HT | .08900 | .05001 | 0.1800

Word Length | NT-MT | .16300* | .05001 | 0.0039

Word Length | HT-MT | .07400 | .05001 | 0.3037

Note: * indicates that the mean difference is significant at the 0.05 level

Table 5.107 Descriptive statistics for average word length in scientific texts (N = 50)

Method Mean Std. Deviation Median

NT 5.08 0.26 5.08

HT 4.99 0.26 4.99

MT 4.92 0.23 4.90


Fig. 5.45 Means and standard error (± 1 SE) for average word length in scientific texts

These results differ from the findings for literary and newspaper texts, where no statistically significant differences between groups were found. A higher average word length for non-translated scientific texts compared to machine-translated scientific texts might be viewed as supporting the idea of simplification in translation. However, the differences between the means are small (see Fig. 5.45): rounded to one decimal place, non-translated scientific articles display an average word length of 5.1, human translations of 5.0, and machine translations of 4.9. For this reason, further discussion of the possible causes of these differences may not be fruitful.
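Average word length is the total length of all word tokens divided by their number. A minimal Python sketch follows; it assumes that length is counted in letters and that words are runs of letters with optional internal hyphens or apostrophes, which may differ from the counting rules of the tool used in this study. The function name is hypothetical.

```python
import re


def average_word_length(text: str) -> float:
    """Mean number of letters per word token (assumed definition)."""
    words = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)
    if not words:
        raise ValueError("no word tokens found")
    # Count letters only, so hyphens and apostrophes do not add to the length.
    letters = sum(sum(ch.isalpha() for ch in word) for word in words)
    return letters / len(words)


print(round(average_word_length("We then calculate the elastic energy"), 2))
```

For this six-word phrase the result is 31/6, or about 5.17, in the same range as the corpus means in Table 5.107, which shows how small a difference of 0.1 to 0.2 is on this scale.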


5.4.2.4 Average sentence length

For average sentence length, there was a statistically significant difference between groups for scientific texts as determined by one-way ANOVA (F(2,147) = 18.032, p < .0001) (see Table 5.108).

Table 5.108 Association of average sentence length with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between-Groups Sum of Squares | Between-Groups df | Total df | F | Sig.

Sentence Length | 3043.731 | 599.623 | 2 | 149 | 18.032 | < 0.0001*

Note: * indicates p-values significant at the 0.05 alpha level

Tukey's HSD post hoc test revealed that non-translated scientific texts are characterized by a significantly lower average sentence length (24.27 ± 3.91) than human translations (28.21 ± 4.32, p < .0001) and machine translations (28.76 ± 3.99, p < .0001) (see Tables 5.109 and 5.110). There was no statistically significant difference in average sentence length between human and machine translations. These results are presented in a bar graph in Fig. 5.46.

Table 5.109 Pairwise comparisons of average sentence length (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.

Sentence Length | NT-HT | -3.93840* | .81551 | < 0.0001

Sentence Length | NT-MT | -4.49020* | .81551 | < 0.0001

Sentence Length | HT-MT | -.55180 | .81551 | 0.7775

Note: * indicates that the mean difference is significant at the 0.05 level
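The "Std. Error" column of the pairwise tables can be reproduced from the ANOVA table: for two groups of sizes n_i and n_j it equals sqrt(MS_within × (1/n_i + 1/n_j)), and Tukey's test refers the studentized range statistic q = |difference| / sqrt(MS_within / 2 × (1/n_i + 1/n_j)) to the studentized range distribution with k = 3 groups and df_within = 147. The Python sketch below computes these quantities for the NT-HT comparison of sentence length, using the rounded means from Table 5.110 (so the difference is -3.94 rather than -3.93840). It stops short of the p-value, which requires the studentized range distribution (available, for example, as `scipy.stats.studentized_range` in SciPy 1.7 or later); the function name is hypothetical.

```python
import math


def tukey_pair(mean_i: float, mean_j: float, ms_within: float,
               n_i: int, n_j: int) -> tuple[float, float, float]:
    """Mean difference, its standard error, and the studentized range
    statistic q for one pair of groups (Tukey-Kramer form, which reduces
    to Tukey's HSD when the group sizes are equal)."""
    diff = mean_i - mean_j
    se = math.sqrt(ms_within * (1 / n_i + 1 / n_j))  # "Std. Error" column
    q = abs(diff) / math.sqrt(ms_within / 2 * (1 / n_i + 1 / n_j))
    return diff, se, q


# Sentence length: MS_within from Table 5.108, rounded means from Table 5.110
ms_within = (3043.731 - 599.623) / (149 - 2)
diff, se, q = tukey_pair(24.27, 28.21, ms_within, 50, 50)
print(f"NT-HT: difference = {diff:.2f}, SE = {se:.5f}, q = {q:.2f}")
```

The standard error it prints, 0.81551, matches the value reported in Table 5.109.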


Table 5.110 Descriptive statistics for average sentence length in scientific texts (N = 50)

Method Mean Std. Deviation Median

NT 24.27 3.91 23.64

HT 28.21 4.32 27.02

MT 28.76 3.99 28.58

Fig. 5.46 Means and standard error (± 1 SE) for average sentence length in scientific texts

These results differ from the findings for literary and newspaper texts, where no statistically significant differences between groups were found. The higher average sentence length of human and machine translations may point to Toury's law of interference, reflecting the situation in Russian scientific texts, in which sentences may be longer. Further research is needed to determine whether this is the case.
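The measure itself is the number of word tokens divided by the number of sentences. The minimal Python sketch below computes it; its naive splitter, which ends a sentence at ".", "!" or "?" followed by whitespace, is an assumption and will mis-split abbreviations such as "et al." or "Fig.", which a production tokenizer would handle. The function name is hypothetical.

```python
import re


def average_sentence_length(text: str) -> float:
    """Mean number of word tokens per sentence.

    Naive assumption: a sentence ends at '.', '!' or '?' followed by
    whitespace, so abbreviations such as "et al." or "Fig." are mis-split.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    counts = [len(re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", s))
              for s in sentences]
    counts = [c for c in counts if c > 0]  # drop fragments without words
    if not counts:
        raise ValueError("no sentences found")
    return sum(counts) / len(counts)


print(average_sentence_length(
    "MT tools tend to preserve original sentence boundaries. "
    "Human translators may split long sentences."))
```

Here the two sentences contain eight and six words, giving an average of 7.0.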


5.4.2.5 Passives

For passives, there was a statistically significant difference between groups for scientific texts as determined by one-way ANOVA (F(2,147) = 15.438, p < .0001) (see Table 5.111). The means, standard deviations, and medians are presented in Table 5.113.

Table 5.111 Association of passives with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between-Groups Sum of Squares | Between-Groups df | Total df | F | Sig.

Passives | 5261.184 | 913.257 | 2 | 149 | 15.438 | < 0.0001*

Note: * indicates p-values significant at the 0.05 alpha level

Tukey's HSD post hoc test revealed that machine-translated scientific texts are characterized by a statistically significantly lower number of passives (15.39 ± 3.57) than non-translated scientific texts (20.89 ± 7.43, p < .0001) and human-translated scientific texts (20.30 ± 4.56, p < .0001) (see Tables 5.112 and 5.113). There was no statistically significant difference in passives between non-translated and human-translated scientific texts. These results are presented in a bar graph in Fig. 5.47.

Table 5.112 Pairwise comparisons of passives (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.

Passives | NT-HT | .59800 | 1.08771 | 0.8468

Passives | NT-MT | 5.50760* | 1.08771 | < 0.0001

Passives | HT-MT | 4.90960* | 1.08771 | < 0.0001

Note: * indicates that the mean difference is significant at the 0.05 level


Table 5.113 Descriptive statistics for passives in scientific texts (N = 50)

Method Mean Std. Deviation Median

NT 20.89 7.43 18.84

HT 20.30 4.56 20.67

MT 15.39 3.57 15.52

Fig. 5.47 Means and standard error (± 1 SE) for passives in scientific texts

These results differ from the findings for literary and newspaper texts, where no statistically significant differences between the groups were found. The finding that non-translated and human-translated scientific articles are similar in their use of passives may suggest similarities between the scientific genres in English and Russian. In addition, it may indicate that human translators produce texts that stay close to the source texts in their use of passives. The finding that machine-translated texts use fewer passives may suggest that MT tools are either programmed to use fewer passive constructions or have difficulty recognizing Russian passives, and thus translate them as non-passive constructions. The following example illustrates this possibility.

Russian source text: Вместе с тем внимание к метану обусловлено той опасностью, которую он представляет как источник взрывов, пожаров и внезапных выбросов угля и породы в шахтах.

Human translation: Moreover, attention devoted to methane is caused by its danger as a source of explosions, fires, and sudden coal and rock outbursts in mines.

Machine translation: However, attention to methane due to the danger it poses as a source of explosions, fires and sudden outbursts of coal and rock in the mines.

(From Thermodynamics of a Gas–Coal Massif and a Nonuniform Gas Distribution in a Coal Bed by A. D. Alekseev et al., Zhurnal tekhnicheskoi fiziki/Technical Physics, 2010)

In this example, the human translator renders обусловлено, the Russian short form of the passive perfective verbal adjective, with the passive form "is caused," whereas the MT output leaves it out and substitutes "due to," leaving the main clause without a verb. This may be related to the way MT tools handle such forms. Based on these findings and the example, it may be suggested that editors of MT output watch for passives in the source text and introduce more of them in scientific translations, since the conventions of English scientific writing seem to permit a greater use of passives.
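Editors who want to check whether an MT draft under-uses passives could start from a rough count on part-of-speech-tagged output. The Python sketch below assumes text tagged with the CLAWS7 tagset in its horizontal "word_TAG" format, where every form of BE carries a tag beginning with "VB" and lexical past participles are tagged VVN; it counts a form of BE followed, possibly after adverbs or "not," by a past participle. This heuristic and the function name are assumptions for illustration, not the procedure used to obtain the counts in Table 5.113, and it misses reduced passives (e.g., "attention devoted to methane") and get-passives.

```python
BE_PREFIX = "VB"                          # CLAWS7: every form of BE is tagged VB...
PARTICIPLE_TAGS = {"VVN", "VDN", "VHN"}   # past participles of lexical verbs, DO, HAVE
SKIPPABLE_TAGS = {"RR", "XX"}             # general adverbs and "not"


def count_passives(tagged: str) -> int:
    """Heuristic count of BE (+ adverbs) + past participle sequences in
    CLAWS7-style 'word_TAG' text; misses reduced and get-passives."""
    tags = [tok.rsplit("_", 1)[1] for tok in tagged.split() if "_" in tok]
    count = 0
    for i, tag in enumerate(tags):
        if not tag.startswith(BE_PREFIX):
            continue
        j = i + 1
        while j < len(tags) and tags[j] in SKIPPABLE_TAGS:
            j += 1  # skip intervening adverbs, e.g. "was not fully explained"
        if j < len(tags) and tags[j] in PARTICIPLE_TAGS:
            count += 1
    return count
```

For example, `count_passives("methane_NN1 is_VBZ caused_VVN by_II its_APPGE danger_NN1")` returns 1.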

5.4.2.6 Prepositional phrases

This study includes four types of prepositions that can be identified through automated analysis, as well as their total:

− IF—preposition "for"

− II—general preposition (e.g., "in," "after," "at," "by," etc.)

− IO—preposition "of"

− IW—prepositions "with" and "without"

− Total prepositional phrases [IF+II+IO+IW]
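The labels IF, II, IO, and IW match the preposition tags of the CLAWS7 part-of-speech tagset. As a minimal sketch, assuming CLAWS7-tagged text in "word_TAG" format and frequencies normalized per 1,000 word tokens (the normalization base is an assumption here), the four counts and their total could be obtained as follows in Python. The function name is hypothetical, punctuation tokens are excluded from the word count on the assumption that CLAWS7 tags punctuation with the mark itself, and multi-word "ditto" tags such as II21/II22 are ignored for simplicity.

```python
from collections import Counter

PREPOSITION_TAGS = ("IF", "II", "IO", "IW")  # "for"; general; "of"; "with"/"without"


def preposition_rates(tagged: str, per: int = 1000) -> dict[str, float]:
    """Occurrences of each preposition tag, and of all four together,
    per `per` word tokens of CLAWS7-style 'word_TAG' text."""
    tags = [tok.rsplit("_", 1)[1] for tok in tagged.split() if "_" in tok]
    # Assumption: punctuation tags are the punctuation marks themselves
    # (e.g. '._.'), so tags that do not start with a letter are not words.
    word_tags = [t for t in tags if t[:1].isalpha()]
    if not word_tags:
        raise ValueError("no tagged word tokens found")
    counts = Counter(t for t in word_tags if t in PREPOSITION_TAGS)
    rates = {tag: per * counts[tag] / len(word_tags) for tag in PREPOSITION_TAGS}
    rates["Total"] = sum(rates[tag] for tag in PREPOSITION_TAGS)
    return rates
```

In a ten-word tagged sentence containing one "of_IO" and one "for_IF," each of those two rates would be 100 per 1,000 words and the total 200.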

The results of one-way ANOVA for the four types of prepositions, as well as for the sum of these types, are presented in Table 5.114. Statistically significant differences were found for all prepositional variables with the exception of the prepositions "with" and "without." For the preposition "for," there was a statistically significant difference between groups for scientific texts as determined by one-way ANOVA (F(2,147) = 16.749, p < .0001). For general prepositions, one-way ANOVA also established a statistically significant difference between groups (F(2,147) = 8.177, p = .0004). For the preposition "of," a statistically significant difference was established with F(2,147) = 115.278, p < .0001. The one-way ANOVA also showed a statistically significant difference for the total number of prepositional phrases (F(2,147) = 58.336, p < .0001). For the prepositions "with" and "without," the difference was not statistically significant (F(2,147) = .302, p = .7398).

Table 5.114 Association of prepositional phrases with the method of text production (NT, HT, and MT) in the corpus of scientific texts by one-way ANOVA

Dependent Variable | Total Sum of Squares | Between-Groups Sum of Squares | Between-Groups df | Total df | F | Sig.

IF | 2603.886 | 494.499 | 2 | 149 | 16.749 | < 0.0001*

II | 17403.281 | 1742.321 | 2 | 149 | 8.177 | 0.0004*

IO | 54421.086 | 33232.492 | 2 | 149 | 115.278 | < 0.0001*

IW | 1887.970 | 7.727 | 2 | 149 | .302 | 0.7398

Total Prep. Phrases | 71126.717 | 31472.804 | 2 | 149 | 58.336 | < 0.0001*

Note: * indicates p-values significant at the 0.05 alpha level

Tukey's HSD post hoc test was performed to determine which pairs of conditions differ significantly. The results of pairwise comparisons by Tukey HSD testing are presented in Table 5.115. The means, standard deviations, and medians for the three groups are presented in Table 5.116.

Table 5.115 Pairwise comparisons of significant prepositional phrases (NT, HT, and MT) for the corpus of scientific texts by Tukey HSD post hoc testing

Dependent Variable | Method Comparison | Difference in Means | Std. Error | Sig.

IF | NT-HT | 3.29900* | .75762 | 0.0001

IF | NT-MT | 4.23260* | .75762 | < 0.0001

IF | HT-MT | .93360 | .75762 | 0.4361

II | NT-HT | -7.87340* | 2.06434 | 0.0006

II | NT-MT | -1.53320 | 2.06434 | 0.7385

II | HT-MT | 6.34020* | 2.06434 | 0.0071

IO | NT-HT | -21.96020* | 2.40117 | < 0.0001

IO | NT-MT | -36.18500* | 2.40117 | < 0.0001

IO | HT-MT | -14.22480* | 2.40117 | < 0.0001

Total Prepositional Phrases | NT-HT | -26.99020* | 3.28484 | < 0.0001

Total Prepositional Phrases | NT-MT | -33.44080* | 3.28484 | < 0.0001

Total Prepositional Phrases | HT-MT | -6.45060 | 3.28484 | 0.1249

Note: * indicates that the mean difference is significant at the 0.05 level

Table 5.116 Descriptive statistics for prepositional phrases in scientific texts (N = 50)

Method | Statistic | IF | II | IO | IW | Total Prepositions

NT | Mean | 10.87 | 69.93 | 37.10 | 8.92 | 126.82

NT | Std. Deviation | 3.94 | 9.63 | 8.42 | 3.20 | 13.71

NT | Median | 10.04 | 68.91 | 37.05 | 8.32 | 130.33

HT | Mean | 7.57 | 77.80 | 59.06 | 9.38 | 153.81

HT | Std. Deviation | 3.46 | 10.36 | 14.50 | 4.13 | 18.25

HT | Median | 7.07 | 76.40 | 57.44 | 8.38 | 152.64

MT | Mean | 6.64 | 71.46 | 73.29 | 8.88 | 160.26

MT | Std. Deviation | 3.94 | 10.93 | 12.30 | 3.32 | 16.98

MT | Median | 6.04 | 71.19 | 73.26 | 8.44 | 162.56


For the preposition "for," Tukey's HSD post hoc test revealed that non-translated scientific texts had a higher number of prepositional phrases with "for" (10.87 ± 3.94) than human translations (7.57 ± 3.46, p = .0001) and machine translations (6.64 ± 3.94, p < .0001) (see Tables 5.115 and 5.116). There was no statistically significant difference in the use of prepositional phrases with "for" between human and machine translations (p = .4361). These results are presented in a bar graph in Fig. 5.48.

Fig. 5.48 Means and standard error (± 1 SE) for prepositional phrases with "for" in scientific texts

These findings may suggest that the preposition "for" is underused in scientific translations. Using the preposition "for" in place of the preposition "of," where possible, is often a good strategy for avoiding the chains of of-phrases that arise in English translations from Russian genitive noun chains. This strategy may be recommended both to scientific translators and to editors working with MT output.

The following example illustrates this point. In this example, the human translator and the MT tool render the Russian genitive noun chain "сумма упругих энергий всех таких областей," with the genitive nouns "энергий" and "областей," with two prepositions "of": "the sum of the elastic energies of all such regions" (HT) and "the sum of the elastic energy of all these areas" (MT). In the suggested edit, one of these prepositions is replaced with "for": "the sum of the elastic energies for all such regions."

Russian source text: Затем вычисляем упругую энергию каркаса как сумму упругих энергий всех таких областей.

Human translation: We then calculate the elastic energy of the frame as the sum of the elastic energies of all such regions.

Machine translation: Then we calculate the elastic energy of the frame as the sum of the elastic energy of all these areas.

Suggested edit: We then calculate the elastic energy of the frame as the sum of the elastic energies for all such regions.

(From Thermodynamics of a Gas–Coal Massif and a Nonuniform Gas Distribution in a Coal Bed by A. D. Alekseev et al., Zhurnal tekhnicheskoi fiziki/Technical Physics, 2010)


For general prepositions, such as "in," "after," and "at," Tukey's HSD post hoc test revealed that human scientific translations had a higher number of prepositional phrases with general prepositions (77.80 ± 10.36) than non-translated (69.93 ± 9.63, p = .0006) and machine-translated scientific texts (71.46 ± 10.93, p = .0071) (see Tables 5.115 and 5.116). There was no statistically significant difference in the use of general prepositions between non-translated and machine-translated texts (p = .7385). These results are presented in a bar graph in Fig. 5.49.

Fig. 5.49 Means and standard error (± 1 SE) for the use of general prepositions in scientific texts

These findings may be viewed as evidence of explicitation in translation, whereby human translators interpret meanings of the source text and express them more explicitly in their work. The absence of a statistically significant difference between non-translated and machine-translated texts might support this interpretation, since explicitation may be considered a property of the human mind. In the following example, the human translator rephrases the source text sentence, introducing the preposition "by" in a passive construction, while the MT output contains neither a passive construction nor such a preposition. It should be noted that in this case, the MT output requires human editing prior to publication. Interestingly, the human translator in this example also splits the original Russian sentence, starting a new sentence at this point in the narration, whereas MT tools tend to preserve original sentence boundaries.

Russian source text: … за данный тип эха ответственны акустические колебания отдельных частиц, … .

Human translation: The phonon echo is produced by the acoustic oscillations of individual particles,… .

Machine translation: … for this type of echo responsible acoustic oscillations of individual particles, … .

(From Phonon Echo in High-Temperature Superconductors as a Nonlinear Magnetoacoustic Phenomenon by V. Pleshakov et al., Zhurnal tekhnicheskoi fiziki/Technical Physics, 2011)

For the preposition "of," Tukey's HSD post hoc test revealed statistically significant differences for all pairs with p < .0001. According to the analysis, machine-translated scientific texts had the highest number of prepositional phrases with "of" (73.29 ± 12.30), significantly more than non-translated scientific texts, which had the lowest number of such prepositions (37.10 ± 8.42, p < .0001), and human-translated scientific texts, which fell between the two (59.06 ± 14.50, p < .0001). The difference between non-translated and human-translated scientific articles was also significant (p < .0001) (see Tables 5.115 and 5.116). These results are presented in a bar graph in Fig. 5.50.

Fig. 5.50 Means and standard error (± 1 SE) for the use of the preposition "of" in scientific texts

These findings are similar to those for newspaper texts, where non-translated texts contained significantly fewer prepositions "of" than human and machine translations (although for literary texts, no statistically significant difference was found between human and machine translations).


As discussed in the section on newspaper texts, because Russian is a synthetic language, it allows for noun chains with genitive meaning without prepositions. Since English is an analytical language, genitive noun chains often require a preposition, typically "of." Translators, especially novices, may struggle when working with long Russian noun chains because the use of multiple prepositions "of" in English may lead to clumsy constructions.

The finding that human-translated texts contained significantly fewer prepositions "of" than machine-translated texts may suggest that human translators do a better job of varying their prepositions. Still, the finding that human translations contained about 1.6 times as many prepositions "of" as non-translated texts suggests that human translators may be advised to reduce the number of prepositions "of" in their work. The finding that machine-translated scientific texts contained about twice as many prepositions "of" as non-translated scientific texts suggests that post-editors should reduce the number of prepositions "of" in MT output. The earlier example, in which "of" was replaced with "for" where possible, illustrates this point. Below is another example illustrating the overuse of the preposition "of."
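Both ratios follow directly from the IO means in Table 5.116, where the bar denotes the mean frequency of the preposition "of" in each group:

```latex
\frac{\bar{x}_{\mathrm{HT}}}{\bar{x}_{\mathrm{NT}}} = \frac{59.06}{37.10} \approx 1.59,
\qquad
\frac{\bar{x}_{\mathrm{MT}}}{\bar{x}_{\mathrm{NT}}} = \frac{73.29}{37.10} \approx 1.98
```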

Russian source text: Отклик упругой подсистемы некоторого порошкообразного материала на воздействие последовательности импульсов переменного поля носит название радиочастотного, или фононного, эха.

Human translation: The response of the elastic subsystem of a powder material to the action of a train of alternating field pulses is referred to as the rf or phonon echo.

Machine translation: The response of the elastic subsystem of a powdery material on the impact of a sequence of pulses of the alternating field is known as radio frequency or phonon echo.

Suggested edit: The response of the elastic subsystem in a powder material to a sequence of alternating field pulses is referred to as the radio-frequency or phonon echo.

(From Phonon Echo in High-Temperature Superconductors as a Nonlinear Magnetoacoustic Phenomenon by V. Pleshakov et al., Zhurnal tekhnicheskoi fiziki/Technical Physics, 2011)

In this example, the Russian genitive noun chains "упругой подсистемы некоторого порошкообразного материала" (with the genitive nouns подсистемы and материала) and "воздействие последовательности импульсов переменного поля" (with the genitive nouns последовательности, импульсов, and поля) result in four prepositions "of" in the human translation and five in the MT output. For sentences of about 30 words, such frequent use of the preposition "of" is not stylistically appropriate. In the suggested edit, the number of prepositions "of" is reduced to two by using another preposition ("in") and by rephrasing (omitting the Russian noun воздействие ('impact'), which does not change the meaning of the translation).
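The counts cited above are easy to verify mechanically, and the same check could help a post-editor spot of-heavy sentences. The Python sketch below (function name hypothetical) counts occurrences of the word "of" and the total number of word tokens in each version of the example sentence, treating hyphenated forms such as "radio-frequency" as one word. Any threshold for flagging a sentence would be an editorial choice, not something established by this study.

```python
import re


def of_profile(sentence: str) -> tuple[int, int]:
    """Return (number of times the word 'of' occurs, number of word tokens)."""
    words = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", sentence)
    return sum(word.lower() == "of" for word in words), len(words)


examples = {
    "HT": ("The response of the elastic subsystem of a powder material to the action "
           "of a train of alternating field pulses is referred to as the rf or phonon echo."),
    "MT": ("The response of the elastic subsystem of a powdery material on the impact "
           "of a sequence of pulses of the alternating field is known as radio frequency "
           "or phonon echo."),
    "Suggested edit": ("The response of the elastic subsystem in a powder material to a "
                       "sequence of alternating field pulses is referred to as the "
                       "radio-frequency or phonon echo."),
}
for label, text in examples.items():
    n_of, n_words = of_profile(text)
    print(f"{label}: {n_of} x 'of' in {n_words} words")
```

It reports four "of" in 29 words for the human translation, five in 30 words for the MT output, and two in 26 words for the suggested edit.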


Finally, for the total number of prepositional phrases in scientific texts, Tukey's HSD post hoc test revealed that non-translated scientific texts are characterized by a significantly lower number of prepositional phrases (126.82 ± 13.71) than human translations (153.81 ± 18.25, p < .0001) and machine translations (160.26 ± 16.98, p < .0001) (see Tables 5.115 and 5.116). There was no statistically significant difference in the use of prepositional phrases between human translations and machine translations (p = .1249). These results are presented in a bar graph in Fig. 5.51.

Fig. 5.51 Means and standard error (± 1 SE) for the total number of prepositional phrases in scientific texts

These findings are likely related to the differences found for the preposition "of," possible explanations for which are discussed above. This case is a good illustration of how a granular analysis typically reveals more concrete and thus more informative results than a "lumped" analysis.
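Because the total is the sum of the four preposition types, the differences in means decompose additively. Using the rounded means from Table 5.116 (each Δ is the NT mean minus the comparison group's mean), the "of" category accounts for most of the NT-HT gap and for more than the entire NT-MT gap; the small discrepancy in the second line reflects rounding of the means:

```latex
\begin{aligned}
\Delta_{\text{NT-HT}} &= \underbrace{3.30}_{\text{IF}} + \underbrace{(-7.87)}_{\text{II}}
  + \underbrace{(-21.96)}_{\text{IO}} + \underbrace{(-0.46)}_{\text{IW}} = -26.99 \\
\Delta_{\text{NT-MT}} &= 4.23 + (-1.53) + (-36.19) + 0.04 = -33.45 \approx -33.44
\end{aligned}
```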

5.5 Conclusions

This chapter has presented a detailed overview of the data collected for the three corpora of literary, newspaper, and scientific texts produced by different methods: non-translated, human-translated, and machine-translated. Altogether, thirty-two variables were analyzed for each corpus. The number of statistically significant results suggests that non-translated texts differ from human-translated and machine-translated texts on a number of variables. Additionally, human and machine translations differ from each other on a number of parameters. These results will be summarized in Chapter 6.

The results presented in this chapter have been reviewed in terms of their relation to Toury's law of interference, the translation universals of explicitation and simplification, and their potential implications for the practical work of translators and post-editors. Chapter 6 will discuss in detail which results may be viewed as supporting Toury's law of interference and translation universals. It will also list the practical recommendations for translators and post-editors derived from these findings, which may be used in the industry and in translator training programs.

CHAPTER 6: CONCLUSIONS, RECOMMENDATIONS, AND FUTURE DIRECTIONS

This final chapter briefly summarizes the study’s results, making it easier to navigate the thirty-two cohesive and other global textual features studied across three different genres and for three methods of text production. It discusses implications of these results, and suggests practical recommendations for linguists and trainers based on those results. In addition, it notes limitations of the study and outlines directions for future research of cohesion and other global textual features.

Rapid development within the translation industry is putting new demands on linguists and trainers working in it. To keep up with these demands, translation studies research must expand its boundaries to include new and powerful realities of the industry, such as machine translation, computer-assisted translation tools, post-editing, and more.

In addition, to maximize applicability to situations faced by modern translators, translation studies should continue to widen its research focus and conduct more studies of non-literary genres that are in demand in today's world. To aid the work of translators, (post-)editors, and trainers, cohesion and other global features of texts should be studied in more depth.

My hope is that this study has fulfilled some of these goals by including two non-literary genres in its scope, by adding machine-translated output to the more typically studied human translations, and by investigating a wide array of cohesive and other global textual features. This has made it possible not only to contribute to the traditional quest of translation studies for translation universals and laws of translation, but also to create practical recommendations for linguists working in the industry today.

6.1 Summary of results

This study investigates differences in cohesive and other global textual features between non-translated, human-translated, and machine-translated texts. It covers three genres: literary, newspaper, and scientific. Its data suggest that there are statistically significant differences in the use of cohesive and other global textual features in English texts produced by different methods. These results contribute to discussions of translation universals and general laws of translation, and inform the work of translators, (post-)editors, and trainers. This section provides a summary of these results for each genre.

6.1.1 Summary of results for the literary corpus

For the literary corpus, several cohesive features and global textual characteristics displayed statistically significant differences across the three methods of text production.

The results of the statistical analyses are summarized in Tables 6.1 and 6.2. Table 6.1 provides a general, non-numeric picture of the mean scores for each of the thirty-two features studied across the three methods of text production. Table 6.2 shows exactly where statistically significant differences were found. A detailed account of the data and their analyses is provided in Chapter 5.


Table 6.1 Differences in mean scores for cohesive characteristics and global textual features in the literary corpus

Methods: Non-translated, Human-translated, Machine-translated

Cohesive characteristics

3rd person singular neuter personal pronouns ("it"): Lowest, Highest

3rd person singular objective personal pronouns ("him," "her"): Highest, Lowest

3rd person plural objective personal pronouns ("them"): Highest, Lowest

3rd person singular subjective personal pronouns ("he," "she"): Highest, Lowest

3rd person plural subjective personal pronouns ("they"): Highest, Lowest

3rd person pronominal cohesive devices (Sum): Highest, Lowest

Possessive pronouns (e.g., "his," "her," "their"): Highest, Lowest

Singular demonstrative determiners (e.g., "this," "that"): Highest, Lowest

Plural demonstrative determiners (e.g., "these," "those"): Lowest, Highest

Definite article ("the"): Lowest, Highest

General comparative adjectives (e.g., "older," "better," "stronger"): Highest, Lowest

General superlative adjectives (e.g., "oldest," "best," "strongest"): Lowest, Highest

Comparative degree adverbs (e.g., "more," "less"): Lowest, Highest

Superlative degree adverbs (e.g., "most," "least"): Lowest, Highest

Comparative devices (Sum): Lowest, Highest

Reference cohesive devices (Sum of pronominal, demonstrative, and comparative devices): Highest, Lowest

Additive conjunction devices (e.g., "and," "or"): Lowest, Highest

Adversative conjunction devices (e.g., "but"): Lowest, Highest

Causal and continuative conjunction devices represented by subordinating conjunctions (e.g., "if," "because," "unless," "so," "for"): Highest, Lowest

Temporal conjunction devices (e.g., "now," "tomorrow"): Lowest, Highest

Conjunction devices (Sum): Lowest, Highest

Reference and conjunction cohesive devices (Sum): Lowest, Highest

Global textual characteristics

Nominalization: Lowest, Highest

Lexical density (STTR): Highest, Lowest

Average word length: Lowest, Highest

Average sentence length: Highest, Lowest

Passives: Lowest, Highest

Preposition "for": Highest, Lowest

General prepositions (e.g., "in," "about"): Lowest, Highest

Preposition "of": Lowest, Highest

Prepositions "with" and "without": Lowest, Highest

Prepositional phrases (Sum): Lowest, Highest

Table 6.2 Pairwise comparisons of significant cohesive and global textual characteristics (NT, HT, and MT) by Tukey HSD post hoc testing in the literary corpus

Feature | NT vs. HT | NT vs. MT | HT vs. MT

Cohesive characteristics

3rd person singular neuter personal pronouns ("it") | – | – | –

3rd person singular objective personal pronouns ("him," "her") | – | 0.0132* | 0.0149*

3rd person plural objective personal pronouns ("them") | 0.0325* | – | 0.0019*

3rd person singular subjective personal pronouns ("he," "she") | < 0.0001* | < 0.0001* | 0.0300*

3rd person plural subjective personal pronouns ("they") | – | – | 0.0166*

3rd person pronominal cohesive devices (Sum) | – | < 0.0001* | 0.0012*

Possessive pronouns (e.g., "his," "her," "their") | – | < 0.0001* | < 0.0001*

Singular demonstrative determiners (e.g., "this," "that") | 0.0025* | – | 0.0002*

Plural demonstrative determiners (e.g., "these," "those") | – | – | –

Definite article ("the") | – | 0.0051* | 0.0461*

General comparative adjectives (e.g., "older," "better," "stronger") | – | – | –

General superlative adjectives (e.g., "oldest," "best," "strongest") | – | – | –

Comparative degree adverbs (e.g., "more," "less") | – | – | –

Superlative degree adverbs (e.g., "most," "least") | – | – | –

Comparative devices (Sum) | – | – | –

Reference cohesive devices (Sum of pronominal, demonstrative, and comparative devices) | – | < 0.0001* | 0.0013*

Additive conjunction devices (e.g., "and," "or") | – | < 0.0001* | 0.0051*

Adversative conjunction devices (e.g., "but") | 0.0359* | – | –

Causal and continuative conjunction devices represented by subordinating conjunctions (e.g., "if," "because," "unless," "so," "for") | – | – | –

Temporal conjunction devices (e.g., "now," "tomorrow") | – | – | –

Conjunction devices (Sum) | – | 0.0001* | 0.0280*

Reference and conjunction cohesive devices (Sum) | – | < 0.0001* | 0.0106*

Global textual characteristics

Nominalization | – | – | –

Lexical density (STTR) | 0.0151* | 0.0004* | –

Average word length | – | – | –

Average sentence length | – | – | –

Passives | – | – | –

Preposition "for" | – | 0.0294* | 0.0072*

General prepositions (e.g., "in," "about") | – | – | –

Preposition "of" | – | – | –

Prepositions "with" and "without" | – | – | –

Prepositional phrases (Sum) | – | – | –

Note: * the mean difference is significant at the 0.05 level; – no statistically significant difference

Summarizing the results from Table 6.2: in the literary corpus, statistically significant differences were found for:

− 3rd person singular objective personal pronouns ("him," "her"): NT vs. MT and HT vs. MT

− 3rd person plural objective personal pronouns ("them"): NT vs. HT and HT vs. MT

− 3rd person singular subjective personal pronouns ("he," "she"): all pairs

− 3rd person plural subjective personal pronouns ("they"): HT vs. MT

− 3rd person pronominal cohesive devices (Sum): NT vs. MT and HT vs. MT

− Possessive pronouns (e.g., "his," "her," "their"): NT vs. MT and HT vs. MT

− Singular demonstrative determiners (e.g., "this," "that"): NT vs. HT and HT vs. MT

− Definite article ("the"): NT vs. MT and HT vs. MT

− Reference cohesive devices (Sum of pronominal, demonstrative, and comparative devices): NT vs. MT and HT vs. MT

− Additive conjunction devices (e.g., "and," "or"): NT vs. MT and HT vs. MT

− Adversative conjunction devices (e.g., "but"): NT vs. HT

− Conjunction devices (Sum): NT vs. MT and HT vs. MT

− Reference and conjunction cohesive devices (Sum): NT vs. MT and HT vs. MT

− Lexical density (STTR): NT vs. HT and NT vs. MT

− Preposition "for": NT vs. MT and HT vs. MT

In the literary corpus, statistically significant differences were found in texts produced by different methods for all pronominal categories except the pronoun "it." This may suggest that pronominal cohesive devices deserve particular attention from translators, editors, post-editors, and translator trainers working with literary texts.

These findings may be related to differences in the use of pronominal cohesive devices in English and Russian. First, Russian has grammatical gender, which English lacks; second, Russian has no articles, and its nouns do not grammatically require determiners (pronominal modifiers and other determiners are used with Russian nouns only when the context requires them). This would support Toury's law of interference, according to which certain features of the source text are reflected in the target text.

Possibly, MT tools do not make as many adjustments in the use of pronominal cohesive devices as human translators do. Cohesion is created in a text through a network of cohesive devices that are interpreted by the reader or translator, and such interpretations may differ between the human brain and MT algorithms.

In fact, the data suggest that MT tools may be set up to avoid pronominal cohesive devices and to use the definite article instead. As can be seen from Table 6.1, the MT tool (Google Translate, in this case) uses the lowest number of devices in every pronominal category except the pronoun "it" (statistically significant for some or all pairs of conditions), while using a statistically significantly higher number of definite articles than texts written or translated by humans. This apparent design preference of MT tools for the definite article may be related to the fact that it carries less grammatical information than pronominal cohesive devices. For instance, the definite article does not distinguish between animate and inanimate referents, between singular and plural referents, or between male and female animate referents, and thus leaves less room for errors arising from misinterpretation of cohesive ties in a Russian source text.
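Comparisons of this kind rest on frequencies normalized per 1,000 words, so that texts of different lengths can be compared. The following Python sketch shows one way such a count might be computed; the device lists and the tokenizer are simplified assumptions for illustration, not the study's actual inventory or procedure (for example, "her" is counted once here even though it can be either objective or possessive).

```python
import re
from collections import Counter

# Illustrative (hypothetical) device lists; the study's inventory is more detailed.
THIRD_PERSON_PRONOUNS = {"he", "she", "him", "her", "they", "them", "it"}
DEFINITE_ARTICLE = {"the"}


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; a simplifying assumption, not a full tokenizer."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def per_thousand(tokens: list[str], devices: set[str]) -> float:
    """Occurrences of any word in `devices` per 1,000 running words."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return 1000 * sum(counts[d] for d in devices) / len(tokens)


if __name__ == "__main__":
    sample = "She opened the door. The room was dark, and she saw him by the window."
    tokens = tokenize(sample)
    print(per_thousand(tokens, THIRD_PERSON_PRONOUNS))  # 3 of 15 tokens
    print(per_thousand(tokens, DEFINITE_ARTICLE))       # 3 of 15 tokens
```

Normalizing by text length in this way is what allows the mean scores of non-translated, human-translated, and machine-translated texts to be compared directly.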

A similar picture is found for demonstratives ("this" or "that"): the MT tool uses the lowest number of singular demonstratives, which may also be related to the tendency of MT tools to use the definite article as a noun modifier.

Interestingly, human-translated literary texts contain a significantly higher number of singular demonstratives than non-translated and machine-translated texts. This may be interpreted as a sign of explicitation in human translation. As mentioned earlier, Blum-Kulka (2001: 300) considers explicitation inherent to the process of translation, stating that "interpretations performed by the translator on the source text might lead to a target language text which is more redundant than the source text," and this redundancy may be expressed by a higher level of cohesive explicitness in translations.

This may suggest that explicitation is a phenomenon pertinent to human translation only. This is consistent with the history of translation universals: since the earliest discussions of this concept, such universals have been linked to explicitations made by human translators, as well as to human translators' tendency to simplify or normalize target texts. MT tools may be set up to avoid explicitation, since it may introduce additional errors into the MT output. Professional human translators, on the contrary, rarely doubt their understanding of the source text and are thus not afraid to explicitate or simplify texts, consciously or subconsciously.

In the literary corpus, no statistically significant differences were found for the use of comparative devices. This may suggest that their use is similar in English and Russian, and that translators and MT tools neither introduce nor remove such devices in target texts. Consequently, translators, (post-)editors, and translator trainers may not need to be as concerned with this category of cohesive ties.

The conclusions outlined above are also reflected in the findings for the total number of reference cohesive devices (which include pronominal devices, the definite article, demonstratives, and comparatives). Machine-translated literary texts were found to use a statistically significantly lower number of reference devices than non-translated and human-translated texts. This has direct implications for editors of MT output, who should look for opportunities to introduce reference cohesive devices when post-editing. Doing so may improve the cohesiveness of post-edited target texts.

For additive and adversative conjunction devices, as well as for the total number of conjunctive devices, machine-translated literary texts were found to have the highest number of these devices, while non-translated texts had the lowest (although some of these differences were not statistically significant). It might be hypothesized that the higher number of conjunction devices in both human and machine translations is due to interference from the Russian language. If so, Russian literary texts may tend to use conjunctive devices more often than English literary texts.

This conjecture requires further comparative research on Russian and English. However, if it turns out to be true, the statistical results above may suggest that human translators attempt to level out such differences by lowering the number of conjunctive cohesive devices in translations, while MT tools tend to keep the conjunctive devices of the source language. This presents an interesting topic for future comparative linguistics and translation studies research.

The greater occurrence of conjunctive cohesive devices may also be viewed as evidence of explicitation in translation. In the case of conjunctive devices, however, the difference concerns both human and machine translations. It is debatable whether MT tools are capable of explicitation, since, as is implicit in the definition of explicitation, it appears to be a phenomenon of the human brain. Further research into this phenomenon and a discussion within the translation studies community seem to be called for.


For global textual features, the literary texts revealed statistically significant differences for two features only—lexical density (as STTR) and the use of the preposition "for." Such features as nominalization, passives, average word and sentence length, as well as most of the prepositions included in this study, were not found to be significantly different. This may suggest that the use of such features is similar in the two languages and across different methods of text production, at least for literary texts.

For lexical density, literary non-translated texts displayed a statistically significantly higher STTR than human-translated and machine-translated texts. This finding is in agreement with Laviosa (1998a, 1998b), who reports that English translated texts are characterized by a narrower range of vocabulary than non-translated texts. As Laviosa suggests, this might be viewed as evidence in support of simplification as a universal of translation.

Laviosa investigated human-translated texts only. The present study includes machine-translated texts as well, and finds that both human-translated and machine-translated literary texts display a significantly lower standardized type-token ratio than literary non-translated texts. On the one hand, this may suggest the possibility of extending the concept of simplification to machine translation, if MT tools are programmed to prefer, when synonyms or near-synonyms exist, the more commonly used word or phrase in the target language. This conjecture calls for further research and discussion. Since this study focuses on the Russian into English pair only, it may also be beneficial to investigate other language pairs.
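Because the plain type-token ratio falls as texts get longer, the standardized version averages the ratio over fixed-size chunks of running text. The Python sketch below follows one common definition, similar to the one used by corpus tools such as WordSmith Tools: the TTR is computed for each consecutive 1,000-word chunk, a trailing incomplete chunk is ignored, and the mean is reported as a percentage. The chunk size and the fallback for texts shorter than one chunk are assumptions of this sketch, not necessarily the settings used in this study.

```python
def sttr(tokens: list[str], chunk_size: int = 1000) -> float:
    """Standardized type-token ratio, as a percentage.

    Computes the TTR of each consecutive full chunk of `chunk_size` tokens and
    returns their mean; a trailing partial chunk is ignored. Texts shorter than
    one chunk fall back to their plain TTR (an assumption of this sketch).
    """
    if not tokens:
        return 0.0
    ratios = [
        len(set(tokens[start:start + chunk_size])) / chunk_size
        for start in range(0, len(tokens) - chunk_size + 1, chunk_size)
    ]
    if not ratios:  # text shorter than one chunk
        ratios = [len(set(tokens)) / len(tokens)]
    return 100 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    words = ["a", "b", "a", "c", "d", "d", "e", "f", "g"]
    # Chunks of 4: [a, b, a, c] and [d, d, e, f]; the final "g" is ignored.
    print(sttr(words, chunk_size=4))
```

A lower value indicates less lexical variety, which is the direction of the difference found here between translated and non-translated texts.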


The fact that, in this research, these differences were found for both human and machine translations from Russian might point to another possible explanation: namely, that Russian literary texts have a narrower range of vocabulary than English non-translated literary texts, as discussed earlier. This would be consistent with the finding that texts translated from Russian into English by humans and machines have statistically significantly lower vocabulary variation than non-translated texts. This possibility calls for further comparative research.

6.1.2 Summary of results for the newspaper corpus

For the newspaper corpus, several cohesive features and global textual characteristics also displayed statistically significant differences across the three methods of text production. The results of the statistical analyses are summarized in Tables 6.3 and 6.4.

A detailed account of the data and their analyses was provided in Chapter 5.

Table 6.3 Differences in mean scores for the cohesive characteristics and global textual features in the newspaper corpus

Method: Non-translated; Human-translated; Machine-translated

Cohesive characteristics
3rd person singular neuter personal pronouns ("it"): Highest; Very close to MT; Lowest
3rd person singular objective personal pronouns ("him," "her"): Lowest; Highest
3rd person plural objective personal pronouns ("them"): Highest; Lowest
3rd person singular subjective personal pronouns ("he," "she"): Highest; Lowest
3rd person plural subjective personal pronouns ("they"): Highest; Lowest
3rd person pronominal cohesive devices (Sum): Highest; Lowest
Possessive pronouns (e.g., "his," "her," "their"): Highest; Lowest
Singular demonstrative determiners (e.g., "this," "that"): Highest; Lowest
Plural demonstrative determiners (e.g., "these," "those"): Highest; Lowest
Definite article ("the"): Lowest; Highest
General comparative adjectives (e.g., "older," "better," "stronger"): Highest; Lowest
General superlative adjectives (e.g., "oldest," "best," "strongest"): Highest; Lowest; Very close to HT
Comparative degree adverbs (e.g., "more," "less"): Highest; Lowest
Superlative degree adverbs (e.g., "most," "least"): Lowest; Highest
Comparative devices (Sum): Highest; Lowest
Reference cohesive devices (Sum of pronominal, demonstrative, and comparative devices): Lowest; Close to MT; Highest
Additive conjunction devices (e.g., "and," "or"): Highest; Lowest
Adversative conjunction devices (e.g., "but"): Highest; Lowest
Causal and continuative conjunction devices represented by subordinating conjunctions (e.g., "if," "because," "unless," "so," "for"): Highest; Close to MT; Lowest
Temporal conjunction devices (e.g., "now," "tomorrow"): Highest; Lowest
Conjunction devices (Sum): Highest; Lowest
Reference and conjunction cohesive devices (Sum): Highest; Lowest
Global textual characteristics
Nominalization: Lowest; Close to MT; Highest
Lexical density (STTR): Highest; Lowest
Average word length: Highest; Lowest
Average sentence length: Highest; Lowest
Passives: Highest; Lowest
Preposition "for": Highest; Lowest
General prepositions (e.g., "in," "about"): Lowest; Highest
Preposition "of": Lowest; Highest
Prepositions "with" and "without": Lowest; Highest
Prepositional phrases (Sum): Lowest; Highest

Table 6.4 Pairwise comparisons of significant cohesive and global textual characteristics (NT, HT, and MT) by Tukey HSD post hoc testing in the newspaper corpus

Cohesive characteristics
3rd person singular neuter personal pronouns ("it"): NT vs. HT 0.0266*; NT vs. MT 0.0260*
3rd person singular objective personal pronouns ("him," "her"): n.s.
3rd person plural objective personal pronouns ("them"): n.s.
3rd person singular subjective personal pronouns ("he," "she"): n.s.
3rd person plural subjective personal pronouns ("they"): NT vs. MT 0.0054*
3rd person pronominal cohesive devices (Sum): NT vs. MT 0.0043*
Possessive pronouns (e.g., "his," "her," "their"): NT vs. MT 0.0214*
Singular demonstrative determiners (e.g., "this," "that"): n.s.
Plural demonstrative determiners (e.g., "these," "those"): n.s.
Definite article ("the"): NT vs. HT < 0.0001*; NT vs. MT < 0.0001*
General comparative adjectives (e.g., "older," "better," "stronger"): NT vs. HT 0.0097*; NT vs. MT 0.0246*
General superlative adjectives (e.g., "oldest," "best," "strongest"): n.s.
Comparative degree adverbs (e.g., "more," "less"): n.s.
Superlative degree adverbs (e.g., "most," "least"): NT vs. HT 0.0428*
Comparative devices (Sum): n.s.
Reference cohesive devices (Sum of pronominal, demonstrative, and comparative devices): NT vs. HT < 0.0001*; NT vs. MT < 0.0001*
Additive conjunction devices (e.g., "and," "or"): NT vs. HT 0.0024*
Adversative conjunction devices (e.g., "but"): NT vs. HT 0.0002*; NT vs. MT 0.0299*
Causal and continuative conjunction devices represented by subordinating conjunctions (e.g., "if," "because," "unless," "so," "for"): NT vs. HT 0.0499*; NT vs. MT 0.0425*
Temporal conjunction devices (e.g., "now," "tomorrow"): n.s.
Conjunction devices (Sum): NT vs. HT < 0.0001*; NT vs. MT 0.0091*
Reference and conjunction cohesive devices (Sum): NT vs. HT < 0.0001*; NT vs. MT 0.0315*
Global textual characteristics
Nominalization: NT vs. HT 0.0197*; NT vs. MT 0.0206*
Lexical density (STTR): NT vs. HT < 0.0001*; NT vs. MT < 0.0001*
Average word length: n.s.
Average sentence length: n.s.
Passives: n.s.
Preposition "for": n.s.
General prepositions (e.g., "in," "about"): n.s.
Preposition "of": NT vs. HT < 0.0001*; NT vs. MT < 0.0001*
Prepositions "with" and "without": n.s.
Prepositional phrases (Sum): NT vs. HT 0.0001*; NT vs. MT < 0.0001*
* the mean difference is significant at the 0.05 level; n.s. = no significant pairwise difference

Summarizing the results from Table 6.4: in the newspaper corpus, statistically significant differences were found for:

− 3rd person singular neuter personal pronouns ("it")—NT vs. HT and NT vs. MT

− 3rd person plural subjective personal pronouns ("they")—NT vs. MT

− 3rd person pronominal cohesive devices (Sum)—NT vs. MT

− Possessive pronouns (e.g., "his," "her," "their")—NT vs. MT

− Definite article ("the")—NT vs. HT and NT vs. MT

− General comparative adjectives (e.g., "older," "better," "stronger")—NT vs. HT and NT vs. MT

− Superlative degree adverbs (e.g., "most," "least")—NT vs. HT

− Reference cohesive devices (Sum of pronominal, demonstrative, and comparative devices)—NT vs. HT and NT vs. MT

− Additive conjunction devices (e.g., "and," "or")—NT vs. HT

− Adversative conjunction devices (e.g., "but")—NT vs. HT and NT vs. MT

− Causal and continuative conjunction devices represented by subordinating conjunctions (e.g., "if," "because," "unless," "so," "for")—NT vs. HT and NT vs. MT

− Conjunction devices (Sum)—NT vs. HT and NT vs. MT

− Reference and conjunction cohesive devices (Sum)—NT vs. HT and NT vs. MT

− Nominalization—NT vs. HT and NT vs. MT

− Lexical density (STTR)—NT vs. HT and NT vs. MT

− Preposition "of"—NT vs. HT and NT vs. MT

− Prepositional phrases (Sum)—NT vs. HT and NT vs. MT
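The p-values behind these pairwise results come from Tukey HSD post hoc tests, which compare every pair of groups after a one-way ANOVA while controlling the family-wise error rate. The Python sketch below computes only the Tukey-Kramer test statistic q for each pair (the variant that also accommodates unequal group sizes); converting q into a p-value requires the studentized range distribution with k groups and N − k degrees of freedom, which is available, for example, as `scipy.stats.studentized_range`. The scores in the usage example are invented for illustration and are not data from this study.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def tukey_kramer_q(groups: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Tukey-Kramer q statistic for every pair of groups.

    q = |mean_a - mean_b| / sqrt(MSE / 2 * (1/n_a + 1/n_b)),
    where MSE is the within-group mean square of a one-way ANOVA
    (within-group sum of squares divided by N - k).
    """
    k = len(groups)
    n_total = sum(len(scores) for scores in groups.values())
    means = {label: mean(scores) for label, scores in groups.items()}
    ss_within = sum(
        sum((x - means[label]) ** 2 for x in scores)
        for label, scores in groups.items()
    )
    mse = ss_within / (n_total - k)
    q_values = {}
    for a, b in combinations(groups, 2):
        n_a, n_b = len(groups[a]), len(groups[b])
        standard_error = sqrt(mse / 2 * (1 / n_a + 1 / n_b))
        q_values[(a, b)] = abs(means[a] - means[b]) / standard_error
    return q_values


if __name__ == "__main__":
    # Invented per-text scores (e.g., occurrences per 1,000 words), for illustration only.
    scores = {"NT": [1.0, 2.0, 3.0], "HT": [2.0, 3.0, 4.0], "MT": [4.0, 5.0, 6.0]}
    for pair, q in tukey_kramer_q(scores).items():
        print(pair, round(q, 3))
```

A larger q for a pair means a larger mean difference relative to the pooled within-group variability; only pairs whose q exceeds the studentized range critical value are reported as significant.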

For the newspaper corpus, one of the most notable findings was that none of the thirty-two features displayed statistically significant differences between texts translated by humans and texts translated by the MT tool. For twelve textual features, non-translated texts were found to be significantly different from both human and machine translations. (For the literary corpus, the results were quite different: thirteen features displayed statistically significant differences between human and machine translations.)

This finding is striking, and suggests that translators working with newspaper texts create translations that are closer to MT output in terms of the use of cohesive devices and global textual features. This may be related to the fact that translators working in the newspaper industry have to deliver with a fast turnaround, and their work may not undergo as many rounds of editing as literary translations do. Possibly, conveying the original message quickly takes priority over the stylistic and artistic elements of newspaper writing in translation. It should be noted that the newspaper corpus did not include translations published in U.S. nationwide newspapers, such as The New York Times, which occasionally (although very rarely) publish articles translated from Russian; instead, this portion of the corpus was compiled from five Russian newspapers whose articles are regularly translated into English by these periodicals' English divisions or by websites that translate international news into English.

These findings may also suggest that some of the translators involved used MT tools to aid their work. The use of MT tools in translators' work is an inevitable development that is generally welcomed by the language industry, since it is said to decrease costs and increase translators' productivity. However, more research is needed to develop practical guidelines for editors of MT output. This study is a step in this direction.

In terms of individual features, for pronominal cohesive devices, the newspaper corpus displayed statistically significant differences between non-translated and machine-translated texts in the use of the pronouns "they" and "it," in the sum of 3rd person pronominal devices, and in the use of possessive pronouns. The use of the pronoun "it" was also found to differ significantly between non-translated and human-translated texts. For all of these categories, the machine translations displayed the lowest numbers of the three groups, and the non-translated texts the highest. This suggests that editors working with MT output may need to adjust the use of such devices when editing newspaper texts.


As in the literary corpus, non-translated newspaper texts used significantly fewer definite articles than human and machine translations, while machine translations used definite articles most frequently. As mentioned earlier, MT tools may be set up to avoid pronominal cohesive devices and to prefer the definite article instead. This strategy leaves less room for errors that may stem from MT tools' misinterpretation of implicit cohesive ties in a Russian source text.

As discussed earlier, differences in the use of pronominal cohesive devices may be due to Toury's law of interference. Overuse of the definite article in human-translated texts would appear to support Baker's normalization hypothesis, according to which translators strive to conform to the norms of the target language, sometimes overusing features that are typical of it (Baker 1996). However, the finding that machine-translated texts overuse definite articles as well may point to a more pragmatic explanation: MT tools may be set up to use definite articles as noun modifiers as an error-avoidance strategy, while human translators of newspaper texts may use them more frequently because they do not think to use personal pronouns when these are absent from the Russian source texts. Again, this points to a direction for future research.

For comparative cohesive devices, statistically significant differences were found for general comparative adjectives (such as "older") and for superlative degree adverbs (e.g., "most"). For both features, the differences were significant between non-translated and human-translated texts; for comparative adjectives, non-translated texts also differed significantly from machine translations. Interestingly, non-translated newspaper texts used the highest number of comparative adjectives, but the lowest number of superlative adverbs. This might suggest that non-translated newspaper articles avoid superlative adverbs while relying more on comparative adjectives. However, this result may not be as indicative as other findings, since the means for these variables were relatively low (e.g., 2.09 for comparative adjectives and 0.42 for superlative adverbs in non-translated texts).

For the sum of reference cohesive devices (which includes pronominal, demonstrative, and comparative devices, as well as the definite article), non-translated newspaper texts displayed a significantly lower mean compared with human and machine translations. This may be due to the high number of definite articles in human- and machine-translated texts, where they seem to be overused.

This result differs from the findings for the literary corpus, where non-translated and human-translated texts were found to use a significantly higher number of reference cohesive devices than machine translations. This may point to differences in genre conventions: non-translated literary texts were found to use a significantly higher number of reference devices than newspaper and scientific writing. However, this difference was not found in machine-translated texts, which may suggest that MT tools are less sensitive to genre conventions concerning the use of reference devices and so tend to use them similarly across the three genres included in this study. Since this variable represents a sum of all reference cohesive devices, a clearer picture emerges when each category of reference cohesive devices is examined separately.


For conjunctive cohesive devices, statistically significant differences were found for all categories except temporal conjunctive devices (e.g., "now") (Table 6.4). For adversative devices (e.g., "but"), subordinating conjunctions (e.g., "because"), and the sum of all conjunction devices, differences were statistically significant for two pairs: non-translated vs. human-translated texts, and non-translated vs. machine-translated texts. For additive conjunction devices (such as "and"), statistically significant differences were found only between non-translated and human-translated texts.

For most of these categories, human-translated texts were found to use the lowest number of conjunction devices, while non-translated texts used the highest. This may point to the universal of simplification in human translation. Another possibility is that human translators use different means of cohesion to express conjunctive connections.

These differences may also be viewed as support for Toury's law of interference. If this law is at work, the results may reflect differences in how cohesion is expressed with conjunctive devices in English and Russian. Compared with English newspaper writers, Russian newspaper writers may rely more on other, non-conjunctive means of cohesion or leave some cohesive ties unexpressed, and this tendency may carry over into English translations of newspaper articles. Further research may pinpoint the causes.

For the sum of reference and conjunction devices, non-translated newspaper texts were found to have a significantly higher number than human and machine translations, which may suggest that linguists need to pay more attention to the use of such devices when translating or editing MT output.

For global textual features, the newspaper corpus displayed statistically significant differences in the use of four features: nominalization, lexical density (as STTR), use of the preposition "of," and the total number of the prepositional phrases studied. All differences were found to be significant for two pairs: non-translated vs. human-translated texts and non-translated vs. machine-translated texts.

Nominalization is one of the problematic areas anecdotally reported by Russian into English translators, and these findings appear to confirm it. Russian favors nominalization, and translators often have to find ways to recast Russian source text sentences to make them more suitable for English readers. Thus, the finding that both human and machine translations of Russian newspaper editorials contained a significantly higher number of nominalizations may be explained by Toury's law of interference: in the translated newspaper corpus, nominalizations may have "infiltrated" translations from the original Russian writing. Based on this, it seems reasonable to suggest that translators and post-editors working from Russian into English in the genre of newspaper editorials consider reducing the occurrence of nominalization in target texts by using verbal forms or other linguistic means.
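Editors who want to monitor nominalization in a draft could start from a crude suffix-based count. The Python sketch below is a hypothetical heuristic, not the identification procedure used in this study: it flags words of at least seven letters ending in common nominalizing suffixes (-tion, -sion, -ment, -ness, -ity, -ities, -ance, -ence) or in the plural of one of them. It will produce false positives (e.g., "station," "comment," "sentence") and will miss zero-derived nouns such as "increase," so any flagged items still need human review.

```python
import re

# Hypothetical suffix list for illustration only.
NOMINAL_SUFFIXES = ("tion", "sion", "ment", "ness", "ity", "ities", "ance", "ence")


def looks_nominal(word: str) -> bool:
    """True if the word, or the word minus a plural -s, ends in a listed suffix."""
    return word.endswith(NOMINAL_SUFFIXES) or (
        word.endswith("s") and word[:-1].endswith(NOMINAL_SUFFIXES)
    )


def candidate_nominalizations(text: str, min_length: int = 7) -> list[str]:
    """Return words in `text` that look like nominalizations by suffix."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) >= min_length and looks_nominal(w)]


if __name__ == "__main__":
    sentence = ("The implementation of the reform required the coordination "
                "of agencies and the establishment of new rules.")
    print(candidate_nominalizations(sentence))
```

Dividing the number of candidates by the text length (per 1,000 words, as elsewhere in this study) would make counts comparable across texts.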

For lexical density, non-translated newspaper texts displayed a significantly higher STTR than human-translated and machine-translated texts. As with the literary corpus, which had similar findings, this is in agreement with Laviosa's results (1998a, 1998b). As Laviosa suggests, this might be viewed as evidence in support of simplification as a universal of translation.

The fact that, in this study, these differences were found for both human and machine translations from Russian may suggest that Russian newspaper texts have a narrower range of vocabulary than English texts. Information concerning vocabulary size in the Russian and English languages was discussed in the results for the literary corpus.

For the preposition "of" and the sum of all prepositions studied, non-translated texts displayed significantly lower means than both human and machine translations. These findings are likely related to the influence of the source language, thus supporting Toury's law of interference.

As discussed earlier, Russian allows for noun chains with genitive meaning. In Russian, which is a synthetic language, such chains do not require prepositions, because genitive relationships are marked by case endings. In English, on the contrary, genitive noun chains may not be possible without a preposition, insofar as English is an analytic language and so lacks the means to express genitive relationships with noun endings. In English, such relationships often require the preposition "of." Translators, especially novices, often struggle when rendering long Russian noun chains into English, ending up with the preposition "of" repeated in close proximity, which leads to clumsy English constructions.

For this reason, it comes as little surprise that human and machine translations from Russian contained a statistically significantly higher number of instances of the preposition "of" than non-translated newspaper writing. For MT, the number was slightly higher than for HT, although the difference was not statistically significant.

These findings suggest that linguists working with newspaper texts may consider rephrasing long Russian noun phrases in a way that avoids frequent repetition of the preposition "of." According to the findings, this advice seems relevant for both human translators and editors of MT output.
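A simple check could help translators and editors locate the sentences most likely to need such rephrasing. The Python sketch below is a hypothetical helper, not a tool used in this study: it splits a text into sentences with a naive punctuation rule and flags any sentence containing more than a chosen number of occurrences of "of" (two by default, an arbitrary threshold).

```python
import re


def flag_of_chains(text: str, max_of: int = 2) -> list[tuple[int, str]]:
    """Return (count, sentence) for sentences with more than `max_of` uses of "of".

    Sentences are split naively after ".", "!" or "?" followed by whitespace;
    abbreviations such as "e.g." will therefore break sentences incorrectly.
    """
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # \b boundaries keep words such as "often" or "offer" from matching.
        count = len(re.findall(r"\bof\b", sentence, flags=re.IGNORECASE))
        if count > max_of:
            flagged.append((count, sentence))
    return flagged


if __name__ == "__main__":
    draft = ("The analysis of the results of the study of the effects of "
             "temperature was delayed. The sample was heated.")
    for count, sentence in flag_of_chains(draft):
        print(count, sentence)
```

Counting per sentence is only a rough proxy for "close proximity"; a sliding window over tokens would be a natural refinement.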

6.1.3 Summary of results for the scientific corpus

For the scientific corpus, a number of cohesive features and global textual characteristics also displayed statistically significant differences across the three methods of text production. The results of the statistical analyses are summarized in Tables 6.5 and 6.6.

A detailed account of the data and their analyses was provided in Chapter 5.

Table 6.5 Differences in mean scores for the cohesive characteristics and global textual features in the scientific corpus

Method: Non-translated; Human-translated; Machine-translated

Cohesive characteristics
3rd person singular neuter personal pronouns ("it"): Lowest; Highest
3rd person singular objective personal pronouns ("him," "her"): Zero; Zero; 0.04
3rd person plural objective personal pronouns ("them"): Lowest; Highest
3rd person singular subjective personal pronouns ("he," "she"): Lowest; Highest
3rd person plural subjective personal pronouns ("they"): Highest (0.63); 0.58; Lowest (0.52)
3rd person pronominal cohesive devices (Sum): Lowest; Highest
Possessive pronouns (e.g., "his," "her," "their"): Highest; Lowest
Singular demonstrative determiners (e.g., "this," "that"): Highest; Lowest
Plural demonstrative determiners (e.g., "these," "those"): Highest; Lowest
Definite article ("the"): Lowest; Highest
General comparative adjectives (e.g., "older," "better," "stronger"): Highest; Lowest
General superlative adjectives (e.g., "oldest," "best," "strongest"): Highest (0.65); Lowest (0.26); Close to HT (0.27)
Comparative degree adverbs (e.g., "more," "less"): Highest (0.96); Lowest (0.57); 0.61
Superlative degree adverbs (e.g., "most," "least"): Highest (0.49); Lowest (0.26); Lowest (0.26)
Comparative devices (Sum): Highest; Lowest
Reference cohesive devices (Sum of pronominal, demonstrative, and comparative devices): Lowest; Highest
Additive conjunction devices (e.g., "and," "or"): Highest; Lowest
Adversative conjunction devices (e.g., "but"): Highest; Lowest
Causal and continuative conjunction devices represented by subordinating conjunctions (e.g., "if," "because," "unless," "so," "for"): Highest; Lowest
Temporal conjunction devices (e.g., "now," "tomorrow"): Highest; Very close to MT; Lowest
Conjunction devices (Sum): Highest; Lowest
Reference and conjunction cohesive devices (Sum): Lowest; Highest
Global textual characteristics
Nominalization: Lowest; Highest
Lexical density (STTR): Highest; Lowest
Average word length: Highest; Lowest
Average sentence length: Lowest; Highest
Passives: Highest; Lowest
Preposition "for": Highest; Lowest
General prepositions (e.g., "in," "about"): Lowest; Highest
Preposition "of": Lowest; Highest
Prepositions "with" and "without": Highest; Lowest
Prepositional phrases (Sum): Lowest; Highest

Table 6.6 Pairwise comparisons of significant cohesive and global textual characteristics (NT, HT, and MT) by Tukey HSD post hoc testing in the scientific corpus

Cohesive characteristics
3rd person singular neuter personal pronouns ("it"): NT vs. HT 0.0019*
3rd person singular objective personal pronouns ("him," "her"): n/a for all pairs (see note 3)
3rd person plural objective personal pronouns ("them"): n.s.
3rd person singular subjective personal pronouns ("he," "she"): n.s.
3rd person plural subjective personal pronouns ("they"): n.s.
3rd person pronominal cohesive devices (Sum): NT vs. HT 0.0224*
Possessive pronouns (e.g., "his," "her," "their"): NT vs. MT 0.0166*
Singular demonstrative determiners (e.g., "this," "that"): NT vs. HT 0.0196*; NT vs. MT < 0.0001*
Plural demonstrative determiners (e.g., "these," "those"): n.s.
Definite article ("the"): NT vs. HT < 0.0001*; NT vs. MT < 0.0001*
General comparative adjectives (e.g., "older," "better," "stronger"): n.s.
General superlative adjectives (e.g., "oldest," "best," "strongest"): NT vs. HT 0.0195*; NT vs. MT 0.0227*
Comparative degree adverbs (e.g., "more," "less"): n.s.
Superlative degree adverbs (e.g., "most," "least"): n.s.
Comparative devices (Sum): NT vs. HT 0.0036*; NT vs. MT 0.0015*
Reference cohesive devices (Sum of pronominal, demonstrative, and comparative devices): NT vs. HT < 0.0001*; NT vs. MT < 0.0001*
Additive conjunction devices (e.g., "and," "or"): NT vs. HT 0.0013*
Adversative conjunction devices (e.g., "but"): n.s.
Causal and continuative conjunction devices represented by subordinating conjunctions (e.g., "if," "because," "unless," "so," "for"): NT vs. HT < 0.0001*; NT vs. MT < 0.0001*
Temporal conjunction devices (e.g., "now," "tomorrow"): n.s.
Conjunction devices (Sum): NT vs. HT 0.0001*; NT vs. MT < 0.0001*
Reference and conjunction cohesive devices (Sum): NT vs. HT 0.0005*; NT vs. MT 0.0002*
Global textual characteristics
Nominalization: n.s.
Lexical density (STTR): NT vs. HT 0.0001*; NT vs. MT < 0.0001*
Average word length: NT vs. MT 0.0039*
Average sentence length: NT vs. HT 0.0001*; NT vs. MT < 0.0001*; HT vs. MT 0.0001*
Passives: NT vs. MT < 0.0001*; HT vs. MT < 0.0001*
Preposition "for": NT vs. HT 0.0001*; NT vs. MT < 0.0001*
General prepositions (e.g., "in," "about"): NT vs. HT 0.0006*; HT vs. MT 0.0071*
Preposition "of": NT vs. HT < 0.0001*; NT vs. MT < 0.0001*; HT vs. MT < 0.0001*
Prepositions "with" and "without": n.s.
Prepositional phrases (Sum): NT vs. HT < 0.0001*; NT vs. MT < 0.0001*
* the mean difference is significant at the 0.05 level; n.s. = no significant pairwise difference

3 Because of the high degree of sparsity of this variable (most of the values for individual texts are 0), it was decided not to analyze it.

Summarizing the results from Table 6.6: in the scientific corpus, statistically significant differences were found for:

− 3rd person singular neuter personal pronouns ("it")—NT vs. HT

− 3rd person pronominal cohesive devices (Sum)—NT vs. HT

− Possessive pronouns (e.g., "his," "her," "their")—NT vs. MT

− Singular demonstrative determiners (e.g., "this," "that")—NT vs. HT and NT vs. MT

− Definite article ("the")—NT vs. HT and NT vs. MT

− General superlative adjectives (e.g., "oldest," "strongest")—NT vs. HT and NT vs. MT

− Comparative devices (Sum)—NT vs. HT and NT vs. MT

− Reference cohesive devices (Sum of pronominal, demonstrative, and comparative devices)—NT vs. HT and NT vs. MT

− Additive conjunction devices (e.g., "and," "or")—NT vs. HT

− Causal and continuative conjunction devices represented by subordinating conjunctions (e.g., "if," "because," "unless," "so," "for")—NT vs. HT and NT vs. MT

− Conjunction devices (Sum)—NT vs. HT and NT vs. MT

− Reference and conjunction cohesive devices (Sum)—NT vs. HT and NT vs. MT

− Lexical density (STTR)—NT vs. HT and NT vs. MT

− Average word length—NT vs. MT

− Average sentence length—all pairs

− Passives—NT vs. MT and HT vs. MT

− Preposition "for"—NT vs. HT and NT vs. MT

− General prepositions (e.g., "in," "about")—NT vs. HT and HT vs. MT

− Preposition "of"—all pairs

− Prepositional phrases (Sum)—NT vs. HT and NT vs. MT

For 3rd person pronominal cohesive devices, the scientific corpus displayed some statistically significant differences. However, the counts of such devices in this corpus were low (the average sums of these devices per 1,000 words were 3.63 for non-translated texts, 4.9 for human translations, and 4.48 for machine translations). In fact, of the three genres included in the study, the scientific corpus contains the lowest number of 3rd person pronouns, and this difference is statistically significant. This sparsity may be expected for scientific texts dealing with the physics of metals and materials, because their topics generally focus on scientific phenomena rather than on human beings. This suggests that linguists should expect to encounter few 3rd person pronominal cohesive devices in scientific texts on physics.
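The averages above are normalized frequencies: raw counts scaled to a common length of 1,000 words so that texts of different sizes can be compared. A minimal Python sketch, assuming that a group average is the mean of per-text rates (pooling all counts and words in a group is an alternative that weights long texts more heavily); the function names and sample counts are hypothetical:

```python
from statistics import mean


def per_thousand(count: int, total_words: int) -> float:
    """Normalize a raw feature count to a rate per 1,000 running words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count * 1000 / total_words


def group_average(texts: list[tuple[int, int]]) -> float:
    """Mean of per-text rates; each item is (feature_count, word_count).

    Averaging per-text rates gives every text equal weight,
    regardless of its length.
    """
    return mean(per_thousand(count, words) for count, words in texts)


# Hypothetical counts for three texts (not data from this study).
sample = [(12, 3000), (9, 1500), (5, 2500)]
print(group_average(sample))  # 4.0
```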


For possessive pronouns and demonstratives, non-translated scientific texts displayed the highest number, while MT output displayed the lowest. For the definite article, the findings were the opposite: non-translated scientific texts used the fewest definite articles, while machine-translated texts used the most. These findings are similar to the results for the literary and newspaper corpora.

As mentioned earlier, MT tools may be set up to prefer the definite article as a noun modifier. The use of the definite article leaves less room for a potential error that may stem from MT tools' misinterpretations of cohesive ties in a Russian source text.

Thus, linguists working with scientific texts may want to consider reducing the number of definite articles in human and machine translations, replacing them with pronominal or demonstrative cohesive devices or with the zero article.

For comparative cohesive devices, statistically significant differences were found for general superlative adjectives (such as "oldest") and for the sum of the studied comparative devices. For both features, the differences were found between non-translated texts and both human and machine translations. Interestingly, non-translated scientific texts used the highest number of superlative adjectives and scored the highest for the sum of comparative devices. Still, such devices were used relatively infrequently overall (e.g., for the sum of comparative devices, the average per 1,000 words was 4.99 for non-translated texts, 3.38 for human translations, and 3.25 for machine translations). Thus, these devices may not be of particular concern for translators unless they occur unusually often.


For the sum of reference cohesive devices (which includes pronominal, demonstrative, and comparative devices, as well as the definite article), non-translated scientific texts displayed significantly lower counts than human and machine translations.

This may be due to a possible overuse of definite articles in human- and machine-translated texts. These findings are similar to the results for the newspaper corpus. Since this variable represents a sum of all reference cohesive devices, a more informative picture is obtained by looking at each category separately.
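Why a sum can mislead is easy to see with invented numbers: if a translation uses fewer possessives and demonstratives but many more definite articles, its reference total can exceed the original's even though two of the three categories moved in the opposite direction. A minimal Python sketch with hypothetical per-1,000-word rates (not figures from this study):

```python
def reference_sum(rates: dict[str, float]) -> float:
    """Add up per-1,000-word rates of the reference sub-categories."""
    return sum(rates.values())


def category_differences(a: dict[str, float], b: dict[str, float]) -> dict[str, float]:
    """Per-category difference b - a, for the categories present in a."""
    return {name: b[name] - a[name] for name in a}


# Hypothetical rates per 1,000 words, chosen only for illustration.
original = {"possessive": 12.0, "demonstrative": 8.0, "definite_article": 50.0}
translation = {"possessive": 9.0, "demonstrative": 6.0, "definite_article": 60.0}

print(reference_sum(original), reference_sum(translation))  # 70.0 75.0
print(category_differences(original, translation))
# {'possessive': -3.0, 'demonstrative': -2.0, 'definite_article': 10.0}
```

The higher total for the translation hides the fact that it underuses two of the three categories.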

For conjunctive cohesive devices, statistically significant differences were found for all categories except for adversative (e.g., "but") and temporal conjunctive devices (e.g., "when") (Table 6.6). For subordinating conjunctions (e.g., "because") and for the sum of all conjunctive devices, differences were statistically significant for two pairs: non-translated vs. human-translated texts and non-translated vs. machine-translated texts. For additive conjunction devices (such as "and"), statistically significant differences were found between non-translated and machine-translated texts.
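Conjunction counts of this kind are usually derived from part-of-speech tags rather than from word lists, since words such as "so" and "for" are ambiguous. A minimal Python sketch, assuming CLAWS-style word_TAG output and a hypothetical mapping from C7 conjunction tags to the cohesion categories above; the study's actual counting procedure is not shown in this section, and temporal devices such as "now" (which are adverbs) are left out:

```python
from collections import Counter

# Assumed mapping from CLAWS C7 conjunction tags to cohesion categories;
# check it against the tagset documentation before relying on it.
TAG_TO_CATEGORY = {
    "CC": "additive",             # coordinating, e.g., "and," "or"
    "CCB": "adversative",         # "but"
    "CS": "causal/continuative",  # subordinating, e.g., "if," "because"
}


def conjunction_rates(tagged_text: str) -> dict[str, float]:
    """Per-1,000-word rates of conjunction categories in word_TAG text."""
    counts = Counter()
    words = 0
    for token in tagged_text.split():
        word, _, tag = token.rpartition("_")
        if not any(ch.isalnum() for ch in word):
            continue  # punctuation tokens such as "._." are not words
        words += 1
        category = TAG_TO_CATEGORY.get(tag)
        if category:
            counts[category] += 1
    if words == 0:
        raise ValueError("no words found")
    return {cat: counts[cat] * 1000 / words for cat in TAG_TO_CATEGORY.values()}


tagged = ("She_PPHS1 left_VVD because_CS it_PPH1 rained_VVD but_CCB "
          "he_PPHS1 stayed_VVD and_CC read_VVD ._.")
print(conjunction_rates(tagged))
```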

For all of these categories, non-translated scientific texts used the highest number of conjunctive cohesive devices, while machine-translated texts used the lowest. This may point to the universal of simplification in translation or to Toury's law of interference. If Toury's law is at work, the results reflect differences in conjunctive cohesion between English and Russian. Like Russian newspaper writers, Russian scientific writers may rely more on other, non-conjunctive means of cohesion (e.g., punctuation) or leave some cohesive ties unexpressed, and these patterns may be transferred into English translations of newspaper and scientific articles. Further research may help pinpoint the causes.

For the sum of reference and conjunction devices, scientific non-translated texts had the lowest average per 1,000 words, and the differences were statistically significant. Judging by the results for the individual categories that make up this sum variable (Table 6.5), this result is likely due to an overuse of definite articles in human and machine translations. In any case, the individual categories discussed above are more informative than this sum variable.

For global textual features, the scientific corpus revealed statistically significant differences for all features except nominalization (Table 6.6). For lexical density, non-translated scientific texts displayed a significantly higher STTR than human-translated and machine-translated texts, with machine-translated texts having the lowest STTR. As discussed when reviewing the similar results for the literary and newspaper corpora, this agrees with Laviosa's results (1998a, 1998b). As Laviosa suggests, this might be viewed as evidence in support of simplification as a universal of translation.
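STTR is commonly computed as the average type-token ratio over successive equal-sized chunks of text (1,000 words is a common default, e.g., in WordSmith Tools), because the raw type-token ratio falls as texts get longer and would otherwise penalize longer texts. A minimal Python sketch, assuming case-folded, pre-tokenized input; the exact settings used in this study are not restated in this section:

```python
def sttr(tokens: list[str], chunk_size: int = 1000) -> float:
    """Standardized type-token ratio (%): mean TTR over consecutive chunks.

    A final chunk shorter than chunk_size is discarded, which is an
    assumption; tools differ on this point.
    """
    ratios = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = [token.lower() for token in tokens[start:start + chunk_size]]
        ratios.append(len(set(chunk)) / chunk_size * 100)
    if not ratios:
        raise ValueError("text is shorter than one chunk")
    return sum(ratios) / len(ratios)


# Toy example with 4-token chunks: TTRs of 100% and 75%; "z" is discarded.
print(sttr("a b c d a b c c z".split(), chunk_size=4))  # 87.5
```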

As mentioned earlier, these findings may also suggest that Russian scientific, newspaper, and literary texts have a narrower range of vocabulary than English texts of these genres. Possible evidence for this conjecture was discussed when reviewing the results for the literary corpus.


For average sentence length, non-translated texts had the lowest average of the three groups, while machine-translated texts had the highest (all differences were statistically significant). This finding is interesting and may point to Toury's law of interference, but more research is needed to test this conjecture.
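Both length measures depend on how "sentence" and "word" are defined. A minimal Python sketch with naive, assumed definitions (real tools, including the one used in this study, may define both differently; abbreviations and formulae would confuse this splitter):

```python
import re


def average_lengths(text: str) -> tuple[float, float]:
    """Return (average sentence length in words, average word length in characters).

    Naive, assumed definitions: a sentence ends at ".", "!", or "?" followed
    by whitespace; a word is a run of word characters, apostrophes, or hyphens.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[\w'-]+", text)
    if not sentences or not words:
        raise ValueError("empty text")
    sentence_length = len(words) / len(sentences)
    word_length = sum(len(w) for w in words) / len(words)
    return sentence_length, word_length


print(average_lengths("The sample was heated. It melted quickly."))
```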

Passives were used most frequently in non-translated scientific texts and least frequently in machine-translated scientific texts. For this variable, statistically significant differences were found for non-translated vs. machine-translated texts and human-translated vs. machine-translated texts. These results differed from the findings for literary and newspaper texts, where no statistically significant differences between the groups were found. The finding that non-translated scientific texts and human translations are similar in their use of passives may suggest similarities between the scientific genres in English and Russian. In addition, it may indicate that human scientific translators produce texts that stay close to their source texts in the use of passives. The finding that machine-translated texts use fewer passives than the other two groups may suggest that MT tools are programmed to use fewer passive constructions or, possibly, have difficulty deciphering Russian passives and therefore translate them as non-passive constructions (see examples in Chapter 5).
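Automatic passive counts typically rely on part-of-speech tags such as those produced by CLAWS. A minimal Python sketch of one common heuristic (a form of "be," optionally followed by an adverb or "not," then a past participle), assuming CLAWS C7-style tags; it is not necessarily the procedure used in this study, and it misses "get" passives and reduced passives (e.g., "the samples tested"):

```python
# Assumed CLAWS C7 conventions: all forms of "be" have tags beginning
# with "VB" (VBZ, VBDZ, VBR, VBN, ...), VVN marks the past participle of
# a lexical verb, RR a general adverb, and XX the negator "not".
BE_TAG_PREFIX = "VB"
PARTICIPLE_TAG = "VVN"
SKIPPABLE_TAGS = {"RR", "XX"}


def count_passives(tagged: list[tuple[str, str]], max_gap: int = 2) -> int:
    """Count "be" + up to max_gap adverbs/negators + past participle."""
    count = 0
    for i, (_, tag) in enumerate(tagged):
        if not tag.startswith(BE_TAG_PREFIX):
            continue
        j = i + 1
        # Skip a limited number of intervening adverbs or "not".
        while j < len(tagged) and j - i - 1 < max_gap and tagged[j][1] in SKIPPABLE_TAGS:
            j += 1
        if j < len(tagged) and tagged[j][1] == PARTICIPLE_TAG:
            count += 1
    return count


sentence = [("The", "AT"), ("samples", "NN2"), ("were", "VBDR"), ("not", "XX"),
            ("annealed", "VVN"), ("and", "CC"), ("we", "PPIS2"),
            ("have", "VH0"), ("tested", "VVN"), ("them", "PPHO2")]
print(count_passives(sentence))  # 1
```

The perfect "have tested" is not counted, because "have" is not a form of "be."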

For the use of prepositions, significant differences were found for all categories except the prepositions "with"/"without." This suggests that scientific translators and editors should pay special attention to prepositions in their work, especially to "of" and "for." As in the other genres, non-translated scientific texts used the preposition "of" least often and the preposition "for" most often. As discussed earlier, this may be related to the difficulties human translators and MT tools have when rendering long Russian noun chains with genitive meaning. In their English translations, MT tools and human translators, especially novices, may place several instances of "of" in close proximity more often than is stylistically appropriate.

These findings suggest that translators and (post-)editors working with scientific texts should try to rephrase long Russian noun phrases in a way that avoids frequent repetition of the preposition "of," perhaps resorting to the underused preposition "for" and other cohesive means. According to the findings, this advice seems relevant for both human translators and editors of MT output.
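For post-editors, a quick way to find candidates for such rephrasing is to flag places where "of" recurs within a few words. A minimal Python sketch; the function name and the four-word window are hypothetical choices, not thresholds derived from this study:

```python
import re


def flag_of_chains(text: str, window: int = 4) -> list[str]:
    """Return snippets where "of" recurs within `window` words of the previous "of"."""
    words = re.findall(r"[\w'-]+", text)
    positions = [i for i, word in enumerate(words) if word.lower() == "of"]
    snippets = []
    for prev, cur in zip(positions, positions[1:]):
        if cur - prev <= window:
            start = max(prev - 2, 0)  # two words of left context
            snippets.append(" ".join(words[start:cur + 2]))  # one word after
    return snippets


for snippet in flag_of_chains("the rate of growth of the grains of iron was measured"):
    print(snippet)
# the rate of growth of the
# of growth of the grains of iron
```

Overlapping snippets, as in this example, point to a longer chain that a post-editor would review as one unit.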

6.2 Practical recommendations for linguists working in the translation industry and translator trainers

Using the findings of this study, I have developed a list of practical recommendations for translators, editors of MT output, translator trainers, and others involved in the language industry. These recommendations may help fine-tune the use of cohesive ties and other global textual features, making it more similar to non-translated writing and thus improving target texts' cohesiveness. Some of these recommendations may already be common knowledge; in such cases, the point of this research was to test common knowledge and see whether it can be supported or refuted with data. The recommendations below were developed for literary, newspaper, and scientific texts; however, they may also be useful for linguists and translator trainers working with other text-types.

When appropriate:

− Increase the use of 3rd person pronominal cohesive devices and possessive pronouns as noun modifiers, when genre-appropriate (e.g., for newspaper and literary texts)

− Decrease the use of definite articles, especially when editing MT output, in all genres, by:

   − Replacing them with possessive pronouns, nouns in the genitive case, the zero article, or other cohesive means in newspaper and literary texts

   − Replacing them with the zero article, nouns in the genitive case, or other cohesive means in scientific texts (3rd person and possessive pronouns are infrequent in scientific texts)

− Be on the lookout for constructions with the 3rd person singular neuter pronoun "it": it was underused in newspaper texts translated by humans and machines, but overused in scientific human translations

− Pay attention to the use of reference devices as cohesive means. These include 3rd person pronouns, possessive pronouns, demonstratives, comparatives, and definite articles. Attention to these features is especially important when editing MT output, as MT tools seem to have difficulty interpreting the cohesive ties these devices create, do not adjust them accordingly, and over-rely on definite articles

− Introduce additional conjunctions in newspaper and scientific translations, explicitating cohesive ties, to approximate non-translated writing

− Adjust the use of prepositions (relevant for all three genres):

   − Look for ways to replace the preposition "of" with other prepositions (such as "for," which seems to be underused in translations) and other cohesive means, especially when "of" is repeated in close proximity (particularly relevant for newspaper and scientific genres; highly relevant for MT output)

   − In MT output, be on the lookout for potential errors related to misinterpretations of Russian genitive noun chains (often rendered in English with the preposition "of")

− When appropriate, increase variation in vocabulary (relevant for all three genres): texts translated from Russian by humans and machines tend to have a lower type-token ratio than non-translated texts

− Avoid overusing nominalization, especially in newspaper texts

− When editing MT output, be on the lookout for misinterpretations of Russian passive constructions and for the underuse of passives in genres where they are common, such as scientific writing

− In scientific texts, consider breaking longer sentences up into shorter ones, if appropriate


6.3 Limitations of this study

Some limitations of this study are inherent to corpus-based studies in general. For instance, corpus size and representativeness are perennial issues in corpus design; as Zanettin (2011: 15) puts it, "representativeness remains the 'holy grail' of corpus linguistics, something to strive for rather than something that can reasonably be attained."

While this study includes a greater number of texts and genres than many similar studies, increasing the number of texts, sources, and genres would likely make it more representative.

For the genres of newspaper and scientific writing, it would be beneficial to control for the translators of these texts. This was not possible for this project, since the names of the translators were often unavailable; this, however, may change in the next decade or two. Alternatively, working more closely with the venues publishing these translations may allow for more control in balancing the work of individual translators included in the corpus (as was done for the literary genre, where translators' names are now customarily available).

For the scientific corpus, the compilation process revealed that character-recognition software had more difficulty than expected in recognizing physical and mathematical formulae. While formulae were not part of the study and thus did not influence most of the data, they may have slightly influenced variables such as word and sentence length in the scientific corpus. In the future, a more refined procedure for dealing with such difficulties should be put in place.


The automated analysis performed in this study helped avoid problems with inter-rater reliability, which are common in manual analysis. However, the corpus was designed by a human, and even though the researcher made decisions based on best practices of corpus compilation, corpus design "will always carry the unintended influence of the designer(s)" (Ahmad 2008: 61).

In addition, any software has a certain error rate. For instance, the CLAWS part-of-speech tagger is reported to have 96-97% accuracy (according to the official CLAWS website), so some tagging errors remain. Another software drawback is variation in word counts, since different programs may define a word differently. To avoid such problems, this study used the same software for word and sentence counts throughout the entire process.
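The word-count problem is easy to demonstrate: two reasonable definitions of "word" give different totals for the same sentence, which in turn changes every per-1,000-word rate computed from them. A minimal Python sketch; both definitions are illustrative assumptions, not those of any particular program:

```python
import re

sentence = "The well-known alloy's yield strength rose by 12.5%."


def count_whitespace_words(text: str) -> int:
    # Definition A: every whitespace-separated string is one word.
    return len(text.split())


def count_alphabetic_words(text: str) -> int:
    # Definition B: only runs of ASCII letters count, so hyphens and
    # apostrophes split words and the number "12.5%" is not counted.
    return len(re.findall(r"[A-Za-z]+", text))


print(count_whitespace_words(sentence), count_alphabetic_words(sentence))  # 8 9
```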

Certain features, such as comparatives, were relatively infrequent in the texts selected for the corpora, especially in the newspaper corpus. In the future, it may be beneficial to set a minimum length for the texts included not only for the literary genre, where this was feasible, but also for the newspaper genre. On the downside, this would go against genre conventions, since newspaper articles tend to be short, and would thus bias the results in favor of the writing styles used in longer articles.

The study includes cohesive and other global features that lend themselves well to automated analysis. Other important global textual features, such as lexical cohesion, are difficult to analyze automatically and so were not included.


Supplementing automated corpus studies with manual corpus analysis is beneficial, and is typically feasible with a research team.

In general, data-driven studies often draw criticism from proponents of qualitative research in translation studies. In my opinion, quantitative and qualitative research are complementary, and their results should be used to supplement one another.

6.4 Future research directions

The corpus assembled for the purposes of this study may additionally be used to research a plethora of linguistic features, such as collocations, lexical cohesion, punctuation, paragraph length, unique items, idioms, emphatic structures, hedges, verbal adjectives and participles, to name a few items of interest. This makes it a valuable research resource.

Further research may generate additional evidence of translation universals and of Toury's law of interference, which I hypothesize to account for many of the statistically significant differences found in this study. A more detailed comparative analysis (most likely performed manually) of potential instances of explicitation, simplification, or source language interference may help shed more light on these concepts in translation studies and on the nature of translated and non-translated texts, and may extend these discussions to machine-translated texts.

This corpus may also be developed further to include additional text-types, increase the number of texts in each sub-corpus, and include output from other MT tools. For research questions involving comparative analysis, the translated parts of the corpus may be aligned with the source texts. To facilitate such comparative analysis, the Russian part of the corpus may be tagged for parts of speech.

The research done with the help of this corpus may be extended to edited machine-translation output. Such output may be studied both as a process (with additional tools, such as keystroke logging or eye-tracking software) and as a product (included in the corpus as a fourth group of texts, in addition to non-translated, human-translated, and machine-translated texts).

The collected data may potentially be used to develop automated methods of identifying translated texts. Such methods could be a useful (self-)assessment tool for those involved in translation (cf. Baroni and Bernardini 2006), as well as for researchers who extract parallel corpora from the web and need to identify and assess parallel texts for inclusion (Resnik et al. 2003). Baroni and Bernardini (2006) also note that automated techniques for distinguishing between translated and non-translated texts might be useful in identifying multilingual plagiarism.
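The simplest version of such a method is a nearest-centroid classifier over normalized feature rates. A minimal Python sketch, using as centroids the two scientific-corpus averages reported earlier in this chapter (3rd person pronominal devices and comparative devices per 1,000 words). The choice of features, the unscaled Euclidean distance, and the test vectors are illustrative assumptions; a usable classifier would need many more features, per-text training data, and validation, as in machine-learning approaches such as that of Baroni and Bernardini (2006):

```python
import math

# Scientific-corpus averages per 1,000 words reported in this chapter:
# (3rd person pronominal devices, comparative devices).
CENTROIDS = {
    "NT": (3.63, 4.99),
    "HT": (4.9, 3.38),
    "MT": (4.48, 3.25),
}


def classify(features: tuple[float, float]) -> str:
    """Label a text by the nearest group average (Euclidean distance)."""
    return min(CENTROIDS, key=lambda label: math.dist(features, CENTROIDS[label]))


# Hypothetical feature vectors for two unseen texts.
print(classify((3.7, 4.9)))  # NT
print(classify((5.0, 3.4)))  # HT
```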

In addition to further research, the present corpus may be used in translation pedagogy, translation evaluation, and other areas (Baer and Bystrova-McIntyre 2009; Bowker 2001; Bowker 2003; Uzar 2004; López-Rodríguez et al. 2007; Toury 1995, among others). Fruitful applications of this corpus to translation evaluation include assessing translations (Bowker 2001; Bowker 2003) and developing students' self-assessment and peer-assessment skills (Bowker 2000; Bowker 2003; Uzar 2004; López-Rodríguez et al. 2007) (Baer and Bystrova-McIntyre 2009: 161).

Using corpora in pedagogical settings has notable advantages. Lynne Bowker, who has written extensively on the use of corpora in translation evaluation and pedagogy, notes that a corpus may serve as "a benchmark against which translator trainers can compare student translations on a number of different levels," making trainers' feedback more constructive (2001: 345). In addition, the use of corpora may remove "a great deal of the subjectivity" from the task of translation evaluation, providing an evaluator with "a wide range of authentic and suitable texts" to help assess students' translation solutions (Bowker 2001: 345; Bowker 2000: 184). This may be especially helpful for specialized translation assignments when trainers are not as familiar with the subject matter (Bowker 2001: 345).

In addition, corpora offer "a common evaluative framework" for translation students and trainers (Bowker 2001: 361), helping students be more receptive to their trainers' feedback by allowing them to "see for themselves that it [feedback] is based on corpus evidence and not merely on the subjective impressions or incomplete understanding of the translator trainer" (Bowker 2003: 180). Research should be done to determine whether students are indeed more receptive to such feedback (Baer and Bystrova-McIntyre 2009: 162).

In training, corpora such as this one can "raise students' interest in and awareness of specialized language," helping students grow as independent learners (Bowker 2001: 362). With the help of specialized tools, such as WordSmith Tools or ParaConc (a multilingual concordancer), students can check general corpus statistics (such as type-token ratio, sentence and paragraph length, etc.), compile word frequency lists and collocation tables for different text-types, study concordances (KWICs) and aligned source and target segments, and perform many other interesting and educational experiments.
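A concordance (KWIC) display shows each occurrence of a search word centered between its left and right context. A minimal Python sketch of such a display for classroom illustration; it is not how WordSmith Tools or ParaConc work internally, and the tokenization and context width are assumptions:

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Key Word In Context lines: each hit with `width` words on either side."""
    words = re.findall(r"[\w'-]+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(i - width, 0):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{word}] {right}")  # align the keyword column
    return lines


text = "The theory of the method is clear. The method of analysis works."
for line in kwic(text, "of", width=2):
    print(line)
```

Right-aligning the left context keeps every keyword in the same column, which is what makes recurring patterns around it easy to scan.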

The corpus compiled for the purposes of this research may be helpful in training future translators, editors, and other linguists, as outlined above. The fact that it includes texts of different genres produced by three methods of production (non-translated, human-translated, and machine-translated) differentiates it from most existing translational corpora and extends its use to the training of translation editors, including editors working with machine-translated output.

REFERENCES

Adab, Beverly, and Christina Schaeffner. "The Idea of the Hybrid Text in Translation: Contact as Conflict." Across Languages and Cultures. Special Issue on Hybrid Texts and Translation. Christina Schaeffner and Beverly Adab (guest eds.) 2.2 (2001): 167-80.

Ahmad, Khurshid. "Being in Text and Text in Being: Notes on Representative Text." Incorporating Corpora: The Linguist and the Translator. Eds. Anderman, Gunilla and Margaret Rogers. Clevedon, Buffalo & Toronto: Multilingual Matters, 2008. 60-94.

Angelone, Erik. "Uncertainty, Uncertainty Management, and Metacognitive Problem Solving in the Translation Task." Translation and Cognition. Eds. Shreve, Gregory and Erik Angelone. Amsterdam/Philadelphia: John Benjamins, 2010. 17-40.

Bachman, Lyle F. Fundamental Considerations in Language Testing. Oxford: Oxford University Press, 1990.

Baer, Brian J., and Tatyana Bystrova-McIntyre. "Assessing Cohesion: Developing Assessment Tools on the Basis of Comparable Corpora." ATA Scholarly Monograph Series, XIV. Testing and Assessment in Translation and Interpreting Studies: A Call for Dialogue between Research and Practice. Eds. Angelelli, Claudia V. and Holly E. Jacobson. Amsterdam/Philadelphia: John Benjamins, 2009. 159-83.

Baker, Mona. "Corpus-Based Translation Studies: The Challenges That Lie Ahead." Terminology, LSP and Translation: Studies in Language Engineering in Honour of Juan C. Sager. Ed. Somers, Harold. Amsterdam: John Benjamins, 1996. 175-86.

Baker, Mona. In Other Words: A Coursebook on Translation. London & New York: Routledge, 1992.

Baker, Mona, and Kirsten Malmkjaer, eds. Routledge Encyclopedia of Translation Studies. London/New York: Routledge, 1998.

Baroni, Marco, and Silvia Bernardini. "A New Approach to the Study of Translationese: Machine-Learning the Difference between Original and Translated Text." Literary and Linguistic Computing 21.3 (2006): 259-74.

Barthes, Roland. S/Z. Paris: Seuil, 1975.

Baskette, Floyd K., Jack Z. Sissors, and Brian S. Brooks. The Art of Editing. NY: Macmillan, 1992.

Bassnett-McGuire, Susan. Translation Studies. London & New York: Methuen, 1980.

Bellos, David. Is That a Fish in Your Ear? New York: Faber and Faber, Inc., 2011.

Berner, R. Thomas. The Process of Editing. The Pennsylvania State University. Boston: Allyn and Bacon, 1991.

Berzlánovich, Ildikó, Markus Egg, and Gisela Redeker. "Coherence Structure and Lexical Cohesion in Expository and Persuasive Texts." Constraints in Discourse. Eds. Benz, Anton, Peter Kuehnlein and Manfred Stede. University of Potsdam/Germany, 2008. 19-26.

Bex, Tony. Variety in Written English. London: Routledge, 1996.

Biber, Douglas. Variation across Speech and Writing. Cambridge: Cambridge University Press, 1988.

Blum-Kulka, Shoshana. "Shifts of Cohesion and Coherence in Translation." Interlingual and Intercultural Communication: Discourse and Cognition in Translation and Second Language Acquisition Studies. Eds. House, Juliane and Shoshana Blum-Kulka. Tuebingen: Gunter Narr Verlag, 1986. 17-35.

Blum-Kulka, Shoshana. "Shifts of Cohesion and Coherence in Translation." The Translation Studies Reader. Ed. Venuti, Lawrence. London & New York: Routledge, 2004. 298-313.

Blum-Kulka, Shoshana, and Edward A. Levenston. "Universals of Lexical Simplification." Strategies in Interlanguage Communication. Eds. Faerch, Claus and Gabriele Kasper. London & NY: Longman, 1983. 119-40.

Bowker, Lynne. "Corpus-Based Applications for Translator Training." Corpus-Based Approaches to Contrastive Linguistics and Translation Studies. Eds. Granger, Sylviane, Jacques Lerot and Stephanie Petch-Tyson. Amsterdam & New York, NY: Rodopi, 2003. 169-83.

Bowker, Lynne. "A Corpus-Based Approach to Evaluating Student Translations." Evaluation and Translation. Special Issue of The Translator. Ed. Maier, Carol. Vol. 6.2, 2000. 183-210.

Bowker, Lynne. "Productivity Vs. Quality? A Pilot Study on the Impact of Translation Memory Systems." Localisation Focus 4.1 (2005): 13-20.

Bowker, Lynne. "Towards a Methodology for a Corpus-Based Approach to Translation Evaluation." Meta 46.2 (2001): 345-64.

Bowker, Lynne, and Jennifer Pearson. Working with Specialized Language: A Practical Guide to Using Corpora. London: Routledge, 2002.

Brown, Gillian, and George Yule. Discourse Analysis. Cambridge: Cambridge University Press, 1983.

Bureau of Labor Statistics, U.S. Department of Labor. "Occupational Outlook Handbook, 2010-2011 Edition." 2010. February 14, 2012.

Byrne, Jody. Technical Translation: Usability Strategies for Translating Technical Documentation. Dordrecht: Springer, 2006.

Bystrova-McIntyre, Tatyana. "Between Norms and Style: Using Corpora to Understand Punctuation Use in Russian and English." Translating Russia: From Theory to Practice. Ohio Slavic Papers 8 (2006): 103-28.

Bystrova-McIntyre, Tatyana. "Cohesion in Translation: A Corpus Study of Translated and Non-Translated Texts (Russian into English)." MidWest Slavic Conference, 2011.

Bystrova-McIntyre, Tatyana. "Looking at the Overlooked: A Corpora Study of Punctuation Use in Russian and English." Translation and Interpreting Studies 2.1 (2007): 137-62.

Bystrova-McIntyre, Tatyana. "The Translator's Other Hats: Towards an Expanded Translation Pedagogy." American Translators Association 51st Conference, 2010.

Callow, Kathleen. Discourse Considerations in Translating the Word of God. Michigan: Zondervan, 1974.

Campbell, Kim Sydow. Coherence, Continuity, and Cohesion: Theoretical Foundations for Document Design. Hove, UK: Lawrence Erlbaum Associates, 1995.

Campbell, Stuart. Translation into the Second Language. London: Longman, 1998.

Carl, Michael, et al. "The Process of Post-Editing: A Pilot Study." Copenhagen Studies in Language 41 (2011): 131-42.

Caron, Jean. An Introduction to Psycholinguistics. Toronto: University of Toronto Press, 1992.

Chafe, Wallace, and Jane Danielewicz. Properties of Spoken and Written Language, 1987.

Chandler, Daniel. An Introduction to Genre Theory. 1997. February 9, 2012.

Chapman, L. John. Reading: From 5-11 Years. Milton Keynes, England: Open UP, 1987.

Chau Hu, Helen. "Cohesion and Coherence in Translation Theory and Pedagogy." Word 50.1 (1999): 33-46.

Cherney, Leora Reiff, Barbara B. Shadden, and Carl O. Coelho. Analyzing Discourse in Communicatively Impaired Adults. Gaithersburg, MD: Aspen Publishers, 1998.

Chiang, Steve. "The Importance of Cohesive Conditions to Perceptions of Writing Quality at the Early Stages of Foreign Language Learning." System 31.4 (2003): 471-84.

Crowhurst, Marion. "Cohesion in Argument and Narration at Three Grade Levels." Research in the Teaching of English 21 (1987): 185-201.

De Beaugrande, Robert. Text, Discourse, and Process. London: Longman, 1980.

De Beaugrande, Robert, and Wolfgang Dressler. Introduction to Text Linguistics. London: Longman, 1981.

Derrida, Jacques. "The Law of Genre." On Narrative. Ed. Mitchell, William J. Thomas. Chicago: University of Chicago Press, 1981. 51-77.

Dong, Da-Hui, and Yu-Su Lan. "Textual Competence and the Use of Cohesion Devices in Translating into a Second Language." The Interpreter and Translator Trainer 4.1 (2010): 47-88.

Dooley, Robert A., and Stephen H. Levinsohn. Analyzing Discourse: A Manual of Basic Concepts. Dallas, Texas: SIL International, 2001.

Doyle, Ann. "The Limitations of Cohesion." Research in the Teaching of English 16.4 (1982): 390-93.

Dragga, Sam, and Gwendolyn Gong. Editing: The Design of Rhetoric. Amityville, NY: Baywood Publishing Co., 1989.

Dragsted, Barbara. "Computer-Aided Translation as a Distributed Cognitive Task." Pragmatics & Cognition 14.2 (2006): 443-64.

Dragsted, Barbara. "Segmentation in Translation and Translation Memory Systems: An Empirical Investigation of Cognitive Segmentation and Effects of Integrating a TM System into the Translation Process." Copenhagen Business School, 2004.

Dragsted, Barbara. "Segmentation in Translation: Differences across Levels of Expertise and Difficulty." Target 17.1 (2005): 49-70.

DuBay, William. The Principles of Readability. Costa Mesa, CA: Impact Information, 2004.

Eco, Umberto. The Role of the Reader. Bloomington: Indiana University Press, 1979.

Enkvist, Nils Erik. Linguistic Stylistics. The Hague: Mouton, 1973.

Erofeyev, Victor, and Andrew Reynolds, eds. The Penguin Book of New Russian Writing. New York: Penguin USA, 1996.

Eskola, Sari. "Untypical Frequencies in Translated Language: A Corpus-Based Study on a Literary Corpus of Translated and Non-Translated Finnish." Translation Universals: Do They Exist? Eds. Mauranen, Anna and Pekka Kujamaeki. Amsterdam: John Benjamins, 2004. 83-99.

Everett, Daniel L. "Formal Linguistics and Field Work." Notes on Linguistics 57 (1992): 11-25.

Fine, Jonathan. How Language Works: Cohesion in Normal and Nonstandard Communication. Norwood, NJ: Ablex Publishing, 1994.

Firth, John Rupert. Papers in Linguistics 1934–1951. London: Oxford University Press, 1957.

Fowler, Roger. Language in the News: Discourse and Ideology in the Press. London/New York: Routledge, 1991.

Francis, Gill. Anaphoric Nouns. Discourse Analysis Monographs No. 11. University of Birmingham: English Language Research, 1985.

Francis, Gill. "Aspects of Nominal-Group Lexical Cohesion." Interface: Journal of Applied Linguistics/Tijdschrift voor Toegepaste Linguistiek 4 (1989): 27–53.

390

Frawley, William "Prolegomenon to a Theory of Translation." Translation: Literary,

Linguistic and Philosophical Perspectives. Ed. Frawley, William London &

Toronto: Associated University Presses, 1984. 159-75.

García, Ignacio. "Research on Translation Tools." Translation Research Projects 2. Eds.

Pym, Anthony and Alexander Perekrestenko. Tarragona: Intercultural Studies

Group, 2009. 27-33.

Garside, Roger, Geoffrey Leech, and Anthony McEnery. Corpus Annotation: Linguistic

Information from Computer Text Corpora. London: Longman, 1997.

Online and Free! Ten Years of Online Machine Translation: Origins, Developments,

Current Use and Future Prospects. MT Summit XI. 10-14 September 2007.

Gellerstam, Martin. "Translationese in Swedish Novels Translated from English."

Translation Studies in Scandinavia. Eds. Wollin, Lars and Hans Lindquist. Lund:

CWK Gleerup, 1986. 88–95.

Gellerstam, Martin. "Translations as a Source for Cross-Linguistic Studies. Papers from a

Symposium on Text-Based Cross-Linguistic Studies in Lund." Languages in

Contrast. Eds. Aijmer, Karin, Bengt Altenberg and Mats Johansson. Lund: Lund

University Press, 1996. 53–62.

Gleason, Henry Allen Jr. "Contrastive Analysis in Discourse Structure." Monograph

Series on Languages and Linguistics 21 (Georgetown University Institute of

391

Languages and Linguistics). [reprinted in Makkai and Lockwood 1973: 258-76.]

(1968).

Gómez González, María de los Ángeles. "Evaluating Lexical Cohesion in Telephone Conversations." Discourse Studies 12.5 (2010): 1–25.

"GoogleTranslate Webpage." Google. 2012. March 20 2012.

Goutsos, Dionysis. Modeling Discourse Topic: Sequential Relations and Strategies in Expository Text. Norwood, NJ: Ablex Publishing, 1997.

Granger, Sylviane. "The Corpus Approach: A Common Way Forward for Contrastive Linguistics and Translation Studies." Corpus-Based Approaches to Contrastive Linguistics and Translation Studies. Eds. Granger, Sylviane, Jacques Lerot and Stephanie Petch-Tyson. Amsterdam & Atlanta: Rodopi, 2003. 17-29.

Grimes, Joseph E. The Thread of Discourse. The Hague: Mouton, 1975.

Guerberof, Ana. "Post-Editing MT and TM: A Spanish Case." Multilingual 19.6 (2008): 45-50.

Gutwinski, Waldemar. Cohesion in Literary Texts: A Study of Some Grammatical and Lexical Features of English Discourse. The Hague: Mouton, 1976.

Halliday, Michael A.K. An Introduction to Functional Grammar. 1st ed. London: Edward Arnold, 1985.

Halliday, Michael A.K. An Introduction to Functional Grammar. 2nd ed. London: Edward Arnold, 1994.

Halliday, Michael A.K., and Ruqaiya Hasan. Cohesion in English. London: Longman, 1976.

Halliday, Michael A.K., and Ruqaiya Hasan. Language, Context, and Text: Aspects of Language in a Social Semiotic Perspective. Geelong, Victoria: Deakin University Press, 1985.

Halliday, Michael A.K., and Christian M.I.M. Matthiessen. Construing Experience through Meaning: A Language-Based Approach to Cognition. London: Cassell, 1999.

Hansen, Silvia. "The Nature of Translated Text: An Interdisciplinary Methodology for the Investigation of the Specific Properties of Translations." Saarbrücken Dissertations in Computational Linguistics and Language Technology 13 (2003).

Harrigan, Jane T. The Editorial Eye. NY: St. Martin's Press, 2003.

Hasan, Ruqaiya. "Coherence and Cohesive Harmony." Understanding Reading Comprehension. Ed. Flood, James. Delaware: International Reading Association, 1984. 181-219.

Hasan, Ruqaiya. Grammatical Cohesion in Spoken and Written English. Harlow: Longman, 1968.

Hasselgård, Hilde. "Sentence Opening in English and Norwegian." Corpus-Based Studies in English. Papers from the Seventeenth International Conference on English Language Research on Computerized Corpora. Ed. Ljung, M. Amsterdam: Rodopi, 1997. 3-20.

Hatim, Basil, and Ian Mason. Discourse and the Translator. London & New York: Longman, 1990.

Hendricks, W. O. "Discourse Analysis as a Semiotic Endeavor." Semiotica 72 (1988): 97-124.

Herbst, Thomas. English Linguistics: A Coursebook for Students of English. Berlin/New York: Walter de Gruyter GmbH & Co., 2010.

Hoey, Michael. "Another Perspective on Coherence and Cohesive Harmony." Functional and Systemic Linguistics: Approaches and Uses. Ed. Ventola, Eija, 1991. 385-414.

Hoey, Michael. On the Surface of Discourse. London: George Allen and Unwin, 1983.

Hoey, Michael. Patterns of Lexis in Text. New York/Oxford/Toronto: Oxford University Press, 1991.

Horning, Alice. "Readable Writing: The Role of Cohesion and Redundancy." Journal of Advanced Composition 11.1 (1991): 135-45.

Hu, Helen Chau. "Cohesion and Coherence in Translation Theory and Pedagogy." Word 50.1 (1999): 33-46.

Hutchins, W. John. "Machine Translation: A Brief History." Concise History of the Language Sciences: From the Sumerians to the Cognitivists. Eds. Koerner, E. F. Konrad and Robert E. Asher. Oxford: Pergamon Press, 1995. 431-45.

Iossel, Mikhail, and Jeff Parker, eds. Rasskazy. New Fiction from a New Russia. Portland, OR: Tin House Books, 2009.

Irwin, Judith. "The Effect of Linguistic Cohesion on Prose Comprehension." Journal of Reading Behavior 12.4 (1980).

Irwin, Judith W. "Cohesion Factors in Children's Textbooks." Understanding and Teaching Cohesion Comprehension. Ed. Irwin, Judith W., 1986. 57-66.

Iser, Wolfgang. The Implied Reader. Baltimore: Johns Hopkins University Press, 1974.

James, Carl. Contrastive Analysis. London: Longman, 1983.

Jaworski, Adam, and Nikolas Coupland, eds. The Discourse Reader. London: Routledge, 1999.

Johansson, Stig, and Knut Hofland. "The English-Norwegian Parallel Corpus: Current Work and New Directions." Multilingual Corpora in Teaching and Research. Eds. Botley, Simon P., Anthony M. McEnery and Andrew Wilson. Amsterdam: Rodopi, 2000. 134-47.

Kachroo, Balkrishan. "Textual Cohesion and Translation." Meta 29.2 (1984): 128-34.

Kaplan, Robert B. The Anatomy of Rhetoric: Prolegomena to a Functional Theory of Rhetoric. Philadelphia: Center for Curriculum Development, 1972.

Kaplan, Robert B. "Contrastive Rhetoric and the Teaching of Composition." TESOL Quarterly 1.4 (1967): 10-16.

Kaplan, Robert B. "Cultural Thought Patterns in Inter-Cultural Education." Language Learning 16.1-2 (1966): 1-20.

Kennedy, Graeme D. An Introduction to Corpus Linguistics. London: Longman, 1998.

Khanna, Raman R., et al. "Performance of an Online Translation Tool When Applied to Patient Educational Material." Journal of Hospital Medicine 6 (2011): 519–25.

Kitson, Harry D. The Mind of the Buyer. New York: Macmillan, 1921.

Koch, Holger. A Functional Perspective of Cohesion in English. Norderstedt, Germany: GRIN Verlag, 2001.

Kolln, Martha. "Cohesion and Coherence." Evaluating Writing: The Role of Teachers' Knowledge About Text, Learning, and Culture. Eds. Cooper, Charles R. and Lee Odell. Urbana, IL: National Council of Teachers of English, 1999. 93-113.

Kress, Gunther. "Linguistic and Ideological Transformations in News Reporting." Language, Image, Media. Eds. Davis, Howard and Paul Walton. Oxford: Basil Blackwell, 1983. 120-38.

Kruger, Alet. "Corpus-Based Translation Research: Its Development and Implications for General, Literary and Bible Translation." Acta Theologica 22, Supplementum 2 (2002): 70–106.

Kuo, Sai-Hua, and Mari Nakamura. "Translation or Transformation? A Case Study of Language and Ideology in the Taiwanese Press." Discourse and Society 16.3 (2005): 393-417.

Larson, Richard. "Structure and Form in Non-Narrative Prose." Teaching Composition: 12 Bibliographical Essays. Ed. Tate, Gary. Fort Worth, TX: Texas Christian UP, 1987. 39-82.

Laviosa, Sara. "Core Patterns of Lexical Use in a Comparable Corpus of English Narrative Prose." Meta 43.4 (1998b): 557-70.

Laviosa, Sara. Corpus-Based Translation Studies: Theory, Findings, Applications. Amsterdam and Atlanta: Rodopi, 2002.

Laviosa, Sara. "The English Comparable Corpus: A Resource and a Methodology." Unity in Diversity? Current Trends in Translation Studies. Eds. Bowker, Lynne, et al. Manchester: St. Jerome Publishing, 1998a. 101-12.

Le, Elisabeth. "The Role of Paragraphs in the Construction of Coherence – Text Linguistics and Translation Studies." International Review of Applied Linguistics in Language Teaching 42.3 (2004): 259-75.

Lee, David Y.W. "Genres, Registers, Text Types, Domains, and Styles: Clarifying the Concepts and Navigating a Path through the BNC Jungle." Language Learning & Technology 5.3 (2001): 37-72.

Leech, Geoffrey. "New Resources, or Just Better Old Ones? The Holy Grail of Representativeness." Corpus Linguistics and the Web. Eds. Hundt, Marianne, Nadja Nesselhauf and Carolin Biewer. Amsterdam/New York: Rodopi, 2007. 133-49.

LeTourneau, Mark S. Explaining Levels of Language from Sentence to Text. Lewiston, NY & Queenston, Ontario: The Edwin Mellen Press, 2007.

Longacre, Robert E. The Grammar of Discourse. New York: Plenum, 1996.

Lonsdale, Allison Beeby. Teaching Translation from Spanish to English: Worlds Beyond Words. Ottawa, Canada: University of Ottawa Press, 1996.

López-Rodríguez, Clara Inés, Bryan J. Robinson, and María Isabel Tercedor-Sánchez. "A Learner-Generated Corpus to Direct Learner-Centered Courses." Translation and Meaning. Eds. Thelen, Marcel and Barbara Lewandowska-Tomaszczyk. Maastricht: Zuyd University, 2007. 197-211.

Martin, James R. "Cohesion and Texture." The Handbook of Discourse Analysis. Eds. Schiffrin, Deborah, Deborah Tannen and Heidi E. Hamilton. Oxford: Blackwell Publishing, 2001. 35-53.

Martin, James R. English Text: System and Structure. Amsterdam: Benjamins, 1992.

Martin, James R. "Incongruent and Proud: De-Vilifying 'Nominalization'." Discourse and Society 19 (2008): 801-10.

Mauranen, Anna. "Strange Strings in Translated Language: A Study on Corpora." Intercultural Faultlines. Research Models in Translation Studies 1: Textual and Cognitive Aspects. Ed. Olohan, Maeve. Manchester: St. Jerome, 2000. 119-41.

Mauranen, Anna, and Pekka Kujamäki, eds. Translation Universals: Do They Exist? Amsterdam and Philadelphia: John Benjamins, 2004.

May, Rachel. "Sensible Elocution: How Translation Works in and Upon Punctuation." The Translator 3.1 (1997): 1-20.

McCarthy, Michael. Discourse Analysis for Language Teachers. Cambridge: Cambridge University Press, 1991.

Mohamed-Sayidina, Aisha. "A Contrastive Study of Syntactic Relations, Cohesion, and Punctuation as Markers of Rhetorical Organization in Arabic and English Narrative Texts." University of Exeter, 1993.

Moon, Rosamund. Fixed Expressions and Idioms in English: A Corpus-Based Approach. Oxford: Clarendon Press, 1998.

Mossop, Brian. Revising and Editing for Translators. Manchester, UK & Northampton, MA: St. Jerome Publishing, 2001.

Munday, Jeremy. Introducing Translation Studies: Theories and Applications. London: Routledge, 2001.

Neubert, Albrecht, and Gregory Shreve. Translation as Text. Kent, Ohio: The Kent State University Press, 1992.

O’Brien, Sharon. "Controlled Language and Post-Editing." Multilingual: Writing for Translation, Getting Started Guide 17.7 (2006): 17-19.

O’Brien, Sharon. "Eye-Tracking and Translation Memory Matches." Perspectives: Studies in Translatology 14.3 (2006): 185-204.

Olohan, Maeve. Introducing Corpora in Translation Studies. London/New York: Routledge, 2004.

Olohan, Maeve, and Mona Baker. "Reporting That in Translated English: Evidence for Subconscious Processes of Explicitation?" Across Languages and Cultures 1.2 (2000): 141-58.

Øverås, Linn. "In Search of the Third Code: An Investigation of Norms in Literary Translation." Meta 43.4 (1998): 571-88.

Perfetti, Charles A. "Comprehending Newspaper Headlines." Journal of Memory and Language 26 (1987): 692-713.

Peterson, Nadya L., ed. Russian Love Stories. An Anthology of Contemporary Prose. New York: Peter Lang, 2009.

Puurtinen, Tiina. Linguistic Acceptability in Translated Children's Literature. Joensuu, Finland: University of Joensuu Publications in the Humanities, 1995.

Quirk, Randolph, et al. A Grammar of Contemporary English. London: Longman, 1972.

Resnik, Philip, and Noah A. Smith. "The Web as a Parallel Corpus." Computational Linguistics 29.3 (2003): 349–80.

Ribas López, Carlota. "Translation Memories as Vehicles for Error Propagation. A Pilot Study." Unpublished dissertation. Universitat Rovira i Virgili, Tarragona, 2007.

Samson Jr., Donald C. Editing Technical Writing. NY: Oxford University Press, 1993.

Schaeffner, Christina, and Beverly Adab. "The Idea of the Hybrid Text in Translation Revisited." Across Languages and Cultures 2.2 (2001): 277-302.

Schiffrin, Deborah, Deborah Tannen, and Heidi Ehernberger Hamilton, eds. The Handbook of Discourse Analysis. Malden, Mass.: Blackwell Publishers, 2001.

Scott, Mike. WordSmith Tools 5.0. Oxford: Oxford University Press, 2009.

Scott, Mike. "WordSmith Tools." Lexically.Net. 2010. January 15 2012.

SDL. "Intelligent Machine Translation Webpage." February 14 2012.

Shreve, Gregory M. "Knowing Translation: Cognitive and Experiential Aspects of Translation Expertise from the Perspective of Expertise Studies." Translation Studies: Perspectives on an Emerging Discipline. Ed. Riccardi, Alessandra. Cambridge: Cambridge University Press, 2002. 150–71.

Simmons, Synthia. "Cohesion in Russian: A Model for Discourse Analysis." American Association of Teachers of Slavic and East European Languages 25.2 (1981): 64-79.

Sinclair, John. Corpus, Concordance, Collocation. Oxford: Oxford University Press, 1991.

Sinclair, John McHardy. "Beginning the Study of Lexis." In Memory of J.R. Firth. Eds. Bazell, Charles E., et al. London: Longman, 1966. 410-30.

Smith, Raoul, and William Frawley. "Conjunctive Cohesion in Four Discourse Types." Text 3.4 (1983): 347-74.

Steiner, George. After Babel. London: Oxford University Press, 1975.

Stotsky, Sandra. "Types of Lexical Cohesion in Expository Writing: Implications for Developing the Vocabulary of Academic Discourse." College Composition and Communication 34.4 (1983): 430-46.

Tannen, Deborah. Talking Voices: Repetition, Dialogue, and Imagery in Conversational Discourse. Cambridge, England & NY: Cambridge University Press, 1989.

Teich, Elke, and Peter Fankhauser. "WordNet for Lexical Cohesion Analysis." Proceedings of the 2nd Global WordNet Conference. Eds. Sojka, Petr, et al. Brno, Czech Republic: Masaryk University, 2004. 326-31.

Thompson, Geoff. Introducing Functional Grammar. London: Edward Arnold, 1996.

Tierney, Robert J., and James H. Mosenthal. "Cohesion and Textual Coherence." Research in the Teaching of English 17 (1983): 215-29.

Tirkkonen-Condit, Sonja. "Translationese – A Myth or an Empirical Fact?" Target 14.2 (2002): 207–20.

Tirkkonen-Condit, Sonja. "Unique Items – Over- or Under-represented in Translated Language?" Translation Universals: Do They Exist? Eds. Mauranen, Anna and Pekka Kujamäki. Amsterdam and Philadelphia: John Benjamins, 2004. 177-84.

Toury, Gideon. Descriptive Translation Studies and Beyond. Amsterdam and Philadelphia: John Benjamins, 1995.

Troia, Gary A. Instruction and Assessment for Struggling Writers: Evidence-Based Practices. New York & London: Guilford Publications, 2009.

University of Lancaster, UK. "CLAWS Part-of-Speech Tagger for English." March 24 2012.

Uzar, Rafal. "A Corpus Methodology for Analysing Translation." Cadernos de Tradução: Corpora e Tradução. Ed. Tagnin, S.E.O. Vol. 1. Florianópolis: NUT, 2002. 235-63.

Uzar, Rafal. "A Toolbox for Translation Quality Assessment." Practical Applications in Language and Computers. Ed. Lewandowska-Tomaszczyk, Barbara. Frankfurt: Peter Lang, 2004. 153-62.

Van Dijk, Teun A., ed. Handbook of Discourse Analysis. Vol. 2. 4 vols. Orlando, FL: Academic Press, 1985.

Vanderauwera, Ria. Dutch Novels Translated into English: The Transformation of a ‘Minority’ Literature. Amsterdam: Rodopi, 1985.

Venuti, Lawrence, ed. The Translation Studies Reader. London: Routledge, 2000.

Venuti, Lawrence, ed. The Translation Studies Reader. London and New York: Routledge, 2001.

Verikaite, Daiva. "Variation of Conjunctive Discourse Markers across Different Genres." Man and the Word (Žmogus ir žodis) 3.7 (2005): 68-75.

Vogel, Mabel, and Carleton Washburne. "An Objective Method of Determining Grade Placement of Children’s Reading Material." Elementary School Journal 28 (1928): 373-81.

Volansky, Vered, Noam Ordan, and Shuly Wintner. "More Human or More Translated? Original Texts Vs. Human and Machine Translations." Seminar on Computational Linguistics (ISCOL)/Bar-Ilan Symposium on Foundations of Artificial Intelligence, Bar-Ilan University by the Department of Computer Science. Ed.

Wallis, Julian. "Interactive Translation Vs. Pre-Translation in the Context of Translation Memory Systems: Investigating the Effects of Translation Method on Productivity, Quality and Translator Satisfaction." MA Thesis. 2006.

Westin, Ingrid. Language Change in English Newspaper Editorials. Amsterdam & New York, NY: Rodopi, 2002.

Winter, Eugene O. "A Clause Relational Approach to English Texts: A Study of Some Predictive Lexical Items in Written Discourse." Instructional Science 6.1 (1977): 1-92.

Winter, Eugene O. Towards a Contextual Grammar of English. London: George Allen and Unwin, 1982.

Witte, Stephen P., and Lester Faigley. "Coherence, Cohesion, and Writing Quality." College Composition and Communication 32 (1981): 189-204.

Writer's Workbench Analysis Programs Quick Guide. 2008. March 14 2012.

Yankova, Diana. "Semantic Relations in Statutory Texts: A Study of English and Bulgarian." SKY Journal of Linguistics 19 (2006): 189-222.

Yeh, Chun-Chun. "The Relationship of Cohesion and Coherence: A Contrastive Study of English and Chinese." Journal of Language and Linguistics 3.2 (2004): 243-60.

Zanettin, Federico. "Corpora in Translation Practice." First international workshop on language resources for translation work and research proceedings (2002). 30 April 2008.

Zanettin, Federico. Corpus Typology and Corpus-Based Research Overview. Kent State University: Workshop on Corpus Design, 2008.

Zanettin, Federico. "Translation and Corpus Design." Synaps: A Journal of Professional Communication 26 (2011): 14-23.

Zhao, Jia, Wenli Yan, and Yumei Zhou. "A Corpus-Based Study of Cohesion in English Medical Texts and Its Chinese Translation." International Journal of Biomedical Science 5.3 (2009): 313-20.