
A CORPUS-BASED INVESTIGATION OF LEXICAL COHESION IN ENGLISH AND ITALIAN NON-TRANSLATED TEXTS AND IN ITALIAN TRANSLATED TEXTS

A thesis submitted to Kent State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by

Leonardo Giannossa

August, 2012

© Copyright by Leonardo Giannossa 2012

All Rights Reserved


Dissertation written by

Leonardo Giannossa

M.A., University of Bari, Italy, 2007

B.A., University of Bari, Italy, 2005

Approved by

Brian Baer, Chair, Doctoral Dissertation Committee

Richard K. Washbourne, Member, Doctoral Dissertation Committee

Erik Angelone, Member, Doctoral Dissertation Committee

Patricia Dunmire, Member, Doctoral Dissertation Committee

Sarah Rilling, Member, Doctoral Dissertation Committee

Accepted by

Keiran J. Dunne, Interim Chair, Modern and Classical Language Studies

Timothy Moerland, Dean, College of Arts and Sciences


Table of Contents

LIST OF FIGURES ...... vii

LIST OF TABLES ...... viii

DEDICATION ...... xi

ACKNOWLEDGEMENTS ...... xii

ABSTRACT ...... xiv

INTRODUCTION ...... 1

Why Study Lexical Cohesion? ...... 1

Research Hypotheses ...... 8

Research Method ...... 9

Significance of my Research Hypotheses ...... 12

Summary of Chapters ...... 15

CHAPTER I ...... 16

1.1 Coherence vs. Cohesion ...... 16

1.2 Lexical Cohesion ...... 21

1.3 Lexical Cohesion Studies in Discourse Analysis and Linguistics ...... 26

1.4 Lexical Chaining Sources ...... 32

1.4.1 The WordNet Project ...... 32

1.5 Cohesion and Lexical Cohesion in Translation Studies ...... 34

CHAPTER II ...... 44

2.1 Methodological Approaches: Text Analysis and ...... 45

2.1.1 Text Analysis...... 45


2.2 Tools ...... 57

2.2.1 WordSmith Tools ...... 57

2.2.2 WordNet ...... 60

2.3 Preliminary Analysis ...... 62

2.4 Semantic Relation Analysis ...... 66

2.5 Statistical Analysis ...... 70

CHAPTER III ...... 72

3.1 Parallel and Comparable Corpora ...... 72

3.2 Textual Analysis ...... 73

3.2.1 Standardized Type-Token Ratio...... 73

3.2.2 Sentence number ...... 77

3.2.3 Lexical Density ...... 82

3.2.4 Readability ...... 84

3.2.5 Average Sentence Length...... 86

3.3 Semantic Analysis ...... 89

3.3.1 Repetition and Modified Repetition ...... 90

3.3.2 Synonyms ...... 93

3.3.3 Antonyms, Meronyms and Holonyms ...... 95

3.3.4 Hypernyms ...... 101

3.3.5 Hyponyms ...... 103

3.4 SPSS Statistical Analysis...... 106

3.4.1 Textual Features ...... 106


3.4.2 SPSS Statistical Analysis: Semantic Features ...... 108

CHAPTER IV ...... 113

4.1 Introduction ...... 113

4.2 Textual Features ...... 117

Average Sentence length ...... 117

Sentence number ...... 119

STTR ...... 121

Lexical Density ...... 122

4.3 Semantic Features ...... 123

4.3.1 Repetition ...... 123

4.3.2 Synonyms ...... 126

4.3.3 Meronyms, Holonyms, Hypernyms, Hyponyms, Antonyms ...... 127

4.3.4 Semantic categories other than repetitions as a whole ...... 128

CHAPTER V ...... 133

5.1 Introduction ...... 133

5.2 Pedagogical Implications ...... 138

5.3 Limitations and Future Directions ...... 152

GLOSSARY OF ACRONYMS ...... 157

References ...... 159

Webography: ...... 167


LIST OF FIGURES

Figure 1 – WordList Frequency List ...... 58

Figure 2 – Wordlist Statistics ...... 59

Figure 3 – MultiWordNet Interface ...... 61


LIST OF TABLES

Table 1 – Preliminary Textual Analysis Screenshot ...... 63

Table 2 – Semantic Relation Analysis ...... 66

Table 3.1 – Parallel Corpus STTRs ...... 73

Table 3.2 – Parallel Corpus Types ...... 74

Table 3.3 – Comparable Corpus STTRs ...... 76

Table 3.4 – Comparable Corpus STTR Means ...... 77

Table 3.5 – Parallel Corpus Sentence Numbers...... 77

Table 3.6 – Average Sentence Numbers ...... 78

Table 3.7 – Parallel and Comparable Corpus Average Sentence Length ...... 79

Table 3.8 – Average Sentence Number in an average text of 3,339 tokens ...... 80

Table 3.9 – Comparable Corpus Average Sentence Number ...... 81

Table 3.10 – Parallel Corpus Lexical Density ...... 82

Table 3.11 – Lexical Density in English and Italian Originals ...... 83

Table 3.12 – Parallel and Comparable Corpus Average Lexical Density ...... 83

Table 3.13 – Parallel Corpus Readability Indices ...... 84

Table 3.14 – Parallel and Comparable Corpus Readability Indices ...... 85

Table 3.15 – Parallel Corpus Average Sentence Length ...... 86

Table 3.16 – Parallel Corpus Mean ASL ...... 87

Table 3.17 – Comparable Corpus ASL ...... 88

Table 3.18 – Comparable Corpus Mean ASL ...... 89

Table 3.19 – Mean ASL in English and Italian Originals...... 89


Table 3.20 – Parallel Corpus Repetitions ...... 90

Table 3.21 – Parallel Corpus Average Repetition ...... 91

Table 3.22 – Comparable Corpus Average Repetition ...... 91

Table 3.23 – Percentage of Use of Repetition Compared to Text Size ...... 92

Table 3.24 – Parallel Corpus Synonyms ...... 93

Table 3.25 – Parallel Corpus Synonym Means ...... 94

Table 3.26 – Parallel Corpus Antonyms ...... 95

Table 3.27 – Parallel Corpus Antonym Means ...... 96

Table 3.28 – Italian Originals Antonyms ...... 96

Table 3.29 – Parallel Corpus Meronyms ...... 97

Table 3.30 – Parallel Corpus Meronym Means ...... 98

Table 3.31 – Italian Originals Meronyms ...... 98

Table 3.32 – Parallel Corpus Holonyms ...... 99

Table 3.33 – Parallel Corpus Holonym Means ...... 100

Table 3.34 – Italian Originals Holonyms ...... 100

Table 3.35 – Parallel Corpus Hypernyms ...... 101

Table 3.36 – Parallel Corpus Hypernym Means ...... 102

Table 3.37 – Italian Originals Hypernyms ...... 103

Table 3.38 – Parallel Corpus Hyponyms ...... 104

Table 3.39 – Parallel Corpus Hyponym Means ...... 105

Table 3.40 – Italian Originals Hyponyms ...... 105

Table 3.41 – Parallel & Comparable Corpus Statistical Data ...... 109


Table 3.42 – Parallel & Comparable Corpus Statistical Means...... 111


DEDICATION

To my parents, Antonio Giannossa and Rita Tocci; without them and their good teachings I would never have come as far as I have.


ACKNOWLEDGEMENTS

Firstly, I would like to express deep gratitude to my advisor, Dr. Brian Baer, who helped me through the at times hard and stressful writing stages of this dissertation with his insightful advice and suggestions. Sometimes it is hard to find the motivation to write one’s dissertation because of the many obstacles and hurdles that may arise along the way, but his constant presence and aid enabled me to keep moving forward. I would also like to thank the remaining members of the committee, namely Dr. Richard Kelly Washbourne, Dr. Erik Angelone, Dr. Patricia Dunmire, and Dr. Sarah Rilling, who contributed to this dissertation through their valuable and thought-provoking comments and feedback, which helped me enhance this work in both style and content.

Secondly, I would like to thank all the professors from the Institute for Applied Linguistics at Kent State University whom I had during my coursework years. Thanks to them I have come to learn far more about the field of translation studies than I already knew. Each class helped me fine-tune my critical thinking and pedagogical skills, thus promoting the scholarly skills that I will need once out of the program and in the academic environment. Their constructive feedback on papers and their hands-on approach to translation studies helped me grow both as a translator and as a scholar. In particular, I would like to express my gratitude to Dr. Isabel Lacruz for her help in coming up with the experimental design for my study, for the time she devoted to my questions, for the interest she showed in my topic, and for her constant encouragement.

I am also grateful to Dr. Wakabayashi for helping me present and publish my first paper on the CATS website and for always being available whenever I needed her help.

Thirdly, I am deeply indebted to my mother and my father who, over the last four years, though geographically thousands of miles away, could not have been closer morally and spiritually. They have always been there for me and supported me in all of my decisions, and for their unconditional, unselfish love and support I am deeply grateful to them.

Fourthly, I would like to thank all the people whom, over the past four years, I have gotten to know at Kent State University and all of my colleagues from the translation department, with whom I shared some unforgettable moments that I will cherish in the years to come. In particular, I would like to thank Loubna Bilali, Sohomjit Ray, Ajisa Fukudenji, Monica Rodriguez and long-time friend Adriana Di Biase, whom I first met in 2004 during the first year of my master’s program at the University of Bari (Italy). For their true friendship, I will always be grateful to them.

Finally, I express my sincere and deepest gratitude to all the people who, in crossing my life’s path, left an indelible mark, because they all contributed, to varying degrees, to making me the person I am today. In this respect, I would like to thank Mariangela Monteleone and Italian instructors Rosa Commisso and Donatella Salvatori, who warmly welcomed me into their lives at the beginning of this long journey called the PhD.


ABSTRACT

The present study sets out to investigate lexical cohesion and the network of lexical chains it creates from the point of view of translation. This topic has been largely understudied in the translation field, though many scholars acknowledge its importance and the major role it plays in shaping the quality of a translation (Hoey [1991] demonstrates that approximately 50% of a text’s cohesive markers are lexical) and in affecting the target readership’s response to translations. This study employs Morris and Hirst’s (1991) categorization of lexical cohesion into 1) reiteration with identity of reference; 2) reiteration without identity of reference; 3) reiteration by means of superordinates; 4) systematic semantic relations; and 5) non-systematic semantic relations. The study tests two hypotheses. Hypothesis one claims that Italian translations of English scientific articles taken from Le Scienze, the Italian edition of Scientific American, tend to reproduce the lexical cohesive markers of the source texts, failing to make them conform to TL norms and text-type conventions. Hypothesis two claims that the lexical cohesive markers used in articles originally written in Italian and published in Le Scienze differ from the ones used in the Italian translations. In both experiments, WordNet, a lexical database for the English language designed by Professor George A. Miller at Princeton University, will be used to identify the senses and the semantic relations connecting the different lexical chains in the texts. As for the Italian texts, MultiWordNet, a multilingual lexical database in which the Italian version is strictly aligned with the Princeton WordNet, will be used to analyze word senses and semantic relations. The two hypotheses are strictly interrelated insofar as they both set out to demonstrate that not much emphasis has yet been placed on cohesion in general, and lexical cohesion in particular, in translator training programs, and that a greater awareness of it could benefit both professionals and novices.


INTRODUCTION

Why Study Lexical Cohesion?

This dissertation sets out to investigate the topic of lexical cohesion, which has long been understudied in translation studies, thereby raising awareness of the major role that this cohesive device should play in translator training programs. The corpus-based approach to the study of this cohesive device, combined with the manual analysis of intra-textual semantic relations and the statistical processing of the results, is designed to provide evidence pointing to the importance of lexical cohesion in translation quality.

Lexical cohesion is one of the five cohesive devices that were first identified by Halliday and Hasan in their pioneering work Cohesion in English (1976), wherein lexical cohesion is defined as “the cohesive effect achieved by the selection of vocabulary” (1976: 274). However, at the time, they both seemed unaware of the dominant role that this cohesive device played, and still plays, in creating texture and making texts coherent.

This shortcoming is raised by Hoey (1991) in his book Patterns of Lexis in Text, in which he argues that Halliday and Hasan fail to acknowledge the primary role of lexical cohesion in building texture. He draws this conclusion by looking at the sample analyses of seven different types of texts included at the end of Cohesion in English. He focuses on the frequency data of the five types of cohesive markers and notices that lexical cohesion accounts for more than forty percent, compared to thirty-two percent for reference, twelve percent for conjunction, ten percent for ellipsis and four percent for substitution (1991: 9). Yet, Hoey states, the two authors cover lexical cohesion in less than twenty pages while dedicating over fifty pages to conjunction (1991: 9). But Hoey is not the only one who holds this view; Elke Teich and Peter Fankhauser are of the same opinion and in their article “Wordnet for Lexical Cohesion Analysis” state that lexical cohesion makes the most substantive contribution to the semantic coherence of a text (2004: 327). To be more precise, nearly fifty percent of a text’s cohesive ties are said to be lexical (2004: 327).

Lexical cohesion, however, is not only important because it affects the texture and coherence of a text; it also plays a major role in the interpreting process individuals engage in when reading texts or listening to dialogues. In this respect, Morris and Hirst, who investigated lexical cohesion as an indicator of text-structure/coherence, claim that this cohesive device helps readers solve ambiguity issues and narrow down the meaning of words by providing clues to determine the coherence of a text (1991: 23). In translator training, the above-mentioned statement implies that cohesion could be used to help students disambiguate the content of a text to be translated or narrow down the choice of translation equivalents to a few candidates. This disambiguation process could be fostered by having students work on pre-translation tasks such as the building of referential networks of texts to be translated (see Chapter 5 for further information).

Referential networks are lists of lexical items that are semantically related to one another.

The identification of semantic relations using, for example, electronic thesauri can help students narrow down the semantic field of a lexical item and, within this semantic field, single out its conceptual meaning.
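By way of illustration, the sketch below gathers one such referential network from WordNet. It is a minimal sketch, assuming NLTK’s WordNet interface rather than any tool actually used in this study, and it collects only the synonyms, superordinates and subordinates of a word’s senses:

    # A minimal sketch of building a referential network for one lexical item,
    # assuming NLTK's WordNet interface (not a tool used in this study).
    from nltk.corpus import wordnet as wn

    def referential_network(word):
        """Collect lexical items semantically related to `word` via WordNet."""
        related = set()
        for synset in wn.synsets(word):
            related.update(l.name() for l in synset.lemmas())    # synonyms
            for neighbor in synset.hypernyms() + synset.hyponyms():
                related.update(l.name() for l in neighbor.lemmas())
        related.discard(word)
        return sorted(related)

    print(referential_network("matter"))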

However, one might wonder whether disambiguation applies to all types of translation. A case in point is literary texts, especially the ones in which coherence is disrupted on purpose and thus disambiguation may be hard to achieve. In these cases, disambiguation entails making sure that the target readership has the same reaction or response to a piece of poetry or literary work that the source text readership had when reading that same piece of work. Indeed, the author of a literary text chooses his or her lexis having a specific purpose and readership in mind. Through word choice, s/he aims at evoking a particular emotion or reaction on the reader’s part. Hence, coherence, in this case, refers to everything in a text that leads the reader to respond/react to the text in the way the writer originally intended.

The problem posed by the study of lexical cohesion in translation is due to the fact that cohesive devices are language-, culture- and text-type-specific, as pointed out by such scholars as Elke Teich and Peter Fankhauser (2004), Beata B. Klebanov, Daniel Diermeier and Eyal Beigman (2008), Hatim and Mason (1990), Mona Baker (1992) and Shoshana Blum-Kulka (2000). In this respect, Blum-Kulka argues that differences in grammar and style preferences between any two languages will bring about different ways to express cohesion in the target language (299-300). It follows that when translating from any L1 into any L2 and from any L1 text-type into any L2 text-type, there will inevitably be a change in the kinds of cohesive devices the translator will adopt, as well as their distribution in the text. This means that, depending on the language into which one is translating, the types of cohesive ties and their distribution will have to change accordingly. From what has been said so far, it is possible to assert that in order for a translation to be coherent, it needs to reproduce the network of lexical chains that characterize the source text, but at the same time, the types and distribution of the lexical items creating lexical chains need to be adapted to the target language (TL) norms and TL text-type. To this end, one needs first to get acquainted with the cohesive devices peculiar to the target language and sub-language, and only then will it be possible to make the appropriate changes in cohesive terms when translating into L2.

Though there are many studies approaching and investigating lexical cohesion from both text-based (linguistic) and reader-based (cognitive) points of view, few of them are concerned with translation and pedagogical issues; and even when they do deal with such issues, they mostly tend to approach the problem from a corpus-based standpoint, treating lexical cohesion in theoretical terms, making generalizations about its use in translation and suggestions about prospective fruitful pathways of research. For instance, Mona Baker (1992) and Steiner (2005) argue that a product-based analysis of lexical cohesion can be achieved by adopting a corpus-based approach that automates the computation of lexical density and type-token ratio. Lexical density tells us if two or more texts are similar in terms of number of content words, whereas type-token ratio gives information about variation in vocabulary.
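Both measures are straightforward to compute. The sketch below is a minimal illustration, not the procedure used in this study: the stopword list is a rough stand-in for a proper function-word inventory, and article.txt is a hypothetical input file.

    # Type-token ratio: distinct word forms (types) over running words (tokens).
    # Lexical density: content words over running words, approximating content
    # words as everything outside NLTK's English stopword list (an assumption).
    from nltk.corpus import stopwords

    def type_token_ratio(tokens):
        return len(set(tokens)) / len(tokens)

    def lexical_density(tokens):
        function_words = set(stopwords.words("english"))
        content = [t for t in tokens if t not in function_words]
        return len(content) / len(tokens)

    tokens = open("article.txt").read().lower().split()  # hypothetical file
    print(type_token_ratio(tokens), lexical_density(tokens))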

One of the drawbacks of a purely corpus-based approach to the study of lexical cohesion is that it would only take into consideration words that tend to co-occur and are therefore semantically related, while ignoring those words that, while semantically related, do not co-occur. Indeed, there are also semantic relationships and connections among words that require even greater background and culture knowledge on the reader’s part. This is true of culture-bound terms, namely terms that developed historically and culturally in a specific area and are peculiar to the people located therein, as well as technical terms. In the latter respect, one of the articles to be analyzed in this study, “White Matter Matters,” contains two terms which are used as synonyms, namely white matter, which occurs twenty-seven times, and white cabling, which occurs just once. When electronic corpora are used, attention is placed on words or sets of words co-occurring with a certain frequency, which is not the case with white cabling. Hypothetically, during an electronic search white cabling would come up in the search results only if the adjective white were looked up; but if the search is done using the noun matter, then the expression white cabling would not appear.

Besides, even when the two expressions do appear in the search results, as in the first of the two above-mentioned scenarios, it still lies with the reader to decide whether or not they are semantically related by resorting not only to textual clues but also to his/her world and culture knowledge, which machines do not have.

This is exactly the issue on which pure linguists and schema theoreticians disagree. In this regard, Glenn Fulcher in his article “Cohesion and Coherence in Theory and Reading Research” argues that for schema theoreticians, coherence comes first. In other words, coherence precedes cohesion, which means that readers first look for coherence based on their world and culture knowledge and then recognize cohesion (1989: 146). By contrast, linguists give cohesion a primary role in making a text coherent. In Halliday and Hasan (1976), the interpretation of the cohesive devices populating a text is text-bound; in other words, one does not need to go beyond the text to find the clues leading to the disambiguation of word meanings. This idea of cohesion as an index of textual coherence has mostly been criticized by schema theory scholars because the latter view reading as an interactive process between a reader and a text. Thus, the belief that coherence is to be found in the text is viewed as questionable because schema theoreticians are of the opinion that a reader’s background and culture knowledge affects his or her interpretation of a text as highly or poorly coherent depending on whether “there is a mismatch in cultural background knowledge between the reader and that assumed by the text” (Carrell 1982: 485). Empirical findings suggest that whenever such a mismatch occurs there is likely to be a loss of textual cohesion (Carrell 1982: 485). Simply put, this means that if the reader does not have or fails to access the appropriate background schema underlying the text he or she is reading, the presence of cohesive ties in it will not be of any help in making sense of the text.

It is important to note here that many of these studies that investigate cohesion from a reader-based standpoint are concerned with EFL (English as a Foreign Language) reading and writing. As far as lexical cohesion is concerned, there is an empirical study by Keiko Muto (2006) that is worth mentioning here. The experiment sets out to investigate how lexical cohesion could help EFL students in reading and writing. Forty first-year students from two Extensive Reading classes were first introduced to the notion of lexical cohesion and then they were instructed to read three short stories and find clues relating to the temporal and geographical settings and features of the stories.

The purpose was to study the degree to which students were able to use knowledge of cohesion to interpret the stories. It was found that lack of cultural/background knowledge on the part of the readers prevented them from determining lexical cohesion. In the experiment, the students failed to recognize “Madison Square” as a clue word linked to the place where one of the three stories took place, namely New York (Muto 2006: 114-115), because they were not familiar with where “Madison Square” was located.

When it comes to translation, the above implies that in order for the translator to make sense of the text, s/he needs to be familiar not only with the TL culture but also with TL text-type norms and TL cohesion norms. Lexical cohesion, being expressed through content words and, more generally speaking, through vocabulary, is inevitably language-specific. As Mona Baker notices (1992), the lack of equivalents in a target language pushes translators to resort to superordinates, paraphrases or loan words. The use of such devices results in the generation of different lexical chains in the target text (207). As a consequence, the source and target texts wind up differing in terms of networks of lexical cohesion and so will trigger different associations in the minds of the readers (210). Insofar as lexical cohesion has a major impact on the style, quality and interpretation of a translation on the part of the target readers, the topic deserves more attention and investigation than it has received so far. It is necessary to shed light on what strategies translators can adopt to reproduce the network of lexical chains of the source text as closely as possible in the target text, thus trying to keep readers’ associations intact.

Translation, as a transcoding activity involving any two verbal codes, calls for both reading and transferring strategies. In both cases, cohesive markers in general, and lexical cohesion in particular, are of paramount importance because they help translators, who are ideally proficient in the two languages and cultures involved and any other facets that the latter encompass, to interpret the text when reading it; but cohesive markers also have an impact, when projected into the target text/culture, on the way the target readership, based on their world, culture, and text-type knowledge, will perceive the text as coherent. In this case, coherence can be thought of in terms of mental associations. In other words, the target text can be deemed coherent when the TT readership experiences the same reaction to the text as the one intended by the original writer for his/her readership. This idea of cohesion as involving extra-textual factors was first introduced by Singh in “Contrastive Textual Cohesion,” in which he draws a distinction between semantic (linguistic) and pragmatic cohesion (1979). According to this author, in semantic cohesion, the information can be extracted from the text, whereas in pragmatic cohesion, the information is inferred from outside the linguistic context. As mentioned above, Halliday and Hasan focus on the former, whereas Brown and Baker and other scholars hold the view that both types of cohesion need to be taken into account because they both contribute to the texture and coherence of a text.

As much as reading in translation is of paramount importance and deserves further investigation, this research will not be concerned with it. Instead, I will concentrate on the stylistic preferences and/or differences in the use and amount of lexical cohesive devices that exist between Original English and Original Italian as well as between Original Italian and Translated Italian.

Research Hypotheses

The research hypotheses to be studied in this dissertation approach lexical cohesion from a corpus-based point of view and provide evidence in support of the thesis maintained by many linguistics and translation scholars according to whom cohesion is language- and text-type-specific. In particular, hypothesis one claims that Italian translations of English scientific articles taken from Le Scienze, the Italian version of the American-English magazine Scientific American, tend to reproduce the lexical cohesive markers of the source text. Hypothesis two, on the other hand, claims that articles taken from Le Scienze which were originally written in Italian by Italian authors tend to differ in the use and amount of lexical cohesive markers from Italian translated texts published in the same magazine.

Research Method

One parallel corpus will be used to compare lexical cohesion in English texts and Italian translations and test hypothesis number one. In this respect, the magazine that will be used is Scientific American, which publishes news and articles about science, technology, information and policy for a general educated audience (the official website states that one third of its readership has a postgraduate degree). The online subscription to the magazine allows users to get access to current issues as well as to the magazine’s archive.

This is important to this study because articles have been randomly selected over a span of ten years (more precisely from 1991 to 2000). Scientific American has also been chosen because it has an Italian edition called Le Scienze. It started off as the Italian translation of the American magazine in 1968 but now features both American and Italian contributions to scientific research. The fact that the Italian edition features both translations and articles written by Italian scientists makes it possible not only to compare the English texts and their translations into Italian to see if the translators reproduced the English lexical cohesion network in the target texts, but also to find out whether there are any differences in lexical cohesion between articles originally written in Italian and the ones translated into Italian from English. The latter will constitute a comparable or reference corpus made up of Italian articles extracted from the same magazine that will be used to test the second hypothesis.

It is worth mentioning here that the same magazine was used by Maria Teresa Musacchio (2007) in “The Distribution of Information in LSP Translation. A Corpus Study of Italian.” In her study, the author investigates features of information structure in translation such as the principle of given-new information and that of end-focus (2007: 94). In other words, she aims at finding out whether or not translations use the given-new information flow of the original texts. To this end, a translation corpus consisting of nine English articles on popular physics and their translations was collected. The English articles were all published in the American monthly Scientific American over a span of ten years (1993 through 2003). The Italian translations were published in the Italian edition Le Scienze. In addition to this, a reference corpus of Italian articles on the same topic was used to compare the results from the translation corpus to see whether or not the information structure used in the Italian translations conformed to TL norms. The author, for example, focuses her attention on verbs of happening such as accadere and venire which, in Italian, usually cause subject-verb inversion, thus promoting the verb to marked theme. However, Musacchio notices that in the Italian translations, this inversion does not take place; instead, the information flow of the source text is recreated, thus impairing the natural information flow of Italian (2007: 94). Besides analyzing the information structure of these texts, the author also discusses cohesion, though this is not the main focus of the article. Only three pages or so are concerned with cohesion, and in these three pages the cohesive markers that are dealt with are conjunctions and repetition.

As far as repetition is concerned, it is said that cohesion in English science writing is often created through reiteration by means of repetition, especially noun repetition, which in Italian is avoided for stylistic reasons unless non-repetition is a source of ambiguity (2007: 99). In the translation corpus, the author identifies many examples of repetition in the Italian translations that are clearly a calque of the English cohesive sentence structure (2007: 99).

My study fits in with the research carried out by the above-mentioned author; however, this is just a starting point, since the author does not focus on lexical cohesion and the different categories of reiteration and collocation identified by Morris and Hirst, which in my research are the main focus of analysis, not to mention that the approach to the analysis of cohesive devices is also different, since in my research it is the network of lexical chains created through the use of synonyms, hyponyms, meronyms, superordinates and other lexical cohesion markers that is under investigation. In this respect, fifteen articles have been selected from the American magazine Scientific American. The articles cover a span of ten years (from 1999 to 2009). The past issues of the magazine are available through the online subscription. As for the Italian edition, Le Scienze issued a double DVD box set early in September 2010, containing all the issues of the Italian edition from 1968 till today. This has made it possible to have a wide range of topics in electronic format at my disposal from which to choose.

The textual analysis of the corpora has been conducted using WordSmith Tools, which has been used to calculate lexical density, type-token ratio, sentence number and average sentence length.
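For readers unfamiliar with how such statistics are derived, the sketch below reproduces three of them in plain Python. It is a minimal sketch only: it assumes WordSmith’s usual standardization basis of 1,000-token chunks for the standardized type-token ratio (STTR) and a crude sentence splitter, neither of which matches WordSmith’s internals exactly.

    # STTR: the mean type-token ratio over consecutive 1,000-token chunks,
    # which makes texts of different lengths comparable (a long text otherwise
    # repeats more and drags its raw TTR down).
    import re

    def sttr(tokens, chunk=1000):
        spans = [tokens[i:i + chunk] for i in range(0, len(tokens) - chunk + 1, chunk)]
        return sum(len(set(s)) / len(s) for s in spans) / len(spans)

    def sentence_stats(text):
        # Sentence number and average sentence length (tokens per sentence),
        # using a naive end-of-sentence pattern.
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        tokens = text.split()
        return len(sentences), len(tokens) / len(sentences)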

As in the previous hypothesis, WordNet has been used to identify word senses and semantic relations between lexical items in texts. It is expected that Italian translations will reproduce the lexical cohesion markers of the English articles, thus failing to comply with TL and text-type norms and conventions. The target language norms and text-type conventions have been extrapolated from the reference corpus made up of articles originally written in Italian by Italian researchers and published in the same magazine (Le Scienze) as the articles making up the parallel corpus. Based on the information that the analysis of the articles contained in the reference corpus provides, pedagogical suggestions have been made about how to make translation students aware of lexical cohesion-related language norms and text-type conventions. In investigating the lexical chains of the Italian documents and translations, MultiWordNet, a multilingual lexical database wherein the Italian version is strictly aligned with the Princeton WordNet, has been used. Finally, the statistical analysis tool SPSS (Statistical Package for the Social Sciences) has been employed to verify the significance of the findings of both the textual and semantic analyses.
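The kind of check SPSS performs here can be illustrated with an independent-samples t-test. The sketch below uses scipy instead of SPSS and placeholder figures rather than the study’s actual data; it only shows the shape of the test, not its results.

    # Comparing mean lexical density between translated and original Italian
    # texts; the numbers are invented placeholders, not data from this study.
    from scipy import stats

    translated_ld = [51.2, 49.8, 50.5, 52.1]
    original_ld = [47.9, 48.3, 49.0, 47.5]

    t, p = stats.ttest_ind(translated_ld, original_ld)
    print(f"t = {t:.3f}, p = {p:.3f}")  # p < 0.05 suggests a reliable difference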

Significance of my Research Hypotheses

The significance of this study lies in the role that cohesion plays, both at the textual and extra-textual level, in the interpretation and translation of texts. Cohesion in general, and lexical cohesion in particular, matter because they have an impact on the quality of translations. Aiwei Shi proved empirically that when students are not aware of the role cohesive markers play in a text, they make common errors at the sentential and infra-sentential level where lexical collocates abound (1988: 145-146). Likewise, Barbara Folkart in her article “Cohesion and the Teaching of Translation” argues that because of students’ tendency to focus on form, that is, on lower sentential levels rather than on intersentential and textual levels, they fail to give cohesion the priority it deserves (1988: 143). Students fail to see the text as a whole because cohesive markers come into play only at the suprasentential rank, and this ultimately affects the quality of a translation (1988: 151).

Teaching cohesion in translator-training programs can actually help students become aware that a text is more than just the sum of its lexical units. By learning what cohesion is and how it works, students will hopefully start viewing the text to be translated as a unit in itself. There are several studies on the role that cohesion plays in interpreting texts but not all of them are concerned with translation. Failing to appropriately convey the cohesive markers of the source text according to the norms and conventions of the target language and text-type may have an impact first on the quality of a translation and second on the response of the target readership, thus leading to either the failure or success of the translated text.

Defining translation quality in terms of cohesion is not new to the translation field. Balkrishan Kachroo in “Textual Cohesion and Translation” states that an “authentic,” or rather good-quality, translation needs to consider factors that go beyond the sentence level (1984). One of these factors is textual cohesion. In his view, an authentic translation always strives to match the distribution of cohesive devices in the target language text to those in the source language text, but most importantly to those in the target sub-language.

This dissertation sets out to further investigate the topic of cohesion in translation by focusing on just one of Halliday and Hasan’s cohesive markers, the one deemed to be the most important. By so doing, I hope to bring more awareness to the topic and to offer pedagogical suggestions. To this end, one of the chapters of this study will be concerned with the pedagogical implications of teaching lexical cohesion in translator-training programs. In particular, the stress will be on how teaching lexical cohesion, and cohesion in general, can improve the quality of translations and on the means through which such teaching can be carried out. For example, having translation students build reference corpora in their L2 and focus on the stylistic preferences in terms of cohesive markers that a certain language and text type adopt will foster not only their awareness of the text as a global unit but also their self-learning skills, which are crucial to the life-long learning objective that modern pedagogy aims at. According to Silvia Bernardini (2002), corpora can be used as pedagogical tools for making students interested, for having them develop “autonomous learning strategies,” and for “raising their language consciousness” (166). Through their use, students gain valuable insights not only into the target readership’s expectations and their own native language but also into the way the latter may be used to achieve different communicative purposes in texts. It is therefore about time we made students aware that texts consist of lexical items that have textual, intra-textual and extra-textual associations, the interpretation of which requires a global, supra-textual approach to text analysis when translating.

This study aims at raising awareness of the oft-neglected role that cohesion plays in translation. The empirical and corpus-based approach to the study of lexical cohesion in translation is designed to provide evidence pointing to the importance of lexical cohesion. To this end, a future step involves the empirical investigation of the effects of explicit training in cohesion. However, to accomplish this and actually teach cohesion in a translation setting, which involves at least two verbal codes, it is necessary to study the cohesive features that are language-specific and, within each language, it is also necessary to inquire into how language-specific text-types vary in terms of lexical cohesion markers and cohesion in general.

Summary of Chapters

This dissertation consists of five chapters. Chapter I presents a detailed overview of the main cohesion and coherence theories as well as hands-on studies on lexical cohesion in both second language and translation studies. Chapter II describes the methodological approach as well as the corpus, semantic and statistical tools used to conduct the corpus-based investigation of lexical cohesion. Chapter III reports the findings of both the textual and semantic analyses of the parallel and comparable corpora. Chapter IV provides a detailed discussion of the results presented in Chapter III in light of similar studies. Finally, Chapter V draws some conclusions, puts forward a framework for a pedagogy of translational cohesion, and suggests some future directions to further investigate this topic from a process-based point of view.

CHAPTER I

LITERATURE REVIEW

1.1 Coherence vs. Cohesion

Since Halliday and Hasan’s seminal work Cohesion in English, much has been written about what a text should look like and what differentiates a text from a non-text. In this respect, Halliday and Hasan claim that a text is characterized by texture, which refers to the property of meaning unity shared by parts of any discourse (1976: 2). This concept of texture which defines a text was further elaborated and developed by Widdowson (1978), and De Beaugrande and Dressler (1981). Widdowson introduced the idea that texts could also be coherent without there being explicit or overt cohesive devices in an instance of either spoken or written discourse (1978: 24-26). In such cases, the reader or listener makes sense of the text by inferring the links between the sentences based on his or her interpretation of the illocutionary acts performed by the sentences themselves (1978: 27-29). In line with Widdowson’s cohesion theory, De Beaugrande and Dressler, in their work Introduction to Text Linguistics, draw a clear distinction between cohesion and coherence, each having its own role in building textuality. In this respect, they define a text as a communicative occurrence that meets seven standards of textuality:

1) Cohesion

2) Coherence



3) Intentionality

4) Acceptability

5) Informativity

6) Situationality

7) Intertextuality

Of these seven standards, cohesion and coherence are claimed to play the major role in creating texture (1981: 3). In De Beaugrande and Dressler’s words, cohesion relates to “the ways in which components of the SURFACE TEXT […] are mutually connected within a sequence” (1981: 3), whereas coherence is defined as a “continuity of senses,” by which the authors refer to the logical organization of arguments in a text that allows readers to make sense of the text and perceive it as a coherent whole (1981: 84). Although their notion of coherence is largely text-based, they also acknowledge that texts do not make sense by themselves but rather require “the interaction of text-presented knowledge with people’s stored knowledge of the world” (1981: 6). By contrast, Halliday and Hasan’s definition of coherence is exclusively text-based. They claim that cohesion, which “refers to relations of meaning that exist within the text, and that define it as a text,” is an index of texture or text coherence (1976: 4). Hasan (1984) clearly states that coherence is a feature of language; in other words, cohesion is the foundation on which coherence is built (1984: 181). She actually rejects findings of research into the relationship between coherence and cohesion measured in terms of reader response because in her view coherence is recognized by readers as a result of cohesive harmony, or rather the interaction among cohesive ties.

This idea of viewing cohesion as the major contributor to the coherence of a text is mainly sustained and supported by scholars working in linguistics. Schema theoreticians, by contrast, see text processing as an interactive process involving the text and the reader (Carrell, 1982: 480). Fulcher, for instance, argues that according to schema theory coherence comes first or plays a primary role. It follows that coherence precedes cohesion, which means that readers first look for coherence based on their world/culture knowledge and then recognize cohesion and make sense of the text (Fulcher, 1989: 146).

Comprehending a text involves an interactive process between the listener’s or reader’s background knowledge of content and structure (the so-called content schematic knowledge) and the text itself (Carrell, 1983: 82). In this respect, Campbell (1995), drawing on Gestalt Theory, argues that coherence is more than the mere sum of local or global cohesive relations (1995: 80). To put it in simpler terms, cohesive ties are only one of the factors affecting the understanding of a text as a coherent whole. Textual coherence also depends on a recipient’s knowledge and on the following four principles:

1) Relevance

2) Clarity

3) Adequacy, and

4) Accuracy


In order for a text to be coherent, the information contained in the text needs to be considered relevant, clear, adequate and accurate (1995: 96). Nevertheless, this does not exclude the possibility that the recipient will regard some of this information as irrelevant, unclear, inadequate or inaccurate.

In line with this subjective and reader-centered view of coherence is Stoddard’s notion of textual synergism, which refers to that holistic phenomenon whereby the readers of a text grasp greater meaning than that conveyed by the sum of the words contained in it (1991: XIII-XIV). This synergism, according to Stoddard, is a mental process initiated by what linguists refer to as “texture,” which consists of various kinds of text patterns, one of them being cohesive ties (1991: XIV). The author rejects any static, linguistics-founded definitions of text in that they do not take into account the synergism thereof. One of the positions held in traditional linguistics is that a text can be analyzed in isolation from its environment (1991: 9). However, Stoddard maintains that when people re-read texts, they rarely interpret them in the same way as the first time, which implies that our perceptions of the meanings contained in a text are not static but mutable (1991: 9). The only way to account for this variability of interpretation by readers, which both Halliday and Hasan’s and De Beaugrande’s definitions of text fail to grasp, is to consider a text as a state of mind or mental model (1991: 10-11). Viewed from this perspective, the written text is conceived of as the result of a writer’s thinking or intentions, which are in turn interpreted by readers who create their own mental model of the text. Therefore, the author concludes that a synergetic definition of text must be reader-based. This definition sees texts as a reader’s mental reconstruction of a writer’s text (1991: 11). As far as cohesive patterns are concerned, Stoddard identifies six writer-based factors or properties affecting their interpretation (1991: 20-21):

1) Number of cohesive ties: which refers to the number of potential ties per node (the greater the number of these ties, the more unified the text is perceived);

2) Distance: which refers to the number of intervening words between the ends of ties (between a node and each of its ties);

3) Directionality: which refers to word order (i.e., cataphoric vs. anaphoric directions);

4) Reentry: which refers to the repetition of cohesion networks by the writer (all the more so if identical nodes are used);

5) Intersection: which refers to cases in which networks overlap;

6) Type: which refers to types of cohesion (like the typology of cohesive ties set out in Halliday and Hasan [1976]).

In addition to these six writer-determined properties of cohesion, Stoddard also identifies the following reader-dependent factors affecting the interpretation of texts:

1) Readers’ stored knowledge: when people are engaged in a reading act, they select previously-stored information from their long-term memory to fit the new reality or situation (1991: 24);

2) Specificity or definiteness of anaphoric expressions: in other words, there exist among anaphoric expressions varying degrees of specificity which affect the ease with which these expressions are identified and understood (1991: 25-26);

3) Connectivity between concepts: the greater the connectivity between two concepts, the likelier it is that a reader will retrieve one concept given the other (1991: 27).

This reader-based conception of a text is supported by transactional theory, which views the meaning-building process as a constant interplay between the reader and the text (Rosenblatt 1989). This theory draws on Peirce’s triadic model, in which the linguistic sign is related to its object through the interpretant’s mental association (Peirce 1933: para. 347). It follows that, for transactional theorists, the meaning of a text, or rather of the words and marks on every single page, resides in the interaction between the reader, who during the reading process is conditioned by the social, cultural and educational circumstances in which s/he finds himself/herself, and the text itself (Rosenblatt 1989).

For the purpose of this study, a linguistics-based definition of text will be adopted; in other words, cohesion is assumed to be a property of texture, and coherence is considered to be a facet of a reader’s evaluation of a text (Hoey 1991: 12). Thus, lexical cohesion is considered to play an important supporting role for coherence, which, although cognitive in nature, is still constructed based on explicit linguistic signals.

1.2 Lexical Cohesion


The term lexical cohesion was first introduced by Halliday and Hasan (1976) in Cohesion in English, in which the authors define lexical cohesion as “the cohesive effect achieved by the selection of vocabulary” (1976: 274). In other words, it is through the choice of words that lexical cohesion is realized. In particular, they identify five main types of lexical cohesion:

1) Same Item, also known as repetition, which occurs when the same word, be it a noun, verb, adjective or adverb, is reiterated throughout a text;

2) Synonym or near-synonym: words sharing similar senses. In this subcategory, Halliday and Hasan also include hyponyms, which are words having narrowed-down meanings;

3) Superordinates, also known as hypernyms, which could be defined as general-meaning words or words with a broad meaning;

4) General nouns: words that are very general in meaning and whose interpretation requires the reader to go back to previously-mentioned items. This subcategory might sound very similar to superordinates, but it differs from the latter in that in order for general nouns to be cohesive they need to be preceded by the reference item the or a demonstrative (1976: 275);

5) Collocation, which is defined as the “tendency of any two items to occur in similar contexts” (1976: 286).

These five lexical cohesion types are grouped into two major categories: Reiteration (n° 1 to 4) and Collocation (n° 5).


Based on Halliday and Hasan’s cohesion categories, Morris and Hirst (1991) propose their own classification of cohesive devices, which resembles Halliday and Hasan’s but with a different nomenclature (1991: 21-22):

1) Reiteration with identity of reference (Halliday and Hasan’s category of same items);

2) Reiteration without identity of reference (Halliday and Hasan’s category of synonyms or near-synonyms);

3) Reiteration by means of superordinates (Halliday and Hasan’s category of hypernyms and general nouns); and

4) Collocation, which is defined as a semantic relationship between words that co-occur and which includes two types of relationships: a) systematic semantic relations, as in the case of pairs of words drawn from the same ordered series (dollar-cent; north-south); and b) nonsystematic relations (when words are related in terms of such relations as synecdoches, members of the same class, hyponyms of the same superordinate, etc.).

As far as collocation is concerned, there is a major difference in meaning between the way Halliday and Hasan and other authors working with lexical cohesion conceive it and the way corpus linguists define it. In corpus linguistics, collocation is basically described as the actual co-occurrence of words in texts. Sinclair originally defined it as “the co-occurrence of words with no more than four intervening words” (2004: 141). Generally speaking, in corpus linguistics, in order for a set of lexical items to be identified as collocates, they need to appear in the text with a certain frequency, and the number of intervening words between them should not exceed four. For Halliday and Hasan, by contrast, textual evidence is not fundamental; what matters is the meaning associations between words (Flowerdew & Mahlberg 2009: 112). This means that the very fact that two items imply one another makes it possible to consider them as collocates even if their frequency of co-occurrence is not high or the distance between them is very large.
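The corpus-linguistic reading of collocation is mechanical enough to sketch directly. The following is a minimal sketch under Sinclair’s definition, assuming a pre-tokenized text; real collocation extraction would also filter by frequency or by a statistical association measure.

    # Count words co-occurring with a node word within a window that allows
    # at most four intervening words on either side (Sinclair's criterion).
    from collections import Counter

    def collocates(tokens, node, max_gap=4):
        counts = Counter()
        for i, tok in enumerate(tokens):
            if tok == node:
                left = tokens[max(0, i - max_gap - 1):i]
                right = tokens[i + 1:i + max_gap + 2]
                counts.update(left + right)
        return counts.most_common(10)

    print(collocates("the white matter of the brain is white".split(), "white"))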

In this study, Morris and Hirst’s classification of cohesive devices will be adopted in the corpus-based investigation of lexical chains, but the categories will be renamed as follows:

1) Simple and modified repetition (which for the purpose of this study will be considered as just one category)

2) Synonyms

3) Antonyms

4) Holonyms

5) Meronyms

6) Hypernyms

7) Hyponyms

The choice to use a different nomenclature has been made to avoid any terminology-related confusion when the two web-based lexical databases, WordNet and MultiWordNet, which classify semantic relations using the above-mentioned denominations, are employed to identify lexical chains.
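To make the mapping concrete, the sketch below shows how the WordNet-mediated categories (repetition being plain string matching) can be read off one synset through NLTK. This is a minimal illustration of my own rather than the procedure used in this study; MultiWordNet exposes the same relation types for Italian through its aligned synsets.

    # Querying the semantic relation types for one sense of "hand".
    from nltk.corpus import wordnet as wn

    synset = wn.synsets("hand")[0]  # first (most frequent) noun sense

    synonyms  = [l.name() for l in synset.lemmas()]                  # e.g. manus
    antonyms  = [a.name() for l in synset.lemmas() for a in l.antonyms()]
    hypernyms = [s.name() for s in synset.hypernyms()]               # e.g. extremity
    hyponyms  = [s.name() for s in synset.hyponyms()]                # e.g. fist
    meronyms  = [s.name() for s in synset.part_meronyms()]           # e.g. finger
    holonyms  = [s.name() for s in synset.part_holonyms()]           # e.g. arm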


As far as semantic relations are concerned, another important notion which needs to be briefly discussed herein is that of the lexical chain, which was first introduced by Halliday and Hasan (1976). They laid the foundation for this concept, but it was further developed by Hasan (1984), who defines it as a relationship formed between two cohesive elements when one refers back to the other. Hasan also classifies chains into identity and similarity groups. Identity chains consist of ties that all share the same referent (1984: 15), whereas similarity chains are made up of ties for which issues of identity cannot arise. The author claims that, although these chains by themselves contribute to making a text coherent, they are not sufficient for the creation of coherence. In her own words, coherence is mainly due to cohesive harmony, by which Hasan means the interaction of connections among the cohesive chains present in a text. In this respect, a chain interaction is a relation associating elements from one chain with those of another chain (1984: 212).

This view is also held by Hoey (1991), who, in Patterns of Lexis in Text, points out that the presence of lexical chains in a text does not necessarily guarantee coherence. Rather, it is the interaction among such lexical chains that appears to be the crucial factor (1991: 15). Two word groups or chains interact when at least two lexical items in one chain are, grammatically speaking, related to two other lexical items in the other chain through such relations as Actor – Material Process – Goal or Senser – Mental Process – Phenomenon (Butler 2003: 341). The concept of lexical chains was further elaborated by Morris and Hirst (1991), who refer to them as sequences of “nearby related words spanning a topical unit of the text” (1991: 22). Lexical chains are important because they help solve ambiguity issues and contribute to the determination of coherence and discourse structure (1991: 23). In this study, lexical chains and their attendant inner semantic relations will be the focus of my analysis.

1.3 Lexical Cohesion Studies in Discourse Analysis and Linguistics

Over the past twenty years or so, much research has focused on the analysis of lexical cohesion. The attention that this cohesion category has been given is due to the fact that it is, in Hoey’s words, “the dominant mode of creating texture” in that it can form multiple relationships (1991: 10). Indeed, it has been proven that approximately fifty percent of a text’s cohesive ties are lexical (Teich & Fankhauser 2004: 327). Scholars in the field of discourse analysis and linguistics have used lexical chains for the following research purposes:

1) Text Summarization (Barzilay & Elhadad [1997], Maheedhar [2002], and Silber & McCoy [2003])

2) Malapropism Detection (Hirst & St-Onge [1998])

3) Information Retrieval (Stairmand [1996] and Al-Halimi & Kazman [1998])

4) Topic Segmentation (Morris & Hirst [1991], Hearst [1997])

5) Word Sense Disambiguation (Okumura & Honda [1994], Galley & McKeown [2003])

6) Hypertext Construction (Green [1998])

The recourse to lexical cohesion in general, and lexical chains in particular, in the study and analysis of discourse can be attributed to the fact that, as Anderson points out, this cohesive subcategory combines the study of meaning (also called semantics) and the study of intersentential relations (also known as Discourse Analysis) (1977: 1). Through the analysis of lexical cohesion, one is concerned not only with how discourse is structured but also with the meaning thereof, thus dealing with both structural and semantic relations. In this respect, Hoey himself states that unlike reference, conjunction, ellipsis and substitution, which are markers of textual relations (their interpretation requires textual analysis), reiteration and collocation are primarily markers of lexical relation and only secondarily markers of textual relation (1991: 7).

Morris and Hirst (1991) were the first to use lexical chains to identify changes of topic in discourse. They went so far as to propose an algorithm for the computation of lexical chains. Nevertheless, due to the lack of a machine-readable copy of Roget’s Thesaurus, which was used as a lexical database for the extraction and detection of semantic relations among words, the algorithm had to be worked out by hand. Their algorithm was later taken up and improved by other discourse analysts to adjust it to their own research purposes.
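The core of such chaining algorithms is simple to state: walk through the candidate words of a text and attach each one to an existing chain whenever a thesaural relation links them, otherwise open a new chain. The sketch below is a minimal, greedy version of that idea, with WordNet standing in for Roget’s Thesaurus and only identity, synonymy and one-step hypernymy checked; Morris and Hirst’s actual procedure used richer thesaural relations and distance constraints.

    # Greedy lexical chaining: attach each word to the first chain whose most
    # recent member it is related to, or start a new chain.
    from nltk.corpus import wordnet as wn

    def related(word_a, word_b):
        if word_a == word_b:                       # plain repetition
            return True
        for sa in wn.synsets(word_a):
            for sb in wn.synsets(word_b):
                # same synset (synonymy) or a direct hypernym/hyponym link
                if sa == sb or sb in sa.hypernyms() or sa in sb.hypernyms():
                    return True
        return False

    def chain_words(candidate_nouns):
        chains = []
        for word in candidate_nouns:
            for chain in chains:
                if related(word, chain[-1]):
                    chain.append(word)
                    break
            else:
                chains.append([word])
        return chains

    print(chain_words(["car", "automobile", "wheel", "dog", "puppy"]))
    # e.g. [['car', 'automobile'], ['wheel'], ['dog', 'puppy']]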

Regina Barzilay and Michael Elhadad, in 1997, present a new algorithm for computing lexical chains, with a view to producing indicative summaries of texts without having to semantically interpret them in their entirety. Indicative summaries are summaries that are used by readers to decide whether or not a text is worth reading. They use lexical chains because the latter are indicators of discourse structure (1997: 11). In their article, they present a four-step summarization process which involves:


1) Segmenting a text through the use of a parser

2) Computing lexical chains by means of the WordNet Thesaurus

3) Identifying strong lexical chains (see the sketch following this list)

4) Extracting significant sentences which will make up the summary of texts
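Step 3 is the scoring step. The sketch below assumes the strength criterion reported in Barzilay and Elhadad’s paper, where a chain’s score is its length times its homogeneity (one minus the share of distinct members) and a chain counts as strong when its score exceeds the mean score by more than two standard deviations; treat the exact formula here as an assumption rather than a quotation.

    # Strong-chain selection over chains represented as lists of word tokens.
    from statistics import mean, stdev

    def score(chain):
        homogeneity = 1 - len(set(chain)) / len(chain)
        return len(chain) * homogeneity

    def strong_chains(chains):
        scores = [score(c) for c in chains]
        threshold = mean(scores) + 2 * stdev(scores)  # needs >= 2 chains
        return [c for c, s in zip(chains, scores) if s > threshold]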

Unlike Morris and Hirst, Barzilay and Elhadad assessed the validity and quality of their process and lexical chaining algorithms by running statistical tests, thus providing empirical results (1997: 10). Furthermore, the authors argue that computing lexical chains improves the text summarization process because the focus is on the concept not on the linguistic representation. This argument is made against the use of word frequency in early summarization systems (1997: 10). Their rationale is that some concepts may be reiterated throughout a text by means of several words, which, as a consequence, may have a low frequency of occurrence (1997: 14). Conversely, the lexical chain approach disregards word frequency and manages to capture the conceptual spheres the discourse of a text revolves around.

This idea of using lexical chains as an intermediate step for text summarization was later taken up by Silber and McCoy (2003), who specifically focus on the concept extraction phase. To this end, they propose a linear-time algorithm for lexical chain computation which, unlike previously proposed algorithms (i.e., Barzilay and Elhadad’s), is capable of analyzing large documents. For each candidate word, their algorithm extracts all word senses from WordNet and assigns them to that word, identifying all possible chains to which each word sense may belong. The second step involves finding the best interpretation for each word candidate and the chain to which it belongs (2003: 3). In this respect, the algorithm makes it possible to analyze each chain to which a word belongs and, based on distance factors and the type of relation, to choose the one having the strongest semantic relation (2003: 3).

In carrying out their experiment, Silber and McCoy set out to determine whether the concepts represented by strong lexical chains are the main concepts in texts. To do so, they used textbook chapters and their attendant chapter summaries (2003: 8). They computed lexical chains in both the original documents and their summaries and then compared the concepts represented by the lexical chains in both text types and found that the concepts represented by the strong lexical chains in the original documents were the same as the ones appearing in the lexical chain analysis of the summaries (2003: 7).

A similar lexical chaining process was put forward by Michael Galley and Kathleen McKeown (2003), who suggest a new linear algorithm for computing lexical chains with a view to disambiguating word senses in discourse. In their study, they compared their algorithm to Silber and McCoy’s and to Barzilay and Elhadad’s and found that theirs had an accuracy of 62.09%, compared to 56.56% for Barzilay and Elhadad’s and 54.48% for Silber and McCoy’s. As in Silber and McCoy’s study, they separated the word sense disambiguation process from the actual lexical chaining of words, with the main difference being that Silber and McCoy’s algorithm creates all possible meta-chains from stage one of the disambiguation process (when all the possible interpretations for each word candidate are identified), whereas Galley and McKeown’s builds lexical chains only after disambiguating all words. In particular, their algorithm first analyzes each noun instance individually rather than processing the whole text. This means that each noun may be assigned more than one word sense in the same discourse.

Then each noun instance is checked to see whether it is semantically related to other nouns by looking at such semantic relations as hyponymy, synonymy, etc. The next step is to build a disambiguation graph, that is, a representation of all of the word-sense combinations, or interpretations, that each word in a discourse sample might have.

Another major application of lexical chains is in detecting and correcting malapropisms, which Hirst and St-Onge (1998) define as words that are either misspelt or mistyped due to their similarity in sound to other words or due to ignorance on the part of the person who typed them. To this end, they proposed an algorithm which sets out to detect and correct malapropisms based on lexical chain construction. The rationale behind this study is that if a coherent and cohesive text is made up of intertwined concepts and sentences, the semantic relations of which contribute to building cohesive chains, it follows that malapropisms can be detected by computing lexical chains insofar as the latter provide sufficient context for resolving lexical ambiguities (1998: 307). Like other studies on lexical chains, Hirst and St-Onge’s too makes use of WordNet as a lexical database from which word senses and semantic relations are extracted. However, Hirst and St-Onge notice some flaws in their algorithm, which are attributable to some of the limitations of WordNet. For example, some words were not included in chains where they clearly belonged, while other words were included in chains where they did not belong. According to the two authors, these problems result from limitations in the set of relations contained in WordNet, inconsistencies in the semantic proximity implicit in WordNet links, as well as incorrect or incomplete disambiguation (1998: 318-319).

Most of the above-mentioned studies limit the scope of research to just one lexical item category, namely nouns, because they are considered to be the main contributors to the “aboutness” or main topic(s) of a text, but also because noun synsets dominate in WordNet (Barzilay and Elhadad, 1997: 13). By contrast, the present study sets out to analyze the semantic relations of nouns, verbs, adjectives and adverbs. Initially, LexChainer, a lexical chaining tool created by Galley at Columbia University, was supposed to be used because of the more accurate algorithm implemented in this software. However, several factors made an automated analysis of semantic relations materially impossible: LexChainer only runs on a Linux platform; it relies on WordNet, which classifies semantic relations according to word class and, as previously mentioned, is not representative of the whole vocabulary of a language; and the tool is unable to disambiguate polysemic words. In its place, a manual analysis of semantic relations with the aid of WordNet will be conducted so as to make sure no overriding chain or network is overlooked. In this respect, in analyzing such semantic networks, emphasis will be put on those built through lexical items which contribute the most to the aboutness of a text. In other words, only lexical items which are related to the major theme(s) of the documents being analyzed will be taken into consideration.


1.4 Lexical Chaining Sources

When it comes to computing lexical chains, two main options are available: 1) manual; and 2) automatic. For a manual computation of lexical chains, one can resort to thesauri, in which words are grouped by meaning and semantic distance. An example is Roget’s Thesaurus, which classifies words into one thousand categories, each of which is divided into smaller groups containing closely related words. However, thesauri only group related words; they do not specify the kind of relationship those words have in common. Morris and Hirst, for instance, used Roget’s Thesaurus in their computation of lexical chains. Since at the time there was no machine-readable copy of the thesaurus, they had to compute the semantic relations among words manually (1991: 29). Conversely, most of the studies that tried to automate the computation of semantic relations make use of WordNet.

1.4.1 The WordNet Project

Word meaning started to be formulated and represented in terms of networks or diagrams with nodes in 1985. This new way of looking at meanings was labeled Relational Lexical Semantics. It came to be an alternative to Componential Semantics, which approached the meaning of a word in the same way as the meaning of a sentence: just as the meaning of a sentence needs to be broken down into the meanings of its constituents, the meaning of a word, too, needs to be decomposed into its “conceptual atoms” (Fellbaum 1998: XVI).


This new way of conceiving word meanings underlies the WordNet project. In particular, at the core of this project was the idea of using synonym sets to represent lexical concepts. WordNet started off as a browser and later became a lexical database, after the WordNet team was asked to develop a tool which would read and process texts and report information about the lexical items used in them (Fellbaum 1998: XIX-XX).

WordNet does not contain any information about the syntagmatic properties of words, which is why it is divided into four semantic nets, each concerning a specific word class, namely nouns, verbs, adjectives and adverbs (Fellbaum 1998: 4-5). WordNet can be thought of as a semantic dictionary wherein words and concepts are represented as an interrelated system or network which more closely resembles the way speakers organize the lexis in their minds (1998: 7). WordNet is halfway between a traditional dictionary and a thesaurus in that it combines features of both. However, as Fellbaum points out, unlike in a thesaurus, “the relations between concepts and words in WordNet are made explicit and labeled” (1998: 8). As in a dictionary, WordNet provides definitions and sample sentences for most of its synsets, which are sets of synonyms belonging to the same word class and sharing the same concept. In WordNet, words and concepts are linked through a variety of semantic relations based on similarity and contrast (1998: 10).

WordNet by itself, however, is not able to process a text and extract semantic relations from it. In computing lexical chains, it is best used in combination with a lexical chainer which draws on WordNet as a lexical database and extracts semantic relations from it. Generally speaking, when computing lexical chains, the following steps should be taken:

1) Candidate words are selected

2) For each candidate word sense, a suitable chain is to be found

3) If such a chain is found, the word is inserted into that chain; otherwise, a new one is created
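These three steps can be made concrete with a minimal sketch, offered as an illustration of the general procedure rather than of any particular chainer; the tiny SYNONYMS table is my stand-in for a lexical database such as WordNet, and the relatedness test covers only repetition and synonymy.

```python
SYNONYMS = {  # toy lexical database standing in for WordNet
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "vehicle": {"car", "automobile"},
}

def related(word, chain):
    """A word joins a chain if it repeats, or is a synonym of, a member."""
    return any(word == w or word in SYNONYMS.get(w, set()) for w in chain)

def build_chains(candidate_words):
    chains = []
    for word in candidate_words:           # step 1: candidate words
        for chain in chains:
            if related(word, chain):       # step 2: find a suitable chain
                chain.append(word)         # step 3: insert the word
                break
        else:
            chains.append([word])          # step 3: otherwise start a new chain
    return chains

print(build_chains(["car", "engine", "automobile", "engine"]))
# [['car', 'automobile'], ['engine', 'engine']]
```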

Aligned with the English version of WordNet is MultiWordNet, in which the Italian synsets are created in correspondence with the English synsets (MultiWordNet will be used for the analysis of the lexical chains in the Italian corpus).

However, as was pointed out earlier, WordNet is divided into four semantic nets according to word class, and these nets are not connected with one another. It follows that in the case of synonyms with word class change1, that is to say, words that are similar in meaning but belong to two different part-of-speech categories, the tool is not capable of detecting their semantic connection. This is why human intervention is still needed when analyzing WordNet-computed lexical chains.

1.5 Cohesion and Lexical Cohesion in Translation Studies

When it comes to translation, many studies focus on cohesion but few deal with lexical cohesion, lexical chain computation, and the analysis of semantic relations; and even fewer adopt empirical methods to investigate this phenomenon.

1 See Salkie, R. Text and Discourse Analysis. London: Routledge, 1995.


The importance of cohesion in translation is partly related to the considerable impact that it can have on quality. In this respect, Balkrishan Kachroo, in “Textual Cohesion and Translation,” states that an “authentic,” or rather good quality, translation needs to consider factors that go beyond the sentence level (1984: 128). One of these factors is textual cohesion. The author’s hypothesis is that the use of cohesive devices plays a major role in determining the quality and accuracy of translations. Stated in more technical terms, the hypothesis claims that an “authentic translation” always strives to match the distribution of cohesive devices in the target language text not only to those in the source language text but also, and more importantly, to those in the target sublanguage (or text-type, which in this experiment is children’s literature) (1984: 131). The methodology involves the use of one Hindi original text and one English original text containing the same number of paired sentences (fifty in total) and sharing the same sublanguage (children’s literature). These texts were analyzed in terms of their distribution of cohesive devices. Then five native speakers of Hindi were asked to translate the Hindi text into English. The resulting translations were later analyzed for the distribution of cohesive devices, which, as in the previous case, were counted (1984: 131). In the analysis of the results, it was found that the best, or most authentic, translation resembled the English original text in terms of the distribution of cohesive devices (1984: 132). In other words, both the distribution of cohesive devices and the cohesive patterns used conformed to the TL norms and, above all, to the norms governing the sublanguage or text-type.


This idea of text-type as an important element affecting our linguistic choices is also discussed by Hatim and Mason (1990) and Mona Baker (1992). Hatim and Mason, in particular, maintain that text-type, discourse and genre are motivating factors affecting our lexical choices. The importance of cohesion in translation is also discussed by Berber Sardinha (1997), who states that changes in cohesive devices affect the way readers interpret the text. The author is clearly making reference to the theory of Robert-Alain de Beaugrande (1980), according to whom cohesion is bound up with coherence. De Beaugrande states that coherence results from the interaction of text-presented knowledge with the reader’s prior knowledge of the world (1980: 19). Coherence is thus enabled by the reader’s inferencing, which is in turn shaped by the reader’s prior knowledge. It follows that if translators do not convey the cohesive devices of the source text “properly” into the target text, more specifically, if they do not respect the norms of the TL text-type, the text’s readability for the target audience may decrease. When such misunderstanding occurs in legal texts or, generally speaking, in texts whose interpretation may lead to serious consequences, it becomes especially evident that cohesion plays an important role in translation. This role has generally been neglected in translator training, where more often than not the focus is on translation strategies and on the discussion of translation theories as applied to translation examples rather than on the analysis of the source text in terms of its global cohesive features, and therefore as a translation unit in itself.

Another study focusing on the quality of translation, but in terms of translation equivalence, is by Lotfipour-Saedi (1997), who defines translation equivalence in terms of lexical cohesion and texture. The author’s hypothesis is that lexical cohesion affects the essence or texture of a text and therefore needs to be preserved when translating in order to achieve TL “equivalence.” In examining lexical cohesion, the author suggests reading the text to isolate the lexical chains central to its theme. The approach taken by the author also emphasizes, in my opinion, the importance lexical cohesion has in the structuring of information in the text and how it can affect the theme-rheme organization when cohesive markers are moved. This is all the more true of conjunctive devices, such as however, instead, and nevertheless, which writers may purposefully place in the rheme or theme position. When translating such devices, we need to be aware of their function; in other words, we need to be aware of the reason why the original writer used that particular cohesive device and placed it where it appears in the text. As a matter of fact, rhetorical styles play a major role in making a particular text-type acceptable to the members of the discourse community for which the text is meant. In this respect, Reza Khany and Khalil Tazik statistically proved that the publication and acceptance in international journals of research articles written by local authors depends on the authors’ compliance not only with text-type-related rhetorical moves but also with move-related lexical cohesive patterns (LCPs), through which the connectedness within and across rhetorical moves is realized (2011: 83). What they found in their study was that failure to comply with lexical cohesive patterns resulted in articles being rejected (2011: 91). So in addition to the cohesive device itself, we need to consider the role that it plays in the whole text. As Barbara Folkart puts it, translation students, and translators in general, need to learn that the translation unit is not a word, sentence or paragraph but the text (1988: 153). By so doing, translation students will learn to focus their attention beyond the word and intrasentential level, a narrow focus that very often results in banal translation errors.

It has already been pointed out that lexical cohesion is of paramount importance because it helps the reader solve ambiguity issues by narrowing down the potential meaning of words and providing clues for establishing the coherence of a text (Morris & Hirst 1991: 23). Applied to translation, this statement implies that cohesion can be used to help students disambiguate the content of a text and narrow down the choice of translation solutions to a few candidates. However, it is sometimes difficult to draw a line between what constitutes cohesion and what does not. For example, Hatim and Mason (1990) expand on Halliday and Hasan’s categorization of cohesive devices and add other variables of texture that contribute to making a text coherent and cohesive, among them theme-rheme (or given-new) information structure; for them, it is also possible to add punctuation. It follows that when studying cohesive devices we cannot focus on all these elements; rather, we need to narrow down the choice and focus on one particular type of cohesive device if we want our study to be thorough and generalizable. In this respect, Campbell is of the opinion that it is not necessary to perform an exhaustive analysis of all the cohesive devices used or present in a text to carry out quality research: quality does not equal quantity, and it is the research goals that dictate the scope of cohesive elements that need analysis (1995: 84-85). In order to limit the scope of research, Stoddard suggests narrowing down not only the number and types of texts to be included in the research analysis but also the number and types of cohesive markers (1991: 32).


Following this rationale, the present study will focus on a limited number of texts, namely fifteen, taken from Scientific American and its Italian version Le Scienze, as the basis of a corpus-based study of lexical cohesive devices. For the purpose of my research, analyzing a large number of texts is not essential because the documents will be randomly selected. If the same results apply to all the randomly-selected documents and they turn out to be statistically significant, then it will be possible to generalize them to all of the documents printed in the above-mentioned magazines.

It is worth pointing out that most research studies on cohesion are either product-oriented or process-oriented; only a few combine the two approaches. Silvia Hansen-Schirra, Stella Neumann and Erich Steiner investigate explicitness and implicitness in translation at the level of cohesion by adopting an empirical, corpus-based approach. The cohesive devices analyzed in their corpus are the ones listed by Halliday and Hasan in Cohesion in English. Their corpus comprised multilingually comparable texts (English originals and German originals), monolingually comparable texts (English originals and English translations / German originals and German translations) and parallel texts (English originals and German translations / German originals and English translations) (2000: 247). In analyzing the corpora, it was noticed that shifts in cohesive devices were due to a normalization process whereby source text cohesive devices were adjusted to TL preferences. For example, as far as the use of pronominal referents is concerned, it was discovered that in the translation corpora the use of relative pronouns conformed to the norms of the target language. In particular, there was an increase in the number of pronouns when translating into German, whereas there was an increase in the number of nouns and a decrease in the number of pronouns when translating into English (2000: 256). Indeed, through an analysis of the monolingual or reference corpora in both English and German, it was found that relative pronouns were more characteristic of German than of English. This product-oriented investigation of cohesion confirms Blum-Kulka’s (1986), Newman’s (1988), Hatim and Mason’s (1990) and Mona Baker’s (1992) postulate that cohesive patterns vary both within and across languages. Within a language, they vary according to text-type, whereas across languages, each language has its own stylistic preferences for certain patterns. It follows, Baker (1992) states, that when translating, transferring all the ST cohesive devices into the TT will not guarantee the creation of texture in the TT. The choice of which cohesive device to use must be dictated by TL norms as well as by the textual norms of each text-type. Another study that focuses on one of Halliday and Hasan’s cohesive devices is Monika Krein-Kühle’s (2002). In her paper, she also emphasizes the need, when translating, to convey the function and semantic meaning of the cohesive element through the use of devices that are common in the target language. In particular, she focuses on the translation of the demonstrative this from English into German: in English the demonstrative is semantically strong, and so it must often be conveyed through other linguistic means in German, such as pronouns, adjectives, adverbs, etc. Through her product-based analysis of this, she manages to demonstrate that shifts that occur in translation may be due to semantic and pragmatic aspects, such as domain and register considerations, that call for greater referential clarity (2002: 50).


Most of the above-mentioned studies deal with cohesive devices other than lexical cohesion, and the very few that do deal with it lack empirical foundations. One of the few empirical studies in the literature is Marta Karina’s Master’s thesis, Equivalence and Cohesion in Translation: A Study of a Polish Text and its English Translation. The author tests the validity of two main hypotheses: 1) hypothesis one claims that the translation of a word, or rather a key term, by a number of different words in the target language affects the lexical cohesion of the target text negatively and results in a less cohesive text with a less clearly articulated theme; 2) hypothesis two claims that the target text being analyzed, which is the English translation of a forty-page Polish travel brochure, uses less lexical variety than the source text to express the same theme (2000: 19).

To test the validity of hypothesis one, the author focused on twelve key terms and found that the translation of the key words by a variety of words actually has the opposite effect to what she had predicted; in other words, this translation strategy actually increases the textual cohesiveness of the target text (2000: III). The twelve words chosen for the analysis are central to the theme and lexical coherence of the text (2000: 12). All English translations of the terms fell within two main lexical categories: synonyms and hypernyms. It was also found that the same key term was translated differently throughout the text depending on the context (2000: 19). The author hypothesizes that the reason behind this might be stylistic (2000: 19).


The strength of this study is its empirical grounding. Statistical tests were run to check the validity of both hypotheses. These tests rejected her first hypothesis, indicating that both English and Polish texts use more or less the same number and variety of lexical equivalents (2000: 20). Likewise, statistical analysis rejected her second hypothesis, suggesting that a text is not made less cohesive when ST words are translated by a variety of TL equivalents (2000: 22). To some extent, as the author herself points out, this finding is in keeping with those of Halliday and Hasan (1976: 278-9) and Hoey (1991: 6), according to whom synonymy actually contributes to text cohesiveness. The author argues that the use of synonyms in a translation adds to the cohesiveness of the target text. When translating, the choice of one term over another should be guided by the target readers’ word preferences, of which the translator can only be aware if he or she is familiar with text-type conventions and the topic lexis, as well as with the cultural differences existing in this respect between source and target languages (2000: 32).

My search for previous studies and experiments on lexical cohesion revealed not only that there are few statistically grounded studies dealing with this issue but also that none focuses on the difference between novices and experts. Studying lexical cohesion in terms of novice vs. expert differences may help make the case for teaching lexical cohesion in translation classes in that, as previously mentioned, this topic is usually disregarded or neglected by the translation trainee, who generally approaches the translation as a set of segments or paragraphs for which individual translation strategies and errors are discussed. This approach does not help the student see the text as a unit. What needs to be emphasized is a global, or macro-textual, approach, and the analysis and discussion of lexical chains is one way to do it. However, investigating lexical cohesion in terms of novice vs. expert differences is beyond the scope of the present study.

The primary goal of this study is to show that cohesion, and in particular lexical cohesion, does indeed affect the quality of translation. To this end, a product-based approach to the study of lexical cohesion will be undertaken, and the results will be statistically tested in order to provide the empirical findings and data needed to support or disprove my hypotheses.

CHAPTER II

METHODOLOGY

The purpose of this study is to investigate the differences between English and Italian in terms of lexical cohesive markers and semantic relationships. My hypothesis is twofold: 1) Italian translated texts tend to reproduce the lexical semantic relationships of their source texts, and this in turn affects their readability; they will be perceived as less coherent by the target readership, whose expectations when reading such articles differ as a result of different stylistic, language and text-type conventions in the target system; and 2) Italian originals tend to differ in their use of lexical semantic relationships from Italian translated texts published in the same journal and belonging to the same text-type, which points to the influence of the ST lexical devices/markers during the translating process. The first hypothesis will be tested on a bilingual parallel corpus of English and Italian semi-technical texts taken from Scientific American and Le Scienze, respectively; the validity of the second hypothesis will be tested on a comparable corpus of Italian originals and Italian translated texts, both taken from Le Scienze. The decision to use the same magazine for both corpora makes it possible to assume that the target readership and the target readers’ expectations are the same, which in turn makes the findings comparable.


2.1 Methodological Approaches: Text Analysis and Corpus Linguistics

This study combines two different approaches to the analysis and linguistic investigation of written language, namely text analysis and corpus linguistics. Below is an overview of what these two methodologies are mainly concerned with and of how they were applied to the present study.

2.1.1 Text Analysis

Text analysis as applied to translation studies was first theorized by Christiane Nord in the early 1990s. According to Nord, text analysis in translation needs to explain the linguistic and textual structures of texts as well as the relationship the latter have with the system and conventions of the source text and of the source language in general (2005: 1). In this respect, she states that the semantic and stylistic features of lexical choices may yield information about extratextual factors (the situation in which a text is produced) and intratextual factors (such as subject matter, content and presuppositions) (2005: 122). In the present study, the only extratextual factor that was foregrounded in analyzing the documents and in drawing conclusions was the target readership’s expectations. The latter, together with language and text-type conventions, are in turn deemed to play a major role in the choice and use of lexis and sentence structure (namely lexical cohesive devices and the distribution and length of sentences), which are the intratextual factors investigated herein.


This type of analysis of written language involves “the deconstruction2 of information within a text” (Tsai 2010: 61). Deconstructing the information contained in a text makes it possible to focus on lexical features such as content words and their senses, word frequencies (tokens and types), type/token ratio, and lexical density, as well as syntactic features such as the number of sentences, average sentence length and readability index. A brief discussion of the above-mentioned lexical and syntactic features is provided below, followed by an overview of the research methodology and tools used in this study.

2 Deconstructing a text means breaking down the information contained therein into its textual, syntactic, and linguistic features.

2.1.1.1 Lexical Analysis

Lexical analysis is of great importance to this study because of its focus on lexical cohesive markers, which, unlike other cohesive devices, are actual content words, each with a specific, subject-field-bound sense or meaning. Lexical analysis allows researchers to identify the number and types of tokens occurring in any sample of spoken or written language. The term “token” refers to any set of characters delimited or separated by a space, whereas the term “type” refers to the number of different tokens present in a text. For example, in the sentence “The book is on the table,” there are six tokens and five types, in that the word “the” occurs twice and is counted only once when computing types. The ratio of types to tokens tells us about the lexical variation of a text (Laviosa 1998, 2002; Olohan 2004: 80-81). The higher the type/token ratio, the more varied the vocabulary of a text; conversely, the lower the type/token ratio, the lower the vocabulary variation in a text. However, it is worth pointing out here that the type/token ratio is affected by text length (Tsai 2010: 74), which means that researchers must either compare texts of about the same length or compute the standardized type/token ratio to get valid results. Bowker and Pearson explain that “the standardized type/token ratio is obtained by calculating the type/token ratio for the first 1000 words in a text, and then for the next 1000 words and so on. Then a running average is calculated, which provides a standardized type/token ratio based on consecutive 1000-word chunks of text” (2002: 110). The standardized type/token ratio is therefore obtained by calculating the type/token ratio every one thousand words and then averaging the results. This way, data from texts of different lengths can be compared without compromising the validity of the study.
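To make the computation concrete, the following minimal Python sketch (mine, not a component of WordSmith Tools) implements the plain and standardized type/token ratios as just described; how WordSmith handles a final chunk shorter than 1,000 words is an assumption here.

```python
def ttr(tokens):
    """Type/token ratio as a percentage: distinct words over total words."""
    return len(set(tokens)) / len(tokens) * 100

def standardized_ttr(tokens, chunk=1000):
    """Average of the TTRs of consecutive 1000-word chunks (Bowker & Pearson)."""
    if len(tokens) < chunk:  # text too short to standardize (my assumption)
        return ttr(tokens)
    chunks = [tokens[i:i + chunk] for i in range(0, len(tokens) - chunk + 1, chunk)]
    return sum(ttr(c) for c in chunks) / len(chunks)

tokens = "The book is on the table".lower().split()
print(len(tokens), len(set(tokens)), round(ttr(tokens), 1))  # 6 tokens, 5 types, 83.3
```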

Another important lexical factor is lexical density, which according to Mona Baker is “the percentage of lexical as opposed to grammatical items in a given text or corpus” (1995: 237). It may be computed by dividing the number of content words by the total number of tokens in a text and multiplying the result by 100 to get a percentage (Baker 1995: 237; Stubbs 1996: 71-3). However, there are three other techniques or formulae that are usually used to calculate lexical density (Baker, Hardie & McEnery 2006: 106): technique number one involves dividing the number of unique lexical words by the total number of words; technique number two involves dividing the number of unique words by the number of clauses; and technique number three involves dividing the number of unique words (both lexical and grammatical) by the total number of words. In the third case, there is no difference between type/token ratio and lexical density. For the purposes of this study, the formula described above, content words divided by total tokens, was employed.
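For illustration, here is a minimal sketch of these formulas; the clause-based technique is omitted because it requires a clause parser, and the small stopword set merely stands in for the full stoplists described in section 2.2.1.

```python
STOPWORDS = {"the", "is", "on", "a", "an", "of"}  # toy grammatical-word list

def lexical_density(tokens):
    """Content words over total tokens, times 100 (the Baker/Stubbs formula)."""
    content = [t for t in tokens if t.lower() not in STOPWORDS]
    return len(content) / len(tokens) * 100

def unique_over_total(tokens):
    """Technique number three: identical in effect to the type/token ratio."""
    return len(set(tokens)) / len(tokens) * 100

tokens = "The book is on the table".lower().split()
print(round(lexical_density(tokens), 1))  # 2 content words / 6 tokens = 33.3
```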

Unlike the type/token ratio, lexical density is an indicator of the information load of a text. A text with a high information load is difficult to understand as a result of the amount of detail and technical vocabulary it contains. In my search for a lexical density analyzer for Italian and English, several web-based lexical density analyzers (Textalyser or the Text Content Analysis Tool, to mention just two) were found. However, these analyzers often mistake type/token ratio for lexical density; therefore they were not considered during the data collection procedure. WordSmith Tools was instead used to compute this lexical feature by adopting a technique tailored to the purpose of this study, given that the above-mentioned tool does not automatically calculate lexical density. More will be said about this in the section dealing with tools.

As for syntactic features, the definition of sentence adopted in this study is any set of tokens delimited by a capital letter, number or currency symbol on the left and by a full stop, exclamation mark or question mark on the right. This definition is specific to the two languages under investigation and therefore does not take into consideration directionality issues, which arise with non-Western languages, or, within Western languages themselves, punctuation issues, as in Spanish, where exclamation and question marks are found both at the beginning and the end of a sentence. The above-mentioned definition was also tailored to the definition provided by the online guide to WordSmith Tools, which is available at http://www.lexically.net/downloads/version5/WordSmith.pdf. The decision to take this guide into account in defining the syntactic feature “sentence” is due to the fact that this suite of tools has been used to calculate the number of sentences. It has also been used to compute average sentence length, which is obtained by averaging the length of all the sentences present in a given text.
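This working definition can be rendered as a regular expression; the sketch below is mine, not WordSmith’s actual implementation, and the three currency symbols are illustrative only.

```python
import re

# A sentence: a capital letter, digit or currency symbol on the left,
# running up to a full stop, exclamation mark or question mark on the right.
SENTENCE = re.compile(r"[A-Z0-9$€£][^.!?]*[.!?]")

def count_sentences(text):
    return len(SENTENCE.findall(text))

print(count_sentences("The book costs $5. Is it worth it? Yes!"))  # 3
```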

Lastly, the readability index, which is an indicator of the level of difficulty in reading a text, has been calculated by resorting to an online tool which works for both Italian and English. Other readability index calculators were not considered because of their language-related constraints: indeed, most of them did not support Italian. The readability analyzer used in the present study is available at http://labs.translated.net/text-readability/. This analyzer makes use of the so-called Gulpease Index for calculating the readability level of texts. This index was originally developed for the Italian language, but the website implemented it for English and French as well. The Gulpease index is computed using the following formula:

Gulpease Index = 89 – (Lp/10) + (3 * Fr)

where:

Lp = 100 * number of letters/number of words; and

Fr = 100 * number of sentences/number of words.
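A direct transcription of this formula into Python is given below as a sketch; since the online analyzer’s exact rules for counting letters, words and sentences are not documented, the three counts are taken as inputs.

```python
def gulpease(letters, words, sentences):
    """Gulpease index: 89 - Lp/10 + 3*Fr."""
    lp = 100 * letters / words       # Lp = 100 * letters / words
    fr = 100 * sentences / words     # Fr = 100 * sentences / words
    return 89 - lp / 10 + 3 * fr

# e.g. a 100-word passage with 480 letters and 5 sentences:
print(gulpease(480, 100, 5))  # 89 - 48 + 15 = 56.0
```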

The scores of this index range from 0 to 100, with 0 indicating low readability (harder to read) and 100 indicating high readability (easier to read). As can be seen from the values in the formula above, this index takes word and sentence length into consideration when computing readability, in line with previous studies (Flesch 1951, Gunning 1973, Lucisano & Piemontese 1988). The online readability analyzer does not report numbers but classifies readability as easy, average or hard based on the results obtained through the Gulpease formula. For each of the three levels of readability, an example is provided by the website:

Easy: This phrase is easy. It contains common words, and simple concepts.

Average: Although this phrase is slightly harder on average, and despite its complexity, the reader will have no problem understanding it.

Hard: The very phrase contained herein, having rare complexity, may potentially be, without prior preparation, implausibly difficult to parse in as much as it carries, at the level of the text itself, an unnecessarily, albeit notably low readability level.

Based on the examples mentioned above, an easy-to-read text refers to a set of short, independent sentences containing common, simple words; an average-to-read text refers to a set of independent sentences with very few dependent clauses; finally, a hard-to-read text refers to a set of long complex sentences consisting of many dependent clauses.

Focusing on data gleaned from the above-mentioned textual and statistical features would make it possible to assess the quality of the translations and their coherence in the target system. It has been shown that in translations of technical texts from English to Italian, quality is associated with an overall increase in the number of tokens (text length), fewer and longer sentences than the source text, a higher type-token ratio (lexical variety) but a lower lexical density (lexical/grammatical word ratio) (Scarpa 2006: 165-166). In other words, in order for a translation to be of optimal quality, it needs to comply with:

1) the target language syntactic conventions (hence the use of fewer, longer sentences in Italian as a result of its preference for hypotaxis, which is supposed to be achieved through the use of such cohesive devices as conjunctions or proforms); and

2) the target language stylistic conventions, through the avoidance, by means of synonyms, of the simple repetitions that are more acceptable in English than in Italian (hence the higher type-token ratio) (Scarpa 2006: 166).

This compliance with TL norms was demonstrated by comparing the grades that translation trainees received on their translations of English domain-specific source texts in a corpus-based study conducted by Federica Scarpa. The main hypothesis behind this study was that these grades reflect stylistic issues dealing with the use of lexicogrammatical cohesive devices such as conjunctions and proforms (with a focus on demonstratives) (Scarpa 2006: 157). The study was conducted on a bilingual parallel corpus of English source texts belonging to several text-types and their attendant translations, carried out by translator trainees completing their four-year program at the Advanced School of Modern Languages for Interpreters and Translators (SSLMIT) of the University of Trieste (2006: 156). In it, higher grades were associated with the findings mentioned above. However, there seems to be one finding which runs counter to the one concerning the higher type-token ratio. Scarpa’s argument is that a higher type-token ratio, which indicates that better quality translations have more vocabulary variation than their originals, does not necessarily mean that target texts are more lexically dense than source texts, because of structural differences between the two languages (Scarpa 2006: 164-165). What this implies is that when the type/token ratio is computed, there are more types (distinct words) in Italian because of the inflection in number and gender of grammatical words. The example she provides is that of the English definite article the, which in Italian can take several forms, such as il, lo, la, le, gli, l’, i. By contrast, when lexical density is computed, Italian texts have a lower ratio because they have more grammatical words. So her hypothesis is that a lower number of lexical or content words in Italian compared to English is due not only to structural differences but also to the transformation of simple repetitions into proforms when joining sentences through subordination. However, she points out that this lower lexical density index was found in 36 out of 39 texts analyzed, which means that three of the texts did not show this pattern; Scarpa does not mention the grades associated with these three translations, so we do not know where they stand on the quality scale. Another important observation is that lexical density was calculated manually in her study, and only the first 100 most frequently occurring words in each text were considered, so this sample cannot be taken as representative of the whole text. Last but not least, Scarpa seems not to take into account that text-type also plays a role in the stylistic preferences of a language. The findings of my study take all these factors into consideration, confirm some of the findings put forward by Scarpa, and disprove others.

2.1.2 Corpus Linguistics

The notion of corpus linguistics is vague and ill-defined, as Charlotte Taylor (2008) points out in her article What is corpus linguistics? What the data says. Over the past twenty years, several conceptualizations of the expression corpus linguistics have been put forward by a number of leading scholars in the field, such as Sinclair (1991), Stubbs (1993), Leech (1992) and Tognini-Bonelli (2006). The crux of the matter is that it is still not clear whether corpus linguistics is a discipline, a methodology, a theory, a tool, a methodological approach, a theoretical approach or a combination of the above (Taylor 2008: 180). In the present study, I will adopt Tognini-Bonelli’s definition of corpus linguistics as a “pre-application methodology” which has “theoretical status” (2001: 1). Indeed, as Thompson and Hunston (2006) point out, corpus linguistics helped generate two theories, one concerning meaning and the other communicative discourse. In other words, thanks to corpus linguistics studies, meaning is no longer located in single words but in sets of words that tend to co-occur (collocations), and communicative discourse is conceived as a series of pre-fixed expressions (2006: 11-12).

Corpus linguistics facilitates the description and analysis of language through corpora. A corpus is nowadays considered to be mainly a collection of texts (written discourse like novels or articles) or transcripts (spoken or written-to-be-spoken discourse like talks or speeches) held in electronic form. Mona Baker defines a corpus as “any collection of running texts (as opposed to examples/sentences), held in electronic form and analyzable automatically or semi-automatically (rather than manually)” (1995: 225). By running texts, she means that a corpus may consist not only of whole texts but also of fragments of texts, the length of which should be approximately 2000 words (225). These fragments are taken from the initial, middle, or final parts of longer texts on a random basis (225). However, not all collections of texts constitute a corpus. In order for a set of text samples or whole texts to be referred to as such, the texts making up the corpus must be chosen for a particular purpose and according to specific and well-defined selection criteria. This ensures that the chosen texts are representative of the language variety under investigation (Baker 1995: 225). Some of the most important criteria to bear in mind when choosing texts concern language variety (British English vs. American English), language domain (general vs. technical), genre (novels vs. journal articles), and language synchronicity and diachronicity (Baker 1995: 229). In the present study, the language variety under analysis is American English, the language domain can be referred to as technical in that the texts were taken from a scientific journal, the genre may be identified as the magazine article, and the language is investigated diachronically over a span of ten (10) years, from 1999 to 2009.

Depending on the purpose(s) of one’s study, corpora may be monolingual, comparable, multilingual or parallel. Monolingual corpora may be used to investigate the lexical, syntactic and textual patterns of a specific language variety or text-type; they are called monolingual because they include texts written in the same language. Comparable corpora consist of two sets of texts written in different languages but which are comparable in terms of subject matter or text-type. Multilingual corpora are similar to comparable corpora in terms of text selection criteria but include more than two sets of comparable texts written in different languages. Lastly, there exist parallel corpora, which consist of two sets of texts in which one set is the translation of the other. In the present study, two different types of corpora were chosen, namely a parallel and a comparable corpus. Indeed, as Baker points out, parallel corpora can tell us a lot about translation strategies, whereas comparable corpora can help uncover the natural patterns of a language (1995: 232). Comparing texts written under normal conditions (in a non-translation situation) with texts produced under translation constraints allows us to investigate the patterns of a language pair, isolate those that are characteristic of translationese, and then use the findings to improve the training of translators.

My study was carried out over two sets of corpora: 1) one parallel corpus consisting of fifteen (15) English source texts taken from the American magazine Scientific American and fifteen (15) Italian target texts taken from its Italian edition Le Scienze; and 2) one comparable corpus consisting of the fifteen Italian translated texts included in the parallel corpus and fifteen Italian texts written by Italian scientists for Le Scienze.

The parallel corpus is used to investigate the differences, if any, in the use of lexical cohesive devices such as repetitions, synonyms, antonyms, hypernyms, hyponyms, meronyms and holonyms between the English source texts and their attendant Italian translations. It has been hypothesized that the difference between source and target texts is minimal. Likewise, the comparable corpus is used to identify the most common lexical devices used to make a text coherent and cohesive and to see whether there are any differences in terms of frequency of occurrence in the translated texts. In other words, through the comparable corpus, this study sets out to investigate the lexical cohesive patterns of translated and non-translated language to see whether there are any differences between the two. In this second case, it has been hypothesized that major differences exist between translated and non-translated texts, thus helping make the case for a pedagogical approach to the teaching of lexical cohesion in translator-training programs.

The choice to carry out the analysis of lexical chains manually was made for several reasons. The main one was that the lexical chaining tool that was originally going to be used turned out to support only the Linux platform, which was not available to me at the time of the analysis. Another reason which discouraged me from using a machine to analyze the texts was knowing that the findings would not be one hundred percent correct and that some of the lexical chains would be overlooked by the tool. Because of the polysemic nature of content words, it would be impossible to carry out a fully automated analysis of lexical chains. To better understand this argument, consider the following example from the Italian article Roma e la storia delle glaciazioni (Rome and the history of glaciations), which is part of the collection of texts originally written in Italian. In this article, the proper noun Roma is semantically related to the word area, which occurs several times throughout the text but with different referents. A computer would never be able to differentiate between area del Tevere (the Tiber area) and area dove sorge Roma (the area where Rome stands); only the latter is semantically related to the main content word, and only an intensive and thorough reading by a human being can discern these two senses and identify the one which bears a semantic relationship with the word Roma. Last but not least, a lexical analyzer is not able to identify collocations which constitute semantic items per se. For example, the term search engines, which occurs a large number of times in the English source text Seeking Better Web Search, would be treated as two different words, thus ignoring the corpus-based principle that meaning can be found beyond single words. For all of the above-mentioned reasons, a manual analysis of the texts was carried out.

2.2 Tools

2.2.1 WordSmith Tools

WordSmith Tools is a software suite created by Mike Scott at the University of Liverpool and consisting of three tools or applications:

1) WordList

2) KeyWords

3) Concord


Of the three, the first one will be used for the purposes of this study. WordList allows users to view the list of all the words, or technically speaking, all the tokens occurring in a text. This list of words can be sorted in alphabetical or frequency order as in the figure below:

Figure 1 – WordList Frequency List

There are two tabs at the bottom of the WordList window which say “Frequency” and “Alphabetical,” respectively. By clicking on either of these two tabs, the list is sorted by frequency or in alphabetical order. The number in the bottom left-hand corner of the WordList window indicates the number of types, or distinct words, that occur in the text being analyzed. The tool also shows statistical data, including the number of types and tokens, type/token ratio, standardized type/token ratio, average sentence length and total number of sentences, to mention just a few, as shown in the figure below:


Figure 2 – Wordlist Statistics

However, as previously mentioned, this tool does not automatically compute lexical density, which is one of the lexical features this study focuses on. To compute lexical density through WordSmith Tools, a list of grammatical words representative of a particular language is needed. In this study, the stoplist for English, which was originally built for the SMART Information Retrieval System experiment at Cornell University, and the Italian stoplist were both taken from the official website of the University of Neuchâtel in Switzerland (these stoplists are available at the following address: www.unine.ch/info/clef). This website provides stoplists for several languages, such as French, German, Russian, Spanish and many more. These stoplists were obtained by creating monolingual corpora in several languages and then extracting the 200 most frequently recurring words. After this word extraction, the nouns and adjectives related to the corpus subject field appearing within the first 200 words were removed from the list, and several personal or possessive pronouns, prepositions and conjunctions were added even though they did not appear in the first 200 words. The stoplist for English contains 571 words, whereas the Italian one consists of 399. However, these two lists do not include all function words, but just the ones that have a high frequency of occurrence in a particular language, which is why they both had to be customized by adding more prepositions, conjunctions, adverbs, and auxiliary and modal verb conjugations. After these additions, the total word count for the English stoplist was 596, whereas that of the Italian stoplist was 506. The two stoplists were then uploaded into WordSmith Tools in order to produce an approximate calculation of the total number of content words present in each text. WordList allows the user to stop unwanted words from appearing in the frequency list of the types contained in a text. For each text, the number of content words was calculated, the ratio of content words to the total number of words was computed, and the result was multiplied by 100 to arrive at a percentage (Bosseaux 2007; Roos 2009).
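Outside WordSmith Tools, the same procedure can be approximated in a few lines of Python; the file name and the extra function words below are hypothetical placeholders, not the actual lists used in this study.

```python
def load_stoplist(path, extra=()):
    # Read one stopword per line and merge in the customizations.
    with open(path, encoding="utf-8") as f:
        stoplist = {line.strip().lower() for line in f if line.strip()}
    return stoplist | {w.lower() for w in extra}

def content_word_percentage(tokens, stoplist):
    content = [t for t in tokens if t.lower() not in stoplist]
    return len(content) / len(tokens) * 100

# "english_stoplist.txt" and the extra words are illustrative assumptions.
stoplist = load_stoplist("english_stoplist.txt",
                         extra=["towards", "despite", "shall"])
```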

2.2.2 WordNet

In analyzing the semantic relationships of the English texts, the Princeton-developed lexical database WordNet was used. In this study, its use was not combined with a lexical chainer, because no suitable chainer was available. The online version of WordNet was accessed through the Princeton University website at http://WordNetweb.princeton.edu. All the lexical items which had previously been manually identified in each text were then processed through WordNet to identify their semantic relationships. However, given the highly technical register of the texts under investigation, some of these terms were at times not present in WordNet. In these cases, online encyclopedias and thesauri were consulted to disambiguate the meaning of these terms, and potential semantically related words were then found. The same process was followed for the Italian texts, but, this time, MultiWordNet was accessed online at http://multiWordNet.fbk.eu. Below is a screenshot of the online interface of MultiWordNet:

Figure 3 – MultiWordNet Interface

Like WordNet, MultiWordNet consists of an online interface which makes it possible to view all the semantically related words for each term that is typed in. It has a drop-down list located right below the search tab in the blue bar, from where it is possible to select the different semantic relationships a particular word might have, such as synonyms, antonyms or hypernyms, as in the figure above. Once the type of semantic relationship one wants to analyze is selected, all the words falling within that category are displayed on screen, and the subject field appears in brackets in blue next to each one. This is important because a word might have several senses, each belonging to a different subject field. If one knows exactly which area of study the term under analysis belongs to, it is possible to easily identify the most suitable semantically related word.
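For readers who prefer programmatic access over the web interfaces, the NLTK library exposes the same Princeton WordNet relations; the sketch below is an alternative route I am suggesting, not the procedure followed in this study, which relied on the online interfaces.

```python
# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

synset = wn.synsets("protein")[0]                   # first listed sense
print(synset.definition())
print([lemma.name() for lemma in synset.lemmas()])  # synonyms in the synset
print(synset.hypernyms())                           # more general concepts
print(synset.hyponyms())                            # more specific concepts
print(synset.part_meronyms())                       # parts
print(synset.member_holonyms())                     # wholes
```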

2.3 Preliminary Analysis

Before the word sense disambiguation process, intensive reading and manual analysis of the texts were carried out. The first step in the semantic analysis of each text was the creation of a word frequency list through the WordList tool, which is part of the WordSmith Tools suite. Out of each frequency list, only content words that were directly related to the main topic of the article and had a frequency of occurrence above the cut-off point of ten were chosen as relevant lexical items or key terms for the semantic analysis.
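As a sketch, the frequency-list and cut-off step can be reproduced with Python’s Counter; the stopword filtering and the strict reading of “above the cut-off point of ten” as more-than-ten are assumptions of mine, and topic relevance still has to be judged manually, as described above.

```python
from collections import Counter

def candidate_key_terms(tokens, stopwords, cutoff=10):
    """Frequency list minus function words, keeping words occurring more
    than `cutoff` times; topic relevance is then checked by hand."""
    freq = Counter(t.lower() for t in tokens)
    return {word: n for word, n in freq.items()
            if n > cutoff and word not in stopwords}
```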

After the creation of these lists of the most frequently occurring content words, including nouns, adjectives, verbs and topic-related adverbs (with the exception of adverbs of space, time and manner), I started reading each text sentence by sentence, comparing source and target segments when dealing with translations. For each word taken from the frequency list, the sentence numbers in which it occurred, its meaning and its translation(s) were recorded in a separate Word document table (see Table 1). Besides the different translations a source word could have in the target text, any additions and omissions that occurred in the translations were also recorded.

Table 1 – Preliminary Textual Analysis Screenshot

In the table above, which is a screenshot of the preliminary text analysis of one of the English texts, entitled “What Birds See,” and its attendant translation, the four vertical columns from left to right represent lexical chains, source text terms, target text terms and notes, respectively. Under the Lexical Chain column were included all the key terms extracted from the WordSmith Tools frequency word list, as well as all the potentially semantically related terms, be they nouns, adjectives, verbs or adverbs.


Indeed, the words taken from the high-frequency word list were just a starting point in the analysis of the semantic relationships within each text. Under the Source Text Term column were listed all the sentence numbers in which each key term occurred, whereas under the Target Text Term column were listed all the available translations of ST terms. For example, the verb see in the table above is rendered as vedere most of the time, except for four cases in which it is translated differently (visione [a change of word class from verb to noun], distinguire [distinguish], percepire [perceive] and cogliere [grasp]). Any omissions were signaled by means of a sequence of dashes (------). The last column was used to include any comments about translation strategies or incongruences in translating the same expression. One example may be taken from the English text “Pandora’s Baby,” in which the terms cloning technology and technology of cloning are both translated as either tecniche or tecnologia. The incongruence here is due to the fact that tecniche and tecnologia refer to two different concepts, one indicating methods and the other indicating the practical application of scientific knowledge to problem solving.

Both simple and modified repetitions were included in the analysis. This means that if a verb occurred later as a noun, that noun was considered semantically connected to the verb that occurred earlier, and was thus classified as a modified repetition. This is one of the reasons why no automatic lexical chainer was used in the analysis of the lexical cohesive markers in the texts. Indeed, WordNet, which is usually used to identify semantically related words in a text, identifies semantic relationships only within each word class, which means that only semantic relationships existing among nouns, among adjectives, among verbs or among adverbs can be recognized. But as happens in any living language, adverbs are formed out of adjectives and verbs are coined after nouns, so it would be erroneous to ignore these inter-word-class semantic relationships.
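One way to partially operationalize inter-word-class repetition is stemming, which conflates many derived forms regardless of word class. The sketch below is an illustration of the idea using NLTK's Snowball stemmer, not the method used here, which was manual:

    # Sketch: detecting "modified repetition" across word classes by stem,
    # something a within-word-class WordNet chainer cannot do.
    from nltk.stem.snowball import SnowballStemmer

    stemmer = SnowballStemmer('english')

    def is_modified_repetition(word_a, word_b):
        """True for distinct word forms sharing a stem, e.g.
        influence (n/v), influenced, influencing."""
        return word_a != word_b and stemmer.stem(word_a) == stemmer.stem(word_b)

Stemming both over- and under-generates (see and vision, for instance, share no stem), which is a further reason the classification in this study was carried out by hand.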

After listing and grouping all the semantically related words with information about the sentence numbers where they occurred and the different translations, omissions and additions in the attendant target texts, a percentage for each type of lexical cohesion in every single text was calculated. This part of my methodology is modeled on a study by Abdel-Hafiz (2003), who compared lexical cohesive devices in the English translation of Naguib Mahfouz's novel The Thief and the Dogs. The author sets out to verify statistically Aziz's (1998) claim that explicitness and implicitness are associated with stylistic preferences. In particular, Aziz argues that Arabic prefers explicit reference, achieved through repetition of common and proper nouns, whereas English prefers implicit reference, achieved through pronominalization. To verify this claim, Abdel-Hafiz first identifies all the types of lexical cohesion used in the original novel and then analyzes the translated text to see how the translator handled the interlingual transcodification of these lexical cohesive devices.

He then calculates the frequency of occurrence of the types of lexical cohesion in both the source and target texts, focusing on instances of recurrence, partial recurrence and hyponymy.3 His findings disprove Aziz's claim in that both source and target texts are characterized by a high frequency of recurrence, which turns out to be the most common type of lexical cohesive device; this shows that the translator did not perform the expected shift from common nouns to pronouns hypothesized by Aziz (Abdel-Hafiz 2004). Similar findings are expected in my English/Italian parallel corpus.

3 In Abdel-Hafiz's study, recurrence and partial recurrence correspond to my two categories of repetition and modified repetition.

2.4 Semantic Relation Analysis

After collecting information about the key terms, the sentence numbers in which they occurred and, in the case of translations, their attendant target-text renderings, omissions or occasional additions, the actual analysis of their semantic relations was carried out. Each key term (i.e., each word with a frequency of occurrence above 10) was entered in either WordNet or MultiWordNet, depending on whether the analysis concerned English or Italian texts, and all its semantic relationships were inspected and compared with the ones identified in a particular text. If any match was found, the word semantically related to the key term was listed in the term table, as in the table below:

Table 2 – Semantic Relation Analysis

KEY TERM: Proteins

Semantic     English Text   Italian Text   English       Italian
Category     Frequency      Frequency      Term          Term
Repetition   76             70             Protein       1) proteina (70 – 1 addition);
                                                         2) 1 synonym; 3) 2 meronyms;
                                                         4) 2 omissions; 5) 4 pronouns
Synonym      0              1              Repetition    componente proteica
Antonym      –              –              –             –
Meronym      0              2              Repetition    molecola
             4              4              Amino acids   aminoacidi
Holonym      15             15             Proteome      proteoma
Hypernym     –              –              –             –
Hyponym      –              –              –             –

In this table, the first vertical column on the left features the possible semantic relationships that the key term might have with other words in the text, namely:

1) Repetitions and modified repetitions, that is to say, words that share the same root but belong to different word classes (nouns, verbs, adjectives, adverbs: e.g. influence (n), influence (v), influenced, influencing, etc.);

2) Synonyms: words that are similar in meaning to the key term;

3) Antonyms: words that have a meaning opposite to that of the key term;

4) Meronyms: words that have a part-whole relationship with the key term. For example, in the table above, the word amino acids is listed as a meronym of the key term proteins because amino acids are actually part of proteins;

5) Holonyms: words that identify a whole-part relationship with the key term. Unlike with meronyms, a holonym indicates the whole, and in this case the key term is, as a result, the meronym. In the example above, the word proteome is listed as a holonym of proteins because the former includes the latter;

6) Hypernyms or superordinates: words which have a more general meaning than the key term. In the example above there are no hypernyms, but if WordNet is consulted, it lists supermolecule as a possible hypernym of proteins;

7) Hyponyms: words which have a more specific meaning than the key term. The table above does not contain any, but WordNet lists several under proteins, such as gluten, opsin, enzyme, etc.
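These are exactly the relation types exposed by WordNet's interfaces. As an illustration only (the study queried the WordNet and MultiWordNet interfaces directly, and Italian lookups went through MultiWordNet), the lookups for a key term could be scripted with NLTK's WordNet module roughly as follows:

    # Sketch: collecting WordNet relations for a key term via NLTK
    # (requires nltk.download('wordnet')). Only part-meronymy/-holonymy is
    # shown; WordNet also distinguishes member and substance variants.
    from nltk.corpus import wordnet as wn

    def semantic_relations(key_term):
        rels = {'synonyms': set(), 'antonyms': set(), 'meronyms': set(),
                'holonyms': set(), 'hypernyms': set(), 'hyponyms': set()}
        for synset in wn.synsets(key_term):
            for lemma in synset.lemmas():
                rels['synonyms'].add(lemma.name())
                rels['antonyms'].update(a.name() for a in lemma.antonyms())
            rels['meronyms'].update(s.name() for s in synset.part_meronyms())
            rels['holonyms'].update(s.name() for s in synset.part_holonyms())
            rels['hypernyms'].update(s.name() for s in synset.hypernyms())
            rels['hyponyms'].update(s.name() for s in synset.hyponyms())
        return rels

    # semantic_relations('protein')['hyponyms'] includes synsets for
    # gluten, opsin and enzyme, matching the example in point 7 above.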

This table was created for each key term, and the same procedure was followed to identify the semantic relationships existing between a key term and its semantically related words in the text. The table above refers to the bilingual parallel corpus, given that it contains both source and target text cohesive markers. In this respect, in very few cases it was not possible to classify some of the lexical choices that the translator made in rendering some of the source text words. In other words, no semantic links could be found in either WordNet or MultiWordNet. Take, for example, the key term mixture from the text "What Birds See." In it, this term is translated in Italian as combinazione and, in just two cases, as proporzioni, which has a quite different meaning from mixture: one refers to relative quantities, the other to combinations of substances. In a few other cases, the term used in the translations did not fall within any of the semantic relationships under investigation in this study, falling instead under relations such as coordinates (words sharing the same hypernym) or attributes (properties). In those cases, the translation was recorded but no semantic relationship was specified for the terms.

After this step, these qualitative data needed to be transformed into quantitative data in order to document any statistical significance in terms of differences in the use of semantic relationships between English source texts and Italian translations on the one hand, and between Italian translations and Italian originals on the other. To do so, for each document, the percentage of repetitions, synonyms, antonyms, meronyms, holonyms, hypernyms and hyponyms was calculated as follows:

Repetition percentage = (total number of repetitions per text / total number of semantic relations per text) × 100

This way, all the numbers are expressed as percentages and are comparable. The formula above refers to repetitions, but it applies equally to the other semantic relations. After the percentage of use of the semantic relations in all the texts was calculated, the quantitative data were compared to test the validity of the two main hypotheses of this study. In the results chapter, the quantitative data will be combined with findings from the qualitative analysis in order to support my argument and hypotheses. In the following chapters, the results of the present study will be presented and then discussed.
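The normalization itself is a one-liner per category; a minimal sketch of the computation:

    # Each category's share of all semantic-relation instances in one text.
    def category_percentages(counts):
        """counts, e.g. {'repetition': 76, 'synonym': 1, 'holonym': 15}"""
        total = sum(counts.values())
        if total == 0:
            return {}
        return {category: n * 100.0 / total for category, n in counts.items()}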

2.5 Statistical Analysis

The results of the textual and semantic analyses were run in SPSS (Statistical Package for the Social Sciences), a software suite which was used to conduct the data analysis and test the validity of the hypotheses put forward in this study. In this respect, a one-way between-subjects ANOVA was conducted to compare the effect of language on the amount of use of semantic categories such as repetitions, synonyms, antonyms, etc. in three different conditions: English Originals, Italian Translations, and Italian Originals. The independent variable of this design was language, whereas the dependent variable was each semantic category under investigation. The decision to use a one-way between-subjects ANOVA is due to the fact that the independent variable of this study, namely language, contains three conditions. The design is called between-subjects because the independent variable was tested using independent samples, that is to say, three different text corpora (Heiman 2001: 457). The one-way between-subjects ANOVA aims at determining whether or not there is a statistically significant difference between any two of the three means (English Originals, Italian Translations and Italian Originals) for each semantic category. The test statistic for significance in ANOVA is the F value. A significant F value does not complete the analysis, because it points to the existence of a statistically significant difference among the conditions but does not indicate which ones differ. To find out which conditions or groups differ significantly, a post-hoc Tukey HSD test needs to be conducted, which compares each condition with all the others and determines between which conditions or groups the significance lies (Kirkpatrick & Feeney 2009: 39). Post-hoc comparisons were conducted for each semantic category whenever the F value was significant.

CHAPTER III

RESULTS

The present chapter has been organized into two major parts which mirror the methodological approach adopted in carrying out the analysis of the texts making up the parallel and comparable corpora. The first part presents the findings gleaned from the textual analysis of the documents, whereas the second part presents the findings relating to the semantic analysis. Quantitative data will be accompanied by qualitative information whenever the latter is deemed relevant to understanding the translator's choices at the lexical and syntactic levels. Statistical evidence will also be provided: several one-way between-subjects ANOVA tests were run in SPSS to find out whether or not the quantitative data had statistical validity. Before presenting the findings, I would like to provide some information about the size of the two corpora analyzed.

3.1 Parallel and Comparable Corpora

The text length of the English source texts ranges from 2,714 to 4,167 tokens (or running words), with an average text length of 3,396 running words and a total corpus size of 50,944 tokens. The text length of the Italian translations ranges from 2,886 to 4,640 tokens, with an average text length of 3,552 running words and a total TT corpus size of 53,291 tokens. The text length of the Italian originals ranges from 2,014 to 5,275 tokens, with an average text length of 3,259 running words and a total corpus size of 48,892 tokens.

3.2 Textual Analysis

3.2.1 Standardized Type-Token Ratio

As mentioned in the previous chapter, the type-token ratio is affected by text length, being the ratio between the number of types (that is to say, the number of distinct words in a text) and the total number of running words (also known as tokens); this is why it was not used in the present analysis. The value that actually makes the texts comparable regardless of their word length is the standardized type-token ratio, which was used herein. The latter results from averaging the type-token ratios of a text calculated at intervals of every one thousand words. Below is a table containing the standardized type-token ratios for each text in the parallel corpus:

Table 3.1 – Parallel Corpus STTRs

Corpus Doc.   English STs STTR   Italian TTs STTR
Text 1        43.03              48.17
Text 2        46.47              51.80
Text 3        49.87              52.15
Text 4        45.50              49.07
Text 5        47.00              54.90
Text 6        40.25              44.83
Text 7        46.10              48.20
Text 8        46.27              47.93
Text 9        48.70              53.50
Text 10       43.70              45.93
Text 11       42.80              44.77
Text 12       44.20              48.77
Text 13       45.90              48.53
Text 14       47.53              49.13
Text 15       41.17              43.70

As a general rule, it is possible to say that the type-token ratio in the translations is higher, which in turn points to greater vocabulary variation. This is also confirmed by a higher number of types in the translations. The higher the number of types, the higher the vocabulary variation. Below is a table comparing the source and target text types for each of the fifteen documents under analysis:

Table 3.2 – Parallel Corpus Types

Corpus Doc.   English STs Types   Italian TTs Types
Text 1        1,163               1,453
Text 2        1,083               1,306
Text 3        1,134               1,226
Text 4        946                 1,102
Text 5        1,116               1,397
Text 6        1,075               1,232
Text 7        1,160               1,217
Text 8        1,148               1,282
Text 9        1,119               1,280
Text 10       984                 1,211
Text 11       940                 1,006
Text 12       1,058               1,223
Text 13       1,069               1,237
Text 14       1,105               1,247
Text 15       989                 1,107


The higher type-token ratios and numbers of types point to greater overall variation in the use of vocabulary in the translations. However, as pointed out in the methodology chapter, this statistic might also be due to grammatical and syntactic differences between the two languages. Indeed, for WordSmith Tools a type is any distinct word form occurring in a text, which means that every inflected form of verbs, adjectives, nouns and grammatical particles is counted as a distinct type. It follows that Italian inevitably has more types than English by virtue of its linguistic features. One solution to this problem might be to lemmatize those words which belong to the same word class and differ only in gender, number or, in the case of verbs, tense, by reducing their inflectional forms to a common base form; however, this practice is not widespread and it was not carried out in this study.
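The standardized measure itself is straightforward to reproduce; below is a minimal sketch of the 1,000-word averaging described above (whitespace tokenization and the handling of the final partial stretch are simplifying assumptions; WordSmith Tools applies its own tokenization rules):

    # Standardized type-token ratio: the mean of per-chunk TTRs computed
    # over successive 1,000-token stretches, expressed as a percentage.
    def sttr(tokens, chunk_size=1000):
        ratios = [len(set(tokens[i:i + chunk_size])) / chunk_size * 100
                  for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]
        # Any trailing stretch shorter than chunk_size is ignored here.
        return sum(ratios) / len(ratios) if ratios else None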

In six (6) out of fifteen (15) translated texts, the number of types is higher despite the fact that a few source text sentences were omitted and not translated. The texts in question are:

 Text 2: Darwin’s Influence on Modern Thought, in which four sentences, namely S 156, S 167, S 168 and S 169, were omitted in the Italian version.

 Text 7: Next Stretch for Plastic Electronics, in which four sentences were omitted. These are S 67, S 153, S 154, and S155.

 Text 8: Seeking Better Web Searches, in which two sentences were omitted. These are S 127 and S 164.

 Text 9: Shaping the Future, in which as in Text 8 two sentences were omitted. These are S 163 and S 164.


 Text 14: White Matter Matters, in which one sentence, namely S 145, was omitted.

 Text 15: The Colors of Plants on Other Worlds, in which two sentences were omitted, namely S 113 and S 142.

As far as the comparable corpus is concerned, a comparison of the texts of the translated and non-translated Italian sub-corpora is possible in that the standardized type-token ratio of each text is computed by averaging the type-token ratios of 1,000-word text stretches; therefore, the results are not affected by text length or, rather, they are not directly affected by it. Below are two tables, one featuring the STTR of each text and one featuring the average STTR of each sub-corpus:

Table 3.3 – Comparable Corpus STTRs

TT Sub-Corpus STTR   IO Sub-Corpus STTR
48.17                50.20
51.80                51.65
52.15                46.12
49.07                44.40
54.90                51.05
44.83                47.85
48.20                48.10
47.93                48.38
53.50                49.40
45.93                46.80
44.77                48.23
48.77                55.00
48.53                42.50
43.70                47.70
49.13                46.45


Table 3.4 – Comparable Corpus STTR Means

TT Sub-Corpus STTR   IO Sub-Corpus STTR
48.76                48.26

Generally speaking, the STTR in the translated Italian sub-corpus ranges from 43.70 to 54.90, whereas the STTR in the non-translated Italian sub-corpus ranges from 42.50 to 55.00. These intervals are almost identical, as are the average STTRs of the two sub-corpora taken as a whole.

3.2.2 Sentence number

Another important textual feature which needs to be discussed herein relates to the number of sentences present in the source and target texts. This textual feature documents whether or not there are any changes as to syntactic structure when the source text content is conveyed into the target text through the target language.

Table 3.5 – Parallel Corpus Sentence Numbers

Corpus Doc.   English STs Sentence Number   Italian TTs Sentence Number
Text 1        166                           160
Text 2        172                           164
Text 3        152                           143
Text 4        114                           107
Text 5        154                           137
Text 6        188                           177
Text 7        162                           128
Text 8        156                           138
Text 9        172                           142
Text 10       119                           111
Text 11       116                           110
Text 12       123                           118
Text 13       137                           123
Text 14       156                           133
Text 15       184                           177

If we compute the difference in the number of sentences for each pair of texts in the parallel corpus and subtract, for those texts in which omissions occurred, the number of source text sentences which were not translated, the resulting data show that in only six out of 15 pairs of texts is the difference in sentence number above 10. This points to the use, in most TTs, of a syntactic structure similar to that of the STs.

Since the texts contained in the Italian comparable corpus cannot be compared when considered individually, an average sentence number was computed for both the parallel and comparable corpus:

Table 3.6 – Average Sentence Numbers

                          English STs   Italian TTs   Italian Originals
Average Sentence Number   151.4         137           111.73

It is worth reminding the reader that the average text length of the English source texts and of the Italian originals is almost the same, namely 3,396 and 3,259 tokens, respectively. The difference in the number of tokens between the two is 137. In order to make the average sentence numbers of these two corpora comparable, it is necessary to look at the average sentence length for each corpus, which indicates how many tokens an average sentence contains, and then divide 137 by the average sentence length:

Table 3.7 – Parallel and Comparable Corpus Average Sentence Length

                          English STs   Italian TTs   Italian Originals
Average Sentence Length   22.53         26            30.12

Two operations are possible: either 137 is divided by the average sentence length of the English Source Text sub-corpus and the result is then subtracted from the average sentence number of the English Originals sub-corpus, or 137 is divided by the average sentence length of the Italian Originals sub-corpus and the result is then added to the average sentence number of the Italian Originals sub-corpus. By so doing, the two sub-corpora are made comparable in terms of sentence numbers.

EO Corpus Sentences = 137 / 22.53 = 6.1

IO Corpus Sentences = 137 / 30.12 = 4.55

The result of the first operation needs to be subtracted from the average number of corpus sentences for the English Source Text sub-corpus, whereas the result of the second operation is to be added to the average number of corpus sentences for the Italian sub-corpus consisting of original texts.

EO Corpus Comparable Sentence Number = 151.4 – 6.1 = 145.3

IO Corpus Comparable Sentence Number = 111.73 + 4.55 = 116.28

Only one of the above-mentioned results is to be taken into account. I have decided to make the Italian Originals sub-corpus sentence number comparable to the English Originals sub-corpus sentence number. Hence, the average sentence numbers of the two sub-corpora approximately reflect the average sentence number that an average text of 3,339 words in English and Italian might have.

Table 3.8 – Average Sentence Number in an average text of 3,339 tokens

                          English STs   Italian Originals
Average Sentence Number   151.4         116.28

As is evident from the table above, the average sentence number in texts originally written in Italian for this readership is lower than that in the corpus consisting of English original texts.

The same procedure was followed to document whether there was any significant difference between translated and non-translated Italian texts. Since the two corpora in question are of different sizes, a few operations had to be performed in order for the results relating to the average sentence numbers of the two corpora to be comparable. First, the difference in tokens between the two corpora was computed:

Token Difference = 3,552 – 3,259 = 293


Second, the approximate number of sentences that a chunk of text consisting of 293 tokens originally written in Italian might contain was calculated by dividing the Token Difference by the average sentence length of the Italian Originals sub-corpus:

Approximate Sentence Number = 293/30.12 = 9.73

Third, the approximate number of sentences was added to the average sentence number of the Italian Originals sub-corpus in order to obtain the approximate average number of sentences that an average text of 3,552 words might contain:

Approximate Average Sentence Number = 111.73 + 9.73 = 121.46.

Finally, the average sentence numbers of the translated and non-translated Italian texts were compared:

Table 3.9 – Comparable Corpus Average Sentence Number

                          Italian Translations   Italian Originals
Average Sentence Number   137                    121.46

A comparison of the two results shows that the difference in the average number of sentences between the two sub-corpora is lower than the difference in the average number of sentences between English and Italian originals.
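The length normalization used throughout this subsection can be captured in a single helper; the sketch below is an illustrative restatement of the arithmetic above, not a tool used in the study:

    # To compare average sentence numbers across corpora whose average text
    # lengths differ, the token difference is converted into an equivalent
    # number of sentences via the average sentence length (ASL) and added
    # to the shorter corpus's average sentence count.
    def comparable_sentence_number(own_avg_sentences, own_avg_tokens,
                                   target_avg_tokens, own_asl):
        token_difference = target_avg_tokens - own_avg_tokens
        return own_avg_sentences + token_difference / own_asl

    # Italian Originals scaled up to the TT corpus length of 3,552 tokens:
    # comparable_sentence_number(111.73, 3259, 3552, 30.12) -> ~121.46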


3.2.3 Lexical Density

As pointed out in the methodology chapter, lexical density represents the information load of a document; in this study it was computed by taking into account all the content words occurring in each document analyzed, instead of focusing only on the one hundred most frequently used words. Below is a table featuring the lexical density values for the parallel corpus:

Table 3.10 – Parallel Corpus Lexical Density

Corpus Doc.   English Originals LD   Italian Translations LD
Text 1        23.12                  26.57
Text 2        26.08                  30.59
Text 3        29.46                  35.96
Text 4        27.08                  30.26
Text 5        26.19                  33.77
Text 6        20.49                  25.47
Text 7        27.69                  30.14
Text 8        26.31                  30.42
Text 9        28.15                  32.63
Text 10       24.93                  26.79
Text 11       24.94                  26.53
Text 12       24.43                  28.98
Text 13       26.13                  28.91
Text 14       27.21                  30.73
Text 15       21.62                  24.56

As a general rule, the pattern that stands out from the table above is that the value for lexical density in the Italian translations is higher than in the source texts. This finding contradicts what Scarpa states in her article "Corpus-based Quality-Assessment of Specialist Translation: A Study Using Parallel and Comparable Corpora in English and Italian," in which she uses lexical density as one of the benchmarks for assessing the quality of student translations and finds that this feature is generally lower in translations. However, in computing lexical density, Scarpa only focuses on the first one hundred high-frequency words.

As far as average lexical density is concerned, its values for the entire corpus confirm an overall higher lexical density in the translated Italian sub-corpus as shown in the table below:

Table 3.11 – Average Lexical Density in English Originals and Italian Translations

     English Originals   Italian Translations
LD   25.60               29.49

As far as the comparable corpus is concerned, a one-by-one comparison of lexical density values between English Originals, Italian Translations and Italian Originals is not possible, because the values for the Italian Originals would not be comparable owing to their different text lengths, which inevitably affect lexical density. It is nonetheless possible to compare the average lexical density of each sub-corpus, as in the following table:

Table 3.12 – Parallel and Comparable Corpus Average Lexical Density

     EO      IT      IO
LD   25.60   29.49   29.76


The rationale behind comparing these results is that the overall corpus size for the Italian originals is lower than the corpus sizes of the English source and Italian target texts, respectively. Hence, the higher average lexical density in the Italian Originals corpus cannot be due to a higher number of tokens or running words in the latter. Indeed, the overall size of the Italian Originals sub-corpus is 48,892 tokens, as opposed to 50,944 and 53,291 tokens for the English source text and Italian target text sub-corpora, respectively. The pattern that emerges from the analysis of these values is that the Italian target texts and Italian originals resemble each other in terms of information load, which is on the whole higher than in the English source texts.
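As a rough illustration of the measure (the study identified content words via the WordSmith frequency data and manual inspection, so the POS-tagging shortcut below is an assumption), lexical density over a whole text can be sketched as:

    # Lexical density: content words as a percentage of all running words,
    # computed over the whole text. Requires NLTK's punkt tokenizer and
    # averaged_perceptron_tagger models.
    import nltk

    CONTENT_TAGS = ('NN', 'VB', 'JJ', 'RB')  # noun, verb, adjective, adverb tags

    def lexical_density(text):
        tokens = nltk.word_tokenize(text)
        content = [w for w, tag in nltk.pos_tag(tokens)
                   if tag.startswith(CONTENT_TAGS)]
        return len(content) / len(tokens) * 100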

3.2.4 Readability

The data on readability are qualitative rather than quantitative, which means that statistical evidence will not be provided for this textual feature at the end of the chapter. Readability is classified as hard, average or easy. Below is a table featuring the readability indices relating to the parallel corpus texts:

Table 3.13 – Parallel Corpus Readability Indices

Corpus Doc.   ST Readability   TT Readability
Text 1        Average          Hard
Text 2        Hard             Hard
Text 3        Hard             Hard
Text 4        Average          Hard
Text 5        Average          Hard
Text 6        Average          Hard
Text 7        Hard             Hard
Text 8        Hard             Hard
Text 9        Hard             Hard
Text 10       Hard             Hard
Text 11       Hard             Hard
Text 12       Hard             Hard
Text 13       Hard             Hard
Text 14       Hard             Hard
Text 15       Average          Hard

Since these texts are highly technical, their readability index is primarily hard, with the exception of five cases in the source text sub-corpus in which the readability is average.

On the whole, the pattern that emerges from the comparison of the results is in alignment with the patterns identified so far in relation to the other textual features: the Italian translations show an increase in reading difficulty. As expected, the Italian translations are all hard to read because the texts, besides being highly technical, also have greater word and sentence lengths, which are the two factors used in calculating the Gulpease index, the readability formula adopted for the Italian texts.
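For reference, the Gulpease index is computed from letters, words and sentences; a minimal sketch follows (the hard/average/easy banding applied to these texts is the study's own classification; the standard interpretation of the score, higher = more readable, is assumed):

    # Gulpease readability index for Italian text: higher scores mean
    # easier reading; long words and long sentences both lower the score.
    def gulpease(n_letters, n_words, n_sentences):
        return 89 + (300 * n_sentences - 10 * n_letters) / n_words

Because the formula subtracts ten points per letter and adds three hundred per sentence, both scaled by the number of words, word length and sentence length are indeed its only two inputs, consistent with the statement above.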

A contrastive analysis between the above-mentioned values and the ones from the Italian Originals sub-corpus reveals that three out of fifteen texts have an average readability index, as shown in the table below:

Table 3.14 – Parallel and Comparable Corpus Readability Indices

Corpus Doc.   EO Readability   IT Readability   IO Readability
Text 1        Average          Hard             Hard
Text 2        Hard             Hard             Hard
Text 3        Hard             Hard             Hard
Text 4        Average          Hard             Hard
Text 5        Average          Hard             Average
Text 6        Average          Hard             Hard
Text 7        Hard             Hard             Hard
Text 8        Hard             Hard             Average
Text 9        Hard             Hard             Hard
Text 10       Hard             Hard             Hard
Text 11       Hard             Hard             Hard
Text 12       Hard             Hard             Hard
Text 13       Hard             Hard             Average
Text 14       Hard             Hard             Hard
Text 15       Average          Hard             Hard

Generally speaking, the readability of these texts is expected to be hard because of their greater use of technical vocabulary. It is not possible to make a one-by-one comparison between the texts from the parallel and comparable corpora because they have different text lengths and deal with different topics. However, considering that the target readership is the same for both Italian Translations and Italian Originals (given that these texts were taken from the same magazine), it is possible to state that, as a general rule, the overall readability index in both corpora is hard.

3.2.5 Average Sentence Length

The average sentence length in the Source Text sub-corpus is generally lower than the one found in translated texts, as shown in the table below:

Table 3.15 – Parallel Corpus Average Sentence Length


Corpus Doc.   STs ASL   TTs ASL
Text 1        24.56     28.75
Text 2        19.13     21.72
Text 3        19.89     19.99
Text 4        23.50     27.97
Text 5        22.14     25.64
Text 6        21.86     22.51
Text 7        21.11     26.38
Text 8        22.68     26.14
Text 9        18.60     22.92
Text 10       26.20     34.16
Text 11       25.45     27.83
Text 12       28.06     29.75
Text 13       24.19     29.50
Text 14       21.06     25.55
Text 15       19.59     20.99

Overall, it is possible to argue that the Italian translated sentences tend to be longer than their English counterparts. Only in four cases (Texts 3, 6, 12, and 15) is the difference between the source and target texts in terms of sentence length less than two (2) tokens.

The average sentence length for each sub-corpus is as follows:

Table 3.16 – Parallel Corpus Mean ASL

      ST Sub-Corpus   TT Sub-Corpus
ASL   22.53           26

A comparison of the average sentence length of each translation with the mean average sentence length of the Target Text sub-corpus shows that eight out of fifteen texts, namely Texts 1, 4, 7, 8, 10, 11, 12 and 13, have an average sentence length higher than the sub-corpus mean.

As far as the comparable corpus is concerned, the average sentence length of each text in it is as follows:

Table 3.17 – Comparable Corpus ASL

IT Sub-Corpus ASL   IO Sub-Corpus ASL
28.75               37.26
21.72               32.57
19.99               27.28
27.97               35.09
25.64               28.24
22.51               31.05
26.38               28.20
26.14               20.71
22.92               25.87
34.16               31.03
27.83               28.66
29.75               24.13
29.50               27.85
25.55               37.97
20.99               35.84

Since a one-by-one comparison of each text between the translated and non-translated Italian sub-corpora is not possible, it is necessary to compute the average sentence length for each sub-corpus and then compare the results, which are the following:


Table 3.18 – Comparable Corpus Mean ASL

      IT Sub-Corpus   IO Sub-Corpus
ASL   26              30.12

The data contained in the table above show that there is a major difference in average sentence length between translated texts and texts originally written in Italian by Italian scientists or, more generally, Italian authors. This difference is even more evident when the average sentence lengths of the English and Italian originals are compared, as shown below:

Table 3.19 – Mean ASL in English and Italian Originals

      EO Sub-Corpus   IO Sub-Corpus
ASL   22.53           30.12

It is evident that English sentences are generally shorter than Italian sentences, which is partly due to the fact that English prefers parataxis (coordination) whereas Italian prefers hypotaxis (subordination).

3.3 Semantic Analysis

Semantic analysis herein refers to the investigation of such semantic relationships as repetitions, modified repetitions, synonyms, antonyms, meronyms, holonyms, hypernyms and hyponyms. The following is a list of the findings for each of the above-mentioned semantic categories in the parallel and comparable corpora. The findings will be presented in the form of percentages rather than raw numbers, because this makes it easier to document differences in the frequency and use of these semantic relationships in each text as well as in the corpora as a whole.

3.3.1 Repetition and Modified Repetition

As mentioned in the methodology chapter, repetition and modified repetition, that is to say the use of derived forms, were considered as a single semantic category when computing their percentage of occurrence. Therefore, the results presented below include both types of relationships:

Table 3.20 – Parallel Corpus Repetitions

Corpus Doc.   ST Repetition   TT Repetition
Text 1        52.13 %         65.56 %
Text 2        53.90 %         55.68 %
Text 3        69.60 %         70.13 %
Text 4        70.83 %         73.85 %
Text 5        33.85 %         16.57 %
Text 6        54.67 %         54.13 %
Text 7        48.55 %         49.38 %
Text 8        49.86 %         54.62 %
Text 9        88.48 %         82.76 %
Text 10       77.86 %         77.32 %
Text 11       61.69 %         59.26 %
Text 12       62.35 %         58.71 %
Text 13       59.59 %         50.86 %
Text 14       60.68 %         62.74 %
Text 15       48.78 %         47.56 %


From the table above, it is evident that the repetition patterns in the target texts for the most part resemble those of the source texts. However, in five out of fifteen texts there is a major difference between source and target texts: of these five texts, two show a greater use of repetitions in the target text, whereas three show a lower use. Looking at each text individually does not reveal the general picture, which is why an average of the use of repetition in both sub-corpora is provided below:

Table 3.21 – Parallel Corpus Average Repetition

EO Average Repetition   IT Average Repetition
59.52 %                 58.61 %

By comparing the average use of repetitions in the English source text and the Italian target text sub-corpora, it is possible to argue that there is no major difference in the use of this semantic category between English Originals and Italian Translations.

A different trend is evident when non-translated and translated Italian texts are compared. Since a one-by-one comparison between texts in the two sub-corpora is not possible, an average of the percentage of use of this semantic category in the two sub-corpora taken as a whole was computed, and the results are the following:

Table 3.22 – Comparable Corpus Average Repetition

IT Average Repetition   IO Average Repetition
58.61 %                 45.58 %


The pattern that emerges concerning the use of repetition in Italian translations and Italian originals is that there is a far lower use of repetition devices in Italian originals. The repetition average of the Italian Target Text sub-corpus resembles that of the English Source Text sub-corpus, which is only slightly higher (59.52 %). However, one might argue that these two figures are not totally comparable because of the different sizes of the two sub-corpora, one (the Italian Target Text sub-corpus) being 53,291 tokens long and the other (the Italian Originals sub-corpus) being 48,892 tokens long.

This is not entirely true, given that the percentage is computed not from the number of tokens in a text but from the number of times the same term is repeated throughout it. It is true that the longer a text, the higher the chances that a term can be repeated, but the percentage reflects the frequency of use of a semantic category relative to the other semantic relationships. Therefore, regardless of the length of a text, the percentage can be either higher or lower depending on whether the author resorted to other semantic categories such as synonyms, antonyms, meronyms, etc. In support of this argument is the finding that the percentage of use of any semantic category is not proportional to the size of the text in which it occurs:

Table 3.23 – Percentage of Use of Repetition Compared to Text Size

Corpus Doc.   EO Tokens   EO Repetition   IT Tokens   IT Repetition   IO Tokens   IO Repetition
Text 1        4,121       52.13 %         4,640       65.56 %         2,878       43.85 %
Text 2        3,308       53.90 %         3,576       55.68 %         2,952       47.41 %
Text 3        3,058       69.60 %         2,886       70.13 %         5,232       50.23 %
Text 4        2,714       70.83 %         3,033       73.85 %         3,034       58.58 %
Text 5        3,447       33.85 %         3,541       16.57 %         2,611       38.37 %
Text 6        4,167       54.67 %         4,032       54.13 %         2,402       42.67 %
Text 7        3,473       48.55 %         3,430       49.38 %         2,134       61.54 %
Text 8        3,557       49.86 %         3,632       54.62 %         4,625       51.66 %
Text 9        3,221       88.48 %         3,273       82.76 %         2,014       43.33 %
Text 10       3,132       77.86 %         3,806       77.32 %         5,275       36.69 %
Text 11       2,967       61.69 %         3,075       59.26 %         3,782       47.66 %
Text 12       3,478       62.35 %         3,533       58.71 %         3,268       42.79 %
Text 13       3,356       59.59 %         3,669       50.86 %         3,706       42.95 %
Text 14       3,311       60.68 %         3,420       47.56 %         2,877       29.38 %
Text 15       3,634       48.78 %         3,745       62.74 %         2,102       46.55 %

As is evident from the table above, within the same sub-corpus, longer texts may have a lower repetition frequency as in text 6, which consists of 4,167 running words but has a repetition frequency of 54.67 %, which is lower than that of Text 4, which consists of only 2,714 running words but has a repetition frequency of 70.83 %. This is due to the fact that, as previously mentioned, the percentage was calculated out of the total number of semantic categories used in each text.

3.3.2 Synonyms

As far as synonyms are concerned, a contrastive analysis between each source and target text points to an overall higher use of synonyms in the translated texts, as shown in the table below:

Table 3.24 – Parallel Corpus Synonyms

Corpus Doc.   ST Synonyms   TT Synonyms
Text 1        4.25 %        1.11 %
Text 2        8.81 %        4.76 %
Text 3        18.71 %       16.88 %
Text 4        0             0.46 %
Text 5        40.10 %       48.00 %
Text 6        16.80 %       18.80 %
Text 7        3.26 %        0.41 %
Text 8        11.44 %       18.54 %
Text 9        7.37 %        11.33 %
Text 10       0.24 %        3.90 %
Text 11       6.47 %        7.12 %
Text 12       2.71 %        4.84 %
Text 13       16.33 %       22.84 %
Text 14       16.10 %       13.67 %
Text 15       17.48 %       17.70 %

In ten out of fifteen target texts, the frequency of use of synonyms is higher than in the source texts. The most evident differences can be found between source and target texts 2, 5, 8, 9, 12, and 13. If the means of the two sub-corpora are compared, it becomes evident that, overall, the Target Text sub-corpus makes a slightly higher use of synonyms, as featured in the table below:

Table 3.25 – Parallel Corpus Synonym Means

EO Synonym Mean   IT Synonym Mean
11.34 %           12.69 %

The difference between the two means is 1.35 %. These mean values are not very different from the mean value of the Italian Originals sub-corpus, which is 10.83 %, implying a slightly lower use of synonyms in Italian originals than in both English originals and Italian translations.


3.3.3 Antonyms, Meronyms and Holonyms

These three semantic categories are presented under the same heading because their frequency of use was very low compared to repetitions, synonyms, hypernyms and hyponyms, which can be regarded as the most common lexical cohesive devices used to create coherence and cohesion in a text.

As far as antonyms are concerned, their use in the texts under analysis was sporadic. They were identified only in a few texts with a very low frequency of occurrence, as shown in the table below:

Table 3.26 – Parallel Corpus Antonyms

Corpus Doc.   EO Antonym   IT Antonym
Text 1        0            0
Text 2        0.68 %       0.73 %
Text 3        0            0
Text 4        0            0
Text 5        0            0
Text 6        0            0
Text 7        1.81 %       2.06 %
Text 8        0            0
Text 9        3.23 %       3.45 %
Text 10       0            0
Text 11       0            0
Text 12       0            0
Text 13       7.35 %       7.33 %
Text 14       2.17 %       2.17 %
Text 15       0.20 %       0.22 %


Looking at the data above, one cannot draw any conclusions as to the use of antonyms in the source and target texts, because in some cases the target texts make a slightly higher use of antonyms and in others they show the same frequency of use as the source texts. Moreover, in nine out of fifteen texts, the frequency of use of this semantic category equals zero. If the mean of these values for each sub-corpus is taken into consideration, a general trend in their use becomes visible:

Table 3.27 – Parallel Corpus Antonym Means

EO Antonym Mean   IT Antonym Mean
1.03 %            1.06 %

The difference between the two mean values is almost equal to zero. These findings are not so different from the ones found in the Italian Originals sub-corpus as shown in the table below:

Table 3.28 – Italian Originals Antonyms

Corpus Doc.   IO Antonym
Text 1        0.41 %
Text 2        0
Text 3        0
Text 4        0
Text 5        0
Text 6        4.19 %
Text 7        0
Text 8        2.32 %
Text 9        0
Text 10       0
Text 11       0
Text 12       0
Text 13       0
Text 14       0.57 %
Text 15       0

Only in four out of fifteen texts were antonyms present; since a one-by-one comparison between the source and target texts and the Italian originals is not possible, the sub-corpus mean was computed. Its value is 0.50 %, about half the means of both the source and target text sub-corpora.

As regards meronyms, there is, overall, a higher use of the latter compared to antonyms, but their frequency of occurrence is still low when compared to repetitions, synonyms, hypernyms or hyponyms. Below is a table featuring the frequency of occurrence of this semantic category in the parallel corpus:

Table 3.29 – Parallel Corpus Meronyms

Corpus Doc.   EO Meronyms   IT Meronyms
Text 1        10.64 %       10.00 %
Text 2        0             0
Text 3        0             0
Text 4        8.33 %        9.63 %
Text 5        0             0
Text 6        0             0
Text 7        1.81 %        1.23 %
Text 8        7.62 %        7.28 %
Text 9        0             0
Text 10       4.05 %        4.39 %
Text 11       0.99 %        1.14 %
Text 12       9.64 %        10.32 %
Text 13       0             0
Text 14       5.26 %        4.97 %
Text 15       3.86 %        4.20 %

The values in the columns above are generally very close to each other. Though there is a slightly higher use of meronyms in the translated texts (in five out of the nine texts where meronyms appear), it is not possible to argue that there is a major difference between the two sub-corpora, as the mean values confirm:

Table 3.30 – Parallel Corpus Meronym Means

EO Meronym Mean   IT Meronym Mean
3.48 %            3.54 %

The means of the two sub-corpora are almost the same, which implies that source and target texts make use of roughly the same proportion of meronyms. Different findings emerge when these mean values are compared with the mean value of the Italian Originals sub-corpus, where the meronym mean amounts to 8.89 %. This higher mean is due to the fact that a greater number of texts in the Italian Originals sub-corpus make use of this semantic category, as shown in the table below:

Table 3.31 – Italian Originals Meronyms

Corpus Doc.   IO Meronym
Text 1        0
Text 2        6.67 %
Text 3        5.94 %
Text 4        0
Text 5        5.03 %
Text 6        17.99 %
Text 7        5.33 %
Text 8        18.87 %
Text 9        10.84 %
Text 10       1.62 %
Text 11       0
Text 12       7.96 %
Text 13       34.83 %
Text 14       6.22 %
Text 15       12.07 %

As is evident from the table above, not only do most of the texts make use of meronyms but the frequency of occurrence of the latter is also higher than that of the source and target texts.

Last but not least, there is the semantic category of holonyms. Like antonyms, this semantic category does not occur in over half of the texts that are part of the parallel corpus, as shown in the table below:

Table 3.32 – Parallel Corpus Holonyms

Corpus Doc.   EO Holonyms   IT Holonyms
Text 1        3.19 %        3.33 %
Text 2        0             0
Text 3        0             0
Text 4        10.42 %       11.47 %
Text 5        1.04 %        1.14 %
Text 6        0             0
Text 7        7.25 %        7.42 %
Text 8        0             0
Text 9        0             0
Text 10       4.52 %        4.15 %
Text 11       0             0
Text 12       0             0
Text 13       0             0
Text 14       8.05 %        9.31 %
Text 15       0.20 %        0.22 %

The general trend that is evident in the table above is that the target texts have a slightly higher but overall similar frequency of occurrence of holonyms. This is confirmed by the mean values of the two sub-corpora, which are as follows:

Table 3.33 – Parallel Corpus Holonym Means

EO Holonym Mean   IT Holonym Mean
2.30 %            2.47 %

Though the TT holonym mean is slightly higher, it is not possible to claim a difference in the use of this semantic category between the two sub-corpora. Such a difference can instead be argued for the Italian Originals sub-corpus, in which the presence of this semantic category is very limited, as demonstrated in the table below:

Table 3.34 – Italian Originals Holonyms

Corpus Doc.   IO Holonym
Text 1        0
Text 2        0
Text 3        5.94 %
Text 4        0.32 %
Text 5        0
Text 6        0.84 %
Text 7        0.59 %
Text 8        0
Text 9        0
Text 10       0
Text 11       0
Text 12       0
Text 13       0
Text 14       0
Text 15       0.58 %

Not only is the frequency of occurrence in most of the texts equal to zero, but it is also very low in the few texts in which this semantic category occurs. To see if there are any differences in the frequency of use between the source and target texts and the Italian original texts, it is necessary to compare the mean values of the three sub-corpora. The mean of the Italian Originals sub-corpus amounts to 0.55 %, lower than the means of the Source and Target Text sub-corpora, which are 2.30 % and 2.47 %, respectively.

3.3.4 Hypernyms

As far as the parallel corpus is concerned, a comparison of the frequencies of use of hypernyms in the source and target texts reveals some interesting details. Below is a table featuring the frequency values in the parallel texts:

Table 3.35 – Parallel Corpus Hypernyms

Corpus Doc.   EO Hypernym   IT Hypernym
Text 1        11.70 %       11.11 %
Text 2        20.68 %       21.61 %
Text 3        8.19 %        11.69 %
Text 4        5.00 %        2.75 %
Text 5        14.60 %       23.43 %
Text 6        26.40 %       24.79 %
Text 7        5.07 %        6.17 %
Text 8        20.82 %       10.93 %
Text 9        0             0.49 %
Text 10       3.81 %        2.92 %
Text 11       3.98 %        6.55 %
Text 12       7.53 %        6.45 %
Text 13       13.47 %       16.38 %
Text 14       0.93 %        1.24 %
Text 15       4.67 %        5.10 %

The table above shows that in nine out of fifteen texts there is an increase in the use of hypernyms in the Italian translations (see Texts 2, 3, 5, 7, 9, 11, 13, 14 and 15). In one of these, namely Text 9, the frequency of use of this semantic category is equal to zero in the source text, yet the Italian version does make use of hypernyms. In the remaining six cases, the Italian translation makes less use of hypernyms; in their place, the Italian texts resort to other semantic categories such as synonyms, hyponyms and meronyms, but mostly to repetitions and omissions. Overall, however, the hypernym frequency means of the two sub-corpora are not very different from each other, as shown in the table below:

Table 3.36 – Parallel Corpus Hypernym Means

EO Hypernym Mean   IT Hypernym Mean
9.79 %             10.11 %

The data in the table above show that the average use of hypernyms in the Source Text sub-corpus is almost identical to that in the Target Text sub-corpus. An interesting finding emerges when these means are compared to the hypernym frequency mean of the Italian Originals sub-corpus: the individual values in each text are overall higher than the ones in the parallel corpus, as shown in the table below:

Table 3.37 – Italian Originals Hypernyms

Corpus Doc.   IO Hypernym
Text 1        33.61 %
Text 2        14.82 %
Text 3        9.36 %
Text 4        15.53 %
Text 5        17.61 %
Text 6        15.90 %
Text 7        4.14 %
Text 8        12.91 %
Text 9        1.66 %
Text 10       15.26 %
Text 11       15.58 %
Text 12       25.37 %
Text 13       7.51 %
Text 14       12.99 %
Text 15       23.56 %

This higher use of hypernyms in each of the Italian original texts analyzed is reflected in the overall higher mean value of the sub-corpus, which is equal to 15.05 %. This implies that the Italian Originals sub-corpus makes a greater use of hypernyms compared to both the English texts and their Italian translations.

3.3.5 Hyponyms


Besides repetitions, synonyms and hypernyms, the fourth most widely used semantic category in the corpus of texts selected for this study was hyponyms, namely the use of words with a more specific meaning. Comparing the frequency of occurrence of hyponyms in each pair of texts making up the parallel corpus yields the following results:

Table 3.38 – Parallel Corpus Hyponyms

Corpus Doc.   EO Hyponym   IT Hyponym
Text 1        18.70 %      8.89 %
Text 2        15.93 %      17.22 %
Text 3        3.50 %       1.30 %
Text 4        5.42 %       1.84 %
Text 5        10.41 %      10.86 %
Text 6        2.13 %       2.28 %
Text 7        32.25 %      33.33 %
Text 8        10.26 %      8.61 %
Text 9        0.92 %       1.97 %
Text 10       9.52 %       7.32 %
Text 11       26.87 %      25.93 %
Text 12       17.77 %      19.68 %
Text 13       3.26 %       2.59 %
Text 14       6.81 %       5.90 %
Text 15       24.81 %      25.00 %

None of the texts shows exactly the same percentage of use of this semantic category in the source and target language. Seven of the fifteen target texts make a greater use of hyponyms, whereas the remaining eight make a lower use of them. No clear pattern emerges from this comparison, and the higher use of hyponyms present in some target texts is not reflected in the mean value of the Target Text sub-corpus, as shown in the table below:


Table 3.39 – Parallel Corpus Hyponym Means

EO Hyponym Mean   IT Hyponym Mean
12.57 %           11.51 %

The data in the table show that the Source Text sub-corpus overall makes a greater use of hyponyms than the Target Text sub-corpus. This contrasts with the findings for the Italian Originals sub-corpus, in which most of the texts (13) have a frequency of use of hyponyms higher than ten percent, as opposed to the Italian Translations sub-corpus, in which only six out of fifteen texts have a frequency of use higher than ten percent, as shown in the table below:

Table 3.40 – Italian Originals Hyponyms

Corpus Doc.   IO Hyponym
Text 1        17.21 %
Text 2        15.55 %
Text 3        26.25 %
Text 4        13.92 %
Text 5        33.33 %
Text 6        10.88 %
Text 7        16.57 %
Text 8        3.31 %
Text 9        20.84 %
Text 10       33.12 %
Text 11       16.51 %
Text 12       11.94 %
Text 13       12.01 %
Text 14       41.24 %
Text 15       6.32 %


The texts with a frequency of use higher than ten percent are Texts 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, and 14. This finding is also supported by the mean value of the whole sub-corpus, which amounts to 18.6 %, as opposed to just 12.57 % and 11.51 % for the Source and Target Text sub-corpora, respectively.

3.4 SPSS Statistical Analysis

3.4.1 Textual Features

3.4.1.1 STTRs

A one-way between-subjects ANOVA was conducted to compare the effect of language (IV) on standardized type-token ratios (DV) under three conditions: English Originals, Italian Translations, and Italian Originals. There was a significant effect of language on STTR at the p < 0.05 level for the three conditions [F (2, 42) = 6.075; p = 0.005]. Post hoc comparisons using the Tukey HSD test indicated that the mean score for the English Originals condition (M = 45.23; SD = 2.69) was significantly different from the mean score for the Italian Translations condition (M = 48.76; SD = 3.25) and from the mean score for the Italian Originals condition (M = 48.26; SD = 3.03). However, there was no significant difference between the mean score for the Italian Translations condition (M = 48.76; SD = 3.25) and the mean score for the Italian Originals condition (M = 48.26; SD = 3.03).
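The same test can be reproduced outside SPSS; the sketch below uses scipy and statsmodels (an assumption for illustration, since the study used SPSS), with a few of the STTR values from Tables 3.1 and 3.3 as placeholder data:

    # One-way between-subjects ANOVA with a Tukey HSD follow-up, mirroring
    # the SPSS procedure reported in this section.
    from scipy.stats import f_oneway
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    # Placeholder data: the study used 15 per-text STTR values per condition.
    sttr_eo = [43.03, 46.47, 49.87, 45.50, 47.00]
    sttr_it = [48.17, 51.80, 52.15, 49.07, 54.90]
    sttr_io = [50.20, 51.65, 46.12, 44.40, 51.05]

    f_value, p_value = f_oneway(sttr_eo, sttr_it, sttr_io)
    print(f"F = {f_value:.3f}, p = {p_value:.3f}")

    # Post-hoc comparisons only follow a significant F value, as in the study.
    if p_value < 0.05:
        scores = sttr_eo + sttr_it + sttr_io
        groups = ['EO'] * 5 + ['IT'] * 5 + ['IO'] * 5
        print(pairwise_tukeyhsd(scores, groups))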


3.4.1.2 Sentence Number

A one-way between-subjects ANOVA was conducted to compare the effect of language (IV) on sentence number (DV) under three conditions: English Originals, Italian Translations, and Italian Originals. There was a significant effect of language on sentence number at the p < 0.05 level for the three conditions [F (2, 42) = 5.166; p = 0.010]. Post hoc comparisons using the Tukey HSD test indicated that the mean score for the English Originals condition (M = 151.40; SD = 24.41) was significantly different from the mean score for the Italian Originals condition (M = 111.73; SD = 49.12). However, there was no significant difference either between the mean score for the English Originals condition (M = 151.40; SD = 24.41) and the mean score for the Italian Translations condition (M = 137.87; SD = 23.10) or between the mean score for the Italian Translations condition (M = 137.87; SD = 23.10) and the mean score for the Italian Originals condition (M = 111.73; SD = 49.12).

3.4.1.3 Lexical Density

A one-way between-subjects ANOVA was conducted to compare the effect of language (IV) on lexical density (DV) under three conditions: English Originals, Italian Translations, and Italian Originals. There was a significant effect of language on lexical density at the p < 0.05 level for the three conditions [F (2, 42) = 7.696; p = 0.001]. Post hoc comparisons using the Tukey HSD test indicated that the mean score for the English Originals condition (M = 25.59; SD = 2.42) was significantly different from the mean score for the Italian Translations condition (M = 29.49; SD = 3.17) and from the mean score for the Italian Originals condition (M = 29.76; SD = 3.99). However, there was no significant difference between the mean score for the Italian Translations condition (M = 29.49; SD = 3.17) and the mean score for the Italian Originals condition (M = 29.76; SD = 3.99).

3.4.1.4 Average Sentence Length

A one-way between-subjects ANOVA was conducted to compare the effect of language (IV) on average sentence length (DV) in three conditions: English Originals, Italian Translations, and Italian Originals. There was a significant effect of language on ASL at the p < 0.05 level for the three conditions [F (2, 42) = 13.755; p < 0.001]. Post hoc comparisons using the Tukey HSD test indicated that the mean score for the English Originals condition (M = 22.53; SD = 2.78) was significantly different from the mean score for the Italian Originals condition (M = 30.12; SD = 4.95). However, there was no significant difference between the mean score for the English Originals condition (M = 22.53; SD = 2.78) and the mean score for the Italian Translations condition (M = 25.99; SD = 3.87). The Tukey HSD test also indicated that the mean score for the Italian Translations condition (M = 25.99; SD = 3.87) was significantly different from the mean score for the Italian Originals condition (M = 30.12; SD = 4.95).

3.4.2 SPSS Statistical Analysis: Semantic Features

A one-way between-subjects ANOVA was conducted to compare the effect of language (independent variable) on the amount of use of semantic categories (dependent variables), also known as lexical cohesive devices, in three (3) conditions: English Originals, Italian Translations, and Italian Originals.

Table 3.41 – Parallel & Comparable Corpus Statistical Data

Semantic Category   English Source Text Sub-Corpus   Italian Target Text Sub-Corpus   Italian Originals Sub-Corpus
Repetition          M = 59.52; SD = 13.41            M = 58.61; SD = 15.68            M = 45.58; SD = 8.09
Synonym             M = 11.34; SD = 10.30            M = 12.69; SD = 12.35            M = 10.83; SD = 5.91
Antonym             M = 1.03; SD = 2.02              M = 1.06; SD = 2.04              M = 0.50; SD = 1.19
Meronym             M = 3.48; SD = 3.92              M = 3.54; SD = 4.04              M = 8.89; SD = 9.29
Holonym             M = 2.31; SD = 3.56              M = 2.47; SD = 3.89              M = 0.55; SD = 1.52
Hypernym            M = 9.79; SD = 7.93              M = 10.11; SD = 8.07             M = 15.06; SD = 8.14
Hyponym             M = 12.57; SD = 9.76             M = 11.51; SD = 10.29            M = 18.60; SD = 10.62

3.4.2.1 Repetition

There was a significant effect of language (IV) on repetition (DV) at the p < 0.05 level for the three conditions [F (2, 42) = 5.576; p = 0.007].

Post hoc comparisons using the Tukey HSD test indicated that the mean score for the English Originals condition (M = 59.52; SD = 13.41) was significantly different from the Italian Originals condition (M = 45.58; SD = 8.09). The Tukey HSD test also revealed that the mean score for the Italian Translations condition (M = 58.61; SD = 15.68) was significantly different from the Italian Originals condition (M = 45.58; SD = 8.09). There was no significant difference between the mean score for the English Originals condition and the mean score for the Italian Translations condition (p > 0.05).

3.4.2.2 Synonym


There was not a significant effect of language (IV) on synonym (DV) for the three conditions [F (2, 42) = 0.142; p = 0.87].

3.4.2.3 Antonym

There was not a significant effect of language (IV) on antonym (DV) for the three conditions [F (2, 42) = 0.47; p = 0.630].

3.4.2.4 Meronym

There was a significant effect of language (IV) on meronym (DV) for the three conditions [F (2, 42) = 3.679; p = 0.034]. Post hoc comparisons using the Tukey HSD test revealed that the mean score for the English Originals condition (M = 3.48; SD = 3.92) was significantly different from the Italian Originals condition (M = 8.89; SD = 9.29). However, there was no significant difference between the mean score of the Italian Translations condition and the mean score of the Italian Originals condition, nor between the mean score of the English Originals condition and the mean score of the Italian Translations condition (p > 0.05).

3.4.2.5 Holonym

There was not a significant effect of language (IV) on holonym (DV) for the three conditions [F (2, 42) = 1.7; p = 0.196].

3.4.2.6 Hypernym

There was not a significant effect of language (IV) on hypernym (DV) for the three conditions [F (2, 42) = 2.02; p = 0.149].


3.4.2.7 Hyponym

There was not a significant effect of language (IV) on hyponym (DV) for the three conditions [F (2, 42) = 2.09; p = 0.139].

3.4.2.8 Repetition vs. Rest of Semantic Categories

Since the effects for the individual semantic categories other than repetition did not turn out to be statistically significant, with the exception of meronym (see 3.4.2.4), it was decided to group these categories together and document whether there is a statistically significant difference in mean values among the three conditions when synonyms, antonyms, meronyms, holonyms, hypernyms and hyponyms are considered as one semantic category instead of separate ones.

A one-way between-subjects ANOVA was conducted to compare the effect of language (IV) on the use of the other semantic categories (DV) as a whole in three conditions: English Originals, Italian Translations and Italian Originals. The table below shows the mean values and standard deviations for each condition:

Table 3.42 – Parallel & Comparable Corpus Statistical Means

Semantic Category           English Source Text Sub-Corpus   Italian Target Text Sub-Corpus   Italian Originals Sub-Corpus
Repetition                  M = 59.52; SD = 13.41            M = 58.61; SD = 15.68            M = 45.58; SD = 8.09
Other Semantic Categories   M = 40.52; SD = 13.43            M = 41.39; SD = 15.68            M = 54.42; SD = 8.09

3.4.2.9 Other semantic categories as a whole


There was a significant effect of language (IV) on the amount of use of the other semantic categories as a whole for the three conditions [F (2, 42) = 5.55; p = 0.007]. Post hoc comparisons using the Tukey HSD test indicated that the mean score for the English Originals condition (M = 40.52; SD = 13.43) was significantly different from the mean score for the Italian Originals condition (M = 54.42; SD = 8.09). It also revealed that there was a statistically significant difference between the mean score of the Italian Translations (M = 41.39; SD = 15.68) and the mean score of the Italian Originals (M = 54.42; SD = 8.09). However, there was no statistically significant difference between the mean score for the English Originals condition and the mean score for the Italian Translations condition.
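For readers wishing to replicate this procedure outside SPSS, the sketch below shows how the same kind of one-way between-subjects ANOVA followed by a Tukey HSD post hoc test could be run in Python with scipy and statsmodels. The percentage scores used here are invented placeholders for illustration only, not the corpus data reported above.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical per-text percentage scores for the three conditions.
english_originals = [59.1, 62.3, 55.0, 61.8, 58.7]
italian_translations = [58.0, 60.2, 54.9, 59.5, 57.1]
italian_originals = [44.2, 47.9, 45.1, 46.3, 43.8]

# One-way between-subjects ANOVA: does condition affect the score?
f_stat, p_value = stats.f_oneway(english_originals, italian_translations, italian_originals)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")

# Tukey HSD post hoc test to see which pairs of conditions differ.
scores = np.concatenate([english_originals, italian_translations, italian_originals])
groups = ["EN_orig"] * 5 + ["IT_trans"] * 5 + ["IT_orig"] * 5
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))
```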

CHAPTER IV

DISCUSSION

4.1 Introduction

The findings from both the textual and semantic analysis tend to confirm the hypotheses stated in the introduction and in chapter one of this work. The findings presented in the results chapter suggest that overall there is a significant difference in the use of lexical cohesive devices in English and Italian. In particular, the findings confirm hypothesis 1, which claims that Italian translations tend to adopt the lexical cohesive devices of their attendant English source texts, and hypothesis 2, which claims that articles originally written in Italian and published in Le Scienze differ in the use of lexical cohesive devices from Italian translations published in the same magazine. Both the textual and semantic features under analysis point to an overall stylistic, syntactic and lexical difference between the two languages in question. The fact that, at the sentential level, there is a statistical difference in average sentence length between Italian Translations and Italian Originals supports the semantic analysis findings, in that longer sentences in the translations would have an impact on the use of lexical devices. Indeed, by merging two or more sentences into a bigger, more syntactically complex one, the use of repetition, which is not employed as often in Italian as it is in English, could be reduced by resorting to other lexical or non-lexical cohesive devices such as reference, substitution, ellipsis, etc.

The following results will be discussed in the light of the findings by Scarpa and of the universals of translation by Mona Baker. In the latter respect, Baker defines universals of translation as linguistic features that occur in translated texts rather than in original texts, regardless of the source or target language involved in the translation process (1993: 243). She identifies four major universals of translation which are said to be valid across languages: explicitation, simplification, normalization and leveling out.

As far as simplification is concerned, three types exist: lexical, syntactic and stylistic. Blum-Kulka and Levenston (1983) define lexical simplification as “the process and/or result of making do with less words (1983: 119).” Lexical simplification is achieved by means of:

1) Use of superordinates, when no equivalent hyponyms are available in the target language (this translation strategy was investigated by Baker [1992]);

2) Concept approximation, which is the case with culture-bound items;

3) Use of circumlocutions instead of equivalent high-level words (this translation strategy was investigated by Vanderauwera [1985: 102-3], who notices a use of colloquial/modern synonyms when translating old, formal and high-level source language words);

4) Paraphrasing, to make up for cultural gaps existing between any two cultures.

In terms of simplification features, Laviosa (1998) identifies four core patterns of lexical use in the English Comparable Corpus (1998: 565):

a) In translated texts, lexical density is generally lower because the percentage of content words is relatively lower than that of grammatical words;

b) The ratio of high frequency words versus lower frequency words is relatively higher in translated texts;

c) The most frequent words are repeated more often;

d) Translations contain fewer lemmas.

At the syntactic level, Vanderauwera (1985) finds several cases in which complex syntactic structures are simplified by changing non-finite clauses into finite ones. She also provides evidence for stylistic simplification, which is achieved by breaking down long sentences and by reducing or omitting repetitions or redundant information.

As for explicitation, Vinay and Darbelnet (1958) carried out a comparative study of French and English and defined explicitation as “the process of introducing information into the target language which is present only implicitly in the source language, but which can be derived from the context or the situation (23).” Likewise, Baker (1996) describes explicitation as the tendency “to spell things out rather than leave them implicit (180).” An example of explicitation is provided by Blum-Kulka (1986), who speaks of cohesive explicitness, whereby she refers to shifts in the type of cohesive markers that are used in a text. These shifts are achieved through the replacement of substitution or ellipsis with repetitions or the use of synonyms, which increases the level of cohesion in the target text. Factors that might explain this phenomenon are stylistic preferences, systematic differences, or culture-bound translation norms.

Normalization involves the unconscious or conscious use of textual features that make a target text comply with the typical textual characteristics of the target language/culture. Baker defines normalization as the “tendency to exaggerate features of the target language to conform to its typical patterns (1996: 183).” An example of normalization is when creative lexis is normalized in translations or when typical collocations are preferred over unusual ones.

Last but not least, leveling out, in Baker’s words, or convergence, as Laviosa (2002) calls it, refers to “the tendency of translated text to gravitate towards the centre of a continuum (1996: 184).” This tendency is also known as convergence because it reflects “the relatively higher level of homogeneity of translated texts with regard to their own scores on given measures of universal features (Laviosa 2002: 72).”

These four universals of translation were empirically studied by Laviosa-Braithwaite (1996) using corpus linguistics tools. In this respect, Laviosa argues that corpus-based techniques have great potential for meeting the need for a rigorous descriptive methodology (156). However, when the empirical study of translation phenomena is carried out by means of corpus-linguistic tools alone, the resulting statistical data do not tell the translation scholar whether or not his or her findings are statistically significant. The present study, by contrast, sets out to put forward a new methodology which combines the quantitative data provided by such corpus tools with statistical analysis tools such as SPSS, which can objectively tell us whether the findings identified in the analysis lend statistically significant support to the hypotheses put forward at an early stage of the study.

Lastly, the choice of using parallel and comparable corpora is, like this whole study, grounded in theory. In this respect, it is Baker who states that by shifting the focus of translation studies research from comparing source and target texts or languages to comparing text production per se with translation, translation scholars are able to “explore how text produced in relative freedom from an individual script in another language differs from text produced under the normal conditions that pertain in translation (1995).” This is why in the present study the results gleaned from the textual and lexical analysis of the English into Italian parallel corpus were compared with those from the textual and lexical analysis of the Italian comparable corpus. By so doing, it was possible to identify and compare text production patterns pertaining to Italian translationese and text production patterns pertaining to Italian in a non-translation context.

4.2 Textual Features

Average Sentence length


The SPSS analysis of the textual features shows that Italian translations reproduce the syntactic features of the English source texts as far as sentence length is concerned. Indeed, post hoc comparisons using the Tukey test indicate that the p value between the mean score for the English source text sub-corpus and the Italian target text sub-corpus is greater than 0.05. It follows that there is no statistically significant difference in terms of average sentence length between the two sub-corpora. This finding does not support the syntactic Simplification hypothesis, one of the universals of translation identified by Mona Baker, according to which translations may have shorter sentences; it also contradicts the Normalization hypothesis, according to which translations should comply with the textual characteristics of the target language. However, this finding confirms what Scarpa argues in her article “Corpus-based Quality-Assessment of Specialist Translation: A Study Using Parallel and Comparable Corpora in English and Italian,” in which she finds that sentences were actually longer in translations (2006: 169). However, she does not provide any statistical evidence for that finding. As demonstrated herein, though the sentences tend to be longer in the translated texts, statistically speaking, the difference in mean score between the English source text sub-corpus and the Italian target text sub-corpus is not significant. On the other hand, the SPSS analysis shows that there is a statistically significant difference in average sentence length between the Italian Translations sub-corpus and the Italian Originals sub-corpus. The p value between the two means is 0.018, which points to a statistically significant difference. The mean value of the translations is closer to the mean value of the English source texts than it is to the mean value of the Italian originals. This implies that, syntactically speaking, translations mirror the source language, since English prefers coordination (parataxis), unlike Italian, which prefers subordination (hypotaxis).
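As a rough illustration of how the two sentence-level metrics discussed in this and the following subsection can be computed outside WordSmith Tools, the Python sketch below counts sentences and derives average sentence length from raw text. The sentence splitter and word pattern are naive assumptions on my part; WordSmith’s own tokenization rules differ in detail, so the figures will not match its output exactly.

```python
import re

def sentence_stats(text):
    # Naive splitter: break on sentence-final punctuation followed by whitespace.
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    # Count word tokens in each sentence (accented letters included for Italian).
    lengths = [len(re.findall(r"[A-Za-zÀ-ÿ0-9']+", s)) for s in sentences]
    average = sum(lengths) / len(sentences) if sentences else 0.0
    return len(sentences), average

sample = "Plastic transistors are flexible. They may one day replace silicon chips."
count, avg = sentence_stats(sample)
print(f"{count} sentences; average sentence length: {avg:.1f} words")
```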

Sentence number

The SPSS analysis of this textual feature shows that Italian translations as a whole have a number of sentences similar to that of their attendant English source texts. Indeed, the p value between the mean score for the English Originals sub-corpus and the Italian Translations sub-corpus is greater than 0.05, which points to a non-significant difference in mean score. This finding contradicts what Scarpa’s translation quality-assessment study reports. Indeed, she states that the average number of sentences in the Italian translated texts was found to have been reduced “somewhat more drastically” (2006: 170), but no statistical evidence was provided to prove her statement. This finding also, statistically speaking, rejects the Simplification hypothesis mentioned above, in that the latter predicts a lower number of sentences in translation. Though most of the target texts generally do have a lower number of sentences than their attendant source texts, the difference is not large enough to be statistically significant.

Another interesting finding relates to the difference in sentence number between the Italian Translations and the Italian Originals. My second hypothesis claims that there should be a difference between these two sub-corpora. Though the number of sentences is overall lower in the Italian Originals sub-corpus than in the English Originals and Italian Translations sub-corpora, statistically speaking, the SPSS analysis indicates that there is no significant difference between Italian translated texts and texts originally written in Italian. By contrast, there is a statistically significant difference between English and Italian originals (p = 0.008). This result would seem to be at odds with the finding about average sentence length. Indeed, if Italian Originals have a significantly longer average sentence length than the Italian Translations, this would imply a higher sentence number for the Translations. This would be true if twenty-three sentences had not been omitted in some of the translations. Indeed, were the twenty-three sentences to be added to the Italian Translations sub-corpus, then there would be a statistically significant difference in sentence numbers between Translations and Italian Originals, in that the sentence number would be almost identical to that of the English Originals sub-corpus, and this would hence support my second hypothesis. The omission of these sentences in the translation process can be classified as an example of syntactic and stylistic simplification, as put forward by Vanderauwera (1985), who argues that translations are usually syntactically simplified through the omission of long circumlocutions or irrelevant details and redundant information. Indeed, the sentences which were omitted in the translations most of the time provided further information, either between brackets or as part of the main text. The following are some examples:

Sentence 67 from “Next Stretch for Plastic Electronics” reads as follows: “(the channel is where an electric current flows through a transistor – or not – and where the switching action takes place);”

Sentence 142 from “The Color of Plants on Other Worlds” is in brackets and reads as follows: “(this is one of the models enlisted to calculate how much light reaches the solar panels of the Mars rovers);” or

Sentence 113 from the same text, which reads: “The photons work together like stages of a rocket to provide the necessary energy to an electron as it performs the chemical reactions.”

In all the above-mentioned examples, the information omitted in the translation process is additional and gives details which can be left out without compromising the overall understanding and/or coherence of the text.

STTR

The SPSS analysis of this textual feature shows that Italian translations differ from their English source texts in terms of vocabulary variation. This means that in translations there is a greater variation in the use of vocabulary, because Italian prefers lexical variety. This finding supports the Normalization hypothesis, according to which target texts tend to adapt the style and sentence structure of the source texts to the stylistic and syntactic conventions of the target language. This finding also supports Scarpa’s study. She notices that higher quality translations are associated with a higher type-token ratio (170). However, she does not provide any statistical evidence in this respect. The finding of my study statistically proves that there is a significant difference (p < 0.05) between the mean score for the English Originals and the mean score for the Italian Translations. The SPSS analysis also shows that there is no statistically significant difference between Italian Translations and Italian Originals (p > 0.05). Indeed, the mean values for the comparable corpus are almost identical, which means that Italian Translations are closer, stylistically speaking, to the target language, namely Italian.
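For readers unfamiliar with how a standardized type-token ratio is derived, the sketch below mirrors the usual WordSmith approach: the type-token ratio is computed afresh for every consecutive run of N tokens (1,000 by default) and the results are averaged. The handling of the final partial chunk is an assumption here (it is simply ignored), so values may diverge slightly from WordSmith’s.

```python
def sttr(tokens, chunk_size=1000):
    """Standardized type-token ratio: mean TTR over consecutive chunks of tokens."""
    ratios = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        window = tokens[start:start + chunk_size]
        ratios.append(len(set(window)) / chunk_size)  # types / tokens in this chunk
    # Incomplete final chunks are ignored in this sketch.
    return 100 * sum(ratios) / len(ratios) if ratios else None

# Toy usage: a 3,000-token text yields three chunks whose TTRs are averaged.
# import re
# tokens = [t.lower() for t in re.findall(r"\w+", open("article.txt").read())]
# print(sttr(tokens))
```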

Lexical Density

The SPSS analysis shows that there is a statistically significant difference between the English originals and their Italian translations in terms of lexical density, which, as a reminder, is the ratio between content words and the total number of words in a text. Indeed, the p value between the mean score for the English Originals sub-corpus and the Italian Translations sub-corpus was less than 0.05 (p = 0.006). It follows that translations were found to be more lexically dense (in terms of content words) than their source texts. This finding contradicts Scarpa’s study, in which translations were found to have a lower lexical density (2006: 170). However, as noted above, in calculating lexical density, she only focused on the first one hundred high-frequency words, whereas in the present study lexical density was calculated out of the total number of content words after taking out grammar words through a stop list.

As far as universals of translation are concerned, this finding contradicts the Simplification hypothesis, according to which translations should have a lower information load as a result of a higher use of lexico-grammatical relations and repetition. The SPSS analysis also shows that there is no statistically significant difference in the amount of information load between Italian Translations and Italian Originals. Indeed, the mean values for both sub-corpora are almost identical, 29.49 for the Italian Translations sub-corpus and 29.76 for the Italian Originals sub-corpus, both of which significantly differ from the 25.59 mean score of the English Originals sub-corpus. This implies that, generally speaking, the Italian language has a higher lexical density, which means that there is a greater use of different content words.
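The calculation used in this study (content words divided by total running words, with grammatical words filtered out through a stop list) can be sketched as follows. The miniature stop list below is an illustrative stand-in: the actual analysis relied on full English and Italian stop lists loaded into WordSmith Tools.

```python
# Illustrative stand-in for the full stop lists used in the study.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "are",
    "that", "this", "it", "with", "for", "as", "by", "be", "was", "were",
}

def lexical_density(tokens):
    """Percentage of content words (non-stop-list tokens) over all tokens."""
    content = [t for t in tokens if t.lower() not in STOP_WORDS]
    return 100 * len(content) / len(tokens) if tokens else 0.0

tokens = "The channel is where an electric current flows through a transistor".split()
print(f"Lexical density: {lexical_density(tokens):.2f} %")
```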

4.3 Semantic Features

4.3.1 Repetition

The SPSS analysis shows that there is no statistically significant difference in mean score between the English Originals and the Italian Translations. In other words, translators tend to reproduce this lexical cohesive device in the target text instead of replacing it with other lexical or non-lexical cohesive devices. Indeed, stylistic choices in Italian recommend the avoidance of repetition in favor of a greater use of lexical variety (Musacchio, 2007: 179). This stylistic feature of the Italian language is statistically proved by the comparable corpus consisting of Italian Translations and Italian Originals, in which the difference in mean scores between these two sub-corpora was statistically significant (p = 0.021). On the whole, the Italian Originals corpus had a lower use of repetitions than the Italian Translations, and this finding is similar to Scarpa’s (171). In particular, this finding contradicts the Explicitation hypothesis, according to which translations tend to have a higher number of repetitions (155), and the Normalization hypothesis, according to which, in the transfer from the ST to the TT, the ST style and sentence structure are adapted to the textual features of the target language (Scarpa, 2006: 156). Though the repetition mean score for the translations is slightly lower than the mean score for the English Originals, the difference is not statistically significant. A comparison of each single pair of source and target texts shows that there is a major difference in the use of repetition in only three cases:

1) “The Iceman Reconsidered;”

2) “Shaping the Future;” and

3) “Sowing a Gene Revolution.”

The rest of the target texts overall tend to reproduce this lexical cohesive device. In the first pair of source and target texts, namely “The Iceman Reconsidered,” the percentage of use of repetition in the source text is 33.85 %, as opposed to just 16.57 % in the target text. Here it is possible to state that normalization was performed at the stylistic level by replacing repetition with other lexical cohesive devices. The following are some examples:

The term evidence is translated only once as reperti, while in the other sentences it is rendered by means of several hypernyms such as elementi, dati, testimonianze, segni. The verb find is translated as rinvenire three times; in the rest of the text the same verb is translated by means of 7 synonyms (ritrovare, trovare, scoprire, scoperta) and 2 hypernyms (individuare, ricavare), and in two cases it was omitted.


In the second pair of source and target texts, namely “Shaping the Future,” the percentage of use of repetition devices in the English source text is 88.48 %, as opposed to 82.76 % in the Italian translation. Lexically speaking, repetition was replaced with synonyms or hypernyms. In a few cases, there were omissions or uses of pronouns. For example, the English term cost, which occurs fourteen times in the source text, was translated ten times as costi and three times as spese (synonym), and in one case it was omitted.

Lastly, in “Sowing a Gene Revolution,” the percentage of use of repetitions in the source text amounts to 59.59 %, compared to 50.86 % in its Italian translation. Some of the lexical cohesive devices which were employed to limit redundancy in the repetition of the same term were hypernyms and synonyms. In other cases, repetition was avoided through omissions or the use of demonstratives or pronouns. For example, the English term farmers, which occurs thirty-three times in the source text, is translated fourteen times as contadino and eighteen times as agricoltore or coltivatore (synonym), and in just one case it is omitted.

As far as the comparable corpus is concerned, the SPSS analysis indicates that there is a statistically significant difference between the mean score for the Italian Translations sub-corpus and the Italian Originals sub-corpus. Indeed, the p value between these two groups is less than 0.05, more precisely 0.021. By looking at the mean values for both sub-corpora, which are 58.61 % for the Italian Translations and 45.58 % for the Italian Originals, it is possible to state that texts which are originally written in Italian tend to use a lower amount of repetitions as opposed to Italian translationese. Thus the presence of lexical redundancy in Italian translations published in the same magazine as the Italian originals fails to meet the target readership’s expectations.

4.3.2 Synonyms

The SPSS analysis indicates that, statistically speaking, there is no difference in mean score either between the texts making up the parallel corpus or between the texts making up the comparable corpus. Indeed, the p value between the respective groups is greater than 0.05, which points to a lack of statistical significance. Indeed, the mean values for the three groups of texts are very close to each other: the English Originals sub-corpus has a mean value of 11.34 %, the Italian Translations sub-corpus has a mean value of 12.69 %, whereas the Italian Originals sub-corpus has a mean value of 10.83 %.

Statistically speaking, the general trend found in the whole corpus does not support the lexical simplification hypothesis put forward by Blum-Kulka and Levenston (1983), according to which translations make a greater use of superordinates and synonyms. However, if each pair of texts from the parallel corpus is taken into consideration, one can notice that in four out of fifteen cases there is a greater use of synonyms in the translated texts. The texts in question are:

1) “The Iceman Reconsidered;”

2) “Sowing a Gene Revolution;”

3) “Shaping the Future;” and

4) “Intrigue at the Immune System.”

In the above-mentioned texts, repetition is replaced by synonyms in fifteen, twenty-one, nine and twenty-one cases respectively. For example, in “Sowing a Gene Revolution,” the English term farmers is first translated as contadini (fourteen times), but in the rest of the text the same term is translated by means of a synonym, namely agricoltori or coltivatori. In these texts, there was an attempt to limit redundancy and adapt the style of the source language to that of the target language by resorting to synonyms. However, as can be deduced from the analysis of the Italian Originals documents, synonyms are not the main lexical cohesive device that can be used to achieve that goal.

4.3.3 Meronyms, Holonyms, Hypernyms, Hyponyms, Antonyms

The SPSS analysis shows that there is no statistically significant difference in the use of meronyms either between the source and target texts or between the Italian Translations and Italian Originals as a whole. By contrast, there was a significant difference between English Originals and Italian Originals. However, since the percentage of use of this semantic category is very low compared to the use of synonyms or repetitions in both the parallel and comparable corpora, owing to the tendency to resort to other cohesive devices, it is not possible to state that the use of meronyms in translations differs from their use in texts originally written in Italian: this would mean overgeneralizing the results, which, considering the size of the corpus and its restriction to a specific text-type, would not be feasible. The same comment applies to the other less commonly used semantic categories which were analyzed in this study, such as holonyms, antonyms, hypernyms and hyponyms. For all of these semantic categories, when each is considered individually, the SPSS analysis showed that there was no statistically significant difference in their use in either corpus. However, if all of them are grouped together, including synonyms, and are considered as a whole in opposition to repetitions, the findings, as shown in the section below, turn out to be significant.

4.3.4 Semantic categories other than repetitions as a whole

Interestingly enough, when the percentages of use of synonyms, antonyms, meronyms, holonyms, hypernyms, and hyponyms are summed up and considered as one semantic category apart from repetitions, important conclusions can be drawn. In this respect, the SPSS analysis shows that there is no statistically significant difference (p = 0.981) between the source and target text sub-corpora. This means that overall target texts resort to semantic categories other than repetition in the same way and to the same extent as their source texts. If this finding is considered together with the one concerning repetition, then it is evident that translations tend to resort to the same semantic categories, insofar as there is no statistically significant difference in the use of repetitions or of the remaining semantic categories.

What needs to be pointed out here is that, statistically speaking, English Originals and Italian Translations differ neither in the amount of use of repetitions nor in the use of semantic categories other than repetitions when the latter are considered as a whole. This implies that translations tend to reproduce the semantic categories of their source texts, disregarding the stylistic preferences of the Italian language, which makes less use of repetition.

Another important finding concerns the difference in the use of repetitions and of the rest of the semantic categories between Italian Translations and Italian Originals, as well as between English Originals and Italian Originals. In this respect, the SPSS analysis shows that there is a statistically significant difference in mean score between texts originally written in English and those originally written in Italian. The statistical significance of this finding points to different stylistic preferences in the two languages under analysis when it comes to using repetitions. Generally speaking, English Originals have a far higher percentage of use of repetitions compared to Italian Originals. The p value between the two mean scores is 0.013, which points to a significant difference in the use of repetition. The same is true of the p value between the mean scores for the other semantic categories when considered all together: the p value is again 0.013, which implies that, unlike English, Italian prefers to avoid an overuse of repetition by resorting to other semantic categories, thus making texts more lexically varied. This finding is confirmed by a significant difference in lexical density and standardized type-token ratio between the mean score for the English Originals sub-corpus and the Italian Originals sub-corpus, with p values of 0.003 and 0.023 respectively.


Though the SPSS analysis did not show any statistically significant difference in terms of mean scores for either the parallel or the comparable corpus, with the only exception of meronyms, overall Italian originals make a larger use of meronyms (which was statistically proven), but above all of hyponyms and hypernyms. The same is not true of the Italian Translations, whose mean scores for hyponyms and hypernyms are almost identical to those of their source texts. As noted above, this finding contradicts the lexical simplification hypothesis, whereby translations make a larger use of superordinates. This result is closely connected with the repetition mean score of the Italian Translations sub-corpus, which is very close to that of the English Originals sub-corpus. Out of the very large number of repetition devices that were computed and analyzed in the English Originals sub-corpus, only a very small number were replaced by hyponyms or hypernyms. To be more precise, the total number of repetition devices computed in the English Originals sub-corpus amounts to 2,627, twenty-eight of which were rendered as hyponyms and fifty-eight as hypernyms. The following are some examples:

Cases in which repetition was replaced by a hyponym:

Ex. 1. Gene (ST) > Allele (gene sequence variation) (TT)

Ex. 2. Company (ST) > Celera (name of the company) (TT)

Ex. 3. Response (ST) > Reaction (TT)

Cases in which repetition was replaced by a hypernym:

Ex. 1. Customers (ST) > Consumatori (TT)

Ex. 2. Microsatellites (ST) > Strutture satellitari (TT)

Ex. 3. Corpse (ST) > Corpo (TT), meaning body, instead of using cadavere

Interestingly enough, there were also cases (21 in total) in which hyponyms were translated by means of hypernyms:

Ex. 1. Blue (ST) > Banda (TT)

Ex. 2. Human cloning (ST) > Tecnica (TT)

Ex. 3. Query (ST) > Ricerca (TT)

Apart from these very few instances in which repetition was replaced by hyponyms and hypernyms, the other translation techniques for avoiding repetition which were identified during the contrastive textual/lexical analysis of the English source texts and their Italian translations were omissions (258 instances), synonyms (104 instances), substitution (7 instances), and reference, mainly pronouns (38 instances). Overall, the general approach to repetition found in the translated texts was that of maintaining the same cohesive device unchanged. Investigating the reasons behind these choices is not within the scope of this study; what needs to be pointed out is that, because of these choices, the translations tend to reflect the stylistic, lexical and syntactic preferences of the source language. By so doing, the target readership’s expectations are not met, and this ultimately compromises the readability of these texts. At the initial stage of this dissertation, one of the objectives was to carry out an experiment whereby readers’ expectations were to be tested. However, because of several factors (mainly time constraints and availability of subjects), this second part of the analysis was not carried out. The subjects needed to be Italian native speakers living in Italy. Their task was to assess the fluency of texts (half Italian translations and half Italian originals) in terms of style, lexis and syntax, and then classify them either as translations or as non-translated texts. Distance from the subject recruitment location, along with the travel- and subject-related expenses that such an experiment would entail, made me abandon the project, which I hope can still be carried out as a follow-up study.

CHAPTER V

CONCLUSIONS

5.1 Introduction

The analysis and discussion of the results presented in chapters three and four support the two hypotheses put forward in this study. Hypothesis 1 claims that English source texts and their attendant Italian translations are similar in terms of use and amount of lexical cohesive devices, whereas hypothesis 2 claims that Italian translations and Italian originals differ in the use and amount of lexical cohesive devices.

As for hypothesis 1, the statistical analysis of the lexical cohesive relations in the parallel corpus (consisting of English Originals and Italian Translations), which was carried out by means of SPSS, showed that there was no significant difference in mean scores between the two sub-corpora for each of the following semantic categories when taken individually:

1) Repetition;

2) Synonymy;

3) Meronymy;

4) Antonymy;

5) Holonymy;

6) Hypernymy; and

7) Hyponymy.

The SPSS analysis also showed that there was no significant difference in mean scores between the two sub-corpora when all of the above-mentioned semantic categories, with the only exception of repetition, were put together and considered as one single semantic category. Further evidence in support of this hypothesis was the lack of a statistically significant difference in mean scores for both number of sentences and average sentence length between the English Originals and the Italian Translations.

By contrast, the statistical analysis of the semantic categories in the comparable corpus, consisting of Italian Translations and Italian Originals, showed that there was a significant difference only for repetition when each semantic category was taken individually. However, when all of the remaining semantic categories, namely synonyms, meronyms, holonyms, hypernyms and hyponyms, were put together and considered as one single semantic category, the SPSS analysis showed that there was a significant difference in mean scores between Italian Translations and Italian Originals. Further evidence in support of hypothesis 2 was obtained from the statistical analysis of the average sentence length, which turned out to be significantly different between Italian Translations and Italian Originals.


These findings statistically prove what other discourse analysis and translation scholars have theorized in their studies over the past thirty years. In this respect, James argues that “while every language has at its disposal a set of devices for maintaining textual cohesion, different languages have preferences for certain of these devices and neglect certain others (1980: 109).” Likewise, Hatim and Mason (1990) state that when one translates from a source language into a target language, the underlying coherence (that is to say, the semantic relations set up by the cohesive devices) should be kept invariant in the translated texts. What might need to change are the surface linguistic elements or devices used to reproduce the source language semantic relations, because these surface elements might be language- or text-type specific.

My analysis focused on scientific texts taken from an American-English magazine, namely Scientific American, and demonstrated that the articles originally written in Italian make a lower use of lexical repetition and a greater use of other lexical cohesive devices, especially hyponyms and hypernyms.

As mentioned above, during the translating process only the underlying semantic relations should be kept invariant in the translation; what needs to change is instead the surface structure which is used to establish those relations. Transferring the same surface linguistic elements into the target text might cause the target readership to find the latter less coherent. It follows that cohesion, and more specifically lexical cohesion, is, out of the seven standards of textuality identified by De Beaugrande and Dressler (1981), very important to text comprehension. Indeed, research has shown that lexis is one of the main reasons for comprehension problems (Alderson, 1984; Cassell, 1982). Given that the process of translation requires successful comprehension of a text, it follows that being able to identify and transfer lexical cohesive devices is a necessary translator’s skill if textual equivalence is to be achieved in the target text. In this respect, Newmark (1988) states that the cohesive level is “a regulator, it secures coherence, it adjusts emphasis” (1988: 24). At this level, the translator is forced to deal with the values that are intrinsic to lexis. S/he needs to identify the differences between positive and negative words, positive and neutral words, and negative and neutral words, and then transfer the same value of those words into the target text (1988: 24). Cohesion, which is one of the four levels of translating in Newmark’s approach (the other three being the textual, referential and naturalness levels), is mainly concerned with the structure and the moods of texts.

By structure, Newmark means the links among sentences or information items, whereas by mood he means a dialectical factor that helps determine the negative, positive or neutral meanings of words, which need to be kept invariant in the target language (1988: 23-24). It follows that when choosing a synonym in a target language, translators have to take into account shades of meaning, or what Inkpen and Hirst (2006) call types of differences in synonyms (224-225). They identify three of them:

1) Denotational differences (synonyms differing in meaning);

2) Attitudinal differences (synonyms differing in connotation); and

3) Stylistic differences (differing in their level of formality or register).


Making students aware of the role that cohesion in general, and lexical cohesion in particular, plays in translation problems can make the translation process smoother. In other words, lexical cohesion plays a major role in reading comprehension in that it is realized through lexis. But reading comprehension is only one of the lexis-related cognitive processes which a translator goes through while translating. Indeed, the translator then turns from reader into writer and attempts to convey, or rather evoke, an experience (which is his/her response to and interpretation of the text) through the meticulous choice of lexical items in another language (Rosenblatt 1989).

Remaining on the subject of comprehension, it is worth pointing out that cohesion is not, as Halliday and Hasan (1976) claimed, the only source of texture or coherence in a text. In this respect, other studies carried out by scholars such as Widdowson (1978), Carrell (1982), and Brown & Yule (1983) show that cohesive devices need not be present in a text in order for the latter to be coherent. In the latter case, the source of cohesion is to be found outside the text, more specifically in the reader’s prior or background knowledge and schemata. Whenever cohesive devices are missing between sentences, readers tend to infer the links based on their interpretation of the illocutionary acts accompanying the propositions being uttered (Widdowson 1978: 28-30).

Though it does not fall within the scope of this study to report on the cognitive processes activated by translators when faced with a lack of explicit cohesive links, this phenomenon in translation bears further investigation, in order to see to what extent readers’ schemata can affect their understanding of the semantic relations existing among sentences in a text, how these mental models can help readers explicitate textual conceptual gaps, and how this whole cognitive process affects the readers’ understanding of the global meaning of the text to be translated.

As Kostopoulou points out, translation, being an act of speech and communication, is performed at the level of text and discourse through the translator’s resorting not only to linguistic but also to extra-linguistic devices (2007: 146); therefore, both linguistic and extra-linguistic knowledge need to be enhanced in translation trainees or future translators. However, this study focused only on the analysis of explicit lexical cohesive devices. Therefore, the pedagogical suggestions offered in the section below will deal primarily with how to recognize explicit lexical cohesive devices and their semantic relations, and with how to transfer them into the target text, thus making sure to achieve textual equivalence.

5.2 Pedagogical Implications

Several authors have published works on how to teach cohesion. Most of these studies belong to second language teaching, particularly English as a second language. In this respect, Lubelska, in her article “An Approach to Teaching Cohesion to Improve Reading,” deals mainly with how to develop students’ ability to interpret cohesive devices more effectively. Indeed, as hinted in the previous section, recent studies on reading comprehension have shown that some learners find it difficult to make sense of a text because of their failure to interpret the writer’s cohesive signals as intended (1991: 569). The author focuses only on inter-sentential cohesion, though the latter can also occur at the intra-sentential level. The pedagogical activities suggested by the author are concerned with the teaching of reference and lexical cohesion. As far as lexical cohesion is concerned, some of the activities for developing an understanding of what this cohesive device is and how it works may involve giving students short paragraphs in which semantically-related words have previously been underlined and then asking them a number of questions such as (1991: 592-594):

1) Do the underlined words mean the same thing?

2) What’s the name of words that have similar meanings?

3) Can you identify as many synonyms of word A as possible?

4) Write down the first ten words you think of when reading word A.

These questions are aimed at developing students’ cognitive thinking through discovery procedures. Since synonymy is a slippery concept, not all of the words that students circle or identify are likely to belong to this semantic category. Some of the words they might think are synonyms may very well belong to other semantic categories such as hyponymy or hypernymy. Therefore, this activity gives the foreign language instructor the opportunity to have students reflect upon the differences existing between the above-mentioned categories from a semantic point of view.

Likewise, Ian McGee argues in his article “Traversing the Lexical Cohesion Minefield” that repetition is a common way of achieving lexical cohesion in scientific texts. However, students sometimes tend to overuse this cohesive device, thus compromising the reader’s understanding of the text itself (2009: 213). He offers a few suggestions on how to teach students to avoid overusing this cohesive category.

However, his approach to teaching cohesion falls within the framework of second language acquisition. Indeed, one of the reasons for this overuse that he mentions is L1 interference. In other words, foreign language learners may transfer the text structure patterns and style preferences which are typical of their L1 into their L2 writing. Though this comment does not necessarily apply to all translators, in that some of them may only translate into their L1, the article still offers pedagogical suggestions to language teachers who would like to improve their students’ writing skills in a foreign language. In this respect, the author argues that an excessive use of lexical repetition can be avoided by resorting to synonyms. However, choosing and using synonyms successfully is not an easy task, because the semantic, attitudinal and connotative nuances put forward by Inkpen and Hirst (2006) must be taken into consideration. In this respect, the author suggests using WordNet, which, as described in chapter 2, is a lexical database wherein nouns, verbs, adjectives and adverbs are grouped into synsets, that is to say, sets of synonyms. Students may be given altered texts and asked to identify inappropriate uses of synonyms (McGee 2009: 219). An altered text could be one in which some of the synonyms related to key terms have been replaced with other synonyms having, say, different connotations. For example, there are adjectives that, though semantically related, have positive, neutral or negative values or “attitudes.” Having students reflect on these differences helps them become aware of the fact that a wrong lexical choice may compromise the cohesiveness of the text by changing the writer’s/reader’s stance on the actor, object or action being described. For instance, a person can be referred to as astute or sagacious. Though these two adjectives are synonyms, they have different connotations. Astute has a negative connotation in that the focus is on the person’s use of his/her cleverness to gain some advantage; sagacious, on the other hand, has a positive connotation in that it places the focus on the person’s wisdom without implying any hidden purpose. The above-mentioned activity can help students reflect on the slippery nature of synonymy.

Applied to the translation field, similar pedagogical activities can be carried out in a translation class. The first objective that needs to be attained in such a class is to make sure students understand the difference between cohesion and coherence, since there is some confusion about these two concepts.

Some scholars (Halliday & Hasan [1976]; Schiffrin [1987]) think of cohesion as a semantic concept. In their view, cohesion refers to the meaning relations to be found among the surface linguistic elements present in a text. Others (Baker [1992]; Thompson [1996]) refer to it as surface relations, that is to say, lexical and grammatical dependencies linking words and sentences together (Baker 1992: 218). Halliday and Hasan’s notion of cohesion is text-bound in that, as mentioned above, they do not take into consideration the role played by the reader’s schemata and background knowledge in making sense of the semantic relations among the sentences of a text.


Baker’s definition, on the other hand, takes into consideration both textual and extra-textual factors. In her view, which is similar to that held by other scholars such as De Beaugrande & Dressler (1981), Brown & Yule (1983), and Hatim & Mason (1990), cohesion relates to the surface relations within a text, whereas coherence relates to the semantic relations, as perceived by the reader, underlying those very same surface linguistic elements (or cohesive devices).

One way to show the difference between cohesion and coherence is to have students read two short paragraphs: one in which the interpretation of the propositions, and hence of the illocutionary acts expressed by the latter, requires activating one’s background knowledge or schemata, and another in which the relations between the sentences are made explicit through cohesive devices. This way the translation teacher can also have students reflect on the helpful but not indispensable role of cohesive devices in making sense of a text or of sets of semantically-related sentences. To exemplify this concept, Widdowson (1978: 29) gives the following example:

A: That’s the telephone.

B: I’m in the bath.

A: O.K.

Though the above-mentioned three utterances do not contain any cohesive devices which might textually establish a semantic relationship among them, any person who is familiar with a phone-call scenario can make sense of the exchange by inferring the communicative value behind these three sentences. Taken together, these utterances are part of a communicative exchange in which A’s utterance is interpreted as a request, B’s utterance as a negative response, and finally A’s reply as an acceptance of what B says. This stretch of text is interpreted as coherent because the person who reads it recognizes the illocutionary act performed by each sentence and can therefore fill in the propositional gaps, which helps produce a cohesive conversational exchange that reads as follows (1978: 29):

A: That’s the telephone. (Can you answer it, please?)

B: (No, I can’t answer it because) I’m in the bath.

A: O.K. (I’ll answer it).

It is because of the illocutionary value we give to the propositions we hear or read that we are able to recover the missing propositional link(s) that enable us to make sense of the written or spoken discourse (1978: 31).

By having students work on short extracts taken from texts or dialogues, the teacher helps them recognize and understand not only the difference between cohesion and coherence but also the role that a reader’s schemata or background knowledge play in making sense of texts. At this level, students are encouraged to recall what the difference between cohesion and coherence is, recognize it when presented with hands-on activities or tasks, and explain where the difference lies. These activities will help students start developing two of the six intellectual skills or behaviors that the cognitive domain involves and that are classified in Bloom’s revised taxonomy as follows (Krathwohl 2002: 215):


1) Remembering (being able to recall data/information);

2) Understanding (being able to determine the meaning of instructional messages);

3) Applying (being able to apply what one learns in similar but new situations);

4) Analyzing (being able to compare and contrast);

5) Evaluating (being able to assess decisions or a course of action);

6) Creating (being able to draw on what one has learned and generate new ideas or products).

Once this first pedagogical milestone is attained, the next step is to make students aware of the fact that, as Hatim and Mason (1990) claim, when translating from a language A into a language B, what needs to be kept invariant is the underlying coherence (semantic relations), which is textually conveyed by surface linguistic elements, namely cohesive devices, which may, on the other hand, need to shift insofar as they are language- and text-type specific. In this respect, pertinent insights into the teaching of cohesion are given in the book Teaching Translation from Spanish to English: Worlds beyond Words by Allison Beeby Lonsdale, who discusses cohesion differences between English and Spanish. In particular, the author states that English prefers lexical repetition and pronominalization because, unlike Romance languages, it makes very few distinctions in terms of gender, verb agreement and number (1996: 219). Therefore, it is more difficult for the reader to keep track of the right referent. Lexical repetition and pronominalization make it easier for the English reader to establish reference and cohesion in a text (1996: 225). As far as lexical cohesion is concerned, one of the activities that the author suggests is to have students read short texts and then make a list of the referential networks that can be identified in each text. In other words, a text is made up of several paragraphs, each dealing with the same or a different concept.

The main referential network is the one established by the title, but within the text other semantic fields can be established and identified. Once these referential networks are identified, it is then possible to make a list of all the content words that belong to each of them. This task can be sped up and made easier through the use of WordList, one of the three applications in the WordSmith Tools suite, which generates word lists automatically.

In a translation class, the focus should be on the different ways these referential networks are represented in the text. In this respect, Allison Beeby Lonsdale suggests providing translation trainees with parallel texts. The term parallel as employed by the author has a different definition from the one adopted in the present study: by parallel texts, she means texts which are not translations of each other but which are comparable in terms of topic, genre, register, language sub-field, etc. The texts chosen by the author deal with Einstein’s theory of relativity. Three main topics were singled out, namely (225):

1) physics before Einstein;

2) the theory of relativity; and

3) reality and time/space.


Students were asked to list all the referential networks or references related to these three topics. Once they were done with this activity, students noticed the absence of repetition and the variety of references in the Spanish text (225). This pedagogical activity is an effective one because it helps translation trainees reflect on the difference in the use of lexical cohesive devices in any two languages. The author does not suggest the use of any corpus tools because the texts provided to the students in her study were short. However, using corpora can help students easily identify referential networks, because programs such as WordSmith Tools create word lists indicating the frequency of use of words. Since, when working with referential networks, the focus is on content words, WordSmith Tools allows the user to leave all grammatical words out of consideration by loading a stop list into the application. This way, only content words will be included in the word list. Word lists can help students identify all types of repetitions, including the use of derivational forms. In this respect, to identify derivational forms of certain content words, the user can simply arrange his or her word list in alphabetical order and then lemmatize the words having the same stem, as sketched below. Using WordSmith Tools to identify referential networks can be a follow-up activity to their manual identification. This way, students are prompted to apply what they have already learned in a novel situation, thus fostering the development of the third level of Bloom’s intellectual behaviors, namely Applying.
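A minimal sketch of what such a word-list step involves (a frequency list of content words, sorted either by frequency or alphabetically so that same-stem forms sit next to one another) might look like this in Python. The tokenization, the tiny stop list and the idea of grouping by shared stems are all simplifying assumptions on my part; WordSmith’s lemmatization is interactive rather than rule-based.

```python
import re
from collections import Counter

# Illustrative stop list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "also"}

def word_list(text):
    """Frequency list of content words, as a Counter."""
    tokens = re.findall(r"[a-zà-ÿ']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

freq = word_list("Farmers plant seeds. A farmer also harvests; farming is hard work.")

# Frequency order, as in a word list's default view.
print(freq.most_common())

# Alphabetical order, so derivational forms (farmer, farmers, farming)
# appear together and can be lemmatized by hand under one headword.
for word in sorted(freq):
    print(word, freq[word])
```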

Another activity that could be done in class is to have students identify the referential networks of both parallel (in my definition) and comparable texts dealing with the same topic(s), and compare them to see how the references change depending on whether the text being analyzed is a translation or an original text. All these activities are aimed at making students aware of stylistic, syntactic, cultural or semantic differences when it comes to using lexical cohesive devices in another language. In terms of learning objectives, students are encouraged to develop their analytical skills through compare-and-contrast activities (Bloom’s fourth level).

Let us now consider lexical differences and the use of lexical databases in task design.

As mentioned above, choosing synonyms when translating is not always an easy task, because words have semantic, attitudinal and connotative differences. Students need to be aware of these nuances if textual equivalence is to be attained in the target text. In other words, students need to be able to choose TL lexical items that do not betray the semantic value conveyed by their ST counterparts. This is all the more true if they have to pick a synonym to avoid repetition. In this respect, a very useful tool is WordNet for the English language and MultiWordNet for other languages such as Italian, Spanish, Portuguese, Romanian, Hebrew and Latin. This multilingual lexical database allows the user to type in a term and see its synsets, or sets of synonyms, not only in the language of the search but also in the other languages which are supported by the database. Like WordNet, MultiWordNet also allows the user, through a drop-down menu, to check the other semantic relationships that the word being looked up has to other words, such as hypernymy, hyponymy, meronymy or antonymy. In this respect, an interesting pedagogical task aimed at having students become familiar with the above-mentioned semantic relations could be to provide them with a short text, in which a limited number of key words are underlined, and then ask students to identify all the other lexical words which are semantically related to each key word and to establish the type of relationship by using MultiWordNet. By so doing, students will understand not only the importance that words have in a text but also the network(s) of semantic relationships they create. This part of the task will also help students pay attention to inter-sentential bonds and links and to how sentences are semantically related to one another, thus developing their awareness of the text as a translation unit in itself. Then students could be grouped into pairs or threes, as the case may be, and asked to compare their referential networks and assess one another’s choices, thus promoting their moving up Bloom’s revised taxonomy (fifth level: Evaluating). Lastly, students could be asked to translate the text in question, and the referential networks thereof, by making all the necessary shifts in terms of lexical cohesive devices, and then justify their choices. This last part of the task aims at promoting the development of their creativity (Bloom’s sixth level) by having them suggest an actual solution to the translation of such referential networks. However, when using WordNet or MultiWordNet, students need to be warned of one of the big shortcomings of these lexical databases, which is their classification of semantic relationships based on the word class (nouns, verbs, adverbs or adjectives) that lexical items belong to. This means that the verb open and the adjective open are not considered semantically related in these lexical databases; it follows that students need to use their best judgment when analyzing a text and identifying semantic relationships, because machines cannot do all the work for us.
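MultiWordNet itself is consulted through its own web interface, but a comparable lookup can be scripted for classroom preparation. The sketch below uses NLTK’s WordNet together with the Open Multilingual WordNet, which covers Italian; note that this is a related but distinct resource from MultiWordNet, so treating it as a stand-in for the tool discussed above is an assumption of this illustration.

```python
import nltk
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)  # Open Multilingual WordNet (includes Italian)
from nltk.corpus import wordnet as wn

# Inspect the first few verb synsets for "find" and their semantic relations.
for syn in wn.synsets("find", pos=wn.VERB)[:3]:
    print(syn.name(), "-", syn.definition())
    print("  synonyms: ", syn.lemma_names())
    print("  hypernyms:", [h.lemma_names() for h in syn.hypernyms()])
    print("  Italian:  ", syn.lemma_names("ita"))

# Antonymy is defined on lemmas rather than synsets in WordNet.
good = wn.synsets("good", pos=wn.ADJ)[0].lemmas()[0]
print("antonyms of 'good':", [a.name() for a in good.antonyms()])
```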

As previously mentioned, Italian does not make great use of repetition because it makes many distinctions in terms of gender, verb agreement and number. When it comes to scientific texts such as the ones analyzed in this study, though, the use of repetition is higher, because the aim is to avoid ambiguity, especially in cases of co-reference. However, as the analysis of the data showed, the use of the repetition device, though higher compared to the other lexical cohesive devices, was still lower than in the English Originals. What students need to understand is not that they have to avoid repetition all the time, but rather that they can make use of synonyms, superordinates or other semantic categories whenever repetition can be avoided without making the referent obscure or ambiguous.

All the above-mentioned activities or tasks are aimed at developing not only the students’ micro-analysis skills, by having them focus on intra- and inter-sentential semantic relations, but also their instrumental, interpersonal and attitudinal competences (Kelly 2005). However, translation trainees also need to learn to look beyond textual boundaries; they need to look at the bigger picture, which allows them to grasp the global meaning of the text they have to understand and translate. They need to be taught how to identify the textual characteristics and conventions associated with TL genres, registers and speech acts. Lexical cohesion is closely related to textual features and conventions because, as mentioned above, the use of content words may differ depending on the level of formality of a text or the type of text. Cultures differ in their levels of formality and conventions depending on the context(s).

Students need to understand that words that are equivalent in a given target language at the micro-textual level may not be so at the macro-textual level, that is to say, in terms of text-type or register. The choice of lexis and its level of formality change from culture to culture, and consequently from language to language. To make students aware of these differences at the macro-textual level, an activity that could be done in class is to have students collect a comparable corpus of texts belonging to the same text-type and language sub-field in the language pair they are working with, and then analyze the text-type features and the use of lexis through corpus tools such as WordSmith Tools.
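Where WordSmith Tools is not available in the classroom, some of its Wordlist statistics can be approximated with a short script. The following is a minimal sketch, not a reimplementation of WordSmith’s algorithms: the sentence-splitting and tokenization rules are naive simplifications, and the STTR is computed on WordSmith’s default 1,000-word basis.

    # A minimal sketch approximating three WordSmith Tools Wordlist statistics:
    # sentence number, average sentence length (ASL) and standardized
    # type-token ratio (STTR). Tokenization here is deliberately naive.
    import re

    def wordlist_stats(text, sttr_basis=1000):
        # Naive sentence splitting: a break after ., ! or ? followed by space.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        # Naive word tokenization: runs of letters (accented letters included).
        words = re.findall(r"[^\W\d_]+", text.lower())

        asl = len(words) / len(sentences) if sentences else 0.0

        # STTR: the mean TTR over consecutive 1,000-word chunks, which is how
        # WordSmith standardizes the type-token ratio against text length.
        chunks = [words[i:i + sttr_basis] for i in range(0, len(words), sttr_basis)]
        full = [c for c in chunks if len(c) == sttr_basis]
        if full:
            sttr = sum(len(set(c)) / len(c) for c in full) / len(full)
        else:  # text shorter than the basis: fall back to a plain TTR
            sttr = len(set(words)) / len(words) if words else 0.0

        return {"sentences": len(sentences), "tokens": len(words),
                "ASL": round(asl, 2), "STTR %": round(100 * sttr, 2)}

    # Hypothetical usage: "article.txt" stands in for any text in the corpus.
    print(wordlist_stats(open("article.txt", encoding="utf-8").read()))

Run over each text in the two halves of a comparable corpus, such a script yields per-text figures that students can then tabulate and compare per language.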

By means of WordSmith Tools, students can be made aware of textual features which are often neglected in a translation class. By looking at the statistical data this corpus tool provides, students can become familiar with differences concerning syntax (hypotaxis vs. parataxis) and paragraph arrangement (rhetorical purposes). To focus on syntax, students can simply look at the statistics concerning the number of sentences and the average sentence length to reflect upon differences in how messages are conveyed in writing in a particular text-type and language sub-field. The analysis of the paragraphs of a particular text-type in terms of rhetorical purposes (Bhatia 1993; Swales 1990) can also help students become aware of differences in its “cognitive structure”, which refers to the conventionalized and standardized arrangement of rhetorical moves or functions used by a particular professional discourse community (Bhatia 1993: 32). In other words, any given text-type has a particular organization of rhetorical functions which is realized through a series of paragraphs, each fulfilling a specific function. These functions or moves are usually associated with communicative intentions and realized through specific lexico-grammatical choices, including lexical cohesive patterns themselves (see chapter 1), which ensure the connectedness of rhetorical moves and are accepted and recognized by the members of a particular discourse community (Swales 1990: 58).

Discourse communities have common communicative goals and a high level of expertise, use highly specialized terminology, and possess specific text-types through which their members further their aims (Swales 1990: 24-27). In this study, the discourse community in question is a highly educated one. However, despite the magazines belonging to the same text-type and language sub-field, the discourse communities involved belong to two different cultures, namely American-English and Italian. Since each culture has its own set of lexico-grammatical features to fulfill specific rhetorical purposes, through the comparison of comparable corpora translation students can become familiar with text-type differences and stylistic preferences in any two languages and learn how to effectively convey the source text’s rhetorical purposes into the target text through appropriate TL lexico-grammatical and stylistic choices.

In this respect, in order to test the students’ textual competence, to borrow Neubert and Shreve’s terminology, different projects, with different learning objectives, could be devised depending on the linguistic background of the class. In a language-specific class, in which students work with the same language pair, the focus of the project could be on the differences in rhetorical moves and move-related lexical cohesive patterns (LCPs) across text-types. Students could be grouped in pairs, and each group could be asked to collect a comparable corpus of texts of a to-be-agreed-upon length belonging to a specific text-type (assigned by the instructor to make sure each group has a different one). Their task would be to identify the text-type-specific rhetorical moves and the move-related LCPs and then present their findings in class at the end of a module, course, or semester. Having each group present its project allows students to reflect on text-type differences, since each group focuses on a different text-type. By contrast, in a non-language-specific class, in which students come from different cultural and linguistic backgrounds, the focus should be on language-specific differences in rhetorical moves and LCPs. In this case, students could be grouped according to their language pair, if possible, and asked to collect a comparable corpus of texts, the text-type of which should be the same for each group. As in the previous project, their task would be to identify the rhetorical moves and move-related LCPs. At the end of the module, course or semester, they would be asked to present their findings so that students can compare them and see whether or not the rhetorical moves and move-related LCPs for the same text-type change across languages.

5.3 Limitations and Future Directions

Some of the limitations of this study concern text-type, language pair, text-bound cohesion, and subjectivity in establishing semantic relations.

As for text-type, this study focused only on articles taken from one scientific magazine, namely Scientific American, and its Italian version, Le Scienze. The results are therefore representative of the specialized language used by the discourse community working for and reading this magazine. It would be an overgeneralization to say that the findings for the Italian Originals apply to Italian scientific language as a whole. Therefore, further investigation into the differences, or rather the stylistic preferences, in the use of lexical cohesive devices needs to be conducted in several specialized sectors of Italian as well as English. Needless to say, in order for the findings to be generalizable, texts should belong to different text-types and be taken from a number of magazines or sources.

Another limitation of this study is the language pair. The findings apply to stylistic and text-type-related differences between English and Italian. However, it is not possible to claim that the same differences or findings apply, for example, to other language pairs such as German and Italian or Spanish and Italian. Much research remains to be done on the stylistic preferences of, and differences between, languages.

The analysis of the lexical cohesive devices carried out on the selected texts is product-based and is mainly concerned with lexical cohesion. This cohesive category was studied because of its major impact, as Hoey (1991) asserts, on the coherence of a text. However, since coherence is subjective, in that it depends on the reader’s ability to interpret the semantic relationships within a text, this study cannot assess to what extent the reproduction of the lexical cohesive patterns of the source text in the Italian target text might affect the reader’s understanding of the latter as a coherent whole. Indeed, in Sanchez Escobar’s view, the coherence of a text is created both by the writer, through his or her lexical, semantic and syntactic choices, and by the reader, through his or her interpretation of the text (1999: 558). Therefore, a direction for future research would be to study the target readership’s response to Italian translated and original texts, to see whether translated texts in which the surface structure of the underlying source text’s semantic relations has been kept invariant are overall rated as less coherent than texts originally written in Italian. Such an experiment would also help provide evidence for the importance of lexical cohesion in translation quality, in that it is ultimately the target audience who decides on the success or failure of the end product, that is to say, the translated text.

Another important limitation is the subjectivity involved in establishing semantic relations. Since the analysis conducted in this study was manual, it was the author himself who read through the source and target texts and identified the semantic relationships existing among the different key words extracted with WordSmith Tools. Though the semantic categories to which each lexical item under investigation belonged were identified by means of WordNet for the English texts and MultiWordNet for the Italian translations and originals, in a few cases the lexical database only provided a set of synsets but no hyponyms, hypernyms, meronyms and so on. In such cases, other sources such as Italian thesauri or dictionaries were consulted to identify the semantic relationship(s) between any two words, and in establishing these relations the author’s subjectivity might have played a small part. However, this only happened for a very small number of terms; the bulk of the semantic relations were established through the above-mentioned lexical databases.

Lastly, the biggest limitation of this study is its being product-based. At its initial stage, both a product-based and a process-based investigation of lexical cohesion were planned. However, the process-based project could not be carried out, mainly because of time constraints and a shortage of available subjects in the desired target language. Therefore, a future direction that this study should take is mainly empirical, namely an investigation into cohesion and coherence from an expertise point of view. My hypothesis rested on Folkart’s claim that the quality of a translation is affected by students’ focus on lower sentential ranks, which prevents them from seeing the text as a whole; indeed, cohesive markers come into play only at the supra-sentential rank (1988: 151).

Based on these findings, the hypothesis formulated at the very beginning, and which still needs to be tested, is that translations carried out by novice translators are less cohesive and coherent than those done by expert translators, given the micro-textual approach to text analysis adopted by novices, which leads them to disregard, or pay less attention to, the network of lexical chains created by lexical cohesion. By contrast, translations done by expert translators are expected to be more cohesive and coherent, given their awareness of text-level forms, while nevertheless not being completely compliant with TL norms and TL text-type conventions. The latter assumption was partly borne out by the product-based investigation of lexical cohesion carried out in this study, in that the translated texts tend overall to comply with the lexical cohesion preferences of their source texts.

The present study set out to shed more light on the stylistic differences, or preferences, in lexical cohesion between English and Italian scientific texts. However, much remains to be done on this topic across languages and text-types. An important innovation of this study was its adoption of both corpus and statistical tools for analyzing lexical cohesion. Indeed, most corpus-based studies relating to cohesion in general, or to lexical cohesion in particular, discuss the statistical data computed by corpus tools such as WordSmith Tools only in general terms, without demonstrating whether the differences discovered are statistically significant. It is hoped that this study will encourage future translation scholars to adopt this combined methodology in investigating lexical cohesion, thus providing more valuable data in support of the teaching of this standard of textuality in translator training programs.
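By way of illustration of the combined methodology advocated here, the sketch below runs a one-way ANOVA, the kind of inferential test this study carried out in SPSS, over a per-text measure for the three groups of texts. It assumes the SciPy library is available, and the numbers are hypothetical placeholders rather than data from this study.

    # A minimal sketch of the inferential step advocated above: a one-way
    # ANOVA over per-text STTR scores for the three groups of texts.
    # The values below are hypothetical placeholders, not the study's data.
    from scipy.stats import f_oneway

    english_originals    = [44.1, 45.3, 43.8, 46.0, 44.7]
    italian_originals    = [47.2, 48.1, 46.5, 47.9, 48.4]
    italian_translations = [45.9, 46.8, 45.1, 46.3, 47.0]

    f_value, p_value = f_oneway(english_originals, italian_originals,
                                italian_translations)
    print(f"F = {f_value:.2f}, p = {p_value:.4f}")
    # p < .05 would indicate a statistically significant difference among the
    # groups; post-hoc comparisons would then locate which pairs differ.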

GLOSSARY OF ACRONYMS

ANOVA = Analysis of Variance
ASL = Average Sentence Length
DV = Dependent Variable
EN = English
EO = English Original (Text)
EFL = English as a Foreign Language
F = F value
IO = Italian Original (Text)
IT = Italian Translation
IV = Independent Variable
L1 = First Language (native)
L2 = Second Language (non-native)
LCP = Lexical Cohesive Pattern
LD = Lexical Density
LSP = Language for Special Purposes
M = Mean
S = Sentence
SD = Standard Deviation
SL = Source Language
ST = Source Text
SPSS = Statistical Package for the Social Sciences
SSLMIT = Scuola Superiore di Lingue Moderne per Interpreti e Traduttori (Advanced School of Modern Languages for Interpreters and Translators)
STTR = Standardized Type-Token Ratio
TL = Target Language
TT = Target Text
TTR = Type-Token Ratio

References

Abdel-Hafiz, A.S. “Lexical Cohesion in the Translated Novels of Naguib Mahfouz: The Evidence from The Thief and the Dogs.” In Occasional Papers in the Development of English, Vol. 37, Oct. 2003 – Mar. 2004, pp. 63 – 88.

Aiwei, S. “The Importance of Teaching Cohesion in Translation on a Textual Level: A Comparison of Test Scores before and after Teaching.” In Translation Journal, 2004, Vol. 8, n° 2.

Alderson, J.C. “Reading in a Foreign Language: A Reading Problem or a Language Problem?” In Alderson, J.C., & Urquhart, A.H. (eds.), Reading in a Foreign Language. London: Longman, 1984.

Anderson, M.L. Lexical Cohesion in Emma: An Approach to the Semantic Analysis of Style. Ann Arbor, Michigan: Bell & Howell Company, 1997.

Aziz. “Translation and Pragmatic Meaning.” In Shunnaq et al. (eds.), Issues in Translation. Irbid: Irbid National University & Jordanian Translators’ Association, 1998, pp. 119 – 141.

Baker, M. “Textual Equivalence: Cohesion.” In Baker, M. In Other Words, London: Routledge, 1992, pp. 180-215.

Baker, M. In Other Words: A Coursebook on Translation. London: Routledge, 1992.

Baker, M. “Corpus Linguistics and Translation Studies: Implications and Applications.” In Baker, M., Francis, G., & Tognini-Bonelli, E. (eds.), Text and Technology: In Honour of John Sinclair. Amsterdam: John Benjamins, 1993, pp. 233-250.

Baker, M. “Corpora in Translation Studies: An Overview and some Suggestions for Future Research.” In Target, Vol. 7, n. 2, 1995, pp. 223 – 243.

Baker, M. “Corpus-based Translation Studies: The Challenges That Lie Ahead.” In Somers, H. (ed.), Terminology, LSP and Translation: Studies in Language Engineering in Honour of Juan C. Sager. Amsterdam: John Benjamins, 1996, pp. 175-186.

Baker, M. “A Corpus-based View of Similarity and Difference in Translation.” In International Journal of Corpus Linguistics, Vol. 9, n. 2, 2004, pp. 167 – 193.


Baker, P., Hardie, A., & McEnery, T. A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press, 2006.

Bhatia, V.K. Analysing Genre: Language Use in Professional Settings. London: Longman, 1993.

Bosseaux, C. How Does it Feel? Point of View in Translation: The Case of Virginia Woolf into French. Amsterdam/New York: Rodopi, 2007.

Bowker, L., & Pearson, J. Working with Specialized Language: A Practical Guide to Using Corpora. London: Routledge, 2002.

Beaugrande, R. de. Text, Discourse, and Process: Toward a Multidisciplinary Science of Texts. Norwood, NJ: ABLEX Publishing Corporation, 1980.

Beaugrande, R. de, & Dressler, W. Introduction to Text Linguistics. London & New York: Longman, 1981.

Beigman, B., & Shamir, E. “Lexical Cohesion: Some Implications of an Empirical Study.” In Bernadette Sharp (Ed.), Natural Language Understanding and Cognitive Science. Miami, FL: INSTICC Press, 2005, pp. 13-21.

Beigman, K.B., & Shamir, E. “Reader-based Exploration of Lexical Cohesion.” In Language Resources and Evaluation, 2006, Vol. 40, n° 2, pp. 109-126.

Berber Sardinha, Antonio Paulo. “Patterns of Lexis in Original and Translated Business Reports: Textual Differences and Similarities.” In Karl Simms, Translating Sensitive Texts: Linguistic Aspects, 1997, pp. 147-154.

Bernardini, S. “Exploring New Directions for Discovery Learning.” In Kettemann, B., & Marko, G. (eds.), Language and Computers, Teaching and Learning by Doing Corpus Analysis. Proceedings of the Fourth International Conference on Teaching and Language Corpora, Graz, 19-24 July 2000, pp. 165-182.

Bloom, B. Developing Talent in Young People. New York: Ballantine, 1985.

Blum-Kulka, S., & Levenston, E.A. “Universals of Lexical Simplification.” In Faerch, C., & Kasper, G. (eds.), Strategies in Interlanguage Communication. London/New York: Longman, 1983, pp. 119-139.

Blum-Kulka, S. “Shifts of Cohesion and Coherence in Translation.” In House, J., & Blum-Kulka, S. (eds.), Interlingual and Intercultural Communication: Discourse and Cognition in Translation and Second Language Acquisition. Tübingen: Narr, 1986, pp. 17-35. Reprinted in Venuti, L. (ed.), The Translation Studies Reader. London/New York: Routledge, 2004, pp. 298-313.

Brown, G., & Yule, G. “The Nature of Reference in Text and in Discourse.” In Brown, G., & Yule, G., Discourse Analysis. Cambridge, UK: Cambridge University Press, 1983, pp. 190-222.

Butler, C. Structure and Function: From Clause to Discourse and Beyond. Amsterdam: John Benjamins, 2003.

Campbell, K. S. Coherence, Continuity, and Cohesion: Theoretical Foundations for Document Design. Hillsdale, NJ: Lawrence Erlbaum Associates, 1995.

Carrell, P.L. “Cohesion Is Not Coherence.” In TESOL Quarterly, 1982, Vol. 16, n° 4, pp. 479-488.

Carrell, P.L. “Some Issues in Studying the Role of Schemata, or Background Knowledge, in Second Language Comprehension.” In Reading in a Foreign Language, 1983, Vol. 1, pp. 81-92.

Chueca Moncayo, F.J. “The Textual Function of Terminology in Business and Finance Discourse.” In JoSTrans, 3/2005, pp. 40 – 63.

Cradler, James F., & Michael K. Launer. “Problems of Referential Cohesion in Russian-to-English Translation.” In Karl Kummer, Building Bridges, 1986, pp. 293-300.

Ericsson, K. A., Krampe, R. T., & Tesch-Roemer, C. “The Role of Deliberate Practice in the Acquisition of Expert Performance.” In Psychological Review, 1993, Vol. 100, pp. 363 – 406.

Eriksson, A. “Tense, Cohesion and Coherence.” In Karin Aijmer, Hilde Hasselgård, Translation and Corpora, 2004, pp. 19-31.

Fellbaum, C. WordNet: An Electronic Lexical Database. Cambridge/London: The MIT Press, 1998.

Flesch, R. How to Test Readability. New York: Harper & Brothers, 1951.

Flowerdew, J., & Mahlberg, M. Lexical Cohesion and Corpus Linguistics. Amsterdam/Philadelphia: John Benjamins Publishing Company, 2009.

Folkart, B. “Cohesion and the Teaching of Translation.” In Meta, 1988, Vol. 33, n° 2, pp. 142-155.

Fulcher, G. “Cohesion and Coherence in Theory and Reading Research.” In Journal of Research in Reading, 1989, Vol. 12, n° 2, pp. 146-163.

Gunning, R. The Technique of Clear Writing. New York: McGraw-Hill, 1973.

Halliday, M. A. K. & Hasan, R. Cohesion in English. London: Longman Group Limited, 1976.

Halliday, M. A. K. An Introduction to Functional Grammar. London: Edward Arnold Limited, 1985.

Hansen-Schirra, S., Neumann, S., & Steiner, E. “Cohesive Explicitness and Explicitation in an English-German Translation Corpus.” In Languages in Contrast, 2007, Vol. 7, n° 2, pp. 241-265.

Hasan, R. “Coherence and Cohesive Harmony.” In J. Flood (Ed.), Understanding Reading Comprehension. Newark, Del.: ITA, 1984, pp. 181-219.

Hatim, B., & Mason, I. “Discourse Texture.” In Hatim, B., & Mason, I., Discourse and the Translator. London: Longman, 1990, pp. 192-222.

Heiman, G.W. Understanding Research Methods and Statistics: An Integrated Introduction for Psychology. Boston/New York: Houghton Mifflin Company, 2001.

Hirst, G., & St-Onge, D. “Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms.” In Fellbaum, C. (ed.), WordNet: An Electronic Lexical Database. Cambridge, Massachusetts/London, England: The MIT Press, 1998, pp. 305-332.

Hoey, M. Patterns of Lexis in Text. New York/Oxford/Toronto: Oxford University Press, 1991.

Inkpen, D., & Hirst, G. “Building and Using a Lexical Knowledge Base of Near-Synonym Differences.” In Computational Linguistics, 2006, 32 (2), pp. 223-262.

James, C. Contrastive Analysis. London: Longman, 1980.

Jobbins, A.C., & Evett, L.J. “Automatic Identification of Cohesion in Texts: Exploiting the Lexical Organization of Roget’s Thesaurus.” In Proceedings of ROCLING VIII, Taipei, Taiwan, 1995.


Kachroo, B. “Textual Cohesion and Translation.” In Meta, 1984, Vol. 29, n° 2, pp. 128-134.

Kalina, M. Equivalence and Cohesion in Translation: A Study of a Polish Text and its English Translation. Master’s thesis, University of Toledo, December 2000.

Kelly, D. A Handbook for Translator Trainers. Manchester: St. Jerome, 2005.

Khany, R., & Tazik, K. “The Relationship between Rhetorical Moves and Lexical Cohesion Patterns: The Case of Introduction and Discussion Sections of Local and International Research Articles.” In Journal of English Language Teaching and Learning, 2011, pp. 71-95.

Kirkpatrick, L.A., & Feeney, B.C. A Simple Guide to SPSS for Version 16.0. Wadsworth: Cengage Learning, 2009.

Klaudy, Kinga & Kristina Károly. “The Text-Organizing Function of Lexical Repetition in Translation.” In Maeve Olohan, Intercultural Faultlines: Research Models in Translation Studies 1. Textual and Cognitive Aspects, 2000, pp. 143-160.

Klebanov, B.B., Diermeier, D., & Beigman, E. “Lexical Cohesion Analysis of Political Speech: Web Appendix.” In Political Analysis, 2008, Vol. 16, pp. 447-463.

Kostopoulou, G. “The Role of Coherence in Text Approaching and Comprehension: Applications in Translation Didactics.” In Meta, 2007, Vol. 52, n° 1, pp. 146-155.

Krathwohl, D.R. “A Revision of Bloom’s Taxonomy: An Overview.” In Theory into Practice, 2002, Vol. 41, n° 4, pp. 212-218.

Krein-Kühle, M. “Cohesion and Coherence in Technical Translation: The Case of Demonstrative Reference.” In Leona Van Vaerenbergh, Linguistics and Translation Studies, Translation Studies and Linguistics, 2002, pp. 41-53.

Laviosa-Braithwaite, S. “Comparable Corpora: Towards a Corpus Linguistic Methodology for the Empirical Study of Translation.” In Thelen, M., & Lewandowska-Tomaszczyk, B. (eds.), Translation and Meaning Part 3. Maastricht: UPM, 1996, pp. 153-163.

Laviosa, S. “The English Comparable Corpus: A Resource and a Methodology.” In Bowker, L., Cronin, M., Kenny, D., & Pearson, J. (eds.), Unity in Diversity: Current Trends in Translation Studies. Manchester: St. Jerome, 1998, pp. 101 – 112.


Laviosa, S. Corpus-based Translation Studies. Theory, Findings, Applications. Amsterdam: Rodopi, 2002.

Leech, G. “Corpora and Theories of Linguistic Performance.” In Svartvik, J. (ed.), Directions in Corpus Linguistics. Berlin: Mouton de Gruyter, 1992, pp. 105 – 122.

Lonsdale, A.B. Teaching Translation from Spanish to English: Worlds beyond Words. Ottawa: University of Ottawa Press, 1996.

Lotfipour-Saedi, K. “Lexical Cohesion and Translation Equivalence.” In Meta, 1997, Vol. 42, n° 1, pp. 185-192.

Lubelska, D. “An Approach to Teaching Cohesion to Improve Reading.” In Reading in a Foreign Language, 1991, Vol. 7, n° 2, pp. 569-596.

Lucisano, P., & Piemontese, M.E. “GULPEASE: Una Formula per la Predizione della Difficoltà dei Testi in Lingua Italiana.” In Scuola e Città, 3, 1988, pp. 110 – 124.

Mason, I. “Discourse, Ideology and Translation.” In Robert de Beaugrande, Abdulla Shunnaq, and Mohamed H. Heliel (eds.), Language, Discourse and Translation in the West and Middle East. Amsterdam: John Benjamins, 1994, pp. 23–34.

McGee, I. “Traversing the Lexical Cohesion Minefield.” In ELT Journal, 2009, Vol. 63, n° 3, pp. 212-220.

Morris, J., & Hirst, G. “Lexical Cohesion Computed by Thesaural Relations as an Indicator of the Structure of Text.” In Computational Linguistics, 1991, Vol. 17, n° 1, pp. 21-48.

Musacchio, M.T. “The Distribution of Information in LSP Translation: A Corpus Study of Italian.” In Ahmad, K., & Rogers, M. (eds.), Evidence-based LSP: Translation, Text and Terminology. Frankfurt am Main: Peter Lang, 2007, pp. 97-117.

Muto, K. “The Use of Lexical Cohesion in Reading and Writing.” In Journal of School of Foreign Languages, 2006, Vol. 30, pp. 107-129.

Neubert, A., & Shreve, G.M. Translation as Text. Kent, OH: Kent State University Press, 1992.

Newmark, P. A Textbook of Translation. Hemel Hempstead: Prentice Hall, 1988.

Nord, C. Text Analysis in Translation: Theory, Methodology, Didactic Application of a Model for a Translation-oriented Text Analysis. Amsterdam & New York: Rodopi, 2005.


Okumura, M., & Honda, T. “Word Sense Disambiguation and Text Segmentation Based on Lexical Cohesion.” In Proceedings of COLING-94, 1994, pp. 755-761.

Olohan, M. Introducing Corpora in Translation Studies. London: Routledge, 2004.

Peirce, C.S. Collected Papers. Cambridge, MA: Harvard University Press, 1933.

Roos, D. “Translation Features in a Comparable Corpus of Afrikaans Newspaper Articles.” In Stellenbosch Papers in Linguistics PLUS, Vol. 38, 2009, pp. 73 – 83.

Rosenblatt, L.M. “Writing and Reading: The Transactional Theory.” In Mason, J. Reading and Writing Connections. Newton, MA: Allyn & Bacon, 1989.

Salkie, R. Text and Discourse Analysis. London/New York: Routledge, 1995.

Sanchez Escobar, A.F. “Teaching Textual Cohesion Through Analyses of Defoe’s Moll Flanders and Swift’s Gulliver’s Travels.” In Cauce, Revista de Filología y su Didáctica, 1999, 22-23, pp. 557-570.

Scarpa, F. “Corpus-based Quality-Assessment of Specialist Translation: A Study Using Parallel and Comparable Corpora in English and Italian.” In Gotti, M., & Šarčević, S. (eds.), Insights into Specialized Translation. New York/Oxford: Peter Lang, 2006, pp. 155 – 172.

Schiffrin, D. Approaches to Discourse. Oxford: Blackwell, 1994.

Séguinot, C. “Pragmatics and the Explicitation Hypothesis.” In TTR, 1988, Vol. 1, n° 2, pp. 106-113.

Shreve, G. “Knowing Translation: Cognitive and Experiential Aspects of Translation Expertise from the Perspective of Expertise Studies.” In Riccardi, A. (ed.), Translation Studies: Perspectives on an Emerging Discipline. Cambridge: Cambridge University Press, 2002, pp. 150 – 171.

Sinclair, J. Corpus, Concordance, Collocation. Oxford: Oxford University Press, 1991.

Sinclair, J. McH. Trust the Text: Language, Corpus and Discourse. London: Routledge, 2004.

Singh, R. “Contrastive Textual Cohesion.” Montreal, Université de Montréal. Unpublished.

Steiner, E. “Explicitation, its Lexicogrammatical Realization, and its Determining Variables - Towards an Empirical and Corpus-based Methodology.” In SPRIKreport, no. 36, Dec. 2005.

Stubbs, M. “British Traditions in Text Analysis: From Firth to Sinclair.” In Baker, M., Francis, G., & Tognini-Bonelli, E. (eds.), Text and Technology: In Honour of John Sinclair. Amsterdam: John Benjamins, 1993, pp. 1 – 36.

Stubbs, M. Text and Corpus Analysis. Computer-assisted Studies of Language and Culture. Oxford and Cambridge, Mass.: Blackwell, 1996.

Stubbs, M. “Computer-assisted Text and Corpus Analysis: Lexical Cohesion and Communicative Competence.” In Schiffrin, D., Tannen, D., & Hamilton, H.E. (eds.), The Handbook of Discourse Analysis. Malden, MA/Oxford: Blackwell, 2001, pp. 304-320.

Stoddard, S. Text and Texture: Patterns of Cohesion. Norwood, NJ: Ablex Publishing Corporation, 1991.

Swales, John M. Genre Analysis: English in Academic and Research Settings. Cambridge: Cambridge University Press, 1990.

Taylor, C. “What is Corpus Linguistics: What the Data Says.” In ICAME Journal, 32/2008, pp. 179 – 200.

Teich, E., & Fankhauser, P. “WordNet for Lexical Cohesion Analysis.” In Sojka, P., Pala, K., Smrz, P., Fellbaum, C., & Vossen, P. (eds.), Proceedings of the 2nd Global WordNet Conference, Masaryk University, Brno, Czech Republic, Jan. 2004, pp. 326-331.

Thompson, G. Introducing Functional Grammar. London: Edward Arnold, 1996.

Thompson, G., & Hunston, S. System and Corpus: Exploring Connections. London: Equinox, 2006.

Tirkkonen-Condit, Sonja & Jukka Mäkisalo. “Cohesion in Subtitles: A Corpus-based Study.” In Across Languages and Cultures, 2007, Vol. 8, n° 2, pp. 221-230.

Tognini-Bonelli, E. Corpus Linguistics at Work. Amsterdam: John Benjamins, 2001.

Tsai, Y. “Text Analysis of Patent Abstracts.” In The Journal of Specialized Translation, 13/2010, pp. 61 – 80.

Valdés, C., & Fuentes Luque, A. “Coherence in Translated Television Commercials.” In European Journal of English Studies, 2008, Vol. 12, n° 2, pp. 133-148.


Vanderauwera, R. Dutch Novels Translated into English: The Transformation of a “Minority” Literature. Amsterdam: Rodopi, 1985.

Vinay, J.P., & Darbelnet, J. Stylistique Comparée du Français et de l’Anglais. Méthode de Traduction. Paris: Didier, 1958.

Widdowson, H.G. Teaching Language as Communication. Oxford: Oxford University Press, 1978.

Xuanmin, L. “A Textual-Cognitive Model for Translation.” In Perspectives, 2003, Vol. 11, n° 1, pp. 73-79.

Webography:

Scott, M. “WordSmith Tools.” Lexically.net, Version 5.0, 2010. Accessed July 15th, 2011.

Translated.net Labs. Accessed September 5th, 2011.

Université de Neuchâtel. Accessed September 20th, 2011.

WordNet: A Lexical Database for English. Princeton University. Accessed September 20th, 2011. wordnetweb.princeton.edu

MultiWordNet. Fondazione Bruno Kessler. Accessed September 20th, 2011.