7/7/2015

Translators use language that is not their own

Advantages and Challenges of Using Corpora in Translation Practice

Ana Frankenberg-Garcia


Query a corpus!

Not enough?


Why should translators use corpora?

• Instant access to combined intuitions of dozens (or hundreds) of language users
  • Without informants having to think
  • Without translators having to bother informants
• Answers to questions that have not been documented in dictionaries, termbases and other edited resources
  • Equivalence
  • Terminology
  • Phraseology
  • Translation decisions

e.g. Aston 1999, Bowker & Pearson 2002, Zanettin et al. 2003, Beeby et al. 2009, Rodríguez-Inés 2010, Kubler 2011, Zanettin 2012, Gallego-Hernández 2015, etc.

I am looking forward to hearing from you…
I look forward to hearing from you…


Why should translators use corpora?

Business Letter Corpus
A collection of around 1 million words of U.S. and U.K. business letters compiled into a corpus by Professor Yasumasa Someya for his MA at the University of Tokyo
Free online access: http://www.someya-net.com/concordancer

looking forward to hearing


look forward to hearing


Interpreting corpus data

• Empirical evidence
• But no clear explanations and rules
• It’s up to users to interpret the data
• And results are only as good as the corpus

[Chart comparing looking forward to… and look forward to… in the Business Letter Corpus and in BNC spoken]


Interpreting corpus data

• Interpretation is a matter of common sense
• Corpora are made of natural language
• They contain
  • Mistakes
  • Non-standard usage
• What is interesting is what is conventional

What queries can be useful to translators?

Concordances?
Frequencies?
Collocations?
Word lists?


Concordances

Portuguese > Casada com um francês
Literal English translation > *Married with a Frenchman

Is this right?

BNC concordance query: married sort right
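A concordance query of this kind can be approximated in a few lines. The sketch below is only an illustration: the sentences are invented stand-ins for a corpus (they are not BNC data), and the function name `kwic` and its parameters are hypothetical. It lists each hit of the node word with its context and sorts the hits by what comes to the right, which is what makes a pattern such as "married to" stand out.

```python
import re

def kwic(sentences, node, width=30):
    """Return keyword-in-context lines for `node`, sorted by right context."""
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    hits = []
    for sentence in sentences:
        for m in pattern.finditer(sentence):
            # Keep at most `width` characters on each side of the node word.
            left = sentence[:m.start()].rstrip()[-width:]
            right = sentence[m.end():].lstrip()[:width]
            hits.append((left, m.group(0), right))
    # Sorting on the right context groups together what follows the node.
    hits.sort(key=lambda h: h[2].lower())
    return [f"{left:>{width}} [{kw}] {right}" for left, kw, right in hits]

# Invented sentences standing in for a real corpus.
toy_corpus = [
    "She was married to a Frenchman for ten years.",
    "They got married in 1985.",
    "He is married with two children.",
    "Her sister married a doctor.",
    "I have been married to him since June.",
]

lines = kwic(toy_corpus, "married")
for line in lines:
    print(line)
```

Note that "married with" does turn up in the toy data, but followed by children rather than a spouse. Telling such cases apart is the kind of interpretation that is left to the user.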


Concordances

English > They are very flexible nowadays.
Portuguese literal translation > Eles são muito flexíveis hoje em dia

Does this sound good?


Concordances

BNC concordance query (sentence view): nowadays
ptTenTen concordance query (sentence view): hoje em dia


Concordances

English > In my opinion
Literal French translation > *Dans mon opinion

Is this right?

Parallel concordances

EuroParl corpus concordance query: in my opinion, align with French


Frequencies

Which is better?

Portuguese > peixe assado
English translation > baked fish? roast fish? roasted fish?
enTenTen frequency query: baked|roast|roasted fish

Portuguese > preocupado
English translation > preoccupied, worried
BNC frequency query, distribution per text type: preoccupied, worried
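The idea behind a frequency query such as baked|roast|roasted fish can be sketched as follows. This is a minimal illustration under stated assumptions: the text is invented (not enTenTen data) and the helper `phrase_frequencies` is a hypothetical name. It counts how often each candidate modifier directly precedes the noun.

```python
import re
from collections import Counter

def phrase_frequencies(text, alternatives, noun):
    """Count how often each alternative directly precedes `noun`."""
    alt = "|".join(re.escape(a) for a in alternatives)
    pattern = re.compile(rf"\b({alt})\s+{re.escape(noun)}\b", re.IGNORECASE)
    return Counter(m.group(1).lower() for m in pattern.finditer(text))

# Invented text standing in for a corpus.
toy_text = (
    "We ordered baked fish and rice. The baked fish was excellent. "
    "Roast fish is less common on menus. She prefers baked fish to "
    "roasted fish. Roast beef, however, is everywhere."
)

counts = phrase_frequencies(toy_text, ["baked", "roast", "roasted"], "fish")
for modifier, n in counts.most_common():
    print(f"{modifier} fish: {n}")
```

Raw counts like these only rank the alternatives within one corpus. A real query would also need to consider what kind of texts the corpus contains.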


Collocations

enTenTen collocation query: opinion
What verbs can I use before opinion?
What adjectives can I use before opinion?

ptTenTen collocation query: opinião
How about opinião in Portuguese?
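Questions like "what verbs or adjectives can I use before opinion?" rely on part-of-speech information. The sketch below assumes the text has already been tagged. The toy data is hand-tagged with made-up one-letter tags, and `left_collocates` is a hypothetical helper, not a function of any corpus tool.

```python
from collections import Counter

def left_collocates(tagged_tokens, node, pos_prefix, window=3):
    """Count words with a given tag found up to `window` tokens before `node`."""
    counts = Counter()
    for i, (word, _tag) in enumerate(tagged_tokens):
        if word.lower() != node:
            continue
        for w, tag in tagged_tokens[max(0, i - window):i]:
            if tag.startswith(pos_prefix):
                counts[w.lower()] += 1
    return counts

# Hand-tagged toy data: V = verb, J = adjective, N = noun, X = other.
toy = [
    ("She", "X"), ("expressed", "V"), ("a", "X"), ("strong", "J"),
    ("opinion", "N"), (".", "X"),
    ("He", "X"), ("voiced", "V"), ("his", "X"), ("opinion", "N"), (".", "X"),
    ("They", "X"), ("expressed", "V"), ("their", "X"), ("honest", "J"),
    ("opinion", "N"), (".", "X"),
]

verbs = left_collocates(toy, "opinion", "V")
adjectives = left_collocates(toy, "opinion", "J")
print(verbs.most_common())
print(adjectives.most_common())
```

Corpus tools typically rank collocates with an association measure rather than raw counts. Raw counts are used here only to keep the sketch short.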


Collocations: subtle differences

Portuguese > elétrico
English translation > electric or electrical

Which one?

Collocation: subtle differences

Electric or electrical?


Collocation: subtle differences

Safety or security?

Collocation: looking for equivalence I

PT - dente
EN - tooth


Collocation: looking for equivalence II

Kilgarriff et al. (2013)

Collocation: looking for equivalence III

Navigating through bilingual word sketches (Kilgarriff et al. 2013)
Character / Personagem [character]
fictitious? principal? feminine?


Collocation: specialized vs non-specialized language

Goal

Verb collocates of goal in general English corpus (enTenTen)
• Accomplish
• Attain
• Reach
• Set
• Pursue
• Achieve
• Score
• Fulfill

Verb collocates of goal in DIY football corpus
• Score
• Disallow
• Award
• Concede

Word lists

Becoming acquainted with new terminology

EMEA English corpus compared with BNC via Sketch Engine:
mg, vial, ribavirin, olanzapine, excipients, ritonavir, expiry, pharmacist, administration, ml, insulin, dose, authorisation, syringe, warning, dosage

EMEA French corpus compared with frTenTen via Sketch Engine:
autorisation, affections, medicament, comprimé, insuline, hypoglycémie, olanzapine, ribavirine, hémoglobine, excipients, ritonavir, administration, authorisation, posologie, precautions, mg

Word lists

Becoming acquainted with new terminology

EMEA English corpus compared with BNC via Sketch Engine:
mg, vial, ribavirin, olanzapine, excipients, ritonavir, expiry, pharmacist, administration, ml, insulin, dose, authorisation, syringe, warning, dosage

Phraseology?

From terminology to phraseology

EMEA English corpus compared with BNC via Sketch Engine:
oral use, marketing authorisation, for injection, medicinal products, authorisation holder, film-coated tablets, special warning, batch number, expiry date, injection site, active substance, package leaflet, pre-filled syringe, special precautions, adverse reactions, hepatic impairment
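Word lists like these come from comparing a specialized corpus with a general reference corpus. The sketch below shows one simple way of doing that: a ratio of per-million frequencies with a smoothing constant. The formula, the `smoothing` parameter, the function name and the two toy "corpora" are all assumptions made for illustration, and may differ from the scoring used to produce the lists in the slides.

```python
from collections import Counter

def keywords(focus_tokens, reference_tokens, smoothing=1.0, top=5):
    """Rank words that are unusually frequent in the focus corpus."""
    focus, ref = Counter(focus_tokens), Counter(reference_tokens)
    n_focus, n_ref = len(focus_tokens), len(reference_tokens)
    scores = {}
    for word, freq in focus.items():
        fpm_focus = freq * 1_000_000 / n_focus    # per-million frequency
        fpm_ref = ref[word] * 1_000_000 / n_ref   # 0 if absent from reference
        scores[word] = (fpm_focus + smoothing) / (fpm_ref + smoothing)
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Invented mini-corpora: a "specialized" text and a "general" one.
focus = ("the dose of insulin is given by syringe and "
         "the dose is checked by the pharmacist").split()
reference = ("the cat sat on the mat and the dog is in "
             "the garden by the river").split()

top_words = keywords(focus, reference, top=3)
print(top_words)
```

Function words such as "the" are frequent in both corpora, so they score low. Domain words that are missing from the reference corpus rise to the top.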


From terminology to phraseology

Dose?
EMEA English corpus

Consistency

• 375 MB, 1552 pages
• Two-week deadline


Consistency

IPCC report 2013 – volume 1 – The Physical Science Basis
Huge source text, impossible deadline
Split translation

• Extract ST terminology
  • Aerosol
  • Tropospheric
  • Stratospheric
  • Cryosphere
  • Paleoclimate
  • Interannual variability
  • North Atlantic Oscillation (NAO)
  • Global mean surface temperature
  • Coupled Model Intercomparison Project Phase 5 (CMIP 5)
• Agree on TT terminology beforehand to ensure consistency among different translators
• Feed it to a terminology management system like SDL MultiTerm


Interpreting

Preparing for IPCC interpreting job

• Extract ST terminology
  • Aerosol
  • Tropospheric
  • Stratospheric
  • Cryosphere
  • Paleoclimate
  • Interannual variability
  • North Atlantic Oscillation (NAO)
  • Global mean surface temperature
  • Coupled Model Intercomparison Project Phase 5 (CMIP 5)
• Research TT equivalent terms
• Sight translate concordances with terms to prepare and practice (Xu 2015)

More

Populating translation memories with existing parallel corpora


Do translators use corpora?

Access to corpora today


Corpus compilation today

Thousands of digital texts available

Removing barriers is not enough


Awareness of corpora

• Survey by Bernardini (2006) – 623 responses, mostly UK and Europe
  • 42% never heard of corpora
• Survey by Gough (2013) – 540 responses, mostly EU
  • 62% Least used technology (compared with term bases, translation memories, glossaries, web searches)
• Survey by Gallego-Hernández (2015) – 526 responses, Spain
  • 50% never or almost never used corpora

Do translators want training?

• No pressure from the industry
  • Many translation jobs demand use of TM systems
  • Software developers/agencies/corporate clients profit
  • Corpora not mentioned
  • No immediate gains for big stakeholders
• Why bother?
  • don’t get cheaper
  • No obvious productivity gains
  • Quality
  • Reassurance
  • Flexibility and autonomy


Implementing corpora in translator education

modules often available, but…
• Focus on linguistics (not specific to translation)
• Corpora & translation mostly translation studies research (especially after Baker 1993)
• Not simple to teach about corpora for translation practice
• Especially in UK
• Students translate into and out of many languages
• Corpus instructors vs translation instructors
• General imbalance regarding corpora of different languages
• Even when available (e.g. BNC, CREA, DeReKo), interfaces and query languages differ

Corpora in translator education

• Is it worth the trouble?
• Optional corpus module for translation practice at Surrey
• Focus not on research, but on everyday translation
• 11 weeks, 22 hours
• 13 students
• Complicated multilingual setting
  French > English, English > German, Spanish > English, Russian > English, Portuguese > English, English > Chinese, German > English, English > Greek, English > Italian
• Task-based, consciousness-raising activities (Frankenberg-Garcia 2012)


Corpora in translator education

1. Guided tasks with English corpora
2. Students try out similar queries using corpora of other languages

• Main corpora used
  • BYU: COCA & BNC (Davies 2004, 2008)
  • OPUS collection (Tiedemann 2012)
    • EMEA, ECB, EuroParl, OpenSubtitles
  • Sketch Engine (Kilgarriff et al. 2004, 2014)
    • Via the same interface and using same query language, access to
      • BNC, OPUS collection
      • Huge webcrawled corpora in 60 languages (Jakubíček et al. 2013)
      • DIY corpora
        • Pre-selected texts
        • Crawl the web via WebBootCaT (Baroni et al. 2006)
        • TMX (parallel)

Reactions

Two types of data
• End-of-semester anonymous questionnaires
• End-of-semester written assignments


Questionnaire

Before the MA Translation

I had never heard of corpora before my MA.
12 true, 1 false

I had already used a corpus hands-on before I started my MA.
1 true, 12 false

Questionnaire

Self-assessment after completion of the module

I understand the strengths and limitations of different types of corpora.
3 strongly agree, 6 agree, 4 neither agree nor disagree, 0 disagree, 0 strongly disagree


Questionnaire

Self-assessment after completion of the module

I can carry out simple word queries to retrieve KWIC concordances.
4 strongly agree, 9 agree, 0 neither agree nor disagree, 0 disagree, 0 strongly disagree

I can carry out queries involving more than one word.
2 strongly agree, 11 agree, 0 neither agree nor disagree, 0 disagree, 0 strongly disagree


Questionnaire

Self-assessment after completion of the module

I understand the difference between looking up lemmas and looking up plain words.
4 strongly agree, 8 agree, 1 neither agree nor disagree, 0 disagree, 0 strongly disagree

I can use part-of-speech tags in my queries.
2 strongly agree, 8 agree, 1 neither agree nor disagree, 2 disagree, 0 strongly disagree


Questionnaire

Self-assessment after completion of the module

I can use corpora to retrieve information about collocation.
5 strongly agree, 8 agree, 0 neither agree nor disagree, 0 disagree, 0 strongly disagree

I am able to compare the frequencies of different words or combinations of words within a corpus.
4 strongly agree, 8 agree, 1 neither agree nor disagree, 0 disagree, 0 strongly disagree


Questionnaire

Self-assessment after completion of the module

I am able to use normalized frequencies to compare words across different corpora or sub-corpora.
1 strongly agree, 6 agree, 0 neither agree nor disagree, 3 disagree, 1 strongly disagree

I am able to build a simple corpus on my own.
5 strongly agree, 7 agree, 0 neither agree nor disagree, 0 disagree, 1 strongly disagree
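Normalized frequencies, the item students felt least sure about, come down to simple arithmetic: raw counts are scaled to a common base (here, per million words) so that corpora of different sizes can be compared. The sketch below uses hypothetical counts and corpus sizes, invented purely for illustration.

```python
def per_million(raw_count, corpus_size):
    """Normalized frequency: occurrences per million tokens."""
    return raw_count * 1_000_000 / corpus_size

# Hypothetical figures, invented for illustration only.
small = per_million(50, 1_000_000)        # 50 hits in a 1-million-word corpus
large = per_million(2_000, 100_000_000)   # 2,000 hits in a 100-million-word corpus

print(f"small corpus: {small} per million")
print(f"large corpus: {large} per million")
```

The larger corpus has forty times as many raw hits, yet the word is relatively more frequent in the smaller one. This is why raw counts should not be compared across corpora.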


Questionnaire

Opinion about corpus output

I find concordances helpful.
8 strongly agree, 5 agree, 0 neither agree nor disagree, 0 disagree, 0 strongly disagree

I find word lists & frequencies helpful.
7 strongly agree, 4 agree, 2 neither agree nor disagree, 0 disagree, 0 strongly disagree

I find collocation queries helpful.
10 strongly agree, 3 agree, 0 neither agree nor disagree, 0 disagree, 0 strongly disagree

Questionnaire

Present uses of corpora outside module

I use corpora to help me when I am writing in my native language.
0 very often, 2 often, 6 sometimes, 3 rarely, 2 never

I use corpora to help me when I am writing in a language that is not my native language.
3 very often, 5 often, 3 sometimes, 2 rarely, 0 never


Questionnaire

Present uses of corpora outside module

I use corpora to help me with my translation assignments.
1 very often, 5 often, 6 sometimes, 0 rarely, 0 never

I use corpora for other purposes.
0 very often, 2 often, 3 sometimes, 4 rarely, 4 never

Questionnaire

Future uses of corpora

I am likely to look things up in corpora during my translation exams or for writing my MA dissertation.
3 strongly agree, 7 agree, 2 neither agree nor disagree, 1 disagree, 0 strongly disagree

I am likely to carry on using corpora in the future in my work as a translator.
6 strongly agree, 6 agree, 1 neither agree nor disagree, 0 disagree, 0 strongly disagree


Student assignments

• Graded piece of assessment about student uses of corpora in translation with examples
• 3000 word limit (excluding references)
• Corpus of 47,123 running words
• Overall picture
  • Essays read from beginning to end
• Detailed analysis

Student assignments: corpus analysis

Search terms (lemmas) | Concept mentioned | Concept used
frequency, hit, occurrence, token | 13 | 13
concordance | 12 | 13
lemma | 5 | 13
collocation, collocate, word sketch | 12 | 12
word/frequency list | 7 | 7
keyword list, keyness | 6 | 5
part-of-speech, part of speech, pos, tag | 8 | 3
relative/normali(zs)ed frequency | 2 | 2


Student assignments: more details

Translation decisions
• English > Chinese technical translation
• Student not sure whether translation of mRNA should keep the English form mRNA or use the Chinese form 信使RNA
• Looked up frequencies in zhTenTen corpus
  • mRNA – 1674
  • 信使RNA – 106
• Corpus helped her decide to use English form

Student assignments: more details

Collocation
• French > English translation of victoire totale [total victory]
• Looked up collocates of victory in the BNC
• Unhelpful results
  • Labour/Conservative/great victory
• Reformulated query so as to look up synonyms of total in the context of victory
  • final/outright/complete/conclusive victory
• Immediately spotted what she considered to be best option in context of translation: outright victory


Student assignments: more details

Collocation
• Spanish > English business translation
• Student not sure how to translate cuadro macroeconómico into English
• Tried out several concordance queries in enTenTen
  • Macroeconomic picture – 67
  • Macroeconomic projection – 35
  • Macroeconomic prediction – 9
• Chose to use literal translation but not too happy

Student assignments: more details

Collocation
Had student tried a collocation query to identify which nouns follow macroeconomic…
Source: enTenTen corpus


Student assignments: more details

Terminology
• German > English business translation
• Compiled a small corpus of different types of companies to become more familiar with terminology in this area
• Used her DIY corpus to look up terms with liability and came up with
  • joint liability
  • non-current liabilities
  • interest-bearing liabilities, etc.
• Added terms to her glossary of business terminology

Student assignments: more details

Translation decisions
• Russian > English translation of short story
  • Ира - это не девушка – а мальчик
  • [Ira – not a girl - but a boy]
• ruTenTen: Ira very common and always a female
• BNC: Ira used mostly for Irish Republican Army, occasionally for male name
• COCA: Ira usually a man’s name
• Decided to make the translation more explicit for English readers
  • Ira was not, as the name seemed to imply, a girl, but a boy.


Student assignments: more details

Opinions about parallel corpora

“Of all the types of corpora available, parallel are undoubtedly the easiest for translators to draw conclusions from because the necessary information can be accessed immediately and terms can be directly compared to their equivalents in another language”

“The parallel corpus often produced few results.”

Only a very small part of what people in general say or write ever gets to be translated, which seriously limits the number and types of texts available for the compilation of parallel corpora. Indeed, this is one of the main reasons why parallel corpora are usually much smaller in scale than monolingual corpora.
Frankenberg-Garcia (2009: 60)

Student assignments: more details

Opinions about DIY corpora

“Although my corpus was put together in only a matter of minutes, it still allowed me to study terminology and phraseology related to astronomy in a reasonable amount of depth”

“I find that compiling corpora is more suitable for researchers, linguists and teachers, rather than translators and interpreters.”


Student assignments: more details

Opinions about learning to use corpora

“The translator spend a huge amount of time familiarise him or her with the tool and then spend extra effort on mastering the code and tag language these things, but he or she may never use some of the functionalities in a corpus”

“Overall, it has been a useful resource but has been limited by my relative inexperience of applying the available functions and occasional searches taking too long”

“the use of corpora […] takes some time to get used to but has proved to be a good resource for translation practice”.

Student assignments: more details

Opinions about using corpora

“One thing that can make using corpora time-consuming is that once concordances are begun, in my experience, I can find myself looking further and often find interesting things out that I wasn’t looking for in the first place, which isn’t necessarily a negative observation.”

“Compared to dictionaries, they [corpora] offer translators with extensive genuine examples in various contexts, thus can be a powerful complementary tool for understanding the usage of language. However, it is also noted that translators should be careful with their own interpretations for data presented by corpora and examine the reliability of some examples in corpora before making further analysis.”

“Although corpus is highly informative, it is no substitute for other authoritative resources like dictionaries. A better solution would be to combine them both and utilise the advantages of both.”


Student assignments: more details

Opinions about the usefulness of corpora

“my comparable [DIY] corpora saved me time and effort.”

“unexpected insights on the native language […] a precious resource especially in regards with working into a non-native language, in this case English, during the writing of essays.”

“Producing an authentic-sounding TT is, however, especially difficult when you are working out of your native language and I therefore found corpora to be especially useful when translating a text about an Aztec artefact from English into my non-native language, German”

Conclusion

• Students not power users of corpora
  • Some much better than others
• Could do with a lot more training
  • POS queries
  • Normalized frequencies
  • Extracting terminology
  • Researching phraseology
• But were able to successfully carry out many corpus queries for which online dictionaries, glossaries and web searches and other more conventional resources wouldn’t have provided satisfactory answers
• As with any new technology, it is likely that the more they use corpora the better they will be able to use them


More details

Frankenberg-Garcia, A. (forthcoming, 2015) Training translators to use corpora hands-on: challenges and reactions by a group of 13 students at a UK university. Corpora, 10/2.

References

Aston, G. (1999) ‘Corpus use and learning to translate’. Textus, 12, pp. 289-314.
Aston, G. (2009) Foreword. In Beeby, A., Rodríguez, P. & Sánchez-Gijón, P. (eds.), ix-x.
Baker, M. (1993) ‘Corpus linguistics and translation studies. Implications and applications’, in M. Baker, G. Francis and E. Tognini-Bonelli (eds) Text and Technology: In Honour of John Sinclair. Amsterdam and Philadelphia: John Benjamins, 233-250.
Baroni, M., Kilgarriff, A., Pomikalek, J. & Rychly, P. (2006) ‘WebBootCaT: Instant domain-specific corpora to support human translators’. Proceedings of EAMT-2006, pp. 247-252.
Beeby, A., Rodríguez, P. & Sánchez-Gijón, P. (2009) (eds.) Corpus use and translating. Amsterdam and Philadelphia: John Benjamins.
Bernardini, S. (2006) Corpora for Translation Education and Translation Practice: Achievements and Challenges. Proceedings of the Third International Workshop on Language Resources for Translation Work, Research & Training (LR4Trans-III). Available online at: http://www.ifi.unizh.ch/cl/yuste/LR4Trans-III/materials/silvia.pdf
Bowker, L. and Pearson, J. (2002) Working with Specialized Language: a practical guide to using corpora. London: Routledge.

Davies, M. (2004) BYU-BNC. (Based on the British National Corpus from Oxford University Press). Available online at http://corpus.byu.edu/bnc/.
Davies, M. (2008) The Corpus of Contemporary American English: 450 million words, 1990-present. Available online at http://corpus.byu.edu/coca/.
Frankenberg-Garcia, A. (2009) ‘Compiling and Using a Parallel Corpus for Research in Translation’. International Journal of Translation, XXI, 1, pp. 57-71.
Frankenberg-Garcia, A. (2012) Raising teachers’ awareness of corpora. Language Teaching, vol 45, 4, pp. 475-489.
Gallego-Hernández, D. (2015) ‘The use of Corpora as translation resources: a study based on a survey of Spanish professional translators’. Perspectives: Studies in Translatology, DOI 10.1080/0907676X.2014.964269.
Gough, J. (2013) Survey of professional translators’ use of on-line resources for terminology research. Unpublished interim PhD report, September 2013, University of Surrey.
Jakubíček, M., Kilgarriff, A., Kovář, V., Rychlý, P. & Suchomel, V. (2013) The TenTen corpus family. Paper presented at 7th International Corpus Linguistics Conference, Lancaster, July 2013.
Kilgarriff, A., P. Rychlý, P. Smrz and D. Tugwell (2004) The Sketch Engine. Proceedings of Euralex. Lorient, France.
Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P. & Suchomel, V. (2014) The Sketch Engine: ten years on. Lexicography, 1/1, 7-36.
Kubler, N. (2011) Working with Corpora for Translation Teaching in a French-speaking setting. In Frankenberg-Garcia, A., Flowerdew, L. and Aston, G. (eds) New Trends in Corpora and Language Learning. London: Continuum, 62-80.
Rodríguez-Inés, P. (2010) ‘Electronic Corpora and Other ICT (Information and communication technologies) tools: an integrated approach to translation teaching’. The Interpreter and Translator Trainer, 4, 2, pp. 251-282.
Tiedemann, J. (2012) Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC'2012), 2214-2218.
Xu, Ran (2015) Using comparable corpora for simultaneous interpreting preparation. Paper presented at the IV International Conference on Corpus Use and Learning to Translate (CULT), Alicante, 27-29 May 2015.
Zanettin, F., Bernardini, S. & Stewart, D. (2003) (eds.) Corpora in Translator Education. Manchester: St. Jerome.
Zanettin, F. (2012) Translation-Driven Corpora. Corpus Resources for Descriptive and Applied Translation Studies. Manchester: St. Jerome.


Advantages and Challenges of Using Corpora in Translation Practice

To Adam Kilgarriff

12 Feb 1960—16 May 2015
