The MT Developer/Provider and the Global Corporation
Terence Lewis¹ & Rudolf M. Meier²
¹ Hook & Hatton Ltd, ² Siemens Nederland N.V.

EAMT 2005 Conference Proceedings

Abstract. This paper describes the collaboration between the MT developer/service provider Terence Lewis (Hook & Hatton Ltd) and the Translation Office of Siemens Nederland N.V. (Siemens Netherlands). It involves the use of the developer’s Dutch-English MT software to translate technical documentation for divisions of the Siemens Group and for external customers. The use of this language technology has resulted in significant cost savings for Siemens Netherlands. The authors note the evolution of an ‘MT culture’ among their regular users.

1. Introduction

In 2002 Siemens Netherlands and Hook & Hatton Ltd signed an exclusive agreement on the marketing of Dutch-English MT services in the Netherlands. This agreement set the seal on an informal partnership that had evolved over the preceding six to seven years between Siemens and the Developer (Terence Lewis, trading through the private company Hook & Hatton Ltd). Under the contract, Siemens Netherlands undertakes to market the MT Services provided by the Developer to divisions within the Siemens Group and to prospective external customers.

This arrangement has proved to be a win-win situation for both Siemens Netherlands and the Developer. It has given Siemens Netherlands access to translation technology at a time when it has been involved in major cross-border infrastructure projects with consortium partners that need rapid access to usable English translations of Dutch technical and legal documentation (technical requirements, relevant legislation and regulations etc.). Siemens is responsible for the complete administration of the MT customers, and this set-up has provided the Developer with a stable stream of revenue enabling him to devote more time to further development of the software. This paper provides a mostly practical account of this relationship, first through the eyes of the Developer, and then from the perspective of the global corporation.

2. The Developer’s perspective

Under the current arrangement, the Developer (working from home in the United Kingdom) acts as the sole operator of an Automatic Translation Environment (Trasy), processing documents forwarded by e-mail by the Translation Office of Siemens Netherlands in The Hague. Ideally, these documents are in MS Word or rtf format, but the Translation Office also receives pdf files or OCR output files for automatic translation. After conversion to a machine-processable format, these files need to be cleaned up, otherwise the MT output will be markedly inferior to the agreed baseline. This clean-up is done by either party by arrangement.

The documents are processed and returned on the basis of an agreed delivery date. Short documents are often returned within the hour. They are not post-edited or revised line by line, but the Developer does quickly scan each page to spot any gross errors or omissions. Sentences containing gross errors are usually sent back to the MT system after further preparation work. Broadly speaking, users receive automatically generated documents. These are either edited by the end users themselves or used “as is” by engineers familiar with the technical background to the documents.

In addition to the run-of-the-mill jobs that pass through a corporate translation office, the automatic translation environment has been used to handle huge volumes of documentation from such major projects as the HSL-Zuid (the high-speed rail link between Amsterdam and Paris), the RandstadRail light-rail project in the Hague Region, and a huge construction project for an external customer, Shell International.

Trasy is built around a small LAN comprising 4 PCs:

• a Linux gateway to the Internet
• two Linux translation servers
• a translator workstation/client running under Windows 98

This configuration has been set up to facilitate the handling of a flow of jobs in an office environment. Of course, the number of translation servers and workstation/clients can be increased to meet any required number of translator/operators. All the dictionary building work is done on the translator workstation and the up-to-date lexical resources are then uploaded together with the files to be translated to the Linux servers. Typically, the Developer/Provider will start the terminology work for the next job while the current job is being processed. This client/server arrangement has been found to be the most practical one, but it is possible (although somewhat inconvenient) to put the whole set-up on a powerful notebook running either Linux (or Solaris) or Windows XP (or both) that functions as a mobile translation workstation.

The use of two translation servers by a single operator makes it easy to jump the queue to handle shorter jobs when a long job is already running. It also enables the Developer to run the same job simultaneously on two successive builds of the software, thus providing a simple mechanism for observing the success or failure of intended software improvements. This is vital since the Dutch-English MT application, written entirely in Java, is seen as a dynamic process: most large jobs will generate at least one improvement in the software. And the process of improvement has been customer-driven: mistakes that were tolerated six years ago are now pointed out with alarm.

3. Software tools

All jobs are processed using the already described approach combining Machine Translation and Translation Memory. This involves using a translation memory application as the final delivery tool and only sending to the MT engine sentences that are not already present in the translation memory database. The MT software produces a file that can be imported directly into the translation memory. This selection, extraction and integration is performed semi-automatically. Every job the Developer does is ultimately stored in the translation memory database. Figure 1 shows an example of a translation unit in an aligned file produced by the MT software. It should be noted that the MT software can also be used on a stand-alone basis, producing an html output file.

<TrU>
<Quality>98
<CrU>LEWIS!
<CrD>15042004,08:38:04
<Seg L=NL_NL>De hoofddraagconstructie is van staal.
<Seg L=EN-GB>The main supporting structure is of steel.
</TrU>

Figure 1: Translation unit produced by MT software

The following software tools are used in this translation environment:

• Resource Building Tools
• Dutch-English machine translation software
• Trados Translator’s Workbench

The Resource Building Tools, written by the Developer, include conventional single-word dictionary-building tools plus tools for building resources consisting of multiword expressions up to idiomatic translations of complete sentences. The utility called “ResourceBuilder” can produce useful sub-sentence segments from translation memory databases, aligned source/target files and even different language versions of web pages. It is the key factor in securing user acceptance of the automatically generated translations since many of the constituent parts of sentences are taken from “human translations”. The single-word dictionary and the Resources Repository are basically simple resource files (enhanced text files) which are only compiled at run-time. The Resources Repository contains all lexical items that are not single-word entries. Examples of entries in this repository are:

“Tot het werk behoort”, “<MF>The work includes</MF>”
(literal translation: “to the work belongs”)

“van den Berg”, “<N1 semType="proper_noun">van den Berg</N1>”
(Dutch surname, not “of the Mountain”!)

“in dit document wordt beschreven”, “<MF>this document describes</MF>”
(literal translation: “in this document is described”)

“Centrale Velsen”, “<N1>Velsen Power Plant</N1>”

“certificerende instantie van de opdrachtgever”, “<N1>client’s certification agency</N1>”
(literal translation: “certifying instance of the client”)

“Dit systeem is gelaagd opgebouwd”, “<MF>this system has been built up in layers</MF>”
(literal translation: “This system is layered built up”)

Figure 2: Examples of repository entries

The Dutch-English translation software is a modular application incorporating a hybrid approach to machine translation. Since the Developer believes that this approach is critical in achieving an acceptable quality level for Siemens Netherlands users, a few paragraphs will be devoted to the internal flow of the application.

... and, with sufficient data, can produce complete translations independently within limited domains. Figure 3 shows the translation of a complete sentence produced by mining the Resources Repository with one word being looked up in the single-word dictionary.

After this module has run, the sentence is then passed to a Rule-Based Machine Translation module which applies various known MT techniques to the sentence. These include the parsing of “clusters” or chunks of text, recognition of known patterns and techniques for achieving semantic disambiguation based on the identification of the subject-matter of the document. At this stage, for instance, the string “niet zal kunnen worden gekozen” would already be identified and tagged as a verbal phrase and the provisional translation “will not be able to be chosen” would be allocated for it (the word-for-word translation being “not will can be chosen”). The term “provisional” is used since other modules might “decide” to change this translation at a later stage. These various techniques are applied “sentence down”, i.e. the last series of rules (contained in an IssueResolver module) deals with single words. In the final part of the ‘run-time sequence’, the sentence is passed to the EnglishStyleEditor