Term Extraction Process

www.sdl.com

SDL MultiTerm Extract 2017 Tools Guide
MultiTerm Extract Tools User Guide
SDL MultiTerm 2017
November 2016

Legal notice

Copyright and trademark information relating to this product release.

Copyright © 2000–2016 SDL Group.

SDL Group means SDL PLC. and its subsidiaries and affiliates. All intellectual property rights contained herein are the sole and exclusive rights of SDL Group. All references to SDL or SDL Group shall mean SDL PLC. and its subsidiaries and affiliates details of which can be obtained upon written request.

All rights reserved. Unless explicitly stated otherwise, all intellectual property rights including those in copyright in the content of this website and documentation are owned by or controlled for these purposes by SDL Group. Except as otherwise expressly permitted hereunder or in accordance with copyright legislation, the content of this site, and/or the documentation may not be copied, reproduced, republished, downloaded, posted, broadcast or transmitted in any way without the express written permission of SDL.

SDL MultiTerm is a registered trademark of SDL Group. All other trademarks are the property of their respective owners. The names of other companies and products mentioned herein may be the trademarks of their respective owners. Unless stated to the contrary, no association with any other company or product is intended or should be inferred.

This product may include open source or similar third-party software, details of which can be found in the Acknowledgments.

Although SDL Group takes all reasonable measures to provide accurate and comprehensive information about the product, this information is provided as-is and all warranties, conditions or other terms concerning the documentation whether express or implied by statute, common law or otherwise (including those relating to satisfactory quality and fitness for purposes) are excluded to the extent permitted by law.

To the maximum extent permitted by law, SDL Group shall not be liable in contract, tort (including negligence or breach of statutory duty) or otherwise for any loss, injury, claim liability or damage of any kind or arising out of, or in connection with, the use or performance of the Software Documentation even if such losses and/or damages were foreseen, foreseeable or known, for: (a) loss of, damage to or corruption of data, (b) economic loss, (c) loss of actual or anticipated profits, (d) loss of business revenue, (e) loss of anticipated savings, (f) loss of business, (g) loss of opportunity, (h) loss of goodwill, or (i) any indirect, special, incidental or consequential loss or damage howsoever caused.

All Third Party Software is licensed "as is." Licensor makes no warranties, express, implied, statutory or otherwise with respect to the Third Party Software, and expressly disclaims all implied warranties of non-infringement, merchantability and fitness for a particular purpose. In no event will Licensor be liable for any damages, including loss of data, lost profits, cost of cover or other special, incidental, consequential, direct, actual, general or indirect damages arising from the use of the Third Party Software or accompanying materials, however caused and on any theory of liability. This limitation will apply even if Licensor has been advised of the possibility of such damage. The parties acknowledge that this is a reasonable allocation of risk.
Information in this documentation, including any URL and other Internet Web site references, is subject to change without notice. Without limiting the rights under copyright, no part of this may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of SDL Group.

Contents

1 Legal notice
2 Using statistical terminology extraction
    About statistical terminology extraction
    Statistical extraction using SDL MultiTerm Extract
3 About SDL MultiTerm Extract
    Statistical extraction features in SDL MultiTerm Extract
    Supported languages
    Supported file formats
    Export formats
    Related documentation
    Accessing documentation
4 Working with SDL MultiTerm Extract
    SDL MultiTerm Extract projects
    Project types
    About Translation projects
    About QA projects
    Overview of the Term Extraction process
    Overview of the Term Extraction Process for QA Projects
    Creating a project
    Before you begin
    Saving a project
    Importing terms into translation projects
    Extracting terms
    Automatically extracting terms
    Manually extracting terms
    Text window
    SDL MultiTerm Extract Interface
    Term window
    Sorting Terms
    Deleting Terms
    Term Properties window
    Global operation buttons
    Term and translation information
    Definition box
    Note box
    Context information
    Concordance window
    Adding or deleting terms in the Concordance window
    Formatting terms in the Concordance window
    Validating extracted terms
5 Exporting terms in SDL MultiTerm Extract
    About exporting extracted terms
    Export definitions
    Before you begin
    Exporting terms
    Exporting using export definitions
    Exporting using an existing export definition
    Exporting using a new export definition
    Exporting suspect sentences in a QA project
    Using filters when exporting
    Creating a filter
    Editing a filter
    Removing a filter
6 SDL MultiTerm Extract reference
    Settings
    Preferences
    Project settings
    Add/Remove files tab
    Exclude tab
    To set exclude lists
    Defining the learning settings
    Term Extraction tab
    Dictionary Compilation tab
    Translation tab
    File format settings
    Selecting a settings file
    QA Statistics
    Document quality
    Termbase quality
    Keyboard shortcuts
7 Acknowledgments

1 Using statistical terminology extraction

About statistical terminology extraction

Welcome to the SDL MultiTerm Extract Tools Guide. This guide explains the method of statistical terminology extraction using MultiTerm Extract. This method allows you to extract candidate terms to add to a new or existing termbase. It reduces the time and effort involved in generating a termbase by automating the process in which terms are identified and entered into a central repository for review and approval. This is especially useful when creating a termbase from large volumes of documents or from new information that is constantly being updated.

Statistical extraction using SDL MultiTerm Extract

SDL MultiTerm Extract uses a statistical extraction method to determine how frequently candidate terms appear.
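To make the idea of frequency-based extraction concrete, the following is a minimal sketch in Python. It is not SDL MultiTerm Extract's actual algorithm, which this guide does not describe; it only illustrates the general technique of counting how often words and short phrases recur and keeping the frequent ones as term candidates. The function name `candidate_terms`, the parameters `max_len` and `min_freq`, and the small stop-word list are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "from", "are"}


def candidate_terms(text, max_len=3, min_freq=2):
    """Return (phrase, count) pairs for word n-grams seen at least min_freq times."""
    counts = Counter()
    # Split into sentences first so that no phrase spans a sentence boundary.
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z][a-z-]*", sentence)
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Skip phrases that start or end with a stop word.
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    ranked = [(phrase, c) for phrase, c in counts.items() if c >= min_freq]
    # Most frequent first; on ties, longer phrases first, then alphabetical.
    ranked.sort(key=lambda pc: (-pc[1], -len(pc[0].split()), pc[0]))
    return ranked


sample = (
    "The termbase stores approved terms. "
    "Export the term candidate list to the termbase. "
    "Review the term candidate list before export."
)

# Print the candidates as tab-separated count and phrase.
for phrase, count in candidate_terms(sample):
    print(f"{count}\t{phrase}")
```

On this sample, the phrase "term candidate list" ranks first because it occurs twice and is the longest phrase with that count. A list produced this way is only a starting point: as with any statistical method, the candidates still need human review before they are treated as terms.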
MultiTerm Extract extracts term candidates and, for multilingual termbases, their probable translations from sentences, incomplete sentence fragments and strings of code. Each extracted term is presented as a term candidate word or phrase. After running the extraction process, MultiTerm Extract displays a term candidate list. Following review and validation, term candidates become term lists.

In the MultiTerm Extract interface, you can quickly and easily validate terms. You can then use the multiple export options to incorporate extracted terms into your existing MultiTerm terminology management system. Terms can be exported to a MultiTerm termbase, MultiTerm XML format or a tab-delimited format. This allows, for example, translators to access those terms when using the Translator's Workbench termbases.

MultiTerm Extract supports all languages, including those of the Far East. If you are using a non-Western language, you must have support for the relevant language installed on your computer.

2 About SDL MultiTerm Extract

Statistical extraction features in SDL MultiTerm Extract

SDL MultiTerm Extract uses statistical extraction to extract term candidates and, for multilingual termbases, their probable translations, and presents them in a term candidate list. You can extract terms from monolingual or bilingual documents and from translation memories and then quickly and easily validate the terms. The various export options enable you to incorporate extracted terms into existing MultiTerm Extract termbases or to export terms to MultiTerm XML format or to tab-delimited format. You can also check for terminology consistency between the termbase and a translated file.

MultiTerm Extract allows you to do the following:

• Create term candidate lists from monolingual documents.
• Create bilingual term candidate lists from bilingual file formats.
• Update existing multilingual termbases by creating new translations for terms already stored in MultiTerm.
• Analyze the quality of termbases and documents by comparing the terms found in documents with those stored in the termbase.
• Compile bilingual dictionaries from bilingual file formats.
• Manually extract terms from the source document(s) to complement and enhance the automatic term extraction process.
• Export extracted terms to MultiTerm, MultiTerm XML or a tab-delimited format.

Supported languages

SDL MultiTerm.

