Methods and Tools for Weak Problems of Translation Muhammad Ghulam Abbas Malik

To cite this version: Muhammad Ghulam Abbas Malik. Methods and Tools for Weak Problems of Translation. Computer Science [cs]. Université Joseph-Fourier - Grenoble I, 2010. English. tel-00502192

HAL Id: tel-00502192
https://tel.archives-ouvertes.fr/tel-00502192
Submitted on 13 Jul 2010

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Université de Grenoble

Thesis submitted for the degree of Doctor of the Université de Grenoble
Specialty: Computer Science ("Informatique")
Within the Doctoral School "Mathematics, Information Science and Technology, Computer Science" (École Doctorale "Mathématiques, Sciences et Technologie de l'Information, Informatique")

Presented and publicly defended by Muhammad Ghulam Abbas MALIK on 9 July 2010

Méthodes et outils pour les problèmes faibles de traduction
(Methods and Tools for Weak Problems of Translation)

Thesis supervised by
Mr. Christian BOITET, Thesis supervisor
Mr. Pushpak BHATTACHARYYA, Thesis co-supervisor

Jury
Mr. Arthur SOUCEMARIANADIN, President
Ms. Violaine PRINCE, Reviewer
Mr. Patrice POGNAN, Reviewer
Mr. Eric WEHRLI, Reviewer
Mr. Vincent BERMENT, Examiner
Mr. Alain DÉSOULIÈRES, Examiner
Mr. Christian BOITET, Supervisor
Mr. Pushpak BHATTACHARYYA, Co-supervisor

Prepared at the GETALP–LIG laboratory (CNRS–INPG–UJF–UPMF)

Table of Contents

Introduction

Part I. Scriptural Translation and other Weak Translation Problems

Chapter 1. Scriptural Translation
1.1. Scriptural Translation, Transliteration and Transcription
1.2. Machine Transliteration in Natural Language Processing (NLP)
1.2.1. Grapheme-based models
1.2.2. Phoneme-based models
1.2.3. Hybrid and correspondence-based models
1.3. Scriptural Translation
1.3.1. Challenges and barriers within the same language
1.3.1.1. Scriptural divide
1.3.1.2. Under-resourced and under-written languages
1.3.1.3. Absence of necessary information
1.3.1.4. Different spelling conventions
1.3.1.5. Transliterational or transcriptional ambiguities
1.3.2. Challenges and barriers within dialects
1.3.2.1. Distinctive sound inventories
1.3.2.2. Lexical divergence and translational ambiguities
1.3.3. Challenges and barriers between closely related languages
1.3.3.1. Characteristics
1.3.3.2. Under-resourced dialect or language pairs
1.4. Approaches for Scriptural Translation
1.4.1. Direct programming approaches
1.4.2. Finite-state approaches
1.4.3. Empirical, machine learning and SMT approaches
1.4.4. Hybrid approaches
1.4.5. Interactive approaches

Chapter 2. Scriptural Translation Using FSTs and a Pivot UIT
2.1. Scriptural Translation for Indo-Pak Languages
2.1.1. Scripts of Indo-Pak languages
2.1.1.1. Scripts based on the Persio-Arabic script
2.1.1.2. Indic scripts
2.1.2. Analysis of Indo-Pak scripts for scriptural translation
2.1.2.1. Vowel analysis
2.1.2.2. Consonant analysis
2.1.2.2.1. Aspirated consonants
2.1.2.2.2. Non-aspirated consonants
2.1.2.3. Diacritics analysis
2.1.2.4. Punctuations, digits and special characters analysis
2.1.3. Indo-Pak scriptural translation problems
2.2. Universal Intermediate Transcription (UIT)
2.2.1. What is UIT?
2.2.2. Principles of UIT
2.2.3. UIT mappings for Indo-Pak languages
2.3. Finite-state Scriptural Translation model
2.3.1. Finite-state transducers
2.3.1.1. Non-deterministic transducers
2.3.1.1.1. Character mappings
2.3.1.1.2. Contextual mappings
2.3.1.2. Deterministic transducers
2.3.2. Finite-state model architecture
2.4. Experiments and results
2.4.1. Test data
2.4.2. Results and discussion
2.5. Conclusion

Part II. Statistical, Hybrid and Interactive Models for Scriptural Translation

Chapter 3. Empirical and Hybrid Methods for Scriptural Translation
3.1. SMT Model for Scriptural Translation
3.1.1. Training data
3.1.2. Data alignments

Details

  • File Type: pdf
  • Content Languages: English
  • File Pages: 263
