Recovery, Convergence and Documentation of Languages
Zaytsev, V.V.

VU Research Portal

Recovery, Convergence and Documentation of Languages
Zaytsev, V.V.
2010

Document version: Publisher's PDF, also known as Version of record

Citation for published version (APA):
Zaytsev, V. V. (2010). Recovery, Convergence and Documentation of Languages.

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

  • Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
  • You may not further distribute the material or use it for any profit-making activity or commercial gain.
  • You may freely distribute the URL identifying the publication in the public portal.

Take down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

E-mail address: [email protected]

Download date: 25 Sep. 2021

VRIJE UNIVERSITEIT

Recovery, Convergence and Documentation of Languages

ACADEMIC DISSERTATION

for the degree of Doctor at the Vrije Universiteit Amsterdam, by authority of the rector magnificus prof.dr. L.M. Bouter, to be defended in public before the doctoral committee of the Faculty of Sciences on Wednesday 27 October 2010 at 15:45 in the auditorium of the university, De Boelelaan 1105

by Vadim Valerievich Zaytsev
born in Rostov-on-Don, Russia

Supervisors:
prof.dr. R. Lämmel
prof.dr. C. Verhoef

This research has been sponsored by the Dutch Organisation of Scientific Research via: NWO 612.063.304 LPPR: Language-Parametric Program Restructuring

Acknowledgements

Working on a PhD is supposed to be an endeavour completed in seclusion, but in practice one cannot survive without the help and support of others, fruitful scientific discussions, collaborative development of tools and papers, and valuable pieces of advice.

My work was supervised by Prof. Dr. Ralf Lämmel and Prof. Dr. Chris Verhoef, who often believed in me more than I did and were always open to questions and ready to give expert advice. They have made my development possible.

My LPPR colleagues (Jan Heering, Prof. Dr. Paul Klint, Prof. Dr. Mark van den Brand) have been a rare yet useful source of new research ideas.

All thesis reading committee members have dedicated a lot of attention to my work and delivered exceptionally useful feedback at the late stage of the research: Prof. Dr. Jean Bézivin, Dr. Jean-Marie Favre, Prof. Dr. Willem Jan Fokkink, Prof. Dr. Paul Klint, Dr. Steven Klusener.

I am also grateful to Cor-Paul Bezemer and Toon Verwaest, who provided proofreading and correcting services for the Dutch part of this thesis.

There have been a lot of insightful discussions in the rooms and hallways of the Vrije Universiteit with Dr. Niels Veerman, Ernst-Jan Verhoeven, Łukasz Kwiatkowski and Johan Vincent de Vries.

I would like to thank my family, who backed me up with complete support and encouragement through the years of research, especially my mother, Dr.ir. Liudmila Zaytseva; my grandmother, Dr. Svetlana Bocheva; my grandfather, Prof. Dr.ir. Alexander Bochev; my uncle, Dr. Michael Bochev; and my godfather, Prof. Dr. Yuri Bashmakov, MD.

My close friends' understanding, respect and interest in my work were also among the most important things that kept me going: Dr. Alexander Gufan, Dr. Stanislav Tsykavy and Stanislav Rezhabek.

I have also been saved many times from depression and writer's block by good music. I cannot name all the artists responsible for that, but the most credit goes to Huddie Ledbetter, William Broonzy, Fulton Allen, Thomas McClennan and Bruce Springsteen.

Contents

Acknowledgements
Contents
  List of Tables
  List of Figures
  List of Listings
1 Introduction
  1.1 Research context
  1.2 Motivation and objectives
  1.3 Example scenario
  1.4 Thesis outline and contributions
    1.4.1 Chapter 2 overview: additional background
    1.4.2 Chapter 3 overview: case study on language recovery
    1.4.3 Chapter 4 overview: language convergence
    1.4.4 Chapter 5 overview: case study on recovery and convergence
    1.4.5 Chapter 6 overview: language documentation
    1.4.6 Chapter 7 overview: XBGF language manual
2 Additional background
  2.1 Terminology
  2.2 Grammarware
  2.3 Techniques for grammars
  2.4 Language evolution: versions and dialects
  2.5 Grammar levels
  2.6 Grammar recovery methodology
  2.7 Grammar definition formalism
  2.8 Grammar idiosyncrasies and parsing technology
  2.9 Grammarware and tool generation
  2.10 Language documentation qualities
  2.11 Standardisation bodies
  2.12 Languages used in the thesis
  2.13 Transformations used in the thesis
3 Case study on recovery
  3.1 Contributions
  3.2 Semi-automated recovery of C# grammar
    3.2.1 Step 1: Obtaining the standard
    3.2.2 Step 2: Extracting the grammar
    3.2.3 Step 3: Fixing misprints
    3.2.4 Step 4: Completing a formal part
    3.2.5 Step 5: Relaxation
    3.2.6 Step 6: Removing idiosyncrasies from the grammar
    3.2.7 Step 7: Resolving conflicts
    3.2.8 Step 8: Improving the grammar
    3.2.9 Step 9: Generating the parser
  3.3 Proposed solution generalisation and evaluation
  3.4 Conclusion
    3.4.1 Discussion on the method automation
    3.4.2 Research objectives revisited
4 Language convergence
  4.1 Motivation
  4.2 Contributions
  4.3 The domain
    4.3.1 Sources of convergence
    4.3.2 Targets of convergence
    4.3.3 BGF: BNF-like Grammar Format
  4.4 Grammar extraction
    4.4.1 Abstraction by extraction
    4.4.2 Grammar extractors
  4.5 Grammar comparison
  4.6 Grammar transformation
  4.7 Convergence process
  4.8 Programmable grammar transformations
    4.8.1 Transformation properties
    4.8.2 Grammar refactoring
    4.8.3 Grammar editing
  4.9 Transformation generators
  4.10 Language Convergence Infrastructure
    4.10.1 Main configuration elements
    4.10.2 Shortcuts
    4.10.3 Generators
    4.10.4 Sources
    4.10.5 Targets
    4.10.6 Phases
    4.10.7 Test sets
    4.10.8 Tools
    4.10.9 xstring
  4.11 Related work
    4.11.1 Interoperability
    4.11.2 Testing grammarware
    4.11.3 Generators and synchronisers
    4.11.4 Grammar recovery
    4.11.5 Grammar transformation
    4.11.6 Grammar convergence
  4.12 Concluding remarks
5 Case study on recovery and convergence
  5.1 Java is not syntax-safe, apparently
  5.2 Contributions
  5.3 The JLS corpus
    5.3.1 JLS1
    5.3.2 JLS2
    5.3.3 JLS3
    5.3.4 Grammar data
  5.4 Automated grammar extraction
    5.4.1 Assumed grammar format
    5.4.2 Phase 1: Preprocessing
    5.4.3 Phase 2: Error recovery
    5.4.4 Phase 3: Removal of doubles
    5.4.5 Phase 4: Precise parsing
    5.4.6 Extraction data
  5.5 The convergence graph
  5.6 Grammar transformation
    5.6.1 Semantics-preserving operators
    5.6.2 Semantics-in/decreasing operators
    5.6.3 Semantics-revising operators
    5.6.4 Grammar refactoring
  5.7 Grammar convergence phases
    5.7.1 Preparation phase: semantic error recovery
    5.7.2 Preparation phase: fixing known bugs
    5.7.3 Preparation phase: initial correction
    5.7.4 Nominal matching phase
    5.7.5 Structural matching phase
    5.7.6 Resolution phase: extension
    5.7.7 Resolution phase: relaxation
    5.7.8 Resolution

Details

  • File Type
    pdf
  • Content Languages
    English
  • File Pages
    272 pages
