Recovery, Convergence and Documentation of Languages Zaytsev, V.V

VU Research Portal

Recovery, Convergence and Documentation of Languages
Zaytsev, V.V.
2010

Document version: Publisher's PDF, also known as Version of record

Citation for published version (APA):
Zaytsev, V. V. (2010). Recovery, Convergence and Documentation of Languages.

VRIJE UNIVERSITEIT

Recovery, Convergence and Documentation of Languages

ACADEMIC DISSERTATION

to obtain the degree of Doctor at the Vrije Universiteit Amsterdam, by authority of the rector magnificus prof.dr. L.M. Bouter, to be defended in public before the doctoral committee of the Faculty of Exact Sciences on Wednesday 27 October 2010 at 15:45 in the auditorium of the university, De Boelelaan 1105

by Vadim Valerievich Zaytsev
born in Rostov-on-Don, Russia

Supervisors: prof.dr. R. Lämmel, prof.dr. C. Verhoef

This research has been sponsored by the Dutch Organisation of Scientific Research via: NWO 612.063.304 LPPR: Language-Parametric Program Restructuring

Acknowledgements

Working on a PhD is supposed to be an endeavour completed in seclusion, but in practice one cannot survive without the help and support of others, fruitful scientific discussions, collaborative development of tools and papers, and valuable pieces of advice.

My work was supervised by Prof. Dr. Ralf Lämmel and Prof. Dr. Chris Verhoef, who often believed in me more than I did and were always open to questions and ready to give expert advice. They have made my development possible. My LPPR colleagues (Jan Heering, Prof. Dr. Paul Klint, Prof. Dr. Mark van den Brand) have been a rare yet useful source of new research ideas.

All thesis reading committee members have dedicated a lot of attention to my work and delivered exceptionally useful feedback at the late stage of the research: Prof. Dr. Jean Bézivin, Dr. Jean-Marie Favre, Prof. Dr. Willem Jan Fokkink, Prof. Dr. Paul Klint, Dr. Steven Klusener. I am also grateful to Cor-Paul Bezemer and Toon Verwaest, who provided proofreading and correcting services for the Dutch part of this thesis. There have been a lot of insightful discussions in the rooms and hallways of the Vrije Universiteit with Dr. Niels Veerman, Ernst-Jan Verhoeven, Łukasz Kwiatkowski and Johan Vincent de Vries.

I would like to thank my family, who backed me up with complete support and encouragement through the years of research, especially my mother, Dr.ir. Liudmila Zaytseva; my grandmother, Dr. Svetlana Bocheva; my grandfather, Prof. Dr.ir. Alexander Bochev; my uncle, Dr. Michael Bochev; and my godfather, Prof. Dr. Yuri Bashmakov, MD. My close friends' understanding, respect and interest in my work were also among the most important things that kept me going: Dr.
Alexander Gufan, Dr. Stanislav Tsykavy and Stanislav Rezhabek.

I have also been saved many times from depression and writer's block by good music. I cannot name all the artists responsible for that, but the most credit goes to Huddie Ledbetter, William Broonzy, Fulton Allen, Thomas McClennan and Bruce Springsteen.

Contents

Acknowledgements
Contents
List of Tables
List of Figures
List of Listings
1 Introduction
  1.1 Research context
  1.2 Motivation and objectives
  1.3 Example scenario
  1.4 Thesis outline and contributions
    1.4.1 Chapter 2 overview: additional background
    1.4.2 Chapter 3 overview: case study on language recovery
    1.4.3 Chapter 4 overview: language convergence
    1.4.4 Chapter 5 overview: case study on recovery and convergence
    1.4.5 Chapter 6 overview: language documentation
    1.4.6 Chapter 7 overview: XBGF language manual
2 Additional background
  2.1 Terminology
  2.2 Grammarware
  2.3 Techniques for grammars
  2.4 Language evolution: versions and dialects
  2.5 Grammar levels
  2.6 Grammar recovery methodology
  2.7 Grammar definition formalism
  2.8 Grammar idiosyncrasies and parsing technology
  2.9 Grammarware and tool generation
  2.10 Language documentation qualities
  2.11 Standardisation bodies
  2.12 Languages used in the thesis
  2.13 Transformations used in the thesis
3 Case study on recovery
  3.1 Contributions
  3.2 Semi-automated recovery of C# grammar
    3.2.1 Step 1: Obtaining the standard
    3.2.2 Step 2: Extracting the grammar
    3.2.3 Step 3: Fixing misprints
    3.2.4 Step 4: Completing a formal part
    3.2.5 Step 5: Relaxation
    3.2.6 Step 6: Removing idiosyncrasies from the grammar
    3.2.7 Step 7: Resolving conflicts
    3.2.8 Step 8: Improving the grammar
    3.2.9 Step 9: Generating the parser
  3.3 Proposed solution generalisation and evaluation
  3.4 Conclusion
    3.4.1 Discussion on the method automation
    3.4.2 Research objectives revisited
4 Language convergence
  4.1 Motivation
  4.2 Contributions
  4.3 The domain
    4.3.1 Sources of convergence
    4.3.2 Targets of convergence
    4.3.3 BGF: BNF-like Grammar Format
  4.4 Grammar extraction
    4.4.1 Abstraction by extraction
    4.4.2 Grammar extractors
  4.5 Grammar comparison
  4.6 Grammar transformation
  4.7 Convergence process
  4.8 Programmable grammar transformations
    4.8.1 Transformation properties
    4.8.2 Grammar refactoring
    4.8.3 Grammar editing
  4.9 Transformation generators
  4.10 Language Convergence Infrastructure
    4.10.1 Main configuration elements
    4.10.2 Shortcuts
    4.10.3 Generators
    4.10.4 Sources
    4.10.5 Targets
    4.10.6 Phases
    4.10.7 Test sets
    4.10.8 Tools
    4.10.9 xstring
  4.11 Related work
    4.11.1 Interoperability
    4.11.2 Testing grammarware
    4.11.3 Generators and synchronisers
    4.11.4 Grammar recovery
    4.11.5 Grammar transformation
    4.11.6 Grammar convergence
  4.12 Concluding remarks
5 Case study on recovery and convergence
  5.1 Java is not syntax-safe, apparently
  5.2 Contributions
  5.3 The JLS corpus
    5.3.1 JLS1
    5.3.2 JLS2
    5.3.3 JLS3
    5.3.4 Grammar data
  5.4 Automated grammar extraction
    5.4.1 Assumed grammar format
    5.4.2 Phase 1: Preprocessing
    5.4.3 Phase 2: Error recovery
    5.4.4 Phase 3: Removal of doubles
    5.4.5 Phase 4: Precise parsing
    5.4.6 Extraction data
  5.5 The convergence graph
  5.6 Grammar transformation
    5.6.1 Semantics-preserving operators
    5.6.2 Semantics-in/decreasing operators
    5.6.3 Semantics-revising operators
    5.6.4 Grammar refactoring
  5.7 Grammar convergence phases
    5.7.1 Preparation phase: semantic error recovery
    5.7.2 Preparation phase: fixing known bugs
    5.7.3 Preparation phase: initial correction
    5.7.4 Nominal matching phase
    5.7.5 Structural matching phase
    5.7.6 Resolution phase: extension
    5.7.7 Resolution phase: relaxation
    5.7.8 Resolution
