On the Transformational Derivation of an Efficient Recognizer for Algol 68

Caspar Derksen
August 1995
Masters Thesis no. 357
Catholic University of Nijmegen

Preface

This work is the result of my graduation project at the Catholic University of Nijmegen. It has been supervised by Prof. C.H.A. Koster.

I should like to thank my parents, Donald, Cindy, Walter, Joyce, Berry, and others for their friendship and support while I was working on this thesis. Thanks go to Paula, Marco, Erik and others for enlightening life at the office during work. Special thanks go to Prof. C.H.A. Koster for awakening my interest in two level grammars and for his support and invaluable advice and patience.

Caspar Derksen
Nijmegen, August 1995

Abstract

During the sixties two level van Wijngaarden grammars (2vwg) were introduced for the formal definition of Algol 68. Two level van Wijngaarden grammars provide a formalism suited for writing rigorous formal definitions which read like human prose. Both the context free and context sensitive syntax of Algol 68 are defined in 2vwg. However, the specification is not executable.

The Extended Affix Grammar (eag) formalism is closely related to 2vwg. Any 2vwg can be simulated in eag. From a given eag an exact recognizer for the language defined by the eag can be derived automatically. Termination of such a recognizer is guaranteed if certain liberal well-formedness conditions are satisfied. However, in general some effort is necessary to enforce well-formedness.

cdl3 is a programming language which rides the borderline between affix grammars and programs. It can be seen as a deterministic version of eag. From a grammar in cdl3 efficient parsers can be generated.

Specifications in 2vwg and eag often are of a declarative nature. Our purpose is to derive efficient parsers from such a language specification in a transformational way.
We have generalized well-known transformation techniques for making context free grammars top down parsable to two level grammars. Furthermore, we have defined transformations for applying the well-known unfold/rearrange/fold technique to specifications which are formulated as a two level grammar. These transformations can be used for imposing stronger well-formedness conditions on a two level grammar and for operationalizing declarative constructs in the grammar.

As a case study, we have undertaken an attempt to derive a new specification for Algol 68 in eag by applying language preserving transformations to the original grammar. Secondly, we have tried to make the resulting eag deterministic in order to obtain an efficient recognizer in cdl3.

Contents

Preface
Abstract
1 Introduction
  1.1 The Problem
  1.2 Aims of the Present Work
  1.3 Outline of the Thesis
2 Two Level Grammars
  2.1 Two Level van Wijngaarden Grammars
    2.1.1 Formal Definition
    2.1.2 Properties of 2VWG
    2.1.3 Hyper-derivations and Conjugations
    2.1.4 Conjugation Grammars
    2.1.5 Transparent Two Level Grammars
  2.2 Extended Affix Grammars
    2.2.1 Formal Definition of EAG
    2.2.2 Properties of EAG
    2.2.3 Implementation
  2.3 CDL3
    2.3.1 Formal Definition
    2.3.2 Properties of CDL3
  2.4 Conclusions
3 Transformations on Two Level Grammars
  3.1 Semantical Foundations
    3.1.1 Equivalence of Grammars
    3.1.2 Equivalence of Rules
    3.1.3 Equivalence of Hypernotions
  3.2 Notational Conventions
    3.2.1 Notation of Transformation rules
    3.2.2 Notation of Derivations
  3.3 Basic Transformation Rules
    3.3.1 General Transformations
    3.3.2 Folding and Unfolding
    3.3.3 Rule Introduction and Pruning
    3.3.4 Predicate Introduction and Removal
    3.3.5 Left-factorization
    3.3.6 Left-recursion Removal
    3.3.7 Restriction
    3.3.8 Invariant Introduction and Removal
    3.3.9 Lifting and Sinking
    3.3.10 Embedding
    3.3.11 Change Order
    3.3.12 Symbolic Evaluation
    3.3.13 Renaming
    3.3.14 Transformation of the Metagrammar
  3.4 Further Transformation Rules
    3.4.1 Transformations using Equality Tests
    3.4.2 Transformations on Lists
  3.5 Sample Developments
    3.5.1 Concatenation of Lists
    3.5.2 Splitting of Lists
    3.5.3 Ravelling of Moods
  3.6 Conclusions
4 An Extended Affix Grammar for Algol 68
  4.1 Imaging 2VWG into EAG
    4.1.1 Strategy
  4.2 Deriving an EAG for Algol68
    4.2.1 Overview of the process
    4.2.2 Separation of the First and Second Level
    4.2.3 From Relations to Functions
    4.2.4 From Flat Domains to Tree Structures
    4.2.5 Left-recursion Removal
    4.2.6 Optimizations
    4.2.7 Sample Derivations
    4.2.8 On the Terminal Objects of Algol 68
    4.2.9 Implementation Status
  4.3 Practical Usability of eag-compile
    4.3.1 Compiling and Debugging
    4.3.2 The Typing System
    4.3.3 An Experiment
    4.3.4 Reliability of eag-compile
  4.4 Conclusions
    4.4.1 Conclusions about Transformation of Two Level Grammars
    4.4.2 Conclusions about the EAG Formalism
    4.4.3 Conclusions about eag-compile
5 Towards a Parser for Algol 68 in CDL3
  5.1 Imaging EAG into CDL3
    5.1.1 Strategy
    5.1.2 Typing the Hyper-rules
    5.1.3 Rule Ordering and Algorithm Classification
    5.1.4 Example
  5.2 Additional Transformations
    5.2.1 Decomposition of Grammars
    5.2.2 Place-Holders
  5.3 Deriving a Parser for Algol 68 in CDL3
    5.3.1 Lexical Analysis
    5.3.2 The Pre-Scan
    5.3.3 Context Free Analysis
    5.3.4 Imposing Two Pass Affix Evaluation
    5.3.5 Context Sensitive Analysis
  5.4 Overview of the Parser
    5.4.1 Module Structure
    5.4.2 Implementation Status
    5.4.3 Size of the Parser
  5.5 Practical Usability of CDL3
  5.6 Conclusions
    5.6.1 Future Work
6 Conclusions
  6.1 Two Level Grammars
  6.2 Transformation of Two level Grammars
    6.2.1 Extended Affix Grammars
    6.2.2 CDL3
    6.2.3 Future Work
Bibliography
Index

Chapter 1 Introduction

In this chapter, the goal of this thesis and the questions addressed in it are explained. Furthermore, an outline of the thesis is given. The reader of this thesis is advised to have read the Revised Report on the Algorithmic Language Algol 68 [Wij76] and to be familiar with context free grammars and their properties.

1.1 The Problem

During the sixties and early seventies various forms of two level grammars were introduced. In a two level grammar, both context free and context sensitive syntax of a language, as well as its semantics, can be described. One notable example is the grammar of Algol 68.
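To make concrete how a second grammar level can capture context sensitive syntax, the following is a minimal sketch, not taken from the thesis. It assumes a toy two level grammar for the language a^n b^n c^n (n >= 1), with a metarule "N :: i ; N i." and a start rule "start: N a, N b, N c.". The function names metanotion_values, derive and recognize are hypothetical. The sketch enumerates values of the metanotion N and substitutes the same value at every occurrence, which is what forces the three blocks to have equal length.

```python
def metanotion_values(limit):
    """Values of the assumed metarule 'N :: i ; N i.': i, ii, iii, ... up to `limit` strokes."""
    for n in range(1, limit + 1):
        yield "i" * n


def derive(value, letter):
    """Assumed hyper-rules 'N i LETTER: N LETTER, LETTER symbol.' and
    'i LETTER: LETTER symbol.' yield one terminal per stroke of the value."""
    return letter * len(value)


def recognize(text):
    """Assumed start rule 'start: N a, N b, N c.'.
    The same value replaces every occurrence of N (consistent substitution)."""
    for value in metanotion_values(len(text) // 3):
        if text == derive(value, "a") + derive(value, "b") + derive(value, "c"):
            return True
    return False


if __name__ == "__main__":
    for sample in ["abc", "aabbcc", "aabbc", "abcabc"]:
        print(sample, recognize(sample))
```

Enumerating metanotion values by brute force is only practical for such a toy grammar; it is shown here to illustrate the formalism, not as a method for deriving an efficient recognizer.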
