The Complete Literature References
Total Page:16
File Type:pdf, Size:1020Kb
18 Annotated Bibliography The purpose of this annotated bibliography is to supply the reader with more material and with more detail than was possible in the preceding chapters, rather than to just list the works referenced in the text. The annotations cover a considerable number of subjects that have not been treated in the rest of the book. The printed version of this book includes only those literature references and their summaries that are actually referred to in it. The full literature list with summaries as far as available can be found on the web site of this book; it includes its own authors index and subject index. This annotated bibliography differs in several respects from the habitual literature list. • The annotated bibliography consists of five sections: – Main parsing material — papers about the main parsing techniques. – Further parsing material — papers about extensions of and refinements to the main parsing techniques, non-Chomsky systems, error recovery, etc. – Parser writing and application — both in computer science and in natural languages. – Support material — books and papers useful to the study of parsers. – Secondary Material — unannotated papers of peripheral interest (Internet version only). • The entries in each section have been grouped into more detailed categories; for example, the main section contains categories for general CF parsing, LR parsing, precedence parsing, etc. For details see the Table of Contents at the beginning of this book. Most publications in parsing can easily be assigned a single category. Some that span two categories have been placed in one, with a reference in the other. • The majority of the entries are annotated. This annotation is not a copy of the ab- stract provided with the paper (which generally says something about the results obtained) but is rather the result of an attempt to summarize the technical content in terms of what has been explained elsewhere in this book. 576 18 Annotated Bibliography • The entries are ordered chronologically rather than alphabetically. This arrange- ment has the advantage that it is much more meaningful than a single alphabetic list, ordered on author names. Each section can be read as the history of research on that particular aspect of parsing, related material is found closely together and recent material is easily separated from older publications. A disadvantage is that it is now difficult to locate entries by author; to remedy this, an author index (starting on page 779) has been supplied. 18.1 Major Parsing Subjects 18.1.1 Unrestricted PS and CS Grammars 1. Loeckx, Jacques J. C. Mechanical Construction of Bounded-Context Parsers for Chomsky 0-Type Languages. PhD thesis, University of Leuven, Eindhoven, 1969. Also Philips research reports. Supplements; 1969, no. 3., See Loeckx [2]. 2. Loeckx, Jacques. The parsing for general phrase-structure grammars. Inform. Control, 16:443–464, 1970. The paper sketches two non-deterministic parsers for PS grammars, one top- down, which tries to mimic the production process and one bottom-up, which tries to find a handle. The instructions of the parsers are derived by listing all possible transitions in the grammar and weeding out by hand those that cannot occur. Trial-and-error methods are discussed to resolve the non-determinism, but no instruction selection mechanisms are given. Very readable, nice pictures. 3. Walters, Daniel A. Deterministic context-sensitive languages, Parts I & II. Inform. Control, 17(1):14–61, 1970. The definition of LR(k) grammars is extended to context-sensitive grammars. Emphasis is more on theoretical properties than on obtaining a parser. 4. Woods, William A. Context-sensitive parsing. Commun. ACM, 13(7):437–445, July 1970. The paper presents a canonical form for context-sensitive (ε-free) derivation trees. Parsing is then performed by an exhaustive guided search over these canonical forms exclusively. This guarantees that each possible parsing will be found exactly once. 5. Révész, György. Unilateral context sensitive grammar and left to right parsing. J. Comput. Syst. Sci., 5:337–352, 1971. Right-sensitive grammars (RSGs) are context-sensitive grammars in which the left context X in all rules XAY XαY is empty. The rules of such a grammar can all be reduced to the rule forms X Y, X! YZ, and XT YT, in Chomsky Normal Form fashion. A simple non-deterministic!parser is!proposed for RSGs,! which basically always does the leftmost possible reduction using one of the grammar rules. The bulk of the paper is concerned with finding properties of grammars for which this parsing method works, the so- called “progressive grammars”. 6. Book, R. V. Terminal context in context-sensitive grammars. SIAM J. Computing, 1:20–30, 1972. Some conditions under which CS grammars produce CF languages. 7. Ghandour, Z. J. Formal systems and analysis of context-sensitive languages. Computer J., 15(3):229–237, 1972. Ghandour describes a formal production system that is in some ways similar to but far from identical to a two-level grammar. A hierarchy of 4 classes is defined on these systems, with Class 1 Class 2 Class 3 Class 4, Class 1 CS and Class 4 = CF. A parsing algorithm for Class 1 systems¶ is given in fairly great detail. ¶ 8. Savitch, Walter J. How to make arbitrary grammars look like context-free grammars. SIAM J. Computing, 2:174–182, 1973. Proves that any Type(0) grammar is equivalent to a Type(0) grammar with only three kinds of rules: BA αA (right context only), AB Aα (left context only), and B α (context-free). ! ! ! 18.1 Major Parsing Subjects 577 9. Khabbaz, N. A. Multipass precedence analysis. Acta Inform., 4:77–85, 1974. A hierarchy of CS grammars is given that can be parsed using multipass precedence parsing. The parser and the table construction algorithm are given explicitly. 10. Hart, Johnson M. Right and left parses in phrase-structure grammars. Inform. Control, 32(3):242–262, Nov. 1976. 11. Tanaka, Eiichi and Fu, King-Sun. Error-correcting parsers for formal languages. IEEE Trans. Comput., C-27(7):605–616, July 1978. In addition to the error correction algorithms referred to in the title (for which see [940]) a version of the CYK algorithm for context-sensitive grammars is described. It requires the grammar to be in 2-form: no rule has a right-hand size longer than 2, and no rule has a left-hand size longer than its right-hand size. This limits the number of possible rule forms to 4: A a, A BC, AB CB (right context), and BA BC (left context). The algorithm is largely straightforw! !ard; for e!xample, for rule AB CB, if C!and B have been recognized adjacently, an A is recognized in the position of the C. Care!has to be taken, however, to avoid recognizing a context for the application of a production rule when the context is not there at the right moment; a non-trivial condition is given for this, without explanation or proof. 12. Tomita, E., Sakuramoto, K., and Kasai, T. A top-down parsing algorithm for a context- sensitive language. Trans. Inst. Electron. & Commun. Eng. Jpn., E62(2):127–128, Feb. 1979. Abstract only. 13. Turnbull, C. J. M. and Lee, E. S. Generalized deterministic left to right parsing. Acta Inform., 12:187–207, 1979. The LR(k) parsing machine is modified so that the reduce in- struction removes the reduced right-hand side from the stack and pushes an arbitrary number of symbols back into the input stream. (The traditional LR(k) reduce is a special case of this: it pushes the recognized non-terminal back into the input and immediately shifts it. The technique is similar to that put forward by Dömölki [40].) The new machine is capable of parsing efficiently a subset of the Type 0 grammars, DRP grammars (for Deterministic Regular Parsable). Member- ship of this subset is undecidable, but an approximate algorithm can be constructed by extending the LR(k) parse table algorithm. Details are not given, but can be found in a technical report by the first author. 14. Matsuoka, K. Context-sensitive LR(k) parsers. Trans. Inst. Electron. & Commun. Eng. Jpn., E64:691, 1981. Abstract only. 15. Vol’dman, G. Sh. A parsing algorithm for context-sensitive grammars. Program. Comput. Softw., 7:302–307, 1981. Extends Earley’s algorithm first to length-increasing phrase structure grammars and then to non-decreasing PS grammars (= CS grammars). The resulting algorithm has exponential time requirements but is often much better. 16. Harris, Lawrence A. SLR(1) and LALR(1) parsing for unrestricted grammars. Acta Inform., 24:191–209, 1987. The notion of an LR(0) item can easily be defined for unrestricted grammars: “For each item λ µ1 Xµ2 there is a transition on X to the item λ µ1X µ2 and an ε-transition to any item X!δ ²η”, for any symbol X. These items are grouped! by² subset construction into the usual states,!called² here preliminary states, since some of them may actually be ineffective. A GOTO function is also defined. If we can, for a given grammar, compute the FOLLOW sets of all left-hand sides (undecidable in the general case), we can apply a variant of the usual SLR(1) test and construct a parsing table for a parser as described by Turnbull and Lee [13]. To obtain the LALR(1) definition, a look-ahead grammar system is defined that will, for each item in each state, generate the (unrestricted) language of all continuations after that item. If we can, for a given grammar, compute the FIRST sets of all these languages (undecidable in the general case), we can apply a variant of the usual LALR(1) test and construct a parsing table for a similar parser.