Parallel Parsing of Context-Free Grammars


Master Thesis
Computer Science
Thesis no: MCS-2011-28
December 2011

Piotr Skrzypczak

School of Computing
Blekinge Institute of Technology
SE-371 79 Karlskrona
Sweden

This thesis is submitted to the School of Computing at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Computer Science. The thesis is equivalent to 20 weeks of full time studies.

Contact Information:

Author(s):
Piotr Skrzypczak 871216-P317
E-mail: [email protected]

University advisor(s):
Olgierd Unold, PhD, DSc
Faculty of Electronics, Wroclaw University of Technology, Poland

Bengt Aspvall, PhD
School of Computing

School of Computing
Blekinge Institute of Technology
SE-371 79 Karlskrona
Sweden
Internet: www.bth.se/com
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57

ABSTRACT

The last decade has seen growing interest in parallel programming, driven by the trend toward building microprocessors as multicore units that can execute instructions simultaneously. A popular and widely used example of such a platform is the graphics processing unit (GPU). Its ability to perform calculations simultaneously is being investigated as a way to improve the performance of complex algorithms. GPUs therefore now have architectures that allow programmers and software developers to use their computational power in the same way as a CPU. One of these architectures is the CUDA platform, developed by nVidia. The aim of this thesis is to implement a parallel version of the CYK algorithm, one of the most popular parsing algorithms, for the CUDA platform, achieving a significant speed-up over the sequential CYK algorithm.

The thesis presents a review of existing parallelisations of the CYK algorithm, descriptions of the implemented algorithms (a basic version and a few modifications), and an experimental stage in which these versions are tested on various inputs to determine which version gives the best performance. Three versions of the algorithm are presented, of which one was selected as the best (giving about 10 times better performance for the longest inputs). A limited version of the algorithm is also presented; it gives the best performance (up to 100 times better than the non-limited sequential version) but requires the grammar to fulfil certain conditions. The motivation for the thesis is to use the developed algorithm in GCS.

Contents

1 Introduction
  1.1 Problem definition
  1.2 Background
    1.2.1 GCS
    1.2.2 CUDA
  1.3 Aims and Objectives
  1.4 Research questions
  1.5 Expected outcomes
  1.6 Research methodology
  1.7 Thesis outline
2 Context-free grammars
  2.1 Grammar definition
  2.2 Chomsky hierarchy
  2.3 Chomsky normal form
3 CYK algorithm
  3.1 Parsing process
  3.2 Sequential CYK algorithm
  3.3 Existing parallelisations of CYK
4 CUDA
  4.1 Overview
  4.2 CUDA Enabled GPU architecture
  4.3 Programming model
  4.4 CUDA compute capability versions
  4.5 Memory Levels
    4.5.1 Global memory
    4.5.2 Constant memory
    4.5.3 Shared memory
    4.5.4 Registers
    4.5.5 Local memory
    4.5.6 Texture/Surface memory
  4.6 Driver API vs Runtime API
  4.7 CUDA C
  4.8 Atomic operations
  4.9 CUDA Occupancy Calculator
5 Application
  5.1 Environment
  5.2 Data representation
  5.3 Parallel CYK
    5.3.1 Basic version
    5.3.2 Modified version
    5.3.3 Version with shared memory
6 Experiments
  6.1 First round of experiments
    6.1.1 Tests
    6.1.2 Results
  6.2 Second round of experiments
    6.2.1 Tests
    6.2.2 Results
  6.3 Third round of experiments
    6.3.1 Limited version
    6.3.2 Tests
    6.3.3 Results
7 Discussion
  7.1 Research questions
  7.2 Discussion of time complexity
  7.3 Discussion of validity
  7.4 Conclusion
References
A CUDA 2.x compute capability
B GeForce 440 GT specification
C Descriptions of the grammars used in tests
  C.1 First and second round
    C.1.1 Grammar 1
    C.1.2 Grammar 2
    C.1.3 Grammar 3
  C.2 Third round
    C.2.1 Grammar 1
    C.2.2 Grammar 2
    C.2.3 Grammar 3
    C.2.4 Grammar 4
    C.2.5 Grammar 5

List of Tables

1 Times of algorithms' executions, first round, grammar 1
2 Times of algorithms' executions, first round, grammar 2
3 Times of algorithms' executions, first round, grammar 3
4 Times of algorithms' executions, second round, grammar 1
5 Times of algorithms' executions, second round, grammar 2
6 Times of algorithms' executions, second round, grammar 3
7 Times of algorithms' executions, third round, grammar 1
8 Times of algorithms' executions, third round, grammar 2
9 Times of algorithms' executions, third round, grammar 3
10 Times of algorithms' executions, third round, grammar 4
11 Times of algorithms' executions, third round, grammar 5
12 CUDA 2.x technical specifications

List of Figures

1 CYK Algorithm run
2 Cells important for one thread
3 Architecture of CUDA enabled GPU
4 Threads configuration within CUDA application
5 One row of CYK table
6 Threads within basic version
7 Threads within modified version
8 CUDA Occupancy Calculator output for 17 and 18 registers per thread
9 Relationship between length of input and algorithms' performance - Grammar 1, 1st round
10 Relationship between length of input and algorithms' performance - Grammar 2, 1st round
11 Relationship between length of input and algorithms' performance - Grammar 3, 1st round
12 Relationship between length of input and algorithms' performance - Grammar 1, 2nd round
13 Relationship between length of input and algorithms' performance - Grammar 2, 2nd round
14 Relationship between length of input and algorithms' performance - Grammar 3, 2nd round
15 CUDA Occupancy Calculator output for 11 registers per thread
16 Relationship between length of input and algorithms' performance - Grammar 4, 3rd round
17 Relationship between length of input and algorithms' performance - Grammar 5, 3rd round
18 Relationship between length of input, size of grammar and modified algorithm's performance
19 Relationship between length of input, size of grammar and limited algorithm's performance
20 GeForce 440 GT specification
1 Introduction

1.1 Problem definition

Parsing is the process of syntactically analysing input data, given as text, in order to determine its grammatical structure according to a given grammar. Usually output
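
To make the notion of parsing against a given grammar concrete, here is a minimal sketch of a sequential CYK recognizer in Python. It is an illustration only, not the thesis's CUDA implementation. The function name `cyk_recognize`, the dictionary-based rule encoding, and the toy grammar for the language a^n b^n are all assumptions introduced for this example. The grammar must be in Chomsky normal form.

```python
from itertools import product


def cyk_recognize(tokens, unary_rules, binary_rules, start="S"):
    """Return True if `tokens` can be derived from `start` (grammar in CNF).

    unary_rules:  dict, terminal -> set of nonterminals A with A -> terminal
    binary_rules: dict, (B, C)   -> set of nonterminals A with A -> B C
    """
    n = len(tokens)
    if n == 0:
        return False  # the empty string is not handled in this sketch

    # table[length - 1][i] holds the nonterminals deriving tokens[i:i + length]
    table = [[set() for _ in range(n - length + 1)] for length in range(1, n + 1)]

    # Substrings of length 1: apply rules of the form A -> terminal.
    for i, tok in enumerate(tokens):
        table[0][i] = set(unary_rules.get(tok, ()))

    # Longer substrings: try every split point and every rule A -> B C.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = table[length - 1][i]
            for split in range(1, length):
                left = table[split - 1][i]
                right = table[length - split - 1][i + split]
                for b, c in product(left, right):
                    cell |= binary_rules.get((b, c), set())

    return start in table[n - 1][0]


# Hypothetical toy grammar for a^n b^n (n >= 1) in Chomsky normal form:
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}
```

With this toy grammar, `cyk_recognize(list("aabb"), UNARY, BINARY)` accepts the input, while `cyk_recognize(list("aab"), UNARY, BINARY)` rejects it. Within one substring length, the cells of a row are filled independently of each other.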