Parsing: Process of Analyzing with the Rules of a Formal Grammar

Total Page:16

File Type:pdf, Size:1020Kb

Parsing: Process of Analyzing with the Rules of a Formal Grammar Chhillar N. et al., J. Harmoniz. Res. Eng., 2013, 1(2), 73-79 • - Journal Of Harmonized Research (JOHR) Journal Of Harmonized Research in Engineering 1(2), 2013, 73-79 ISSN 2347 – 7393 Original Research Article PARSING: PROCESS OF ANALYZING WITH THE RULES OF A FORMAL GRAMMAR Nikita Chhillar, Nisha Yadav, Neha Jaiswal Department of Computer Science and Engineering, Dronacharya College of Engineering, Khentawas, Farukhnagar, Gurgaon, India Abstract: Parsing is the process of structuring a linear representation in accordance with a given grammar. The “linear representation” may be a sentence, a computer program, knitting pattern, a sequence of geological strata, a piece of music, actions in ritual behavior, in short any linear sequence in which the preceding elements in some way restrict† the next element. For some of the examples the grammar is well-known, for some it is an object of research and for some our notion of a grammar is only just beginning to take shape. Keywords: Parsing, Grammar, Bottom -up parsing , Top -down parsing, Parser. Introduction: the aid of devices such as sentence diagrams. Parsing or syntactic analysis is the process of Within computational linguistics the term is analyzing a string of symbols, either in natural used to refer to the formal analysis by language or in computer languages, according computer of a sentence or other string of to the rules of a formal grammar. The term words into its constituents, resulting in a parse parsing comes from Latin pars meaning part tree showing their syntactic relation to each (of speech). other, which may also contain semantic and The term has slightly different meanings in other information. different branches of linguistics and computer The term is also used in psycholinguistics science. Traditional sentence parsing is often when describing language comprehension. In performed as a method of understanding the this context, parsing refers to the way that exact meaning of a sentence, sometimes with human beings analyze a sentence or phrase (in spoken language or text) "in terms of For Correspondence: grammatical constituents, identifying the parts nikitachhillarATyahoo.com of speech, syntactic relations, etc." This term Received on: October 2013 is especially common when discussing what Accepted after revision: December 2013 linguistic cues help speakers to interpret garden-path sentences. Downloaded from: www.johronline.com Within computer science, the term is used in the analysis of computer languages, referring www.johronline.com 73 | P a g e Chhillar N. et al., J. Harmoniz. Res. Eng., 2013, 1(2), 73-79 to the syntactic analysis of the input code into this type is identified as NP-complete. Head- its component parts in order to facilitate the driven phrase structure grammar is another writing of compilers and interpreters. linguistic formalism that has been accepted in 1) Traditional methods: the parsing community, but further research The traditional grammatical exercise of efforts have focused on simple formalisms parsing, are known as clause analysis , such as the one used in the Penn Treebank. involves splitting a text into its component Shallow parsing aims to locate only the parts of speech with an explanation of the boundaries of major constituents like noun form, purpose, and syntactic relationship of phrases. Another admired strategy for each part. This is determined in many part avoiding linguistic controversy is dependency from study of the language's conjugations and grammar parsing. declensions, which can be quite complex for Most modern parsers are at least partially heavily inflected languages. To parse a phrase statistical; that is, they rely on a body of such as 'Kittu saw monkey' involves noting training data which has already been that the singular noun 'Kittu' is the subject of interpreted (parsed by hand). This approach the sentence, the verb 'saw' is the third person permits the system to gather information about singular of the past tense of the verb 'to see', the frequency with which different and the singular noun 'monkey' is the object of constructions occur in specific contexts. the sentence. Techniques such as sentence Approaches which have been used consist of diagrams are used to indicate relation between straightforward PCFGs (probabilistic context- elements in the sentence. free grammars), maximum entropy, and neural 2) Computational methods: nets. Most of the successful systems use In some machine translation and natural lexical statistics (that is, they consider the language processing systems, written texts in identities of the words involved, as well as human languages are parsed by computer their part of speech). However such systems programs . Human sentences are not easily are vulnerable to over fitting and need some parsed via programs, as there is substantial kind of smoothing to be effective. ambiguity inside the structure of human Parsing algorithms used for natural language language, whose usage is to convey meaning cannot rely on the grammar having 'good' (or semantics) in a potentially unlimited range properties as with manually designed of possibilities however only some of which grammars for programming languages. As are germane to the particular case. So an mentioned before some grammar formalisms utterance "Kittu saw monkey" versus are very difficult to parse computationally; in "Monkey saw Kittu" is definite on one detail general, even if the desired structure is not but in another language might appear as "Kittu context-free, some type of context-free monkey saw" with a reliance on the larger approximation to the grammar is used to context to distinguish between those two perform a first pass. Algorithms which make possibilities, if indeed that difference was of use context-free grammars often rely on some concern. It is difficult to prepare formal rules alternative of the CKY algorithm, usually with to describe informal behavior even though it is some heuristic to prune away unlikely clear that some rules are being followed. analyses to keep time. However some systems To parse natural language data, researchers trade speed for accurateness using, example should first have same opinion on the linear-time versions of the shift-reduce grammar to be used. The selection of syntax is algorithm. A recent development has been affected by both linguistic and computational parse reranking that the parser proposes concerns; for example some parsing systems several large numbers of analyses, and a more make use of lexical functional grammar, complex system picks the best option. whereas in general, parsing for grammars of www.johronline.com 74 | P a g e Chhillar N. et al., J. Harmoniz. Res. Eng., 2013, 1(2), 73-79 3) Overview of process: of identifiers. These rules can be formally conveyed with attribute grammars. The last phase is semantic parsing or analysis that works out the implications of the expression just validated and taking the suitable action. Calculator or interpreter evaluates the expression or program, a compiler, would generate some kind of code. Attribute grammars can also be used to describe these actions. Types of parsing: The task of the parser is basically to determine how the input can be derived from the start symbol of the grammar. This can be done in basically two ways: • Top-down parsing- Top-down parsing can be viewed as an approach to find left-most origin of an input-stream by searching for parse trees using a top-down extension of the given formal grammar rules. Tokens are used from left to right. Complete choice is used to hold ambiguity by expanding every alternative right-hand- These examples demonstrate the general case side of grammar rules. of parsing a computer language by two levels • Bottom-up parsing - A parser can initiate of grammar: lexical and syntactic. with the input and approach to rewrite it to The first step is the token generation, or the start symbol. Intuitively, the parser lexical analysis, in which the input character attempts to place the most basic elements, stream is split into meaningful symbols then the elements include these, and so on. defined with a grammar of regular LR parsers are instances of bottom-up expressions. For example, a calculator parsers. Another term used for this sort of program would come across an input such as parser is Shift-Reduce parsing. "12*(3+4^2" and split this into the tokens 12, LL parsers and recursive-descent parser are *, (, 3, +, 4,), ^, 2, each of which is a examples of top-down parsers that cannot significant symbol in the context of an accommodate left recursive production rules. arithmetic expression. The lexer would Although it has been assumed that simple contain rules that tell it, the characters *, +, ^, implementations of top-down parsing cannot (and) mark the beginning of a new token, so hold direct and indirect left-recursion and may meaningless tokens such as "12*" or "(3" will need exponential time and space complexity not be generated). as parsing ambiguous context-free grammars, The next stage is parsing or syntactic analysis, extra sophisticated algorithms for top-down which examines that the tokens form an parsing have been produced by Frost, Hafiz, acceptable expression. This is generally done and Callaghan that accommodate ambiguity with reference to a context-free grammar and left recursion in polynomial time and that which recursively defines parts that can make generate polynomial-size depictions of the up an expression and the arrangement in potentially exponential number of parse trees. which they must appear. However, not all Their algorithm is capable to produce both rules defining programming languages can be left-most and right-most derivations of an conveyed by context-free grammars alone, for input with respect to a given CFG (context- example type validity and proper declaration free grammar). www.johronline.com 75 | P a g e Chhillar N. et al., J. Harmoniz. Res. Eng., 2013, 1(2), 73-79 An important dissimilarity with regard to parsing by applying each production rule to parsers is whether a parser produces a leftmost the received symbols, working from the left- derivation or a rightmost derivation.
Recommended publications
  • Regular Expressions with a Brief Intro to FSM
    Regular Expressions with a brief intro to FSM 15-123 Systems Skills in C and Unix Case for regular expressions • Many web applications require pattern matching – look for <a href> tag for links – Token search • A regular expression – A pattern that defines a class of strings – Special syntax used to represent the class • Eg; *.c - any pattern that ends with .c Formal Languages • Formal language consists of – An alphabet – Formal grammar • Formal grammar defines – Strings that belong to language • Formal languages with formal semantics generates rules for semantic specifications of programming languages Automaton • An automaton ( or automata in plural) is a machine that can recognize valid strings generated by a formal language . • A finite automata is a mathematical model of a finite state machine (FSM), an abstract model under which all modern computers are built. Automaton • A FSM is a machine that consists of a set of finite states and a transition table. • The FSM can be in any one of the states and can transit from one state to another based on a series of rules given by a transition function. Example What does this machine represents? Describe the kind of strings it will accept. Exercise • Draw a FSM that accepts any string with even number of A’s. Assume the alphabet is {A,B} Build a FSM • Stream: “I love cats and more cats and big cats ” • Pattern: “cat” Regular Expressions Regex versus FSM • A regular expressions and FSM’s are equivalent concepts. • Regular expression is a pattern that can be recognized by a FSM. • Regex is an example of how good theory leads to good programs Regular Expression • regex defines a class of patterns – Patterns that ends with a “*” • Regex utilities in unix – grep , awk , sed • Applications – Pattern matching (DNA) – Web searches Regex Engine • A software that can process a string to find regex matches.
    [Show full text]
  • LLLR Parsing: a Combination of LL and LR Parsing
    LLLR Parsing: a Combination of LL and LR Parsing Boštjan Slivnik University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia [email protected] Abstract A new parsing method called LLLR parsing is defined and a method for producing LLLR parsers is described. An LLLR parser uses an LL parser as its backbone and parses as much of its input string using LL parsing as possible. To resolve LL conflicts it triggers small embedded LR parsers. An embedded LR parser starts parsing the remaining input and once the LL conflict is resolved, the LR parser produces the left parse of the substring it has just parsed and passes the control back to the backbone LL parser. The LLLR(k) parser can be constructed for any LR(k) grammar. It produces the left parse of the input string without any backtracking and, if used for a syntax-directed translation, it evaluates semantic actions using the top-down strategy just like the canonical LL(k) parser. An LLLR(k) parser is appropriate for grammars where the LL(k) conflicting nonterminals either appear relatively close to the bottom of the derivation trees or produce short substrings. In such cases an LLLR parser can perform a significantly better error recovery than an LR parser since the most part of the input string is parsed with the backbone LL parser. LLLR parsing is similar to LL(∗) parsing except that it (a) uses LR(k) parsers instead of finite automata to resolve the LL(k) conflicts and (b) does not perform any backtracking.
    [Show full text]
  • Generating Context-Free Grammars Using Classical Planning
    Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17) Generating Context-Free Grammars using Classical Planning Javier Segovia-Aguas1, Sergio Jimenez´ 2, Anders Jonsson 1 1 Universitat Pompeu Fabra, Barcelona, Spain 2 University of Melbourne, Parkville, Australia [email protected], [email protected], [email protected] Abstract S ! aSa S This paper presents a novel approach for generating S ! bSb /|\ Context-Free Grammars (CFGs) from small sets of S ! a S a /|\ input strings (a single input string in some cases). a S a Our approach is to compile this task into a classical /|\ planning problem whose solutions are sequences b S b of actions that build and validate a CFG compli- | ant with the input strings. In addition, we show that our compilation is suitable for implementing the two canonical tasks for CFGs, string produc- (a) (b) tion and string recognition. Figure 1: (a) Example of a context-free grammar; (b) the corre- sponding parse tree for the string aabbaa. 1 Introduction A formal grammar is a set of symbols and rules that describe symbols in the grammar and (2), a bounded maximum size of how to form the strings of certain formal language. Usually the rules in the grammar (i.e. a maximum number of symbols two tasks are defined over formal grammars: in the right-hand side of the grammar rules). Our approach is compiling this inductive learning task into • Production : Given a formal grammar, generate strings a classical planning task whose solutions are sequences of ac- that belong to the language represented by the grammar.
    [Show full text]
  • Formal Grammar Specifications of User Interface Processes
    FORMAL GRAMMAR SPECIFICATIONS OF USER INTERFACE PROCESSES by MICHAEL WAYNE BATES ~ Bachelor of Science in Arts and Sciences Oklahoma State University Stillwater, Oklahoma 1982 Submitted to the Faculty of the Graduate College of the Oklahoma State University iri partial fulfillment of the requirements for the Degree of MASTER OF SCIENCE July, 1984 I TheSIS \<-)~~I R 32c-lf CO'f· FORMAL GRAMMAR SPECIFICATIONS USER INTER,FACE PROCESSES Thesis Approved: 'Dean of the Gra uate College ii tta9zJ1 1' PREFACE The benefits and drawbacks of using a formal grammar model to specify a user interface has been the primary focus of this study. In particular, the regular grammar and context-free grammar models have been examined for their relative strengths and weaknesses. The earliest motivation for this study was provided by Dr. James R. VanDoren at TMS Inc. This thesis grew out of a discussion about the difficulties of designing an interface that TMS was working on. I would like to express my gratitude to my major ad­ visor, Dr. Mike Folk for his guidance and invaluable help during this study. I would also like to thank Dr. G. E. Hedrick and Dr. J. P. Chandler for serving on my graduate committee. A special thanks goes to my wife, Susan, for her pa­ tience and understanding throughout my graduate studies. iii TABLE OF CONTENTS Chapter Page I. INTRODUCTION . II. AN OVERVIEW OF FORMAL LANGUAGE THEORY 6 Introduction 6 Grammars . • . • • r • • 7 Recognizers . 1 1 Summary . • • . 1 6 III. USING FOR~AL GRAMMARS TO SPECIFY USER INTER- FACES . • . • • . 18 Introduction . 18 Definition of a User Interface 1 9 Benefits of a Formal Model 21 Drawbacks of a Formal Model .
    [Show full text]
  • Compiler Design
    CCOOMMPPIILLEERR DDEESSIIGGNN -- PPAARRSSEERR http://www.tutorialspoint.com/compiler_design/compiler_design_parser.htm Copyright © tutorialspoint.com In the previous chapter, we understood the basic concepts involved in parsing. In this chapter, we will learn the various types of parser construction methods available. Parsing can be defined as top-down or bottom-up based on how the parse-tree is constructed. Top-Down Parsing We have learnt in the last chapter that the top-down parsing technique parses the input, and starts constructing a parse tree from the root node gradually moving down to the leaf nodes. The types of top-down parsing are depicted below: Recursive Descent Parsing Recursive descent is a top-down parsing technique that constructs the parse tree from the top and the input is read from left to right. It uses procedures for every terminal and non-terminal entity. This parsing technique recursively parses the input to make a parse tree, which may or may not require back-tracking. But the grammar associated with it ifnotleftfactored cannot avoid back- tracking. A form of recursive-descent parsing that does not require any back-tracking is known as predictive parsing. This parsing technique is regarded recursive as it uses context-free grammar which is recursive in nature. Back-tracking Top- down parsers start from the root node startsymbol and match the input string against the production rules to replace them ifmatched. To understand this, take the following example of CFG: S → rXd | rZd X → oa | ea Z → ai For an input string: read, a top-down parser, will behave like this: It will start with S from the production rules and will match its yield to the left-most letter of the input, i.e.
    [Show full text]
  • Algorithm for Analysis and Translation of Sentence Phrases
    Masaryk University Faculty}w¡¢£¤¥¦§¨ of Informatics!"#$%&'()+,-./012345<yA| Algorithm for Analysis and Translation of Sentence Phrases Bachelor’s thesis Roman Lacko Brno, 2014 Declaration Hereby I declare, that this paper is my original authorial work, which I have worked out by my own. All sources, references and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source. Roman Lacko Advisor: RNDr. David Sehnal ii Acknowledgement I would like to thank my family and friends for their support. Special thanks go to my supervisor, RNDr. David Sehnal, for his attitude and advice, which was of invaluable help while writing this thesis; and my friend František Silváši for his help with the revision of this text. iii Abstract This thesis proposes a library with an algorithm capable of translating objects described by natural language phrases into their formal representations in an object model. The solution is not restricted by a specific language nor target model. It features a bottom-up chart parser capable of parsing any context-free grammar. Final translation of parse trees is carried out by the interpreter that uses rewrite rules provided by the target application. These rules can be extended by custom actions, which increases the usability of the library. This functionality is demonstrated by an additional application that translates description of motifs in English to objects of the MotiveQuery language. iv Keywords Natural language, syntax analysis, chart parsing,
    [Show full text]
  • Finite-State Automata and Algorithms
    Finite-State Automata and Algorithms Bernd Kiefer, [email protected] Many thanks to Anette Frank for the slides MSc. Computational Linguistics Course, SS 2009 Overview . Finite-state automata (FSA) – What for? – Recap: Chomsky hierarchy of grammars and languages – FSA, regular languages and regular expressions – Appropriate problem classes and applications . Finite-state automata and algorithms – Regular expressions and FSA – Deterministic (DFSA) vs. non-deterministic (NFSA) finite-state automata – Determinization: from NFSA to DFSA – Minimization of DFSA . Extensions: finite-state transducers and FST operations Finite-state automata: What for? Chomsky Hierarchy of Hierarchy of Grammars and Languages Automata . Regular languages . Regular PS grammar (Type-3) Finite-state automata . Context-free languages . Context-free PS grammar (Type-2) Push-down automata . Context-sensitive languages . Tree adjoining grammars (Type-1) Linear bounded automata . Type-0 languages . General PS grammars Turing machine computationally more complex less efficient Finite-state automata model regular languages Regular describe/specify expressions describe/specify Finite describe/specify Regular automata recognize languages executable! Finite-state MACHINE Finite-state automata model regular languages Regular describe/specify expressions describe/specify Regular Finite describe/specify Regular grammars automata recognize/generate languages executable! executable! • properties of regular languages • appropriate problem classes Finite-state • algorithms for FSA MACHINE Languages, formal languages and grammars . Alphabet Σ : finite set of symbols Σ . String : sequence x1 ... xn of symbols xi from the alphabet – Special case: empty string ε . Language over Σ : the set of strings that can be generated from Σ – Sigma star Σ* : set of all possible strings over the alphabet Σ Σ = {a, b} Σ* = {ε, a, b, aa, ab, ba, bb, aaa, aab, ...} – Sigma plus Σ+ : Σ+ = Σ* -{ε} Strings – Special languages: ∅ = {} (empty language) ≠ {ε} (language of empty string) .
    [Show full text]
  • The Use of Predicates in LL(K) and LR(K) Parser Generators
    Purdue University Purdue e-Pubs ECE Technical Reports Electrical and Computer Engineering 7-1-1993 The seU of Predicates In LL(k) And LR(k) Parser Generators (Technical Summary) T. J. Parr Purdue University School of Electrical Engineering R. W. Quong Purdue University School of Electrical Engineering H. G. Dietz Purdue University School of Electrical Engineering Follow this and additional works at: http://docs.lib.purdue.edu/ecetr Parr, T. J.; Quong, R. W.; and Dietz, H. G., "The sU e of Predicates In LL(k) And LR(k) Parser Generators (Technical Summary) " (1993). ECE Technical Reports. Paper 234. http://docs.lib.purdue.edu/ecetr/234 This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information. TR-EE 93-25 JULY 1993 The Use of Predicates In LL(k) And LR(k) Parser ene era tors? (Technical Summary) T. J. Purr, R. W. Quong, and H. G. Dietz School of Electrical Engineering Purdue University West Lafayette, IN 47907 (317) 494-1739 [email protected] Abstract Although existing LR(1) or U(1) parser generators suffice for many language recognition problems, writing a straightforward grammar to translate a complicated language, such as C++ or even C, remains a non-trivial task. We have often found that adding translation actions to the grammar is harder than writing the grammar itself. Part of the problem is that many languages are context-sensitive. Simple, natural descriptions of these languages escape current language tool technology because they were not designed to handle semantic information.
    [Show full text]
  • Using Contextual Representations to Efficiently Learn Context-Free
    JournalofMachineLearningResearch11(2010)2707-2744 Submitted 12/09; Revised 9/10; Published 10/10 Using Contextual Representations to Efficiently Learn Context-Free Languages Alexander Clark [email protected] Department of Computer Science, Royal Holloway, University of London Egham, Surrey, TW20 0EX United Kingdom Remi´ Eyraud [email protected] Amaury Habrard [email protected] Laboratoire d’Informatique Fondamentale de Marseille CNRS UMR 6166, Aix-Marseille Universite´ 39, rue Fred´ eric´ Joliot-Curie, 13453 Marseille cedex 13, France Editor: Fernando Pereira Abstract We present a polynomial update time algorithm for the inductive inference of a large class of context-free languages using the paradigm of positive data and a membership oracle. We achieve this result by moving to a novel representation, called Contextual Binary Feature Grammars (CBFGs), which are capable of representing richly structured context-free languages as well as some context sensitive languages. These representations explicitly model the lattice structure of the distribution of a set of substrings and can be inferred using a generalisation of distributional learning. This formalism is an attempt to bridge the gap between simple learnable classes and the sorts of highly expressive representations necessary for linguistic representation: it allows the learnability of a large class of context-free languages, that includes all regular languages and those context-free languages that satisfy two simple constraints. The formalism and the algorithm seem well suited to natural language and in particular to the modeling of first language acquisition. Pre- liminary experimental results confirm the effectiveness of this approach. Keywords: grammatical inference, context-free language, positive data only, membership queries 1.
    [Show full text]
  • QUESTION BANK SOLUTION Unit 1 Introduction to Finite Automata
    FLAT 10CS56 QUESTION BANK SOLUTION Unit 1 Introduction to Finite Automata 1. Obtain DFAs to accept strings of a’s and b’s having exactly one a.(5m )(Jun-Jul 10) 2. Obtain a DFA to accept strings of a’s and b’s having even number of a’s and b’s.( 5m )(Jun-Jul 10) L = {Œ,aabb,abab,baba,baab,bbaa,aabbaa,---------} 3. Give Applications of Finite Automata. (5m )(Jun-Jul 10) String Processing Consider finding all occurrences of a short string (pattern string) within a long string (text string). This can be done by processing the text through a DFA: the DFA for all strings that end with the pattern string. Each time the accept state is reached, the current position in the text is output. Finite-State Machines A finite-state machine is an FA together with actions on the arcs. Statecharts Statecharts model tasks as a set of states and actions. They extend FA diagrams. Lexical Analysis Dept of CSE, SJBIT 1 FLAT 10CS56 In compiling a program, the first step is lexical analysis. This isolates keywords, identifiers etc., while eliminating irrelevant symbols. A token is a category, for example “identifier”, “relation operator” or specific keyword. 4. Define DFA, NFA & Language? (5m)( Jun-Jul 10) Deterministic finite automaton (DFA)—also known as deterministic finite state machine—is a finite state machine that accepts/rejects finite strings of symbols and only produces a unique computation (or run) of the automaton for each input string. 'Deterministic' refers to the uniqueness of the computation. Nondeterministic finite automaton (NFA) or nondeterministic finite state machine is a finite state machine where from each state and a given input symbol the automaton may jump into several possible next states.
    [Show full text]
  • Part 1: Introduction
    Part 1: Introduction By: Morteza Zakeri PhD Student Iran University of Science and Technology Winter 2020 Agenda • What is ANTLR? • History • Motivation • What is New in ANTLR v4? • ANTLR Components: How it Works? • Getting Started with ANTLR v4 February 2020 Introduction to ANTLR – Morteza Zakeri 2 What is ANTLR? • ANTLR (pronounced Antler), or Another Tool For Language Recognition, is a parser generator that uses LL(*) for parsing. • ANTLR takes as input a grammar that specifies a language and generates as output source code for a recognizer for that language. • Supported generating code in Java, C#, JavaScript, Python2 and Python3. • ANTLR is recursive descent parser Generator! (See Appendix) February 2020 Introduction to ANTLR – Morteza Zakeri 3 Runtime Libraries and Code Generation Targets • There is no language specific code generators • There is only one tool, written in Java, which is able to generate Lexer and Parser code for all targets, through command line options. • The available targets are the following (2020): • Java, C#, C++, Swift, Python (2 and 3), Go, PHP, and JavaScript. • Read more: • https://github.com/antlr/antlr4/blob/master/doc/targets.md 11 February 2020 Introduction to ANTLR – Morteza Zakeri Runtime Libraries and Code Generation Targets • $ java -jar antlr4-4.8.jar -Dlanguage=CSharp MyGrammar.g4 • https://github.com/antlr/antlr4/tree/master/runtime/CSharp • https://github.com/tunnelvisionlabs/antlr4cs • $ java -jar antlr4-4.8.jar -Dlanguage=Cpp MyGrammar.g4 • https://github.com/antlr/antlr4/blob/master/doc/cpp-target.md • $ java -jar antlr4-4.8.jar -Dlanguage=Python3 MyGrammar.g4 • https://github.com/antlr/antlr4/blob/master/doc/python- target.md 11 February 2020 Introduction to ANTLR – Morteza Zakeri History • Initial release: • February 1992; 24 years ago.
    [Show full text]
  • Compiler Construction
    Compiler construction PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 10 Dec 2011 02:23:02 UTC Contents Articles Introduction 1 Compiler construction 1 Compiler 2 Interpreter 10 History of compiler writing 14 Lexical analysis 22 Lexical analysis 22 Regular expression 26 Regular expression examples 37 Finite-state machine 41 Preprocessor 51 Syntactic analysis 54 Parsing 54 Lookahead 58 Symbol table 61 Abstract syntax 63 Abstract syntax tree 64 Context-free grammar 65 Terminal and nonterminal symbols 77 Left recursion 79 Backus–Naur Form 83 Extended Backus–Naur Form 86 TBNF 91 Top-down parsing 91 Recursive descent parser 93 Tail recursive parser 98 Parsing expression grammar 100 LL parser 106 LR parser 114 Parsing table 123 Simple LR parser 125 Canonical LR parser 127 GLR parser 129 LALR parser 130 Recursive ascent parser 133 Parser combinator 140 Bottom-up parsing 143 Chomsky normal form 148 CYK algorithm 150 Simple precedence grammar 153 Simple precedence parser 154 Operator-precedence grammar 156 Operator-precedence parser 159 Shunting-yard algorithm 163 Chart parser 173 Earley parser 174 The lexer hack 178 Scannerless parsing 180 Semantic analysis 182 Attribute grammar 182 L-attributed grammar 184 LR-attributed grammar 185 S-attributed grammar 185 ECLR-attributed grammar 186 Intermediate language 186 Control flow graph 188 Basic block 190 Call graph 192 Data-flow analysis 195 Use-define chain 201 Live variable analysis 204 Reaching definition 206 Three address
    [Show full text]