Adaptive LL(*) Parsing: the Power of Dynamic Analysis

Total Page:16

File Type:pdf, Size:1020Kb

Adaptive LL(*) Parsing: the Power of Dynamic Analysis Adaptive LL(*) Parsing: The Power of Dynamic Analysis Terence Parr Sam Harwell Kathleen Fisher University of San Francisco University of Texas at Austin Tufts University [email protected] [email protected] kfi[email protected] Abstract PEGs are unambiguous by definition but have a quirk where Despite the advances made by modern parsing strategies such rule A ! a j ab (meaning “A matches either a or ab”) can never as PEG, LL(*), GLR, and GLL, parsing is not a solved prob- match ab since PEGs choose the first alternative that matches lem. Existing approaches suffer from a number of weaknesses, a prefix of the remaining input. Nested backtracking makes de- including difficulties supporting side-effecting embedded ac- bugging PEGs difficult. tions, slow and/or unpredictable performance, and counter- Second, side-effecting programmer-supplied actions (muta- intuitive matching strategies. This paper introduces the ALL(*) tors) like print statements should be avoided in any strategy that parsing strategy that combines the simplicity, efficiency, and continuously speculates (PEG) or supports multiple interpreta- predictability of conventional top-down LL(k) parsers with the tions of the input (GLL and GLR) because such actions may power of a GLR-like mechanism to make parsing decisions. never really take place [17]. (Though DParser [24] supports The critical innovation is to move grammar analysis to parse- “final” actions when the programmer is certain a reduction is time, which lets ALL(*) handle any non-left-recursive context- part of an unambiguous final parse.) Without side effects, ac- free grammar. ALL(*) is O(n4) in theory but consistently per- tions must buffer data for all interpretations in immutable data forms linearly on grammars used in practice, outperforming structures or provide undo actions. The former mechanism is general strategies such as GLL and GLR by orders of magni- limited by memory size and the latter is not always easy or pos- tude. ANTLR 4 generates ALL(*) parsers and supports direct sible. The typical approach to avoiding mutators is to construct left-recursion through grammar rewriting. Widespread ANTLR a parse tree for post-parse processing, but such artifacts funda- 4 use (5000 downloads/month in 2013) provides evidence that mentally limit parsing to input files whose trees fit in memory. ALL(*) is effective for a wide variety of applications. Parsers that build parse trees cannot analyze large data files or infinite streams, such as network traffic, unless they can be pro- 1. Introduction cessed in logical chunks. Computer language parsing is still not a solved problem in Third, our experiments (Section 7) show that GLL and GLR practice, despite the sophistication of modern parsing strategies can be slow and unpredictable in time and space. Their com- and long history of academic study. When machine resources plexities are, respectively, O(n3) and O(np+1) where p is the were scarce, it made sense to force programmers to contort length of the longest production in the grammar [14]. (GLR is their grammars to fit the constraints of deterministic LALR(k) typically quoted as O(n3) because Kipps [15] gave such an al- or LL(k) parser generators.1 As machine resources grew, re- gorithm, albeit with a constant so high as to be impractical.) In searchers developed more powerful, but more costly, nondeter- theory, general parsers should handle deterministic grammars ministic parsing strategies following both “bottom-up” (LR- in near-linear time. In practice, we found GLL and GLR to be style) and “top-down” (LL-style) approaches. Strategies in- ˜135x slower than ALL(*) on a corpus of 12,920 Java 6 library clude GLR [26], Parser Expression Grammar (PEG) [9], LL(*) source files (123M) and 6 orders of magnitude slower on a sin- [20] from ANTLR 3, and recently, GLL [25], a fully general gle 3.2M Java file, respectively. top-down strategy. LL(*) addresses these weaknesses by providing a mostly de- Although these newer strategies are much easier to use than terministic parsing strategy that uses regular expressions, rep- LALR(k) and LL(k) parser generators, they suffer from a resented as deterministic finite automata (DFA), to potentially variety of weaknesses. First, nondeterministic parsers some- examine the entire remaining input rather than the fixed k- times have unanticipated behavior. GLL and GLR return multi- sequences of LL(k). Using DFA for lookahead limits LL(*) ple parse trees (forests) for ambiguous grammars because they decisions to distinguishing alternatives with regular lookahead were designed to handle natural language grammars, which are languages, even though lookahead languages (set of all pos- often intentionally ambiguous. For computer languages, am- sible remaining input phrases) are often context-free. But the biguity is almost always an error. One can certainly walk the main problem is that the LL(*) grammar condition is statically constructed parse forest to disambiguate it, but that approach undecidable and grammar analysis sometimes fails to find reg- costs extra time, space, and machinery for the uncommon case. ular expressions that distinguish between alternative produc- tions. ANTLR 3’s static analysis detects and avoids potentially- 1 We use the term deterministic in the way that deterministic finite automata undecidable situations, failing over to backtracking parsing de- (DFA) differ from nondeterministic finite automata (NFA): The next symbol(s) uniquely determine action. cisions instead. This gives LL(*) the same a j ab quirk as PEGs 1 2014/3/24 for such decisions. Backtracking decisions that choose the first ALL(*) parsers handle the task of matching terminals and matching alternative also cannot detect obvious ambiguities expanding nonterminals with the simplicity of LL but have such A ! α j α where α is some sequence of grammar sym- O(n4) theoretical time complexity (Theorem 6.3) because in bols that makes α j α non-LL(*). the worst-case, the parser must make a prediction at each input symbol and each prediction must examine the entire remaining 1.1 Dynamic grammar analysis input; examining an input symbol can cost O(n2). O(n4) is in line with the complexity of GLR. In Section 7, we show In this paper, we introduce Adaptive LL(*) , or ALL(*) , parsers empirically that ALL(*) parsers for common languages are that combine the simplicity of deterministic top-down parsers efficient and exhibit linear behavior in practice. with the power of a GLR-like mechanism to make parsing de- The advantages of ALL(*) stem from moving grammar anal- cisions. Specifically, LL parsing suspends at each prediction ysis to parse time, but this choice places an additional bur- decision point (nonterminal) and then resumes once the pre- den on grammar functional testing. As with all dynamic ap- diction mechanism has chosen the appropriate production to proaches, programmers must cover as many grammar position expand. The critical innovation is to move grammar analysis to and input sequence combinations as possible to find grammar parse-time; no static grammar analysis is needed. This choice ambiguities. Standard source code coverage tools can help pro- lets us avoid the undecidability of static LL(*) grammar anal- grammers measure grammar coverage for ALL(*) parsers. High ysis and lets us generate correct parsers (Theorem 6.1) for any coverage in the generated code corresponds to high grammar non-left-recursive context-free grammar (CFG). While static coverage. analysis must consider all possible input sequences, dynamic The ALL(*) algorithm is the foundation of the ANTLR 4 analysis need only consider the finite collection of input se- parser generator (ANTLR 3 is based upon LL(*)). ANTLR 4 quences actually seen. was released in January 2013 and gets about 5000 download- The idea behind the ALL(*) prediction mechanism is to s/month (source, binary, or ANTLRworks2 development envi- launch subparsers at a decision point, one per alternative pro- ronment, counting non-robot entries in web logs with unique duction. The subparsers operate in pseudo-parallel to explore IP addresses to get a lower bound.) Such activity provides evi- all possible paths. Subparsers die off as their paths fail to match dence that ALL(*) is useful and usable. the remaining input. The subparsers advance through the input The remainder of this paper is organized as follows. We be- in lockstep so analysis can identify a sole survivor at the min- gin by introducing the ANTLR 4 parser generator (Section 2) imum lookahead depth that uniquely predicts a production. If and discussing the ALL(*) parsing strategy (Section 3). Next, multiple subparsers coalesce together or reach the end of file, we define predicated grammars, their augmented transition the predictor announces an ambiguity and resolves it in favor network representation, and lookahead DFA (Section 4). Then, of the lowest production number associated with a surviving we describe ALL(*) grammar analysis and present the pars- subparser. (Productions are numbered to express precedence as ing algorithm itself (Section 5). Finally, we support our claims an automatic means of resolving ambiguities like PEGs; Bi- regarding ALL(*) correctness (Section 6) and efficiency (Sec- son also resolves conflicts by choosing the production specified tion 7) and examine related work (Section 8). Appendix A has first.) Programmers can also embed semantic predicates [22] to proofs for key ALL(*) theorems, Appendix B discusses algo- choose between ambiguous interpretations. rithm pragmatics, Appendix C has left-recursion elimination ALL(*) parsers memoize analysis results, incrementally and details. dynamically building up a cache of DFA that map lookahead phrases to predicted productions. (We use the term analysis in the sense that ALL(*) analysis yields lookahead DFA like static 2. ANTLR 4 LL(*) analysis.) The parser can make future predictions at the ANTLR 4 accepts as input any context-free grammar that does same parser decision and lookahead phrase quickly by consult- not contain indirect or hidden left-recursion.2 From the gram- ing the cache.
Recommended publications
  • Derivatives of Parsing Expression Grammars
    Derivatives of Parsing Expression Grammars Aaron Moss Cheriton School of Computer Science University of Waterloo Waterloo, Ontario, Canada [email protected] This paper introduces a new derivative parsing algorithm for recognition of parsing expression gram- mars. Derivative parsing is shown to have a polynomial worst-case time bound, an improvement on the exponential bound of the recursive descent algorithm. This work also introduces asymptotic analysis based on inputs with a constant bound on both grammar nesting depth and number of back- tracking choices; derivative and recursive descent parsing are shown to run in linear time and constant space on this useful class of inputs, with both the theoretical bounds and the reasonability of the in- put class validated empirically. This common-case constant memory usage of derivative parsing is an improvement on the linear space required by the packrat algorithm. 1 Introduction Parsing expression grammars (PEGs) are a parsing formalism introduced by Ford [6]. Any LR(k) lan- guage can be represented as a PEG [7], but there are some non-context-free languages that may also be represented as PEGs (e.g. anbncn [7]). Unlike context-free grammars (CFGs), PEGs are unambiguous, admitting no more than one parse tree for any grammar and input. PEGs are a formalization of recursive descent parsers allowing limited backtracking and infinite lookahead; a string in the language of a PEG can be recognized in exponential time and linear space using a recursive descent algorithm, or linear time and space using the memoized packrat algorithm [6]. PEGs are formally defined and these algo- rithms outlined in Section 3.
    [Show full text]
  • Ece351 Lab Manual
    DEREK RAYSIDE & ECE351 STAFF ECE351 LAB MANUAL UNIVERSITYOFWATERLOO 2 derek rayside & ece351 staff Copyright © 2014 Derek Rayside & ECE351 Staff Compiled March 6, 2014 acknowledgements: • Prof Paul Ward suggested that we look into something with vhdl to have synergy with ece327. • Prof Mark Aagaard, as the ece327 instructor, consulted throughout the development of this material. • Prof Patrick Lam generously shared his material from the last offering of ece251. • Zhengfang (Alex) Duanmu & Lingyun (Luke) Li [1b Elec] wrote solutions to most labs in txl. • Jiantong (David) Gao & Rui (Ray) Kong [3b Comp] wrote solutions to the vhdl labs in antlr. • Aman Muthrej and Atulan Zaman [3a Comp] wrote solutions to the vhdl labs in Parboiled. • TA’s Jon Eyolfson, Vajih Montaghami, Alireza Mortezaei, Wenzhu Man, and Mohammed Hassan. • TA Wallace Wu developed the vhdl labs. • High school students Brian Engio and Tianyu Guo drew a number of diagrams for this manual, wrote Javadoc comments for the code, and provided helpful comments on the manual. Licensed under Creative Commons Attribution-ShareAlike (CC BY-SA) version 2.5 or greater. http://creativecommons.org/licenses/by-sa/2.5/ca/ http://creativecommons.org/licenses/by-sa/3.0/ Contents 0 Overview 9 Compiler Concepts: call stack, heap 0.1 How the Labs Fit Together . 9 Programming Concepts: version control, push, pull, merge, SSH keys, IDE, 0.2 Learning Progressions . 11 debugger, objects, pointers 0.3 How this project compares to CS241, the text book, etc. 13 0.4 Student work load . 14 0.5 How this course compares to MIT 6.035 .......... 15 0.6 Where do I learn more? .
    [Show full text]
  • A Typed, Algebraic Approach to Parsing
    A Typed, Algebraic Approach to Parsing Neelakantan R. Krishnaswami Jeremy Yallop University of Cambridge University of Cambridge United Kingdom United Kingdom [email protected] [email protected] Abstract the subject have remained remarkably stable: context-free In this paper, we recall the definition of the context-free ex- languages are specified in BackusśNaur form, and parsers pressions (or µ-regular expressions), an algebraic presenta- for these specifications are implemented using algorithms tion of the context-free languages. Then, we define a core derived from automata theory. This integration of theory and type system for the context-free expressions which gives a practice has yielded many benefits: we have linear-time algo- compositional criterion for identifying those context-free ex- rithms for parsing unambiguous grammars efficiently [Knuth pressions which can be parsed unambiguously by predictive 1965; Lewis and Stearns 1968] and with excellent error re- algorithms in the style of recursive descent or LL(1). porting for bad input strings [Jeffery 2003]. Next, we show how these typed grammar expressions can However, in languages with good support for higher-order be used to derive a parser combinator library which both functions (such as ML, Haskell, and Scala) it is very popular guarantees linear-time parsing with no backtracking and to use parser combinators [Hutton 1992] instead. These li- single-token lookahead, and which respects the natural de- braries typically build backtracking recursive descent parsers notational semantics of context-free expressions. Finally, we by composing higher-order functions. This allows building show how to exploit the type information to write a staged up parsers with the ordinary abstraction features of the version of this library, which produces dramatic increases programming language (variables, data structures and func- in performance, even outperforming code generated by the tions), which is sufficiently beneficial that these libraries are standard parser generator tool ocamlyacc.
    [Show full text]
  • Implementation of Processing in Racket 1 Introduction
    P2R Implementation of Processing in Racket Hugo Correia [email protected] Instituto Superior T´ecnico University of Lisbon Abstract. Programming languages are being introduced in several ar- eas of expertise, including design and architecture. Processing is an ex- ample of one of these languages that was created to teach architects and designers how to program. In spite of offering a wide set of features, Pro- cessing does not support the use of traditional computer-aided design applications, which are heavily used in the architecture industry. Rosetta is a generative design tool based on the Racket language that attempts to solve this problem. Rosetta provides a novel approach to de- sign creation, offering a set of programming languages that generate de- signs in different computer-aided design applications. However, Rosetta does not support Processing. Therefore, the goal is to add Processing to Rosetta's language set, offering architects that know Processing, an alternative solution that supports computer-aided design applications. In order to achieve this goal, a source-to-source compiler that translates Processing into Racket will be developed. This will also give the Racket community the ability to use Processing in the Racket environment, and, at the same time, will allow the Processing community to take advantage of Racket's libraries and development environment. In this report, an analysis of different language implementation mecha- nisms will be presented, focusing on the different steps of the compilation phase, as well as higher-level solutions, including Language Workbenches. In order to gain insight of source-to-source compiler creation, relevant existing source-to-source compilers are presented.
    [Show full text]
  • Backtrack Parsing Context-Free Grammar Context-Free Grammar
    Context-free Grammar Problems with Regular Context-free Grammar Language and Is English a regular language? Bad question! We do not even know what English is! Two eggs and bacon make(s) a big breakfast Backtrack Parsing Can you slide me the salt? He didn't ought to do that But—No! Martin Kay I put the wine you brought in the fridge I put the wine you brought for Sandy in the fridge Should we bring the wine you put in the fridge out Stanford University now? and University of the Saarland You said you thought nobody had the right to claim that they were above the law Martin Kay Context-free Grammar 1 Martin Kay Context-free Grammar 2 Problems with Regular Problems with Regular Language Language You said you thought nobody had the right to claim [You said you thought [nobody had the right [to claim that they were above the law that [they were above the law]]]] Martin Kay Context-free Grammar 3 Martin Kay Context-free Grammar 4 Problems with Regular Context-free Grammar Language Nonterminal symbols ~ grammatical categories Is English mophology a regular language? Bad question! We do not even know what English Terminal Symbols ~ words morphology is! They sell collectables of all sorts Productions ~ (unordered) (rewriting) rules This concerns unredecontaminatability Distinguished Symbol This really is an untiable knot. But—Probably! (Not sure about Swahili, though) Not all that important • Terminals and nonterminals are disjoint • Distinguished symbol Martin Kay Context-free Grammar 5 Martin Kay Context-free Grammar 6 Context-free Grammar Context-free
    [Show full text]
  • Unravel Data Systems Version 4.5
    UNRAVEL DATA SYSTEMS VERSION 4.5 Component name Component version name License names jQuery 1.8.2 MIT License Apache Tomcat 5.5.23 Apache License 2.0 Tachyon Project POM 0.8.2 Apache License 2.0 Apache Directory LDAP API Model 1.0.0-M20 Apache License 2.0 apache/incubator-heron 0.16.5.1 Apache License 2.0 Maven Plugin API 3.0.4 Apache License 2.0 ApacheDS Authentication Interceptor 2.0.0-M15 Apache License 2.0 Apache Directory LDAP API Extras ACI 1.0.0-M20 Apache License 2.0 Apache HttpComponents Core 4.3.3 Apache License 2.0 Spark Project Tags 2.0.0-preview Apache License 2.0 Curator Testing 3.3.0 Apache License 2.0 Apache HttpComponents Core 4.4.5 Apache License 2.0 Apache Commons Daemon 1.0.15 Apache License 2.0 classworlds 2.4 Apache License 2.0 abego TreeLayout Core 1.0.1 BSD 3-clause "New" or "Revised" License jackson-core 2.8.6 Apache License 2.0 Lucene Join 6.6.1 Apache License 2.0 Apache Commons CLI 1.3-cloudera-pre-r1439998 Apache License 2.0 hive-apache 0.5 Apache License 2.0 scala-parser-combinators 1.0.4 BSD 3-clause "New" or "Revised" License com.springsource.javax.xml.bind 2.1.7 Common Development and Distribution License 1.0 SnakeYAML 1.15 Apache License 2.0 JUnit 4.12 Common Public License 1.0 ApacheDS Protocol Kerberos 2.0.0-M12 Apache License 2.0 Apache Groovy 2.4.6 Apache License 2.0 JGraphT - Core 1.2.0 (GNU Lesser General Public License v2.1 or later AND Eclipse Public License 1.0) chill-java 0.5.0 Apache License 2.0 Apache Commons Logging 1.2 Apache License 2.0 OpenCensus 0.12.3 Apache License 2.0 ApacheDS Protocol
    [Show full text]
  • Disambiguation Filters for Scannerless Generalized LR Parsers
    Disambiguation Filters for Scannerless Generalized LR Parsers Mark G. J. van den Brand1,4, Jeroen Scheerder2, Jurgen J. Vinju1, and Eelco Visser3 1 Centrum voor Wiskunde en Informatica (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands {Mark.van.den.Brand,Jurgen.Vinju}@cwi.nl 2 Department of Philosophy, Utrecht University Heidelberglaan 8, 3584 CS Utrecht, The Netherlands [email protected] 3 Institute of Information and Computing Sciences, Utrecht University P.O. Box 80089, 3508TB Utrecht, The Netherlands [email protected] 4 LORIA-INRIA 615 rue du Jardin Botanique, BP 101, F-54602 Villers-l`es-Nancy Cedex, France Abstract. In this paper we present the fusion of generalized LR pars- ing and scannerless parsing. This combination supports syntax defini- tions in which all aspects (lexical and context-free) of the syntax of a language are defined explicitly in one formalism. Furthermore, there are no restrictions on the class of grammars, thus allowing a natural syntax tree structure. Ambiguities that arise through the use of unrestricted grammars are handled by explicit disambiguation constructs, instead of implicit defaults that are taken by traditional scanner and parser gener- ators. Hence, a syntax definition becomes a full declarative description of a language. Scannerless generalized LR parsing is a viable technique that has been applied in various industrial and academic projects. 1 Introduction Since the introduction of efficient deterministic parsing techniques, parsing is considered a closed topic for research, both by computer scientists and by practi- cioners in compiler construction. Tools based on deterministic parsing algorithms such as LEX & YACC [15,11] (LALR) and JavaCC (recursive descent), are con- sidered adequate for dealing with almost all modern (programming) languages.
    [Show full text]
  • Sequence Alignment/Map Format Specification
    Sequence Alignment/Map Format Specification The SAM/BAM Format Specification Working Group 3 Jun 2021 The master version of this document can be found at https://github.com/samtools/hts-specs. This printing is version 53752fa from that repository, last modified on the date shown above. 1 The SAM Format Specification SAM stands for Sequence Alignment/Map format. It is a TAB-delimited text format consisting of a header section, which is optional, and an alignment section. If present, the header must be prior to the alignments. Header lines start with `@', while alignment lines do not. Each alignment line has 11 mandatory fields for essential alignment information such as mapping position, and variable number of optional fields for flexible or aligner specific information. This specification is for version 1.6 of the SAM and BAM formats. Each SAM and BAMfilemay optionally specify the version being used via the @HD VN tag. For full version history see Appendix B. Unless explicitly specified elsewhere, all fields are encoded using 7-bit US-ASCII 1 in using the POSIX / C locale. Regular expressions listed use the POSIX / IEEE Std 1003.1 extended syntax. 1.1 An example Suppose we have the following alignment with bases in lowercase clipped from the alignment. Read r001/1 and r001/2 constitute a read pair; r003 is a chimeric read; r004 represents a split alignment. Coor 12345678901234 5678901234567890123456789012345 ref AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCAT +r001/1 TTAGATAAAGGATA*CTG +r002 aaaAGATAA*GGATA +r003 gcctaAGCTAA +r004 ATAGCT..............TCAGC -r003 ttagctTAGGC -r001/2 CAGCGGCAT The corresponding SAM format is:2 1Charset ANSI X3.4-1968 as defined in RFC1345.
    [Show full text]
  • Lexing and Parsing with ANTLR4
    Lab 2 Lexing and Parsing with ANTLR4 Objective • Understand the software architecture of ANTLR4. • Be able to write simple grammars and correct grammar issues in ANTLR4. EXERCISE #1 Lab preparation Ï In the cap-labs directory: git pull will provide you all the necessary files for this lab in TP02. You also have to install ANTLR4. 2.1 User install for ANTLR4 and ANTLR4 Python runtime User installation steps: mkdir ~/lib cd ~/lib wget http://www.antlr.org/download/antlr-4.7-complete.jar pip3 install antlr4-python3-runtime --user Then in your .bashrc: export CLASSPATH=".:$HOME/lib/antlr-4.7-complete.jar:$CLASSPATH" export ANTLR4="java -jar $HOME/lib/antlr-4.7-complete.jar" alias antlr4="java -jar $HOME/lib/antlr-4.7-complete.jar" alias grun='java org.antlr.v4.gui.TestRig' Then source your .bashrc: source ~/.bashrc 2.2 Structure of a .g4 file and compilation Links to a bit of ANTLR4 syntax : • Lexical rules (extended regular expressions): https://github.com/antlr/antlr4/blob/4.7/doc/ lexer-rules.md • Parser rules (grammars) https://github.com/antlr/antlr4/blob/4.7/doc/parser-rules.md The compilation of a given .g4 (for the PYTHON back-end) is done by the following command line: java -jar ~/lib/antlr-4.7-complete.jar -Dlanguage=Python3 filename.g4 or if you modified your .bashrc properly: antlr4 -Dlanguage=Python3 filename.g4 2.3 Simple examples with ANTLR4 EXERCISE #2 Demo files Ï Work your way through the five examples in the directory demo_files: Aurore Alcolei, Laure Gonnord, Valentin Lorentz. 1/4 ENS de Lyon, Département Informatique, M1 CAP Lab #2 – Automne 2017 ex1 with ANTLR4 + Java : A very simple lexical analysis1 for simple arithmetic expressions of the form x+3.
    [Show full text]
  • Efficient Recursive Parsing
    Purdue University Purdue e-Pubs Department of Computer Science Technical Reports Department of Computer Science 1976 Efficient Recursive Parsing Christoph M. Hoffmann Purdue University, [email protected] Report Number: 77-215 Hoffmann, Christoph M., "Efficient Recursive Parsing" (1976). Department of Computer Science Technical Reports. Paper 155. https://docs.lib.purdue.edu/cstech/155 This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information. EFFICIENT RECURSIVE PARSING Christoph M. Hoffmann Computer Science Department Purdue University West Lafayette, Indiana 47907 CSD-TR 215 December 1976 Efficient Recursive Parsing Christoph M. Hoffmann Computer Science Department Purdue University- Abstract Algorithms are developed which construct from a given LL(l) grammar a recursive descent parser with as much, recursion resolved by iteration as is possible without introducing auxiliary memory. Unlike other proposed methods in the literature designed to arrive at parsers of this kind, the algorithms do not require extensions of the notational formalism nor alter the grammar in any way. The algorithms constructing the parsers operate in 0(k«s) steps, where s is the size of the grammar, i.e. the sum of the lengths of all productions, and k is a grammar - dependent constant. A speedup of the algorithm is possible which improves the bound to 0(s) for all LL(l) grammars, and constructs smaller parsers with some auxiliary memory in form of parameters
    [Show full text]
  • Introduction to Programming Languages and Syntax
    Programming Languages Session 1 – Main Theme Programming Languages Overview & Syntax Dr. Jean-Claude Franchitti New York University Computer Science Department Courant Institute of Mathematical Sciences Adapted from course textbook resources Programming Language Pragmatics (3rd Edition) Michael L. Scott, Copyright © 2009 Elsevier 1 Agenda 11 InstructorInstructor andand CourseCourse IntroductionIntroduction 22 IntroductionIntroduction toto ProgrammingProgramming LanguagesLanguages 33 ProgrammingProgramming LanguageLanguage SyntaxSyntax 44 ConclusionConclusion 2 Who am I? - Profile - ¾ 27 years of experience in the Information Technology Industry, including thirteen years of experience working for leading IT consulting firms such as Computer Sciences Corporation ¾ PhD in Computer Science from University of Colorado at Boulder ¾ Past CEO and CTO ¾ Held senior management and technical leadership roles in many large IT Strategy and Modernization projects for fortune 500 corporations in the insurance, banking, investment banking, pharmaceutical, retail, and information management industries ¾ Contributed to several high-profile ARPA and NSF research projects ¾ Played an active role as a member of the OMG, ODMG, and X3H2 standards committees and as a Professor of Computer Science at Columbia initially and New York University since 1997 ¾ Proven record of delivering business solutions on time and on budget ¾ Original designer and developer of jcrew.com and the suite of products now known as IBM InfoSphere DataStage ¾ Creator of the Enterprise
    [Show full text]
  • Creating Custom DSL Editor for Eclipse Ian Graham Markit Risk Analytics
    Speaking Your Language Creating a custom DSL editor for Eclipse Ian Graham 1 Speaking Your Language | © 2012 Ian Graham Background highlights 2D graphics, Fortran, Pascal, C Unix device drivers/handlers – trackball, display devices Smalltalk Simulation Sensor “data fusion” electronic warfare test-bed for army CASE tools for telecommunications Java Interactive visual tools for QC of seismic processing Seismic processing DSL development tools Defining parameter constraints and doc of each seismic process in XML Automating help generation (plain text and HTML) Smart editor that leverages DSL definition Now at Markit using custom language for financial risk analysis Speaking Your Language | © 2012 Ian Graham Overview Introduction: Speaking Your Language What is a DSL? What is Eclipse? Demo: Java Editor Eclipse architecture Implementing an editor Implementing a simple editor Demo Adding language awareness to your editor Demo Learning from Eclipse examples Resources Speaking Your Language | © 2012 Ian Graham Introduction: Speaking Your Language Goal illustrate power of Eclipse platform Context widening use of Domain Specific Languages Focus editing support for custom languages in Eclipse Speaking Your Language | © 2012 Ian Graham Recipe Editor 5 Text Editor Recipes | © 2006 IBM | Made available under the EPL v1.0 What is a DSL? Domain Specific Language - a simplified language for specific problem domain Specialized programming language E.g. Excel formula, SQL, seismic processing script Specification language
    [Show full text]