Towards a Universal Code Formatter Through Machine Learning

Total Page:16

File Type:pdf, Size:1020Kb

Towards a Universal Code Formatter Through Machine Learning Towards a Universal Code Formatter through Machine Learning Terence Parr Jurgen Vinju University of San Francisco, USA Centrum Wiskunde & Informatica, Netherlands [email protected] [email protected] Abstract Because the value of a particular code formatting style is There are many declarative frameworks that allow us to im- a subjective notion, often leading to heated discussions, for- plement code formatters relatively easily for any specific lan- matters must be highly configurable. This allows, for exam- guage, but constructing them is cumbersome. The first prob- ple, current maintainers of existing code to improve their ef- lem is that “everybody” wants to format their code differ- fectiveness by reformatting the code per their preferred style. ently, leading to either many formatter variants or a ridicu- There are plenty of configurable formatters for existing lan- lous number of configuration options. Second, the size of guages, whether in IDEs like Eclipse or standalone tools like each implementation scales with a language’s grammar size, Gnu indent, but specifying style is not easy. The emergent leading to hundreds of rules. behavior is not always obvious, there exists interdependency In this paper, we solve the formatter construction problem between options, and the tools cannot take context informa- using a novel approach, one that automatically derives for- tion into account [13]. For example, here are the options matters for any given language without intervention from a needed to obtain K&R C style with indent: language expert. We introduce a code formatter called CODE- -nbad -bap -bbo -nbc -br -brs -c33 -cd33 -ncdb -ce BUFF that uses machine learning to abstract formatting rules -ci4 -cli0 -cp33 -cs -d0 -di1 -nfc1 -nfca -hnl -i4 from a representative corpus, using a carefully designed fea- -ip0 -l75 -lp -npcs -nprs -npsl -saf -sai -saw -nsc ture set. Our experiments on Java, SQL, and ANTLR gram- -nsob -nss mars show that CODEBUFF is efficient, has excellent accu- New languages pop into existence all the time and each racy, and is grammar invariant for a given language. It also one could use a formatter. Unfortunately, building a for- generalizes to a 4th language tested during manuscript prepa- matter is difficult and tedious. Most formatters used in ration. practice are ad hoc, language-specific programs but there Categories and Subject Descriptors D.2.3 [Software Engi- are formal approaches that yield good results with less ef- neering]: Coding - Pretty printers fort. Rule-based formatting systems let programmers specify phrase-formatting pairs, such as the following sample spec- Keywords Formatting algorithms, pretty-printer ification for formatting the COBOL MOVE statement using 1. Introduction ASF+SDF [3, 12, 13, 15]. The way source code is formatted has a significant impact MOVE IdOrLit TO Id-list = on its comprehensibility [9], and manually reformatting code from-box( H [ "MOVE" is just not an option [8, p.399]. Therefore, programmers need Hts=25[to-box(IdOrLit)] Hts=49["TO"] ready access to automatic code formatters or “pretty printers” Hts=53[to-box(Id-list)]]) in situations where formatting is messy or inconsistent. Many program generators also take advantage of code formatters to This rule maps a parse tree pattern to a box expression. A set improve the quality of their output. of such rules, complemented with default behavior for the unspecified parts, generates a single formatter with a specific style for the given language. Section 6 has other related work. PermissionPermission to to make make digital digital or or hard hard copies copies of of all all or part of this work for personal or There are a number of problems with rule-based format- classroomclassroom use use is is granted granted without without fee fee provided provided that that copies copies are not made or distributed forfor profit profit or or commercial commercial advantage advantage and and that that copies copies bear bear this this notice and the full citation ters. First, each specification yields a formatter for one spe- onon the the first first page. page. Copyrights Copyrights for components of this work owned by others than ACM cific style. Each new style requires a change to those rules mustmust be be honored. honored. Abstracting with credit is permitted. To copy otherwise, or republish, toto post post on on servers servers or or to to redistribute redistribute to to lists, lists, requires requires prior prior specific permission and/or a or the creation of a new set. Some systems allow the rules to fee. Request permissions from [email protected]. fee. Request permissions from [email protected]. be parametrized, and configured accordingly, but that leads SLE’16SLE ’16,,, October October 31 31-November – November 01, 1, 2016, 2016, Amsterdam, Amsterdam, Netherlands. Netherlands c 2016 ACM. 978-1-4503-4447-0/16/10...$15.00 to higher rule complexity. Second, minimal changes to the Copyright c 2016 ACM [to be supplied]. $15.00. http://dx.doi.org/10.1145/2997364.2997383http://dx.doi.org/10.1145/http://dx.doi.org/10.1145/2997364.2997383 associated grammar usually require changes to the format- 137 ting rules, even if the grammar changes do not affect the lan- MonVerificationMaster.sql (trained with sqlite grammar on guage recognized. Finally, formatter specifications are big. sqlclean corpus): Although most specification systems have builtin heuristics SELECT DISTINCT t.server_name for default behavior in the absence of a specification for a ,t.server_id ,’MessageQueuingService’ASmissingmonitors given language phrase, specification size tends to grow with FROM t_server t INNER JOIN t_server_type_assoc tsta ON t.server_id = tsta.server_id WHERE t.active = 1 AND tsta.type_id IN (’8’) the grammar size. A few hundred rules are no exception. AND t.environment_id = 0 AND t.server_name NOT IN Formatting is a problem solved in theory, but not yet in ( SELECT DISTINCT l.address practice. Building a good code formatter is still too difficult FROM ipmongroups g INNER JOIN ipmongroupmembers m ON g.groupid = m.groupid INNER JOIN ipmonmonitors l ON m.monitorid = l.monitorid and requires way too much work; we need a fresh approach. INNER JOIN t_server t ON l.address = t.server_name INNER JOIN t_server_type_assoc tsta ON t.server_id = tsta.server_id In this paper, we introduce a tool called CODEBUFF [11] that WHERE l.name LIKE ’%Message Queuing Service%’ AND t.environment_id = 0 uses machine learning to produce a formatter entirely from AND tsta.type_id IN (’8’) AND g.groupname IN (’Prod O/S Services’) a grammar for language L and a representative corpus writ- AND t.active = 1 ) ten in L. There is no specification work needed from the user UNION other than to ensure reasonable formatting consistency within ALL the corpus. The statistical model used by CODEBUFF first And here is a complicated query from dmart_bits_IAPPBO510.sql learns the formatting rules from the corpus, which are then with case statements: applied to format other documents in the same style. Differ- SELECT ent corpora effectively result in different formatters. From a CASE WHEN SSISInstanceID IS NULL THEN ’Total’ user perspective the formatter is “configured by example.” ELSE SSISInstanceID END SSISInstanceID ,SUM(OldStatus4)ASOldStatus4 ... Contributions and roadmap. We begin by showing sample ,SUM(OldStatus4+Status0+Status1+Status2+Status3+Status4)ASInstanceTotal FROM CODEBUFF output in Section 2 and then explain how and why ( SELECT CODEBUFF works in Section 3. Section 4 provides empirical CONVERT(VARCHAR, SSISInstanceID) AS SSISInstanceID ,COUNT(CASEWHENStatus=4AND evidence that CODEBUFF learns a formatting style quickly CONVERT(DATE, LoadReportDBEndDate) < CONVERT(DATE, GETDATE()) and using very few files. CODEBUFF approximates the cor- THEN Status ELSE NULL END) AS OldStatus4 pus style with high accuracy for the languages ANTLR, Java ... ,COUNT(CASEWHENStatus=4AND and SQL, and it is largely insensitive to language-preserving DATEPART(DAY, LoadReportDBEndDate) = DATEPART(DAY, GETDATE()) THEN Status grammar changes. To adjust for possible selection bias and ELSE NULL END) AS Status4 FROM dbo.ClientConnection model overfitting to these three well-known languages, we GROUP BY SSISInstanceID )ASStatusMatrix tested CODEBUFF on an unfamiliar language (Quorum) in GROUP BY SSISInstanceID Section 5, from which we learned that CODEBUFF works sim- Here is a snippet from Java, our 2nd test language, taken from ilarly well, yet improvements are still possible. We position STLexer.java (trained with java grammar on st corpus): CODEBUFF with respect to the literature in Section 6. switch ( c ) { ... default: 2. Sample Formatting if ( c==delimiterStopChar ) { consume(); scanningInsideExpr = false; This section contains sample SQL, Java, and ANTLR code return newToken(RDELIM); ODE UFF } formatted by C B , including some that are poorly for- if ( isIDStartLetter(c) ) { ... matted to give a balanced presentation. Only the formatting if ( name.equals("if") ) return newToken(IF); else if ( name.equals("endif") ) return newToken(ENDIF); style matters here so we use a small font for space reasons. ... return id; Github [11] has a snapshot of all input corpora and format- } RecognitionException re = new NoViableAltException("", 0, 0, input); ted versions (corpora, testing details in Section 4). To arrive ... errMgr.lexerError(input.getSourceName(), at the formatted output for document d in corpus D, our test "invalid character ’"+str(c)+"’", templateToken, rig removes all whitespace tokens from d and then applies re); an instance of CODEBUFF trained on the corpus without d, ... D d . \{ } Here is an example from STViz.java that indents a The examples are not meant to illustrate “good style.” method declaration relative to the start of an expression They are simply consistent with the style of a specific corpus. rather than the first token on the previous line: In Section 4 we define a metric to measure the success of the Thread t = new Thread() { automated formatter in an objective and reproducible man- @Override public void run() { ner. No quantitative research method can capture the qualita- synchronized ( lock ) { while ( viewFrame.isVisible() ) { tive notion of style, so we start with these examples.
Recommended publications
  • Unravel Data Systems Version 4.5
    UNRAVEL DATA SYSTEMS VERSION 4.5 Component name Component version name License names jQuery 1.8.2 MIT License Apache Tomcat 5.5.23 Apache License 2.0 Tachyon Project POM 0.8.2 Apache License 2.0 Apache Directory LDAP API Model 1.0.0-M20 Apache License 2.0 apache/incubator-heron 0.16.5.1 Apache License 2.0 Maven Plugin API 3.0.4 Apache License 2.0 ApacheDS Authentication Interceptor 2.0.0-M15 Apache License 2.0 Apache Directory LDAP API Extras ACI 1.0.0-M20 Apache License 2.0 Apache HttpComponents Core 4.3.3 Apache License 2.0 Spark Project Tags 2.0.0-preview Apache License 2.0 Curator Testing 3.3.0 Apache License 2.0 Apache HttpComponents Core 4.4.5 Apache License 2.0 Apache Commons Daemon 1.0.15 Apache License 2.0 classworlds 2.4 Apache License 2.0 abego TreeLayout Core 1.0.1 BSD 3-clause "New" or "Revised" License jackson-core 2.8.6 Apache License 2.0 Lucene Join 6.6.1 Apache License 2.0 Apache Commons CLI 1.3-cloudera-pre-r1439998 Apache License 2.0 hive-apache 0.5 Apache License 2.0 scala-parser-combinators 1.0.4 BSD 3-clause "New" or "Revised" License com.springsource.javax.xml.bind 2.1.7 Common Development and Distribution License 1.0 SnakeYAML 1.15 Apache License 2.0 JUnit 4.12 Common Public License 1.0 ApacheDS Protocol Kerberos 2.0.0-M12 Apache License 2.0 Apache Groovy 2.4.6 Apache License 2.0 JGraphT - Core 1.2.0 (GNU Lesser General Public License v2.1 or later AND Eclipse Public License 1.0) chill-java 0.5.0 Apache License 2.0 Apache Commons Logging 1.2 Apache License 2.0 OpenCensus 0.12.3 Apache License 2.0 ApacheDS Protocol
    [Show full text]
  • Adaptive LL(*) Parsing: the Power of Dynamic Analysis
    Adaptive LL(*) Parsing: The Power of Dynamic Analysis Terence Parr Sam Harwell Kathleen Fisher University of San Francisco University of Texas at Austin Tufts University [email protected] [email protected] kfi[email protected] Abstract PEGs are unambiguous by definition but have a quirk where Despite the advances made by modern parsing strategies such rule A ! a j ab (meaning “A matches either a or ab”) can never as PEG, LL(*), GLR, and GLL, parsing is not a solved prob- match ab since PEGs choose the first alternative that matches lem. Existing approaches suffer from a number of weaknesses, a prefix of the remaining input. Nested backtracking makes de- including difficulties supporting side-effecting embedded ac- bugging PEGs difficult. tions, slow and/or unpredictable performance, and counter- Second, side-effecting programmer-supplied actions (muta- intuitive matching strategies. This paper introduces the ALL(*) tors) like print statements should be avoided in any strategy that parsing strategy that combines the simplicity, efficiency, and continuously speculates (PEG) or supports multiple interpreta- predictability of conventional top-down LL(k) parsers with the tions of the input (GLL and GLR) because such actions may power of a GLR-like mechanism to make parsing decisions. never really take place [17]. (Though DParser [24] supports The critical innovation is to move grammar analysis to parse- “final” actions when the programmer is certain a reduction is time, which lets ALL(*) handle any non-left-recursive context- part of an unambiguous final parse.) Without side effects, ac- free grammar. ALL(*) is O(n4) in theory but consistently per- tions must buffer data for all interpretations in immutable data forms linearly on grammars used in practice, outperforming structures or provide undo actions.
    [Show full text]
  • Lexing and Parsing with ANTLR4
    Lab 2 Lexing and Parsing with ANTLR4 Objective • Understand the software architecture of ANTLR4. • Be able to write simple grammars and correct grammar issues in ANTLR4. EXERCISE #1 Lab preparation Ï In the cap-labs directory: git pull will provide you all the necessary files for this lab in TP02. You also have to install ANTLR4. 2.1 User install for ANTLR4 and ANTLR4 Python runtime User installation steps: mkdir ~/lib cd ~/lib wget http://www.antlr.org/download/antlr-4.7-complete.jar pip3 install antlr4-python3-runtime --user Then in your .bashrc: export CLASSPATH=".:$HOME/lib/antlr-4.7-complete.jar:$CLASSPATH" export ANTLR4="java -jar $HOME/lib/antlr-4.7-complete.jar" alias antlr4="java -jar $HOME/lib/antlr-4.7-complete.jar" alias grun='java org.antlr.v4.gui.TestRig' Then source your .bashrc: source ~/.bashrc 2.2 Structure of a .g4 file and compilation Links to a bit of ANTLR4 syntax : • Lexical rules (extended regular expressions): https://github.com/antlr/antlr4/blob/4.7/doc/ lexer-rules.md • Parser rules (grammars) https://github.com/antlr/antlr4/blob/4.7/doc/parser-rules.md The compilation of a given .g4 (for the PYTHON back-end) is done by the following command line: java -jar ~/lib/antlr-4.7-complete.jar -Dlanguage=Python3 filename.g4 or if you modified your .bashrc properly: antlr4 -Dlanguage=Python3 filename.g4 2.3 Simple examples with ANTLR4 EXERCISE #2 Demo files Ï Work your way through the five examples in the directory demo_files: Aurore Alcolei, Laure Gonnord, Valentin Lorentz. 1/4 ENS de Lyon, Département Informatique, M1 CAP Lab #2 – Automne 2017 ex1 with ANTLR4 + Java : A very simple lexical analysis1 for simple arithmetic expressions of the form x+3.
    [Show full text]
  • Creating Custom DSL Editor for Eclipse Ian Graham Markit Risk Analytics
    Speaking Your Language Creating a custom DSL editor for Eclipse Ian Graham 1 Speaking Your Language | © 2012 Ian Graham Background highlights 2D graphics, Fortran, Pascal, C Unix device drivers/handlers – trackball, display devices Smalltalk Simulation Sensor “data fusion” electronic warfare test-bed for army CASE tools for telecommunications Java Interactive visual tools for QC of seismic processing Seismic processing DSL development tools Defining parameter constraints and doc of each seismic process in XML Automating help generation (plain text and HTML) Smart editor that leverages DSL definition Now at Markit using custom language for financial risk analysis Speaking Your Language | © 2012 Ian Graham Overview Introduction: Speaking Your Language What is a DSL? What is Eclipse? Demo: Java Editor Eclipse architecture Implementing an editor Implementing a simple editor Demo Adding language awareness to your editor Demo Learning from Eclipse examples Resources Speaking Your Language | © 2012 Ian Graham Introduction: Speaking Your Language Goal illustrate power of Eclipse platform Context widening use of Domain Specific Languages Focus editing support for custom languages in Eclipse Speaking Your Language | © 2012 Ian Graham Recipe Editor 5 Text Editor Recipes | © 2006 IBM | Made available under the EPL v1.0 What is a DSL? Domain Specific Language - a simplified language for specific problem domain Specialized programming language E.g. Excel formula, SQL, seismic processing script Specification language
    [Show full text]
  • Conflict Resolution in a Recursive Descent Compiler Generator
    LL(1) Conflict Resolution in a Recursive Descent Compiler Generator Albrecht Wöß, Markus Löberbauer, Hanspeter Mössenböck Johannes Kepler University Linz, Institute of Practical Computer Science, Altenbergerstr. 69, 4040 Linz, Austria {woess,loeberbauer,moessenboeck}@ssw.uni-linz.ac.at Abstract. Recursive descent parsing is restricted to languages whose grammars are LL(1), i.e., which can be parsed top-down with a single lookahead symbol. Unfortunately, many languages such as Java, C++, or C# are not LL(1). There- fore recursive descent parsing cannot be used or the parser has to make its deci- sions based on semantic information or a multi-symbol lookahead. In this paper we suggest a systematic technique for resolving LL(1) conflicts in recursive descent parsing and show how to integrate it into a compiler gen- erator (Coco/R). The idea is to evaluate user-defined boolean expressions, in order to allow the parser to make its parsing decisions where a one symbol loo- kahead does not suffice. Using our extended compiler generator we implemented a compiler front end for C# that can be used as a framework for implementing a variety of tools. 1 Introduction Recursive descent parsing [16] is a popular top-down parsing technique that is sim- ple, efficient, and convenient for integrating semantic processing. However, it re- quires the grammar of the parsed language to be LL(1), which means that the parser must always be able to select between alternatives with a single symbol lookahead. Unfortunately, many languages such as Java, C++ or C# are not LL(1) so that one either has to resort to bottom-up LALR(1) parsing [5, 1], which is more powerful but less convenient for semantic processing, or the parser has to resolve the LL(1) con- flicts using semantic information or a multi-symbol lookahead.
    [Show full text]
  • A Simple, Possibly Correct LR Parser for C11 Jacques-Henri Jourdan, François Pottier
    A Simple, Possibly Correct LR Parser for C11 Jacques-Henri Jourdan, François Pottier To cite this version: Jacques-Henri Jourdan, François Pottier. A Simple, Possibly Correct LR Parser for C11. ACM Transactions on Programming Languages and Systems (TOPLAS), ACM, 2017, 39 (4), pp.1 - 36. 10.1145/3064848. hal-01633123 HAL Id: hal-01633123 https://hal.archives-ouvertes.fr/hal-01633123 Submitted on 11 Nov 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. 14 A simple, possibly correct LR parser for C11 Jacques-Henri Jourdan, Inria Paris, MPI-SWS François Pottier, Inria Paris The syntax of the C programming language is described in the C11 standard by an ambiguous context-free grammar, accompanied with English prose that describes the concept of “scope” and indicates how certain ambiguous code fragments should be interpreted. Based on these elements, the problem of implementing a compliant C11 parser is not entirely trivial. We review the main sources of difficulty and describe a relatively simple solution to the problem. Our solution employs the well-known technique of combining an LALR(1) parser with a “lexical feedback” mechanism. It draws on folklore knowledge and adds several original as- pects, including: a twist on lexical feedback that allows a smooth interaction with lookahead; a simplified and powerful treatment of scopes; and a few amendments in the grammar.
    [Show full text]
  • An Insight Into ANTLR(Antlr)/Jedit/Netbeans/Eclipse/Java/JVM/Jikesrvm/Osgi As Bio- Informatics Platform in the Context of DNA/RN
    An Insight into ANTLR(antlr)/jEdit/NetBeans/Eclipse/Java/JVM/JikesRVM/OSGi as Bio- informatics Platform in the Context of DNA/RNA Sequencing & Gene Chip Design – A Simple & Interesting Suggestion towards Interfacing Nano-Bio Systems /IoT Hardware & Software/ HPC Environments. Nirmal/ [email protected] [I] Introduction & Inspiration : https://www.osapublishing.org/boe/viewmedia.cfm?uri=boe-1-5-1302&seq=0 http://www.scienpress.com/Upload/JAMB/Vol%202_2_6.pdf https://pubs.rsc.org/en/content/articlelanding/2015/cc/c4cc10047f#!divAbstract https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5522603/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6163985/ https://www.mdpi.com/2077-0383/8/1/6/htm https://www.sciencedirect.com/science/article/pii/S030100821530099X https://pubs.acs.org/doi/abs/10.1021/ac0487449 http://www.globalhealthprimer.emory.edu/targets-technologies/nucleic-acid-based-diagnostics.html https://en.wikipedia.org/wiki/DNA_nanotechnology https://explained.ai/ - Good information. https://arxiv.org/pdf/1802.01528.pdf - The Matrix Calculus You Need For Deep Learning . https://en.wikipedia.org/wiki/DNA_microarray https://learn.genetics.utah.edu/content/labs/microarray/ https://en.wikipedia.org/wiki/Patrick_O._Brown https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4479227/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4011503/ http://www.cs.mun.ca/~harold/Courses/Old/CS6754.W06/Diary/ng1030.pdf http://textofvideo.nptel.ac.in/102101054/lec33.pdf https://www.m2-automation.com/ https://genome.cshlp.org/content/13/3/467.full https://www.comsol.co.in/paper/download/364251/jindal_presentation.pdf http://www.ipcsit.com/vol2/38-A313.pdf [II] Informatics Framework & Implementation : Figure I – Our Total Idea as mentioned in our TITLE.
    [Show full text]
  • Third Party Licenses Visual Investigator
    Third Party Licenses and Information This page contains information regarding any third party code included with your SAS software, including applicable third party software notices and/or additional terms and conditions. SAS Visual Investigator 10.7 Component Applicable License(s) @uirouter/angularjs 1.0.20 MIT License @uirouter/core 5.0.21 MIT License @uirouter/dsr 1.0.3 MIT License @uirouter/sticky-states 1.5.0 MIT License ACE 1.2.3 BSD 3-clause "New" or "Revised" License akka-actor 2.4.17 Apache License 2.0 Angular 1.7.9 MIT License Animal Sniffer Annotations 1.18 MIT License ANTLR 2.7.7 ANTLR Software Rights Notice ANTLR 3.5.2 BSD 3-clause "New" or "Revised" License ANTLR 4.5.3 BSD 3-clause "New" or "Revised" License ANTLR 4 Tool 4.5 BSD 3-clause "New" or "Revised" License ANTLR 4 Tool 4.5.3 BSD 3-clause "New" or "Revised" License Apache Commons BeanUtils 1.9.4 Apache License 2.0 Apache Commons Codec 1.11 Apache License 2.0 Apache Commons Codec 1.9 Apache License 2.0 Apache Commons Collections 3.2.2 Apache License 2.0 Apache Commons Collections 4.1 Apache License 2.0 Apache Commons Collections 4.3 Apache License 2.0 Apache Commons Collections commons-commons-collections-4.4 Apache License 2.0 Apache Commons Compress 1.19 Apache License 2.0 Apache Commons Configuration 1.8 Apache License 2.0 Apache Commons CSV 1.4 Apache License 2.0 Apache Commons DBCP 1.4 Apache License 2.0 Apache Commons Digester 2.1 Apache License 1.1 Apache Commons Email 1.5 Apache License 2.0 Apache Commons FileUpload 1.3.3 Apache License 2.0 Apache Commons Lang
    [Show full text]
  • ANTLR Reference Manual ANTLR Reference Manual ANTLR Jguru Credits Project Lead Terence Parr
    ANTLR Reference Manual ANTLR Reference Manual ANTLR jGuru Credits Project Lead Terence Parr Support from jGuru.com Your View of the Java Universe Help with initial coding John Lilly, Empathy Software C++ code generator by Peter Wells and Ric Klaren Sather code generator by Mika Illouz Infrastructure support from Perforce: The world's best source code control system Substantial intellectual effort donated by John Mitchell Scott Stanchfield Jim Coker Monty Zukowski (Streams, translation) Chapman Flack (UNICODE, streams) ParseView parser debugger by Scott Stanchfield ANTLR Version 2.7.1 Version: $Id: //depot/code/org.antlr/main/main/doc/index.html#8 $ October 1, 2000 Check out the ANTLR FAQ at jguru.com! Check out the Bug Database (ok, so it's out of date) Check out http://www.ANTLR.org ParseView debugger is available. (for prior versions) http://www.antlr.org/doc/ (1 of 7) [8/10/2001 10:45:13 AM] ANTLR Reference Manual If you are looking for the previous main version (PCCTS 1.33) of ANTLR rather than the Java/C++/Sather version (2.x), see Getting started with PCCTS. Download ANTLR 2.7.1. ANTLR 2.7.1 release notes ANTLR Meta-Language ● Meta-Language Vocabulary ● Header Section ● Parser Class Definitions ● Lexical Analyzer Class Definitions ● Tree-parser Class Definitions ● Options Section ● Tokens Section ● Grammar Inheritance ● Rule Definitions ● Atomic Production elements ● Simple Production elements ● Production Element Operators ● Token Classes ● Predicates ● Element Labels ● EBNF Rule Elements ● Interpretation Of Semantic Actions ● Semantic
    [Show full text]
  • Part 1: Introduction
    Part 1: Introduction By: Morteza Zakeri PhD Student Iran University of Science and Technology Winter 2020 Agenda • What is ANTLR? • History • Motivation • What is New in ANTLR v4? • ANTLR Components: How it Works? • Getting Started with ANTLR v4 February 2020 Introduction to ANTLR – Morteza Zakeri 2 What is ANTLR? • ANTLR (pronounced Antler), or Another Tool For Language Recognition, is a parser generator that uses LL(*) for parsing. • ANTLR takes as input a grammar that specifies a language and generates as output source code for a recognizer for that language. • Supported generating code in Java, C#, JavaScript, Python2 and Python3. • ANTLR is recursive descent parser Generator! (See Appendix) February 2020 Introduction to ANTLR – Morteza Zakeri 3 Runtime Libraries and Code Generation Targets • There is no language specific code generators • There is only one tool, written in Java, which is able to generate Lexer and Parser code for all targets, through command line options. • The available targets are the following (2020): • Java, C#, C++, Swift, Python (2 and 3), Go, PHP, and JavaScript. • Read more: • https://github.com/antlr/antlr4/blob/master/doc/targets.md 11 February 2020 Introduction to ANTLR – Morteza Zakeri Runtime Libraries and Code Generation Targets • $ java -jar antlr4-4.8.jar -Dlanguage=CSharp MyGrammar.g4 • https://github.com/antlr/antlr4/tree/master/runtime/CSharp • https://github.com/tunnelvisionlabs/antlr4cs • $ java -jar antlr4-4.8.jar -Dlanguage=Cpp MyGrammar.g4 • https://github.com/antlr/antlr4/blob/master/doc/cpp-target.md • $ java -jar antlr4-4.8.jar -Dlanguage=Python3 MyGrammar.g4 • https://github.com/antlr/antlr4/blob/master/doc/python- target.md 11 February 2020 Introduction to ANTLR – Morteza Zakeri History • Initial release: • February 1992; 24 years ago.
    [Show full text]
  • Xtext User Guide
    Xtext User Guide Xtext User Guide Heiko Behrens, Michael Clay, Sven Efftinge, Moritz Eysholdt, Peter Friese, Jan Köhnlein, Knut Wannheden, Sebastian Zarnekow and contributors Copyright 2008 - 2010 Xtext 1.0 1 1. Overview .................................................................................................................... 1 1.1. What is Xtext? ................................................................................................... 1 1.2. How Does It Work? ............................................................................................ 1 1.3. Xtext is Highly Configurable ................................................................................ 1 1.4. Who Uses Xtext? ................................................................................................ 1 1.5. Who is Behind Xtext? ......................................................................................... 1 1.6. What is a Domain-Specific Language ..................................................................... 2 2. Getting Started ............................................................................................................ 3 2.1. Creating a DSL .................................................................................................. 3 2.1.1. Create an Xtext Project ............................................................................. 3 2.1.2. Project Layout ......................................................................................... 4 2.1.3. Build Your Own Grammar ........................................................................
    [Show full text]
  • ANTLR Reference PDF Manual
    ANTLR Reference Manual Home | Download | News | About ANTLR | Support Latest version is 2.7.3. Download now! » » Home » Download ANTLR Reference Manual » News »Using ANTLR » Documentation ANTLR » FAQ » Articles Reference Manual » Grammars Credits » File Sharing » Code API Project Lead and Supreme Dictator Terence Parr » Tech Support University of San Franciso »About ANTLR Support from » What is ANTLR jGuru.com » Why use ANTLR Your View of the Java Universe » Showcase Help with initial coding » Testimonials John Lilly, Empathy Software » Getting Started C++ code generator by » Software License Peter Wells and Ric Klaren » ANTLR WebLogs C# code generation by »StringTemplate Micheal Jordan, Kunle Odutola and Anthony Oguntimehin. »TML Infrastructure support from Perforce: »PCCTS The world's best source code control system Substantial intellectual effort donated by Loring Craymer Monty Zukowski Jim Coker Scott Stanchfield John Mitchell Chapman Flack (UNICODE, streams) Source changes for Eclipse and NetBeans by Marco van Meegen and Brian Smith ANTLR Version 2.7.3 March 22, 2004 What's ANTLR http://www.antlr.org/doc/index.html (1 of 6)31.03.2004 17:11:46 ANTLR Reference Manual ANTLR, ANother Tool for Language Recognition, (formerly PCCTS) is a language tool that provides a framework for constructing recognizers, compilers, and translators from grammatical descriptions containing Java, C++, or C# actions [You can use PCCTS 1.xx to generate C-based parsers]. Computer language translation has become a common task. While compilers and tools for traditional computer languages (such as C or Java) are still being built, their number is dwarfed by the thousands of mini-languages for which recognizers and translators are being developed.
    [Show full text]