Demodularization of a SGLR Parser for Performance Gains


Undoing Software Engineering: Demodularization of a SGLR Parser for Performance Gains

Nik Kapitonenko
TU Delft

Abstract

JSGLR2 is a Java implementation of the Scannerless Generalized LR-parsing (SGLR) algorithm. It employs a modular architecture. This architecture comes with a performance overhead for letting multiple components interact with each other. This paper looks into the size of the performance overhead penalty for the recovery parser variant. It does so by creating an ‘inlined’ version of the recovery parser variant. The inlined recovery variant is a JSGLR2 implementation that ignores the modular architecture, and hard-codes the components. The performance of the inlined variant is measured with a pre-existing evaluation suite. The results show that there is a performance increase between the original and the inlined variant.

1 Introduction

1.1 Context

Parsing is a fundamental component of processing most programming and markup languages. As projects can accumulate a large amount of source code, it is important that parsing is as fast as possible to reduce build times. At the same time, it is convenient if a single parser implementation can handle a large variety of languages. JSGLR2 [Denkers, 2018] is a Scannerless Generalized LR-parser implementation that can parse any context free language at practical speeds. It manages the algorithm’s complexity by splitting the algorithm into multiple components of a manageable size, each of which receives optimizations sourced from various scientific papers and other sources. The components are then composed together using dependency injection. A factory design pattern provides several different compositions, each providing a different feature set and optimizations. JSGLR2 shows a 3x speedup for parsing Java source code, compared to JSGLR1 [Denkers, 2018].

1.2 The work

However, to achieve the modularity, the JSGLR2 implementation makes use of bridging architecture unrelated to the actual parsing algorithm itself. It is possible that the modular architecture introduces a performance overhead that can be avoided by removing unnecessary architecture supporting the modularization. This ‘inlining’ may simplify the way components interact on a lower abstraction level, which may improve parsing throughput.

This paper presents four inlined versions of one of the JSGLR2 variants, called “recovery”. A copy of the “recovery” variant is created and refactored to be less modular. The four inlined versions share a common base, but have slight differences in how they inline certain components. Next, using a dedicated evaluation suite, this paper presents the performance results of the original and the inlined variants. The inlined variants are compared to the base case and each other. From the results, conclusions are drawn on which inlined variants are the most successful.

1.3 Organization

The paper is organized into several sections. Section 2 provides a more in-depth background on the JSGLR2 recovery parser. It gives an overview of SGLR parsing. Section 3 describes the methodology, and what the inlining process consisted of. It also discusses the differences between the four proposed variants, and why four different variants were created in the first place. Section 4 describes the evaluation suite and discusses the results. Section 5 touches on reproducibility of the results. Section 6 summarizes the paper, and suggests future work. Finally, the appendix contains most figures, including the results.

2 Background

2.1 LR Parsing

JSGLR2 is a Scannerless Generalized LR-parser. Visser [Visser, 1997] describes it as “the integration and improvement of scannerless parsing, generalized-LR parsing and grammar normalization”. LR-parsing forms the base of the SGLR algorithm. LR-parsing is based on Knuth’s LR parsing algorithm [Knuth, 1965]. It parses a subset of context free languages, noted as “LR(k)” grammars. It is based on consuming input tokens one by one, and updating a state machine with a stack. The state then tells which of the three functions to perform: shifting, reducing, or accepting [Denkers, 2018].

Before the parsing can begin, the state machine needs to be created. In JSGLR2, it is called the parse table [Denkers, 2018]. It represents the grammar of the language to be parsed. As the parse table is not part of parsing itself, further discussion on its creation is outside the scope of this work. What is important to know is that it is a map from a tuple of a state and a token to an action. ‘Shift’ and ‘Reduce’ actions state which state to transition to next, while the ‘Accept’ action successfully ends parsing.

Parsing itself starts with a pre-processing step called scanning. It reads the input text, and turns it into a stream of tokens. A token represents a language primitive, such as a number, an operator, or a keyword. Scanning strips out whitespace, layout, comments, and other features not needed for the parser. The tokens are then fed into the parser.

Parsing starts with the freshly generated stream of tokens, and a stack containing one element: the start state of the parse table. The parser then iteratively reads the next token, and uses it together with the topmost state in the stack to retrieve the next action.

The shift action pushes a tuple on top of the stack. The tuple consists of a new state, and the token that triggered this action. The shift action lets the parser consume the input until a reduce action can be applied.

The reduce action is where the ‘comprehension’ of the input happens. It pops off one or more states from the stack, and combines their associated tokens into a composite token called a parse node. The node represents what the tokens mean together. For example, if the three top-most tokens were “Constant 1”, “Operation +”, “Variable x”, then a reduction that takes these three tokens could produce a parse node representing “Value (1 + x)”. After reducing, the next state is looked up in the parse table. The input tuple is then the new topmost state, which was directly below the states that were removed for reducing, and the parse node as token. The found state is then put on top of the stack, together with the parse node. Effectively, reducing replaces simpler tokens with a single composite token.

The accept action signifies that parsing is done. Usually, it is only accessible when the next token is End Of Input, which is appended at the end of a token stream. It tells the parser to inspect the stack, and see if it represents a successful parse. This is the case when the stack only contains two elements: the start state, and another state with an associated token. This token then represents the full syntax tree that the parser is supposed to generate. If there are more than two states in the stack, then the parser signals a failure.

2.2 Generalized parsing

While LR parsing is relatively fast and simple, it has one ma- [...]

[...] know whether to reduce the "x + x" into "(x + x)", or to shift until the last x, and then reduce the multiplication first. The solution for this would be to designate some kind of operator precedence, where one action is preferred over the other. However, in this example, if "x + (x * x)" is the preferred outcome, another issue appears. When deciding whether to shift or reduce the middle x, the parser so far has only consumed "x + x". It does not know what the next character is. It might be *, in which case the parser should shift, or it might actually be End of Input, in which case the parser should reduce. Knuth solved this with lookahead: instead of using the currently parsed token to look up the next state in the parse table, the parser reads, but does not consume, the next k tokens and uses them all together to look up the next state [Knuth, 1965].

While this works for most practical languages, in theory, it is possible to devise a grammar that requires more than k tokens to resolve an ambiguity [Denkers, 2018]. Generalized LR-Parsing is an alternative solution to the ambiguity problem [Visser, 1997]. It allows a parse table to return multiple actions from a lookup. Then, for each returned action, the parser creates a new parallel stack, and applies a different action on each stack. The parser then continues updating all stacks in parallel. Effectively, this means that the parser tries all actions at the same time. If a single stack ends up being wrong in some way, it can be discarded, while other stacks continue working. The parser fails only when all stacks fail.

To save space, the parallel stacks are stored in a Graph Structured Stack (GSS) [Denkers, 2018]. The states in the stacks are stored as nodes. Directed edges point to the next element below the top state. This lets parallel stacks share common states.

2.3 Scannerless parsing

Scannerless parsing, as the name implies, is a modification to the LR-parsing algorithm that removes the scanner. It changes the parsing algorithm to work on characters directly. One advantage of this is that now white-space and layout can be part of the language grammar. However, the main issue that scannerless parsing solves is context-sensitive keywords. For example, take the string int int = 1; in a C-like language. Normally, the scanner would mark the second int as a keyword token, and then parsing would fail, as assigning a value to a keyword does not make any sense. However, with scannerless parsing, a grammar could be devised that would consider any alphanumeric characters between the first int and the = as a valid identifier.