Introduction to Parsing Ambiguity and Syntax Errors

Total Page:16

File Type:pdf, Size:1020Kb

Introduction to Parsing Ambiguity and Syntax Errors Outline Introduction to Parsing • Regular languages revisited Ambiguity and Syntax Errors • Parser overview • Context-free grammars (CFG’s) • Derivations • Ambiguity • Syntax errors 2 Languages and Automata Limitations of Regular Languages • Formal languages are very important in CS Intuition: A finite automaton that runs long – Especially in programming languages and compilers enough must repeat states • A finite automaton cannot remember number • Regular languages of times it has visited a particular state – The weakest formal languages widely used • because a finite automaton has finite memory – Many applications – Only enough to store in which state it is – Cannot count, except up to a finite limit • We will also study context-free languages • Many languages are not regular • E.g., the language of balanced parentheses is not regular: { (i )i | i ≥ 0} 3 4 The Functionality of the Parser Example • Input: sequence of tokens from lexer • If-then-else statement if (x == y) then z = 1; else z = 2; • Output: parse tree of the program • Parser input IF (ID == ID) THEN ID = INT; ELSE ID = INT; • Possible parser output IF-THEN-ELSE == = = ID ID ID INT ID INT 5 6 Comparison with Lexical Analysis The Role of the Parser • Not all sequences of tokens are programs ... • Parser must distinguish between valid and Phase Input Output invalid sequences of tokens Lexer Sequence of Sequence of characters tokens • We need – A language for describing valid sequences of tokens Parser Sequence of Parse tree – A method for distinguishing valid from invalid tokens sequences of tokens 7 8 Context-Free Grammars CFGs (Cont.) • Many programming language constructs have a A CFG consists of recursive structure – A set of terminals T – A set of non-terminals N • E.g. A STMT is of the form – A start symbol S (a non-terminal) if COND then STMT else STMT , or – A set of productions while COND do STMT , or … Assuming X ∈ N the productions are of the form • Context-free grammars are a natural notation X →ε , or for this recursive structure X → Y1 Y2 ... Yn where Yi ∈ N ∪T 9 10 Notational Conventions Examples of CFGs • In these lecture notes A fragment of an example language (simplified): – Non-terminals are written upper-case – Terminals are written lower-case STMT → if COND then STMT else STMT – The start symbol is the left-hand side of the first ⏐ production while COND do STMT ⏐ id = int 11 12 Examples of CFGs (cont.) The Language of a CFG Grammar for simple arithmetic expressions: Read productions as replacement rules: → E → E * E X Y1 ... Yn Means X can be replaced by Y1 ... Yn (in this order) ⏐ E + E X →ε ⏐ ( E ) Means X can be erased (replaced with empty string) ⏐ id 13 14 Key Idea The Language of a CFG (Cont.) (1) Begin with a string consisting of the start More formally, we write symbol “S” X XX→ XXYYXX (2) Replace any non-terminal X in the string by 11LLin L i−+11 L min1 L a right-hand side of some production if there is a production XYY→ 1 n L XYYim→ 1 L (3) Repeat (2) until there are no non-terminals in the string 15 16 The Language of a CFG (Cont.) The Language of a CFG Write Let G be a context-free grammar with start ∗ symbol S. Then the language of G is: XXYY11LLnm→ if ∗ aaSaa|→ and every a is a terminal { 11KKnn i } XX11LLLLnm→→→ YY in 0 or more steps 17 18 Terminals Examples • Terminals are called so because there are no L(G) is the language of the CFG G rules for replacing them ii Strings of balanced parentheses {() |i ≥ 0} • Once generated, terminals are permanent Two equivalent ways of writing the grammar G: • Terminals ought to be tokens of the language SS→ () SS→ () or S → ε | ε 19 20 Example Example (Cont.) A fragment of our example language (simplified): Some elements of the our language STMT → if COND then STMT id = int ⏐ if COND then STMT else STMT if (id == id) then id = int else id = int ⏐ while COND do STMT while (id != id) do id = int ⏐ id = int while (id == id) do while (id != id) do id = int COND → (id == id) if (id != id) then if (id == id) then id = int else id = int ⏐ (id != id) 21 22 Arithmetic Example Notes Simple arithmetic expressions: The idea of a CFG is a big step. E→∗ E+E | E E | (E) | id But: Some elements of the language: • Membership in a language is just “yes” or “no”; id id + id we also need the parse tree of the input (id) id ∗ id • Must handle errors gracefully (id) ∗∗ id id (id) • Need an implementation of CFG’s – e.g., yacc/bison/ML-yacc/... 23 24 More Notes Derivations and Parse Trees • Form of the grammar is important A derivation is a sequence of productions – Many grammars generate the same language S →→→ – Parsing tools are sensitive to the grammar LLL A derivation can be drawn as a tree – Start symbol is the tree’s root – For a production XYY → 1 L n add children YY 1 L n to node X Note: Tools for regular languages (e.g., lex/ML-Lex) are also sensitive to the form of the regular expression, but this is rarely a problem in practice 25 26 Derivation Example Derivation Example (Cont.) • Grammar Ε → E+E | E*E | (E) | id E→∗ E+E | E E | (E) | id E E → E+E • String E + E id ∗ id + id →∗EE+E →∗id E + E E * E id →∗id id + E id id →∗id id + id 27 28 Derivation in Detail (1) Derivation in Detail (2) Ε → E+E | E*E | (E) | id Ε → E+E | E*E | (E) | id E E E + E E E → E+E 29 30 Derivation in Detail (3) Derivation in Detail (4) Ε → E+E | E*E | (E) | id Ε → E+E | E*E | (E) | id E E E E E + E E + E → E+E → E+E E * E →∗EE+E E * E →∗E+EE →∗id E + E id 31 32 Derivation in Detail (5) Derivation in Detail (6) Ε → E+E | E*E | (E) | id Ε → E+E | E*E | (E) | id E E E E → E+E → E+E E + E E + E →∗EE+E →∗EE+E E * E →∗id E + E E * E id →∗id E + E → id∗ id + E →∗id id + E id id id id →∗id id + id 33 34 Notes on Derivations Left-most and Right-most Derivations • A parse tree has • What was shown before Ε → E+E | E*E | (E) | id – Terminals at the leaves was a left-most derivation – At each step, we replaced E – Non-terminals at the interior nodes the left-most non-terminal → E+E • An in-order traversal of the leaves is the • There is an equivalent → E+id original input notion of a right-most derivation →∗EE + id – Shown on the right • The parse tree shows the association of →∗Eid + id operations; the input string does not ! →∗id id + id 35 36 Right-most Derivation in Detail (1) Right-most Derivation in Detail (2) Ε → E+E | E*E | (E) | id Ε → E+E | E*E | (E) | id E E E + E E E → E+E 37 38 Right-most Derivation in Detail (3) Right-most Derivation in Detail (4) Ε → E+E | E*E | (E) | id Ε → E+E | E*E | (E) | id E E E E E + E E + E → E+E → E+E id → E+id E * E id → E+id → EE∗ + id 39 40 Right-most Derivation in Detail (5) Right-most Derivation in Detail (6) Ε → E+E | E*E | (E) | id Ε → E+E | E*E | (E) | id E E E E → E+E → E+E E + E E + E → E+id → E+id E * E id →∗EE + id E * E id → EE∗ + id →∗Eid + id → E ∗id + id id id id →∗id id + id 41 42 Derivations and Parse Trees Summary of Derivations • Note that: • We are not just interested in whether – right-most and left-most derivations have the same s ∈ L(G) parse tree – We need a parse tree for s – for each parse tree, there is a right-most and a left-most derivation • A derivation defines a parse tree – But one parse tree may have many derivations • The difference is just in the order in which branches are added • Left-most and right-most derivations are important in parser implementation 43 44 Ambiguity Ambiguity (Cont.) • Grammar: • A grammar is ambiguous if it has more than E → E + E | E * E | ( E ) | int one parse tree for some string – Equivalently, if there is more than one right-most or left-most derivation for some string • The string int * int + int has two parse trees • Ambiguity is bad E E – Leaves meaning of some programs ill-defined • Ambiguity is common in programming languages E + E E * E – Arithmetic expressions E E int int E + E – IF-THEN-ELSE * int int int int 45 46 Dealing with Ambiguity Ambiguity: The Dangling Else • There are several ways to handle ambiguity • Consider the following grammar • Most direct method is to rewrite the grammar S → if C then S unambiguously | if C then S else S | OTHER E → T + E | T T → int * T | int | ( E ) • This grammar is also ambiguous • This grammar enforces precedence of * over + 47 48 The Dangling Else: Example The Dangling Else: A Fix • The expression • else should match the closest unmatched then if C1 then if C2 then S3 else S4 • We can describe this in the grammar has two parse trees S → MIF /* all then are matched */ if if | UIF /* some then are unmatched */ MIF → if C then MIF else MIF C if C1 if S4 1 | OTHER UIF → if C then S | if C then MIF else UIF C2 S3 C2 S3 S4 • Typically we want the second form • Describes the same set of strings 49 50 The Dangling Else: Example Revisited Ambiguity • The expression if C1 then if C2 then S3 else S4 • No general techniques for handling ambiguity if if • Impossible to convert automatically an ambiguous grammar to an unambiguous one C1 if C1 if S4 • Used with care, ambiguity can simplify the C2 S3 S4 C2 S3 grammar • A valid parse tree – Sometimes allows more natural definitions • Not valid because the (for a UIF) then expression is not – However, we need disambiguation mechanisms a MIF 51 52 Precedence and Associativity Declarations Associativity Declarations • Instead of rewriting the grammar • Consider the grammar E → E + E | int – Use the more natural (ambiguous) grammar • Ambiguous: two parse trees of int + int + int – Along with disambiguating declarations E E • Most tools allow precedence and associativity E + E E E declarations to disambiguate grammars + E + E int int E + E • Examples … int int int int • Left associativity declaration: %left + 53 54 Precedence Declarations Error Handling • Consider the grammar E → E + E | E * E | int • Purpose of the compiler is And the string int + int * int – To detect non-valid programs – To translate the valid ones E E • Many kinds of possible errors (e.g.
Recommended publications
  • Layout-Sensitive Generalized Parsing
    Layout-sensitive Generalized Parsing Sebastian Erdweg, Tillmann Rendel, Christian K¨astner,and Klaus Ostermann University of Marburg, Germany Abstract. The theory of context-free languages is well-understood and context-free parsers can be used as off-the-shelf tools in practice. In par- ticular, to use a context-free parser framework, a user does not need to understand its internals but can specify a language declaratively as a grammar. However, many languages in practice are not context-free. One particularly important class of such languages is layout-sensitive languages, in which the structure of code depends on indentation and whitespace. For example, Python, Haskell, F#, and Markdown use in- dentation instead of curly braces to determine the block structure of code. Their parsers (and lexers) are not declaratively specified but hand-tuned to account for layout-sensitivity. To support declarative specifications of layout-sensitive languages, we propose a parsing framework in which a user can annotate layout in a grammar. Annotations take the form of constraints on the relative posi- tioning of tokens in the parsed subtrees. For example, a user can declare that a block consists of statements that all start on the same column. We have integrated layout constraints into SDF and implemented a layout- sensitive generalized parser as an extension of generalized LR parsing. We evaluate the correctness and performance of our parser by parsing 33 290 open-source Haskell files. Layout-sensitive generalized parsing is easy to use, and its performance overhead compared to layout-insensitive parsing is small enough for practical application. 1 Introduction Most computer languages prescribe a textual syntax.
    [Show full text]
  • Parse Forest Diagnostics with Dr. Ambiguity
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by CWI's Institutional Repository Parse Forest Diagnostics with Dr. Ambiguity H. J. S. Basten and J. J. Vinju Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands {Jurgen.Vinju,Bas.Basten}@cwi.nl Abstract In this paper we propose and evaluate a method for locating causes of ambiguity in context-free grammars by automatic analysis of parse forests. A parse forest is the set of parse trees of an ambiguous sentence. Deducing causes of ambiguity from observing parse forests is hard for grammar engineers because of (a) the size of the parse forests, (b) the complex shape of parse forests, and (c) the diversity of causes of ambiguity. We first analyze the diversity of ambiguities in grammars for programming lan- guages and the diversity of solutions to these ambiguities. Then we introduce DR.AMBIGUITY: a parse forest diagnostics tools that explains the causes of ambiguity by analyzing differences between parse trees and proposes solutions. We demonstrate its effectiveness using a small experiment with a grammar for Java 5. 1 Introduction This work is motivated by the use of parsers generated from general context-free gram- mars (CFGs). General parsing algorithms such as GLR and derivates [35,9,3,6,17], GLL [34,22], and Earley [16,32] support parser generation for highly non-deterministic context-free grammars. The advantages of constructing parsers using such technology are that grammars may be modular and that real programming languages (often requiring parser non-determinism) can be dealt with efficiently1.
    [Show full text]
  • Formal Languages - 3
    Formal Languages - 3 • Ambiguity in PLs – Problems with if-then-else constructs – Harder problems • Chomsky hierarchy for formal languages – Regular and context-free languages – Type 1: Context-sensitive languages – Type 0 languages and Turing machines Formal Languages-3, CS314 Fall 01© BGRyder 1 Dangling Else Ambiguity (Pascal) Start ::= Stmt Stmt ::= Ifstmt | Astmt Ifstmt ::= IF LogExp THEN Stmt | IF LogExp THEN Stmt ELSE Stmt Astmt ::= Id := Digit Digit ::= 0|1|2|3|4|5|6|7|8|9 LogExp::= Id = 0 Id ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z How are compound if statements parsed using this grammar?? Formal Languages-3, CS314 Fall 01© BGRyder 2 1 IF x = 0 THEN IF y = 0 THEN z := 1 ELSE w := 2; Start Parse Tree 1 Stmt Ifstmt IF LogExp THEN Stmt Ifstmt Id = 0 IF LogExp THEN Stmt ELSE Stmt x Id = 0 Astmt Astmt Id := Digit y Id := Digit z 1 w 2 Formal Languages-3, CS314 Fall 01© BGRyder 3 IF x = 0 THEN IF y = 0 THEN z := 1 ELSE w := 2; Start Stmt Parse Tree 2 Ifstmt Q: which tree is correct? IF LogExp THEN Stmt ELSE Stmt Id = 0 Ifstmt Astmt IF LogExp THEN Stmt x Id := Digit Astmt Id = 0 w 2 Id := Digit y z 1 Formal Languages-3, CS314 Fall 01© BGRyder 4 2 How Solve the Dangling Else? • Algol60: use block structure if x = 0 then begin if y = 0 then z := 1 end else w := 2 • Algol68: use statement begin/end markers if x = 0 then if y = 0 then z := 1 fi else w := 2 fi • Pascal: change the if statement grammar to disallow parse tree 2; that is, always associate an else with the closest if Formal Languages-3, CS314 Fall 01© BGRyder 5 New Pascal Grammar Start ::= Stmt Stmt ::= Stmt1 | Stmt2 Stmt1 ::= IF LogExp THEN Stmt1 ELSE Stmt1 | Astmt Stmt2 ::= IF LogExp THEN Stmt | IF LogExp THEN Stmt1 ELSE Stmt2 Astmt ::= Id := Digit Digit ::= 0|1|2|3|4|5|6|7|8|9 LogExp::= Id = 0 Id ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z Note: only if statements with IF..THEN..ELSE are allowed after the THEN clause of an IF-THEN-ELSE statement.
    [Show full text]
  • Ordered Sets in the Calculus of Data Structures
    Exercise 1: Balanced Parentheses Show that the following balanced parentheses grammar is ambiguous (by finding two parse trees for some input sequence) and find unambiguous grammar for the same language. B ::= | ( B ) | B B Remark • The same parse tree can be derived using two different derivations, e.g. B -> (B) -> (BB) -> ((B)B) -> ((B)) -> (()) B -> (B) -> (BB) -> ((B)B) -> (()B) -> (()) this correspond to different orders in which nodes in the tree are expanded • Ambiguity refers to the fact that there are actually multiple parse trees, not just multiple derivations. Towards Solution • (Note that we must preserve precisely the set of strings that can be derived) • This grammar: B ::= | A A ::= ( ) | A A | (A) solves the problem with multiple symbols generating different trees, but it is still ambiguous: string ( ) ( ) ( ) has two different parse trees Solution • Proposed solution: B ::= | B (B) • this is very smart! How to come up with it? • Clearly, rule B::= B B generates any sequence of B's. We can also encode it like this: B ::= C* C ::= (B) • Now we express sequence using recursive rule that does not create ambiguity: B ::= | C B C ::= (B) • but now, look, we "inline" C back into the rules for so we get exactly the rule B ::= | B (B) This grammar is not ambiguous and is the solution. We did not prove this fact (we only tried to find ambiguous trees but did not find any). Exercise 2: Dangling Else The dangling-else problem happens when the conditional statements are parsed using the following grammar. S ::= S ; S S ::= id := E S ::= if E then S S ::= if E then S else S Find an unambiguous grammar that accepts the same conditional statements and matches the else statement with the nearest unmatched if.
    [Show full text]
  • 14 a Simple Compiler - the Front End
    Compilers and Compiler Generators © P.D. Terry, 2000 14 A SIMPLE COMPILER - THE FRONT END At this point it may be of interest to consider the construction of a compiler for a simple programming language, specifically that of section 8.7. In a text of this nature it is impossible to discuss a full-blown compiler, and the value of our treatment may arguably be reduced by the fact that in dealing with toy languages and toy compilers we shall be evading some of the real issues that a compiler writer has to face. However, we hope the reader will find the ensuing discussion of interest, and that it will serve as a useful preparation for the study of much larger compilers. The technique we shall follow is one of slow refinement, supplementing the discussion with numerous asides on the issues that would be raised in compiling larger languages. Clearly, we could opt to develop a completely hand-crafted compiler, or simply to use a tool like Coco/R. We shall discuss both approaches. Even when a compiler is constructed by hand, having an attributed grammar to describe it is very worthwhile. On the source diskette can be found a great deal of code, illustrating different stages of development of our system. Although some of this code is listed in appendices, its volume precludes printing all of it. Some of it has deliberately been written in a way that allows for simple modification when attempting the exercises, and is thus not really of "production quality". For example, in order to allow components such as the symbol table handler and code generator to be used either with hand-crafted or with Coco/R generated systems, some compromises in design have been necessary.
    [Show full text]
  • Disambiguation Filters for Scannerless Generalized LR Parsers
    Disambiguation Filters for Scannerless Generalized LR Parsers Mark G. J. van den Brand1,4, Jeroen Scheerder2, Jurgen J. Vinju1, and Eelco Visser3 1 Centrum voor Wiskunde en Informatica (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands {Mark.van.den.Brand,Jurgen.Vinju}@cwi.nl 2 Department of Philosophy, Utrecht University Heidelberglaan 8, 3584 CS Utrecht, The Netherlands [email protected] 3 Institute of Information and Computing Sciences, Utrecht University P.O. Box 80089, 3508TB Utrecht, The Netherlands [email protected] 4 LORIA-INRIA 615 rue du Jardin Botanique, BP 101, F-54602 Villers-l`es-Nancy Cedex, France Abstract. In this paper we present the fusion of generalized LR pars- ing and scannerless parsing. This combination supports syntax defini- tions in which all aspects (lexical and context-free) of the syntax of a language are defined explicitly in one formalism. Furthermore, there are no restrictions on the class of grammars, thus allowing a natural syntax tree structure. Ambiguities that arise through the use of unrestricted grammars are handled by explicit disambiguation constructs, instead of implicit defaults that are taken by traditional scanner and parser gener- ators. Hence, a syntax definition becomes a full declarative description of a language. Scannerless generalized LR parsing is a viable technique that has been applied in various industrial and academic projects. 1 Introduction Since the introduction of efficient deterministic parsing techniques, parsing is considered a closed topic for research, both by computer scientists and by practi- cioners in compiler construction. Tools based on deterministic parsing algorithms such as LEX & YACC [15,11] (LALR) and JavaCC (recursive descent), are con- sidered adequate for dealing with almost all modern (programming) languages.
    [Show full text]
  • Introduction to Programming Languages and Syntax
    Programming Languages Session 1 – Main Theme Programming Languages Overview & Syntax Dr. Jean-Claude Franchitti New York University Computer Science Department Courant Institute of Mathematical Sciences Adapted from course textbook resources Programming Language Pragmatics (3rd Edition) Michael L. Scott, Copyright © 2009 Elsevier 1 Agenda 11 InstructorInstructor andand CourseCourse IntroductionIntroduction 22 IntroductionIntroduction toto ProgrammingProgramming LanguagesLanguages 33 ProgrammingProgramming LanguageLanguage SyntaxSyntax 44 ConclusionConclusion 2 Who am I? - Profile - ¾ 27 years of experience in the Information Technology Industry, including thirteen years of experience working for leading IT consulting firms such as Computer Sciences Corporation ¾ PhD in Computer Science from University of Colorado at Boulder ¾ Past CEO and CTO ¾ Held senior management and technical leadership roles in many large IT Strategy and Modernization projects for fortune 500 corporations in the insurance, banking, investment banking, pharmaceutical, retail, and information management industries ¾ Contributed to several high-profile ARPA and NSF research projects ¾ Played an active role as a member of the OMG, ODMG, and X3H2 standards committees and as a Professor of Computer Science at Columbia initially and New York University since 1997 ¾ Proven record of delivering business solutions on time and on budget ¾ Original designer and developer of jcrew.com and the suite of products now known as IBM InfoSphere DataStage ¾ Creator of the Enterprise
    [Show full text]
  • CS4200 | Compiler Construction | September 10, 2020 Disambiguation and Layout-Sensitive Syntax
    Disambiguation and Layout-Sensitive Syntax Eelco Visser CS4200 | Compiler Construction | September 10, 2020 Disambiguation and Layout-Sensitive Syntax Syntax Definition Summary Derivations - Generating sentences and trees from context-free grammars Ambiguity Declarative Disambiguation Rules - Associativity and priority Grammar Transformations - Eliminating ambiguity by transformation Layout-Sensitive Syntax - Disambiguation using layout constraints Structure Syntax = Structure module structure let inc = function(x) { x + 1 } imports Common in inc(3) context-free start-symbols Exp end context-free syntax Exp.Var = ID Exp.Int = INT Exp.Add = Exp "+" Exp Let( Exp.Fun = "function" "(" {ID ","}* ")" "{" Exp "}" [ Bnd( "inc" Exp.App = Exp "(" {Exp ","}* ")" , Fun(["x"], Add(Var("x"), Int("1"))) ) Exp.Let = "let" Bnd* "in" Exp "end" ] , App(Var("inc"), [Int("3")]) Bnd.Bnd = ID "=" Exp ) Token = Character module structure let inc = function(x) { x + 1 } imports Common in inc(3) context-free start-symbols Exp end context-free syntax Exp.Var = ID module Common Exp.Int = INT lexical syntax Exp.Add = Exp "+" Exp ID = [a-zA-Z] [a-zA-Z0-9]* Exp.Fun = "function" "(" {ID ","}* ")" "{" Exp "}" INT = [\-]? [0-9]+ Exp.App = Exp "(" {Exp ","}* ")" Exp.Let = "let" Bnd* "in" Exp "end" Lexical Syntax = Context-Free Syntax Bnd.Bnd = ID "=" Exp (But we don’t care about structure of lexical syntax) Literal = Non-Terminal module structure let inc = function(x) { x + 1 } imports Common in inc(3) context-free start-symbols Exp end context-free syntax syntax Exp.Var
    [Show full text]
  • Programming Languages
    Programming Languages CSCI-GA.2110-001 Summer 2011 Dr. Cory Plock What this course is ■ A study of programming language paradigms ◆ Imperitive ◆ Functional ◆ Logical ◆ Object-oriented ■ Tour of programming language history & roots. ■ Introduction to core language design & implementation concepts. ■ Exposure to new languages you may not have used before. ■ Ability to reason about language benefits/pitfalls. ■ A look at programming language implementation. ■ Offers an appreciation of language standards. ■ Provides the ability to more quickly learn new languages. 2 / 26 What this course isn’t ■ A comprehensive study of one or more languages. ■ An exercise in learning as many languages as possible. ■ A software engineering course. ■ A compiler course. 3 / 26 Introduction The main themes of programming language design and use: ■ Paradigm (Model of computation) ■ Expressiveness ◆ control structures ◆ abstraction mechanisms ◆ types and their operations ◆ tools for programming in the large ■ Ease of use: Writeability / Readability / Maintainability 4 / 26 Language as a tool for thought ■ Role of language as a communication vehicle among programmers can be just as important as ease of writing ■ All general-purpose languages are Turing complete (They can compute the same things) ■ But languages can make expression of certain algorithms difficult or easy. ◆ Try multiplying two Roman numerals ■ Idioms in language A may be useful inspiration when writing in language B. 5 / 26 Idioms ■ Copying a string q to p in C: while (*p++ = *q++) ; ■ Removing duplicates
    [Show full text]
  • A Practical Tutorial on Context Free Grammars
    A Practical Tutorial on Context Free Grammars Robert B. Heckendorn University of Idaho September 16, 2020 Contents 1 Some Definitions 3 2 Some Common Idioms and Hierarchical Development 5 2.1 A grammar for a language that allows a list of X's . .5 2.2 A Language that is a List of X's or Y's . .5 2.3 A Language that is a List of X's Terminated with a Semicolon . .5 2.4 Hierarchical Complexification . .6 2.5 A Language Where There are not Terminal Semicolons but Separating Commas . .6 2.6 A language where each X is followed by a terminating semicolon: . .6 2.7 An arg list of X's, Y's and Z's: . .6 2.8 A Simple Type Declaration: . .6 2.9 Augment the Type Grammar with the Keyword Static . .7 2.10 A Tree Structure as Nested Parentheses . .7 3 Associativity and Precedence 7 3.1 A Grammar for an Arithmetic Expression . .7 3.2 Grouping with Parentheses . .8 3.3 Unary operators . .9 3.4 An Example Problem . 10 4 How can Things go Wrong 10 4.1 Inescapable Productions . 10 4.2 Ambiguity . 11 4.2.1 Ambiguity by Ill-specified Associativity . 12 4.2.2 Ambiguity by Infinite Loop . 13 4.2.3 Ambiguity by Multiple Paths . 14 1 5 The Dangling Else 15 6 The Power of Context Free and incomplete grammars 16 7 Tips for Grammar Writing 18 8 Extended BNF 18 2 A grammar is used to specify the syntax of a language. It answers the question: What sentences are in the language and what are not? A sentence is a finite sequence of symbols from the alphabet of the language.
    [Show full text]
  • Context Free Languages
    Context Free Languages COMP2600 — Formal Methods for Software Engineering Katya Lebedeva Australian National University Semester 2, 2016 Slides by Katya Lebedeva and Ranald Clouston. COMP 2600 — Context Free Languages 1 Ambiguity The definition of CF grammars allow for the possibility of having more than one structure for a given sentence. This ambiguity may make the meaning of a sentence unclear. A context-free grammar G is unambiguous iff every string can be derived by at most one parse tree. G is ambiguous iff there exists any word w 2 L(G) derivable by more than one parse trees. COMP 2600 — Context Free Languages 2 Dangling else Take the code if e1 then if e2 then s1 else s2 where e1, e2 are boolean expressions and s1, s2 are subprograms. Does this mean if e1 then( if e2 then s1 else s2) or if e1 then( if e2 then s1) else s2 The dangling else is a problem in computer programming in which an optional else clause in an “ifthen(else)” statement results in nested conditionals being ambiguous. COMP 2600 — Context Free Languages 3 This is a problem that often comes up in compiler construction, especially parsing. COMP 2600 — Context Free Languages 4 Inherently Ambiguous Languages Not all context-free languages can be given unambiguous grammars – some are inherently ambiguous. Consider the language L = faib jck j i = j or j = kg How do we know that this is context-free? First, notice that L = faibickg [ faib jc jg We then combine CFGs for each side of this union (a standard trick): S ! T j W T ! UV W ! XY U ! aUb j e X ! aX j e V ! cV j e Y ! bYc j e COMP 2600 — Context Free Languages 5 The problem with L is that its sub-languages faibickg and faib jc jg have a non-empty intersection.
    [Show full text]
  • 165142A8ddd2a8637ad23690e
    CS 345 Lexical and Syntactic Analysis Vitaly Shmatikov slide 1 Reading Assignment Mitchell, Chapters 4.1 C Reference Manual, Chapters 2 and 7 slide 2 Syntax Syntax of a programming language is a precise description of all grammatically correct programs • Precise formal syntax was first used in ALGOL 60 Lexical syntax • Basic symbols (names, values, operators, etc.) Concrete syntax • Rules for writing expressions, statements, programs Abstract syntax • Internal representation of expressions and statements, capturing their “meaning” (i.e., semantics) slide 3 Grammars A meta-language is a language used to define other languages A grammar is a meta-language used to define the syntax of a language. It consists of: • Finite set of terminal symbols • Finite set of non-terminal symbols Backus-Naur • Finite set of production rules Form (BNF) • Start symbol • Language = (possibly infinite) set of all sequences of symbols that can be derived by applying production rules starting from the start symbol slide 4 Example: Decimal Numbers Grammar for unsigned decimal integers • Terminal symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 • Non-terminal symbols: Digit, Integer Shorthand for • Production rules: Integer → Digit – Integer → Digit | Integer Digit Integer → Integer Digit – Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 • Start symbol: Integer Can derive any unsigned integer using this grammar • Language = set of all unsigned decimal integers slide 5 Derivation of 352 as an Integer Production rules: Integer → Digit | Integer Digit Integer ⇒ Integer Digit Digit
    [Show full text]