BNF • Syntax Diagrams

BNF • Syntax Diagrams

Some Preliminaries • For the next several weeks we’ll look at how one can define a programming language • What is a language, anyway? “Language is a system of gestures, grammar, signs, 3 sounds, symbols, or words, which is used to represent and communicate concepts, ideas, meanings, and thoughts” • Human language is a way to communicate representaons from one (human) mind to another Syntax • What about a programming language? A way to communicate representaons (e.g., of data or a procedure) between human minds and/or machines CMSC 331, Some material © 1998 by Addison Wesley Longman, Inc. IntroducCon Why and How We usually break down the problem of defining Why? We want specificaons for several a programming language into two parts communies: • Other language designers • defining the PL’s syntax • Implementers • defining the PL’s semanBcs • Machines? Syntax - the form or structure of the • Programmers (the users of the language) expressions, statements, and program units How? One way is via natural language descrip- Seman-cs - the meaning of the expressions, Bons (e.g., user manuals, text books) but there statements, and program units are more formal techniques for specifying the There’s not always a clear boundary between syntax and semanBcs the two Syntax part Syntax Overview • Language preliminaries • Context-free grammars and BNF • Syntax diagrams This is an overview of the standard process of turning a text file into an executable program. IntroducCon Lexical Structure of Programming Languages A sentence is a string of characters over some alphabet (e.g., def add1(n): return n + 1) • The structure of its lexemes (words or tokens) – token is a category of lexeme A language is a set of sentences • The scanning phase (lexical analyser) collects characters into tokens A lexeme is the lowest level syntacBc unit of a • Parsing phase (syntacBc analyser) determines language (e.g., *, add1, begin) syntacBc structure A token is a category of lexemes (e.g., idenfier) Stream of tokens and Result of characters values parsing Formal approaches to describing syntax: • Recognizers - used in compilers lexical Syntactic • Generators - what we'll study analyser analyser Formal Grammar Role of Grammars in PLs • A (formal) grammar is a set of rules for • A grammar defines a (programming) strings in a formal language language, at least w.r.t. its syntax • The rules describe how to form strings • It can be used to from the language’s alphabet that are valid – Generate sentences in the language according to the language's syntax – Test whether a string is in the language • A grammar does not describe the meaning – Produce a structured representation of of the strings or what can be done with the string that can be used in them in whatever context — only their form subsequent processing Adapted from Wikipedia Grammars Context-Free Grammars • Developed by Noam Chomsky in the mid-1950s • Language generators, meant to describe the syntax of natural languages • Define a class of languages called context-free languages Backus Normal/Naur Form (1959) • Invented by John Backus to describe Algol 58 and refined by Peter Naur for Algol 60. • BNF is equivalent to context-free grammars •Chomsky & Backus independently came up with equiv- alent formalisms for specifying the syntax of a language •Backus focused on a pracBcal way of specifying an BNF (conCnued) arBficial language, like Algol •Chomsky made fundamental contribuBons to mathe- macal linguisBcs and was moBvated by the study of A metalanguage is a language used to describe human languages another language. In BNF, abstracons are used to represent classes of syntacBc structures -- they act like NOAM CHOMSKY, MIT InsBtute Professor; syntacBc variables (also called nonterminal Professor of LinguisBcs, LinguisBc Theory, Syntax, symbols), e.g. SemanBcs, Philosophy of Language <while_stmt> ::= while <logic_expr> do <stmt> This is a rule; it describes the structure of a while Six parBcipants from the 1960 Algol conference at the 1974 statement ACM conference on the history of programming languages. Top: John McCarthy, Fritz Bauer, Joe Wegstein. Bocom: John Backus, Peter Naur, Alan Perlis. BNF Non-terminals, pre-terminals & terminals • A rule has a leh-hand side (LHS) which is a single • A non-terminal symbol is any symbol in the RHS of a non-terminal symbol and a right-hand side (RHS), rule. They represent abstracBons in the language, e.g. one or more terminal or non-terminal symbols <if-then-else-statement> ::= if <test> • A grammar is a finite, nonempty set of rules then <statement> else <statement> • A non-terminal symbol is “defined” by its rules • A terminal symbol (AKA lexemes) is a symbol that’s not • MulBple rules can be combined with the verBcal-bar in any rule’s LHS. They are literal symbols that will ( | ) symbol (read as “or”) appear in a program (e.g., if, then, else in rule above) • These two rules: <if-then-else-statement> ::= if <test> <stmts> ::= <stmt> then <statement> else <statement> <stmts> ::= <stmnt> ; <stmnts> • A pre-terminal symbol is one that appears as a LHS of are equivalent to this one: rules, but in every case, the RHSs consist of single <stmts> ::= <stmt> | <stmnt> ; <stmnts> terminal symbol, e.g., <digit> in <digit> ::= 0 | 1 | 2 | 3 … 7 | 8 | 9 BNF BNF Example • RepeBBon is done with recursion A simple grammar for a subset of English • E.g., SyntacBc lists are described in BNF A sentence is noun phrase and verb phrase using recursion followed by a period ! • An <ident_list> is a sequence of one or more <sentence> ::= <nounPhrase> <verbPhrase> .! <nounPhrase> ::= <article> <noun>! <ident>s separated by commas. <article> ::= a | the! <noun> ::= man | apple | worm | penguin! <verbPhrase> ::= <verb>|<verb><nounPhrase>! <ident_list> ::= <ident> | <verb> ::= eats | throws | sees | is! <ident> , <ident_list> Derivaons Derivaon using BNF • A derivation is a repeated application of rules, beginning with the start symbol and ending with a sequence of just Here is a derivation for “the man eats the apple.” terminal symbols • It demonstrates, or proves, that the derived sentence is <sentence> -> <nounPhrase> <verbPhrase> . “generated” by the grammar and is thus in the language <article> <noun> <verbPhrase> . that the grammar defines the <noun> <verbPhrase> . • As an example, consider our baby English grammar the man <verbPhrase> . <sentence> ::= <nounPhrase><verbPhrase>.! the man <verb> <nounPhrase> . <nounPhrase> ::= <article><noun>! This is a <article> ::= a | the! leEmost the man eats <nounPhrase> . <noun> ::= man | apple | worm | penguin! derivaon the man eats <article> < noun> . <verbPhrase> ::= <verb> | <verb><nounPhrase>! <verb> ::= eats | throws | sees | is the man eats the <noun> . the man eats the apple . Derivaon Another BNF Example <program> -> <stmts> Note: There is some Every string of symbols in the derivaon is a varia-on in nota-on <stmts> -> <stmt> for BNF grammars. senten-al form (e.g., the man eats <nounPhrase> .) | <stmt> ; <stmts> Here we are using -> in the rules instead <stmt> -> <var> = <expr> of ::= . A sentence is a sentenBal form that has only <var> -> a | b | c | d terminal symbols (e.g., the man eats the apple .) <expr> -> <term> + <term> | <term> - <term> <term> -> <var> | const A leEmost deriva-on is one in which the leh- most nonterminal in each sentenBal form is the one that is expanded in the next step A derivaon may be either lehmost or rightmost or something else Another BNF Example Finite and Infinite languages <program> -> <stmts> Note: There is some varia-on in nota-on <stmts> -> <stmt> for BNF grammars. • A simple language may have a finite number | <stmt> ; <stmts> Here we are using -> in the rules instead of sentences <stmt> -> <var> = <expr> of ::= . <var> -> a | b | c | d • The set of strings represenBng integers <expr> -> <term> + <term> | <term> - <term> between -10**6 and +10**6 is a finite <term> -> <var> | const language Here is a derivaon: • A finite language can be defined by enumer- <program> => <stmts> => <stmt> ang the sentences, but using a grammar is => <var> = <expr> usually easier (e.g., small integers) => a = <expr> => a = <term> + <term> • Most interesBng languages have an infinite => a = <var> + <term> number of sentences => a = b + <term> => a = b + const! Is English a finite or infinite language? Rules with epsilons • Assume we have a finite set of words • An epsilon (ε) symbol expands to ‘nothing’ • Consider adding rules like the following to the • It is used as a compact way to make previous example something optional <sentence> ::= <sentence><conj><sentence>.! • For example: <conj> ::= and | or | because NP -> DET NP PP • Hint: When you see recursion in a BNF, the PP -> PREP NP | ε language is probably infinite PPEP -> in | on | of | under –When might it not be? • You can always rewrite a grammer with ε –The recursive rule might not be reachable. symbols to an equivalent one without There might be epsilons. Parse Tree Another Parse Tree A parse tree is a hierarchical representation of a derivation <program>! ! ! <sentence> <stmts>! ! <nounPhrase> <verbPhrase> ! <stmt>! ! <article> <noun> <verb> <nounPhrase> ! <var> = <expr>! ! the man eats ! <article> <noun> a <term> + <term>! ! the apple ! <var> const! ! ! b ! Ambiguous Grammar Ambiguous English Sentences • A grammar is ambiguous if and only if • (iff) it generates a sentenBal form that I saw the man on the hill with a has two or more disBnct parse trees telescope • Ambiguous grammars are, in general, • Time flies like an arrow very undesirable in formal languages • Fruit flies like a banana

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    20 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us