Specifying Syntax Components of a Grammar 1. Terminal symbols or terminals, Σ Language Specification 2. Nonterminal symbols or syntactic Syntax categories, N Form of phrases Vocabulary = Σ ∪ N Physical arrangement of symbols
Semantics 3. Productions or rules, α ::= β, Meaning of phrases where α,β∈ (Σ ∪ N)* Result of executing a program 4 .Start Symbol (a nonterminal) Pragmatics Usage of the language Features of implementations
Chapter 1 1 Chapter 1 2
Types of Grammars An English Grammar
Type 0: Unrestricted grammars
Chapter 1 3 Chapter 1 4 A Derivation (for Figure 1.4) Concrete Syntax for Wren
Chapter 1 5 Chapter 1 6
Lexical syntax:
Chapter 1 7 Chapter 1 8 Sample Wren Program Ambiguity program prime is var num,divisor : integer; One phrase has two different derivation trees, thus two different syntactic structures. var done : boolean; begin Ambiguous structure may lead to ambiguous read num; semantics. while num>0 do Dangling “else” divisor := 2; done := false; Associativity of expressions (ex 2, page 16) while divisor<= num/2 and not(done) do Ada: done := num = divisor*(num/divisor); Is “a(2)” a subscripted variable divisor := divisor+1 or a function call? end while; if done then write 0 Classifying Errors in Wren else write num Context-free syntax end if; — specified by concrete syntax read num ¥ lexical syntax end while ¥ phrase-structure syntax end Context-sensitive syntax (context constraints) ¥ also called “static semantics”
Semantic or dynamic errors
Chapter 1 9 Chapter 1 10
Context Constraints in Wren Syntax of Wren 1. The program name identifier may not be declared elsewhere in the program.
2. All identifiers that appear in a block must be
3. No identifier may be declared more than once Syntactically well formed in a block. Wren programs derived from
5. An identifier occurring as an (integer) Semantic Errors in Wren element must be an integer variable. 1. An attempt is made to divide by zero. 6. An identifier occurring as a Boolean element 2. A variable that has not been initialized is must be a Boolean variable. accessed. 3. A read command is executed when input file 7. An identifier occurring in a read command empty. must be an integer variable. 4. An iteration command (while) does not terminate.
Chapter 1 11 Chapter 1 12 Regular Expressions Denote Languages Nonessential Ambiguity ∅ Command Sequences: ε
May omit some parentheses following the
E1 E2 = E1 ¥ E2
Chapter 1 13 Chapter 1 14
Concrete Syntax (BNF) read m; if m=-9 then m:=0 end if; write 22*m Derivation Tree ¥ Specifies the physical form of programs.
read
num(9)
Chapter 1 15 Chapter 1 16 Abstract Syntax read m; if m=-9 then m:=0 end if; write 22*m ¥ Representation of the structure of language constructs as determined by the parser. Abstract Syntax Tree:
commands ¥ The hierarchical structure of language constructs can be represented as abstract syntax trees. read if write
var(m) times equal ¥ The goal of abstract syntax is to reduce commands num(22) var(m) language constructs to their “essential” assign var(m) minus properties. num(9) var(m) num(0) ¥ Specifying the abstract syntax of a language means providing the possible templates for the OR various constructs of the language. if write
expr ¥ Abstract syntax generally allows ambiguity expr since parsing has already been done. timesvar(m) num(22) equal var(m) minus
¥ Semantic specifications based on abstract num(9) syntax are much simpler.
Chapter 1 17 Chapter 1 18
Assume we only have to deal with programs that Designing Abstract Syntax are syntactically correct. What basic forms can an expression take? Consider concrete syntax for expressions:
Chapter 1 19 Chapter 1 20 Abstract Syntax Abstract Syntactic Categories Alternative Abstract Syntax Program Type Operator Block Command Numeral Abstract Production Rules Declaration Expression Identifier Program ::= prog (Identifier, Block) Abstract Production Rules Block ::= block (Declaration*, Command+) Program ::= program Identifier is Block Declaration ::= dec (Identifier+, Type) Block ::= Declaration* begin Command+ end Type ::= integer | boolean Declaration ::= var Identifier+ : Type ; Command ::= assign (Identifier, Expression) Type ::= integer | boolean | skip | read (Identifier) | write (Expression) Command ::= Identifier := Expression | skip | while (Expression, Command+) | read Identifier | write Expression | if (Expression, Command+) + | while Expression do Command | ifelse (Expression, Command+, Command+) | if Expression then Command+ | if Expression then Command+ else Command+ Expression ::= Numeral | Identifier | true | false | not (Expression) | minus (Expression) Expression ::= Numeral | Identifier | true | false | exp (Operator, Expression, Expression) | Expression Operator Expression | Ð Expression | not( Expression) Operator ::= + | Ð | * | / | or | and | <= | < | = | > | >= | <> Operator ::= + | Ð | * | / | or | and | <= | < | = | > | >= | <>
Chapter 1 21 Chapter 1 22
Example read m; if m=-9 then m:=0 end if; write 22*m
[ read (m), if (exp (equal, ide(m),minus (num(9))), [ assign (m, num(0)) ]), write (exp (times, num(22), ide(m))) ]
Language Processing scan : Character* → Token* parse : Token* → Abstract Syntax Tree Are either of these functions one-to-one? Explain.
Chapter 1 23