
Specifying Syntax

Language Specification

Syntax
   Form of phrases
   Physical arrangement of symbols

Semantics
   Meaning of phrases
   Result of executing a program

Pragmatics
   Usage of the language
   Features of implementations

Components of a Grammar

1. Terminal symbols or terminals, Σ
2. Nonterminal symbols or syntactic categories, N
   Vocabulary = Σ ∪ N
3. Productions or rules, α ::= β, where α, β ∈ (Σ ∪ N)*
4. Start symbol (a nonterminal)
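The four components of a grammar can be written down directly as a data structure. The following is only a minimal sketch: the class name `Grammar`, its field names, and the toy grammar are illustrative assumptions, and the check that Σ and N are disjoint is a standard convention rather than something stated on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # Σ
    nonterminals: frozenset   # N
    productions: tuple        # pairs (alpha, beta); each side is a tuple of symbols
    start: str                # the start symbol

    def vocabulary(self):
        """Vocabulary = Σ ∪ N."""
        return self.terminals | self.nonterminals

    def is_well_formed(self):
        """Check the conditions listed for the four components."""
        vocab = self.vocabulary()
        # Assumption: terminals and nonterminals do not overlap.
        disjoint = not (self.terminals & self.nonterminals)
        # The start symbol must be a nonterminal.
        start_ok = self.start in self.nonterminals
        # Both sides of every rule are strings over the vocabulary.
        rules_ok = all(
            symbol in vocab
            for alpha, beta in self.productions
            for symbol in alpha + beta
        )
        return disjoint and start_ok and rules_ok


# Hypothetical toy grammar:  S ::= a S | b
toy = Grammar(
    terminals=frozenset({"a", "b"}),
    nonterminals=frozenset({"S"}),
    productions=((("S",), ("a", "S")), (("S",), ("b",))),
    start="S",
)
print(toy.is_well_formed())
```

Storing each side of a production as a tuple of symbols keeps multi-character symbols unambiguous, which matters once nonterminals have names longer than one letter.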

Chapter 1 1 Chapter 1 2

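Types of Grammars

Type 0: Unrestricted grammars
   α ::= β   where α contains a nonterminal

Type 1: Context-sensitive grammars
   αBγ ::= αβγ

Type 2: Context-free grammars
   A ::= string of vocabulary symbols

Type 3: Regular grammars
   A ::= a   or   A ::= aB

An English Grammar

<sentence> ::= <noun phrase> <verb phrase> .
<noun phrase> ::= <determiner> <noun>
                | <determiner> <noun> <prepositional phrase>
<verb phrase> ::= <verb> | <verb> <noun phrase>
                | <verb> <noun phrase> <prepositional phrase>
<prepositional phrase> ::= <preposition> <noun phrase>
<noun> ::= boy | girl | cat | telescope | song | feather
<determiner> ::= a | the
<verb> ::= saw | touched | surprised | sang
<preposition> ::= by | with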
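A derivation with this grammar can be traced mechanically by always replacing the leftmost nonterminal. The sketch below assumes the nonterminal names shown in angle brackets (they are reconstructed, so treat them as assumptions), and the helper `derive` and the list `CHOICES` are hypothetical names introduced only for illustration.

```python
# Each nonterminal maps to its list of alternative right-hand sides.
GRAMMAR = {
    "<sentence>": [["<noun phrase>", "<verb phrase>", "."]],
    "<noun phrase>": [
        ["<determiner>", "<noun>"],
        ["<determiner>", "<noun>", "<prepositional phrase>"],
    ],
    "<verb phrase>": [
        ["<verb>"],
        ["<verb>", "<noun phrase>"],
        ["<verb>", "<noun phrase>", "<prepositional phrase>"],
    ],
    "<prepositional phrase>": [["<preposition>", "<noun phrase>"]],
    "<noun>": [["boy"], ["girl"], ["cat"], ["telescope"], ["song"], ["feather"]],
    "<determiner>": [["a"], ["the"]],
    "<verb>": [["saw"], ["touched"], ["surprised"], ["sang"]],
    "<preposition>": [["by"], ["with"]],
}


def derive(choices, start="<sentence>"):
    """Leftmost derivation: each entry of `choices` picks the alternative
    used to replace the leftmost nonterminal of the current sentential form."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost nonterminal in the sentential form.
        i = next(k for k, symbol in enumerate(form) if symbol in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


# Alternative indices that yield "the girl touched the cat with a feather ."
CHOICES = [0, 0, 1, 1, 2, 1, 0, 1, 2, 0, 1, 0, 0, 5]

for step in derive(CHOICES):
    print("=>", step)
```

Each printed line is one sentential form; the last one contains only terminals, so the derivation is complete.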

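A Derivation (for Figure 1.4)

<sentence> ⇒ <noun phrase> <verb phrase> .
   ⇒ <determiner> <noun> <verb phrase> .
   ⇒ the <noun> <verb phrase> .
   ⇒ the girl <verb phrase> .
   ⇒ the girl <verb> <noun phrase> <prepositional phrase> .
   ⇒ the girl touched <noun phrase> <prepositional phrase> .
   ⇒ the girl touched <determiner> <noun> <prepositional phrase> .
   ⇒ the girl touched the <noun> <prepositional phrase> .
   ⇒ the girl touched the cat <prepositional phrase> .
   ⇒ the girl touched the cat <preposition> <noun phrase> .
   ⇒ the girl touched the cat with <noun phrase> .
   ⇒ the girl touched the cat with <determiner> <noun> .
   ⇒ the girl touched the cat with a <noun> .
   ⇒ the girl touched the cat with a feather .

Note example of a context-sensitive grammar.

Concrete Syntax for Wren

Phrase-structure syntax:

<program> ::= program <identifier> is <block>
<block> ::= <declaration seq> begin <command seq> end
<declaration seq> ::= ε | <declaration> <declaration seq>
<declaration> ::= var <variable list> : <type> ;
<type> ::= integer | boolean
<variable list> ::= <variable> | <variable> , <variable list>
<command seq> ::= <command> | <command> ; <command seq>
<command> ::= <variable> := <expr> | skip
            | read <variable> | write <integer expr>
            | while <boolean expr> do <command seq> end while
            | if <boolean expr> then <command seq> end if
            | if <boolean expr> then <command seq> else <command seq> end if
<expr> ::= <integer expr> | <boolean expr>
<integer expr> ::= <term> | <integer expr> <weak op> <term>
<term> ::= <element> | <term> <strong op> <element>
<element> ::= <numeral> | <variable> | ( <integer expr> ) | - <element>
<boolean expr> ::= <boolean term> | <boolean expr> or <boolean term>
<boolean term> ::= <boolean element> | <boolean term> and <boolean element>
<boolean element> ::= true | false | <variable> | <comparison>
                    | not ( <boolean expr> ) | ( <boolean expr> )
<comparison> ::= <integer expr> <relation> <integer expr>
<variable> ::= <identifier>

Lexical syntax:

<relation> ::= <= | < | = | > | >= | <>
<weak op> ::= + | -
<strong op> ::= * | /
<identifier> ::= <letter> | <identifier> <letter> | <identifier> <digit>
<letter> ::= a | b | c | d | e | f | g | h | i | j | k | l | m
           | n | o | p | q | r | s | t | u | v | w | x | y | z
<numeral> ::= <digit> | <digit> <numeral>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Tokens:

<token> ::= <identifier> | <numeral> | <reserved word> | <relation>
          | <weak op> | <strong op> | := | ( | ) | , | ; | :
<reserved word> ::= program | is | begin | end | var | integer | boolean
                  | read | write | skip | while | do | if | then | else
                  | and | or | true | false | not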
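The lexical syntax and token list above are regular, so a scanner for them can be sketched with regular expressions. This is an illustrative sketch, not the course's scanner: the function name `scan`, the token kind labels, and the decision to skip whitespace silently are all assumptions.

```python
import re

RESERVED = {
    "program", "is", "begin", "end", "var", "integer", "boolean",
    "read", "write", "skip", "while", "do", "if", "then", "else",
    "and", "or", "true", "false", "not",
}

# Two-character symbols are listed first so that "<=" is not split into "<" and "=".
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<numeral>[0-9]+)
  | (?P<word>[a-z][a-z0-9]*)
  | (?P<symbol><=|>=|<>|:=|[<>=+\-*/(),;:])
""", re.VERBOSE)


def scan(text):
    """Turn a string of characters into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        pos = match.end()
        kind, lexeme = match.lastgroup, match.group()
        if kind == "ws":
            continue  # assumption: whitespace only separates tokens
        if kind == "word":
            # A word is a reserved word if it is in the list, otherwise an identifier.
            kind = "reserved" if lexeme in RESERVED else "identifier"
        tokens.append((kind, lexeme))
    return tokens


print(scan("if m=-9 then m:=0 end if"))
```

Reserved words are recognized after matching a general word, which avoids having to order twenty keyword patterns ahead of the identifier pattern.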

Sample Wren Program

program prime is
   var num,divisor : integer;
   var done : boolean;
begin
   read num;
   while num>0 do
      divisor := 2; done := false;
      while divisor<= num/2 and not(done) do
         done := num = divisor*(num/divisor);
         divisor := divisor+1
      end while;
      if done then write 0
              else write num
      end if;
      read num
   end while
end

Ambiguity

One phrase has two different derivation trees, thus two different syntactic structures.

Ambiguous structure may lead to ambiguous semantics.
   Dangling “else”
   Associativity of expressions ( 2, page 16)
   Ada: Is “a(2)” a subscripted variable or a function call?

Classifying Errors in Wren

Context-free syntax (specified by concrete syntax)
   • lexical syntax
   • phrase-structure syntax

Context-sensitive syntax (context constraints)
   • also called “static semantics”

Semantic or dynamic errors


Context Constraints in Wren

1. The program name identifier may not be declared elsewhere in the program.
2. All identifiers that appear in a block must be declared in that block.
3. No identifier may be declared more than once in a block.
4. The identifier on the left side of an assignment must be declared as a variable, and the expression on the right must be of the same type.
5. An identifier occurring as an (integer) element must be an integer variable.
6. An identifier occurring as a Boolean element must be a Boolean variable.
7. An identifier occurring in a read command must be an integer variable.

Syntax of Wren

<token>* : arbitrary strings of tokens
<program> : strings derived from the context-free grammar
Syntactically well-formed Wren programs : derived from <program> and also satisfying the context constraints

Semantic Errors in Wren

1. An attempt is made to divide by zero.
2. A variable that has not been initialized is accessed.
3. A read command is executed when the input is empty.
4. An iteration command (while) does not terminate.

Regular Expressions Denote Languages

∅
ε
a          for each a ∈ Σ
(E | F)    union
(E • F)    concatenation
(E*)       Kleene closure

May omit some parentheses following the precedence rules.

Abbreviations
   E₁ E₂ = E₁ • E₂
   E+ = E • E*
   Eⁿ = E • E • … • E, repeated n times
   E? = ε | E

Note: E* = ε | E¹ | E² | E³ | …

Nonessential Ambiguity

Command Sequences:
   <command seq> ::= <command> | <command> ; <command seq>
   <command seq> ::= <command> | <command seq> ; <command>
   <command seq> ::= <command> ( ; <command> )*

Or include command sequences with commands:
   <command> ::= <command> ; <command> | skip | read <variable> | …

Language Processing:
   scan : Character* → Token*
   parse : Token* → ????
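The regular-expression abbreviations listed above can be checked on small cases with Python's `re` module, whose `+`, `?`, and `{n}` operators correspond to E+, E?, and Eⁿ. The helper `language` below is a hypothetical name: it enumerates every string over a small alphabet up to a length bound and keeps those the pattern matches in full, so it only compares finite slices of the languages.

```python
import re
from itertools import product


def language(pattern, alphabet="ab", max_len=4):
    """All strings over `alphabet` of length <= max_len that `pattern` matches entirely."""
    rx = re.compile(pattern)
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if rx.fullmatch("".join(chars))
    }


# E+ = E • E*
print(language("(ab)+") == language("(ab)(ab)*"))
# E? = ε | E   (an empty alternative stands for ε)
print(language("a?") == language("(|a)"))
# E^3 = E • E • E
print(language("a{3}") == language("aaa"))
```

Agreement up to a length bound is evidence, not proof, that two expressions denote the same language; it is still a quick way to catch a mistaken identity.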

Concrete Syntax (BNF)

• Specifies the physical form of programs.
• Guides the parser in determining the phrase structure of a program.
• Should not be ambiguous.
• A derivation proceeds by replacing one nonterminal at a time by its corresponding right-hand side in some production for that nonterminal.
• A derivation following a BNF definition of a language can be summarized in a derivation (parse) tree.
• Although a parser may not produce a derivation tree, the structure of the tree is embodied in the parsing process.

Derivation tree for:
   read m; if m=-9 then m:=0 end if; write 22*m

[Figure: derivation tree for this command sequence, with leaves var(m), num(9), num(0), and num(22)]

Abstract Syntax

• Representation of the structure of language constructs as determined by the parser.
• The hierarchical structure of language constructs can be represented as abstract syntax trees.
• The goal of abstract syntax is to reduce language constructs to their “essential” properties.
• Specifying the abstract syntax of a language means providing the possible templates for the various constructs of the language.
• Abstract syntax generally allows ambiguity since parsing has already been done.
• Semantic specifications based on abstract syntax are much simpler.

Abstract Syntax Tree for:
   read m; if m=-9 then m:=0 end if; write 22*m

[Figure: two alternative abstract syntax trees for this command sequence, joined by "OR", built from the nodes commands, read, if, write, assign, expr, times, equal, minus, var(m), num(22), num(9), and num(0)]

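Designing Abstract Syntax

Consider concrete syntax for expressions:

<expr> ::= <integer expr> | <boolean expr>
<integer expr> ::= <term> | <integer expr> <weak op> <term>
<term> ::= <element> | <term> <strong op> <element>
<element> ::= <numeral> | <variable> | ( <integer expr> ) | - <element>
<boolean expr> ::= <boolean term> | <boolean expr> or <boolean term>
<boolean term> ::= <boolean element> | <boolean term> and <boolean element>
<boolean element> ::= true | false | <variable> | <comparison>
                    | ( <boolean expr> ) | not ( <boolean expr> )
<comparison> ::= <integer expr> <relation> <integer expr>

Assume we only have to deal with programs that are syntactically correct.

What basic forms can an expression take?

   <integer expr> <weak op> <term>
   <term> <strong op> <element>
   <numeral>
   <variable>
   - <element>
   <boolean expr> or <boolean term>
   <boolean term> and <boolean element>
   true
   false
   not ( <boolean expr> )
   <integer expr> <relation> <integer expr>

Abstract Syntax:

Expression ::= Numeral | Identifier | true | false
             | Expression Operator Expression
             | - Expression | not(Expression)
Operator ::= + | - | * | / | or | and | <= | < | = | > | >= | <>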

Abstract Syntax

Abstract Syntactic Categories
   Program       Type         Operator
   Block         Command      Numeral
   Declaration   Expression   Identifier

Abstract Production Rules
   Program ::= program Identifier is Block
   Block ::= Declaration* begin Command+ end
   Declaration ::= var Identifier+ : Type ;
   Type ::= integer | boolean
   Command ::= Identifier := Expression | skip
             | read Identifier | write Expression
             | while Expression do Command+
             | if Expression then Command+
             | if Expression then Command+ else Command+
   Expression ::= Numeral | Identifier | true | false
                | Expression Operator Expression
                | - Expression | not( Expression)
   Operator ::= + | - | * | / | or | and | <= | < | = | > | >= | <>

Alternative Abstract Syntax

Abstract Production Rules
   Program ::= prog (Identifier, Block)
   Block ::= block (Declaration*, Command+)
   Declaration ::= dec (Identifier+, Type)
   Type ::= integer | boolean
   Command ::= assign (Identifier, Expression)
             | skip | read (Identifier) | write (Expression)
             | while (Expression, Command+)
             | if (Expression, Command+)
             | ifelse (Expression, Command+, Command+)
   Expression ::= Numeral | Identifier | true | false
                | not (Expression) | minus (Expression)
                | exp (Operator, Expression, Expression)
   Operator ::= + | - | * | / | or | and | <= | < | = | > | >= | <>

Example read m; if m=-9 then m:=0 end if; write 22*m

[ read (m), if (exp (equal, ide(m),minus (num(9))), [ assign (m, num(0)) ]), write (exp (times, num(22), ide(m))) ]
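The bracketed term above can be built as ordinary data. The sketch below represents each abstract syntax node as a tagged tuple and a command sequence as a list; the constructor functions mirror the tags used in the example (`read`, `if`, `assign`, `write`, `exp`, `minus`, `num`, `ide`), while the tuple encoding, the name `if_` (since `if` is reserved in Python), and the printer `show` are assumptions made for illustration.

```python
# Constructors: one small function per abstract syntax template.
def num(n): return ("num", n)
def ide(name): return ("ide", name)
def minus(e): return ("minus", e)
def exp(op, left, right): return ("exp", op, left, right)
def assign(name, e): return ("assign", name, e)
def read(name): return ("read", name)
def write(e): return ("write", e)
def if_(test, then_cmds): return ("if", test, then_cmds)


# read m; if m=-9 then m:=0 end if; write 22*m
program = [
    read("m"),
    if_(exp("equal", ide("m"), minus(num(9))), [assign("m", num(0))]),
    write(exp("times", num(22), ide("m"))),
]


def show(node):
    """Render a tree in a bracketed notation similar to the example."""
    if isinstance(node, list):       # a command sequence
        return "[ " + ", ".join(show(n) for n in node) + " ]"
    if isinstance(node, tuple):      # a tagged node with its children
        tag, *args = node
        return f"{tag} (" + ", ".join(show(a) for a in args) + ")"
    return str(node)                 # an identifier name, operator name, or number


print(show(program))
```

Because the tree no longer carries parentheses, keywords, or separators, later processing can dispatch on the tag alone, which is the sense in which abstract syntax keeps only the essential structure.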

Language Processing

scan : Character* → Token*
parse : Token* → Abstract Syntax Tree

Is either of these functions one-to-one? Explain.