
Introduction

What is YACC?
• Yacc was originally written by Stephen C. Johnson in 1975.
• YACC (Yet Another Compiler-Compiler) is a program designed to compile an LALR(1) grammar. It produces the source code of the syntactic analyzer for the language described by that grammar.
  – In short, given a grammar, the tool produces a parser for it.
• Variants:
  – yacc (AT&T)
  – bison: a yacc replacement (GNU)
  – flex: fast lexical analyzer (GNU), a replacement for lex
  – BSD yacc
  – PCLEX, PCYACC (Abraxas Software)

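How YACC Works
(1) Parse: yacc reads the YACC source (foo.y) and writes the parser to y.tab.c. With -d it also writes y.tab.h (the token definitions), and with -v it writes y.output (a description of the parser states).
(2) Compile: cc / gcc compiles y.tab.c into a.out.
(3) Run: a.out takes a token stream and performs the syntax analysis.

A YACC Example (foo.y)

    %{
    #include <stdio.h>
    int yylex(void);          /* supplied by the lexer, e.g. lex */
    int yyerror(char *s);
    %}
    %token NAME NUMBER
    %%
    statement:  NAME '=' expression
             |  expression            { printf("= %d\n", $1); }
             ;
    expression: expression '+' NUMBER { $$ = $1 + $3; }
             |  expression '-' NUMBER { $$ = $1 - $3; }
             |  NUMBER                { $$ = $1; }
             ;
    %%
    int yyerror(char *s)
    {
        fprintf(stderr, "%s\n", s);
        return 0;
    }

    int main(void)
    {
        yyparse();
        return 0;
    }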

YACC File Format

    %{
       C declarations
    %}
       yacc declarations
    %%
       Grammar rules
    %%
       Additional C code

– Comments in /* ... */ may appear in any of the sections.

Definitions Section

    %{
    #include <stdio.h>
    #include <stdlib.h>
    %}
    %token ID NUM      /* ID and NUM are terminals */
    %start expr        /* start from expr */

Start Symbol
• By default, the start symbol is the first non-terminal specified in the grammar.
• To override it, use the %start declaration:
    %start non-terminal

Rules Section
• This is the grammar specification section.
• Example:
    expr   : expr '+' term | term ;
    term   : term '*' factor | factor ;
    factor : '(' expr ')' | ID | NUM ;

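Rules Section
• Rules are normally written with one alternative per line:
    expr   : expr '+' term
           | term
           ;
    term   : term '*' factor
           | factor
           ;
    factor : '(' expr ')'
           | ID
           | NUM
           ;

The Position of Rules
• An action in braces follows the alternative it belongs to. Inside an action, $$ is the value of the left-hand side, and $1, $2, ... are the values of the right-hand-side symbols. When an alternative has no action, yacc uses $$ = $1 by default.
• Example:
    expr   : expr '+' term   { $$ = $1 + $3; }
           | term            { $$ = $1; }
           ;
    term   : term '*' factor { $$ = $1 * $3; }
           | factor          { $$ = $1; }
           ;
    factor : '(' expr ')'    { $$ = $2; }
           | ID
           | NUM
           ;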
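yacc compiles these rules into LALR(1) tables, so the actions run during bottom-up reductions. The sketch below lets you check by hand what the actions compute. It evaluates the same expr/term/factor grammar using a different technique, recursive descent. Recursive descent cannot use left-recursive rules directly, so those rules become loops here. This is a minimal sketch. It assumes NUM is a non-negative decimal integer and leaves out ID. The names eval_expr and pos are hypothetical.

```c
#include <ctype.h>

static const char *pos;   /* hypothetical input cursor */

static int expr(void);

static void skip_spaces(void)
{
    while (*pos == ' ')
        pos++;
}

/* factor : '(' expr ')' { $$ = $2; } | NUM */
static int factor(void)
{
    skip_spaces();
    if (*pos == '(') {
        pos++;
        int v = expr();              /* the value of $2 */
        skip_spaces();
        if (*pos == ')')
            pos++;
        return v;
    }
    int v = 0;                       /* NUM: a run of decimal digits */
    while (isdigit((unsigned char)*pos))
        v = v * 10 + (*pos++ - '0');
    return v;
}

/* term : term '*' factor { $$ = $1 * $3; } | factor, left recursion as a loop */
static int term(void)
{
    int v = factor();
    skip_spaces();
    while (*pos == '*') {
        pos++;
        v = v * factor();
        skip_spaces();
    }
    return v;
}

/* expr : expr '+' term { $$ = $1 + $3; } | term, left recursion as a loop */
static int expr(void)
{
    int v = term();
    skip_spaces();
    while (*pos == '+') {
        pos++;
        v = v + term();
        skip_spaces();
    }
    return v;
}

int eval_expr(const char *text)
{
    pos = text;
    return expr();
}
```

For example, eval_expr("(2 + 3) * 4") returns 20. The rule factor : '(' expr ')' passes the inner value up as $2, and the term rule then multiplies that value by 4.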

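Works with LEX
• The yacc parser calls yylex() each time it needs the next token.
• A lex rule such as [0-9]+ recognizes a number, so yylex() reports "next token is NUM".
• For the input 12 + 26, the parser receives the token sequence NUM '+' NUM.
• LEX and YACC need a way to identify tokens.

Communication between LEX and YACC
• Use an enumeration / #define for every token.
• YACC creates y.tab.h: yacc -d gram.y will produce y.tab.h.
• LEX includes y.tab.h.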
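A lex-generated scanner is the usual source of tokens. The hand-written stand-in below shows the contract that yacc relies on:
• each call to yylex() returns one token code;
• the token's value goes into yylval;
• 0 means end of input.

The token code 258 for NUM and the helper set_input() are assumptions made for illustration. The numbering follows the "# define CHAR 258" line of the y.tab.h listing.

```c
#include <ctype.h>

#define NUM 258             /* assumed code, as yacc -d would #define it */

int yylval;                 /* value of the most recent token */
static const char *input;   /* hypothetical: the text being scanned */

void set_input(const char *text) { input = text; }

/* Returns the next token code; 0 signals end of input, as yacc expects. */
int yylex(void)
{
    while (*input == ' ' || *input == '\t')
        input++;                              /* skip white space */
    if (*input == '\0')
        return 0;
    if (isdigit((unsigned char)*input)) {     /* [0-9]+ */
        int v = 0;
        while (isdigit((unsigned char)*input))
            v = v * 10 + (*input++ - '0');
        yylval = v;
        return NUM;
    }
    return (unsigned char)*input++;           /* single-character token such as '+' */
}
```

Calling yylex() repeatedly on "12 + 26" yields NUM (yylval 12), '+', NUM (yylval 26), and then 0.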

Communication between LEX and YACC

scanner.l:
    %{
    #include <stdio.h>
    #include "y.tab.h"
    %}
    id      [_a-zA-Z][_a-zA-Z0-9]*
    %%
    int     { return INT; }
    char    { return CHAR; }
    float   { return FLOAT; }
    {id}    { return ID; }

yacc -d xxx.y produces y.tab.h:
    # define CHAR 258
    # define FLOAT 259
    # define ID 260
    # define INT 261

parser.y:
    %{
    #include <stdio.h>
    #include <stdlib.h>
    %}
    %token CHAR FLOAT ID INT
    %%

YACC
• Rules may be recursive.
• Rules may be ambiguous. Conflicts are then resolved by precedence declarations or by yacc's default rules.
• Uses bottom-up shift/reduce parsing:
  – Get a token.
  – Push it onto the stack.
  – Can the top of the stack be reduced?
    • yes: reduce using a rule.
    • no: get another token.
• Yacc cannot look ahead more than one token.
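The shift/reduce loop described above can be traced by hand for the expression rules of foo.y:

    expression: expression '+' NUMBER | expression '-' NUMBER | NUMBER

The sketch below is hand-coded, unlike yacc's table-driven parser. After each shift it inspects the top of the stack to decide whether a right-hand side is complete. For this particular grammar, that decision never needs the lookahead token. The token code 258, the EXPR marker, and parse_tokens are hypothetical.

```c
#define NUMBER 258   /* assumed token code */
#define EXPR   (-1)  /* stack marker for the nonterminal 'expression' */

struct item { int sym; int val; };   /* grammar symbol and its semantic value */

/* toks/vals: token codes and their values, terminated by token 0.
   Returns 1 and stores the value in *result on success, 0 on a syntax error. */
int parse_tokens(const int *toks, const int *vals, int *result)
{
    struct item stack[64];
    int sp = 0;                                /* number of items on the stack */
    for (int i = 0; toks[i] != 0; i++) {
        if (sp == 64)
            return 0;                          /* stack overflow */
        stack[sp].sym = toks[i];               /* shift: push the token */
        stack[sp].val = vals[i];
        sp++;
        if (stack[sp - 1].sym != NUMBER)
            continue;                          /* cannot reduce: get another token */
        if (sp == 1) {
            stack[0].sym = EXPR;               /* expression: NUMBER { $$ = $1; } */
        } else if (sp >= 3 && stack[sp - 3].sym == EXPR &&
                   (stack[sp - 2].sym == '+' || stack[sp - 2].sym == '-')) {
            int lhs = stack[sp - 3].val, rhs = stack[sp - 1].val;
            stack[sp - 3].val = stack[sp - 2].sym == '+' ? lhs + rhs : lhs - rhs;
            sp -= 2;                           /* pop 3 symbols, push 1 expression */
        } else {
            return 0;                          /* no rule matches: syntax error */
        }
    }
    if (sp != 1 || stack[0].sym != EXPR)
        return 0;                              /* input did not reduce to one expression */
    *result = stack[0].val;
    return 1;
}
```

For the tokens 10 + 5 - 3, the function returns 1 with *result equal to 12. The stack shrinks back to a single expression after each NUMBER is shifted.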

Passing value of token
• Every terminal token (symbol) may represent a value or data:
  – a numeric quantity in the case of a number (42);
  – a pointer to a string ("Hello, World!").
• When using lex, we put the value into yylval.
  – In complex situations yylval is a union.
• Typical lex code:
    [0-9]+   { yylval = atoi(yytext); return NUM; }
• Yacc allows symbols to have multiple types of value:
    %union {
        double dval;
        int    vblno;
        char  *strval;
    }

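Passing value of token (cont'd)
• yacc -d copies the %union into y.tab.h as the type YYSTYPE and declares
    extern YYSTYPE yylval;
• Lex file:
    %{
    #include <stdlib.h>
    #include <string.h>
    #include "y.tab.h"
    %}
    %%
    [0-9]+     { yylval.vblno = atoi(yytext); return NUM; }
    [A-Za-z]+  { yylval.strval = strdup(yytext); return STRING; }
• Write letters as [A-Za-z]. The range [A-z] would also match the characters [ \ ] ^ _ and `.

Yacc Example
• Taken from Lex & Yacc.
• Example: a simple calculator session ($ ends the input):
    a = 4 + 6
    a            (a = 10)
    b = 7
    c = a + b
    c            (c = 17)
    $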
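The following plain-C sketch shows what the generated union and the two lex actions amount to. Several pieces are assumptions:
• YYSTYPE is written out by hand here rather than taken from y.tab.h.
• The token codes NUM and STRING are assumed values.
• set_token_value() is a hypothetical stand-in for the two lex rules.
• A malloc/memcpy copy replaces strdup, because strdup is POSIX rather than ISO C.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define NUM    258   /* assumed token codes */
#define STRING 259

typedef union {      /* mirrors the %union declaration */
    double dval;
    int    vblno;
    char  *strval;
} YYSTYPE;

YYSTYPE yylval;

/* Stand-in for the two lex rules: fills the matching union member. */
int set_token_value(const char *yytext)
{
    if (isdigit((unsigned char)yytext[0])) {     /* [0-9]+ */
        yylval.vblno = atoi(yytext);
        return NUM;
    }
    size_t n = strlen(yytext) + 1;               /* [A-Za-z]+ */
    yylval.strval = malloc(n);
    if (yylval.strval != NULL)
        memcpy(yylval.strval, yytext, n);        /* private copy, like strdup */
    return STRING;
}
```

Only the member that matches the returned token is meaningful afterwards. This is why the parser needs %token <type> and %type <type> declarations to know which member each symbol uses.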

Grammar
    expression ::= expression '+' term
                |  expression '-' term
                |  term
    term       ::= term '*' factor
                |  term '/' factor
                |  factor
    factor     ::= '(' expression ')'
                |  '-' factor
                |  NUMBER
                |  NAME

Symbol Table (parser.h)
• The symbol table is an array of (name, value) entries, indexed from 0.

    #define NSYMS 20    /* maximum number of symbols */

    struct symtab {
        char   *name;
        double  value;
    };

    /* declared here, defined in exactly one .c file; a definition in the
       header would be duplicated in every file that includes it */
    extern struct symtab symtab[NSYMS];

    struct symtab *symlook(char *s);
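The slides declare symlook() but do not show its body. One plausible implementation is sketched below. It does a linear search and adds the name on first use, with a value of 0.0. Two parts are assumptions: the behavior when the table is full (report and exit) and the malloc-based string copy. The sketch repeats the header declarations so that it compiles on its own.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NSYMS 20    /* maximum number of symbols */

struct symtab {
    char   *name;
    double  value;
};

struct symtab symtab[NSYMS];   /* the single definition; starts zeroed (name == NULL) */

/* Return the entry for s, creating it if s has not been seen before. */
struct symtab *symlook(char *s)
{
    for (struct symtab *sp = symtab; sp < &symtab[NSYMS]; sp++) {
        if (sp->name != NULL && strcmp(sp->name, s) == 0)
            return sp;                       /* already present */
        if (sp->name == NULL) {              /* first free slot: s is new */
            size_t n = strlen(s) + 1;
            sp->name = malloc(n);
            if (sp->name == NULL) {
                fprintf(stderr, "out of memory\n");
                exit(1);
            }
            memcpy(sp->name, s, n);
            return sp;
        }
    }
    fprintf(stderr, "too many symbols\n");   /* assumed policy: give up */
    exit(1);
}
```

Entries are filled in order, so the first empty slot proves that the name is not in the table. This is why the search can stop there.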

Parser (parser.y)
    %{
    #include <stdio.h>
    #include <string.h>
    #include "parser.h"
    int yylex(void);
    void yyerror(char *s);
    %}
    %union {
        double dval;
        struct symtab *symp;
    }
    %token <symp> NAME
    %token <dval> NUMBER
    %type  <dval> expression
    %type  <dval> term
    %type  <dval> factor
    %%
    statement_list: statement '\n'
                  | statement_list statement '\n'
                  ;
    statement: NAME '=' expression  { $1->value = $3; }
             | expression           { printf("= %g\n", $1); }
             ;
    expression: expression '+' term { $$ = $1 + $3; }
              | expression '-' term { $$ = $1 - $3; }
              | term
              ;

• Terminal NAME and <symp> have the same data type.
• Nonterminal expression and <dval> have the same data type.

Parser (cont'd)
    term: term '*' factor  { $$ = $1 * $3; }
        | term '/' factor  { if ($3 == 0.0)
                                 yyerror("divide by zero");
                             else
                                 $$ = $1 / $3;
                           }
        | factor
        ;
    factor: '(' expression ')'  { $$ = $2; }
          | '-' factor          { $$ = -$2; }
          | NUMBER              { $$ = $1; }
          | NAME                { $$ = $1->value; }
          ;
    %%

Scanner (scanner.l)
    %{
    #include <stdlib.h>
    #include "parser.h"     /* before y.tab.h: YYSTYPE refers to struct symtab */
    #include "y.tab.h"
    %}
    %%
    ([0-9]+|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?)  {
            yylval.dval = atof(yytext);
            return NUMBER;
        }
    [ \t]   ;   /* ignore white space */
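The NUMBER pattern has one subtlety: the exponent belongs only to its second alternative. As a result, "1.5e3" is a single NUMBER, but "1e5" is not. Lex would match "1" as a NUMBER and then "e5" as a NAME. The sketch below checks whole-string matches of the same pattern. It uses the POSIX <regex.h> API, which is POSIX rather than ISO C. The function name is_number is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* The NUMBER pattern from scanner.l, anchored so it must match the whole string. */
static const char *number_re =
    "^([0-9]+|([0-9]*\\.[0-9]+)([eE][-+]?[0-9]+)?)$";

/* Returns 1 if all of text is one NUMBER token, 0 otherwise (or on regex error). */
int is_number(const char *text)
{
    regex_t re;
    if (regcomp(&re, number_re, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    int ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

If integers with exponents should also count as numbers, the exponent group would have to apply to both alternatives.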

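Scanner (cont'd, scanner.l)
    [A-Za-z][A-Za-z0-9]*  {   /* return symbol pointer */
            yylval.symp = symlook(yytext);
            return NAME;
        }
    "$"     { return 0;  /* end of input */ }
    \n  |
    "=" |
    "+" |
    "-" |
    "*" |
    "/" |
    "(" |
    ")"     return yytext[0];
    %%
• Single-character tokens (newline, operators, and parentheses) are returned as their own character code.

Precedence / Association
(1) 1 - 2 - 3
(2) 1 - 2 * 3
1. Is 1-2-3 = (1-2)-3 or 1-(2-3)?
2. Is 1-2*3 = 1-(2*3) or (1-2)*3?
• In an ambiguous grammar, yacc reports these cases as shift/reduce conflicts. By default it chooses to shift.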
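Working out both readings shows why the choice matters, because the two groupings give different values:

```latex
\begin{aligned}
(1-2)-3 &= -4, & 1-(2-3) &= 2,\\
1-(2\times 3) &= -5, & (1-2)\times 3 &= -3.
\end{aligned}
```

The usual conventions select the first value in each row: '-' is left-associative, and '*' binds tighter than '-'.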

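Precedence / Association
• Tokens declared on the same line have the same precedence. Lines declared later have higher precedence.
    %right '='
    %left '<' '>' NE LE GE
    %left '+' '-'
    %left '*' '/'
    %nonassoc UMINUS        /* highest precedence */
• With these declarations, an ambiguous grammar can be used directly:
    expr : expr '+' expr   { $$ = $1 + $3; }
         | expr '-' expr   { $$ = $1 - $3; }
         | expr '*' expr   { $$ = $1 * $3; }
         | expr '/' expr   { if ($3 == 0)
                                 yyerror("divide 0");
                             else
                                 $$ = $1 / $3;
                           }
         | '-' expr %prec UMINUS  { $$ = -$2; }
         ;
• %prec UMINUS gives the unary-minus rule the precedence of UMINUS instead of the precedence of '-'.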
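yacc applies these declarations when it builds its parse tables. Precedence climbing is a different technique that is driven by the same kind of table, and it makes the effect easy to check by hand. In the sketch below:
• '+' '-' and '*' '/' are left-associative levels 1 and 2, mirroring the %left lines;
• unary minus binds tighter than both, mirroring %nonassoc UMINUS;
• '=' and the comparison operators are omitted, and operands are integers.

The function names are hypothetical. Returning 0 on division by zero is an assumption that stands in for the yyerror call.

```c
#include <ctype.h>

static const char *p;   /* hypothetical input cursor */

static void skip(void) { while (*p == ' ') p++; }

/* Binary-operator precedence, mirroring the %left lines; 0 = not an operator. */
static int prec(char op)
{
    switch (op) {
    case '+': case '-': return 1;   /* %left '+' '-' */
    case '*': case '/': return 2;   /* %left '*' '/' (declared later: higher) */
    default:            return 0;
    }
}

static int parse_expr(int min_prec);

/* Operand: a number, a parenthesized expression, or unary minus (UMINUS, highest). */
static int primary(void)
{
    skip();
    if (*p == '-') { p++; return -primary(); }
    if (*p == '(') {
        p++;
        int v = parse_expr(1);
        skip();
        if (*p == ')') p++;
        return v;
    }
    int v = 0;
    while (isdigit((unsigned char)*p))
        v = v * 10 + (*p++ - '0');
    return v;
}

static int parse_expr(int min_prec)
{
    int lhs = primary();
    for (;;) {
        skip();
        char op = *p;
        int q = prec(op);
        if (q == 0 || q < min_prec)
            return lhs;
        p++;
        /* Left associativity: the right operand may only contain tighter operators. */
        int rhs = parse_expr(q + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs = rhs != 0 ? lhs / rhs : 0; break;  /* assumed: 0 on divide by zero */
        }
    }
}

int evaluate(const char *text) { p = text; return parse_expr(1); }
```

The calls parse_expr(q + 1) implement %left. Allowing an equal-precedence operator on the right, with parse_expr(q), would make the operator right-associative instead, as %right '=' does.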

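IF-ELSE Ambiguity
• Consider the following rule:
    stmt : IF expr stmt
         | IF expr stmt ELSE stmt
         ;
• What should happen in the following state?
    IF expr IF expr stmt . ELSE stmt
  After reading IF expr IF expr stmt, with ELSE as the next token, yacc has two choices. It can reduce IF expr stmt, which gives the ELSE to the outer IF. Or it can shift ELSE, which gives it to the inner IF.
• This is a shift/reduce conflict.
• Yacc will always choose to shift, so an ELSE matches the nearest unmatched IF.
• A solution is to rewrite the grammar so that it is unambiguous:
    stmt      : matched
              | unmatched
              ;
    matched   : other_stmt
              | IF expr THEN matched ELSE matched
              ;
    unmatched : IF expr THEN stmt
              | IF expr THEN matched ELSE unmatched
              ;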
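Yacc's default choice (shift) gives the same result as a recursive-descent parser that greedily takes an ELSE whenever one follows. The sketch below uses hypothetical one-letter tokens:
• 'i' for IF expr;
• 'e' for ELSE;
• 's' for other_stmt.

It renders the parse with explicit braces so that the binding is visible. bind_else is a hypothetical name.

```c
#include <string.h>

static const char *tok;   /* hypothetical tokens: i = IF expr, e = ELSE, s = other_stmt */
static char out[256];     /* bracketed rendering of the parse */

static void emit(const char *s) { strncat(out, s, sizeof out - strlen(out) - 1); }

/* stmt : IF expr stmt | IF expr stmt ELSE stmt | other_stmt */
static void stmt(void)
{
    if (*tok == 'i') {
        tok++;
        emit("if{");
        stmt();
        if (*tok == 'e') {     /* ELSE follows: take it now, like a shift */
            tok++;
            emit("}else{");
            stmt();
        }
        emit("}");
    } else if (*tok == 's') {
        tok++;
        emit("s");
    }
}

const char *bind_else(const char *tokens)
{
    tok = tokens;
    out[0] = '\0';
    stmt();
    return out;
}
```

For IF expr IF expr stmt ELSE stmt ("iises"), the result is if{if{s}else{s}}. The ELSE belongs to the inner IF, as it does with yacc's default.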

Shift/Reduce Conflicts
• A shift/reduce conflict occurs when a grammar is written in such a way that the parser cannot decide between shifting and reducing.
  – Example: the IF-ELSE ambiguity.
• To resolve this conflict, yacc chooses to shift.

Reduce/Reduce Conflicts
• Example:
    start : expr | stmt ;
    expr  : CONSTANT ;
    stmt  : CONSTANT ;
• After reading CONSTANT, the parser could reduce it either to expr or to stmt.
• Yacc resolves the conflict by reducing with the rule that occurs earlier in the grammar. NOT GOOD!!
• So, modify the grammar to eliminate such conflicts.

Error Messages
• Bad error message:
  – "Syntax error."
  – A compiler needs to give good advice.
• It is better to track the line number in lex:
    extern int yylineno;    /* maintained by lex; flex needs %option yylineno */

    void yyerror(char *s)
    {
        fprintf(stderr, "line %d: %s\n", yylineno, s);
    }

Debug Your Parser
1. Use the -t option, or define YYDEBUG to 1.
2. Set the variable yydebug to 1 when you want to trace the parsing status.
3. If you want to trace the semantic values, define your YYPRINT function.
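If the scanner is written by hand instead of being generated by lex, it must maintain the line counter itself. The minimal sketch below assumes the input is a string. set_source() and next_char() are hypothetical helpers, and yyerror is the version from the slide above.

```c
#include <stdio.h>

int yylineno = 1;           /* current line, as lex would maintain it */
static const char *src;    /* hypothetical: the text being scanned */

void set_source(const char *text) { src = text; yylineno = 1; }

/* Read one character, counting newlines; returns 0 at end of input. */
int next_char(void)
{
    if (*src == '\0')
        return 0;
    if (*src == '\n')
        yylineno++;
    return (unsigned char)*src++;
}

/* As on the slide: report the line the scanner has reached. */
void yyerror(char *s)
{
    fprintf(stderr, "line %d: %s\n", yylineno, s);
}
```

An error detected at the '+' in "a = 1\nb = +\n" is reported as "line 2: ...", because one newline has been consumed by then.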

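Shift and Reducing: Example
• Grammar:
    stmt : stmt ';' stmt
         | NAME '=' exp
         ;
    exp  : exp '+' exp
         | exp '-' exp
         | NAME
         | NUMBER
         ;
• Input: a = 7; b = 3 + a + 2
• The parser shifts tokens onto the stack and reduces whenever the top of the stack matches the right-hand side of a rule.

Recursive Grammar
• Left recursion:
    list : item
         | list ',' item
         ;
• Right recursion:
    list : item
         | item ',' list
         ;
• An LR parser (e.g. yacc) prefers left recursion. It can reduce after each item, so the stack stays small. With right recursion, every item must be shifted before the first reduction.
• An LL parser prefers right recursion, because a top-down parser cannot use left-recursive rules directly.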
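The difference in stack depth can be simulated directly. For a list of n items under each grammar, the sketch below performs the shifts and reductions that an LR parser would make and reports the deepest the stack gets. The lookahead decisions are hard-coded for these two grammars, and the function names are hypothetical.

```c
/* Simulated stack symbols: 'i' = item, ',' = comma, 'L' = list.
   Both functions stop shifting once the 256-slot stack is nearly full. */

/* list : item | list ',' item */
int left_recursive_depth(int n)
{
    char stack[256];
    int sp = 0, max = 0;
    for (int k = 0; k < n && sp < 254; k++) {
        if (k > 0)
            stack[sp++] = ',';          /* shift ',' */
        stack[sp++] = 'i';              /* shift item */
        if (sp > max)
            max = sp;
        if (sp == 1) {
            stack[0] = 'L';             /* reduce by list : item */
        } else {
            sp -= 2;                    /* reduce list ',' item ... */
            stack[sp - 1] = 'L';        /* ... to a single list */
        }
    }
    return max;
}

/* list : item | item ',' list */
int right_recursive_depth(int n)
{
    char stack[256];
    int sp = 0, max = 0;
    for (int k = 0; k < n && sp < 254; k++) {
        if (k > 0)
            stack[sp++] = ',';          /* lookahead ',' after an item means shift */
        stack[sp++] = 'i';
        if (sp > max)
            max = sp;
    }
    if (sp > 0)
        stack[sp - 1] = 'L';            /* end of input: list : item */
    while (sp >= 3) {
        sp -= 2;                        /* item ',' list -> list */
        stack[sp - 1] = 'L';
    }
    return max;
}
```

For 5 items, the left-recursive grammar never needs more than 3 stack slots, while the right-recursive one needs 9 (2n - 1). The left-recursive depth stays at 3 however long the list grows.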

YACC Declaration Summary
`%start'     Specify the grammar's start symbol.
`%union'     Declare the collection of data types that semantic values may have.
`%token'     Declare a terminal symbol (token type name) with no precedence or associativity specified.
`%type'      Declare the type of semantic values for a nonterminal symbol.
`%right'     Declare a terminal symbol (token type name) that is right-associative.
`%left'      Declare a terminal symbol (token type name) that is left-associative.
`%nonassoc'  Declare a terminal symbol (token type name) that is nonassociative. Using it in a way that would be associative is a syntax error; e.g. x op y op z is a syntax error.
