Introduction
• What is YACC?
  – A tool that produces a parser for a given grammar.
  – YACC (Yet Another Compiler Compiler) is a program designed to compile an LALR(1) grammar and to produce the source code of the syntactic analyzer for the language defined by that grammar.

History
• Yacc was originally written by Stephen C. Johnson in 1975.
• Variants:
  – lex, yacc (AT&T)
  – bison: a yacc replacement (GNU)
  – flex: fast lexical analyzer (GNU)
  – BSD yacc
  – PCLEX, PCYACC (Abraxas Software)
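How YACC Works
(1) Parse: yacc reads the grammar and generates the parser source y.tab.c (and, with -d, the token header y.tab.h).
(2) Compile: cc / gcc compiles y.tab.c into a.out.
(3) Run: a.out reads a token stream and produces an abstract syntax tree.

A YACC File Example
    %{
    #include <stdio.h>
    int yylex(void);        /* supplied by the lex-generated scanner */
    int yyerror(char *s);
    %}
    %token NAME NUMBER
    %%
    statement: NAME '=' expression
             | expression               { printf("= %d\n", $1); }
             ;

    expression: expression '+' NUMBER   { $$ = $1 + $3; }
              | expression '-' NUMBER   { $$ = $1 - $3; }
              | NUMBER                  { $$ = $1; }
              ;
    %%
    int yyerror(char *s)
    {
        fprintf(stderr, "%s\n", s);
        return 0;
    }

    int main(void)
    {
        yyparse();
        return 0;
    }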
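To make the run-time step concrete, here is a minimal hand-written sketch in C of the shift/reduce work the generated parser does for the expression rules above. It shifts tokens onto a value stack and, when the top of the stack holds a complete right-hand side, reduces it by running that rule's action. This is a hard-coded loop for this one grammar, not the table-driven LALR(1) automaton yacc actually emits. The names parse_expression and SYM_* are hypothetical, and the input is assumed to be non-negative integers separated by '+' and '-'.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical symbol codes for this sketch only. */
enum { SYM_NUMBER, SYM_PLUS, SYM_MINUS, SYM_EXPR };

struct entry { int sym; int val; };   /* one slot of the parse/value stack */

/* Parse input such as "10 - 3 + 2" with the rules
 *   expression : expression '+' NUMBER | expression '-' NUMBER | NUMBER
 * Returns 1 and stores $$ of expression in *out, or 0 on a syntax error. */
int parse_expression(const char *s, int *out)
{
    struct entry stack[64];
    int top = 0;                              /* entries on the stack */

    for (;;) {
        while (*s == ' ') s++;
        if (*s == '\0') break;
        if (top == 64) return 0;              /* stack overflow */

        /* Shift: push the next token. */
        if (isdigit((unsigned char)*s)) {
            char *end;
            stack[top].sym = SYM_NUMBER;
            stack[top].val = (int)strtol(s, &end, 10);
            s = end;
        } else if (*s == '+' || *s == '-') {
            stack[top].sym = (*s == '+') ? SYM_PLUS : SYM_MINUS;
            stack[top].val = 0;
            s++;
        } else {
            return 0;                         /* unknown character */
        }
        top++;

        /* Reduce when the stack holds a complete right-hand side. */
        if (top == 1 && stack[0].sym == SYM_NUMBER) {
            stack[0].sym = SYM_EXPR;          /* $$ = $1; */
        } else if (top == 3 && stack[0].sym == SYM_EXPR &&
                   (stack[1].sym == SYM_PLUS || stack[1].sym == SYM_MINUS) &&
                   stack[2].sym == SYM_NUMBER) {
            int v1 = stack[0].val, v3 = stack[2].val;
            stack[0].val = (stack[1].sym == SYM_PLUS)
                               ? v1 + v3      /* $$ = $1 + $3; */
                               : v1 - v3;     /* $$ = $1 - $3; */
            top = 1;                          /* pop three, push expression */
        }
    }
    if (top != 1 || stack[0].sym != SYM_EXPR) return 0;
    *out = stack[0].val;
    return 1;
}
```

For "10 - 3 + 2" the stack never holds more than three entries, and the result is 9 because each reduction happens as soon as a NUMBER completes a rule.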
YACC File Format
    %{
        C declarations
    %}
        yacc declarations
    %%
        grammar rules
    %%
        additional C code

Definitions Section
    %{
    #include …
    %}
Start Symbol
• By default, the start symbol is the first non-terminal specified in the grammar.
• To override it, use the %start declaration:
    %start non-terminal

Rules Section
• This is the grammar specification section.
• Example:
    expr   : expr '+' term | term ;
    term   : term '*' factor | factor ;
    factor : '(' expr ')' | ID | NUM ;
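Rules Section (cont'd)
• Normally written like this:
    expr   : expr '+' term
           | term
           ;
    term   : term '*' factor
           | factor
           ;
    factor : '(' expr ')'
           | ID
           | NUM
           ;

The Position of Rules
• A semantic action is written in braces after the alternative it belongs to. $$ is the value of the left-hand side; $1, $2, … are the values of the right-hand-side symbols. An alternative without an action gets the default $$ = $1.
    expr   : expr '+' term     { $$ = $1 + $3; }
           | term              { $$ = $1; }
           ;
    term   : term '*' factor   { $$ = $1 * $3; }
           | factor            { $$ = $1; }
           ;
    factor : '(' expr ')'      { $$ = $2; }
           | ID
           | NUM
           ;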
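What these actions compute can be checked with a small C sketch. It swaps in a different technique, recursive descent, rather than yacc's LALR(1) parsing. Because recursive descent cannot use left recursion, expr and term are written as loops that still group to the left, so they compute the same values as $$ = $1 + $3 and $$ = $1 * $3. ID is left out, and eval_expr and the input format (integers, '+', '*', parentheses) are hypothetical.

```c
#include <stdlib.h>

static const char *p;                 /* hypothetical cursor into the input */

static void skip_blanks(void) { while (*p == ' ') p++; }

static int expr(void);

/* factor : '(' expr ')'   { $$ = $2; }
 *        | NUM                          (ID is omitted in this sketch) */
static int factor(void)
{
    skip_blanks();
    if (*p == '(') {
        p++;
        int v = expr();               /* $2 */
        skip_blanks();
        if (*p == ')') p++;           /* a missing ')' is not reported here */
        return v;
    }
    char *end;
    int v = (int)strtol(p, &end, 10); /* NUM */
    p = end;
    return v;
}

/* term : term '*' factor { $$ = $1 * $3; } | factor { $$ = $1; }
 * The left recursion becomes a loop, which still groups to the left. */
static int term(void)
{
    int v = factor();
    for (skip_blanks(); *p == '*'; skip_blanks()) {
        p++;
        v = v * factor();
    }
    return v;
}

/* expr : expr '+' term { $$ = $1 + $3; } | term { $$ = $1; } */
static int expr(void)
{
    int v = term();
    for (skip_blanks(); *p == '+'; skip_blanks()) {
        p++;
        v = v + term();
    }
    return v;
}

int eval_expr(const char *s)
{
    p = s;
    return expr();
}
```

Because term sits below expr in the grammar, "2 + 3 * 4" evaluates to 14, and the parenthesized "(2 + 3) * 4" to 20.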
Works with LEX
• The parser calls yylex() whenever it needs the next token.
• Example: for the input 12 + 26, the lex rule [0-9]+ matches 12, so the next token is NUM. The parser sees the token stream NUM '+' NUM.

Communication between LEX and YACC
• LEX and YACC need a way to identify tokens.
• Use an enumeration / #define.
• YACC creates y.tab.h: yacc -d xxx.y (e.g. yacc -d gram.y) will produce y.tab.h.
• LEX includes y.tab.h. In scanner.l:
    %{
    #include "y.tab.h"
    %}
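A minimal sketch of the agreement between the two tools: a header-style #define gives NUM a code above 256, so it cannot clash with single-character tokens such as '+', and a hand-written yylex() stands in for the scanner lex would generate. The value 258 and the helper set_input are illustrative assumptions; real token numbers come from the y.tab.h that yacc -d writes.

```c
#include <ctype.h>
#include <stdlib.h>

/* Stand-in for y.tab.h: named tokens get codes above 256 so they cannot
 * clash with single-character tokens such as '+'. The value 258 is
 * illustrative; yacc chooses the real number. */
#define NUM 258

int yylval;                     /* value of the most recent token */
static const char *input;       /* hypothetical input buffer */

void set_input(const char *s) { input = s; }

/* Hand-written stand-in for the scanner that lex builds from rules like
 *   [0-9]+  { yylval = atoi(yytext); return NUM; }
 * Any other non-blank character is returned as its own token code,
 * and 0 means end of input, as with yylex(). */
int yylex(void)
{
    while (*input == ' ' || *input == '\t') input++;
    if (*input == '\0') return 0;
    if (isdigit((unsigned char)*input)) {
        char *end;
        yylval = (int)strtol(input, &end, 10);
        input = end;
        return NUM;
    }
    return (unsigned char)*input++;
}
```

Fed "12 + 26", successive calls return NUM (with yylval 12), '+', NUM (with yylval 26), and finally 0.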
Passing value of token
• Every terminal token (symbol) may represent a value or data type:
  – a numeric quantity, in the case of a number (42)
  – a pointer to a string ("Hello, World!")
• When using lex, we put the value into yylval.
  – In complex situations yylval is a union.
• Typical lex code:
    [0-9]+  { yylval = atoi(yytext); return NUM; }

Passing value of token (cont'd)
• Yacc allows symbols to have multiple types of value:
    %union {
        double dval;
        int    vblno;
        char  *strval;
    }
Passing value of token (cont'd)
• yacc -d writes the union type (YYSTYPE) into y.tab.h, together with:
    extern YYSTYPE yylval;
• Lex file:
    %{
    #include <stdlib.h>     /* atoi */
    #include <string.h>     /* strdup (POSIX) */
    #include "y.tab.h"
    %}
    %%
    [0-9]+     { yylval.vblno = atoi(yytext); return NUM; }
    [A-Za-z]+  { yylval.strval = strdup(yytext); return STRING; }

Yacc Example
• Taken from Lex & Yacc.
• Example: a simple calculator. A sample session:
    a = 4 + 6
    …
    a
    a=10
    b = 7
    c = a + b
    c
    c = 17
    $
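The %union and the two lex rules under "Passing value of token" can be mimicked in plain C as a sketch. The YYSTYPE typedef below only approximates what yacc -d emits (generated code differs in detail), the token codes are illustrative, classify is a hypothetical stand-in for the lex rules, and copy_string replaces strdup, which is POSIX rather than standard C before C23.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Approximation of what yacc -d emits for the %union above. */
typedef union {
    double dval;
    int vblno;
    char *strval;
} YYSTYPE;

YYSTYPE yylval;

#define NUM    258              /* illustrative token codes */
#define STRING 259

/* Portable copy of a string (strdup is POSIX, standard only from C23). */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (d != NULL) memcpy(d, s, n);
    return d;
}

/* Mimic the two lex rules on one complete lexeme (C locale assumed):
 *   [0-9]+     -> NUM,    value in yylval.vblno
 *   [A-Za-z]+  -> STRING, value in yylval.strval (caller frees it)
 * Returns 0 if the lexeme matches neither rule. */
int classify(const char *lexeme)
{
    size_t n = strlen(lexeme), i;

    if (n == 0) return 0;
    for (i = 0; i < n && isdigit((unsigned char)lexeme[i]); i++) { }
    if (i == n) {
        yylval.vblno = atoi(lexeme);
        return NUM;
    }
    for (i = 0; i < n && isalpha((unsigned char)lexeme[i]); i++) { }
    if (i == n) {
        yylval.strval = copy_string(lexeme);
        return STRING;
    }
    return 0;
}
```

Only one union member is meaningful at a time; the token code returned tells the parser which member to read, which is what the %union type tags let yacc check for you.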
Grammar
    expression ::= expression '+' term
                 | expression '-' term
                 | term
    term       ::= term '*' factor
                 | term '/' factor
                 | factor
    factor     ::= '(' expression ')'
                 | '-' factor
                 | NUMBER
                 | NAME

Symbol Table (parser.h)
    #define NSYMS 20    /* maximum number of symbols */

    struct symtab {
        char *name;
        double value;
    };

    /* Declared here and defined in exactly one source file; a definition
       in the header would be duplicated in every file that includes it. */
    extern struct symtab symtab[NSYMS];

    struct symtab *symlook(const char *s);

[Figure: the symtab array, entries 0, 1, 2, …, each holding a name and a value]
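symlook is declared in parser.h but not defined in this excerpt. One plausible implementation, sketched under the assumption that a linear search is acceptable for NSYMS = 20, looks a name up and inserts it into the first free slot if it is missing. The error handling (print a message and exit) is also an assumption of this sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NSYMS 20    /* maximum number of symbols */

struct symtab {
    char *name;
    double value;
} symtab[NSYMS];    /* zero-initialized: every name starts as NULL */

/* Return the entry for s, inserting it into the first free slot if it
 * is not in the table yet. Entries are filled in order, so the first
 * NULL name marks the end of the used part of the table. */
struct symtab *symlook(const char *s)
{
    struct symtab *sp;

    for (sp = symtab; sp < &symtab[NSYMS]; sp++) {
        if (sp->name == NULL) {                  /* free slot: insert */
            size_t n = strlen(s) + 1;
            sp->name = malloc(n);
            if (sp->name == NULL) {
                fputs("out of memory\n", stderr);
                exit(1);
            }
            memcpy(sp->name, s, n);
            return sp;
        }
        if (strcmp(sp->name, s) == 0)            /* already present */
            return sp;
    }
    fputs("too many symbols\n", stderr);         /* assumption: fatal */
    exit(1);
}
```

The scanner can then return the same pointer for every occurrence of a name, so an assignment such as a = 10 and a later use of a share one value slot.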
Parser
    %{
    #include "parser.h"
    …

Parser (cont'd)
    statement_list: statement '\n'
    …
• Terminal NAME and …

Parser (cont'd)
    term: term '*' factor   { $$ = $1 * $3; }
        | term '/' factor   { if ($3 == 0.0)
                                  yyerror("divide by zero");
                              else
                                  $$ = $1 / $3; }
    …

Scanner
    %{
    #include "y.tab.h"
    #include "parser.h"
    #include …
Scanner (cont'd), file scanner.l
    [A-Za-z][A-Za-z0-9]*  { /* return symbol pointer */
                            yylval.symp = symlook(yytext);
                            return NAME; }
    "$"                   { return 0; /* end of input */ }
    \n  |
    "=" |
    "+" |
    "-" |
    "*" |
    "/"                   { return yytext[0]; }
    %%

Precedence / Association
    (1) 1 - 2 - 3
    (2) 1 - 2 * 3
1. Is 1-2-3 = (1-2)-3 or 1-(2-3)?
2. Is 1-2*3 = 1-(2*3) or (1-2)*3?
• In yacc, each question shows up as a shift/reduce conflict. The default is to shift, which would group 1-2-3 as 1-(2-3); precedence and associativity declarations are needed to get (1-2)-3.
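The first question comes down to which side a chain of '-' groups on. The small C sketch below (sub_chain is a hypothetical helper) evaluates the same operands both ways: grouping to the left corresponds to reducing as soon as possible, which %left '-' requests, and grouping to the right corresponds to yacc's default of shifting.

```c
/* Evaluate nums[0] - nums[1] - ... - nums[n-1] (n >= 1) with the chain
 * grouped either to the left or to the right. */
int sub_chain(const int *nums, int n, int group_left)
{
    int v;

    if (group_left) {
        /* (((a - b) - c) - ...): reduce as soon as each operand arrives,
           which is what %left '-' makes the parser do */
        v = nums[0];
        for (int i = 1; i < n; i++)
            v = v - nums[i];
    } else {
        /* (a - (b - (c - ...))): keep shifting, then reduce from the
           right, which is what the default shift produces */
        v = nums[n - 1];
        for (int i = n - 2; i >= 0; i--)
            v = nums[i] - v;
    }
    return v;
}
```

For 1 - 2 - 3 it gives -4 when grouping left and 2 when grouping right. For the second question, 1-(2*3) is -5 while (1-2)*3 is -3.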
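Precedence / Association
• Declarations listed later have higher precedence; tokens on the same line share one level:
    %right    '='
    %left     '<' '>' NE LE GE
    %left     '+' '-'
    %left     '*' '/'          (highest precedence)

Precedence / Association (cont'd)
    %left     '+' '-'
    %left     '*' '/'
    %nonassoc UMINUS

    expr : expr '+' expr   { $$ = $1 + $3; }
         | expr '-' expr   { $$ = $1 - $3; }
         | expr '*' expr   { $$ = $1 * $3; }
         | expr '/' expr   { if ($3 == 0)
                                 yyerror("divide 0");
                             else
                                 $$ = $1 / $3; }
         | '-' expr %prec UMINUS   { $$ = -$2; }
         ;
• %prec UMINUS gives the unary-minus rule the precedence of the token UMINUS, so -2*3 groups as (-2)*3.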
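The effect of these declarations can be emulated by hand with precedence climbing, a technique swapped in here for illustration; yacc itself uses the declarations to resolve conflicts in its LALR(1) tables. In this C sketch, calc, binary_prec, and div_error are hypothetical names. '+' '-' sit on level 1 and '*' '/' on level 2, as in the %left lines, unary minus binds tightest as with %prec UMINUS, and a zero divisor sets an error flag where the grammar calls yyerror("divide 0").

```c
#include <stdlib.h>

static const char *p;       /* hypothetical cursor into the input */
static int div_error;       /* set where the grammar calls yyerror("divide 0") */

static void skip(void) { while (*p == ' ') p++; }

/* Levels mirror the declarations: a later %left line binds tighter. */
static int binary_prec(char op)
{
    switch (op) {
    case '+': case '-': return 1;   /* %left '+' '-' */
    case '*': case '/': return 2;   /* %left '*' '/' */
    default:            return 0;   /* not a binary operator */
    }
}

static double parse_binary(int min_prec);

/* '-' expr %prec UMINUS: unary minus binds tighter than any binary
 * operator, so it applies only to the operand right after it. */
static double parse_unary(void)
{
    skip();
    if (*p == '-') {
        p++;
        return -parse_unary();        /* $$ = -$2; */
    }
    char *end;
    double v = strtod(p, &end);       /* a number */
    p = end;
    return v;
}

/* Precedence climbing: fold in operators of level >= min_prec. Parsing
 * the right operand at prec + 1 leaves an equal-level operator to this
 * loop, so equal levels group to the left, as %left requests. */
static double parse_binary(int min_prec)
{
    double lhs = parse_unary();
    for (;;) {
        skip();
        char op = *p;
        int prec = binary_prec(op);
        if (prec == 0 || prec < min_prec)
            return lhs;
        p++;
        double rhs = parse_binary(prec + 1);
        switch (op) {
        case '+': lhs = lhs + rhs; break;
        case '-': lhs = lhs - rhs; break;
        case '*': lhs = lhs * rhs; break;
        default:                                /* '/' */
            if (rhs == 0) div_error = 1;
            else lhs = lhs / rhs;
            break;
        }
    }
}

/* Evaluate s; *err becomes 1 if a division by zero was attempted. */
double calc(const char *s, int *err)
{
    p = s;
    div_error = 0;
    double v = parse_binary(1);
    *err = div_error;
    return v;
}
```

With this table, "1 - 2 - 3" gives -4, "1 - 2 * 3" gives -5, and "-2 * 3" gives -6, matching the groupings the declarations ask yacc for.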
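IF-ELSE Ambiguity
• Consider the following rule:
    stmt : IF expr stmt
         | IF expr stmt ELSE stmt
         ……
• What should the parser do in the following state?
    IF expr IF expr stmt ELSE stmt
  After reading IF expr IF expr stmt with ELSE as the next token, it can either reduce the inner IF expr stmt to stmt (the ELSE then belongs to the outer IF) or shift ELSE (the ELSE then belongs to the inner IF).

IF-ELSE Ambiguity (cont'd)
• It is a shift/reduce conflict.
• Yacc will always choose to shift, so each ELSE is matched with the nearest unmatched IF.
• A solution: rewrite the grammar so that it is unambiguous:
    stmt      : matched
              | unmatched
              ;
    matched   : other_stmt
              | IF expr THEN matched ELSE matched
              ;
    unmatched : IF expr THEN stmt
              | IF expr THEN matched ELSE unmatched
              ;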
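Shifting ELSE whenever it appears has the same effect as a hand-written parser that greedily checks for ELSE right after the inner statement. The C sketch below uses hypothetical one-character tokens ('i' for IF expr, 's' for any other statement, 'e' for ELSE) and writes the parse out with brackets, so the nearest-IF binding is visible. parse_if, emit, and the token letters are illustrative names, not part of yacc.

```c
#include <string.h>

/* Hypothetical one-character tokens:
 *   'i' = IF expr    's' = any other statement    'e' = ELSE
 * Grammar: stmt : IF expr stmt | IF expr stmt ELSE stmt | other */
static const char *tok;          /* next unread token */
static char buf[256];            /* bracketed parse tree */
static size_t len;               /* characters used in buf */

static int emit(const char *s)
{
    size_t n = strlen(s);
    if (len + n >= sizeof buf) return 0;     /* too long for this sketch */
    memcpy(buf + len, s, n + 1);
    len += n;
    return 1;
}

static int stmt(void)
{
    if (*tok == 's') { tok++; return emit("s"); }
    if (*tok != 'i') return 0;               /* syntax error */
    tok++;
    if (!emit("if(") || !stmt()) return 0;
    if (*tok == 'e') {                       /* ELSE next: take it (shift) */
        tok++;
        if (!emit(" else ") || !stmt()) return 0;
    }
    return emit(")");
}

/* Returns the bracketed tree, or NULL on a syntax error. */
const char *parse_if(const char *tokens)
{
    tok = tokens;
    len = 0;
    buf[0] = '\0';
    if (!stmt() || *tok != '\0') return NULL;
    return buf;
}
```

parse_if("iises") returns "if(if(s else s))": the ELSE attaches to the inner IF, as it does when yacc shifts.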
Shift/Reduce Conflicts
• A shift/reduce conflict occurs when a grammar is written in such a way that the parser cannot decide between shifting and reducing.
  – ex: the IF-ELSE ambiguity.
• To resolve this conflict, yacc chooses to shift.

Reduce/Reduce Conflicts
• Example:
    start : expr | stmt ;
    expr  : CONSTANT ;
    stmt  : CONSTANT ;
• Yacc resolves the conflict by reducing with the rule that occurs earlier in the grammar. NOT GOOD!!
• So, modify the grammar to eliminate them.
Debug Your Parser
1. Use the -t option or define YYDEBUG to 1.
2. Set the variable yydebug to 1 when you want to trace the parsing status.
3. If you want to trace the semantic values, define your YYPRINT function.

Error Messages
• Bad error message:
  – Syntax error.
  – A compiler needs to give the programmer good advice.
• It is better to track the line number in lex:
    extern int yylineno;   /* kept by the scanner; flex needs %option yylineno */

    void yyerror(char *s)
    {
        fprintf(stderr, "line %d: %s\n", yylineno, s);
    }
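yylineno is maintained by the scanner as it reads newlines. The C sketch below shows the same bookkeeping on a plain string: line_of (a hypothetical helper) counts the newlines before an offset, and format_error builds the "line %d: %s" message that the yyerror above prints.

```c
#include <stdio.h>
#include <string.h>

/* 1-based line number of text[offset]: one more than the number of
 * newlines before it, which is what yylineno holds after lex has
 * scanned that far. */
int line_of(const char *text, size_t offset)
{
    int line = 1;
    for (size_t i = 0; i < offset && text[i] != '\0'; i++)
        if (text[i] == '\n')
            line++;
    return line;
}

/* Build the message the yyerror above prints, into dst. */
int format_error(char *dst, size_t size, int lineno, const char *msg)
{
    return snprintf(dst, size, "line %d: %s\n", lineno, msg);
}
```

For the input "a = 1\nb = \n", an error detected at the final newline (offset 10) is reported as "line 2: syntax error".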
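Shifting and Reducing: Example
    stmt: stmt ';' stmt
        | NAME '=' exp
        ;
[Figure: parser stack contents at each shift and reduce step]

Recursive Grammar
• Left recursion:
    list: item
        | list ',' item
        ;
• Yacc prefers left recursion: each item is reduced into list as soon as it is read, so the parser stack does not grow with the length of the list.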
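The difference in stack growth can be counted directly. This C sketch (hypothetical helpers max_depth_left and max_depth_right) tracks how many grammar symbols sit on the parser stack while reading a list of n items under each grammar. It assumes the usual LR behaviour: with ',' as the lookahead, the right-recursive grammar list : item | item ',' list must keep shifting and can only reduce at the end of input.

```c
/* Count grammar symbols on an LR parser stack while reading
 * "item , item , ... , item" (n >= 1 items).
 *   left:   list : item | list ',' item ;
 *   right:  list : item | item ',' list ;   */
int max_depth_left(int n)
{
    int depth = 0, max = 0;
    for (int i = 0; i < n; i++) {
        if (i > 0) depth++;          /* shift ',' */
        depth++;                     /* shift item */
        if (depth > max) max = depth;
        depth = 1;                   /* reduce "item" or "list ',' item" to list */
    }
    return max;
}

int max_depth_right(int n)
{
    int depth = 0, max = 0;
    for (int i = 0; i < n; i++) {
        if (i > 0) depth++;          /* shift ',' */
        depth++;                     /* shift item; lookahead ',' forbids reducing */
        if (depth > max) max = depth;
    }
    /* Only at end of input: reduce item to list, then "item ',' list" to
       list n - 1 times. Each reduction shrinks the stack, so max stays. */
    return max;
}
```

For five items the left-recursive grammar never holds more than 3 symbols, while the right-recursive one reaches 9 (2n - 1).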
YACC Declaration Summary
`%start'     Specify the grammar's start symbol
`%union'     Declare the collection of data types that semantic values may have
`%token'     Declare a terminal symbol (token type name) with no precedence or associativity specified
`%type'      Declare the type of semantic values for a nonterminal symbol
`%right'     Declare a terminal symbol (token type name) that is right-associative
`%left'      Declare a terminal symbol (token type name) that is left-associative
`%nonassoc'  Declare a terminal symbol (token type name) that is nonassociative (using it in a way that would be associative is a syntax error, e.g. x op y op z is a syntax error)