
Introduction
• What is YACC?
  – A tool that produces a parser for a given grammar.
  – YACC (Yet Another Compiler Compiler) is a program that compiles an LALR(1) grammar and produces the source code of a syntactic analyzer for the language that grammar defines.

History
• Yacc was originally written by Stephen C. Johnson, 1975.
• Variants:
  – lex, yacc (AT&T)
  – bison: a yacc replacement (GNU)
  – flex: fast lexical analyzer (GNU)
  – BSD yacc
  – PCLEX, PCYACC (Abraxas Software)

How YACC Works
(1) Parse: yacc reads the YACC source (foo.y) and writes y.tab.c, plus y.tab.h (with -d) and y.output (with -v).
(2) Compile: cc / gcc compiles y.tab.c into a.out.
(3) Run: a.out reads a token stream and produces an abstract syntax tree.

A YACC File Example

    %{
    #include <stdio.h>
    int yylex(void);        /* supplied by the scanner */
    int yyerror(char *s);   /* defined below */
    %}
    %token NAME NUMBER
    %%
    statement: NAME '=' expression
        | expression                { printf("= %d\n", $1); }
        ;
    expression: expression '+' NUMBER   { $$ = $1 + $3; }
        | expression '-' NUMBER     { $$ = $1 - $3; }
        | NUMBER                    { $$ = $1; }
        ;
    %%
    int yyerror(char *s)
    {
        fprintf(stderr, "%s\n", s);
        return 0;
    }

    int main(void)
    {
        yyparse();
        return 0;
    }

YACC File Format

    %{
    C declarations
    %}
    yacc declarations
    %%
    Grammar rules
    %%
    Additional C code

  – Comments in /* ... */ may appear in any of the sections.

Definitions Section

    %{
    #include <stdio.h>
    #include <stdlib.h>
    %}
    %token ID NUM       /* ID and NUM are terminals */
    %start expr         /* parsing starts from expr */

Start Symbol
• By default, the start symbol is the first non-terminal specified in the grammar.
• To override it, use a %start declaration:

    %start non-terminal

Rules Section
• The rules section is the grammar specification.
• Example:

    expr : expr '+' term | term;
    term : term '*' factor | factor;
    factor : '(' expr ')' | ID | NUM;

• Normally written like this:

    expr : expr '+' term
        | term
        ;
    term : term '*' factor
        | factor
        ;
    factor : '(' expr ')'
        | ID
        | NUM
        ;

The Position of Rules
• In an action, $1, $2, $3 are the values of the first, second and third symbols on the right-hand side; $$ is the value of the left-hand side.

    expr : expr '+' term        { $$ = $1 + $3; }
        | term                  { $$ = $1; }
        ;
    term : term '*' factor      { $$ = $1 * $3; }
        | factor                { $$ = $1; }
        ;
    factor : '(' expr ')'       { $$ = $2; }
        | ID
        | NUM
        ;

Works with LEX
• LEX and YACC need a way to identify tokens.
• YACC calls yylex() to get the next token; LEX matches the input and returns the token type.
• Example: the LEX pattern [0-9]+ matches 12 in the input below, so the next token is NUM.

    input:   12 + 26
    tokens:  NUM '+' NUM

Communication between LEX and YACC
• Token types are shared through an enumeration / #define.
• YACC creates y.tab.h; LEX includes y.tab.h.
• yacc -d gram.y will produce y.tab.h.

scanner.l:

    %{
    #include <stdio.h>
    #include "y.tab.h"
    %}
    id      [_a-zA-Z][_a-zA-Z0-9]*
    %%
    int     { return INT; }
    char    { return CHAR; }
    float   { return FLOAT; }
    {id}    { return ID; }

parser.y:

    %{
    #include <stdio.h>
    #include <stdlib.h>
    %}
    %token CHAR FLOAT ID INT
    %%

yacc -d xxx.y produces y.tab.h:

    # define CHAR 258
    # define FLOAT 259
    # define ID 260
    # define INT 261

YACC
• Rules may be recursive.
• Rules may be ambiguous*.
• Uses bottom-up shift/reduce parsing:
  – Get a token.
  – Push it onto the stack.
  – Can the top of the stack be reduced?
    • yes: reduce using a rule
    • no: get another token
• Yacc cannot look ahead more than one token.

Passing value of token
• Every terminal token (symbol) may carry a value of some data type.
  – It may be a numeric quantity in the case of a number (42).
  – It may be a pointer to a string ("Hello, World!").
• When using lex, we put the value into yylval.
  – In complex situations yylval is a union.
• Typical lex code:

    [0-9]+  { yylval = atoi(yytext); return NUM; }

• Yacc allows symbols to have values of several types, declared with %union:

    %union {
        double dval;
        int vblno;
        char* strval;
    }

• yacc -d copies the union into y.tab.h:

    …
    extern YYSTYPE yylval;

• The lex file includes "y.tab.h" and stores each value in the matching union member:

    [0-9]+      { yylval.vblno = atoi(yytext);
                  return NUM; }
    [A-Za-z]+   { yylval.strval = strdup(yytext);
                  return STRING; }

Yacc Example
• Taken from Lex & Yacc.
• Example: a simple calculator. A sample session (results are printed by the "= %g" action in the parser):

    a = 4 + 6
    a
    = 10
    b = 7
    c = a + b
    c
    = 17
    $

Grammar

    expression ::= expression '+' term |
                   expression '-' term |
                   term

    term ::= term '*' factor |
             term '/' factor |
             factor

    factor ::= '(' expression ')' |
               '-' factor |
               NUMBER |
               NAME

Symbol Table
• The symbol table is an array of (name, value) entries, indexed from 0.

parser.h:

    #define NSYMS 20    /* maximum number of symbols */

    struct symtab {
        char *name;
        double value;
    } symtab[NSYMS];

    struct symtab *symlook(char *s);

Parser
• Terminal NAME and <symp> have the same data type.
• Nonterminal expression and <dval> have the same data type.

parser.y:

    %{
    #include "parser.h"
    #include <stdio.h>      /* printf */
    #include <string.h>
    %}

    %union {
        double dval;
        struct symtab *symp;
    }
    %token <symp> NAME
    %token <dval> NUMBER
    %type <dval> expression
    %type <dval> term
    %type <dval> factor
    %%
    statement_list: statement '\n'
        | statement_list statement '\n'
        ;
    statement: NAME '=' expression  { $1->value = $3; }
        | expression                { printf("= %g\n", $1); }
        ;
    expression: expression '+' term { $$ = $1 + $3; }
        | expression '-' term       { $$ = $1 - $3; }
        | term
        ;
    term: term '*' factor           { $$ = $1 * $3; }
        | term '/' factor           { if ($3 == 0.0)
                                          yyerror("divide by zero");
                                      else
                                          $$ = $1 / $3;
                                    }
        | factor
        ;
    factor: '(' expression ')'      { $$ = $2; }
        | '-' factor                { $$ = -$2; }
        | NUMBER                    { $$ = $1; }
        | NAME                      { $$ = $1->value; }
        ;
    %%

Scanner

scanner.l:

    %{
    #include "y.tab.h"
    #include "parser.h"
    #include <math.h>
    #include <stdlib.h>     /* atof */
    %}
    %%
    ([0-9]+|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?) {
            yylval.dval = atof(yytext);
            return NUMBER;
        }

    [ \t]   ;   /* ignore white space */

    [A-Za-z][A-Za-z0-9]* {  /* return symbol pointer */
            yylval.symp = symlook(yytext);
            return NAME;
        }

    "$"     { return 0; /* end of input */ }

    \n|"="|"+"|"-"|"*"|"/"|"("|")"   return yytext[0];
    %%

Precedence / Association
• Two questions a grammar such as expr : expr '-' expr leaves open:
  1. 1-2-3 = (1-2)-3 or 1-(2-3)?
  2. 1-2*3 = 1-(2*3) or (1-2)*3?
• Yacc reports these as shift/reduce conflicts. The default is to shift.
• Precedence declarations settle both questions. Operators declared on a later line bind tighter:

    %right '='
    %left '<' '>' NE LE GE
    %left '+' '-'
    %left '*' '/'       /* highest precedence */

• Example with a unary minus:

    %left '+' '-'
    %left '*' '/'
    %nonassoc UMINUS
    %%
    expr : expr '+' expr    { $$ = $1 + $3; }
        | expr '-' expr     { $$ = $1 - $3; }
        | expr '*' expr     { $$ = $1 * $3; }
        | expr '/' expr     { if ($3 == 0)
                                  yyerror("divide 0");
                              else
                                  $$ = $1 / $3;
                            }
        | '-' expr %prec UMINUS { $$ = -$2; }

IF-ELSE Ambiguity
• Consider the following rule:

    stmt : IF expr stmt
        | IF expr stmt ELSE stmt
        ……

• What happens in the following state?

    IF expr IF expr stmt ELSE stmt

• It is a shift/reduce conflict.
• Yacc will always choose to shift.
• A solution:

    stmt : matched
        | unmatched
        ;
    matched: other_stmt
        | IF expr THEN matched ELSE matched
        ;
    unmatched: IF expr THEN stmt
        | IF expr THEN matched ELSE unmatched
        ;

Shift/Reduce Conflicts
• A shift/reduce conflict
  – occurs when a grammar is written in such a way that a decision between shifting and reducing cannot be made.
  – ex: the IF-ELSE ambiguity.
• To resolve this conflict, yacc will choose to shift.

Reduce/Reduce Conflicts
• A reduce/reduce conflict:

    start : expr | stmt ;
    expr : CONSTANT;
    stmt : CONSTANT;

• Yacc resolves the conflict by reducing using the rule that occurs earlier in the grammar. NOT GOOD!!
• So, modify the grammar to eliminate them.

Error Messages
• Bad error message:
  – Syntax error.
  – The compiler needs to give the programmer useful advice.
• It is better to track the line number in lex:

    extern int yylineno;    /* provided by the lex scanner */

    void yyerror(char *s)
    {
        fprintf(stderr, "line %d: %s\n", yylineno, s);
    }

Debug Your Parser
1. Use the -t option or define YYDEBUG to 1.
2. Set the variable yydebug to 1 when you want to trace parsing status.
3. If you want to trace the semantic values:
   • Define your YYPRINT function.

Shift and Reducing: Example

    stmt: stmt ';' stmt
        | NAME '=' exp
        ;
    exp: exp '+' exp
        | exp '-' exp
        | NAME
        | NUMBER
        ;

    stack: <empty>
    input: a = 7; b = 3 + a + 2

Recursive Grammar
• Left recursion:

    list: item
        | list ',' item
        ;

• Right recursion:

    list: item
        | item ',' list
        ;

• An LR parser (e.g. yacc) prefers left recursion.
• An LL parser prefers right recursion.
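
One way to see why an LR parser such as yacc prefers left recursion is to count how many grammar symbols sit on the parse stack while a list of n items is read. The sketch below is only a model of the shift and reduce steps for the two list rules above. The function names are hypothetical, and it counts symbols only, not the states or values a real yacc parser also keeps.

```c
#include <assert.h>
#include <stddef.h>

/* Peak parse-stack depth (in grammar symbols) for a list of n items
 * parsed with the left-recursive rule:  list: item | list ',' item ; */
size_t left_recursive_peak(size_t n)
{
    size_t depth = 0, peak = 0;
    assert(n >= 1);              /* the grammar needs at least one item */
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            depth++;             /* shift ',' */
        depth++;                 /* shift item */
        if (depth > peak)
            peak = depth;
        depth = 1;               /* reduce at once: only `list` remains */
    }
    return peak;
}

/* Same measurement for the right-recursive rule:
 * list: item | item ',' list ; */
size_t right_recursive_peak(size_t n)
{
    size_t depth = 0, peak = 0;
    assert(n >= 1);
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            depth++;             /* shift ',' */
        depth++;                 /* shift item; nothing can be reduced yet */
        if (depth > peak)
            peak = depth;
    }
    /* Reductions start only after the last item has been shifted:
     * it becomes `list`, then each "item ',' list" collapses to `list`. */
    return peak;
}
```

With left recursion the count never exceeds 3, however long the list is. With right recursion every item and comma is shifted before the first reduction, so the count grows to 2n - 1 and a long list can overflow a fixed-size parser stack.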
YACC Declaration Summary

`%start'
    Specify the grammar's start symbol.

`%union'
    Declare the collection of data types that semantic values may have.

`%token'
    Declare a terminal symbol (token type name) with no precedence or associativity specified.

`%type'
    Declare the type of semantic values for a nonterminal symbol.

`%right'
    Declare a terminal symbol (token type name) that is right-associative.

`%left'
    Declare a terminal symbol (token type name) that is left-associative.

`%nonassoc'
    Declare a terminal symbol (token type name) that is nonassociative (using it in a way that would be associative is a syntax error, ex: x op. y op. z is a syntax error).
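
As an illustration of what `%left' and `%right' change, the sketch below evaluates a chain of subtractions grouped both ways. It is plain C, not yacc output, and the function names are hypothetical. With `%nonassoc' the same chain would be rejected as a syntax error instead of being grouped either way.

```c
#include <assert.h>
#include <stddef.h>

/* Group a[0] - a[1] - ... - a[n-1] to the left, as %left '-' would:
 * ((a0 - a1) - a2) - ... */
double minus_left_assoc(const double *a, size_t n)
{
    assert(n >= 1);
    double acc = a[0];
    for (size_t i = 1; i < n; i++)
        acc = acc - a[i];
    return acc;
}

/* Group the same chain to the right, as %right '-' would:
 * a0 - (a1 - (a2 - ...)) */
double minus_right_assoc(const double *a, size_t n)
{
    assert(n >= 1);
    double acc = a[n - 1];
    for (size_t i = n - 1; i-- > 0; )   /* walk from a[n-2] down to a[0] */
        acc = a[i] - acc;
    return acc;
}
```

For the operands 1, 2, 3 the left grouping gives (1-2)-3 = -4, which is the usual arithmetic meaning, while the right grouping gives 1-(2-3) = 2.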