1 YACC File Format

Introduction
• What is YACC?
  – A tool which produces a parser for a given grammar.
  – YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar and to produce the source code of the syntactic analyzer of the language defined by this grammar.

History
• Yacc was originally written by Stephen C. Johnson, 1975.
• Variants:
  – lex, yacc (AT&T)
  – bison: a yacc replacement (GNU)
  – flex: fast lexical analyzer (GNU)
  – BSD yacc
  – PCLEX, PCYACC (Abraxas Software)

How YACC Works
    (1) Parse:    YACC source (foo.y) -> yacc -> y.tab.h, y.tab.c, y.output
    (2) Compile:  y.tab.c -> cc / gcc -> a.out
    (3) Run:      token stream -> a.out -> abstract syntax tree

A YACC File Example
    %{
    #include <stdio.h>
    %}
    %token NAME NUMBER
    %%
    statement: NAME '=' expression
             | expression              { printf("= %d\n", $1); }
             ;
    expression: expression '+' NUMBER  { $$ = $1 + $3; }
              | expression '-' NUMBER  { $$ = $1 - $3; }
              | NUMBER                 { $$ = $1; }
              ;
    %%
    int yyerror(char *s)
    {
        fprintf(stderr, "%s\n", s);
        return 0;
    }

    int main(void)
    {
        yyparse();
        return 0;
    }

YACC File Format
    %{
    C declarations
    %}
    yacc declarations
    %%
    Grammar rules
    %%
    Additional C code
  – Comments in /* ... */ may appear in any of the sections.

Definitions Section
    %{
    #include <stdio.h>
    #include <stdlib.h>
    %}
    %token ID NUM       /* ID and NUM are terminals */
    %start expr         /* start from expr */

Start Symbol
• The first non-terminal specified in the grammar is the start symbol.
• To override it, use a %start declaration:
    %start non-terminal

Rules Section
• It is the grammar specification section.
• Example:
    expr : expr '+' term | term;
    term : term '*' factor | factor;
    factor : '(' expr ')' | ID | NUM;

Rules Section
• Normally written like this.
• Example:
    expr : expr '+' term
         | term
         ;
    term : term '*' factor
         | factor
         ;
    factor : '(' expr ')'
           | ID
           | NUM
           ;

The Position of Rules
    expr : expr '+' term     { $$ = $1 + $3; }
         | term              { $$ = $1; }
         ;
    term : term '*' factor   { $$ = $1 * $3; }
         | factor            { $$ = $1; }
         ;
    factor : '(' expr ')'    { $$ = $2; }
           | ID
           | NUM
           ;

Works with LEX
    input "12 + 26" -> LEX ([0-9]+) -> NUM '+' NUM -> YACC
• YACC calls yylex(); LEX answers with the next token (here: "next token is NUM").
• LEX and YACC need a way to identify tokens.

Communication between LEX and YACC
• Use enumeration / define.
• yacc -d gram.y will produce y.tab.h.
• YACC creates y.tab.h.
• LEX includes y.tab.h.

Communication between LEX and YACC
scanner.l:
    %{
    #include <stdio.h>
    #include "y.tab.h"
    %}
    id      [_a-zA-Z][_a-zA-Z0-9]*
    %%
    int     { return INT; }
    char    { return CHAR; }
    float   { return FLOAT; }
    {id}    { return ID; }

parser.y:
    %{
    #include <stdio.h>
    #include <stdlib.h>
    %}
    %token CHAR FLOAT ID INT
    %%

yacc -d xxx.y produces y.tab.h:
    # define CHAR 258
    # define FLOAT 259
    # define ID 260
    # define INT 261

YACC
• Rules may be recursive.
• Rules may be ambiguous*.
• Uses bottom-up shift/reduce parsing:
  – Get a token.
  – Push it onto the stack.
  – Can it be reduced?
    • yes: reduce using a rule
    • no: get another token
• Yacc cannot look ahead more than one token.

Passing value of token
• Every terminal token (symbol) may represent a value or data type.
  – May be a numeric quantity in case of a number (42)
  – May be a pointer to a string ("Hello, World!")
• When using lex, we put the value into yylval.
  – In complex situations yylval is a union.
• Typical lex code:
    [0-9]+  { yylval = atoi(yytext); return NUM; }

Passing value of token
• Yacc allows symbols to carry values of several different types:
    %union {
        double dval;
        int vblno;
        char* strval;
    }

Passing value of token
• yacc -d turns the %union into the YYSTYPE definition in y.tab.h:
    %union {
        double dval;
        int vblno;
        char* strval;
    }

    y.tab.h:
        …
        extern YYSTYPE yylval;

• The lex file includes "y.tab.h" and fills in the matching union member:
    [0-9]+     { yylval.vblno = atoi(yytext);
                 return NUM; }
    [A-Za-z]+  { yylval.strval = strdup(yytext);
                 return STRING; }

Yacc Example
• Taken from Lex & Yacc
• Example: Simple calculator
    a = 4 + 6
    a
    a=10
    b = 7
    c = a + b
    c
    c = 17
    $

Grammar
    expression ::= expression '+' term |
                   expression '-' term |
                   term

    term ::= term '*' factor |
             term '/' factor |
             factor

    factor ::= '(' expression ')' |
               '-' factor |
               NUMBER |
               NAME

Symbol Table
(Figure: the symbol table drawn as an array of name/value slots, numbered from 0.)
parser.h:
    #define NSYMS 20    /* maximum number of symbols */

    struct symtab {
        char *name;
        double value;
    } symtab[NSYMS];

    struct symtab *symlook();

Parser
parser.y:
    %{
    #include "parser.h"
    #include <string.h>
    %}

    %union {
        double dval;
        struct symtab *symp;
    }

    %token <symp> NAME
    %token <dval> NUMBER

    %type <dval> expression
    %type <dval> term
    %type <dval> factor
    %%
• The terminal NAME has the data type of the <symp> union member.
• The nonterminal expression has the data type of the <dval> union member.

Parser (cont'd)
parser.y:
    statement_list: statement '\n'
                  | statement_list statement '\n'
                  ;

    statement: NAME '=' expression   { $1->value = $3; }
             | expression            { printf("= %g\n", $1); }
             ;

    expression: expression '+' term  { $$ = $1 + $3; }
              | expression '-' term  { $$ = $1 - $3; }
              | term
              ;

Parser (cont'd)
parser.y:
    term: term '*' factor   { $$ = $1 * $3; }
        | term '/' factor   { if ($3 == 0.0)
                                  yyerror("divide by zero");
                              else
                                  $$ = $1 / $3;
                            }
        | factor
        ;

    factor: '(' expression ')'  { $$ = $2; }
          | '-' factor          { $$ = -$2; }
          | NUMBER              { $$ = $1; }
          | NAME                { $$ = $1->value; }
          ;
    %%

Scanner
scanner.l:
    %{
    #include "y.tab.h"
    #include "parser.h"
    #include <math.h>
    #include <stdlib.h>     /* atof */
    %}
    %%
    ([0-9]+|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?) {
                yylval.dval = atof(yytext);
                return NUMBER;
            }

    [ \t]   ;   /* ignore white space */

Scanner (cont'd)
scanner.l:
    [A-Za-z][A-Za-z0-9]*  { /* return symbol pointer */
                            yylval.symp = symlook(yytext);
                            return NAME;
                          }

    "$"     { return 0; /* end of input */ }

    \n|"="|"+"|"-"|"*"|"/"    return yytext[0];
    %%

Precedence / Association
    (1) 1 - 2 - 3
    (2) 1 - 2 * 3
1. 1-2-3 = (1-2)-3? or 1-(2-3)?
2. 1-2*3 = 1-(2*3) or (1-2)*3?
• Yacc reports these as shift/reduce conflicts. The default is to shift.

Precedence / Association
    %right '='
    %left '<' '>' NE LE GE
    %left '+' '-'
    %left '*' '/'           /* highest precedence */

Precedence / Association
    %left '+' '-'
    %left '*' '/'
    %nonassoc UMINUS

    expr : expr '+' expr   { $$ = $1 + $3; }
         | expr '-' expr   { $$ = $1 - $3; }
         | expr '*' expr   { $$ = $1 * $3; }
         | expr '/' expr   { if ($3 == 0)
                                 yyerror("divide 0");
                             else
                                 $$ = $1 / $3;
                           }
         | '-' expr %prec UMINUS  { $$ = -$2; }

IF-ELSE Ambiguity
• Consider the following rule:
    stmt : IF expr stmt
         | IF expr stmt ELSE stmt
         ……
• Following state? Which IF does the ELSE belong to?
    IF expr IF expr stmt ELSE stmt

IF-ELSE Ambiguity
• It is a shift/reduce conflict.
• Yacc will always choose to shift.
• A solution:
    stmt : matched
         | unmatched
         ;
    matched: other_stmt
           | IF expr THEN matched ELSE matched
           ;
    unmatched: IF expr THEN stmt
             | IF expr THEN matched ELSE unmatched
             ;

Shift/Reduce Conflicts
• shift/reduce conflict
  – occurs when a grammar is written in such a way that a decision between shifting and reducing cannot be made.
  – ex: the IF-ELSE ambiguity.
• To resolve this conflict, yacc will choose to shift.

Reduce/Reduce Conflicts
• Reduce/Reduce Conflicts:
    start : expr | stmt ;
    expr : CONSTANT;
    stmt : CONSTANT;
• Yacc resolves the conflict by reducing using the rule that occurs earlier in the grammar. NOT GOOD!!
• So, modify the grammar to eliminate them.

Error Messages
• Bad error message:
  – Syntax error.
  – The compiler needs to give the programmer good advice.
• It is better to track the line number in lex:
    void yyerror(char *s)
    {
        fprintf(stderr, "line %d: %s\n", yylineno, s);
    }

Debug Your Parser
1. Use the -t option or define YYDEBUG to 1.
2. Set the variable yydebug to 1 when you want to trace the parsing status.
3. If you want to trace the semantic values:
   – Define your YYPRINT function.

Shift and Reducing: Example
    stmt: stmt ';' stmt
        | NAME '=' exp

    exp: exp '+' exp
       | exp '-' exp
       | NAME
       | NUMBER

    stack: <empty>
    input: a = 7; b = 3 + a + 2

Recursive Grammar
• Left recursion
    list: item
        | list ',' item
        ;
• Right recursion
    list: item
        | item ',' list
        ;
• LR parser (e.g. yacc) prefers left recursion.
• LL parser prefers right recursion.
YACC Declaration Summary

`%start'
    Specify the grammar's start symbol

`%union'
    Declare the collection of data types that semantic values may have

`%token'
    Declare a terminal symbol (token type name) with no precedence or associativity specified

`%type'
    Declare the type of semantic values for a nonterminal symbol

`%right'
    Declare a terminal symbol (token type name) that is right-associative

`%left'
    Declare a terminal symbol (token type name) that is left-associative

`%nonassoc'
    Declare a terminal symbol (token type name) that is nonassociative (using it in a way that would be associative is a syntax error, ex: x op. y op. z is a syntax error)
