Misc Topics

Thus far the tokens have been recognized and the grammar has been used to make sense of their order. The next step is actually doing something with that information. This section covers managing the semantic stack, debugging, and error recovery.

Stack Handling

The following code snippet contains reduction 7. When this reduction is recognized, the top of the stack holds the three pieces of information on the right-hand side of the rule. In many texts these are referred to as i, i-1, and i-2, counting down from the top of the stack. Yacc numbers them in the opposite direction, left to right starting at $1, as shown below.

E: E PLUS T_ {printf("Reduction 7\n");}

Typical: i = ‘T_’ i-1 = ‘PLUS’ i-2 = ‘E’

Yacc: $1 = ‘E’ $2 = ‘PLUS’ $3 = ‘T_’
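The two numbering schemes differ only in direction, which a small C sketch can make concrete. The helper names below (offset_from_top, dollar) are hypothetical and are not part of Yacc; the array merely stands in for the semantic stack.

```c
#include <assert.h>

/* Hypothetical helper: for a rule with rhs_len symbols on its right-hand
   side, convert Yacc's left-to-right position k (as in $k, 1-based) into
   the textbook offset below the top of the stack (0 means i, 1 means i-1). */
static int offset_from_top(int rhs_len, int k)
{
    assert(k >= 1 && k <= rhs_len);
    return rhs_len - k;
}

/* Hypothetical helper: fetch the entry Yacc would call $k from an array
   standing in for the semantic stack, whose top element is stack[top]. */
static const char *dollar(const char *stack[], int top, int rhs_len, int k)
{
    return stack[top - offset_from_top(rhs_len, k)];
}
```

With the three right-hand-side symbols of reduction 7 on the stack, offset_from_top(3, 1) is 2, so $1 corresponds to i-2, and offset_from_top(3, 3) is 0, so $3 corresponds to i.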

In this reduction it is likely that the numbers represented by ‘E’ and ‘T_’ need to be captured and added together. This can be achieved by:

E: E PLUS T_ {
       printf("Reduction 7\n");
       int a, b, c;
       a = atoi($1);   // converts string to int
       b = atoi($3);
       c = a + b;
   }

Notice that each of these variables (e.g., $1) is simply a pointer to a char string, so string conversions are necessary whenever numerical operations are needed. This mirrors the methodology in the class project and is simpler to set up, but Yacc also allows the stack entries to be declared as a union (whose members could hold an integer, a float, a pointer, etc.). That might have been worthwhile here: had it been done that way, $1 and $3 could have been used directly as integers.
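As a rough illustration of the union alternative, the sketch below uses a C union as a stand-in for the stack entry type that a %union declaration would produce. The member names and the reduce_add helper are assumptions made for this example and are not taken from the class project.

```c
#include <assert.h>

/* Stand-in for the stack entry type that a %union declaration would
   generate; the member names here are hypothetical. */
typedef union {
    int    ival;   /* numeric value of an expression */
    double fval;   /* floating-point value, unused in this sketch */
    char  *sval;   /* text, e.g. an identifier name */
} semval;

/* Roughly what the action for E: E PLUS T_ amounts to once $1 and $3
   are integers: $$ = $1 + $3, with no atoi or sprintf involved.
   The out parameter plays the role of $$. */
static void reduce_add(semval *out, const semval *lhs, const semval *rhs)
{
    assert(out != 0 && lhs != 0 && rhs != 0);
    out->ival = lhs->ival + rhs->ival;
}
```

The trade-off is that the grammar then has to say which union member each symbol uses, whereas the all-strings approach needs no such declarations.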

It is relatively straightforward to use pointers from the stack as shown above, but how does one place a result back on the stack? This is done with the $$ symbol. At the end of the reduction, $1, $2, and $3 are popped off the semantic stack, and whatever $$ has been set to is pushed in their place. If, for instance, $1 should remain at the top of the stack, the last line of the action would be $$ = $1. That particular assignment is not actually necessary, however, because it is the default behavior: if $$ is not explicitly set, it is assumed to be $1.

In the example above, it would appear to make sense to just set $$ = c. The problem is that $$ is expected to be a char pointer. There are two ways this can be handled:

1. Cast the address of the integer to a char pointer, and make sure to recognize this later when that particular pointer surfaces in another reduction. Then you will have to cast it back before using it as an integer. This is probably a bad way to handle this specific situation, but the technique does make sense in other scenarios (possibly to store a pointer to a symbol table).

2. Convert the integer back into a string representation and set $$ equal to that char pointer. This seems a little backwards, but it is a relatively easy solution that allows nested operations such as (4+(7+3)) to be performed without lots of checking to determine whether the incoming ‘E’ is a char string or an integer masquerading as a char pointer.

The following code snippet shows the second approach:

E: E PLUS T_ {
       printf("Reduction 7\n");
       int a, b, c;
       char s[32];
       a = atoi($1);           // converts string to int
       b = atoi($3);
       c = a + b;
       sprintf(s, "%d", c);    // converts c back to string
       $$ = s;
   }

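Warning: a frequent mistake is to skip making a copy of the string and just send back a pointer to a memory location whose contents will change. The snippet above happens to work for (4+(7+3)), but s is local to the action and is reused by every later reduction 7, so an input such as ((1+2)+(3+4)) would find both operands pointing at the same, newer text. Replacing the assignment $$ = s with an allocated copy avoids this:

// makes a copy of string s, and places a pointer to it on the stack
$$ = strdup(s);

Note that strcpy($$, s) by itself is not sufficient, because $$ does not yet point to storage of its own. strdup is declared in <string.h> on POSIX systems, and the copy should eventually be released with free.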
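The string round trip, including the copy, can be tried outside the parser. The following is a minimal sketch under the assumption that operands arrive as decimal strings; add_as_string is a hypothetical helper, not something Yacc provides, and it uses malloc and memcpy rather than strdup only so that it stays within plain ISO C.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring the action body: takes the strings that
   $1 and $3 would point to and returns a freshly allocated string for $$.
   The caller owns the result and should free() it eventually.
   Returns NULL if the allocation fails. */
static char *add_as_string(const char *lhs, const char *rhs)
{
    char s[32];
    assert(lhs != NULL && rhs != NULL);

    int c = atoi(lhs) + atoi(rhs);     /* strings to ints, then add */
    snprintf(s, sizeof s, "%d", c);    /* int back to string */

    /* Copy the text out of the local buffer before it goes out of scope. */
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

Because every result lives in its own allocation, an inner result such as the one for (7+3) keeps its text while later additions are carried out.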

Tracing and Error Recovery

There are times when it is beneficial to see more detail about what is happening during program execution (as well as to meet flag requirements in the project). Tracing support is compiled in by defining a debug macro; place the following statement in the header portion of the grammar code:

#define YYDEBUG 1

At any time during execution, setting yydebug=1 will activate tracing. This will print the next token number, which state it is entering, current state stack, etc. Setting yydebug = 0 will deactivate this function.
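One way to tie tracing to a command-line flag is sketched below. The flag name "-d" and the helper set_trace_from_args are assumptions made for illustration; in a real build yydebug is defined by the generated parser and would only be declared here with extern, so the local definition exists purely to let the sketch compile on its own.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the parser's variable. In a real program this line would
   read "extern int yydebug;" and the generated parser would define it. */
int yydebug = 0;

/* Hypothetical helper: turn tracing on when "-d" appears among the
   command-line arguments (the flag name is an assumption). */
static void set_trace_from_args(int argc, char **argv)
{
    assert(argc == 0 || argv != NULL);
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-d") == 0)
            yydebug = 1;
    }
}
```

A driver would call set_trace_from_args(argc, argv) from main before invoking the parser, so that the trace covers the whole run.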

There is a very nice feature in Yacc that makes panic mode recovery fairly straight-forward to implement. It actually involves adding reductions to the grammar in places where the error should be caught and the reduction ignored. There is a simple example that explains its use fairly well in Appendix A of Yacc: Yet Another Compiler-Compiler by Stephen C. Johnson:

. . .
%%      /* beginning of rules section */

list : /* empty */
     | list stat '\n'
     | list error '\n'
           { yyerrok; }
     ;

stat : expr
           { printf( "%d\n", $1 ); }
     | LETTER '=' expr
           { regs[$1] = $3; }
     ;

expr : '(' expr ')'
           { $$ = $2; }
     | expr '+' expr
           { $$ = $1 + $3; }
     | expr '-' expr
           { $$ = $1 - $3; }
     | expr '*' expr
           { $$ = $1 * $3; }
. . .

The “error” alternative of the list rule simply catches a syntax error: the parser pops states off its stack until it reaches one in which the error token is allowed, then skips input tokens until it reaches the end of the statement (the newline), where yyerrok returns the parser to its normal, non-error mode. This makes implementing panic mode recovery very easy IF you place the error reduction(s) in the right place(s).