CS4905 Introduction to Compiler Construction
Lab 2
January 29, 2007

Purpose: To introduce compiler compilers for grammar checking and expression evaluation.

1. Log in to a workstation in ITD415. By virtue of being enrolled in CS4905, you should receive a Computer Science Linux-lab login ID and password via your UNB e-mail.

2. Create a subdirectory in your directory space to contain the files for this lab. For purposes of illustration, I will assume that you call this subdirectory "L2".

Part 1. Experiments with yacc

yacc (yet another compiler compiler) can be used with lex to create LALR(1) compilers in C. The basic structure of a yacc program is as follows:

    %{
    /* preliminary definitions, includes, other */
    %}
    %token NAME NUMBER
    %%
    S   : NAME '=' exp
        | exp
        ;
    exp : NUMBER '+' NUMBER
        | NUMBER '-' NUMBER
        ;
    %%
    #include "lex.yy.c"

    void yyerror(char * s) /* yacc error handler */
    {
        fprintf (stderr, "%s\n", s);
    }

    int main(void) {return yyparse();}

where the items between %% and %% correspond to rules of a grammar. The statement #include "lex.yy.c" includes the code generated by lex for passing the recognizable tokens. The process is illustrated in Figure 1 below.

[Figure 1 diagram: lex source code (.l) -> lex -> C program (e.g. lex.yy.c); language LALR(1) rules in yacc source code (e.g. dc.y) -> yacc -> C program (e.g. y.tab.c) -> C compiler -> dc program (e.g. dc) -> interpreted output from valid dc statements]

Figure 1. Using yacc to generate language interpreters (e.g. dc = desk calculator).

3. Download the "bpb.l" and "bpb.y" files into your L2 subdirectory from the CS4905 web site http://www.cs.unb.ca/profs/nickerson/courses/cs4905/Examples/yacc/index.html (use e.g. Mozilla). File names ending in .l are intended to contain lex input; those ending in .y are intended to contain yacc input. The "bpb" stands for "balanced parentheses and square brackets", and the compiler generated by "bpb.y" is intended to recognize these. Follow the directions on the yacc examples web site to generate a compiler to recognize correctly balanced parentheses and square brackets.

4. What happens when yacc is invoked? Why does this happen? Invoke yacc in verbose mode (see the web site) to cause the y.output file to be generated. How many states are there in the parser? Change the grammar from

    S ::= S ( S ) S | S [ S ] S | epsilon

to

    S ::= ( S ) | [ S ] | epsilon

and generate your compiler again. What changes this time? How many states are in the parser for this revised grammar?

5. Try once again with the grammar changed to

    S ::= ( S ) | [ S ] | ( ) | [ ]

Run the compiler generated against the following input data file:

    (([]))
    (([][]([()])))
    ([[](()[()][])])
    ([])()

Is the output correct? Note: to save typing, copy the above four input lines to a file (e.g. bpb.txt), and then redirect the file to the input program using e.g. ./bpb < bpb.txt .

6. Change your grammar once again to recognize the above properly balanced parentheses and square brackets input data. All four examples are properly balanced. Hint: A balanced set of parentheses or square brackets can be followed by a second set of balanced symbols; i.e.

    S ::= P | B
    P ::= ( Q ) Q
    B ::= [ Q ] Q

where Q is a nonterminal containing P, B and the epsilon production.
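The hint's grammar can be checked by hand before it is typed into bpb.y. The following is a minimal sketch in C of a recursive-descent recognizer for it; it is not the LALR(1) parser that yacc generates. It assumes Q ::= P | B | epsilon (one reading of the hint) and input lines with no white space. The names is_balanced, parse_q and parse_group are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Grammar assumed from the hint:
 *   S ::= P | B
 *   P ::= ( Q ) Q
 *   B ::= [ Q ] Q
 *   Q ::= P | B | epsilon      (assumption: our reading of "Q")
 */

static const char *parse_q(const char *p);

/* Parse "open Q close Q" (the body shared by P and B).
 * Returns the position after the group, or NULL on a mismatch. */
static const char *parse_group(const char *p, char open, char close)
{
    if (*p != open) return NULL;
    p = parse_q(p + 1);
    if (p == NULL || *p != close) return NULL;
    return parse_q(p + 1);
}

/* Q ::= P | B | epsilon, chosen with one character of lookahead. */
static const char *parse_q(const char *p)
{
    if (*p == '(') return parse_group(p, '(', ')');
    if (*p == '[') return parse_group(p, '[', ']');
    return p;                      /* epsilon: consume nothing */
}

/* S ::= P | B. Returns 1 if the whole line is balanced, 0 otherwise. */
int is_balanced(const char *line)
{
    const char *end;

    assert(line != NULL);
    if (*line != '(' && *line != '[') return 0;   /* S is never empty */
    end = parse_q(line);
    return end != NULL && *end == '\0';
}
```

Under these assumptions, is_balanced returns 1 for each of the four input lines from step 5 and 0 for an unbalanced line such as "(]".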

7. Download the "dc.l" and "dc.y" files into your L2 subdirectory from the CS4905 web site http://www.cs.unb.ca/profs/nickerson/courses/cs4905/Examples/yacc/index.html (use e.g. Mozilla). The "dc" stands for "desktop calculator", and the initial "dc.y" is intended to generate a compiler to carry out calculations from the command line. Follow the directions as for step 3 above to generate a compiler to perform desktop calculations. Test your desk calculator with the following statements:

    -1.2*3/(4.5-3)
    +0.4*3+3/(-0.5)
    2.4-3.9/3+(9/(3-2.5))
    7-4.2
    2.1/3-(5/(4-2.5-1.5))

Are the reported answers correct?

8. Modify your desktop calculator to correctly parse and compute the 2nd statement by adding a unary plus '+' operator as a possible part of an expression. Test your revised desktop calculator against the statements from step 7 above.
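To see where a unary plus fits relative to the binary operators, here is a hedged sketch in C of a hand-written recursive-descent evaluator. It does not reproduce the rules in dc.y; the grammar in the comment and the names eval_statement, parse_expr, parse_term and parse_factor are assumptions made for illustration. The unary operators are handled at the factor level, so they bind more tightly than '*' and '/'.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical grammar (not taken from dc.y):
 *   expr   ::= term   { ('+' | '-') term }
 *   term   ::= factor { ('*' | '/') factor }
 *   factor ::= '+' factor | '-' factor | '(' expr ')' | NUMBER
 */

static const char *cur;            /* next unread character */

static double parse_expr(void);

static void skip_spaces(void)
{
    while (isspace((unsigned char)*cur)) cur++;
}

static double parse_factor(void)
{
    char *end;
    double v;

    skip_spaces();
    if (*cur == '+') { cur++; return parse_factor(); }    /* unary plus  */
    if (*cur == '-') { cur++; return -parse_factor(); }   /* unary minus */
    if (*cur == '(') {
        cur++;
        v = parse_expr();
        skip_spaces();
        if (*cur == ')') cur++;    /* a real parser would report a missing ')' */
        return v;
    }
    v = strtod(cur, &end);         /* NUMBER */
    cur = end;
    return v;
}

static double parse_term(void)
{
    double v = parse_factor();

    for (;;) {
        skip_spaces();
        if (*cur == '*')      { cur++; v *= parse_factor(); }
        else if (*cur == '/') { cur++; v /= parse_factor(); }
        else return v;
    }
}

static double parse_expr(void)
{
    double v = parse_term();

    for (;;) {
        skip_spaces();
        if (*cur == '+')      { cur++; v += parse_term(); }
        else if (*cur == '-') { cur++; v -= parse_term(); }
        else return v;
    }
}

/* Evaluate one statement; no error reporting in this sketch. */
double eval_statement(const char *text)
{
    assert(text != NULL);
    cur = text;
    return parse_expr();
}
```

With this sketch, eval_statement("+0.4*3+3/(-0.5)") evaluates to approximately -4.8. The divisor 4-2.5-1.5 in the last statement of step 7 is zero, so the sketch should not be used as a reference for what that statement ought to report.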

Part 2. Experiments with JavaCC

9. Download the “NL_Xlator.jj” file into your L2 subdirectory from the CS4905 web site http://www.cs.unb.ca/profs/nickerson/courses/cs4905/Examples/SimpleExample/index.html (use e.g. Mozilla). File names ending in .jj are intended to contain JavaCC (Java compiler compiler) input.

10. Type javacc NL_Xlator.jj on the command line to compile the JavaCC input into a Java program.

11. Type javac NL_Xlator.java on the command line to compile the Java program into an executable (.class) program. This will automatically compile any dependent classes.

12. Type java NL_Xlator on the command line to execute the Java program. When running, the NL_Xlator program checks for valid simple arithmetic expressions composed from identifiers and integer numbers. Test that your program works correctly by entering the following statements:

    b2*4+3*P;
    21 + 4*Q;

Notice how the priority of * over + is maintained. Once again, you can place the test text in a file (e.g. NL.txt), and then redirect the file to the Java program using e.g. java NL_Xlator < NL.txt .

13. Modify your NL_Xlator program to allow for division '/' and subtraction '-' operators. Test your revised NL_Xlator program with the following input statements:

    T/3 - 2*S5;
    17 + 8/(R - 2*3);
    5 - 14*C/(3 - 5);

Instead of having the interpreter state (e.g.):

    the sum of 5 and the product of 14, C, and the sum of 3 and 5

change the statements to (e.g.)

    the sum (or difference) of 5 and the product (or quotient) of 14, C, and the sum (or difference) of 3 and 5

Note that regular expression constructs can appear in the nonterminal definitions; e.g.

    ( ... )+             : One or more occurrences of ...
    ( ... )?             : An optional occurrence of ...
    ( ... )*             : Zero or more occurrences of ...
    ( r1 | r2 | ... )    : Any one of r1, r2, ...

14. Add the capability to process unary plus '+' and unary minus '-' to your NL_Xlator program. Test with the following input statement:

    -3 + 5/-(R + 4);

so that it correctly prints

    the sum (or difference) of minus 3 and the product (or quotient) of 5 and minus the sum (or difference) of R and 4

where the word 'minus' is added in the appropriate places.

Hint: Define a second String u (initialized to the empty string) in the definition of () to store the optional characters required for the unary operators. Only the 'minus' string is required as the default is positive.
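The idea behind the hint can be illustrated outside JavaCC. The following is a rough sketch in C, not the NL_Xlator.jj grammar itself, of a hand-written translator that builds the phrases from steps 13 and 14. The variable u plays the role of the hint's String u: it stays empty unless a unary '-' is seen. The grammar in the comments, the buffer limits MAXLEN and MAXOPS, and the names translate, parse_list and parse_unary are all assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAXLEN 512   /* longest phrase this sketch can hold (assumption) */
#define MAXOPS 8     /* most operands joined at one operator level       */

static const char *cur;            /* next unread character */

static void parse_list(char *out, int level);

static void skip_spaces(void)
{
    while (isspace((unsigned char)*cur)) cur++;
}

/* Append s to out without writing past MAXLEN bytes. */
static void append(char *out, const char *s)
{
    strncat(out, s, MAXLEN - 1 - strlen(out));
}

/* unary ::= ['+' | '-'] ( ID | NUMBER | '(' expr ')' ) */
static void parse_unary(char *out)
{
    const char *u = "";            /* empty by default, as in the hint */
    char body[MAXLEN];
    size_t n = 0;

    skip_spaces();
    if (*cur == '-')      { u = "minus "; cur++; }
    else if (*cur == '+') { cur++; }            /* positive is the default */
    skip_spaces();
    if (*cur == '(') {
        cur++;
        parse_list(body, 0);
        skip_spaces();
        if (*cur == ')') cur++;
    } else {
        while (isalnum((unsigned char)*cur) && n < MAXLEN - 1) body[n++] = *cur++;
        body[n] = '\0';
    }
    out[0] = '\0';
    append(out, u);
    append(out, body);
}

/* level 0: expr ::= term  { ('+' | '-') term  }
 * level 1: term ::= unary { ('*' | '/') unary } */
static void parse_list(char *out, int level)
{
    static const char *ops[2]  = { "+-", "*/" };
    static const char *lead[2] = { "the sum (or difference) of ",
                                   "the product (or quotient) of " };
    char item[MAXOPS][MAXLEN];
    int count = 0, i;

    for (;;) {
        if (level == 0) parse_list(item[count], 1);
        else            parse_unary(item[count]);
        count++;
        skip_spaces();
        if (count == MAXOPS || *cur == '\0' || strchr(ops[level], *cur) == NULL) break;
        cur++;                     /* consume the operator */
    }
    if (count == 1) { strcpy(out, item[0]); return; }
    strcpy(out, lead[level]);
    for (i = 0; i < count; i++) {
        if (i > 0) append(out, i < count - 1 ? ", " : (count > 2 ? ", and " : " and "));
        append(out, item[i]);
    }
}

/* Translate one statement; the result lives in a static buffer. */
const char *translate(const char *text)
{
    static char result[MAXLEN];

    assert(text != NULL);
    cur = text;
    parse_list(result, 0);
    return result;
}
```

Under these assumptions, translate("-3 + 5/-(R + 4);") produces the phrase requested in step 14. In the JavaCC version the same effect would come from concatenating u in front of the string returned by the production named in the hint.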
