<<

Lex & A or an interpreter performs its task in 3 stages: 1) : scans the input stream and converts Example: Lexical analysis sequences of characters into tokens (token is a Lexeme Token classification of groups of characters) identifier Lex is a tool for writing lexical analyzers. i identifier 2) Syntactic Analysis (): A parser reads tokens and := assignment_op assembles them into language constructs using the grammar rules of the language. = equal_sign Yacc is a tool for constructing parsers. 57 integer_constant 3) Actions: Acting upon input is done by code supplied by * multiplication_op the compiler writer. , comma ( left_paran input Lexical stream parse output Parser Actions stream Analyzer of tokens executable

K. Dincer Programming Languages - Lex 1 K. Dincer Programming Languages - Lex 2 (4) (4)

Using Lex and Yacc Tools

*.l Lex Specification Yacc Specification *.y Lex: reads a spec containing regular expressions and generates a routine that performs lexical analysis. Matches sequences that identify tokens. lex yacc *.c Lex declares an external variable called yytext lex.yy.c Custom C contains the matched string. yylex() yyparse() y.tab.c routines

Yacc: reads a spec file that codifies the grammar of a libl.a cc cc language and generates a parsing routine. Libraries liby.a cc -o scanner lex.yy.c -ll cc -o parser y.tab.c -ly -ll scanner parser

K. Dincer Programming Languages - Lex 3 K. Dincer Programming Languages - Lex 4 (4) (4)

Examples %% %% [A-Z]+[ \t\n] (“%s”, yytext); [+-]?[0-9]*(\.)?[0-9]+ printf(“FLOAT”); . ; /* no action specified */

digit [0-9] alphabetic [A-Za-z] %% digit [0 -9] [+-]?{digit}*( \.)?{digit}+ printf(“FLOAT”); alphanumeric ({alphabetic}|{digit}) %% digit [0-9] {alphabetic }{alphanumeric }* printf(“variable”); sign [+-] \, printf(“Comma”); %% \{ printf(“Left brace”); float val; \:\= printf(“Assignment”); {sign}?{digit}*( \.)?{digit}+ {sscanf(yytext, “%f”, &val); printf(“>%f<”, val);

K. Dincer Programming Languages - Lex 5 K. Dincer Programming Languages - Lex 6 (4) (4)

1 Yacc Compare the following two specs: Yacc specification describes a CFG, that can be used %% to generate a parser. for printf(“FOR”); [a-z]+ printf(“IDENTIFIER”); Elements of a CFG: • Terminals: tokens and literal characters, %% • Nonterminals (variables): syntactical elements, [a-z]+ printf(“IDENTIFIER”); • Production rules, for printf(“FOR”); • Start rule. Example: symbol: definition {action} ; A ® Bc is written as a : b ‘c’

K. Dincer Programming Languages - Lex 7 K. Dincer Programming Languages - Lex 8 (4) (4)

Declarations in a yacc spec file define tokens and /* print-int.l */ their characteristics. %% [0-9]+ {sscanf(yytext, “%d”, &yylval); %token: declare name of tokens return(INTEGER); %left: define left-associative operators } %right: define right-associative operators \n return NEWLINE; %nonassoc:define operators that may not associate . return(yytext[0]); with themselves. %: declare the type of variables %union: declare multiple data types for semantic values. /* print-int.y */ token INTEGER NEWLINE %start: declare the start symbol (first var in rules) %% %prec: assign precedence to a rule %{ C declarations %} directly copied to the resulting C program K. Dincer Programming Languages - Lex 9 K. Dincer Programming Languages - Lex 10 (4) (4)

K. Dincer Programming Languages - Lex 11 (4)

2