Lexical Analysis Lexeme Token Using Lex and Yacc Tools Examples
Lex & Yacc
K. Dincer, Programming Languages - Lex

A compiler or an interpreter performs its task in 3 stages:

1) Lexical Analysis: scans the input stream and converts sequences of characters into tokens (a token is a classification of a group of characters). Lex is a tool for writing lexical analyzers.
2) Syntactic Analysis (Parsing): a parser reads tokens and assembles them into language constructs using the grammar rules of the language. Yacc is a tool for constructing parsers.
3) Actions: acting upon the input is done by code supplied by the compiler writer.

input stream → Lexical Analyzer → stream of tokens → Parser → parse tree → Actions → output executable

Example: Lexical analysis

    Lexeme   Token
    Sum      identifier
    i        identifier
    :=       assignment_op
    =        equal_sign
    57       integer_constant
    *        multiplication_op
    ,        comma
    (        left_paren

Using Lex and Yacc Tools

Lex: reads a spec file containing regular expressions and generates a C routine that performs lexical analysis. It matches the character sequences that identify tokens. Lex declares an external variable called yytext, which contains the matched string.

Yacc: reads a spec file that codifies the grammar of a language and generates a parsing routine.

Build flow:

    *.l (Lex specification)  → lex  → lex.yy.c (yylex())
    *.y (Yacc specification) → yacc → y.tab.c  (yyparse())

The generated files are compiled with cc, together with any custom C routines (*.c) and the Unix libraries libl.a and liby.a:

    cc -o scanner lex.yy.c -ll
    cc -o parser y.tab.c -ly -ll

Examples

    %%
    [A-Z]+[ \t\n]    printf("%s", yytext);
    .                ; /* no action specified */

    alphabetic    [A-Za-z]
    digit         [0-9]
    alphanumeric  ({alphabetic}|{digit})
    %%
    {alphabetic}{alphanumeric}*    printf("variable");
    \,                             printf("Comma");
    \{                             printf("Left brace");
    \:\=                           printf("Assignment");

    %%
    [+-]?[0-9]*(\.)?[0-9]+    printf("FLOAT");

The same pattern written with a named definition:

    digit [0-9]
    %%
    [+-]?{digit}*(\.)?{digit}+    printf("FLOAT");

The same pattern with an action that converts the matched text. The declaration of val is indented so that lex copies it into the generated routine instead of treating it as a rule:

    digit [0-9]
    sign  [+-]
    %%
        float val;
    {sign}?{digit}*(\.)?{digit}+    {sscanf(yytext, "%f", &val);
                                     printf(">%f<", val);
                                    }

Compare the following two specs:

    %%
    for       printf("FOR");
    [a-z]+    printf("IDENTIFIER");

    %%
    [a-z]+    printf("IDENTIFIER");
    for       printf("FOR");

Lex prefers the longest match and, when two rules match the same length, the rule listed first. On the input for, the first spec prints FOR; in the second spec the for rule can never fire, because [a-z]+ matches the same text and comes first.

Yacc

A Yacc specification describes a CFG that can be used to generate a parser.

Elements of a CFG:
• Terminals: tokens and literal characters,
• Nonterminals (variables): syntactical elements,
• Production rules,
• Start rule.

Rule format:

    symbol: definition {action} ;

Example: A → Bc is written as

    a : b 'c'

Declarations in a yacc spec file define tokens and their characteristics.

    %token:     declare the names of tokens
    %left:      define left-associative operators
    %right:     define right-associative operators
    %nonassoc:  define operators that may not associate with themselves
    %type:      declare the type of variables (nonterminals)
    %union:     declare multiple data types for semantic values
    %start:     declare the start symbol (by default, the first variable in the rules)
    %prec:      assign precedence to a rule
    %{ C declarations %}:  copied directly to the resulting C program

    /* print-int.l */
    %%
    [0-9]+    {sscanf(yytext, "%d", &yylval);
               return(INTEGER);
              }
    \n        return NEWLINE;
    .         return(yytext[0]);

    /* print-int.y */
    %token INTEGER NEWLINE
    %%
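To make the lexeme/token table from the lexical analysis example concrete, here is a minimal hand-written sketch in C of what a generated scanner does for those token classes. It is not the code lex produces: the names TokenKind, token_name, and next_token are hypothetical, chosen only for this illustration, and the identifier and integer patterns are assumed to be letters followed by letters or digits, and runs of digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token classes named after the table in the slides (hypothetical enum). */
typedef enum {
    TOK_END, TOK_IDENTIFIER, TOK_INTEGER_CONSTANT, TOK_ASSIGNMENT_OP,
    TOK_EQUAL_SIGN, TOK_MULTIPLICATION_OP, TOK_COMMA, TOK_LEFT_PAREN,
    TOK_UNKNOWN
} TokenKind;

/* Hypothetical helper: printable name of a token class. */
const char *token_name(TokenKind kind)
{
    switch (kind) {
    case TOK_END:               return "end";
    case TOK_IDENTIFIER:        return "identifier";
    case TOK_INTEGER_CONSTANT:  return "integer_constant";
    case TOK_ASSIGNMENT_OP:     return "assignment_op";
    case TOK_EQUAL_SIGN:        return "equal_sign";
    case TOK_MULTIPLICATION_OP: return "multiplication_op";
    case TOK_COMMA:             return "comma";
    case TOK_LEFT_PAREN:        return "left_paren";
    default:                    return "unknown";
    }
}

/*
 * Hypothetical helper: scans one token starting at *cursor, copies its
 * lexeme into `lexeme` (truncated to cap - 1 characters), and advances
 * *cursor past the token. Returns TOK_END at the end of the input.
 */
TokenKind next_token(const char **cursor, char *lexeme, size_t cap)
{
    const char *p = *cursor;
    const char *start;
    size_t len;
    TokenKind kind;

    assert(cap > 0);                       /* room for the terminator */

    while (isspace((unsigned char)*p))     /* skip blanks between tokens */
        p++;
    start = p;

    if (*p == '\0') {
        kind = TOK_END;
    } else if (isalpha((unsigned char)*p)) {
        while (isalnum((unsigned char)*p))
            p++;
        kind = TOK_IDENTIFIER;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        kind = TOK_INTEGER_CONSTANT;
    } else if (p[0] == ':' && p[1] == '=') {
        p += 2;
        kind = TOK_ASSIGNMENT_OP;
    } else {
        switch (*p++) {
        case '=': kind = TOK_EQUAL_SIGN;        break;
        case '*': kind = TOK_MULTIPLICATION_OP; break;
        case ',': kind = TOK_COMMA;             break;
        case '(': kind = TOK_LEFT_PAREN;        break;
        default:  kind = TOK_UNKNOWN;           break;
        }
    }

    len = (size_t)(p - start);
    if (len >= cap)
        len = cap - 1;
    memcpy(lexeme, start, len);
    lexeme[len] = '\0';
    *cursor = p;
    return kind;
}
```

A caller would typically invoke next_token in a loop until it returns TOK_END, printing each lexeme next to token_name(kind) to reproduce a table like the one in the slides. The lexeme buffer plays roughly the role that yytext plays in a lex-generated scanner.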
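The two specs that the slides ask the reader to compare differ only in rule order. The sketch below models, under simplifying assumptions, how lex chooses between rules: the longest match wins, and a tie goes to the rule listed first. The types Matcher and Rule and the functions match_for, match_lower, and select_rule are hypothetical stand-ins; real lex compiles the regular expressions into a single automaton instead of trying each pattern in turn.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A matcher reports how many leading characters of s its pattern
 * matches (0 means no match). Hypothetical stand-in for a lex pattern. */
typedef size_t (*Matcher)(const char *s);

typedef struct {
    const char *label;   /* what the rule's action would print */
    Matcher match;
} Rule;

/* Stands in for the literal pattern: for */
size_t match_for(const char *s)
{
    return strncmp(s, "for", 3) == 0 ? 3 : 0;
}

/* Stands in for the pattern: [a-z]+ (assumes an ASCII-like charset) */
size_t match_lower(const char *s)
{
    size_t n = 0;
    while (s[n] >= 'a' && s[n] <= 'z')
        n++;
    return n;
}

/*
 * Picks the rule lex would fire at the start of input: the longest
 * match, with ties going to the earliest rule. Returns the rule's label,
 * or NULL if no rule matches; *matched receives the match length.
 */
const char *select_rule(const Rule *rules, size_t count,
                        const char *input, size_t *matched)
{
    const char *best = NULL;
    size_t best_len = 0;
    size_t i;

    assert(input != NULL);

    for (i = 0; i < count; i++) {
        size_t len = rules[i].match(input);
        if (len > best_len) {   /* strict '>' keeps the earlier rule on ties */
            best_len = len;
            best = rules[i].label;
        }
    }
    if (matched != NULL)
        *matched = best_len;
    return best;
}
```

With the rules ordered as in the first spec (for before [a-z]+), the input for selects the FOR rule; with the order reversed it selects IDENTIFIER. An input such as format selects IDENTIFIER in either order, because the longer match takes priority over rule order.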