An Introduction to LEX and YACC
SYSC-3101 1 Programming Languages 1 Contents CONTENTS 2 3 4 SYSC-3101 General Lex Y Main acc - - A Pr Y lexical et ogram Structur another analyzer e compiler compiler 2
Programming CONTENTS Languages 10 17 4 3 Program 1 SYSC-3101 3. 2. 1. “Ev parser: le General xical erything will analyzer: gram.y consist else”: Structur of scan.l main.c three parts: e
3
1 GENERAL
Programming
STR UCTURE Languages 2 SYSC-3101 /* }% /* %% /* %% /* %{ Definitions user Rules C Lex includes - subroutines A */ lexical Figure */ */ 1: analyzer Le */ x
program
4 2 LEX
structure
-
A LEXICAL
Programming AN
Languages AL YZER SYSC-3101 Le Lex
• • •
x If Definitions integer ie: T 2. 1. Definitions able action actions re gular with has e tw xpressions more o printf("found columns: than one
statement,
5 2 LEX keyword
enclose
-
A LEXICAL it Programming INT"); within
{} AN
Languages AL YZER Regular Re SYSC-3101
• • gular te operators: \ . \ "..." \ t n xt :
: treat Expressions : : match characters: tab ne Expr wline. . : ne treat " an xt essions ything. \ character ’... a [ ’ - as z ] , te 0 ˆ as xt - - 9 te characters , xt space ? character
. 6 2 ...
* LEX (useful + .
|
- A (
for LEXICAL ) Programming spaces). $ /
{ AN }
Languages AL
% YZER < > SYSC-3101 Re
• gular operators + (...) [...] | ? * { { n defn
, : : : : Expressions m match alternation, match match } } : : : : substitute (cont): repitition, group match one zero zero or or or e ..., an g more one more ab|cd ything e defn e g: g time, a{1,3} (ab)+ times, times, (from within → e g: e ab e first g:
→ 7 2 g: ab?c → [] , ab+c ab*c cd
ab section). LEX a , , → aa abab
→ - →
, ac A aaa
abc ac LEXICAL , ... Programming abc , , abc abbc , abbc
... AN
Languages AL
... YZER Actions Actions SYSC-3101 yytext yylen • • • • ; ECHO; {...} return → : Length : C-String Null → → yytext; action. of printf("%s", Multi-statement the of matched matched → send characters characters.
action. 8 2 contents
yytext); LEX
(Mak -
of A
yytext e LEXICAL a Programming cop y if to
neccessary!) the AN
Languages AL
parser YZER . SYSC-3101 Actions %% . %{ extern %} #include #include [0-9] %% /* -*- YYSTYPE { return } "gram.tab.h"
LEX = 9 2 atoi((char
T LEX
emplate
-
A LEXICAL Programming
*)&yytext[0]); AN
Languages AL YZER SYSC-3101 3 /* %% /* %% /* }% /* %{ user Rules Other C Y acc includes subroutines - */ Declarations
Y et 3
Figure
another Y
*/
A CC
3:
- Y YET */ acc */
compiler
10 ANO program THER
structure COMPILER compiler
Programming COMPILER Languages SYSC-3101 Y Y
A A
A
• • • • CC CC : liter BODY A A ' ' \ \ is
BODY grammar '' Rules n' als a Rules non-terminal consists → → are ; enclosed single ne
rule 3 wline. of
has Y
names, quote. A
name CC the in
quotes, follo -
literals, (LHS). YET wing
e 11 ANO g: and form:
’+’ THER
actions. COMPILER (RHS)
Programming COMPILER Languages SYSC-3101 Y
A A A A A
• • CC : : : : | | ; can The
B B C E C E Rules be rules: ; D F D F specified ; G G
; 3
as:
Y
A
CC
-
YET
12 ANO
THER COMPILER
Programming COMPILER Languages SYSC-3101 Y
%token A
• • • • CC
Define Names Ev Ev done represent rule. ery Rules ery by name nonterminal name1 representing writing a name1 nonterminal
not 3 , name2
defined
Y A
symbol name2 CC tok
,... ens
symbol. -
in YET in the must must . the
declarations 13 ANO . appear be declarations .
declared; THER on
the
section COMPILER section. this left Programming is side is most assumed
of COMPILER simply at Languages least to one Actions Actions SYSC-3101 • • • expr XXX Def $ the $ $$ recognized n is components user ault special! : { ; → → : return YYY printf("a may ’(’ psuedo-v The
in associate 3 the v type alue ZZZ
of Y expr
input A the ariables
is CC returned inte right actions
message\n");
process,
- ger YET ’)’ which hand . by
to 14 ANO e { the be g: side refer
left-hand performed THER $$ of to the = the
} COMPILER $2 rules. v side Programming alues each ; of } time returned a
rule. COMPILER the Languages rule by the is %type Declarations %union SYSC-3101 %token Declarations • the is can generated type be : typedef } declares : : associated declares declares YYSTYPE; body
and 3 return of other ALL
must Y union
with
A union CC v terminals return alue be
tok -
included YET ens. { type types.
... 15 ANO which for into
non-terminals. THER are the not
le COMPILER x literals. source Programming so
that COMPILER Languages types SYSC-3101 Declarations S %% } %union yyerror( %% %type %token %} #include %{ int
E printf( 3
* Y
s S INTEGER A
Figure CC ) E "Result
{
- YET fprintf( 4: Y
A 16 ANO is CC
%d\n", T THER emplate
stderr, COMPILER $1 Programming "%s\n", );
} COMPILER Languages s ); } SYSC-3101 4 extern #include #include #define /* { main() #include #include } Main yyparse(); int yydebug YYDEBUG
template 4
Programming
MAIN PR
Languages OGRAM SYSC-3101 C:\DJGPP\etc\SYSC-3˜1>bison C:\DJGPP\etc\SYSC-3˜1>echo C:\DJGPP\etc\SYSC-3˜1>gcc C:\DJGPP\etc\SYSC-3˜1>flex C:\DJGPP\etc\SYSC-3˜1> Result Microsoft (C) Copyright is 3 Windows 1985-2001 Figure XP 6: [Version Running Microsoft 18 main.c 1 scan.l -d + the 5.1.2600] gram.y 2 e xample | Corp.
a.exe 4
Programming
MAIN PR
Languages OGRAM SYSC-3101 Result 2 E:\YaccLex>main.exe Copyright E:\YaccLex>bison Microsoft E:\YaccLex>cl E:\YaccLex>flex Microsoft (C) + 1 Copyright is Figure 3 (C) (R) Windows 7: main.c Microsoft 32-bit 1985-2001 Running scan.l -d XP gram.y C/C++ [Version the Corp Microsoft e 19 xample Optimizing 1984-1998. 5.1.2600] using Corp.
V 4 Programming isual
Compiler MAIN All C++
rights PR
Languages Version OGRAM reserved. 12.00.8804 for 80x86