Lex & Yacc Tutorial

Lex & Yacc Tutorial

LEX & YACC TUTORIAL by Tom Niemann epaperpress.com Contents Contents .......................................................................................................................................... 2 Preface ............................................................................................................................................ 3 Introduction...................................................................................................................................... 4 Lex ................................................................................................................................................... 6 Theory .......................................................................................................................................... 6 Practice ........................................................................................................................................ 7 Yacc............................................................................................................................................... 11 Theory ........................................................................................................................................ 11 Practice, Part I............................................................................................................................ 12 Practice, Part II........................................................................................................................... 15 Calculator....................................................................................................................................... 18 Description ................................................................................................................................. 18 Include File................................................................................................................................. 21 Lex Input .................................................................................................................................... 22 Yacc Input .................................................................................................................................. 23 Interpreter................................................................................................................................... 27 Compiler..................................................................................................................................... 28 Graph ......................................................................................................................................... 30 More Lex........................................................................................................................................ 34 Strings ........................................................................................................................................ 34 Reserved Words ........................................................................................................................ 35 Debugging Lex........................................................................................................................... 35 More Yacc...................................................................................................................................... 36 Recursion................................................................................................................................... 36 If-Else Ambiguity ........................................................................................................................ 37 Error Messages.......................................................................................................................... 38 Inherited Attributes..................................................................................................................... 39 Embedded Actions..................................................................................................................... 39 Debugging Yacc......................................................................................................................... 40 Bibliography................................................................................................................................... 41 2 Preface This document explains how to construct a compiler using lex and yacc. Lex and yacc are tools used to generate lexical analyzers and parsers. I assume you can program in C and understand data structures such as linked-lists and trees. The introduction describes the basic building blocks of a compiler and explains the interaction between lex and yacc. The next two sections describe lex and yacc in more detail. With this background we can construct a sophisticated calculator. Conventional arithmetic operations and control statements, such as if-else and while, are implemented. With minor changes we will convert the calculator into a compiler for a stack-based machine. The remaining sections discuss issues that commonly arise in compiler writing. Source code for examples may be downloaded from the web site listed below. Permission to reproduce portions of this document is given provided the web site listed below is referenced. No additional restrictions apply. Source code, when part of a software project, may be used freely without reference to the author and is available at the following web site. Tom Niemann Portland, Oregon epaperpress.com/lexandyacc 3 Introduction Before 1975 writing a compiler was a very time-consuming process. Then Lesk [1975] and Johnson [1975] published papers on lex and yacc. These utilities greatly simplify compiler writing. Implementation details for lex and yacc may be found in Aho [2006]. Flex and bison, clones for lex and yacc, can be obtained for free from GNU and Cygwin. Cygwin is a 32-bit Windows ports of the GNU software. In fact Cygwin is a port of the Unix operating system to Windows and comes with compilers gcc and g++. To install simply download and run the setup executable. Under devel install bison, flex, gcc-g++, gdb, and make. Under editors install vim. Lately I've been using flex and bison under the Cygwin environment. source code a = b + c * d Lexical Analyzer Lex patterns tokens id1 = id2 + id3 * id4 Syntax Analyzer Yacc grammar syntax tree = id1 + id2 * id3 id4 Code Generator generated code load id3 mul id4 add id2 store id1 Figure 1: Compilation Sequence The patterns in the above diagram is a file you create with a text editor. Lex will read your patterns and generate C code for a lexical analyzer or scanner. The lexical analyzer matches strings in the input, based on your patterns, and converts the strings to tokens. Tokens are numerical representations of strings, and simplify processing. When the lexical analyzer finds identifiers in the input stream it enters them in a symbol table. The symbol table may also contain other information such as data type (integer or real) and location of each variable in memory. All subsequent references to identifiers refer to the appropriate symbol table index. The grammar in the above diagram is a text file you create with a text edtior. Yacc will read your grammar and generate C code for a syntax analyzer or parser. The syntax analyzer uses grammar rules that allow it to analyze tokens from the lexical analyzer and create a syntax tree. The syntax tree imposes a hierarchical structure the tokens. For example, operator precedence 4 and associativity are apparent in the syntax tree. The next step, code generation, does a depth- first walk of the syntax tree to generate code. Some compilers produce machine code, while others, as shown above, output assembly language. source (yyparse) bas.y yacc y.tab.c y.tab.h cc bas.exe bas.llex lex.yy.c (yylex) compiled output Figure 2: Building a Compiler with Lex/Yacc Figure 2 illustrates the file naming conventions used by lex and yacc. We'll assume our goal is to write a BASIC compiler. First, we need to specify all pattern matching rules for lex (bas.l) and grammar rules for yacc (bas.y). Commands to create our compiler, bas.exe, are listed below: yacc –d bas.y # create y.tab.h, y.tab.c lex bas.l # create lex.yy.c cc lex.yy.c y.tab.c –obas.exe # compile/link Yacc reads the grammar descriptions in bas.y and generates a syntax analyzer (parser), that includes function yyparse, in file y.tab.c. Included in file bas.y are token declarations. The –d option causes yacc to generate definitions for tokens and place them in file y.tab.h. Lex reads the pattern descriptions in bas.l, includes file y.tab.h, and generates a lexical analyzer, that includes function yylex, in file lex.yy.c. Finally, the lexer and parser are compiled and linked together to create executable bas.exe. From main we call yyparse to run the compiler. Function yyparse automatically calls yylex to obtain each token. 5 Lex Theory During the first phase the compiler reads the input and converts strings in the source to tokens. With regular expressions we can specify patterns to lex so it can generate code that will allow it to scan and match strings in the input. Each pattern specified in the input to lex has an associated action. Typically an action returns a token that represents the matched string for subsequent use by the parser. Initially we will simply print the matched string rather than return a token value. The following represents a simple pattern, composed of a regular expression,

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    41 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us