Pirate: A Compiler for Lua Targeting the Parrot Virtual Machine

Klaas-Jan Stol

June 6, 2003

Abstract

The Lua programming language is a dynamically typed language with a simple syntax. It was designed to be both extensible and embeddable. Lua can be extended by the programmer so that events that would otherwise result in an error can be caught by user-defined functions. Besides extending Lua, it is also possible to embed a Lua program within a host program. This way, the host program can invoke a Lua script for higher-level tasks that are more easily programmed in Lua than in a lower-level language such as C, although a Lua script may also be executed as a stand-alone program.

The original distribution of Lua offers the programmer a Lua interpreter, or virtual machine. The interpreter can be used to execute a Lua script directly. Before actually executing the script, it is compiled into Lua byte-code. The interpreter can also be used to compile a script only, without actually running it. One way or the other, the Lua language and virtual machine are integrated. To be able to run a Lua script on another virtual machine, the Lua language must be separated from its original run-time environment.

Parrot is a virtual machine that is currently being designed and implemented to execute dynamically typed languages. Because Parrot is able to execute dynamically typed languages, it should also be able to host the Lua language.

This paper describes the design and implementation of Pirate, a Lua compiler that targets the Parrot Virtual Machine. Not only is this an exploration of compiler construction theory and practice, it also demonstrates the re-implementation of a programming language on a new virtual machine. Pirate was not designed to execute a Lua script as an embedded program, but rather as a stand-alone application consisting of one or more modules (although it is possible to embed a Lua script compiled with Pirate).
Because Parrot will be able to run not only Perl 6 but other languages as well, it is interesting to investigate how modules written in different languages can cooperate. This offers a new use of the Lua language.

Although Parrot is still under construction at this moment, many things are already operational and it was possible to implement a working Lua compiler. However, because certain features of the Lua language need rather specific support from Parrot that is not implemented yet, not all features of Lua have been implemented. Other features have a temporary implementation that will be changed at a later stage, when the needed features of Parrot are fully operational.

Contents

Preface

I Overview

1 Introduction
  1.1 Compilers
  1.2 The Lua language
  1.3 Virtual Machines
  1.4 The Parrot Virtual Machine
  1.5 Running Lua on Parrot

2 Overview of Compiler Organization
  2.1 Introduction
  2.2 Compiler Organization
  2.3 Compiler Data-flow

II Compiler Front-end

3 Syntactical Analysis
  3.1 The Lexer
  3.2 The Parser
  3.3 The Abstract Syntax Tree
  3.4 Reporting Syntax Errors

4 Scope Analysis
  4.1 Lua Scope Rules
  4.2 The Symbol Table
  4.3 Reporting Scope Errors

III Compiler Back-end

5 Parrot Run-Time Environment
  5.1 Introduction
  5.2 Implementation of Lua Data Types
  5.3 Run-Time Storage Organization
  5.4 Parameter-Passing Modes
  5.5 Tag Methods

6 Generating Code
  6.1 The Target Language
  6.2 Translating Control Structures
  6.3 Translating Assignments
  6.4 Translating Functions
  6.5 Translating Local Declarations
  6.6 Translating References
  6.7 Implementing Tags

7 Optimizing Code
  7.1 Optimizations in the IMC Compiler
  7.2 Optimizations in Pirate
  7.3 Implementation of the Code Optimizer

IV Using Lua

8 Lua Libraries
  8.1 The Standard Libraries
  8.2 Functions to Execute Lua Code

9 The Lua Compiler for Parrot
  9.1 Compiling Code
  9.2 Running Lua Code
  9.3 Running Other Languages

10 Conclusion
  10.1 Design
  10.2 Implementation
  10.3 Where to Go from Here

A Lexical and Syntactical Structure of Lua
  A.1 Lexical Conventions for Lua
  A.2 Syntactical Structure of Lua

B Details for a Recursive Descent Parser

C Parrot Calling Conventions
  C.1 Calling Conventions
  C.2 Returning Conventions

D Internal Data Structures
  D.1 The Compiler State Structure
  D.2 The Symbol Table
  D.3 The Symbol Table Iterator
  D.4 The Stack

E Instruction Sets
  E.1 IMC Instructions
  E.2 Parrot Instructions

Bibliography

Preface

This paper presents the design and implementation of Pirate, a Lua compiler that targets the Parrot virtual machine. This work was done in the last semester of my undergraduate education at the Hanzehogeschool Groningen. Pirate was created to explore the possibilities of re-targeting an existing programming language to a new platform, and to obtain a better understanding of compilers in general. For this purpose, the Lua language was re-targeted to the Parrot virtual machine, which will be the new virtual machine for the Perl 6 programming language.

Content of this paper

This paper consists of four parts. Part one is an introduction to the Lua language, the Parrot virtual machine and compilers in general. Chapter 1 introduces the Lua language, virtual machines in general and the Parrot virtual machine in particular. Chapter 2 gives an overview of compiler construction in general.

Part two describes the implementation of the front-end of the compiler. The front-end is the part of the compiler that does syntax and contextual checking (such as scope checking) of the source program. Implementation of the syntactical analyzer is described in Chapter 3. Implementation of the scope checker is discussed in Chapter 4.

Part three deals with the back-end of the compiler. The back-end of a compiler does not have to do any more checks on the source program; at this point of compilation, all errors should have been discovered. Chapter 5 first discusses some issues that must be dealt with before actually generating code. After that, Chapter 6 describes how machine instructions are generated. Chapter 7 describes the optimizer that tries to improve the generated code.

Part four, finally, describes how Pirate can be used. Chapter 8 describes which Lua libraries are available in Pirate and which restrictions apply to using them. After that, Chapter 9 explains in what ways Pirate can be used. Finally, the conclusion recapitulates the presented work and provides a list of known issues in the current implementation, followed by a list of potential future improvements that can be implemented at a later stage.

Obtaining Pirate

Pirate is written in ANSI C, so it should be possible to build and run the program on any platform that has a corresponding compiler. The software is available at:

http://sourceforge.net/projects/luaparrot/

Readership

This report is aimed at people who have some experience in programming in a higher-level programming language, such as Java, C or Pascal. I assume the reader knows what a compiler is used for, and has a basic knowledge of scripting languages. No prior knowledge of compiler internals or compiler theory is assumed.

Acknowledgments

At this point I would like to express my gratitude towards Martijn de Vries for his supervision and helpful tips. I would also like to thank the people of the Perl 6 community for their responsive reactions and contributions.

Klaas-Jan Stol

Part I

Overview

Chapter 1

Introduction

1.1 Compilers

A compiler is the basic tool for any modern programmer. Any program written in a high-level programming language must be translated to some form that can be understood by a computer. For example, it is not possible for a computer to understand the following Lua program:

    do
        x = 1 + 2
    end

For the computer to be able to execute such a program, it must be translated to some form of machine code, like this:

    new P0, .LuaNumber    # create a new number variable
    set P0, 1             # set this variable to the value 1
    add P0, 2             # add 2 to this number
    store_global "x", P0  # store the value under the name "x"

The instructions above are examples of machine code for the Parrot Virtual Machine. Actually, the mnemonics for the instructions are printed here, for ease of reading; the actual machine code is in binary form. To be able to execute the above instructions, these mnemonics must be translated to a binary format by the assembler.

The translation of a programming language to machine instructions (or their mnemonics) is done by a compiler. A compiler does not necessarily translate a programming language; there are more types of compilers. For example, the program that formatted this report, LaTeX, is in fact a compiler too. The text is typed in as plain text, and all kinds of commands are used to indicate which text is a chapter title, paragraph title, normal text, et cetera. Chapter 2 gives a more detailed overview of the tasks of a compiler.

1.2 The Lua language

This section gives an overview of the Lua language. This is to get an idea of the possibilities of the language, and potential problems that might arise during the implementation. Implementation of Pirate is described in detail in parts II and III. A more detailed description of the Lua language can be found in [12] and [13].

1.2.1 Usage of Lua

Lua was originally designed as an extensible embedded language. It is extensible in the sense that the programmer can extend its semantics and change the behaviour of the language (see 1.2.5). An embedded language means that a Lua program can be executed within another program; this can easily be done using the C language. Lua does not necessarily have to be run as an embedded language; it can also be executed as a stand-alone language.

1.2.2 Lua Data Types

Lua has six built-in data types: number, string, userdata, table, function and nil. Numbers are used to store floating-point or integer numbers. The string type is used to store strings. The userdata type can be used to store C pointers (for embedded Lua scripts). Tables are associative arrays, so they may be indexed with ordinary numbers, but also with other values, like strings. Tables are discussed in more detail in 1.2.3. Functions are blocks of statements that may be called; they are discussed in more detail in 1.2.4. Nil is a special type that has only one value (called nil), which is different from any other value. It is similar to null in C.

Lua is a dynamically typed language, which means that variables do not have a fixed type; their types can change at run-time. Any variable can hold any value: one moment a variable can hold a number, the next moment it can hold a string. Variables do not have to be declared, as they do in most statically typed languages.
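To make the dynamic typing concrete, here is a small illustrative fragment (not taken from the original text); type is the built-in Lua function that returns the type of a value as a string:

```lua
x = 10                 -- x holds a number...
print(type(x))         -- prints "number"

x = "ten"              -- ...and the same variable may later hold a string
print(type(x))         -- prints "string"
```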

1.2.3 Lua Tables

As noted above, tables in Lua are implemented as associative arrays. This means that these arrays can be indexed not only with numbers but also with strings; in fact, they can be indexed with any non-nil value. Lua uses tables to create user-defined data types, just as Pascal has records and C has structures.
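As a small sketch of this (an illustrative example, not from the original text):

```lua
-- a table used as a record, with string indices
point = { x = 3, y = 4 }
print(point.x)         -- prints 3; point.x is shorthand for point["x"]

-- the very same table may also be indexed with numbers
point[1] = "first"
print(point[1])        -- prints "first"
```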

1.2.4 Lua Functions

Lua functions are first-class values. This means that functions can be stored in variables and passed around as arguments or return values. Functions in Lua do not have a name; they are just entities representing a block of statements. These entities may, of course, be assigned to some variable. Calling a function is done by invoking the variable to which the function was assigned.
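A minimal sketch of first-class functions (illustrative only, not from the original text):

```lua
-- an anonymous function, stored in the variable double
double = function (n)
    return n * 2
end

-- "calling double" really means calling the function stored in it
print(double(21))       -- prints 42

-- functions can be passed as arguments like any other value
apply = function (f, v)
    return f(v)
end
print(apply(double, 5)) -- prints 10
```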

1.2.5 Extensible Semantics with Tag Methods

Lua is extensible in the sense that the Lua programmer can change the behaviour of the language. This is done through tag methods. Each variable in the Lua run-time environment has a tag; one could think of it as a label that contains a number. The tag is used when events occur. The complete list of events is shown in figure 1.1.

When an event occurs, the tag is used as an index into a table containing functions, the so-called tag methods. For example, in figure 1.1, the first event listed is gettable. When an indexed variable is accessed (such as in a[b]), the tag of the table being indexed is used as an index into the tag method table. This yields a tag method, which will be called. This event is quite innocent, but suppose a programmer tries to add two tables together. Obviously, this is normally not possible. But Lua allows the programmer to define a tag method for such an event. Another way to look at a tag method is as an exception handler.

    Event      Cause
    gettable   an indexed variable is accessed.
    settable   an indexed variable is assigned.
    index      an attempt is made to access a non-existent index.
    getglobal  a global variable is accessed.
    setglobal  an assignment is made to a global variable.
    add        a + operation is applied to non-numerical operands.
               For sub, mul, div and pow there are similar events.
    unm        a unary minus operation is applied to a non-numerical
               operand.
    lt         an order operation is applied to non-numerical or
               non-string operands. The remaining order operations can
               be expressed using the regular equivalents (for example,
               gt can be implemented using lt with the operands
               switching place).
    concat     a concatenation is applied to non-string operands.
    gc         a userdata object is garbage collected.
    function   a non-function value is called.

Figure 1.1: The complete list of tag events.

Defining tag methods is a way to extend the semantics of Lua. More information on and applications of tag methods can be found in [13].
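As a sketch of how this looks to the Lua programmer (an illustrative example assuming the Lua 4 built-ins newtag, settag and settagmethod; it is not an excerpt from this paper):

```lua
-- create a fresh tag and attach it to two tables
t = newtag()
a = { 1, 2 }
b = { 10, 20 }
settag(a, t)
settag(b, t)

-- install a tag method for the "add" event: element-wise addition
settagmethod(t, "add", function (x, y)
    local r = {}
    for i = 1, 2 do
        r[i] = x[i] + y[i]
    end
    return r
end)

c = a + b      -- normally an error; now the tag method is called instead
print(c[2])    -- prints 22
```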

1.3 Virtual Machines

1.3.1 What is a Virtual Machine?

A Virtual Machine (vm), or interpreter, is a computer in software. This means that instructions are not executed by a hardware processor, like a Pentium and the like, but by a program that emulates a processor. The machine code for a virtual machine is often called byte-code. Byte-codes are often just numbers, each representing a particular instruction. One could think of a virtual machine as one big switch statement, such as in the C fragment below:

    while (more_bytecodes) {
        switch (bytecode) {
            case LOAD:  /* load a variable from memory */
            case STORE: /* store a variable in memory */
            case ADD:   /* add a number to some other number */
            /* ... a case for each bytecode ... */
        }
        bytecode = next_bytecode();
    }

This is a very simplistic view of a virtual machine, and implementations can differ from machine to machine, but the key point is that instructions are executed by software.[1]

1.3.2 Stack-based vs Register-based Machines

Many virtual machines are stack-oriented. This means that most instructions operate on values on a stack, normally the two topmost values. For example, the ADD instruction would pop the two topmost values, add them together, and push the result back onto the stack. The Java virtual machine and the original Lua interpreter are stack-based virtual machines.

There are also register-based virtual machines. These machines do not use a stack to store the operands for an operation; instead, the operands are stored in processor registers. This is much more like real hardware machines. The Parrot Virtual Machine is a register-based machine.
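To make the difference concrete, the statement x = x - 1 might translate roughly as follows on the two kinds of machines (illustrative mnemonics only, not taken from any actual instruction set):

```
; stack machine: operands live on the stack
PUSH x          ; push the value of x
PUSH 1          ; push the constant 1
SUB             ; pop both, push x - 1
STORE x         ; pop the result into x

; register machine: operands live in registers
load  R1, x     ; load x into register R1
sub   R1, R1, 1 ; subtract 1, result stays in R1
store x, R1     ; store R1 back into x
```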

1.3.3 Why a Virtual Machine?

An instruction executed by hardware is usually executed much faster than if it were executed in software. This brings up the question of why a virtual machine is preferred to a real machine. There are a few reasons for this.

[1] Eventually all software is executed by a hardware processor, so the software emulating a processor (the vm) itself is, too.

• Portability: When code is generated for a real (hardware) machine, the generated code can almost certainly not be run on another type of machine. When a virtual machine is used, an extra layer of abstraction is inserted: the code generator of a compiler can target the virtual machine, and to run the code on another type of machine, only the virtual machine has to be ported. Because most virtual machines are written in a (portable) high-level programming language, this porting is easy. Often, only a minimum of machine-specific code has to be rewritten; sometimes, no rewriting has to be done at all. For example, the original Lua interpreter and compiler are written in ANSI C, which means the source code can be compiled on any platform that has an ANSI C compiler available.

• High-level operations: Some concepts found in programming languages are not easily implemented on a real machine. This is especially true for functional or logic programming languages. Real machines are largely oriented towards imperative languages, like Pascal, C and Fortran. This means that the operations of an imperative language are easily translated to instructions on a real machine.

• Ease of implementation of a language: Targeting a virtual machine is often easier, because a vm often has some higher level of abstraction. When targeting a real machine, one has to take many more details into account. A vm often has a somewhat high-level instruction set. Because implementing instructions in hardware is an expensive matter, the instruction set of a real machine is not likely to be of such a high level.

1.4 The Parrot Virtual Machine

To get an idea of what the Parrot vm is, it is interesting to look at some internals of the vm. Below is a description of the main components of the Parrot virtual machine.

1.4.1 Built-in Data Types

Parrot has four built-in data types. These are: integer, floating-point, string and Parrot Magic Cookie, or pmc. The integer type will be used to store integers or pointers. The floating-point type is designed to store architecture-independent floating-point numbers. The string type is an abstracted, encoding-independent string type. Pmcs will represent any other type, including objects. They are discussed in 5.2.1.

1.4.2 Registers

Parrot has four sets of 32 registers, one set for each built-in data type. Registers are used by all instructions that operate on data. That is, when an instruction is to be executed on some operands, these operands are first loaded into registers.

1.4.3 Stacks

Although Parrot is a register-based virtual machine, that does not mean it has no stacks. Stacks are very convenient to have around, because they are a relatively fast way to store and restore data. Many machines have special instructions for manipulating stacks. Parrot has six stacks: one for each built-in data type, one call stack and one generic (user) stack. The stacks for the built-in types are used for saving all registers during function calls. The call stack is used to store return addresses and the like when calling functions. The generic, or user, stack can be used to store arbitrary data.

1.4.4 Globals and Lexicals

Parrot has two places for storing pmcs. First, there is a global symbol table that can easily be accessed by special instructions. Second, Parrot has so-called scratchpads. A scratchpad can be used to store local pmcs. Whenever a new scope block is entered, a new scratchpad can be created. The scratchpads themselves are stored on a special stack, so when a scope block is closed, the topmost scratchpad is popped off the stack.

1.4.5 Other Features

Parrot has some other important features. These features are quite high level; they will not be implemented on a real machine (that is, in hardware).

• Garbage collector: The Parrot vm has a garbage collector (gc). A gc is a system that reclaims memory that is used by objects (not necessarily class objects) that are not used or referenced anymore. This memory would be lost if it wasn't for the gc. Having a gc greatly simplifies memory management for the programmer: she[2] does not have to take care of it at all. Parrot does have an instruction to explicitly destroy an object, which can be used if it is obvious that a particular object will not be used anymore.

• Support for threads: Parrot has support for threads. This means that Parrot has special operations for dealing with threads. This makes implementing threads much easier, because these instructions are atomic: while the vm is executing such an operation, it cannot be preempted.

[2] This means he or she.

• Support for objects: Parrot has support for objects. This means that the Parrot vm has instructions to do object operations, such as an instruction for calling a method. This kind of support makes implementing an object-oriented language easier. At the moment of this writing, objects are not fully implemented.

• Support for exceptions: Parrot has special instructions that handle exceptions. These are, among others, instructions that set an exception handler in place and that throw an exception. Having these special instructions makes implementing exceptions easier. At the moment of this writing, exceptions are not fully implemented.

• Support for events: Parrot will have built-in support for events. At the moment of writing, there is no information available about events.

1.5 Running Lua on Parrot

The original Lua language and run-time environment are distributed as a whole. That is, no distinction is made between the Lua language and its run-time environment. But in reality, one should make this distinction: even when one runs a Lua program in the original distribution, the program is first compiled to (Lua) byte-code, and this byte-code is executed by the Lua virtual machine. Virtual machines were described in more detail in section 1.3.

This distinction between language and interpreter must be made when re-targeting Lua. When Lua is executed by Parrot, the Lua interpreter no longer plays any role. This has implications for the complete implementation of the language. The Lua interpreter was specially designed for the Lua language, so anything that was needed for the language could be put into the interpreter. This is not the case for Parrot. Parrot is meant to be an interpreter that can host dynamically typed languages, but it has no special support for a number of features that are specific to Lua. This may imply that the implementation of some Lua features is not as efficient as in the original Lua vm. An example of this is tag methods, which were described in 1.2.5.

Other features are present in Parrot, but may have a different behaviour. An example of this is the garbage collector. The Lua interpreter has a garbage collector that works in a specific way, the way the Lua designers thought best. Parrot also has a garbage collector, but its implementation is different. All these factors have to be taken into account while implementing Lua for Parrot.

Chapter 2

Overview of Compiler Organization

2.1 Introduction

A compiler is a complex program. To understand how a compiler is built, it is useful to have a look at its components. This chapter gives an overview of how a typical compiler is constructed, and later on, how Pirate is constructed. It is divided into two parts.

The first part describes the modules that together form a typical compiler. Each module is a self-contained part that has a specific function. Section 2.2 describes these modules.

The second part describes the data-flow within Pirate. When the compiler runs, it starts by reading the source program, that is, the program being compiled. The compiler creates a data structure representing this source program. This intermediate representation is processed by the different phases of the compiler. Section 2.3 discusses how the different phases of the compiler process this intermediate representation.

2.2 Compiler Organization

This section describes the overall structure of a typical compiler. Details of the implementation of Pirate can be found in parts II and III. A typical compiler consists of a number of modules. The main modules are:

• Lexer
• Parser
• Symbol Table
• Contextual Analyzer
• Code Generator
• Code Optimizer

The task of each module is described in more detail in the rest of this section.

2.2.1 Lexer

The lexer reads the source program and groups characters together to form tokens or symbols. The lexer is called every time the parser needs a token. For example, the following Lua code snippet:

    function hello()
        print("Hello, World!")
    end

will result in the following tokens:

    token            type
    function         keyword 'function'
    hello            identifier 'hello'
    (                a left parenthesis
    )                a right parenthesis
    print            identifier 'print'
    (                a left parenthesis
    "Hello, World!"  a literal string
    )                a right parenthesis
    end              keyword 'end'

Another task of the lexer is eliminating unneeded text such as comments. The lexer for Pirate is discussed in more detail in section 3.1.

2.2.2 Parser

The parser reads the tokens as delivered by the lexer and groups them together into units called phrases or sentences. The syntactic structure of a programming language is often described by a context-free grammar (cfg). This is a formal specification that describes how phrases in a particular language may be constructed. A very convenient notation for writing grammars is Backus-Naur Form (bnf), so this notation is used here. If a particular phrase is not valid, then the parser reports a syntax error. For example, the following is a small part of the grammar for Lua, defining the structure of a while statement:

    statement → while expression do block end

A while statement begins with the keyword while, then an expression, after that the keyword do, after which a block should follow (a block should be interpreted as a block of statements). At the end there should be the keyword end.

According to this rule, the following program is correct:

    while x > 0 do
        x = x - 1
    end

whereas the following is not:

    while x > 0      -- syntax error: "do" expected
        x = x - 1
    end

During parsing, the parser builds an intermediate representation (ir). In Pirate, this ir is an Abstract Syntax Tree (ast). This tree is used in later phases of the compiler. Figure 2.1 shows the ast for the while statement above.

    while statement
        condition
            greater than
                id: x
                number: 0
        statement list
            assignment
                id: x
                expression
                    minus
                        id: x
                        number: 1

Figure 2.1: The ast for a while statement.

Section 3.2 will discuss the parser in more detail. Constructing an ir is formally not a task of a parser, but because the parser in Pirate constructs an ir, the ast is discussed in this chapter too.

2.2.3 Symbol Table

The symbol table is used to store symbols when necessary. In programming languages that have statically typed variables, the type (and often some other data, too) is saved as an attribute of the variable when the symbol is entered into the symbol table. In Lua this is not the case, because Lua variables do not have a fixed type. The Lua compiler uses symbol tables in two phases: the scope checking phase and the code generating phase. The scope checker is described in Chapter 4. The symbol table and its implementation are described in appendix D.2.

2.2.4 Contextual Analysis

In most compilers there is a component that performs contextual or semantic analysis. Contextual analysis may consist of several different checks. Typical for this phase are:

• Declaration checking
• Type checking
• Scope checking

In many programming languages variables have to be declared before they can be used. If a variable is used before it is declared, the contextual checker can emit an error. In Lua, variables cannot be declared (except for local variables), so this check cannot be carried out in Pirate.

Type checking makes sure that a variable of some type is only assigned values of that type. For example, the following fragment of C code would generate an error:

    int i;        /* declare a variable of type integer */
    i = 1.234;    /* error! i is an integer, not a floating-point number! */

However, because Lua variables do not have a type (only values do), this kind of check is not possible in Pirate. Furthermore, it is possible in Lua to extend the semantics of the language. For example, in normal cases adding two tables will result in an error. In Lua it is possible to prevent this error by creating functions that do add two tables. These functions are called tag methods and are discussed in detail in Chapter 5.

Scope checking can be seen as a part of checking variables for declaration, but in Lua, this is not the case. Lua has an unconventional feature, called upvalues. In short, an upvalue is a special copy of a variable in an enclosing function. The scope checker in Pirate checks whether upvalues are used in the correct way. Upvalues are discussed in detail in Chapter 4.

2.2.5 Code Generator

The code generator module is, of course, responsible for generating code. During a traversal of the Abstract Syntax Tree, code is generated based on the information that is saved in the tree. For each node, instructions are selected that represent the meaning of that node. For example, a while statement can be represented as a particular node in the tree that has pointers to a condition node and a block node. The code template for a while statement is shown in figure 2.2.

The target language of Pirate is Intermediate Machine Code (imc). This imc can then be compiled to Parrot Byte Code (pbc) by the imc compiler (imcc). In fact, it is possible to feed imc to imcc and run it immediately after compiling; in that case, no separate call to the Parrot interpreter is needed. The code generator is described in more detail in Chapter 6.

    while condition do statements end
    →
    begin:
        code for condition
        if condition == false goto end
        code for statements
        goto begin
    end:

Figure 2.2: The code template for a while statement.

2.2.6 Code Optimizer

The code optimizer module performs some optimizations on the generated code. It is implemented as a peephole optimizer. This means that no global optimizations are done; only small instruction sequences are optimized. The code optimizer is described in more detail in Chapter 7.

2.3 Compiler Data-flow

As compilation progresses, the representation of the source program changes. This section gives an overview of the data-flow in the Lua compiler. In general, a compiler can be organized to process the source program in one of two ways. The rest of this section describes these possibilities and the organization chosen for Pirate.

2.3.1 One-pass Compilers

The first possibility is to read the source program only once and immediately generate code. In this kind of organization there is no intermediate representation (ir) of the source program. This kind of organization is commonly known as a one-pass compiler.

2.3.2 Multi-pass Compilers

The other possibility is to read the source program and generate an explicit intermediate representation (ir). Compilers with this type of organization are called multi-pass compilers. In this kind of organization the source program is processed in multiple phases. Each phase has one or more tasks. The representation of the source program may change as the different phases process it. There are many different types of intermediate representations possible. Well known are:

• Abstract Syntax Tree (ast)
• Postfix notation
• Three-address code

An ast is just a tree data structure that contains nodes that correspond to objects in the programming language. The Lua compiler uses an ast as its ir. Section 3.3 describes the ast in more detail. In postfix notation, the operator of an expression is placed after its operands. Using postfix notation removes the need for parentheses. Postfix notation is useful when a machine with a stack-based architecture is targeted. Three-address code is an abstracted form of assembly code. Each instruction has an operator and three addresses; two addresses for the operands and one address for the result location. The target language of Pirate (imc) is a kind of three-address code, so the instructions generated are of this kind.
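As an illustration (this example is not in the original text), the statement x = a + b * c could be written as follows in the latter two representations:

```
Postfix notation (operator after its operands; no parentheses needed):

    a b c * +          -- multiply b and c first, then add a

Three-address code (one operator, two operand addresses, one result address):

    t1 = b * c
    t2 = a + t1
    x  = t2
```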

2.3.3 Choosing an Organization

When an organization must be chosen, there are a number of issues that should be addressed. Some of these are:

• Need for multiple passes. Sometimes, it is not possible to do compilation in one pass. This is the case when identifiers may be used before they are declared.

• Compilation speed. A one-pass compiler will normally be faster, because only one pass over the source program is made. No data structures have to be created (during run-time).

• Memory usage. If the complete source program is represented in memory by an intermediate representation, such as an ast, then this may require much memory. On the other hand, if the compiler itself is large, then this may require a lot of memory too.

• Compiler modularity. Obviously, a multi-pass compiler will be more modular than a one-pass compiler. If the language that is compiled is changed, adapting the compiler is easier, because finding the parts of the compiler that must be changed is simpler.

• Quality of the generated code. The generated code in a multi-pass compiler can be of a higher quality, but this is not necessarily the case. Because all information about the usage of variables is in the intermediate representation, it is possible to take advantage of that. It is also possible for the compiler to do a separate optimizing pass, to remove redundant instructions.

For Pirate, modularity was the most important motive. That is why Pirate was designed to be a multi-pass compiler. The rest of this section will discuss the phases of Pirate in more detail.

2.3.4 Phases of Pirate

Pirate is a multi-pass compiler. Most probably, it would have been possible to write it as a one-pass compiler. However, one of the main design decisions was to keep it modular. The compiler was not built to be a production compiler, nor was it meant to be highly efficient. The Lua compiler uses four passes to generate an executable file from the source program. This means that some form of the source program is processed four times before an executable file is generated. The four phases in Pirate are:

• Parsing Phase

• Scope checking Phase

• Code generation Phase

• Optimizing Phase

This organization is shown in figure 2.3. Each phase will now be described in short.

[Figure 2.3: Data-flow in the Lua compiler — lexer → tokens → parser → Abstract Syntax Tree → code generator → list of IMC instructions → code optimizer]

Parsing Phase

In the first phase of the compiler, the source program is processed by the parser and lexer. This is done interleaved: every time the parser needs a new token, the lexer is invoked. During parsing, an intermediate representation of the source program is built, in the form of an Abstract Syntax Tree. For every language construct, one or more nodes are created. This ast can then be processed by the next phases.

Scope checking Phase

In the scope checking phase, the ast is traversed. All local identifiers are entered into a symbol table. With the help of this symbol table, all identifiers are checked for their scope. After this phase has completed, the symbol table is discarded. The ast is still needed in the next phase.

Code generation Phase

Code is generated during a traversal of the ast. The generated instructions are organized in functions. All functions are stored in a list. For each function, there is one node, and this function node has a list of instructions. The main program is represented as a function called main. After this phase, the ast can be discarded, and the source program is now represented as a list of functions, each function keeping a list of instructions.

Optimizing Phase

The code generator is designed to be as simple as possible. That has implications for the generated code. It is possible that some instructions are redundant, because the code generator does not do any optimizations. For example, if a value is stored in some variable, and the next instruction fetches that same variable, the fetch instruction can be removed.

Part II

Compiler Front-end

Chapter 3

Syntactical Analysis

During syntactical analysis the source program is parsed. The parser verifies that there are no errors in the structure of the source program. This phase can be further divided into two sub-phases. The first sub-phase is the lexer. The lexer reads the characters as they are typed by the programmer. The lexer groups this character stream into units called tokens. The tokens are then processed by the second sub-phase of syntactical analysis: the parser. This chapter will describe the syntactical structure of the Lua programming language and how syntactical analysis is implemented. First, in section 3.1, the implementation of the Lexer is described. Then in section 3.2, implementation of the parser is discussed. After that, section 3.3 describes the construction of the Abstract Syntax Tree that is built during parsing. Section 3.4 will discuss error-handling during parsing.

3.1 The Lexer

The task of the lexical analyzer (often called scanner or lexer) is to read in the source file and produce a stream of tokens. The produced stream of tokens will be used by the parser to do syntactical analysis. The lexer also removes white space and comments. In this way, the parser can assume that all tokens are valid, which makes constructing a parser easier. For example, the parser does not have to check for comments anymore, because the lexer removes them. The rest of this section will discuss what is needed to create a lexer. First, it is necessary to decide which tokens must be recognized. This is done in subsection 3.1.1. After that, the lexer can be built.

3.1.1 Specifying Tokens

For the lexer to be able to recognize groups of characters as tokens, it is necessary to set up some rules. Not every group of characters is a valid token. For example, in many programming languages a variable identifier should begin with a letter (as opposed to a digit). These rules can easily be expressed using regular expressions. Regular expressions are a formal way to describe the set of strings that may be generated. Below is a short and informal introduction to regular expressions.

Regular Expressions

A regular expression is a description of a pattern. In describing these patterns, some formal notation is used. For example, the following regular expression denotes all strings that consist of one or more letters:

[a-zA-Z]+

The brackets denote a class of characters that may be matched. So, the letters a to z, lowercase and uppercase, may be matched. The plus sign denotes that one or more letters may be matched. Lua identifiers may consist of letters, digits and underscores, but must start with a letter or underscore. The regular expression that expresses these rules is shown below.

[a-zA-Z_][a-zA-Z_0-9]*

The asterisk sign means zero or more copies of the preceding expression. A more formal and complete description of regular expressions can be found in any good book on compilers, for example [11].

Keywords

Other tokens may have a special meaning in the source code. These are keywords. For example, in Lua the word while is a keyword. In fact, all keywords in Lua are reserved. That means a keyword may not be used as an identifier for some kind of object.

3.1.2 Building the Lexer

The lexer was automatically generated using the program Flex. Flex takes an input file in a specific format and generates a C source file that can be linked to the main program. Although writing a lexer by hand is quite straightforward, an automatically generated lexer is easier to adapt. Moreover, when one knows how to use Flex, a lexer can be built within only a few hours. When the lexer recognizes a token, depending on the type of token it does one of two things. If the lexer recognized a keyword, it just returns the type of the token. For example, if it recognizes the keyword while, it returns a code representing that keyword. On the other hand, if the lexer recognized an identifier, it first saves the actual value of the identifier in a variable that the parser can access, and then returns a code indicating that it recognized an identifier. The same is true for numbers. More information on Flex can be found in [14].

3.2 The Parser

The parser is responsible for finding the syntactic structure of the source program. Each sentence of the source program consists of a number of tokens. These tokens are delivered by the scanner. As the syntactic structure of the program is recognized, an internal representation is built that will be processed by later phases of the compiler. In a one-pass compiler this is of course not necessary, because there is only one pass through the source program.

3.2.1 Grammars

The syntactic structure of a programming language and programs written in it can be defined by a context-free grammar. A formal definition of a context-free grammar will now be given, to introduce some formal terminology. A context-free grammar is defined by four components:

1. A finite terminal vocabulary. The terminals are the tokens that are returned by the scanner.

2. A finite nonterminal vocabulary. These are intermediate symbols or units, that can consist of other nonterminals and terminals.

3. A start symbol where all derivations of the grammar rules begin.

4. A finite set of production rules. A production rule describes how some set of terminals and nonterminals can be rewritten as another nonterminal. In 2.2.2 an example of a production rule was given, that defined the structure of a while statement.

Figure 3.1 shows the definition of a very simple language, which will be used in the rest of this section. In this way, the discussion of parsing techniques can be more concrete. The language is very limited, because a program can only contain one statement: a print statement or an assignment statement. The grammar is kept small, because it would get too big when more features were added. In this grammar, the word expression, for example, is a nonterminal; that is, that symbol cannot be delivered by the lexer, but it is an intermediate symbol used in a larger context. The + symbol, on the other hand, is a terminal. That symbol can be delivered by the lexer. The start symbol in this little grammar is the symbol program.

program         → begin statement end
statement       → printstatement | assignstatement
printstatement  → print ( string )
assignstatement → id = expression
expression      → number operator number
operator        → +ᵃ

ᵃOnly one operator is defined to prevent the issue of precedence.

Figure 3.1: Grammar for a very simple language.

Let us assume that there is a lexer that can recognize ids and numbers. Some example 'programs' are:

begin print("Hello, World!") end

begin x = 10 + 20 end

There are many parsing algorithms, but there are only two basic strategies. The first is top-down parsing. One could think of top-down parsing as constructing the syntax tree starting at the root of the tree. Top-down parsing is discussed in 3.2.2. The other basic parsing algorithm is bottom-up parsing. Using this algorithm, the syntax tree would be constructed starting at the leaves of the tree. Bottom-up parsing is discussed in 3.2.3. It is important to realize that it is not necessary to actually create a syntax tree. In a one-pass compiler, this will not happen. However, because it makes the discussion somewhat easier and Pirate does create a syntax tree, in this discussion the assumption is made that a syntax tree will be created.

3.2.2 Top-down Parsing

This section describes the basic ideas behind top-down parsing. Let us see how the second example program will be parsed using a top-down parsing algorithm. Let us assume that keywords are recognized and processed by the parser, but they are not considered in this example.

begin x = 10 + 20 end

Top-down parsing starts creating the ast at its root, so the parser starts with creating a program node (figure 3.2(a)). Then, the parser must choose which production for statement must be applied. The statement in this program starts with x, which is recognized by the lexer as an id, so the parser will recognize the statement as an assignstatement (figure 3.2(b)). Then, a node for this id will be created (figure 3.2(c)). The other part of an assignstatement is an expression, so an expression node is created (figure 3.2(d)). Now, the next input token is a number, so a node for this number is created

[Figure 3.2: Top-down parsing — panels (a) through (g) show the ast growing from the program node at the root down to the nodes for x, expression, 10, + and 20.]

(figure 3.2(e)). After that, the operator is recognized, so a node for the operator is created (figure 3.2(f)). Then, another number is recognized, so a node for that number is created too (figure 3.2(g)). Parsers that use a top-down strategy can easily be implemented as recursive-descent parsers. A recursive-descent parser has a subroutine for each nonterminal of the grammar. The parsing then starts with invoking the subroutine that parses the program. Whenever a production rule is applied, the parse subroutine for that production rule is called. For a more detailed example with code, see appendix B. If the production rules are mutually recursive, then the parsing functions will be too. The parser will call a subroutine for each production rule it applies. So, the parser will get 'deeper' in the function call chain every time a function is called. It descends down the syntax tree of the program. Hence the name of this type of parser. Recursive-descent parsers cannot parse just any grammar. The grammar must be in the class of ll(1) grammars. ll(1) stands for a class of grammars that can be parsed reading the source from left to right, producing a leftmost derivation, and using only one symbol of lookahead. This symbol of lookahead is used to decide which production rule should be applied. Recursive-descent parsing is not the only top-down algorithm. However, it is easy to understand for beginners. Other top-down algorithms can be found in some more comprehensive books on compiler theory, such as [10].

3.2.3 Bottom-up Parsing

A bottom-up parser constructs a syntax tree starting at the leaves of the tree. So, instead of starting with creating a program node, the parser now finishes with creating this node. We will parse the same program as in 3.2.2, but now using a bottom-up technique. Again, consider the program:

begin x = 10 + 20 end

Again, keywords are ignored in this example. They are processed by the parser, but nothing is done with them. Now, the first symbol is x, which is recognized by the lexer as an identifier. So, a corresponding node is created (figure 3.3(a)). The parser cannot apply any production rules, so the node is stored somewhere. The next input symbol is 10, which is a number. So, a number node is created (figure 3.3(b)). Again, no production rule can be applied, so this node is stored too. Then, the addition symbol is recognized, so an operator node is created for that (figure 3.3(c)). Still no production rule can be applied. After that, another number is parsed, so again a number node is created (figure 3.3(d)). Now, the last three symbols, 10 + 20, match the right-hand side of the production rule for expression. So, the created nodes for these symbols are combined, or reduced, into one node for expression (figure 3.3(e)). But now, the complete right-hand side of the production rule for assignstatement is matched. So, again the symbols representing this right-hand side are reduced to one node representing an assignment statement (figure 3.3(f)). And again, the node representing an assignstatement can be reduced to a statement, and eventually a program symbol (figure 3.3(g)). Grammars that can be parsed by a bottom-up parser belong to the lr(k) class of grammars. The lr technique reads the input left to right, producing a rightmost derivation. During parsing, there are k symbols of lookahead that are used in making parsing decisions. In practice, k is usually 1. Now, a short description of the fundamental internals of an lr parser is given.
A bottom-up parser has a parse stack. Tokens are parsed and pushed, or shifted, onto the stack until some series of tokens matches the right-hand side of a production rule. If this is the case, the right-hand side, often called a handle, is said to be reduced to the nonterminal of which the handle was matched. The tokens that are parsed are not actually pushed onto the parse stack themselves, but rather a state representing them in a certain context. This is because some terminals may occur in several contexts, so a state combines the type of token with the context it appears in. Now, for the parser to decide whether to shift or reduce, the parser keeps an action table. A goto table defines the next state based on a certain current state. This description of lr parsers is a global one; it does not discuss all kinds of details. These details, as well as a more comprehensive discussion of lr grammar restrictions, can be found in [10] and [11]. Constructing an lr parser is usually done with an lr parser generator. This is because writing an lr parser by hand would be too much work. A well-known parser generator is yacc. This program takes a grammar file and creates C source code for a parser for that grammar.

[Figure 3.3: Bottom-up parsing — panels (a) through (d) show nodes for x, 10, + and 20 being created; panels (e) through (g) show them being reduced to expression, assignstatement and finally program.]
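For the toy grammar of figure 3.1, the shift and reduce steps for the statement x = 10 + 20 can be traced roughly as follows (keywords and the separate reduction of + to operator are condensed for brevity):

```
stack                     remaining input    action
(empty)                   x = 10 + 20        shift x   (id)
id                        = 10 + 20          shift =
id =                      10 + 20            shift 10  (number)
id = number               + 20               shift +
id = number +             20                 shift 20  (number)
id = number + number      (empty)            reduce: expression -> number operator number
id = expression           (empty)            reduce: assignstatement -> id = expression
assignstatement           (empty)            reduce to statement, then program
```

Each reduce step corresponds to one of the node combinations shown in figure 3.3.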

3.2.4 Building the Parser

The parser of Pirate is an lr(1) parser. Actually, it is a lalr(1) parser. lalr is an acronym for Look-Ahead lr. In reality, all lr parsers, except lr(0), are 'look-ahead' lr parsers.

Bison

The parser was generated using Bison, a gnu replacement for yacc¹. The file processed by Bison is a specification of the grammar in a particular format. For a compiler to be useful, the parser must do something more than just parse the source of a program. In Bison (or yacc for that matter), a grammar rule can have a semantic action associated with it. When a certain handle is reduced, the semantic action for that rule is executed. In the Lua parser, the semantic actions construct the Abstract Syntax Tree. The ast is discussed in 3.3. In figure 3.4 a small part of the Lua grammar is shown.

primary : literal  { /* action to handle this literal */ }
        | number   { /* action to handle this number */ }
        ;

Figure 3.4: Small part of the Lua grammar input for Bison.

Note that the production rule in this figure was simplified. Also, the semantic actions are left out; these are discussed in section 3.3. This production rule states that a primary is either a literal or a number. When a literal is reduced to the nonterminal primary, the action for that rule is executed. To build the parser for Pirate, all production rules are entered into a file in a format that Bison accepts. The output of Bison is a C source file representing the parser. This parser can then be invoked by calling the function yyparse². More information on Bison can be found in [2] and [14].

3.3 The Abstract Syntax Tree

The Lua compiler constructs an Abstract Syntax Tree during the parsing phase. This section describes the details of this ast and how it is implemented.

3.3.1 Structure of the Tree

The structure of the ast corresponds to the syntactic structure of Lua. For example, a node for an assignment statement has two child nodes: a node for the left-hand side, and a node for the right-hand side of the assignment. So, for each type of statement and expression there is a special type of node.

3.3.2 TreeCC

The code for the ast and its operations is generated by a program called TreeCC (Tree Compiler Compiler). TreeCC was designed to assist in developing compilers and other language-based tools. Because the source code for an ast

can get quite extensive, it is easy to make errors or forget things. For each type of node, there must be an implementation for every operation that is defined for the ast. The TreeCC program checks that for every type of node, there is such an implementation. More information about TreeCC can be found in [3]. TreeCC works more or less like Bison. A specification for the tree and its operations is written and fed to the program. The program creates a C source file that can be linked to the main program. For each type of node there is an appropriate constructor function.

¹Bison is not 100% compatible with yacc; the programs have some very small differences. See [2] for more details.
²This is the standard name of the parse function. It is possible to change this.

3.3.3 Building the Abstract Syntax Tree

The last section mentioned semantic actions associated with production rules. In Pirate, the ast is built as a semantic action. Figure 3.5 shows how the tree nodes are built within the parser.

primary : literal        { $$ = Literal_create($1); }
        | number         { $$ = Number_create($1); }
        ;

var     : primary index  { $$ = Var_create($1, $2); }
        | (...)

Figure 3.5: Part of the grammar with semantic actions.

The symbols $$ and $1 are identifiers that represent the values on the left-hand side and the right-hand side, respectively. If a literal is reduced to a primary, then a node for a literal is created. The constructor for a literal node is called with the actual value of the literal as an argument. After that, the node representing the primary is used in the constructor for a var node. This node is referenced as $1 in the production rule for var. All nodes of the tree are built in a similar way.

3.4 Reporting Syntax Errors

Error reporting is a very important issue in compiler design. This is because a compiler will not only compile correct programs. On the contrary, many programs it will process are erroneous. The parser generated by Bison automatically has an error function that is called when a syntax error is discovered. The global Compiler State structure, which holds all important global variables of the compiler, has some fields for error reporting. The layout of this data structure can be found in appendix D.1. The fields that are important for reporting syntax errors are shown in figure 3.6. The field linenumber keeps track of the current line number. This field is updated by the lexer. The field linebuffer is a pointer to a buffer holding the current source line. The field tokenpos keeps track of the position in the current source line. Each time a token is parsed successfully, tokenpos is increased by

field                meaning
long linenumber;     current line number during parsing
char *linebuffer;    buffer holding the current line being parsed
int tokenpos;        current position in linebuffer

Figure 3.6: Variables for error reporting

the length of that token. When a syntax error is discovered, the error function displays an error message. This message prints the current line number and the current source line, and it can indicate where in the source line the error was discovered. An example is shown below. Input program (line numbers are displayed only for the purpose of this example):

1: function f()
2:     x == 1 + 2
3: end

The resulting error message will display:

Error in line 2: syntax error
x == 1 + 2
  ^

So, the erroneous source line is printed, together with a mark pointing at the erroneous token. When an error is detected, it is usually desirable to attempt to recover from the error and continue parsing, to discover more errors (if any). Bison has a mechanism for recovering from errors. This mechanism uses special error productions that may be inserted into the grammar. When an error production is applied, the parser can try to synchronize and continue parsing. However, these error tokens cannot be inserted just anywhere, because they may introduce errors in the grammar³. The Lua grammar has only one error token. In the future, a better error recovery mechanism is planned. Because the error recovery mechanism in Pirate is not very good, it may happen that one syntax error is caught multiple times, so that the compiler reports two or more errors, while there was just one error in the source.

³These errors are called reduce/reduce conflicts. More information about this type of conflict can be found in [14].

Chapter 4

Scope Analysis

Scope analysis verifies that all used variables are valid. In languages that demand declaration of variables before use, this analysis is more complex. Because Lua does not have declarations of global variables, this analysis would not have been necessary. However, Lua does have another construct, called upvalues. In section 4.1 this feature and its effects will be discussed. During scope analysis, it is necessary to store the names of identifiers, and possibly some other information about these identifiers. In Pirate, a symbol table is used to store this information. Section 4.2 will discuss symbol tables.

4.1 Lua Scope rules

Lua has few but unconventional scope rules. A variable in Lua is global, unless it is explicitly declared local. Parameters work as local variables. A function body has access to its own local variables and all global variables. However, a global variable can be hidden by a local variable with the same name, as is shown below.

a = 123
function f()
    local a = "hello"
    print(a) -- will display "hello", not 123
end

A function cannot access local variables from an enclosing function directly. That is, a nested function cannot access local variables from the function in which it is nested. So, the following is not allowed:

function f()
    local a
    function g()
        print(a) -- cannot access local in enclosing scope!
    end
end

A global variable called 'a' (if any) would be hidden by the local declaration local a. However, a function can access the value of a local variable from the immediately enclosing function, using an upvalue. Accessing an upvalue is done with special syntax: '%' name. Thus, the following is legal:

function f()
    local a
    function g()
        print(%a) -- OK!
    end
end

Note that an upvalue can only be used for accessing variables in the immediately enclosing function. For transferring values over more levels of nesting, each level needs a copy of this upvalue, as is shown below:

function f()
    local a = 123
    function g()
        local a = %a
        function h()
            local a = %a
            function i()
                print(%a) -- this will print "123"
            end
        end
    end
end

Using an upvalue is a method to access the value of a variable. When the function in which it appears is instantiated, the value of the upvalue variable is frozen. This means that the variable is read-only. So, the following is not allowed:

function f()
    local a
    function g()
        %a = 123 -- cannot assign to upvalue!
    end
end

However, assigning to fields of an upvalue of a table is allowed. This is because the variable does not hold the actual table, but a reference to it. Although this reference is fixed when function g is instantiated, the data it references is not. This means that its fields can be accessed in the normal way. So, the following is legal:

function f()
    local a = {} -- create a table
    function g()
        %a.some_field = 123 -- %a is fixed, but its fields are not!
    end
end

4.2 The Symbol Table

The symbol table is used during scope analysis. Its use is somewhat unconventional, because in most compilers a symbol table is used to store all identifiers, whereas Pirate uses it only for local identifiers. A symbol is stored as a string, together with the current scope level.

4.2.1 Use of the symbol table

If Lua did not have the upvalue feature, then no scope checking would be needed. A variable would either be global, if no local declaration was present, or, if there was a local declaration, the variable would be local. However, because Lua has upvalues, and there is a restriction when accessing upvalues, this restriction must be checked. Scope checking is done as described in the rest of this section. One can view the main program as having a base scope level. Let us call this base scope level L for a moment. A function that is 'nested' in this main program has scope level L+1. A function that is nested within another function that has scope level n, then has scope level n+1. Below is a simple example to make this clear.

local x = 1 + 2          -- x has base scope level L (x is local)
function f()
    local y = "hello"    -- y has scope level L+1
    function g()
        local z = 123    -- z has scope level L+2
    end
end

Note that global variables, unlike local variables, always have base scope level. Understanding this scope level organization, implementing a scope checker is very easy. The scope checker does a traversal of the abstract syntax tree that was created during parsing. Whenever a new function is defined, a new scope level is created in the symbol table. When an identifier is seen, its name is looked up in the table. If it is not found, then it must be a global (as global identifiers are not entered). On the other hand, when the identifier is found, the scope level of the symbol is compared to the current scope level. If they are not equal, then the scope checker has found a scope error. Upvalues are handled in a similar way. An upvalue is looked up in the symbol table. If it is not present, then that will result in an error. Otherwise, the scope level of the symbol should be 1 less than the current scope level. If this is not the case, then an error is found.

4.2.2 The Symbol Table

There are many possible implementations for a symbol table. In choosing an implementation, one should consider the speed of operations and the number of identifiers that should be stored. Some possible implementations are listed below.

• List

• Binary Tree

• Hash Table

The listed alternatives will now be discussed in short.

List

A list is one of the simplest data structures. It can be implemented using an array, but in most languages arrays are of a fixed size, which limits the number of entries. To avoid that, a linked list can be used. When using a list as a symbol table, two organizations are possible: unordered and ordered.

Unordered List

Entering a new identifier is straightforward and takes constant time: the identifier is placed into the next available position. However, finding an identifier is too slow when there are many entries. The running time of finding an identifier is O(N).

Ordered List

When using an ordered list, finding an identifier can be done using a binary search. The running time of a binary search is O(log N), so finding an identifier is faster than in an unordered list, especially when there are many entries. However, entering a new identifier is now costly. The identifier must be entered into the right position and, if an array is used, all following entries must be moved forward one position.

Binary Tree

A binary tree is a conceptually simple data structure. An entry is stored as a node, and each node (except for the leaf nodes) keeps a pointer to two child nodes (hence, a binary tree). A binary tree is interesting in that it combines fast insertion with fast searching of identifiers. Both operations have an average running time of O(log N), assuming the tree is balanced! However, if the tree grows like a linked list, then performance will suffer. In that case, both entering and searching will take more time than using a list.

Hash Table

A hash table is a table of fixed size, in which each entry is mapped to a particular position. Mapping is done using a hash function. A hash function should be fast and should avoid clustering. That is, similar (but not equal) names should map to different positions in the table. Because there is a limited number of entries, it is unavoidable that multiple items will yield the same hash value (with enough identifiers being entered, of course). There are a number of methods to resolve these collisions. One of them is to rehash the result with some other hash function. However, the chance that the new position is already taken too is not small. A simpler solution is to link all entries that have the same hash value in a linked list. However, this results in a small performance hit, especially when the number of identifiers is significantly bigger than the size of the table. On the other hand, if each bucket contains a list with a constant number of entries, then access time is more or less constant, which is better than O(log N). Implementations of these data structures, their advantages, disadvantages and their properties can be found in any book on data structures. The symbol table in Pirate is implemented as a hash table. The compiler keeps track of the number of identifiers returned by the lexer. Because the symbol table is created after the parsing phase, it can be created with a size that is based on the number of identifiers counted by the lexer. This way, the number of collisions can be kept small, so that entries can be accessed in more or less constant time. Appendix D.2 describes the implementation of the symbol table used in Pirate.

4.3 Reporting Scope Errors

Unlike the parser, the scope checker has no access to a variable keeping track of the current line number. That is, the variable in the global compiler state object is invalid, because its value is the number of the last source line. So, there must be some other way to find out the line number at which a scope error was detected. Fortunately, during the construction of the ast, this information is saved in the tree. Each node has a field that keeps the line number of the syntactic unit for which the node was created. If a scope error is detected, a function is called that displays the error. The format of the error message is a bit different from the error message that is displayed for a syntax error. An example is shown below. Input program (saved in a file called 'error.lua'; line numbers are displayed only for the purpose of this example):

1: function f(y)
2:   x = %y -- attempt to assign the upvalue y to x
3: end

The resulting error message will display:

Error in file: 'error.lua', line 2:
Upvalue must be global or local to immediate outer scope
Inaccessible variable: 'y'

When reporting scope errors, it is not possible to print the original source line, because the source is no longer available. It would cost too much memory if the source were saved in the tree as well. This is one disadvantage of implementing the scope checker as a separate pass in the compiler. However, simplicity of implementation was considered more important than preventing this disadvantage. Furthermore, the source line number is printed, so that should help the Lua programmer on her way.

Part III

Compiler Back-end

Chapter 5

Parrot Run-Time Environment

5.1 Introduction

Before generating code, it is necessary to have a look at the run-time environment in which the generated code must be executed. Because Lua is targeted to a 'strange' machine that was not specially designed for Lua, we must design support routines that emulate a Lua virtual machine. First, we need to decide how variables will be represented in the Parrot Virtual Machine. This is discussed in section 5.2. Then decisions must be made on where to store these variables. This is done in section 5.3.

5.2 Implementation of Lua Data Types

This section will discuss how Lua variables will be represented in the Parrot Virtual Machine. Parrot is designed to handle dynamic languages, such as Lua. For that purpose, there is a special built-in data type. This section gives an introduction to Parrot Magic Cookies (pmc) and how they work. After that, implementation of the Lua data types as pmcs is described.

5.2.1 Parrot Magic Cookies

A Parrot Magic Cookie (pmc) is a completely abstracted data type. It can be anything, and it has a standard way to handle the data in the pmc. A pmc should be seen as a data container together with a set of operations on that data. The data can represent a number, a string, a function or something else. The data may also be some kind of aggregate, like an array or a hash. A pmc behaves as it is programmed by its implementor. When a new language is targeted to Parrot, it may be necessary for the compiler writer to create new data types that behave in a specific way. Section 5.2.3 discusses the pmc types that were created for the Lua language. pmc objects can change their own type at run-time. This makes implementing a dynamically typed language much easier. For example, when one assigns a string to a LuaNumber, the LuaNumber pmc will change its type to LuaString. Unlike 'primitive' data types, like numbers, pmcs are not stored by value, but by reference. This is important to realize, because it has consequences for the code that must be generated to get the desired behaviour.

5.2.2 PMC operations

What makes all pmc objects equal is their set of operations: every pmc object has the same set. However, a pmc does not necessarily have an implementation for every operation. For example, the LuaString pmc type does not have an implementation for the invoke operation, but the LuaFunction pmc type does. When a programmer tries to invoke a LuaString object, a run-time error will result. Each pmc has a table with functions that can be invoked. This table is called a vtable. When a vtable function is called, the appropriate function is called for that pmc. For example, this means that when the increment operation is called for a PerlNum object (which will represent a Perl number in Parrot), its numeric value will be incremented. When the same operation is called for a PerlString object, the value of the string is incremented, i.e. "a" will become "b", because "b" is the successor of "a". This behaviour is specific to Perl and is defined by the pmc implementor. A pmc does not have to be a scalar; it can also be a more complex type, such as an array or hash table. For these kinds of objects, there are special keyed instructions. In this way it is possible to index a pmc with numbers (for arrays) or with strings (for hash tables). These operations are, of course, not implemented for data types like a number.
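A minimal sketch of vtable dispatch, assuming a much-simplified layout (real Parrot vtables contain many more operations than shown here): each pmc carries a pointer to its operation table, and the same generic increment call behaves differently per type, just like the PerlNum/PerlString example above.

```c
/* Sketch of vtable dispatch (illustrative, not Parrot's real layout).
 * Every pmc points at a table of operations; calling an operation
 * dispatches through that table. */
struct pmc;

typedef struct vtable {
    void (*increment)(struct pmc *self);
} vtable;

typedef struct pmc {
    const vtable *vt;     /* which "class" this pmc currently is */
    double num;
    char   str[16];
} pmc;

/* A numeric pmc increments its numeric value... */
static void num_increment(pmc *self) { self->num += 1.0; }

/* ...while a string pmc takes the successor of the string: "a" -> "b". */
static void str_increment(pmc *self) { self->str[0] += 1; }

static const vtable num_vt = { num_increment };
static const vtable str_vt = { str_increment };

/* The generic operation: callers need not know the concrete type. */
static void pmc_increment(pmc *p) { p->vt->increment(p); }
```

A pmc changing its own type at run-time then amounts to swapping its `vt` pointer to another vtable.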

5.2.3 PMC types

As said before, a pmc can be anything. A pmc is not written by an application programmer, but more likely by a language implementor or by the designers of Parrot. The Parrot distribution has a number of pmc classes included, for example a Sub class to represent a subroutine. Because the standard pmc classes that come with the Parrot distribution do not behave as Lua variables, it was necessary to implement some pmc classes specially for Pirate. A PerlNum pmc, for example, may behave exactly like a LuaNumber, but all Lua variables should be initialized with a tag property. This is programmed into the pmc, so all Lua data types must be implemented as special pmc classes to allow for that. The rest of this section will describe the pmc classes that were written for Pirate.

5.2.4 LuaFunction

A function in Lua is not only a representation of a function, but also a container for its upvalues. When a function is instantiated, all upvalues are fixed and stored somewhere. Because each instance of a function has its own instances of upvalues, it makes sense to store these upvalues in the function object. So, a LuaFunction pmc consists of two parts: a part that holds a function, and a part that holds the values of its upvalues. The function is just an instance of the built-in Sub pmc. The upvalues are stored in a hash table. The interface for the LuaFunction pmc is in fact the same as the combined interfaces of a Sub pmc and a Hash pmc.

5.2.5 LuaTable

A LuaTable is an associative array. This means that it can be indexed with numbers and with other values, like strings. Parrot does have an aggregate type that can be indexed with numbers (PerlArray) and an aggregate type that can be indexed with strings (PerlHash), but neither of them can be indexed with both1. A LuaTable is implemented as a pmc that keeps two fields. One field is a PerlArray, the other is a hash table. Whenever a LuaTable is indexed with a string, it stores the value in the hash part. Likewise, if it is indexed with a number, the array is used.
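The two-part storage could be sketched like this. The fixed sizes and the trivial linear-probing hash part are simplifications for illustration only; the real LuaTable pmc delegates to full PerlArray and hash pmcs.

```c
#include <string.h>

/* Sketch of LuaTable's two-part storage: numeric keys go to an array
 * part, string keys to a hash part. Illustrative only. */
#define ARRAY_SIZE 16
#define HASH_SIZE  16

typedef struct {
    double      array[ARRAY_SIZE];   /* array part: numeric keys */
    const char *keys[HASH_SIZE];     /* hash part: string keys   */
    double      vals[HASH_SIZE];
} luatable;

/* A numeric index selects the array part... */
static void table_set_num(luatable *t, int i, double v) {
    t->array[i] = v;
}

/* ...while a string index selects the hash part. */
static void table_set_str(luatable *t, const char *k, double v) {
    unsigned h = (unsigned)k[0] % HASH_SIZE;
    while (t->keys[h] && strcmp(t->keys[h], k) != 0)
        h = (h + 1) % HASH_SIZE;     /* linear probing on collision */
    t->keys[h] = k;
    t->vals[h] = v;
}

static double table_get_str(const luatable *t, const char *k) {
    unsigned h = (unsigned)k[0] % HASH_SIZE;
    while (t->keys[h] && strcmp(t->keys[h], k) != 0)
        h = (h + 1) % HASH_SIZE;
    return t->keys[h] ? t->vals[h] : 0.0;   /* 0.0 stands in for nil */
}
```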

5.2.6 LuaString

The LuaString pmc is implemented as a subclass of the PerlString pmc. Some methods were redefined to get the desired effects. For example, when assigning a number to a LuaString variable, it will change its type to a LuaNumber instead of a PerlNum.

5.2.7 LuaNumber

The LuaNumber pmc is implemented as a subclass of the PerlNum pmc. Just as in the LuaString pmc, some methods were redefined. If a LuaNumber is assigned a string, it will change its type to a LuaString.

5.2.8 LuaNil

Any variable that is used in Lua, but was not given a value, has the value nil. The LuaNil pmc can be seen as an uninitialized variable. It changes its own type whenever a value is assigned to it. This means that it will change to a LuaString when a string is assigned to it, and to a LuaNumber if a number is assigned to it.

1A PerlHash can be indexed with a Key pmc that represents a number or a string, but then a Key must be created first.

5.2.9 LuaUserdata

In the original Lua interpreter, the userdata type of Lua represents a C pointer. At the moment of writing, it is not clear whether it is possible, and makes sense, to store C pointers in Parrot. If it does, a pointer can easily be implemented as a LuaUserdata pmc. However, at the moment there is no such implementation.

5.3 Run-Time Storage Organization

All values in the program should be stored somewhere. Where a value is stored depends on its scope and on whether the value is an actual parameter (often called an argument). This section discusses the different types of storage that are available on Parrot, and how they are used.

5.3.1 Registers

Whenever something must be done with a variable, it must be stored in some place that the processor2 can easily access, so the operation can be executed fast. For this purpose, Parrot has a set of registers. All instructions processing data use registers as their operands.

5.3.2 Storing Global Variables

All variables are global, unless declared local or as a parameter. This means there must be some space to store these global variables, a place that can be accessed throughout the whole program. On a real (hardware) machine, these globals would be stored in main memory. The Parrot Virtual Machine stores its global variables in a global symbol table, which Parrot can access fast.

5.3.3 Storing Local Variables

Local variables are different from global variables in that their scope is limited to some block. Parrot has facilities to store local variables in an area called a scratchpad. Whenever a new scope block is entered, a new scratchpad is created. All local variables can be stored into such a scratchpad. When the block is closed, the scratchpad can easily be disposed of. These scratchpads are stored in a stack-like structure.
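A sketch of the scratchpad idea: scopes as a stack of pads, with lookup walking from the innermost pad outward. The structure and names are illustrative, not Parrot's actual scratchpad implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of scratchpads: entering a block pushes a pad, leaving it
 * pops (and disposes of) the pad in one step. Illustrative only;
 * the fixed capacity per pad is an assumption of this sketch. */
typedef struct pad {
    const char *names[8];
    double      vals[8];
    int         count;
    struct pad *outer;      /* the enclosing scope's pad */
} pad;

static pad *pad_push(pad *top) {
    pad *p = calloc(1, sizeof *p);
    p->outer = top;
    return p;
}

static pad *pad_pop(pad *top) {
    pad *outer = top->outer;
    free(top);              /* the whole scope is disposed of at once */
    return outer;
}

static void pad_store(pad *p, const char *name, double v) {
    p->names[p->count] = name;
    p->vals[p->count++] = v;
}

/* Lookup: innermost pad first, then outward through enclosing scopes. */
static const double *pad_find(const pad *p, const char *name) {
    for (; p; p = p->outer)
        for (int i = 0; i < p->count; i++)
            if (strcmp(p->names[i], name) == 0) return &p->vals[i];
    return NULL;
}
```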

5.3.4 Storing Parameters

When a function is called with a set of actual parameters, often called arguments, these arguments should be stored in a place that the called function can easily access. There is a small number of alternatives for storing parameters.

2Parrot is a virtual machine, which is in fact a processor in software.

First, if the machine has registers, arguments may be passed using these. It can happen that there are more arguments than registers, so there must be some way to handle overflow arguments. The second technique is storing the arguments on the run-time stack. This is a simple technique to implement: the caller just pushes its arguments onto the stack. The callee can then pop them off the stack and store them into registers, or, if the machine instructions allow it, access the values on the stack directly. The third technique mentioned here is putting the arguments in some statically allocated storage. Although very simple to implement, this technique is very impractical, because it makes writing recursive functions impossible. The next section will discuss parameter handling and the calling conventions for Parrot in more detail.

5.4 Parameter-Passing Modes

There are several parameter-passing modes. Two will be discussed here briefly: passing parameters by value and by reference (or by address). When a parameter is passed by value, the called function gets a copy of the value of the argument. When a parameter is passed by reference, a reference to the value of the argument is passed, so both the caller and callee have a reference to the same object. All arguments in Lua are passed by value. This means that when a function is called with some parameters, the function receives copies of those values. However, one thing is very important to note: LuaTable objects are stored as references. This means that when a LuaTable object is passed as an argument, the reference to the table must be copied, not the table itself. Since all Lua data types are implemented as pmcs, some attention must be paid to how pmcs are handled by Parrot. In section 5.2, it became clear that pmcs are stored as references. That is, the actual object is not stored in a register, but a reference to it is. When assigning one pmc to another pmc register, it is not the actual object that is copied, but only the reference pointing to it. So, when a pmc is passed as an argument, it is effectively passed by reference. For LuaTable objects this is fine, but not for the other Lua data types. This has implications for the instructions that must be generated for a parameter list. Chapter 6 will discuss code generation.
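The difference between copying a reference and making a real copy can be sketched as follows. The types and helper names are hypothetical; Parrot's actual clone instruction operates on pmcs.

```c
#include <stdlib.h>

/* Sketch: why by-value parameters need a clone. Assigning a pmc
 * register copies only the reference, so both registers point at the
 * same object; clone() makes an independent copy. Illustrative types. */
typedef struct { double value; } pmc;

static pmc *pmc_new(double v) {
    pmc *p = malloc(sizeof *p);
    p->value = v;
    return p;
}

/* Register assignment: both names now refer to the same object. */
static pmc *assign_ref(pmc *src) { return src; }

/* Clone: an independent copy, as needed for by-value parameters. */
static pmc *pmc_clone(const pmc *src) { return pmc_new(src->value); }
```

With `assign_ref`, a write through one name is visible through the other; with `pmc_clone`, the original is unaffected. This is exactly the distinction the parameter-list translation in chapter 6 has to make at run-time.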

5.5 Tag Methods

As described in section 1.2.5, tag methods can be seen as a kind of event or exception handlers. How tag methods will be implemented definitively is not decided yet. Parrot will have built-in support for events and exceptions. However, nothing is done at the moment for events, so no design for that can be made. Implementing tag methods as exceptions is also an option, although exceptions are not really meant for this. If the implementation of events in Parrot allows it, the event system will be used for implementing tag methods. However, at the moment of writing, events are not implemented in Parrot. The same is true for exceptions. Although it is quite an important issue, tag methods will be left out of the current implementation of the compiler. This means that Lua code that uses tag methods cannot be executed on Parrot at this moment. However, when support for events or exceptions is implemented, it can be.

Chapter 6

Generating Code

When all compile-time checks are done, instructions can be generated that instruct the computer to do what the programmer wants it to do. Code generation is done during a traversal of the Abstract Syntax Tree. During a visit of each node, instructions are selected that represent the meaning of that node. Because each construct in Lua is represented by a particular node in the ast, the code generation algorithms are isolated in the functions that belong to these nodes. So, the function that generates instructions for an if statement has no knowledge of how code is generated for function calls, and it does not have to. In this way, the code generating functions act like black boxes. The code generator has a number of internal stacks. These are used to store intermediate results that will be used in other parts of the ast. The stack module is described in more detail in appendix D.4. There is also a large number of flags in the code generator. These are used to keep track of the context. In section 6.1 the Intermediate Machine Code (imc) language is described. This is the language that Pirate targets. The rest of the chapter discusses how the most important Lua constructs are translated to the target language. Some translation templates are not shown here because they are trivial. Those details can be found in the source code.

6.1 The Target Language

6.1.1 The role of IMC

Imcc is the intermediate code compiler for Parrot. The most important reason to use imcc is that it takes care of register allocation. Because Parrot has only a limited number of registers, it is harder to target Parrot directly than to target imcc. Imcc offers the Parrot programmer unlimited symbolic registers, which makes code generation much easier. Imcc maps these symbolic registers to real registers. Whenever no real register is available, imcc takes care of register spilling. This means that the contents of a register that is not needed at the moment are temporarily stored somewhere else.

Imcc can also do some optimizations. Optimizing is the subject of chapter 7. Detailed information about imcc can be found in appendix E.1. Knowing some details of the target language is interesting for two reasons. Firstly, it gives insight into the possibilities of the (virtual) machine that will execute the instructions. Knowing the features and restrictions of the available instruction set is necessary to understand the design decisions that were made during construction of the compiler. Secondly, it makes reading the sections that discuss translation much easier, because it is not always clear what is meant by a particular instruction name, or what the syntax of that instruction is.

6.1.2 Parrot instructions

Sometimes, there is no imc equivalent for a particular Parrot instruction. For these cases, it is possible to use Parrot instructions directly inside an imc block. The Parrot instructions used in the code generator are described in appendix E.2. The following sections will describe how each construct in the Lua language is translated to IMC.

6.2 Translating Control Structures

Translating control structures is quite easy. Control structures are basically a set of labels and jump instructions that must be emitted at the right position. Because the nodes for expressions and statements are independent of the control structures, the latter do not need any knowledge of the former.

6.2.1 break Statements

A break statement is used to terminate the innermost enclosing loop. A loop is a while, repeat or for statement. An example of a break statement is given below.

x = 1
while x < 10 do
    if x == 5 then
        break -- execution will continue at the print statement
    end
    x = x + 1
end
print(x)

Conceptually, a break statement is quite simple. Control of execution just branches to the end of the while loop (or any other kind of loop, for that matter). However, a break breaks out of the current loop, and it is of course possible to write nested loops. When translating a break statement, the jump instruction should jump to the right position, that is, the end of the current loop. A simple pointer to the end of the loop is not enough, because it would be overwritten when a new (nested) loop is translated. Fortunately, because of the nature of a nested structure, the solution to this is quite simple. The natural flow of control in nested structures is Last In, First Out (LIFO). The perfect data structure for that is a stack. Each time a loop statement is translated, the end label of that loop is pushed onto the stack. When a break statement is translated, the end label is popped off the stack, and a jump instruction is emitted. In this way, it is possible to handle situations like the one below.

x = 1
while x < 10 do
    y = 1
    while y < 5 do
        if y == 2 then
            break -- jump to "if x == 5 then"
        end
        y = y + 1
    end
    if x == 5 then
        break -- jump to "print(x)"
    end
    x = x + 1
end
print(x)
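The label stack described here can be sketched as follows; the fixed capacity and function names are illustrative, not the code generator's real interface.

```c
/* Sketch of the code generator's label stack for break statements:
 * each loop pushes its end label on entry and pops it on exit, so a
 * break always sees the innermost loop's end label on top. */
#define MAX_DEPTH 32

static const char *labels[MAX_DEPTH];
static int depth = 0;

static void enter_loop(const char *end_label) { labels[depth++] = end_label; }
static void leave_loop(void)                  { depth--; }

/* A break jumps to whatever label is currently on top of the stack. */
static const char *break_target(void)         { return labels[depth - 1]; }
```

In the nested example above, the inner while pushes its end label over the outer one, so the inner break targets the inner loop's end; after the inner loop pops its label, the outer break again sees the outer end label.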

6.2.2 return Statements

A return statement consists of the keyword return and an optional expression list. When there is an expression list, code is generated for this expression list first. The registers holding the results of the expressions are pushed onto an internal stack in the code generator. Then, when code for all expressions has been generated, a save instruction for each of the saved registers is generated. For each save instruction, a register is popped off the internal stack to be emitted as an argument for the save instruction.

return [exprlist] →
    code for each expression in exprlist
    for each expression in exprlist:
        save $Px
    ret

6.2.3 if Statements

The translation of if statements is not as easy as it first seems. One would say it is easy, because for every if statement there are only two options: either the condition is true, or it is false. However, because Lua also allows for optional elseif parts, it is somewhat more complex. The code template is straightforward, but a few things must be noted. First, Lua allows for unlimited elseif parts in an if statement. At the end, there may be an else part. Before code generation for an if statement starts, all labels are generated. This is done because at a number of places there may be forward jumps. Because the names of these labels are known, these jump instructions may be emitted before the actual label is emitted. Do take notice of the fact that some parts of this template may be left out. This is because the elseif and else parts are optional. If these are not present, then some instructions are not emitted.

if expr then block {elseif expr then block} [else block] end →
    code for condition
    if condition == false goto falselabel
    code for block (if true)
    goto end                   -- if there is an elseif or else part
falselabel:                    -- if there is an elseif or else part
    code for elseif statements -- see 'Translating elseif Statements'
    code for else block        -- if available
end:

Translating elseif Statements

elseif condition then block →
    code for condition
    if condition == 0, falselabel
    code for block
    goto end    -- this end label was generated in the main if statement
falselabel:
    code for other elseif statements

6.2.4 while Statements

A while statement is one of the loop constructions available in Lua. When a while loop is entered, first the condition is checked. If it is false, control jumps to a label at the end of the while loop. Otherwise, the next instructions, which are the instructions generated for the block, are executed. At the end of these instructions is a goto instruction that jumps back to the beginning of the while loop.

while condition do block end →
begin:
    code for condition
    if condition == false goto end
    code for block
    goto begin
end:

6.2.5 repeat Statements

A repeat statement is almost the same as a while statement, except for the fact that the instructions for block are executed at least once. First a label at the top is emitted. Then, code is generated for the block of the repeat statement. After that, code is generated for the condition. As long as the condition is false, control jumps back to the label at the top; the loop runs until the condition becomes true. The last code that is emitted is an end label. This end label may seem redundant, because no jump instruction to it is generated here. However, because it is possible to break out of the repeat statement, this label can be jumped to from within the block (see 6.2.1 for details on breaking out of loops).

repeat block until condition →
begin:
    code for block
    code for condition
    if condition == false goto begin
end:

6.2.6 for Statements

For loops are the most complicated loop statements. This is because code for initializing condition variables is integrated into a for loop, whereas this is separate in other types of loop statements, such as while statements.

for name = initexpr, limitexpr [, stepexpr] do block end →
    .namespace namespace id
    .local LuaNumber
    code to evaluate initexpr, result in init
    name = init
    code to evaluate limit expression, result in limit
    code to evaluate step expression, result in step
begin:
    if step <= 0, test2
    if name <= limit, block
    goto end
test2:
    if name < limit, end
block:
    code for block
    name = name + step   -- if stepsize is 1, then "inc"
    goto begin
end:
    .endnamespace namespace id

6.3 Translating Assignments

This section describes how assignment statements are translated. However, it must be noted that the given implementation is not how it should be. That is, as section 1.2.5 described, when a global variable is accessed or assigned to, that should be handled as an event. But because there is no easy way to implement that at the moment, a more basic implementation without tag methods was chosen. Assignment statements have an unconventional form in Lua. Not only is it possible to do an assignment to one variable, a whole list of variables can be assigned values in one statement. Thus, the following is legal:

i = 3
i, a[i] = 4, 100

In the example above, i is given the value 4 and a[i] is assigned the value 100. In assignment statements, first all values on the right side and indices (if any) on the left side are evaluated. Then the variables on the left are assigned these values. A naive translation would translate this example as two separate assignment statements:

i = 4
a[i] = 100

However, this is not correct. Now, the value 100 is assigned to a field at index 4, but it should be stored at the index denoted by the old value of i, which is 3. To handle these cases correctly, the code generator must first generate instructions to evaluate all expressions on the right-hand side, and all index expressions on the left-hand side. Because the number of expressions may be different from the number of variables, first the number of variables is counted. If there are any index expressions present, then code for evaluating these expressions is generated. The result registers are stored on an internal stack of the code generator. After that, the expression list is traversed, decrementing the number of variables at each expression node. This way, no expression is evaluated unnecessarily.
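The need for temporaries can be illustrated in C, with plain variables standing in for registers; `multi_assign_naive` and `multi_assign_correct` are hypothetical helpers, not generated code.

```c
/* Sketch of why  i, a[i] = 4, 100  needs temporaries: the index a[i]
 * must be evaluated with the OLD value of i, before any assignment
 * takes place. */
static int i;
static int a[10];

static void multi_assign_naive(void) {
    i = 4;          /* wrong: i changes first...            */
    a[i] = 100;     /* ...so the index is the NEW i (4)     */
}

static void multi_assign_correct(void) {
    int idx = i;    /* evaluate the index expression first  */
    int v1 = 4;     /* evaluate all right-hand sides        */
    int v2 = 100;
    i = v1;         /* only then perform the assignments    */
    a[idx] = v2;
}
```

Running both from the state i = 3 shows the difference: the naive version stores 100 at index 4, the correct version at index 3, as required by Lua's semantics.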

6.4 Translating Functions

Translating functions is one of the hardest parts of the code generator. Basically, a function is just a branch instruction, but if it is to be called from other languages, some conventions must be obeyed. Because Lua running on Parrot should be callable from other languages, and should also be able to call functions written in other languages, Lua functions should adhere to the Parrot calling conventions. Furthermore, Lua has a special feature, called upvalues. Upvalues were discussed in detail in chapter 4. Because upvalues are owned by a function, the implementation of upvalues is discussed in this section, too. The rest of this section will discuss how code is generated for functions. First, the Parrot calling conventions are discussed. Then, translation of function definitions will be described. After that follows the discussion of the translation of function calls. Last but not least is the discussion concerning upvalues.

6.4.1 Parrot Calling Conventions

Calling conventions are rules that specify how a function is called, and where the parameters are passed. Calling conventions make sure that functions compiled with different compilers, or even written in different languages, are able to call each other. In chapter 5, for example, several different ways of passing parameters were discussed. Suppose a function f is compiled with a compiler that emits instructions to store all parameters on the run-time stack. If another function g is then compiled by a compiler that emits instructions to get parameters from registers, these functions f and g are not able to work together. So, it is good to have some standard way of passing parameters. That way, code that is written in different modules, or even different languages, can cooperate. The Parrot calling conventions consist of two parts: calling functions and returning from functions. The calling conventions are described in appendix C.1. The conventions for returning from functions are described in appendix C.2.

Exporting Functions

For a function to be callable from another module, it is necessary to obey some rules. Most importantly, the function must adhere to the Parrot Calling Conventions. All function calls in Lua do follow the calling conventions, but not the returning conventions. This is because of the way Lua handles assignment statements. According to the calling conventions, return values should be stored in registers. However, Lua functions save their return values onto the run-time stack. This way, another module would not be able to receive values returned by a Lua function. Another point of attention is function names. All global labels and subroutine entry points should start with an underscore. So, if a Lua function should be callable from another module, it should behave completely according to the Parrot calling conventions and its name should start with an underscore. The easiest way to accomplish this is to create function stubs. These are wrapper functions that call the actual function. These wrapper functions have the same name as the original functions, except that they start with an underscore. Furthermore, they restore all return values from the run-time stack and store them into registers, so the calling function can find them where they are expected. Only global functions will be exported, so no stubs will be created for locally declared functions. During the code generating phase, the names of all global functions are collected into a data structure. For that purpose a symbol table is used. However, the implementation of the symbol table does not support looking up symbols without knowing their names. For that purpose, an iterator structure was designed. This allows for iterating over all entries in a symbol table. The iterator and its operations are described in appendix D.3. Below is an example of how a function is exported. The function returns some numbers.
function f()
    return 1, 2, 3
end

The stub for this function should be (in IMC):

.sub _f
    P0 = global "f"   # get current value of f
    invoke            # invoke it
    restore P5
    restore P6
    restore P7
    ret               # return to calling function
.end

As described earlier, return values are pushed onto the run-time stack in reversed order, so the first return value is at the top of the stack. This value will be restored first, so it will be restored into the first return register.

However, the above is not quite complete. Because there may be 2 or fewer return values on the run-time stack, the code above will not run correctly1. The number of returned values is, according to the Parrot calling conventions, in register I3. So, for each restore instruction, we must check whether it is still valid. Unfortunately, it is not possible to generate a run-time loop for this, because the register into which the value must be restored must change each iteration, and there is no instruction for that (it would imply some kind of variable containing the name of a register). So, the solution is a somewhat inefficient list of instructions with checks and branches. The above example would be translated to this:

.sub _f
    P0 = global "f"
    invoke
    $P20 = I3          # copy number of return values into counter
    if $P20 == 0, end  # check for enough return values
    restore P5         # restore value from stack into register
    dec $P20           # decrement counter
    if $P20 == 0, end
    restore P6
    dec $P20
    if $P20 == 0, end
    restore P7
end:
    ret
.end

If assignment statements are reimplemented, this sort of code will no longer be necessary. However, at the moment it is a solution that works. There may be a better solution in the future, if Parrot gets instructions to handle these kinds of situations (although this may not happen).

6.4.2 Translating Function Definitions

The translation of a function involves translating all statements in the function. The function gets an internal name, which is automatically generated by Pirate. This is because a function in Lua does not really have a name. A function may be assigned to a variable, but that variable may also be set to some other value. If no variable points to the function, then the function is inaccessible. Conversely, more than one variable may point to the same function. Below is the code generation template for a function definition.

1It will run, but an error message is displayed.

function([parameter list]) block end →
    .sub auto-generated id
    code for parameter list
    code for block
    .end

Translating the Parameter List

When Parrot assigns a pmc to another pmc, the reference to the pmc is copied, because pmcs are stored as references. An example:

P0 = P1

Now, P0 points to the same object as P1, so when something is done with P0, it will affect the object that both registers point to. Because all Lua data types are implemented as pmcs, it is not enough for a function to have a set of local variables, one for each parameter, and then copy the actual parameters into those local variables. Somehow, a 'deeper' copy must be made. Parrot has an instruction for that: clone. So, all actual parameters are cloned when a called function starts executing. However, LuaTables should be stored as references, so for them copying references is the right thing to do. Now it is clear how parameters are passed, and how to solve this for Lua variables. Still, there is one problem. Declared parameter lists do not have any types associated with the parameters. That is, when a function is declared like this:

function f(a)

then a may be a number, a string, or any Lua type, including a table. So, at compile time, it is not evident whether a parameter is of a 'basic' type that should be cloned, or a table that should not be cloned. The only solution to this is a run-time check. So, instructions must be generated that check the type of the parameters. The resulting code template for parameters is thus:

parameter list → [namelist][varargs]

namelist →
    .local LuaNil id
    typeof $Ix, id
    if $Ix == LuaTable, goto copy
    id = clone current reg
    goto next
copy:
    id = current reg
next:

Lua allows for a variable argument list, denoted by "...". The actual arguments are collected into a table at run-time. At the moment of writing, this feature has not been implemented completely, but the code template should look something like this:

varargs →
    $Px = new LuaTable
    $Iy = 1
start:
    if num args left == 0, goto end
    restore $Pz
    $Px[$Iy] = $Pz
    dec num args left
    inc $Iy
    goto start
end:
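The loop in the varargs template collects each remaining actual argument into a fresh LuaTable, indexed from 1 as Lua tables conventionally are. In sketch form (again a simulation of the semantics, not the generated instructions):

```python
def collect_varargs(remaining_args):
    """Simulate the varargs loop: store every leftover argument into a
    new table under indices 1, 2, 3, ... (Lua counts from 1)."""
    table = {}                      # stands in for: $Px = new LuaTable
    index = 1                       # $Iy = 1
    for arg in remaining_args:      # loop until num args left == 0
        table[index] = arg          # $Px[$Iy] = $Pz
        index += 1                  # inc $Iy
    return table

assert collect_varargs(["a", "b", "c"]) == {1: "a", 2: "b", 3: "c"}
assert collect_varargs([]) == {}    # no leftovers: an empty table
```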

6.4.3 Translating Function Calls

When a function is called, first a saveall instruction is emitted, to save the current contents of the registers. Then, if there are any arguments, code is generated to evaluate the expressions. The result registers of these expressions are pushed onto an internal stack in the code generator. After that, the first 11 arguments are stored in registers P5 to P15. If there are any arguments left, they are stored into an array. Then, before doing the actual call, some other registers must be set according to the calling conventions.

name([exprlist]) →
    saveall
    code for exprlist, results are stored in registers
    for each expression in exprlist:
        Pr = register (r in range 5-15)
    if there are any arguments left, then:
        P3 = new Array
        for each argument that is left:
            P3[i] = arg j (i starting at 0, j from argument 12 to n)
    $Px = global "name"
    P0 = $Px
    invoke
    restoreall

Lua has a special syntax for calling so-called methods. These are functions stored in tables. This allows for a kind of object-oriented programming (albeit somewhat artificial). When such a function is called, the table in which the function is stored is passed as the first parameter. This does not change the way parameters are passed, though.
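The division of arguments over the registers P5 to P15 and the overflow array P3 can be sketched as follows. The register names follow the text; the data structures are illustrative only.

```python
def assign_argument_registers(args):
    """First 11 arguments go into P5..P15; any remaining arguments are
    collected into an overflow array bound to P3 (indexed from 0)."""
    registers = {}
    for i, arg in enumerate(args[:11]):
        registers[f"P{5 + i}"] = arg
    overflow = list(args[11:])      # P3 = new Array; P3[i] = arg j
    return registers, overflow

regs, extra = assign_argument_registers(list(range(13)))
assert regs["P5"] == 0 and regs["P15"] == 10   # eleven register slots
assert extra == [11, 12]                       # the rest spills to P3
```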

6.4.4 Translating Upvalues

Upvalues belong to the function in which they occur. So, an upvalue is stored as a field of the function object that represents the function. Thus, the translation becomes:

%name → $Px["name"] -- $Px holding the function pmc

Of course, an upvalue can occur as an r-value (when its value is "read") or as an l-value. The latter case is only allowed if a member field is accessed, so the upvalue must be a table.

6.5 Translating Local Declarations

Local variables are stored in so-called scratchpads. These work like small symbol tables. Each time a new block is entered, a new scratchpad is created. Scratchpads are organized in such a way that variables in an enclosing scope are visible, too. This is exactly the case in Lua, for example:

do -- (create a new block)
  local x = 123
  do -- (another new block)
    print(x) -- will print "123"
  end
end

The code template for local declarations is:

local namelist [= exprlist] →
    for each name in namelist:
        $Px = new LuaNil
    if there is an expression list for initializing variables:
        code for exprlist, results are stored in registers
        $Px = result register
    for each name in namelist:
        store lex -1, "name", $Px
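The visibility rule for scratchpads, where an inner block sees the variables of all enclosing blocks, is ordinary lexical scoping. It can be sketched as a chain of small dictionaries:

```python
class Scratchpad:
    """A small symbol table with a link to the enclosing scope,
    mirroring how scratchpads chain nested blocks together."""
    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def store(self, name, value):   # e.g. store lex -1, "name", $Px
        self.vars[name] = value

    def lookup(self, name):
        pad = self
        while pad is not None:      # walk outward through the chain
            if name in pad.vars:
                return pad.vars[name]
            pad = pad.parent
        raise NameError(name)

outer = Scratchpad()                # do (create a new block)
outer.store("x", 123)               # local x = 123
inner = Scratchpad(parent=outer)    # do (another new block)
assert inner.lookup("x") == 123     # print(x) -- will print "123"
```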

6.6 Translating Data Structure References

This section describes how data references are translated. Simple references are references to variables without fields (variables other than LuaTables). Translation of simple references is described in 6.6.1. Translation of references to data fields is described in 6.6.2.

6.6.1 Simple References

Translating simple references is nothing more than fetching the value of a variable and storing it in a register. So whenever a variable is referenced, its value (or, in the case of a LuaTable, a reference to it) is loaded with the instruction in the following code template:

name → $Px = global "name"

6.6.2 Field References

The syntax of Lua allows two forms for referencing fields: name["field"] and name.field. Both forms are translated to the same code.

name.field →
    $Px = global "name"
    $Px["field"]
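Because both surface forms denote the same access, the code generator can normalize them to one representation before emitting instructions. A sketch of such a normalization step (illustrative only; the parsing here is deliberately naive):

```python
def lower_field_reference(expr):
    """Normalize both Lua surface forms, name.field and name["field"],
    to one canonical (name, field) pair, as the code generator does."""
    if "." in expr:                       # name.field
        name, field = expr.split(".", 1)
    else:                                 # name["field"]
        name, rest = expr.split("[", 1)
        field = rest.rstrip("]").strip('"')
    return name, field

assert lower_field_reference("t.x") == ("t", "x")
assert lower_field_reference('t["x"]') == ("t", "x")  # same code path
```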

6.7 Implementing tags

A tag is just a property of an entity. Each type of entity has its own tag; all numbers in Lua, for example, have the same tag. For all types except tables, the tag cannot be changed. Because tables can have different tags, it must be possible to access the tag, so it can be changed at run-time. This means the tag cannot be implemented as some kind of constant field, but as a property that can be set and changed. Parrot has built-in support for setting and getting properties of pmc objects. Special machine instructions allow for fast access.

Chapter 7

Optimizing Code

This chapter discusses optimization of the generated code. There are many different types of possible optimizations. Section 7.1 describes what optimizations are done in the imc compiler. After that, section 7.2 describes what optimizations are done by Pirate. Finally, section 7.3 discusses the implementation of the optimizer in Pirate.

7.1 Optimizations in the IMC Compiler

This section describes which types of optimizations are done by the imc compiler. It is interesting to have a look at the optimizations that are done by imcc, because these types of optimizations do not have to be implemented by Pirate anymore. However, there are some cases in which this is not necessarily true, as will become clear in section 7.2. All compilers targeting imcc benefit from the fact that imcc does these optimizations, because this way they have to be implemented only once. The following are optimizations that are done by imcc:

• Removal of unused labels. A label that is never jumped to may be removed.

• Jump optimization (jumps to jumps). If a jump instruction targets another jump instruction, then the first jump's target may be changed to the succeeding target.

• Strength reduction. Expensive operations are replaced by cheaper ones. An example (in equivalent C code) is x = x + 2, which may be replaced by x += 2. The compiler then knows that one of the operands is also the location where the result is stored.

• Deletions of unnecessary assignments If a variable is used as the left-hand side of an assignment, but the variable is never used after that, then this assignment instruction may be removed.

• Dead-code elimination Code that is never executed can be removed.

• Code motion Expressions that can be evaluated safely before a loop, are placed before the loop. This way the amount of code in the loop itself is reduced.

7.2 Optimizations in Pirate

The optimizer in Pirate is a peephole optimizer. A peephole optimizer examines a small 'window' (or 'peephole') of two or three instructions. If the instructions match a particular pattern, then they are replaced by fewer or faster instructions that have the same effect. According to the documentation of imcc, these optimizations are done by imcc, but there are cases that can be optimized even more. An example is shown below. The Lua statement x = x + 1 will translate to:

P0 = P0 + 1

(assuming P0 holds the value of x). This will translate to the following Parrot instruction:

add P0, P0, 1

Imcc can optimize this to the more efficient instruction:

add P0, 1

However, this can be further optimized to:

inc P0

So, it may seem that the optimizer in Pirate does some redundant work, but this is not the case. However, because imcc is still being developed, it may be that the optimizations in Pirate are no longer necessary in the future. The following optimizations will be implemented in Pirate.

• (Further) strength reduction.

• Removal of useless instructions. Useless instructions are instructions that have no effect. An example is: P0 = P0 + 0

Other types of optimizations can easily be added later. The next section will discuss the global structure of the code optimizer.
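The two patterns named above fit a peephole pass of a few lines. The sketch below recognizes further strength reduction of add P0, P0, 1 to inc P0, and removes useless additions of zero. Instructions are modeled as token tuples, a simplification of the real instruction lists.

```python
def peephole(instructions):
    """One pass over an instruction list, rewriting the two small
    patterns from this section. Instructions are token tuples."""
    out = []
    for ins in instructions:
        # add P0, P0, 1  ->  inc P0   (further strength reduction)
        if ins[0] == "add" and ins[1] == ins[2] and ins[3] == "1":
            out.append(("inc", ins[1]))
        # add P0, P0, 0  ->  dropped  (useless instruction)
        elif ins[0] == "add" and ins[1] == ins[2] and ins[3] == "0":
            continue
        else:
            out.append(ins)
    return out

code = [("add", "P0", "P0", "1"),
        ("add", "P1", "P1", "0"),
        ("print", "P0")]
assert peephole(code) == [("inc", "P0"), ("print", "P0")]
```

A real pass would also need to be careful around labels, since rewriting across a jump target can change the program's behaviour.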

7.3 Implementation of the Code Optimizer

When the code generator is finished, the code is organized in functions. Each function keeps a list of instructions. The functions are linked together in a list, too. An example is shown in figure 7.1.

function foo()
  -- do something
end

function bar()
  -- do something else
end

(a) The source code

(b) Organization of the instructions for (a): a list of functions (main, foo, bar), each holding a .sub with its own list of instructions.

Figure 7.1: Organization of instructions.

Note that there is always a main function. Lua instructions do not necessarily have to be written in functions. In fact, the first statements being executed must be outside any function, otherwise they are never executed. When code is generated, these 'first' statements are put into the main function. The organization of the generated instructions is very straightforward. Designing a peephole optimizer for this organization is quite easy. It is just an iterator that iterates over all functions, and within each function, all instructions are visited one by one. When an 'interesting' instruction is visited, the optimizer may inspect the next one or two instructions and perform the appropriate optimizing actions.

Part IV

Using Lua

Chapter 8

Lua Libraries

Lua is a small language, so there are not many constructs and built-in functions. The language can be used without its libraries. However, many Lua programmers use the libraries, because they offer much functionality. The libraries in the original distribution are provided as separate C modules that can easily be loaded into the virtual machine1. This chapter takes a closer look at the standard libraries that Lua offers and how they can be implemented in Pirate. Section 8.1 examines what libraries Lua has. After that, section 8.2 discusses functions to execute Lua code. These functions are special because before actually executing the code, it must be compiled. So, somehow Pirate must be invoked to translate the code.

8.1 The Standard Libraries

This section gives more information about the Lua libraries. Lua has several libraries, listed below.

• Basic library. The basic library contains general functions, such as an error function, but also functions for handling tags and the like. Most functions can be implemented in Parrot instructions directly. The basic library also contains some special functions for running Lua code. These are discussed in section 8.2.

• String library. The string library contains functions for manipulating strings. Again, most functions can be implemented in Parrot instructions. Parrot has a few special string instructions, such as one for retrieving the length of a given string.

• Math library. The math library is in fact an interface to some functions of the standard C math library. Typical examples are sin, abs and the like. Parrot has most of these instructions built-in, so implementing them in Parrot assembler instructions is straightforward.

1 Explicitly loading the libraries is not necessary when using Lua stand-alone.

• I/O library. The I/O library contains functions for reading and writing files. There are three global variables that are initialized with file descriptors for stdin, stdout and stderr (from the C language). Parrot has some instructions for handling files, so implementing these functions will not be troublesome.

• System library. The system library contains some system facilities, such as a function for retrieving the current date and time. In the future, it will be possible to implement many of these functions. At this moment, for example, there is already a time operation in Parrot (although it is not documented yet).

Implementing the Lua libraries for Pirate is for the greater part quite straightforward. This is because Parrot is a virtual machine with a lot of 'high level' operations (see section 1.3). Many functions can be implemented with only a few, and sometimes only a single, instruction. The original Lua language also has a debug interface: functions that can be used to debug a Lua program. Because the functionality of the debug interface is built into the Lua interpreter, there is no equivalent for Pirate at the moment. However, as the implementation of Pirate improves, it should be possible to create some debugging functions for Lua specifically oriented towards Parrot. When a Lua program is compiled in the normal way with Pirate (without the -c option), all libraries are included. It would be desirable to have the libraries in a pre-compiled binary form. Then, the pre-compiled Parrot byte-code (pbc) file could be linked to the main program, or it could be loaded dynamically at run-time. Both linking and loading of pbc files will need some kind of support from Parrot. At the moment of writing, Parrot does not offer support for this. So, for now it is still necessary to insert all library code into the main file. This way, the main file will be quite large, even for a trivial 'Hello World' program. However, because this solution is only temporary, this is acceptable.

8.2 Functions to Execute Lua Code

Lua has two functions to execute Lua code. These are:

• dostring(s): the string s is executed as a Lua program.

• dofile(f): the file f is executed. The file may contain Lua code or it may be a pre-compiled file.

These functions are special, because they are somewhat more complex. When one of these functions is called, the string or source file must be compiled before Parrot can actually execute it. So, the byte-code must be created during run-time. Fortunately, Parrot has special instructions for this, because this feature can be found in many scripting languages. However, these functions have not been implemented at the moment, so implementing them for Pirate is not yet possible. A short description of how compiling during run-time is done is given below. To be able to compile during run-time, a compiler for a language, in this case Lua, must be registered. A compiler is represented as a pmc. The built-in compiler pmc is an extension of a Native Call Interface pmc. This is an object that takes care of invoking native subroutines, that is, subroutines written for example in C. So, when the compile instruction is executed, a native subroutine call is done. The native subroutine should be a call to a compiler. The description above is only an overview, and details are left out. However, it does give an idea of how compilation during run-time can be implemented.

Chapter 9

The Lua Compiler for Parrot

This chapter describes how the Lua compiler for Parrot should be used and how the generated files can be executed. Section 9.1 describes how Pirate should be invoked. Then, section 9.2 describes the possibilities for executing a Lua program on Parrot. Finally, the possibilities to use Pirate with other languages are explored in section 9.3.

9.1 Compiling Code

The Lua compiler for Parrot is called Pirate. Pirate has a small number of run-time options. These will be displayed when invoking the compiler with the -h option. These options are:

• -c: Compile only. No main function will be created and no run-time support routines are inserted.
• -o: Specify the output file. The default output file is 'lpc.imc'.
• -v: Enable verbose output.

When the -c option is given, no main function is created and no support routines are inserted into the file. This option should be used for compiling a file that contains function definitions only; it is meant for creating modules that can be used with other modules. The -o option should be used when the code should be written to a file with a specific name. If this option is not used, the code is written to a file called 'lpc.imc'. The last option, -v, enables verbose output: the compiler outputs messages during compilation, indicating which phase it is in.

9.2 Running Lua code

Pirate can be used in one of three ways. These are listed below.

• Lua stand-alone. Lua may be used stand-alone. In that case, it is just used as a scripting language, like Perl.

• Lua with other languages Lua may be used with other languages. As long as all modules are compiled with a compiler that emits code that adheres to the Parrot calling conventions, there is no problem doing this.

• Lua embedded Just as the original Lua language, it can be used as an embedded language. However, now the code is not executed by a Lua interpreter, but by a Parrot interpreter. The Parrot interpreter can easily be embedded within any C program. This is done in almost the same way as when running Lua in an embedded environment. See [7] for more details on embedding a Parrot interpreter.

9.2.1 Running a Simple Program

A simple program is a program that consists of only one file. Running a simple program is easy: compile the Lua program with Pirate and invoke imcc. Then the program will run.

9.2.2 Running a Modular Program

When a program consists of multiple files, or modules, it is necessary to tell the 'main' module which other files it will need during run-time. For that purpose, the function dofile can be used in Lua. This function was described in section 8.2. Then, in the file that is passed as an argument, the necessary statements are executed so the functions in that file can be used (in particular, the functions need to be assigned to some variables). Imcc has an .include instruction for using multiple files. Actually, it is not really an instruction but a macro: a directive to imcc. However, just including a file anywhere is not enough; worse, it will result in several errors or erroneous behaviour. Include directives must be handled in a specific way. The rest of this section explains how a file can be included. It is important to realize that a file that is to be included must be compiled with the -c option, otherwise the Lua run-time functions will be included and all kinds of errors will occur. The file that will be included must only contain function definitions, so all statements must be in some function body. When the file in figure 9.1(a) is compiled with the -c option, it is translated to the instructions in figure 9.1(b).

function f()
  -- do something
end

(a) Lua source

.sub f
  # do something
  ret
.end

I0 = addr f
P0 = new LuaFunction
P0 = I0
global "f" = P0

(b) imcc translation

Figure 9.1: Compiling with the compile-only (-c) option.

Now, if the file generated with the -c option in figure 9.1(b) is included, the function definition is appended to the main file, and the .include directive is replaced by the instructions in the included file that are not in a function body. For this to work correctly, all .include directives must be collected into a function that the main routine calls. So, the resulting main file will look something like this:

.sub main
  call include_files
  # now we can invoke function f
  P0 = find_global "f"
  invoke
  end
.end

.sub include_files
  .include "my_lib.imc"
  ret
.end

If this main file is fed to the assembler, then correct code is generated that will be executed the way it should be.

9.3 Running other languages

9.3.1 Calling conventions

As long as modules written in other languages use the Parrot calling conventions, there is no problem using these modules (however, there are some other issues, described in 9.3.2). However, at this moment, no other language actually uses these conventions. This is because Parrot is a very new target, and the conventions have been subject to change for a while. However, they have been revised lately and it is very improbable that they will change again. At this moment, the .arg and .param instructions in imcc translate to Parrot instructions that use the run-time stack. In the future, these instructions will be re-implemented in the imc compiler, so that they will translate according to the Parrot calling conventions. Pirate can be compiled so that it generates the imc .arg instructions that push arguments onto the run-time stack, or it can be compiled so that the real Parrot calling conventions are used. When Pirate is compiled to generate the imc instructions, it should be possible to communicate with other languages that all make use of the imc instructions. On the other hand, when it is necessary to adhere to the real calling conventions, Pirate may be compiled to do so.

9.3.2 Exchanging data

Until now, the focus was only on the calling conventions when executing modules written in other languages. However, there is another very important issue that must be addressed: data representation. Perl has its own Perl data types implemented as pmcs. Lua has its own data types too. The reason for a language to have its own data types is that each language, or rather its data types, may behave in its own particular way. Some examples of this different behaviour follow.

1. All Lua data types have a so-called tag, but data types from other languages do not.

2. What happens when a LuaNumber is passed as an argument to a Perl subroutine? A LuaNumber is almost the same as a PerlNum, but it has a tag. However, it is easy to extract the number from the pmc in a Perl subroutine.

3. What happens when a LuaTable is passed as an argument to a Perl subroutine? This is more troublesome, because Perl has no notion of LuaTables. So, this will give trouble.

4. Lua functions pass arguments by value; Perl subroutines pass them by reference. As long as the Perl subroutine does not change the passed-in pmc, there is no problem. However, when it does change the value, this will affect the original argument, which may still be in use after the function call.

5. The other language may be statically typed, so it may use the built-in data types, such as integer. If a pmc is passed in, this will yield an error, because the pmc will not fit where an integer (or any other built-in type, for that matter) is expected. Lua interacting with a statically typed language is not possible at all.

From these examples it becomes clear that inter-language communication with Lua is not that easy. The programmer has to realize the effects it has on the program. However, when the programmer takes care of these issues, inter-language communication can be done. On the other hand, a language such as Python behaves similarly to Perl, so when that language is targeted to Parrot, inter-language communication is easier.

Chapter 10

Conclusion

This chapter provides a summary of the work presented in the previous chapters. Writing a compiler is a lot of work, although present-day tools and targets make it somewhat easier. Generators for scanners and parsers take much work away from a compiler writer. On the other hand, targeting an intermediate language such as imc takes away many difficult tasks of writing a compiler back-end. These tasks include writing a code generator using real registers (as opposed to infinite symbolic registers) and a code optimizer. Nevertheless, writing a compiler is still much work. So, although Pirate is operational, it is not finished completely. Section 10.1 describes the design of Pirate. After that, section 10.2 describes how Pirate was implemented. Finally, section 10.3 provides a summary of issues that should be dealt with in the future. The first part of that section describes the current shortcomings of Pirate. The second part describes possible improvements that can be implemented.

10.1 Design

Pirate was designed to be a compiler for Lua that generates instructions for the Parrot virtual machine. Moreover, it should be possible to create a program consisting of multiple modules, each written in a different language. Having different languages communicate is nothing new, but implementing it in such a way that the programmer does not have to take care of it is a bit harder. There are two important issues that must be addressed in order to have modules written in different languages cooperate: calling conventions and data representation. Calling conventions are important because they prescribe a standard way of storing data when passing arguments. Although it is not hard to adhere to some set of calling conventions, they must be taken into account when instructions are generated. As long as no arguments are passed to a function written in a different language, there is not really a problem. However, when arguments do get passed around, data representation plays a role. Data representation is important because each language may have special features. Lua has its table structure for storing data, but Perl, for example, uses arrays and hash tables. It is possible to share primary data types, such as integers and the like, among different languages. Sharing more complex data structures, such as hash tables, is much harder. No further research was done to actually share more complex data structures among different languages. Pirate was designed to be a modular compiler. So, my goal was to design the compiler as cleanly as possible. Firstly, because a compiler is a complex program and dividing the task into several subtasks makes it easier. Secondly, a modular program can be adapted more easily. Because Parrot is still under development, and will be for some time, it is desirable to be able to adapt the compiler if necessary.

10.2 Implementation

The compiler consists of four phases. The first phase does lexical and syntactical analysis. Lexical analysis is done by the lexer. This verifies the structure of names of identifiers and numbers, and recognizes keywords. The syntactical structure of a Lua program is verified by a parser. The parser and lexer operate interleaved: whenever the parser needs a new token, it calls the lexer, and the lexer returns a code representing the type of the new token. During parsing, an Abstract Syntax Tree (ast) is built that represents the source program. The lexer was built using a scanner generator. While writing a lexer by hand is quite trivial, a generated lexer is easier to adapt. Likewise, the parser was built using a parser generator, so minimal work had to be done to create it. For creating the ast and its operations, the TreeCC program was used. This program greatly simplifies management of the source code for the tree. The second phase does scope checking. This phase traverses the ast and verifies all identifiers for their scope. After scope checking, either some errors were found, in which case the compiler terminates, or the program was found to be correct, so the third phase can start: generating code. During this phase, the ast is traversed once more, and for each tree node, some imc instructions are emitted that represent the meaning of that node. All generated instructions are organized in lists. When the code generator is done, the last phase is entered. The last phase is the code optimizer, which iterates over the lists of instructions. At the moment of writing, the optimizer does not do anything useful, but in the future it will try to recognize small sequences of instructions that it can replace by faster or shorter sequences of code. It must be noted that the implementation of Pirate is not finished. There is still a lot of work to be done. However, it is a working system, and some of the design goals are established.
Some features needed for Pirate are not implemented in Parrot yet. On the other hand, Parrot is already a functioning virtual machine, with great potential.

10.3 Where to go from here

This section describes some potential future work for Pirate. Firstly, the shortcomings and errors in the current implementation of Pirate are discussed (10.3.1). Secondly, some suggestions for improvements are given in 10.3.2. Finally, the future of Lua for Parrot is discussed in 10.3.3.

10.3.1 Known Issues

This section discusses some errors and shortcomings that are currently present in Pirate.

• Testing the lexer. The lexer has not been tested completely. Scientific notation for numbers and nested quoted strings should be tested.

• Parrot Magic Cookies. Finish implementing all Lua pmcs. Not all Lua pmcs are fully implemented yet.

• Tag methods. Implement tag methods as events, or maybe as exception handlers, if possible. At the moment, Parrot has neither events nor exceptions implemented. Depending on how events will be defined, events may be an efficient and clean way to implement tag methods for Lua. Whenever something non-standard is done, like adding two tables, an event is created, and a tag method is called. On the other hand, if the event system in Parrot is not suitable for tag methods, then another option is to implement tag methods as exception handlers: each tag method is used as an exception handler. Parrot will have built-in support for both events and exceptions (see 1.4.5).

• Translating assignment statements. The current implementation of assignment statements should be revised. At this moment, all expressions are evaluated and stored in registers and, in case a function call is present on the right-hand side, saved on the run-time stack. However, this solution is not as clean as I would like. One possibility is to store all expressions in some sort of list, then in one way or another put all variables in another list, and then do some sort of list assignment. The specific details must be investigated, though.

• Translating the for statement for tables. Lua has a special for statement for iterating over a table. However, at this moment there is no iterator pmc available. When such a pmc exists, implementing this type of for statement is straightforward.

• Handling lists of variable length. This issue is important for assignment statements and for function calls. In assignment statements, the number of expressions does not necessarily have to equal the number of variables. In function calls, the number of actual parameters does not have to correspond with the number of formal parameters. At this moment, neither case has been implemented. Some research must be done on how to do this in the most efficient way. The list structure mentioned in the previous item may be used for these cases.

• Implementing the optimizer. At this moment, the optimizer is not implemented. Although the necessary code to iterate over the generated instructions is present, no actual optimizations are done in the optimizer.

• Implementing dostring/dofile functions. At the moment of writing, nothing is done for the dostring and dofile functions (see section 8.2). Parrot has some special support for implementing them, so it should be possible.

10.3.2 Improving the compiler

There are a number of points I would like to improve. The following are suggestions for improving the compiler.

• Better string management by using a central symbol table. Now, string management is implemented in an ad-hoc way. It would be better if each string were stored only once. The functions creating strings should use a symbol table to check if such a string was already created. If it was, the string is looked up and a pointer is returned. If it was not, the string may be created and inserted into the symbol table.

• Better memory management by using a built-in manager. Memory management is now done separately for each module. That is, all data structures have a create and a destroy function, which are in fact wrapper functions for allocating and deallocating memory. Because some data structures are used in combination with others, deallocating memory may be dangerous, because it can affect other data structures. This is especially true for memory allocated for strings. A better way would be to implement a module that keeps track of all allocated memory, so no memory leaks would occur.

• Better error handling in the parser. Recovering from parse errors can be done using the constructs described in 3.4. However, this should be done with great care, because it may introduce errors in the grammar.

• Reconstructing the code generator. The code generator is quite complex, because it uses a lot of flags while traversing the tree. These flags discover the context of the nodes, in order to generate correct code. However, this solution is not as clean as I would like. Another option is to transform the ast as a whole, so that more types of nodes are introduced. That way, it would be possible, for example, to make a distinction between a name node on the right-hand side of an assignment and a name node on the left-hand side.

• Implement a special argument pmc. This pmc should prevent the run-time checks that are done in functions (see 6.4.2). When assigning a LuaTable to an argument pmc, only the reference should be copied. If, however, another Lua data type is assigned to this argument, then a 'deeper' copy should be made. When this functionality is put into such an argument pmc, no more explicit run-time checks have to be done, which may improve efficiency.
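The central-symbol-table suggestion above amounts to string interning. A minimal sketch in C of such a table is shown below; the bucket count, table layout and function names are invented for this example, not taken from Pirate.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_BUCKETS 64   /* illustrative size */

typedef struct interned {
    char *text;
    struct interned *next;
} Interned;

static Interned *buckets[INTERN_BUCKETS];

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % INTERN_BUCKETS;
}

/* Return the unique stored copy of s, creating it on first use. */
const char *intern(const char *s) {
    unsigned h = hash(s);
    Interned *e;
    for (e = buckets[h]; e != NULL; e = e->next)
        if (strcmp(e->text, s) == 0)
            return e->text;         /* already stored: reuse the pointer */
    e = malloc(sizeof *e);
    e->text = malloc(strlen(s) + 1);
    strcpy(e->text, s);
    e->next = buckets[h];
    buckets[h] = e;
    return e->text;
}
```

Because every string is stored exactly once, interned strings can be compared by pointer, and deallocation becomes the sole responsibility of the interning module.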

10.3.3 Lua 5.0

At the moment, a new version of Lua is being developed: version 5.0. It is a completely revised version, in which some features of version 4 will be left out, others will be replaced with completely new implementations, and some features will be entirely new. The following list enumerates the most important differences:

• coroutines
• meta-tables as a replacement for tag methods
• no more upvalues
• small syntactic changes

If a Lua compiler for version 5.0 has to be written, it is most probable that Pirate must be largely rewritten. Of course, some modules, such as the symbol table, may be reused, but the parser and code generator must be rewritten because of the syntactic changes in the language. The ast will have a different structure too, so the operations that do a tree traversal must be rewritten.

Appendix A

Lexical and Syntactical Structure of Lua

A.1 Lexical Conventions for Lua

This section describes the lexical conventions for Lua as they can be found in [12].

A.1.1 Conventions for naming Identifiers

Identifiers may be any string of letters, digits and underscores, not beginning with a digit. The definition of letter depends on the current locale: any character considered alphabetic by the current locale may be used as a letter.

A.1.2 Lua Reserved Words

Reserved words may not be used as identifiers. However, they can be used as literal keys for indexing tables when an index notation is used. So, the following is legal:

    t["and"]

but this is not:

    t.and

The following are reserved words:

    and break do else elseif end for function if in
    local nil not or repeat return then until while

A.1.3 Other Tokens

    ~= <= >= < > == = + - * / % ( ) { } [ ] ; , . .. ...

A.1.4 Strings

Strings may be delimited in several ways:

• Using single quotes: 'Hello, World!'

• Using double quotes: "Hello, World!"

• Using double brackets: [[Hello, World!]]

Strings delimited by double brackets may run for multiple lines and may contain nested pairs of double brackets.

A.1.5 Numbers

Numbers may be written with an optional decimal part and an optional decimal exponent. Some examples are listed below.

• 123
• 123.456
• 123.45e-6
• 0.123E2

A.1.6 Comments

Comments start with a double hyphen (--) and run until the end of the line. The first line of a source file is skipped if it starts with a #.

A.2 Syntactical Structure of Lua

This section contains the grammar for Lua that was used for Pirate. It is based on a grammar in [9]. In the rules below, ε denotes the empty production.

chunk → block
block → stmt list end
stmt list → ε | stmt list stmt opt semicolon
end → ε | return opt expr list opt semicolon | break opt semicolon
opt semicolon → ε | ';'
stmt → call
     | var list '=' expr list
     | do block end
     | while expr do block end
     | repeat block until expr
     | if expr then block opt elseif list opt else end
     | for for stmt
     | function function name '(' param list ')' block end
     | local name list initializer
name list → name | name list ',' name
initializer → ε | '=' expr list
param list → ε | '...' | name list | name list ',' '...'
var list → var | var list ',' var
opt elseif list → ε | opt elseif list elseif expr then block
opt else → ε | else block
for stmt → name '=' expr ',' expr opt comma expr do block end
         | name ',' name in expr do block end
opt comma expr → ε | ',' expr
function name → name dotted key list opt colon key
dotted key list → ε | dotted key list '.' key

key → name | and | do | end | else | elseif | break | for | function | if | in
    | local | nil | not | or | return | repeat | then | until | while
opt colon key → ε | ':' key
expr list → expr | expr list ',' expr
opt expr list → ε | expr list
expr → primary | var | call | not expr | minus expr | expr operator expr
operator → and | or | lt | gt | le | ge | ne | eq | conc
         | plus | minus | multiply | power
primary → nil | literal | number | '%' name
        | function '(' param list ')' block end | table cons | '(' expr ')'
var → name | primary index | var index | call index
index → '[' expr ']' | '.' key
call → primary key and args | var key and args | call key and args
key and args → args | ':' key args
args → '(' opt expr list ')' | literal | table cons
table cons → '{' table contents '}'
table contents → ε | expr field list opt sc mf list
               | mapping field list opt comma opt sc ef list
               | ';' expr field list | ';' mapping field list opt comma | ';'
expr field list → expr list opt comma
opt sc ef list → ε | ';' | ';' expr field list
opt sc mf list → ε | ';' | ';' mapping field list opt comma
mapping field list → mapping field | mapping field list ',' mapping field
mapping field → '[' expr ']' '=' expr | key '=' expr
opt comma → ε | ','

Appendix B

Details for a Recursive Descent Parser

This appendix gives a more detailed description of how a recursive descent parser can be implemented. The grammar that is used here is the same as in figure 3.1, and is repeated here for ease of reading.

program → begin statement end
statement → printstatement | assignstatement
printstatement → print ( string )
assignstatement → id = expression
expression → number operator number
operator → +

For each nonterminal, there should be a parse subroutine:

void parse_program();
void parse_statement();
void parse_printstatement();
void parse_assignstatement();
void parse_expression();
void parse_operator();
void parse_id();
void parse_string();

Note that there are parse functions for id and string. These tokens represent terminals, but are not terminals themselves; that is, the strings "id" and "string" are not returned by the lexer. The syntactic structure of these tokens is handled by the lexer. The parse functions for id and string handle the actual values.
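The parse subroutines listed above can be made concrete in a small runnable sketch. The fixed token array stands in for the lexer, and the helper names (current, advance, errors) are invented for this example:

```c
#include <assert.h>
#include <string.h>

/* Toy token stream standing in for the lexer; a real parser
   would call the lexer's next_token() instead. */
static const char *tokens[] =
    { "begin", "id", "=", "number", "+", "number", "end", NULL };
static int pos = 0;
static int errors = 0;

static const char *current(void) { return tokens[pos]; }
static void advance(void) { if (tokens[pos]) pos++; }

/* Consume the expected terminal, or record a syntax error. */
static void accept(const char *t) {
    if (current() && strcmp(current(), t) == 0)
        advance();
    else
        errors++;               /* syntax_error(t) in a real parser */
}

static void parse_expression(void) {
    accept("number");
    accept("+");                /* operator -> + */
    accept("number");
}

static void parse_assignstatement(void) {
    accept("id");
    accept("=");
    parse_expression();
}

static void parse_printstatement(void) {
    accept("print");
    accept("(");
    accept("string");
    accept(")");
}

static void parse_statement(void) {
    /* One token of lookahead decides which production to apply. */
    if (current() && strcmp(current(), "print") == 0)
        parse_printstatement();
    else
        parse_assignstatement();
}

int parse_program(void) {
    accept("begin");
    parse_statement();
    accept("end");
    return errors;              /* 0 means the input was accepted */
}
```

Running parse_program on the token stream above accepts the input `begin id = number + number end` without errors.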

Often, there is some subroutine to accept keywords and other terminals that just have to be read, but can be thrown away afterwards:

void accept(Terminal_token t) {
    if (current_token == t) {
        current_token = next_token();
    }
    else {
        syntax_error(t);
    }
}

Parsing starts by invoking parse_program. Then, whenever a production rule is applied, the subroutine for that production rule is called.

void parse_program() {
    accept("begin");        /* eat the "begin" symbol */
    parse_statement();      /* parse the statement */
    accept("end");          /* eat the "end" symbol */
}

void parse_statement() {
    switch (current_token) {
    case PRINT:             /* "print" */
        parse_printstatement();
        break;
    case ID:                /* some identifier */
        parse_assignstatement();
        break;
    default:
        syntax_error(current_token);
    }
}

The rest of this parser is similar to the code above. Note that a lot of details are left out. These parsing functions do not create an ast, nor do they generate code. They are just an example of how the parsing itself can be implemented.

Appendix C

Parrot Calling Conventions

This appendix describes the Parrot Calling Conventions as they are defined in [5]. Appendix C.1 describes the conventions that are used when calling functions. Conventions concerning returning values from functions are described in appendix C.2.

C.1 Calling Conventions

The caller is responsible for preserving any environment it is interested in keep- ing. For that purpose, the saveall and restoreall instructions should be used. The following registers are used for calling a function:

• P0 holds the object that represents the function.
• P1 holds the context object.
• P2 holds the object the function was called on (as a method call). Not used in Pirate.
• P3 holds an array with overflow parameters. These are parameters that would not fit in a register.
• S0 holds the fully qualified name of the function that is called.
• I0 is true if the function was called with prototyped parameters. That is, the types of the parameters are available. In Lua, this is never the case.
• I1 holds the number of parameters in the overflow array.
• I2 holds the number of parameters in pmc registers.
• I3 holds the return type or number of expected return values.
• I4 holds the hash value of the function name, or 0 if there is no hash value.

• P5-P15 hold the first 11 pmc parameters.

If there are more than 11 parameters, then P3 will hold an array that keeps all overflow parameters. These parameters will be placed starting at position 0, so parameter #12 will be placed at position 0.

C.2 Returning Conventions

These registers are used when returning from a function:

• I0 is true if the function is prototyped (always false in Lua).

• I1 holds the number of return values in integer registers. Always 0 in Lua.

• I2 holds the number of return values in string registers. Always 0 in Lua.

• I3 holds the number of return values in pmc registers.

• I4 holds the number of return values in numeric registers. Always 0 in Lua.

• P3 holds the overflow return values in an array pmc.

• P5-P15 hold the first 11 pmc return values.

If there are more than 11 return values, then P3 will hold an array that keeps all overflow return values. These return values will be placed starting at position 0, so return value #12 will be placed at position 0.

Appendix D

Internal Data Structures

D.1 The Compiler State Structure

Pirate has several variables that are used by more than one module. These variables are collected into one structure named Compiler_State. The layout of this structure is:

struct compiler_state {
    int errors;                /* counter for number of errors */
    int warnings;              /* counter for number of warnings */
    BOOL verbose;              /* run-time option */
    BOOL runIMCC;              /* run-time option */
    BOOL compileonly;          /* run-time option */
    int linebuffersize;        /* size of buffer to store a source code line */
    long linenumber;           /* current line number */
    char *linebuffer;          /* buffer to store current source code line */
    int tokenpos;              /* column position in current source code line */
    int num_ids;               /* number of identifiers returned by lexer */
    compiler_phase phase;      /* current phase of the compiler */
    char *currentsourcefile;   /* name of the source file */
    char *currentdestfile;     /* name of the destination file */
    FILE *fp;                  /* destination file */
    char *compilername;        /* name of the compiler executable */
    ASTnode *tree;             /* the ast representing the source program */
    Symbol_Table *table;       /* symbol table for the scope checker */
    Function *rootfunc;        /* list of functions (containing instructions) */
    Function *currentfunc;     /* current function, for fast adding */
    Symbol_Table *export;      /* table containing exported functions */
};

BOOL is implemented as an enumeration of two values: 0 representing false and 1 representing true.

D.2 The Symbol Table

This section describes the symbol table data structure. Only the important functions are discussed; statistical and debug functions, such as print_table, are not.

D.2.1 Interface of the Symbol Table

void enter(Symbol_Table *t, char *n, ASTnode *node, BOOL is_upvalue);
    Enter a new symbol for name n into symbol table t. When the symbol table is used in the scope checker, the current ast node is passed as the third parameter.

Symbol *lookup(Symbol_Table *t, char *n);
    Find the symbol with name n in symbol table t. If there are multiple entries for a particular symbol, then the entry that was entered last is returned.

void open_scope(Symbol_Table *t);
    Open a new scope in symbol table t.

void close_scope(Compiler_State *s, Symbol_Table *t);
    Close the current scope of symbol table t. All symbols of the current scope are removed from the table. When a symbol in the current scope was never used (looked up), a warning is emitted, and the number of warnings in the compiler state s is incremented.

Symbol_Table *create_table(size_t s);
    Create a symbol table of size s. A reference to the table is returned.

void destroy_table(Symbol_Table *t);
    Release all memory of symbol table t.

int get_current_scope(Symbol_Table *t);
    Return the current scope of symbol table t.

int get_table_size(Symbol_Table *t);
    Return the number of buckets of symbol table t.

Accessor functions for the Symbol Structure

int get_scope(Symbol *symbol);
    Return the scope of symbol symbol.

char *get_name(Symbol *symbol);
    Return the name of symbol symbol.

BOOL is_upvalue(Symbol *symbol);
    Return true if symbol symbol is an upvalue.

void set_used(Symbol *symbol);
    Set a flag of symbol symbol, indicating this symbol is used.

D.2.2 Implementation of the Symbol Table

The symbol table is a structure with only a few fields, the most important being a pointer to an array. This array holds pointers to lists; each list contains symbols that have the same hash value. Collisions that occur when multiple symbols get the same hash value are resolved by linking these symbols into a list. Figure D.1 shows how this is done.

Figure D.1: Implementation of the symbol table.

A symbol is an object with the following layout:

struct symbol {
    char *name;
    int scope;
    BOOL is_used;
    union {
        ASTnode *node;
        BOOL is_upvalue;
    } a;
    Symbol *next;
};

This is a data structure that keeps the name of the symbol, along with some other fields. One of those is a number that stores the scope of the symbol; this field is used in the scope checking phase. The symbol also keeps a flag indicating whether it was ever looked up in the symbol table. If it is never looked up, then the symbol is not used, and a warning is generated indicating that the symbol was declared but never used. Note that an identifier is only entered when it was declared local, or as a parameter.

All data structures in the symbol table are hidden from the client programmer. The only way to access data members of the data structures is through accessor methods. This is done to separate the interface from the implementation. In this way, it is possible to change the implementation without having to change the modules that actually use the symbol table.
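A minimal sketch of the chained hash table with scopes described above is given below. The bucket count, hash function and the reduced field set are illustrative simplifications, not Pirate's actual structures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 31   /* illustrative bucket count */

typedef struct sym {
    char *name;
    int scope;
    struct sym *next;
} Sym;

typedef struct {
    Sym *buckets[TABLE_SIZE];
    int current_scope;
} Table;

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

void enter(Table *t, const char *n) {
    unsigned h = hash(n);
    Sym *s = malloc(sizeof *s);
    s->name = malloc(strlen(n) + 1);
    strcpy(s->name, n);
    s->scope = t->current_scope;
    s->next = t->buckets[h];    /* newest entry first: shadows older ones */
    t->buckets[h] = s;
}

Sym *lookup(Table *t, const char *n) {
    Sym *s;
    for (s = t->buckets[hash(n)]; s; s = s->next)
        if (strcmp(s->name, n) == 0)
            return s;           /* most recently entered match */
    return NULL;
}

void open_scope(Table *t) { t->current_scope++; }

void close_scope(Table *t) {
    int i;
    /* Same-scope symbols sit at the heads of their chains,
       so popping matching heads removes exactly this scope. */
    for (i = 0; i < TABLE_SIZE; i++)
        while (t->buckets[i] && t->buckets[i]->scope == t->current_scope) {
            Sym *dead = t->buckets[i];
            t->buckets[i] = dead->next;
            free(dead->name);
            free(dead);
        }
    t->current_scope--;
}
```

Because entries are prepended, a lookup naturally finds the innermost declaration first, which is exactly the shadowing behaviour the scope checker needs.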

D.3 The Symbol Table Iterator

D.3.1 The Iterator Interface

The iterator structure allows for iterating over all entries in a symbol table. The interface functions of the iterator are listed below.

Iterator *create_iterator(Symbol_Table *table);
    Create an iterator object for symbol table table. If there are no symbols in table, then no iterator is created and null is returned. If the symbol table is changed, that is, a symbol is entered or removed after the iterator object was created, then the results of the iterator are unpredictable. The client programmer should make sure this does not happen.

void destroy_iterator(Iterator *i);
    Destroy iterator i. All memory used by this object is released.

Symbol *next(Iterator *i);
    Return the next symbol object in the symbol table of iterator i. If there is no next symbol, then null is returned.

Symbol *get_current(Iterator *i);
    Return the current symbol of iterator i.

void reset_iterator(Iterator *i);
    Reset iterator i to the first symbol in the symbol table. This function cannot fail, because there must be at least one symbol in the table; otherwise, the iterator could not have been created in the first place.

D.3.2 Implementation of the Iterator

The iterator structure and its operations belong to the same module as the symbol table. This way, the iterator can access all fields of the symbol table directly, without using inefficient function calls. This does mean, however, that the current implementation of the iterator cannot be used anymore if the implementation of the symbol table changes. In that case, the iterator implementation should be changed, too. The iterator structure is defined as below.

struct iterator {
    int currentkey;
    Symbol *currentsymbol;
    Symbol_Table *table;
};

The field currentkey is the index into the array that represents the hash table in the symbol table. The currentsymbol field points to the current symbol, or, if the iterator has iterated beyond the last symbol, is null. The last field, table, is the table on which the iterator operates.
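The iterator logic of remembering a bucket index and a position within its chain can be sketched as follows. The tiny four-bucket table is invented for this example; reset_iterator and next follow the interface described above.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal chained hash table, just enough to drive the iterator. */
typedef struct symbol {
    const char *name;
    struct symbol *next;
} Symbol;

#define NBUCKETS 4

typedef struct {
    Symbol *buckets[NBUCKETS];
} Symbol_Table;

typedef struct {
    int currentkey;             /* index into the bucket array */
    Symbol *currentsymbol;      /* NULL once iteration has passed the end */
    Symbol_Table *table;
} Iterator;

/* Scan forward from bucket k for the first non-empty chain. */
static Symbol *first_from(Symbol_Table *t, int k, int *key_out) {
    for (; k < NBUCKETS; k++)
        if (t->buckets[k]) {
            *key_out = k;
            return t->buckets[k];
        }
    return NULL;
}

void reset_iterator(Iterator *it) {
    it->currentsymbol = first_from(it->table, 0, &it->currentkey);
}

Symbol *next(Iterator *it) {
    Symbol *cur = it->currentsymbol;
    if (cur == NULL)
        return NULL;
    if (cur->next)              /* stay in the same chain */
        it->currentsymbol = cur->next;
    else                        /* advance to the next non-empty bucket */
        it->currentsymbol =
            first_from(it->table, it->currentkey + 1, &it->currentkey);
    return cur;
}
```

Each call to next returns the current symbol and advances; once currentsymbol is null, further calls keep returning null.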

D.4 The Stack

D.4.1 Interface to the Stack

Stack *create_stack(size_t s);

    Create a stack with size s.

void destroy_stack(Stack *s);
    Destroy stack s. All memory held by this object is released.

void push(Stack *s, void *object);
    Push the item referenced by object onto stack s.

void *pop(Stack *s);
    Pop the topmost item off of stack s. This item is returned.

void *top(Stack *s);
    Get the topmost item of stack s.

void *index_stack(Stack *s, int o);
    Get the item at offset o of stack s. Offset 0 will return the topmost item, offset 1 will return the item 1 position under the topmost item, offset 2 will return the item 2 positions under the topmost item, et cetera. If the offset is too high, so that the resulting index would be below the stack base, then null is returned. If the offset is negative, again null is returned.

void clear_stack(Stack *s);
    Clear stack s by setting the stack pointer to 0. The items that are on the stack are not removed, but there is no way to reach them anymore.

int get_stacksize(Stack *s);
    Return the size of the stack. This is the sum of the number of items and the number of free slots. Note that the stack size may change, if necessary.

int get_stackitemcount(Stack *s);
    Return the number of items currently on stack s.

void set_stackname(Stack *s, char *name);
    Set the name field of stack s to name.
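A minimal implementation sketch of part of this interface, assuming the doubling growth strategy described in appendix D.4.2 (the name field and several accessors are omitted for brevity):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct stack {
    void **stack;       /* dynamically allocated array of element pointers */
    int stacksize;      /* number of allocated slots */
    int num_elements;   /* also serves as the stack pointer */
} Stack;

Stack *create_stack(size_t size) {
    Stack *s = malloc(sizeof *s);
    s->stack = malloc(size * sizeof *s->stack);
    s->stacksize = (int)size;
    s->num_elements = 0;
    return s;
}

void push(Stack *s, void *object) {
    if (s->num_elements == s->stacksize) {      /* full: double the size */
        s->stacksize *= 2;
        s->stack = realloc(s->stack, s->stacksize * sizeof *s->stack);
    }
    s->stack[s->num_elements++] = object;
}

void *pop(Stack *s) {
    if (s->num_elements == 0)
        return NULL;
    return s->stack[--s->num_elements];
}

void *top(Stack *s) {
    return s->num_elements ? s->stack[s->num_elements - 1] : NULL;
}

/* Offset 0 is the topmost item; out-of-range offsets yield NULL. */
void *index_stack(Stack *s, int o) {
    if (o < 0 || o >= s->num_elements)
        return NULL;
    return s->stack[s->num_elements - 1 - o];
}
```

Storing void pointers keeps the stack generic; the caller casts the returned pointer back to the appropriate type.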

D.4.2 Implementation of the Stack

The stack data structure is hidden from the client programmer. This means that all fields of the stack can only be accessed through accessor functions. Each of these functions takes a pointer to a stack as the first parameter. The stack module is designed to be generic; it should be able to store any kind of data. This is accomplished by declaring the data as a pointer of type void. The stack data structure has four fields. Below is the definition of the stack data structure.

struct stack {
    void **stack;
    char *name;
    int stacksize;
    int num_elements;
};

The first field is a pointer to the data area that keeps the elements. This is a pointer to pointers, because the stack only stores pointers to elements. The second is a pointer to a string holding the name of the stack, which allows for easy debugging. The third field keeps the actual space available for storing elements. The last field stores the number of elements currently on the stack; it is also used as a stack pointer. The actual data is stored in a dynamically allocated array of pointers. This array is allocated when the stack is created.

The most important operations are described below. When an element is pushed onto the stack, the passed pointer must be cast to a pointer of type void. Then, the element is saved at the position that is pointed to by the current value of num_elements, after which this value is incremented. Before the actual storing of the new element, the number of elements is compared to the stack size. If they are equal, an internal function is called to increase the size of the stack. When this function returns, the size is twice its original size.

The top operation returns a pointer to the topmost item on the stack. The pop operation does the same, but the item is removed from the stack. If a top or pop operation is done on an empty stack, null is returned. Before using the returned pointer, it should be cast to the appropriate type. Unconventional in this stack implementation is the index_stack operation, which returns a pointer to the element at a specified offset from the stack top. The clear_stack operation effectively removes all elements on the stack by setting the number of elements to 0. When the stack is no longer needed, it should be destroyed by a call to destroy_stack.

Appendix E

Instruction Sets

This appendix describes the instructions that are generated by Pirate. Appendix E.1 describes the imc instructions, which can be found in [21], [16], [17], [18], [19] and [20]. Appendix E.2 gives a description of the Parrot instructions that are used in Pirate. The Parrot instructions are defined in [6].

E.1 IMC Instructions

The following IMC tokens are used during code generation (in alphabetical order). Not all of them are real instructions; some are directives to the IMC compiler. The instructions are written in context, so the complete syntax is given.

Instruction                     Meaning
.end                            indicates the end of a .sub
.include "file.imc"             include the contents of the file file.imc
.local l                        declaration of a local variable l
.result r                       restore a return value r from the run-time stack
.return r                       save a return value r onto the run-time stack
.sub f                          start a subroutine called f
I0 = addr f                     get the address of f and store it in I0
call f                          call a function with name f
endnamespace s                  indicates the end of namespace s
P1 = clone P0                   make a clone of P0 and store it in P1
global "n" = P0                 store P0 as a global with name n
P0 = global "n"                 fetch a global with name n and put it in P0
goto l                          branch to label l
if I0 == 0, l                   if register I0 equals 0, jump to label l
l:                              a label called l
namespace s                     indicates the start of a new namespace s
P0 = new LuaNumber              create a new LuaNumber pmc in register P0
ret                             return from a subroutine
restoreall                      restore all 128 registers from the appropriate stacks
saveall                         save all 128 registers onto the appropriate stacks

E.2 Parrot Instructions

This section gives an overview of Parrot instructions that are used in Pirate and do not have an equivalent instruction in imc.

Instruction                     Meaning
find_lex P0, "id"               find the lexical id and put it in P0
find_lex P0, n, "id"            find the lexical id at depth n and put it in P0
invoke                          invoke the subroutine in P0
invoke Px                       invoke the subroutine in Px
new_pad                         push a new scratchpad onto the lexical stack
pop_pad                         remove the topmost scratchpad
restore P0                      restore the top of the run-time stack into P0
save P0                         save the pmc in P0 on the run-time stack
store_lex "id", P0              store the pmc in P0 as lexical id
typeof I0, P0                   store the type of the pmc in P0 in I0
typeof S0, P0                   store the type of the pmc in P0 in S0

Bibliography

[1] http://www.gnu.org/software/flex
[2] http://www.gnu.org/software/bison/bison.html
[3] http://www.southern-storm.com.au/treecc.html
[4] http://www.parrotcode.org
[5] http://dev.perl.org/perl6/pdd/pdd03_calling_conventions.html
[6] http://dev.perl.org/perl6/pdd/pdd06_pasm.html
[7] http://www.parrotcode.org/docs/embed.pod.html
[8] http://dev.perl.org/perl6
[9] http://lua-users.org/lists/lua-l/2001-08/msg00191.html
[10] Aho, A.V. et al., Compilers: Principles, Techniques, and Tools, Addison-Wesley, Reading, MA, 1988.
[11] Fischer, C.N. et al., Crafting a Compiler with C, Benjamin/Cummings, Redwood City, CA, 1991.
[12] Ierusalimschy, R. et al., Reference Manual of the Programming Language Lua, October 2000.
[13] Ierusalimschy, R., Programming in Lua, January 2003.
[14] Levine, J.R. et al., lex & yacc, O'Reilly, Beijing, 1995.
[15] Pierce, C., Perl 5 in 24 uur, Academic Service, Schoonhoven, 2000.
[16] Tötsch, L., IMCC - Calling Conventions, online, available in the Parrot 0.0.10 distribution.
[17] Tötsch, L., IMCC - Documentation, online, available in the Parrot 0.0.10 distribution.
[18] Tötsch, L., IMCC - Operation, online, available in the Parrot 0.0.10 distribution.
[19] Tötsch, L., IMCC - Parsing, online, available in the Parrot 0.0.10 distribution.
[20] Tötsch, L., IMCC - Running, online, available in the Parrot 0.0.10 distribution.
[21] Tötsch, L., IMCC - Syntax, online, available in the Parrot 0.0.10 distribution.
[22] Watt, D.A. et al., Programming Language Processors in Java: Compilers and Interpreters, Pearson Education, Harlow, England, 2000.