Writing an Interpreter in Go
Thorsten Ball

Contents

Acknowledgments
Introduction
    The Monkey Programming Language & Interpreter
    Why Go?
    How to Use this Book
1 Lexing
    1.1 - Lexical Analysis
    1.2 - Defining Our Tokens
    1.3 - The Lexer
    1.4 - Extending our Token Set and Lexer
    1.5 - Start of a REPL
2 Parsing
    2.1 - Parsers
    2.2 - Why not a parser generator?
    2.3 - Writing a Parser for the Monkey Programming Language
    2.4 - Parser’s first steps: parsing let statements
    2.5 - Parsing Return Statements
    2.6 - Parsing Expressions
        Expressions in Monkey
        Top Down Operator Precedence (or: Pratt Parsing)
        Terminology
        Preparing the AST
        Implementing the Pratt Parser
        Identifiers
        Integer Literals
        Prefix Operators
        Infix Operators
    2.7 - How Pratt Parsing Works
    2.8 - Extending the Parser
        Boolean Literals
        Grouped Expressions
        If Expressions
        Function Literals
        Call Expressions
        Removing TODOs
    2.9 - Read-Parse-Print-Loop
3 Evaluation
    3.1 - Giving Meaning to Symbols
    3.2 - Strategies of Evaluation
    3.3 - A Tree-Walking Interpreter
    3.4 - Representing Objects
        Foundation of our Object System
        Integers
        Booleans
        Null
    3.5 - Evaluating Expressions
        Integer Literals
        Completing the REPL
        Boolean Literals
        Null
        Prefix Expressions
        Infix Expressions
    3.6 - Conditionals
    3.7 - Return Statements
    3.8 - Abort! Abort! There’s been a mistake!, or: Error Handling
    3.9 - Bindings & The Environment
    3.10 - Functions & Function Calls
    3.11 - Who’s taking the trash out?
4 Extending the Interpreter
    4.1 - Data Types & Functions
    4.2 - Strings
        Supporting Strings in our Lexer
        Parsing Strings
        Evaluating Strings
        String Concatenation
    4.3 - Built-in Functions
        len
    4.4 - Array
        Supporting Arrays in our Lexer
        Parsing Array Literals
        Parsing Index Operator Expressions
        Evaluating Array Literals
        Evaluating Index Operator Expressions
        Adding Built-in Functions for Arrays
        Test-Driving Arrays
    4.5 - Hashes
        Lexing Hash Literals
        Parsing Hash Literals
        Hashing Objects
        Evaluating Hash Literals
        Evaluating Index Expressions With Hashes
    4.6 - The Grand Finale
Resources
Feedback
Changelog

Acknowledgments

I want to use these lines to express my gratitude to my wife for supporting me. She’s the reason you’re reading this. This book wouldn’t exist without her encouragement, faith in me, assistance and her willingness to listen to my mechanical keyboard clacking away at 6am.

Thanks to my friends Christian, Felix and Robin for reviewing early versions of this book and providing me with invaluable feedback, advice and cheers. You improved this book more than you can imagine.

Introduction

The first sentence of this introduction was supposed to be this one: “Interpreters are magical”. But one of the earliest reviewers, who wishes to remain anonymous, said that “sounds super stupid”. Well, Christian, I don’t think so! I still think that interpreters are magical! Let me tell you why.

On the surface they look deceptively simple: text goes in and something comes out. They are programs that take other programs as their input and produce something.
Simple, right? But the more you think about it, the more fascinating it becomes. Seemingly random characters - letters, numbers and special characters - are fed into the interpreter and suddenly become meaningful. The interpreter gives them meaning! It makes sense out of nonsense. And the computer, a machine that’s built on understanding ones and zeroes, now understands and acts upon this weird language we feed into it - thanks to an interpreter that translates this language while reading it.

I kept asking myself: how does this work? And the first time this question began forming in my mind, I already knew that I’d only be satisfied with an answer if I got to it by writing my own interpreter. So I set out to do so.

A lot of books, articles, blog posts and tutorials on interpreters exist. Most of the time, though, they fall into one of two categories. Either they are huge, incredibly heavy on theory and more targeted towards people who already have a vast understanding of the topic, or they are really short, provide just a small introduction to the topic, use external tools as black boxes and only concern themselves with “toy interpreters”.

One of the main sources of frustration was this latter category of resources, because the interpreters they explain only interpret languages with a really simple syntax. I didn’t want to take a shortcut! I truly wanted to understand how interpreters work and that included understanding how lexers and parsers work. Especially with a C-like language and its curly braces and semicolons, where I didn’t even know how to start parsing them.

The academic textbooks had the answers I was looking for, of course. But they were rather inaccessible to me, behind their lengthy, theoretical explanations and mathematical notation. What I wanted was something between the 900-page book on compilers and the blog post that explains how to write a Lisp interpreter in 50 lines of Ruby code.

So I wrote this book, for you and me.
This is the book I wish I had. This is a book for people who love to look under the hood. For people that love to learn by understanding how something really works.

In this book we’re going to write our own interpreter for our own programming language - from scratch. We won’t be using any 3rd party tools and libraries. The result won’t be production-ready, it won’t have the performance of a fully-fledged interpreter and, of course, the language it’s built to interpret will be missing features. But we’re going to learn a lot.

It’s difficult to make generic statements about interpreters since the variety is so high and none are alike. What can be said is that the one fundamental attribute they all share is that they take source code and evaluate it without producing some visible, intermediate result that can later be executed. That’s in contrast to compilers, which take source code and produce output in another language that the underlying system can understand.

Some interpreters are really small, tiny, and do not even bother with a parsing step. They just interpret the input right away. Look at one of the many Brainfuck interpreters out there to see what I mean.

On the other end of the spectrum are much more elaborate types of interpreters. Highly optimized and using advanced parsing and evaluation techniques. Some of them don’t just evaluate their input, but compile it into an internal representation called bytecode and then evaluate this. Even more advanced are JIT interpreters that compile the input just-in-time into native machine code that then gets executed.

But then, in between those two categories, there are interpreters that parse the source code, build an abstract