Compiling a Language

Universidade Federal de Minas Gerais – Department of Computer Science – Programming Languages Laboratory

COMPILING A LANGUAGE

DCC 888

Dealing with Programming Languages

• LLVM gives developers many tools to interpret or compile a language:
  – The intermediate representation
  – Lots of analyses and optimizations
• We can work on a language that already exists, e.g., C, C++, Java, etc.
• We can design our own language.

When is it worth designing a new language?

We need a front end to convert programs in the source language to LLVM IR.

[Figure: compilation pipeline. Annotations: machine-independent optimizations, such as constant propagation; machine-dependent optimizations, such as register allocation.]

The Simple Calculator

• To illustrate this capacity of LLVM, let's design a very simple programming language:
  – A program is a function application
  – A function contains only one argument x
  – Only the integer type exists
  – The function body contains only additions, multiplications, references to x, and integer constants in Polish notation

1) Can you understand why we got each of these values?
2) How is the grammar of our language?

The Architecture of Our Compiler

[Figure: class diagram relating Lexer, Parser, Driver, Expr, and the Expr subclasses NumExpr, VarExpr, AddExpr, and MulExpr.]

1) Can you guess the meaning of the different arrows?
2) Can you guess the role of each class?
3) What would be a good execution mode for our system?

The Execution Engine

Our execution engine parses the expression, converts it to a function written in LLVM IR, JIT compiles this function, and runs it with the argument passed to the program in the command line.

    $> ./driver 4
    * x x
    Result: 16

    $> ./driver 4
    + x * x 2
    Result: 12

    $> ./driver 4
    * x + x 2
    Result: 24

The module printed for the last expression:

    ; ModuleID = 'Example'
    define i32 @fun(i32 %x) {
    entry:
      %addtmp = add i32 %x, 2
      %multmp = mul i32 %x, %addtmp
      ret i32 %multmp
    }

Let's start with our lexer. Which tokens do we have?

Lexer.h

The Lexer

• A lexer is a program that divides a string of characters into tokens.
  – A token is a terminal in our grammar, e.g., a symbol that is part of the alphabet of our language.
  – Lexers can be easily implemented as finite automata.

    #ifndef LEXER_H
    #define LEXER_H

    #include <cstdio>   // getchar
    #include <string>

    class Lexer {
    public:
      std::string getToken();
      Lexer() : lastChar(' ') {}
    private:
      // Note: storing the result of getchar() in a char makes the EOF test
      // unreliable on platforms where char is unsigned; an int would be safer.
      char lastChar;
      // Returns the current character and reads the next one from stdin.
      inline char getNextChar() {
        char c = lastChar;
        lastChar = getchar();
        return c;
      }
    };

    #endif

1) Again: which kind of tokens do we have?
2) Can you guess the implementation of the getToken() method?

Lexer.cpp

Implementation of the Lexer

    #include <cctype>   // isspace, isalpha, isalnum, isdigit
    #include "Lexer.h"

    std::string Lexer::getToken() {
      // Skip white space.
      while (isspace(lastChar)) {
        lastChar = getchar();
      }
      if (isalpha(lastChar)) {
        // Identifier: a letter followed by letters or digits.
        std::string idStr;
        do {
          idStr += getNextChar();
        } while (isalnum(lastChar));
        return idStr;
      } else if (isdigit(lastChar)) {
        // Number: a sequence of digits.
        std::string numStr;
        do {
          numStr += getNextChar();
        } while (isdigit(lastChar));
        return numStr;
      } else if (lastChar == EOF) {
        // End of input: the empty token.
        return "";
      } else {
        // Anything else is a one-character operator.
        std::string operatorStr;
        operatorStr = getNextChar();
        return operatorStr;
      }
    }

1) Would you be able to represent this lexer as a state machine?
2) We must now define the parser. How can we implement it?

Parser.h

Parsing

• Parsing is the act of transforming a string of tokens into a syntax tree♤.

    #ifndef PARSER_H
    #define PARSER_H

    #include <string>

    class Expr;
    class Lexer;

    class Parser {
    public:
      Parser(Lexer* argLexer) : lexer(argLexer) {}
      Expr* parseExpr();
    private:
      Lexer* lexer;
    };

    #endif

1) What are these forward declarations good for?
2) Do you understand this syntax?
3) What does the parser return?

♤: it used to be one of the most important problems in computer science.

Syntax Trees

• The parser produces syntax trees.

    * x x          + x * x 2        * x + x 2

      *                +                *
     / \              / \              / \
    x   x            x   *            x   +
                        / \              / \
                       x   2            x   2

How can we implement these trees in C++?

Expr.h
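As a warm-up for the question of how to represent these trees in C++, the following is a minimal sketch that leaves LLVM out entirely: each node simply evaluates itself for a given value of x. The names Node, Num, Var, Add, Mul, eval, and sampleTree are hypothetical and chosen only for this illustration; they are not the classes used in the slides.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical node interface: evaluates the subtree for a given value of x.
struct Node {
  virtual ~Node() = default;
  virtual int eval(int x) const = 0;
};

// Leaf holding an integer constant.
struct Num : Node {
  explicit Num(int v) : value(v) {}
  int eval(int) const override { return value; }
  int value;
};

// Leaf referring to the single variable x.
struct Var : Node {
  int eval(int x) const override { return x; }
};

// Inner node for an addition; owns both operands.
struct Add : Node {
  Add(std::unique_ptr<Node> l, std::unique_ptr<Node> r)
      : lhs(std::move(l)), rhs(std::move(r)) {
    assert(lhs && rhs);
  }
  int eval(int x) const override { return lhs->eval(x) + rhs->eval(x); }
  std::unique_ptr<Node> lhs, rhs;
};

// Inner node for a multiplication; owns both operands.
struct Mul : Node {
  Mul(std::unique_ptr<Node> l, std::unique_ptr<Node> r)
      : lhs(std::move(l)), rhs(std::move(r)) {
    assert(lhs && rhs);
  }
  int eval(int x) const override { return lhs->eval(x) * rhs->eval(x); }
  std::unique_ptr<Node> lhs, rhs;
};

// Builds, by hand, the tree for the expression "* x + x 2".
std::unique_ptr<Node> sampleTree() {
  std::unique_ptr<Node> sum(new Add(std::unique_ptr<Node>(new Var()),
                                    std::unique_ptr<Node>(new Num(2))));
  return std::unique_ptr<Node>(
      new Mul(std::unique_ptr<Node>(new Var()), std::move(sum)));
}
```

Calling sampleTree()->eval(4) walks the tree bottom-up, which mirrors the driver run shown earlier for "* x + x 2". The slides replace eval with a gen method that emits LLVM IR instead of computing a number.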
The Nodes of the Tree

    #ifndef AST_H
    #define AST_H

    #include "llvm/IR/IRBuilder.h"

    // Base class of every node in the syntax tree.
    class Expr {
    public:
      virtual ~Expr() {}
      virtual llvm::Value *gen(llvm::IRBuilder<> *builder,
                               llvm::LLVMContext &con) const = 0;
    };

    // An integer constant.
    class NumExpr : public Expr {
    public:
      NumExpr(int argNum) : num(argNum) {}
      llvm::Value *gen(llvm::IRBuilder<> *builder,
                       llvm::LLVMContext &con) const;
      static const unsigned int SIZE_INT = 32;
    private:
      const int num;
    };

    // A reference to the variable x.
    class VarExpr : public Expr {
    public:
      llvm::Value *gen(llvm::IRBuilder<> *builder,
                       llvm::LLVMContext &con) const;
      static llvm::Value* varValue;
    };

    // An addition of two subexpressions.
    class AddExpr : public Expr {
    public:
      AddExpr(Expr* op1Arg, Expr* op2Arg) : op1(op1Arg), op2(op2Arg) {}
      llvm::Value *gen(llvm::IRBuilder<> *builder,
                       llvm::LLVMContext &con) const;
    private:
      const Expr* op1;
      const Expr* op2;
    };

    // A multiplication of two subexpressions.
    class MulExpr : public Expr {
    public:
      MulExpr(Expr* op1Arg, Expr* op2Arg) : op1(op1Arg), op2(op2Arg) {}
      llvm::Value *gen(llvm::IRBuilder<> *builder,
                       llvm::LLVMContext &con) const;
    private:
      const Expr* op1;
      const Expr* op2;
    };

    #endif

There is a gen method that is a bit weird. We shall look into it later.

Going Back into the Parser

• Our parser will build a syntax tree.

For the input + x * x 2, the parser produces the equivalent of:

    new AddExpr(
        new VarExpr(),
        new MulExpr(
            new VarExpr(),
            new NumExpr(2)
        )
    )

          +
         / \
        x   *
           / \
          x   2

The Polish notation really simplifies parsing. We already have the tree, and without parentheses!

So, how can we implement our parser?

[Photo: Jan Łukasiewicz, father of the Polish notation.]

Parser.cpp
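To make concrete why Polish notation needs no parentheses, here is a minimal sketch of a recursive evaluator that reads one operator and then simply recurses twice for its operands. It is not the parser from the slides: the names evalTokens and evalPrefix are hypothetical, the input comes from a string rather than standard input, and tokens are assumed to be separated by white space.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: consumes exactly one prefix expression from `in`
// and returns its value with the variable x bound to `x`.
int evalTokens(std::istream& in, int x) {
  std::string tk;
  if (!(in >> tk)) throw std::runtime_error("unexpected end of input");
  assert(!tk.empty());  // a successful extraction never yields an empty token
  if (std::isdigit(static_cast<unsigned char>(tk[0]))) {
    return std::atoi(tk.c_str());  // integer constant
  }
  if (tk == "x") return x;  // the only variable of the language
  if (tk == "+" || tk == "*") {
    // The operator's arity tells us how many subexpressions follow,
    // so no parentheses are needed to delimit them.
    int lhs = evalTokens(in, x);
    int rhs = evalTokens(in, x);
    return tk == "+" ? lhs + rhs : lhs * rhs;
  }
  throw std::runtime_error("unknown token: " + tk);
}

// Convenience wrapper that evaluates a whole string.
int evalPrefix(const std::string& text, int x) {
  std::istringstream in(text);
  return evalTokens(in, x);
}
```

With x bound to 4, evalPrefix reproduces the three results of the driver runs shown earlier. A parser with the same shape would return a tree node at each step instead of an integer.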
The Parser's Implementation

    #include <cctype>    // isdigit
    #include <cstdlib>   // atoi
    #include "Expr.h"
    #include "Lexer.h"
    #include "Parser.h"

    Expr* Parser::parseExpr() {
      std::string tk = lexer->getToken();
      if (tk == "") {
        return NULL;
      } else if (isdigit(tk[0])) {
        return new NumExpr(atoi(tk.c_str()));
      } else if (tk[0] == 'x') {
        return new VarExpr();
      } else if (tk[0] == '+') {
        Expr *op1 = parseExpr();
        Expr *op2 = parseExpr();
        return new AddExpr(op1, op2);
      } else if (tk[0] == '*') {
        Expr *op1 = parseExpr();
        Expr *op2 = parseExpr();
        return new MulExpr(op1, op2);
      } else {
        return NULL;
      }
    }

1) Why is checking the first character of each token already enough to avoid any ambiguity?
2) Now we need a way to translate trees into LLVM IR. How to do it?

Expr.cpp

The Translator

Our implementation has a small hack: our language has only one variable, which we have decided to call 'x'. This variable must be represented by an LLVM value, which is the argument of the function that we will create. Thus, we need a way to inform the translator of this value. We do it through a static variable varValue. That is the only static variable that we are using in this class.

    #include "Expr.h"

    llvm::Value* VarExpr::varValue = NULL;

    // A constant becomes a 32-bit integer constant.
    llvm::Value* NumExpr::gen
      (llvm::IRBuilder<> *builder, llvm::LLVMContext &context) const {
      return llvm::ConstantInt::get
        (llvm::Type::getInt32Ty(context), num);
    }

    // The variable x is the value registered in varValue.
    llvm::Value* VarExpr::gen
      (llvm::IRBuilder<> *builder, llvm::LLVMContext &context) const {
      llvm::Value* var = VarExpr::varValue;
      return var ? var : NULL;
    }

    // Generate both operands, then an add instruction.
    llvm::Value* AddExpr::gen
      (llvm::IRBuilder<> *builder, llvm::LLVMContext &context) const {
      llvm::Value* v1 = op1->gen(builder, context);
      llvm::Value* v2 = op2->gen(builder, context);
      return builder->CreateAdd(v1, v2, "addtmp");
    }

    // Generate both operands, then a mul instruction.
    llvm::Value* MulExpr::gen
      (llvm::IRBuilder<> *builder, llvm::LLVMContext &context) const {
      llvm::Value* v1 = op1->gen(builder, context);
      llvm::Value* v2 = op2->gen(builder, context);
      return builder->CreateMul(v1, v2, "multmp");
    }

Driver.cpp

The Driver's Skeleton

    int main(int argc, char** argv) {
      if (argc != 2) {
        llvm::errs() << "Inform an argument to your expression.\n";
        return 1;
      } else {
        llvm::LLVMContext context;
        llvm::Module *module = new llvm::Module("Example", context);
        llvm::Function *function = createEntryFunction(module, context);
        module->dump();
        llvm::ExecutionEngine* engine = createEngine(module);
        JIT(engine, function, atoi(argv[1]));
      }
    }

The procedure that creates an LLVM function is not that complicated. Can you guess its implementation?

Driver.cpp

Creating an LLVM Function

This code is not "that" complicated, but it is not super straightforward either, so we will go a bit more carefully over it.

    llvm::Function *createEntryFunction(
        llvm::Module *module,
        llvm::LLVMContext &context) {
      // Variadic getOrInsertFunction, as used by the LLVM release of the slides.
      llvm::Function *function =
        llvm::cast<llvm::Function>(
          module->getOrInsertFunction("fun",
            llvm::Type::getInt32Ty(context),
            llvm::Type::getInt32Ty(context),
            (llvm::Type *)0)
        );
      llvm::BasicBlock *bb = llvm::BasicBlock::Create(context, "entry", function);
      llvm::IRBuilder<> builder(context);
      builder.SetInsertPoint(bb);
      llvm::Argument *argX = function->arg_begin();
      argX->setName("x");
      VarExpr::varValue = argX;
      Lexer lexer;
      Parser parser(&lexer);
      Expr* expr = parser.parseExpr();
      llvm::Value* retVal = expr->gen(&builder, context);
      builder.CreateRet(retVal);
      return function;
    }

Let's start with this humongous call, getOrInsertFunction. What do you think it is doing?
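To see the shape of what the gen methods produce without linking against LLVM, the following sketch prints textual IR for a prefix expression. It is only an illustration under stated assumptions: IrSketch, emit, and toIr are hypothetical names, temporaries are numbered %t0, %t1, and so on rather than named addtmp and multmp, tokens are assumed to be separated by white space, and no simplification is attempted (the real IRBuilder may, for instance, fold operations on constants).

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical emitter: appends instructions to `body` and returns the
// operand (a constant, %x, or a fresh temporary) holding the value of the
// subexpression that was just read.
struct IrSketch {
  std::ostringstream body;
  int next = 0;  // index of the next temporary

  std::string emit(std::istream& in) {
    std::string tk;
    if (!(in >> tk)) throw std::runtime_error("unexpected end of input");
    assert(!tk.empty());
    // A token made only of digits is an integer constant.
    if (tk.find_first_not_of("0123456789") == std::string::npos) return tk;
    if (tk == "x") return "%x";
    if (tk != "+" && tk != "*") {
      throw std::runtime_error("unknown token: " + tk);
    }
    // Operands are emitted first, so their instructions precede this one.
    std::string lhs = emit(in);
    std::string rhs = emit(in);
    std::string tmp = "%t" + std::to_string(next++);
    body << "  " << tmp << " = " << (tk == "+" ? "add" : "mul")
         << " i32 " << lhs << ", " << rhs << "\n";
    return tmp;
  }
};

// Wraps the emitted instructions in a function that takes x and returns
// the value of the whole expression.
std::string toIr(const std::string& text) {
  std::istringstream in(text);
  IrSketch sketch;
  std::string result = sketch.emit(in);
  return "define i32 @fun(i32 %x) {\nentry:\n" + sketch.body.str() +
         "  ret i32 " + result + "\n}\n";
}
```

For the input "* x + x 2" the add is emitted before the mul, the same order as in the module dump shown for that expression, because each operator's operands are generated before the operator itself.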
Creating an LLVM Function

    llvm::Function *createEntryFunction(
        llvm::Module *module,
        llvm::LLVMContext &context) {
      llvm::Function *function =
        llvm::cast<llvm::Function>(
          module->getOrInsertFunction("fun",
            llvm::Type::getInt32Ty(context),
            llvm::Type::getInt32Ty(context),
            (llvm::Type *)0)
        );
      llvm::BasicBlock *bb = llvm::BasicBlock::Create(context, "entry", function);
      llvm::IRBuilder<> builder(context);
      builder.SetInsertPoint(bb);
      llvm::Argument *argX = function->arg_begin();
      argX->setName("x");
      VarExpr::varValue = argX;
      Lexer lexer;
      Parser parser(&lexer);
      Expr* expr = parser.parseExpr();
      llvm::Value* retVal = expr->gen(&builder, context);
      builder.CreateRet(retVal);
      return function;
    }

And here, in the lines that create the BasicBlock and the IRBuilder, what are we doing?

In the getOrInsertFunction call we are creating a function called "fun" that returns an integer, and receives an integer as a parameter. This call has a variable number of arguments, and so we use a sentinel, e.g., NULL, to indicate the end of ...
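The sentinel idiom mentioned above can be sketched in plain C++ with <cstdarg>. This is a generic illustration, not LLVM's implementation: joinUntilNull is a hypothetical function that reads string arguments until it meets a null pointer. It also suggests why the slide writes the sentinel as (llvm::Type *)0 rather than a bare 0: in a variadic call a bare 0 is passed as an int, so the cast is what gives the sentinel the pointer type the callee expects to read.

```cpp
#include <cassert>
#include <cstdarg>
#include <string>

// Hypothetical helper: joins the C strings that follow `sep`, stopping at
// the first null pointer, which acts as the sentinel ending the list.
std::string joinUntilNull(const char* sep, ...) {
  assert(sep != nullptr);
  std::string out;
  va_list args;
  va_start(args, sep);
  bool first = true;
  // The callee cannot know how many arguments were passed; it only knows
  // to keep reading pointers until the sentinel shows up.
  for (const char* s = va_arg(args, const char*); s != nullptr;
       s = va_arg(args, const char*)) {
    if (!first) out += sep;
    out += s;
    first = false;
  }
  va_end(args);
  return out;
}
```

A call such as joinUntilNull(", ", "i32", "i32", (const char*)0) plays the same role as the list of types in the call above: one entry per type, then the typed null pointer. Forgetting the sentinel would make the callee read past the arguments that were actually passed.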
