GHC Reading Guide
Total Page:16
File Type:pdf, Size:1020Kb
GHC Reading Guide - Exploring entrances and mental models to the source code - Takenobu T. Rev. 0.01.1 WIP NOTE: - This is not an official document by the ghc development team. - Please refer to the official documents in detail. - Don't forget “semantics”. It's very important. - This is written for ghc 9.0. Contents Introduction 1. Compiler - Compilation pipeline - Each pipeline stages - Intermediate language syntax - Call graph 2. Runtime system 3. Core libraries Appendix References Introduction Introduction Official resources are here GHC source repository : The GHC Commentary (for developers) : https://gitlab.haskell.org/ghc/ghc https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary GHC Documentation (for users) : * master HEAD https://ghc.gitlab.haskell.org/ghc/doc/ * latest major release https://downloads.haskell.org/~ghc/latest/docs/html/ * version specified https://downloads.haskell.org/~ghc/9.0.1/docs/html/ The User's Guide Core Libraries GHC API Introduction The GHC = Compiler + Runtime System (RTS) + Core Libraries Haskell source (.hs) GHC compiler RuntimeSystem Core Libraries object (.o) (libHsRts.o) (GHC.Base, ...) Linker Executable binary including the RTS (* static link case) Introduction Each division is located in the GHC source tree GHC source repository : https://gitlab.haskell.org/ghc/ghc compiler/ ... compiler sources rts/ ... runtime system sources libraries/ ... core library sources ghc/ ... compiler main includes/ ... include files testsuite/ ... test suites nofib/ ... performance tests mk/ ... build system hadrian/ ... hadrian build system docs/ ... documents : : 1. Compiler 1. Compiler Compilation pipeline 1. compiler The GHC compiler Haskell language GHC compiler Assembly language (native or llvm) 1. compiler GHC compiler comprises pipeline stages GHC compiler Haskell language Parser Renamer Type checker Desugarer Core to Core Core to STG STG to Cmm Assembly language Cmm to Asm (native or llvm) 1. compiler Pipeline stages process with intermediate languages GHC compiler Intermediate languages Haskell language Parser Renamer Haskell abstract syntax (HsSyn) Type checker Desugarer Core language Core to Core Core to STG STG language STG to Cmm Cmm language Cmm to Asm Assembly language (native or llvm) 1. compiler Each code is located in GHC compiler Parser [compiler/GHC/Parser] Renamer [compiler/GHC/Rename] Type checker [compiler/GHC/Tc] Desugarer [compiler/GHC/HsToCore] Core to Core [compiler/GHC/Core/Opt] Core to STG [compiler/GHC/CoreToStg] STG to Cmm [compiler/GHC/StgToCmm] Cmm to Asm [compiler/GHC/CmmTo{Asm, Llvm}] 1. Compiler Each pipeline stages 1. compiler Parser HsSyn Haskell source (Haskell Abstract Syntax) FunBind f x = inc x Parser f x HsApp (reader) inc x Abstracted - Parsing a Haskell source file - Checking user syntax errors - etc. 1. compiler Renamer HsSyn HsSyn FunBind FunBind f x HsApp Renamer f_1 x_2 HsApp inc x inc_3 x_2 Unique named - Resolving all of the identifiers - Rearranging infix expressions - Checking user scope errors - Building global environments - etc. 1. compiler Type checker HsSyn HsSyn FunBind FunBind f_1 x_2 HsApp Type checker f_1 x_2 HsApp @ Int -> Int @ Int inc_3 x_2 inc_3 x_2 @ Int -> Int @ Int Explicitly typed - Resolving/Inferring types - Decorating AST with types - Checking user type errors - etc. 1. compiler Desugarer HsSyn Core language FunBind Bind f_1 x_2 HsApp Desugarer Lam @ Int -> Int @ Int (HsToCore) App inc_3 x_2 @ Int -> Int @ Int Squeezed into λ calculus - Desugaring HsSyn to Core - Checking pattern-match overlap - etc. 1. compiler Core to Core Core language Core language Bind Bind Lam Core to Core App Optimized - Optimizing Core (simplifier, ...) - Checking typechecker's result with Lint - Tidying Core to Core - etc. 1. compiler Core to Stg Core language STG language StgBind Bind StgRhsClosure Core to STG StgCase register heap stack - Transforming Core to STG - etc. Operationally mapped 1. compiler STG to Cmm STG language Cmm language StgBind StgRhsClosure STG to Cmm StgCase (code gen) register heap stack - Optimizing STG Instruction blocked - Transforming STG to Cmm - etc. 1. compiler Cmm to Assembly Cmm language Assembly/LLVM language add $0x2, %rbx Cmm to Asm jmpq *0x0(%rbp) (native code gen) : Machine coded - Optimizing Cmm - Generating Asm - etc. 1. Compiler Intermediate language syntax 1. compiler HsSyn (Haskell abstract syntax) [compiler/GHC/Hs/Decls.hs] [compiler/GHC/Hs/Expr.hs] data HsDecl p data HsExpr p = TyClD ... -- Type or Class Declaration = HsVar ... | InstD ... -- Instance declaration | HsUnboundVar ... | HsConLikeOut ... | DerivD ... -- Deriving declaration | HsRecFld ... | ValD ... -- Value declaration | HsOverLabel ... | SigD ... -- Signature declaration | HsIPVar ... | KindSigD ... -- Standalone kind signature | HsOverLit ... | DefD ... -- 'default' declaration | HsLit ... | ForD ... -- Foreign declaration | HsLam ... | WarningD ... -- Warning declaration | HsLamCase ... | HsApp ... | AnnD ... -- Annotation declaration | HsAppType ... | RuleD ... -- Rule declaration | OpApp ... | SpliceD ... -- Splice declaration | NegApp ... | DocD ... -- Documentation comment declaration | HsPar ... | RoleAnnotD ... -- Role annotation declaration | SectionL ... | XHsDecl ... | SectionR ... | ExplicitTuple | ExplicitSum [compiler/GHC/Hs/Binds.hs] | HsCase ... data HsBindLR idL idR | HsIf ... = FunBind ... -- Function-like Binding | HsMultiIf ... | HsLet ... | PatBind ... -- Pattern Binding | HsDo ... | VarBind ... -- Variable Binding | ExplicitList | AbsBinds ... -- Abstraction Bindings | RecordCon | PatSynBind ... -- Patterns Synonym Binding | RecordUpd | XHsBindsLR ... | ExprWithTySig | ArithSeq : An abstract syntax corresponding to Haskell user source. 1. compiler Core language [compiler/GHC/Core.hs] type CoreProgram = [CoreBind] type CoreBndr = Var type CoreExpr = Expr CoreBndr type CoreArg = Arg CoreBndr type CoreBind = Bind CoreBndr type CoreAlt = Alt CoreBndr data Expr b = Var Id -- Variable | Lit Literal -- Literal | App (Expr b) (Arg b) -- Application | Lam b (Expr b) -- Lambda abstraction | Let (Bind b) (Expr b) -- Variable binding | Case (Expr b) b Type [Alt b] -- Pattern match | Cast (Expr b) Coercion -- Cast | Tick (Tickish Id) (Expr b) -- Internal note | Type Type -- Type | Coercion Coercion -- Coercion A tiny explicitly-typed functional language. Only ten data constructors based on System FC. 1. compiler STG language [compiler/GHC/Stg/Syntax.hs] data GenStgTopBinding pass = StgTopLifted (GenStgBinding pass) | StgTopStringLit Id ByteString data GenStgBinding pass = StgNonRec (BinderP pass) (GenStgRhs pass) | StgRec [(BinderP pass, GenStgRhs pass)] data GenStgRhs pass = StgRhsClosure (XRhsClosure pass) CostCentreStack !UpdateFlag [BinderP pass] (GenStgExpr pass) | StgRhsCon CostCentreStack DataCon [StgArg] data GenStgExpr pass = StgApp Id [StgArg] | StgLit Literal | StgConApp DataCon [StgArg] [Type] | StgOpApp StgOp [StgArg] Type | StgLam (NonEmpty (BinderP pass)) StgExpr | StgCase (GenStgExpr pass) (BinderP pass) AltType [GenStgAlt pass] | StgLet (XLet pass) (GenStgBinding pass) (GenStgExprpass) | StgLetNoEscape (XLetNoEscape pass) (GenStgBinding pass) (GenStgExpr pass) | StgTick (Tickish Id) (GenStgExpr pass) A very small purely-functional language with the abstract machine (STG-machine) semantics. 1. compiler Cmm language [compiler/GHC/Cmm.hs] type CmmProgram = [CmmGroup] type CmmGroup = GenCmmGroup CmmStatics CmmTopInfo CmmGraph type CmmGraph = GenCmmGraph CmmNode [compiler/GHC/Cmm/Node.hs] data CmmNode e x where CmmEntry ... -- Entry CmmComment ... -- Comment CmmTick ... -- Tick annotation CmmUnwind ... -- Unwind pseudo-instruction CmmAssign :: !CmmReg -> !CmmExpr -> CmmNode O O -- Assign to register CmmStore ... -- Assign to memory location CmmUnsafeForeignCall ... -- An unsafe foreign call CmmBranch ... -- Goto another block CmmCondBranch ... -- Conditional branch CmmSwitch ... -- Switch CmmCall ... -- A native call or tail call CmmForeignCall ... -- A safe foreign call [compiler/GHC/Cmm/Expr.hs] data CmmExpr = CmmLit CmmLit -- Literal | CmmLoad !CmmExpr !CmmType -- Read memory location | CmmReg !CmmReg -- Contents of register | CmmMachOp MachOp [CmmExpr] -- Machine operation (+, -, *, etc.) | CmmStackSlot Area {-# UNPACK #-} !Int | CmmRegOff !CmmReg Int A low-level imperative language with an explicit stack. 1. Compiler Call graph 1. compiler An example of a call graph main [ghc/Main.hs] main' [ghc/Main.hs] oneShot [compiler/GHC/Driver/Pipeline.hs] compileFile [compiler/GHC/Driver/Pipeline.hs] runPipeline [compiler/GHC/Driver/Pipeline.hs] runPipeline' [compiler/GHC/Driver/Pipeline.hs] pipeLoop [compiler/GHC/Driver/Pipeline.hs] runHookedPhase [compiler/GHC/Driver/Pipeline.hs] runPhase [compiler/GHC/Driver/Pipeline.hs] hscIncrementalCompile [compiler/GHC/Driver/Main.hs] hscIncrementalFrontend [compiler/GHC/Driver/Main.hs] hsc_typecheck [compiler/GHC/Driver/Main.hs] hscParse' [compiler/GHC/Driver/Main.hs] Parser parseModule [compiler/GHC/Parser.y] tcRnModule' [compiler/GHC/Driver/Main.hs] Renamer and tcRnModule [compiler/GHC/Tc/Module.hs] Type checker tcRnModuleTcRnM [compiler/GHC/Tc/Module.hs] tcRnSrcDecls [compiler/GHC/Tc/Module.hs] tc_rn_src_decls [compiler/GHC/Tc/Module.hs] rnTopSrcDecls [compiler/GHC/Tc/Module.hs] tcTopSrcDecls [compiler/GHC/Tc/Module.hs] finish [compiler/GHC/Driver/Main.hs] hscDesugar' [compiler/GHC/Driver/Main.hs] Desugarer deSugar [compiler/GHC/HsToCore.hs] hscSimplify' [compiler/GHC/Driver/Main.hs] Core to Core core2core [compiler/GHC/Core/Opt/Pipeline.hs] hscGenHardCode [compiler/GHC/Driver/Main.hs]