Presentation on Ocaml Internals
Total Page:16
File Type:pdf, Size:1020Kb
OCaml Internals Implementation of an ML descendant Theophile Ranquet Ecole Pour l’Informatique et les Techniques Avancées SRS 2014 [email protected] November 14, 2013 2 of 113 Table of Contents Variants and subtyping System F Variants Type oddities worth noting Polymorphic variants Cyclic types Subtyping Weak types Implementation details α ! β Compilers Functional programming Values Why functional programming ? Allocation and garbage Combinatory logic : SKI collection The Curry-Howard Compiling correspondence Type inference OCaml and recursion 3 of 113 Variants A tagged union (also called variant, disjoint union, sum type, or algebraic data type) holds a value which may be one of several types, but only one at a time. This is very similar to the logical disjunction, in intuitionistic logic (by the Curry-Howard correspondance). 4 of 113 Variants are very convenient to represent data structures, and implement algorithms on these : 1 d a t a t y p e tree= Leaf 2 | Node of(int ∗ t r e e ∗ t r e e) 3 4 Node(5, Node(1,Leaf,Leaf), Node(3, Leaf, Node(4, Leaf, Leaf))) 5 1 3 4 1 fun countNodes(Leaf)=0 2 | countNodes(Node(int,left,right)) = 3 1 + countNodes(left)+ countNodes(right) 5 of 113 1 t y p e basic_color= 2 | Black| Red| Green| Yellow 3 | Blue| Magenta| Cyan| White 4 t y p e weight= Regular| Bold 5 t y p e color= 6 | Basic of basic_color ∗ w e i g h t 7 | RGB of int ∗ i n t ∗ i n t 8 | Gray of int 9 1 l e t color_to_int= function 2 | Basic(basic_color,weight) −> 3 l e t base= match weight with Bold −> 8 | Regular −> 0 in 4 base+ basic_color_to_int basic_color 5 | RGB(r,g,b) −> 16 +b+g ∗ 6 +r ∗ 36 6 | Grayi −> 232 +i 7 6 of 113 The limit of variants Say we want to handle a color representation with an alpha channel, but just for color_to_int (this implies we do not want to redefine our color type, this would be a hassle elsewhere). 1 t y p e extended_color= 2 | Basic of basic_color ∗ w e i g h t 3 | RGB of int ∗ i n t ∗ i n t 4 | Gray of int 5 | RGBA of int ∗ i n t ∗ i n t ∗ i n t 6 7 l e t extended_color_to_int= function 8 | RGBA(r,g,b,a) −> 256 +a+b ∗ 6 +g ∗ 36 +r ∗ 216 9 | (Basic_| RGB_| Gray_) as color −> color_to_int color 10 (∗ C h a r a c t e r s 154 −159: 11 E r r o r: This expression has type extended_color 12 but an expression was expected of type color ∗) 13 7 of 113 Table of Contents Variants and subtyping System F Variants Type oddities worth noting Polymorphic variants Cyclic types Subtyping Weak types Implementation details α ! β Compilers Functional programming Values Why functional programming ? Allocation and garbage Combinatory logic : SKI collection The Curry-Howard Compiling correspondence Type inference OCaml and recursion 8 of 113 Polymorphic variants Polymorphic variants are more flexible and syntactically more lightweight than ordinary variants, but that extra power comes at a cost. Syntactically, polymorphic variants are distinguished from ordinary variants by the leading backtick. And unlike ordinary variants, polymorphic variants can be used without an explicit type declaration : 1 l e t three=‘Int3 2 (∗ v a l three: [>‘Int of int]=‘Int3 ∗) 3 4 l e t li=[‘ On;‘Off] 5 (∗ v a l li: [>‘Off|‘On] list=[‘ On;‘Off] ∗) 9 of 113 [< and [> The > at the beginning of the variant types is critical because it marks the types as being open to combination with other variant types. We can read the type [> ‘On | ‘Off ] as describing a variant whose tags include ‘On and ‘Off, but may include more tags as well. In other words, you can roughly translate > to mean : "these tags or more." 1 ‘Unknown:: li 2 (∗− : [>‘Off|‘On|‘Unknown] list =[‘Unknown;‘On;‘Off] ∗) 3 4 l e tf= function‘A|‘B −> ( ) 5 (∗ v a lf: [<‘A|‘B] −> unit=<fun> ∗) 6 7 l e tg= function‘A|‘B|_ −> ( ) 8 (∗ v a lf: [>‘A|‘B] −> unit=<fun> ∗) 10 of 113 Extending types 1 l e tf= function‘A −> ‘C|‘B −> ‘D|x −>x 2 (∗ v a lf: ([>‘A|‘B|‘C|‘D] as’a) −>’a=<fun> ∗) 3 4 f‘E 5 (∗− : [>‘A|‘B|‘C|‘D|‘E]=‘E ∗) 6 7 f 8 (∗ v a lf: ([>‘A|‘B|‘C|‘D] as’a) −>’a=<fun> ∗) 9 11 of 113 Abbreviations Beware of the similarity : 1 t y p e ab=A|B 2 3 t y p e ab=[‘A|‘B] 1 l e tf(x:ab) = matchx withv −>v 2 (∗ v a lf: ab −> ab=<fun> ∗) 3 4 f‘A 5 (∗− : ab=‘A ∗) 6 7 f‘C 8 (∗ E r r o r: This expression has type [>‘C] 9 but an expression was expected of type ab 10 The second variant type does not allow tag(s)‘C ∗) 12 of 113 Abbreviations Useful shorthand : 1 l e tf= function‘C −> 1 | #ab −> 0 2 (∗ v a lf: [<‘A|‘B|‘C] −> int=<fun> ∗) 3 4 f‘A 5 (∗− : int=0 ∗) 6 7 f‘C 8 (∗− : int=1 ∗) 9 10 f‘D 11 (∗ E r r o r: This expression has type [>‘D] 12 but an expression was expected of type [<‘A|‘B|‘C] 13 The second variant type does not allow tag(s)‘D ∗) 14 13 of 113 The solution to our color problem 1 l e t extended_color_to_int= function 2 | ‘RGBA(r,g,b,a) −> 256 +a+b ∗ 6 +g ∗ 36 +r ∗ 216 3 | ( ‘Basic_|‘RGB_|‘Gray_) as color −> color_to_int c o l o r 4 5 (∗ v a l extended_color_to_int: 6 [<‘Basic of 7 [<‘Black 8 |‘Blue 9 |‘Cyan 10 |‘Green 11 |‘Magenta 12 |‘Red 13 |‘White 14 |‘Yellow] ∗ 15 [<‘Bold|‘Regular] 16 |‘Gray of int 17 |‘RGB of int ∗ i n t ∗ i n t 18 |‘RGBA of int ∗ i n t ∗ i n t ∗ i n t] −> 19 i n t=<fun> ∗) 14 of 113 Table of Contents Variants and subtyping System F Variants Type oddities worth noting Polymorphic variants Cyclic types Subtyping Weak types Implementation details α ! β Compilers Functional programming Values Why functional programming ? Allocation and garbage Combinatory logic : SKI collection The Curry-Howard Compiling correspondence Type inference OCaml and recursion 15 of 113 Subtype polymorphism In programming language theory, subtyping (also subtype polymorphism or inclusion polymorphism) is a form of type polymorphism in which a subtype is a datatype that is related to another datatype (the supertype) by some notion of substitutability, meaning that program elements, typically subroutines or functions, written to operate on elements of the supertype can also operate on elements of the subtype. 16 of 113 Subtypes 17 of 113 Subtype polymorphism (ctd.) In object-oriented programming the term ‘polymorphism’ is commonly used to refer solely to this subtype polymorphism, while the techniques of parametric polymorphism would be considered generic programming. In the branch of mathematical logic known as type theory, System F< :, pronounced "F-sub", is an extension of system F with subtyping. System F< : has been of central importance to programming language theory since the 1980s because the core of functional programming languages, like those in the ML family, support both parametric polymorphism and record subtyping, which can be expressed in System F< :. 18 of 113 Example : in Object Oriented Programming 1 t y p e farm= quadrupede list 2 3 l e t li: farm= new cat:: new dog::[] 4 (∗ E r r o r: This expression has type cat but an expression was e x p e c t e d of type quadrupede ∗) Unlike other languages, cat and dog being both derived from the quadrupede class isn’t enough to treat them both as quadrupedes. One must use an explicit coercion : 1 l e tx :> quadrupede= new cat 2 (∗ v a lx: quadrupede=<obj> ∗) 3 4 l e t li: farm=(new cat :> quadrupede)::(new dog :> quadrupede)::[] in 5 L i s t.iter(func −> print_endline(c#species())) li 19 of 113 Coercion on variants 1 l e tfx=(x:[‘A]:> [‘A|‘B]) 2 (∗ v a lf:[‘A] −>[‘A|‘B]=<fun> ∗) 3 4 funx −> (x :> [‘A|‘B|‘C]) 5 (∗− : [<‘A|‘B|‘C] −>[‘A|‘B|‘C]=<fun> ∗) On tuples : 1 l e tfx=(x:[‘A] ∗ [‘B]:>[ ‘A|‘C] ∗ [‘B|‘D]) 2 (∗ v a lf:[‘A] ∗ [‘B] −>[‘A|‘C] ∗ [‘B|‘D]=<fun > ∗) On arrow types : 1 l e tfx=(x:[‘A|‘B] −> [ ‘C]:>[ ‘A] −> [ ‘C|‘D]) 2 (∗ v a lf:([‘A|‘B] −>[‘C]) −>[‘A] −>[‘C|‘D]= <fun> ∗) 20 of 113 Variance : covariance and contravariance 21 of 113 Variance annotations : +a and -a A somewhat obscure corner of the language. For types in OCaml like tuple and arrow, one can use + and -, called "variance annotations", to state the essence of the subtyping rule for the type – namely the direction of subtyping needed on the component types in order to deduce subtyping on the compound type. 1 t y p e (+’a, +’b)t=’a ∗ ’b 2 (∗ t y p e(’a,’b)t=’a ∗ ’b ∗) 3 4 t y p e (+’a, +’b)t=’a −> ’b 5 (∗ E r r o r: In this definition, expected parameter variances are 6 not satisfied. The1st type parameter was expected 7 to be covariant, but it is injective contravariant. ∗) 8 9 t y p e( −’a, +’b)= ’a −> ’b 10 (∗ t y p e(’a,’b)t=’a −>’b ∗) 22 of 113 Variance annotations : +a and -a (ctd.) 1 moduleM: sig 2 t y p e(’a,’b)t 3 end= struct 4 t y p e(’a,’b)t=’a ∗ ’b 5 end 6 7 l e tfx=(x:([‘A],[‘B])M.t 8 :> ( [ ‘A|‘C],[‘B|‘D])M.t) 9 (∗ E r r o r: Type([‘A],[‘B])M.t is nota subtype of 10 ([‘A|‘C],[‘B|‘D])M.t 11 The first variant type does not allow tag(s)‘C ∗) 23 of 113 Variance annotations : +a and -a (ctd.) 1 moduleM: sig 2 t y p e (+’a, +’b)t 3 end= struct 4 t y p e(’a,’b)t=’a ∗ ’b 5 end 6 7 l e tfx=(x:([‘A],[‘B])M.t 8 :> ( [ ‘A|‘C],[‘B|‘D])M.t) 9 (∗ Ok ∗) 24 of 113 Table of Contents Variants and subtyping System F Variants Type oddities worth noting Polymorphic variants Cyclic types Subtyping Weak types Implementation details α ! β Compilers Functional programming Values Why functional programming ? Allocation and garbage Combinatory logic : SKI collection The Curry-Howard Compiling correspondence Type inference OCaml and recursion 25 of 113 Compilers 26 of 113 The OCaml toolchain Source code Lambda | /\ | parsing and preprocessing / \ closure conversion, | / \ inlining, uncurrying, | camlp4 syntax extensions v \ data representation | Bytecode \ strategy v | +----+ Parsetree (untyped AST) | Cmm | |ocamlrun | | type inference and checking | | code generation v | | assembly and Typedtree (type-annotated AST) | | linking | v v | pattern-matching compilation Interpreted Compiled | elimination of modules and classes v Lambda 27 of 113 Table of Contents Variants and subtyping System F Variants Type oddities worth noting Polymorphic variants Cyclic types Subtyping Weak types Implementation details α ! β Compilers Functional programming Values Why functional programming ? Allocation and garbage Combinatory logic : SKI collection The Curry-Howard Compiling correspondence Type inference OCaml and recursion 28 of 113 Value representation In a running OCaml program, a value is either an "integer-like thing" or a pointer to a block.