Types and Static Semantic Analysis
COMS W4115
Prof. Stephen A. Edwards
Spring 2002
Columbia University
Department of Computer Science

Data Types

What is a type?

A restriction on the possible interpretations of a segment of memory or other program construct.

Useful for two reasons:

Runtime optimization: earlier binding leads to fewer runtime decisions. E.g., Addition in C efficient because type of operands known.

Error avoidance: prevent programmer from putting round peg in square hole. E.g., In Java, can’t open a complex number, only a file.

Are Data Types Necessary?

No: many languages operate just fine without them.

Assembly languages usually view memory as undifferentiated array of bytes. Operators are typed, registers may be, data is not.

Basic idea of stored-program computer is that programs be indistinguishable from data.

Everything’s a string in Tcl including numbers, lists, etc.

C’s Type System: Base Types/Pointers

Base types match typical processor

Typical sizes:

    8      16      32      64
    char   short   int     long
                   float   double

Pointers (addresses)

    int *i;    /* i is a pointer to an int */
    char **j;  /* j is a pointer to a pointer to a char */

C’s Type System: Arrays, Functions

Arrays

    char c[10];          /* c[0] ... c[9] are chars */
    double a[10][3][2];  /* array of 10 arrays of 3 arrays of 2 doubles */

Functions

    /* function of two arguments returning a char */
    char foo(int, double);

C’s Type System: Structs and Unions

Structures: each field has own storage

    struct box {
      int x, y, h, w;
      char *name;
    };

Unions: fields share same memory

    union token {
      int i;
      double d;
      char *s;
    };

Composite Types: Records

A record is an object with a collection of fields, each with a potentially different type.

Applications of Records

Records are the precursors of objects.

Composite Types: Variant Records

A record object holds all of its fields. A variant record holds only one of its fields at once.
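As a rough illustration of the difference between a record and a variant record, the C sketch below compares field offsets and sizes for a struct and a union with the same field list. The names `token_record`, `token_variant`, `variant_fields_overlap`, and `check_record_vs_variant` are made up for this example and are not from the slides; the fields mirror the `union token` shown above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for this sketch: same fields, different storage rules. */
struct token_record  { int i; double d; char *s; };  /* each field has own storage */
union  token_variant { int i; double d; char *s; };  /* fields share same memory */

/* Nonzero if every field of the union starts at offset 0. */
int variant_fields_overlap(void)
{
    return offsetof(union token_variant, i) == 0
        && offsetof(union token_variant, d) == 0
        && offsetof(union token_variant, s) == 0;
}

/* Checks storage properties that hold on mainstream C implementations. */
void check_record_vs_variant(void)
{
    /* A record must be large enough to hold all of its fields at once. */
    assert(sizeof(struct token_record)
           >= sizeof(int) + sizeof(double) + sizeof(char *));

    /* A variant record only needs room for its largest field. */
    assert(sizeof(union token_variant) >= sizeof(double));
    assert(sizeof(union token_variant) < sizeof(struct token_record));

    /* Record fields are laid out in declaration order at increasing offsets. */
    assert(offsetof(struct token_record, i) < offsetof(struct token_record, d));
    assert(offsetof(struct token_record, d) < offsetof(struct token_record, s));

    assert(variant_fields_overlap());
}
```

Calling `check_record_vs_variant()` from a test program should pass silently: the struct grows with every field added, while the union stays about the size of its largest member.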
In C, a record is a struct:

    struct rectangle {
      int n, s, e, w;
      char *label;
      color col;
      struct rectangle *next;
    };

    struct rectangle r;
    r.n = 10;
    r.label = "Rectangle";

Records group and restrict what can be stored in an object, but not what operations they permit.

Can fake object-oriented programming:

    struct poly { ... };

    struct poly *poly_create();
    void poly_destroy(struct poly *p);
    void poly_draw(struct poly *p);
    void poly_move(struct poly *p, int x, int y);
    int poly_area(struct poly *p);

In C, a variant record is a union:

    union token {
      int i;
      float f;
      char *string;
    };

    union token t;
    t.i = 10;
    t.f = 3.14159;       /* overwrites t.i */
    char *s = t.string;  /* returns gibberish */

Applications of Variant Records

A primitive form of polymorphism:

    struct poly {
      int x, y;
      int type;
      union { int radius;
              int size;
              float angle; } d;
    };

If poly.type == CIRCLE, use poly.d.radius.
If poly.type == SQUARE, use poly.d.size.
If poly.type == LINE, use poly.d.angle.

Layout of Records and Unions

Modern processors have byte-addressable memory.

    0
    1
    2
    3
    4

Many data types (integers, addresses, floating-point numbers) are wider than a byte.

    16-bit integer:  1 0
    32-bit integer:  3 2 1 0

Modern memory systems read data in 32-, 64-, or 128-bit chunks:

     3  2  1  0
     7  6  5  4
    11 10  9  8

Reading an aligned 32-bit value is fast: a single operation.

Slower to read an unaligned value: two reads plus shift.

     3  2  1  0
     7  6  5  4
    11 10  9  8

     6  5  4  3

SPARC prohibits unaligned accesses.
MIPS has special unaligned load/store instructions.
x86, 68k run more slowly with unaligned accesses.

Most languages “pad” the layout of records to ensure alignment restrictions.

    struct padded {
      int x;    /* 4 bytes */
      char z;   /* 1 byte */
      short y;  /* 2 bytes */
      char w;   /* 1 byte */
    };

    x x x x
    y y # z
    # # # w

    # : Added padding

C’s Type System: Enumerations

    enum weekday {sun, mon, tue, wed,
                  thu, fri, sat};

    enum weekday day = mon;

Enumeration constants in the same scope must be unique:

    enum days {sun, wed, sat};
    enum class {mon, wed};  /* error: mon, wed redefined */
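The padding described for `struct padded` in the layout slides can be observed directly with `offsetof` and `sizeof`. The sketch below is only an illustration: the helper names `padding_bytes` and `check_padded_layout` are invented here, and the exact numbers depend on the target ABI.

```c
#include <assert.h>
#include <stddef.h>

struct padded {
    int x;    /* 4 bytes on typical targets */
    char z;   /* 1 byte */
    short y;  /* 2 bytes on typical targets */
    char w;   /* 1 byte */
};

/* Bytes of struct padded that belong to no field. */
size_t padding_bytes(void)
{
    size_t used = sizeof(int) + sizeof(char) + sizeof(short) + sizeof(char);
    return sizeof(struct padded) - used;
}

/* Checks that each multi-byte field sits on a multiple of its alignment. */
void check_padded_layout(void)
{
    assert(offsetof(struct padded, x) % _Alignof(int) == 0);
    assert(offsetof(struct padded, y) % _Alignof(short) == 0);

    /* The whole record is padded so that arrays of it stay aligned. */
    assert(sizeof(struct padded) % _Alignof(struct padded) == 0);

    /* Fields keep their declaration order. */
    assert(offsetof(struct padded, z) < offsetof(struct padded, y));
    assert(offsetof(struct padded, y) < offsetof(struct padded, w));
}
```

Assuming a 4-byte `int`, a 2-byte `short`, and natural alignment, `x` occupies bytes 0 to 3, `z` byte 4, `y` bytes 6 and 7, and `w` byte 8, so `sizeof(struct padded)` would be 12 and `padding_bytes()` would be 4. Reordering the fields from widest to narrowest is a common way to reduce that padding.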
C’s Type System

Types may be intermixed at will:

    struct {
      int i;
      union {
        char (*one)(int);
        char (*two)(int, int);
      } u;
      double b[20][10];
    } *a[10];

Array of ten pointers to structures. Each structure contains an int, a 2D array of doubles, and a union that contains a pointer to a char function of one or two arguments.

Strongly-typed Languages

Strongly-typed: no run-time type clashes.

C is definitely not strongly-typed:

    float g;
    union { float f; int i; } u;
    u.i = 3;
    g = u.f + 3.14159;  /* u.f is meaningless */

Is Java strongly-typed?

Is Tiger strongly-typed?

Statically-Typed Languages

Statically-typed: compiler can determine types.

Dynamically-typed: types determined at run time.

Is Java statically-typed?

    class Foo {
      public void x() { ... }
    }

    class Bar extends Foo {
      public void x() { ... }
    }

    void baz(Foo f) {
      f.x();
    }

Polymorphism

Say you write a sort routine:

    void sort(int a[], int n)
    {
      int i, j;
      for ( i = 0 ; i < n-1 ; i++ )
        for ( j = i + 1 ; j < n ; j++ )
          if (a[j] < a[i]) {
            int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
          }
    }

To sort doubles, only need to change a few types:

    void sort(double a[], int n)
    {
      int i, j;
      for ( i = 0 ; i < n-1 ; i++ )
        for ( j = i + 1 ; j < n ; j++ )
          if (a[j] < a[i]) {
            double tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
          }
    }

C++ Templates

    template <class T> void sort(T a[], int n)
    {
      int i, j;
      for ( i = 0 ; i < n-1 ; i++ )
        for ( j = i + 1 ; j < n ; j++ )
          if (a[j] < a[i]) {
            T tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
          }
    }

    int a[10];

    sort<int>(a, 10);

C++ templates are essentially language-aware macros. Each instance generates a different refinement of the same code.

    sort<int>(a, 10);
    sort<double>(b, 30);
    sort<char *>(c, 20);

Fast code, but lots of it.

Faking Polymorphism with Objects

    class Sortable {
      bool lessthan(Sortable s) = 0;
    }

    void sort(Sortable a[], int n) {
      int i, j;
      for ( i = 0 ; i < n-1 ; i++ )
        for ( j = i + 1 ; j < n ; j++ )
          if ( a[j].lessthan(a[i]) ) {
            Sortable tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
          }
    }

This sort works with any array of objects derived from Sortable.

Same code is used for every type of object.

Types resolved at run-time (dynamic method dispatch).

Does not run as quickly as the C++ template version.

Arrays

Most languages provide array types:

    char i[10];                           /* C */
    character(10) i                       ! FORTRAN
    i : array (0..9) of character;        -- Ada
    var i : array [0 .. 9] of char;       { Pascal }

Array Address Calculation

In C,

    struct foo a[10];

a[i] is at a + i * sizeof(struct foo)

    struct foo a[10][20];

a[i][j] is at a + (j + 20 * i) * sizeof(struct foo)

=> Array bounds must be known to access 2D+ arrays

Allocating Arrays

    int a[10];          /* static */

    void foo(int n)
    {
      int b[15];        /* stacked */
      int c[n];         /* stacked: tricky */
      int d[];          /* on heap */
      vector<int> e;    /* on heap */

      d = new int[n*2]; /* fixes size */
      e.append(1);      /* may resize */
      e.append(2);      /* may resize */
    }

Allocating Fixed-Size Arrays

Local arrays with fixed size are easy to stack.

    void foo()
    {
      int a;
      int b[10];
      int c;
    }

    return address   <- FP
    a
    b[0]
    ...
    b[9]
    c                <- FP + 12

Allocating Variable-Sized Arrays

Variable-sized local arrays aren’t as easy.

    void foo(int n)
    {
      int a;
      int b[n];
      int c;
    }

    return address   <- FP
    a
    b[0]
    ...
    b[n-1]
    c                <- FP + ?

Doesn’t work: generated code expects a fixed offset for c. Even worse for multi-dimensional arrays.

As always: add a level of indirection

    void foo(int n)
    {
      int a;
      int b[n];
      int c;
    }

    return address   <- FP
    a
    b-ptr
    c
    b[0]
    ...
    b[n-1]

Variables remain constant offset from frame pointer.
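The level of indirection used for variable-sized arrays can be mimicked in portable C. In the sketch below, `struct foo_frame` is a hypothetical model of the frame of `foo`, with `b_ptr` playing the role of the b-ptr slot; `sum_first_n` is an invented name, and `malloc` merely stands in for the stack growth a real compiler would emit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of foo's frame: every slot has a fixed offset,
   no matter how large b turns out to be. */
struct foo_frame {
    int a;
    int *b_ptr;  /* the added level of indirection: points at b[0] */
    int c;
};

/* Fills b[0..n-1] with 0..n-1 and returns their sum, or -1 if allocation
   fails. malloc stands in for the compiler growing the stack by n elements. */
long sum_first_n(int n)
{
    struct foo_frame f;
    long sum = 0;

    assert(n > 0);
    f.a = 0;
    f.c = 0;
    f.b_ptr = malloc((size_t)n * sizeof *f.b_ptr);
    if (f.b_ptr == NULL)
        return -1;

    for (f.a = 0; f.a < n; f.a++)
        f.b_ptr[f.a] = f.a;    /* b[i] is reached through b_ptr */
    for (f.c = 0; f.c < n; f.c++)
        sum += f.b_ptr[f.c];   /* c sits at the same offset for every n */

    free(f.b_ptr);
    return sum;
}
```

Because `a`, `b_ptr`, and `c` are ordinary fixed-size members, `offsetof(struct foo_frame, c)` is a compile-time constant regardless of `n`, which is the property the generated code relies on.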
Static Semantic Analysis

Lexical analysis: Make sure tokens are valid

    if i 3 "This"                       /* valid */
    #a1123                              /* invalid */

Syntactic analysis: Makes sure tokens appear in correct order

    for i := 1 to 5 do 1 + break        /* valid */
    if i 3                              /* invalid */

Semantic analysis: Makes sure program is consistent

    let v := 3 in v + 8 end             /* valid */
    let v := "f" in v(3) + v end        /* invalid */

Tiger’s Type System

Explicit built-in types:

    int       1 15 20 5 + 3
    string    "Hello There"

Implicit built-in types (unnamed):

    “nil”     nil
    “void”    (), a := 3, if a then b := 2

Array types:

    type ia = array of int
    type ba = array of bar

Record types:

    type point = { x : int, y : int,
                   name : string }

Tiger’s Type System: Nil

The nil keyword is a stand-in for the null pointer.

Can be assigned to any record, but nothing else.

    let
      type rec = { x : int, y : int }
      var a : rec := nil
    in
      if a = nil then a := rec { x=10, y=20 }
    end

Is not a valid type

    let
      var b := nil   /* ERROR */
    in
    end

Tiger’s Type System: Void

There’s an implicit void type:

    ()
    a := 3
    for i := 1 to 10 do b := b + i
    while i < 10 do i := i + 1

Can’t be assigned to anything, but certain expressions must be void.

    a := ()                           /* ERROR */
    for i := 1 to 10 do i             /* ERROR */
    for i := 1 to 10 do ( i ; () )    /* OK */

Name vs.