Lesson 1: Symbol Tables

1. Introduction
2. Name spaces
3. Organization
4. Block structured languages
5. Perspective
6. tabla.c and tabla.h
7. Exercises

Readings:
• Scott, section 3.3
• Muchnick, chapter 3
• Aho, section 7.6
• Fischer, chapter 8
• Holub, section 6.3
• Bennett, section 5.1
• Cooper, sections 5.7 and B.4

12048 Compilers II - J. Neira – University of Zaragoza

The Symbol Table
• Why is it important?
  – Lexical analysis?
  – Syntactic analysis?
  – Semantic analysis?
  – Code generation?
  – Code optimization?
  – Execution?

    int a, 1t;
    while a then ...
    c := i > 1;

• Why is it particular and complex?
  – What information does it contain?
  – How/when is information included?
  – How/when is it accessed?
  – How/when is it deleted?
• Do interpreters need one?
• Do debuggers?
• Disassemblers?

1. Introduction
Symbol table: a structure used by the compiler to store information (in the form of attributes) associated with the symbols declared in the program.
• Conceptually, it is a set of records.
• It is the program's dictionary.
• Its organization is strongly influenced by both the syntactic and the semantic aspects of the language at hand.

It can additionally include:
  – temporary symbols
  – labels
  – predefined symbols

Attributes and the phase that provides them:
• name: lexical analysis
• type: syntactic analysis
• scope: semantic analysis
• address: code generation

• The types available in the language determine the CONTENTS of the table.
• The scope rules determine the visibility of the symbols, i.e., the table's ACCESS MECHANISM.

Table contents
• Reserved words: they have a special meaning and CANNOT be redefined.
    begin end type var array if ...
• Predefined symbols: they also have a special meaning, but they can be redefined.
    sin cos get put read write integer real
• Literals: constants that denote a certain value.
• Symbols generated by the compiler:
    var a: record
             b, c : integer;
           end;
  – The compiler generates the symbol noname1 for the anonymous type corresponding to the record.
• Symbols declared by the programmer:
  – Variables: type, place in memory, value? references?
  – Data types: description
  – Procedures and functions: address, parameters, result type
  – Parameters: type of variable, parameter class
  – Labels: place in the program

Query
• When processing declarations, the compiler queries the table to prevent illegal duplication of symbol names.
• When processing statements, the compiler queries the table to verify that the symbols involved are accessible and used correctly.

    const c = v;
      • Does identifier c already exist?
    var v : t;
      • Does identifier v already exist?
      • Is type t declared?
    v := e;
      • Is v defined? Is it a variable? What is its address?
      • Is e defined? Is it a variable?
      • Is it of the same type as v (or of a compatible type)?
    v := f (a, b + 1);
      • Is f a function?
      • Does the number of arguments agree with the number of parameters?
      • Do the types of the arguments agree with those of the parameters?
      • Is the use of the arguments correct?

Update
• When processing declarations, the compiler updates the table to include new symbols.
• When processing scopes, the compiler updates the table to modify the visibility of symbols.

    const c = v;
      – Include c, with the type and value of v.
      – Assign it a location in memory?
    var v : t;
      – Include v with type t.
      – Assign it a position in memory.
    end;
      – Delete (or hide) all symbols of the block (scope) that is closing.
    function f (i : integer) : integer;
      – Include f (as a function).
      – Open a new scope.
      – Include f (as the variable to which the result is assigned).
      – Include i (as a parameter).
    procedure P (i : integer; var b : boolean);
      – Include P (as a procedure).
      – Open a new scope.
      – Include i and b (as parameters).

Requirements
• Speed: query is the most frequent operation. O(n)? O(log2(n))? O(1)?
• Easy maintenance: identifier deletion must be simple.
  – Deletion is not random: the symbols of a block are deleted when that block closes.
• Duplicate identifiers should be allowed, since most languages allow them. It must be clear which ones are accessible at each moment.
• Efficiency in space management: a large amount of information is stored.
• Flexibility: the possibility of defining types makes the declaration of variables arbitrarily complex.

Program being compiled:

    program e;
      var a, b, c : char;
      procedure f;
        var a, b, c : char;
        ...
      procedure g;
        var a, b : char;
        procedure h;
          var c, d : char;
          ...
        procedure i;
          var b, c : char;
          ...
        ...
      procedure j;
        var b, d : char;
        ...
      ...

Scopes and the symbols declared in each:

    e(): a, b, c, f, g, j
      f(): a, b, c
      g(): a, b, h, i
        h(): c, d
        i(): b, c
      j(): b, d

Symbol table (open scopes, outermost first) as compilation proceeds:

    e() | e() f() | e() g() | e() g() h() | e() g() i() | e() j() | e()

2. Names
• Remember: conceptually, we store records with the name and attributes of each symbol:

    struct {
      char name[MAX];
      ...
    } table_entry;

• A mechanism to store and search entries by name is required.
• Alternative 1: define the name field as a vector of characters, with space for the longest possible identifier.
  [Figure: table entries base, indice, comienzo and x, each reserving space for the longest possible identifier]
  – FORTRAN IV severely limited the number of significant characters (6). This is not the case in modern languages.
  – Because names vary in length, space is wasted.

Use of the Heap
• Alternative 2: define the name field as a pointer to char. The characters are stored in the heap.

    struct {
      ...
      char *nombre;
      ...
    } entrada_tabla;

    e->nombre = strdup(nombre);

  – Obtaining space may be slow.
  – Space reuse depends on the heap's memory-reclamation mechanism.
  – The requirements of a symbol table are simpler than those a general-purpose heap is designed to serve.

Name space
• Alternative 3: define the name field as an index into a vector of names.
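Below is a minimal C sketch of this alternative. It assumes that names are stored back to back in a single character array, each terminated by '\0', and that a `free` position marks where the next name goes. `espacio_nombres`, `entrada_tabla` and `nombre` come from the slides. `NS_MAX`, `ns_free`, `ns_add`, `ns_get`, `ns_release` and the `tipo` field are hypothetical names chosen for illustration. Reuse is modeled as releasing everything above a saved mark, which is one possible policy and an assumption here.

```c
#include <string.h>

/* Hypothetical capacity: the slides only call it MAX. */
#define NS_MAX 1024

/* Name space: every identifier stored consecutively, '\0'-terminated. */
static char espacio_nombres[NS_MAX];
static int ns_free = 0;            /* offset of the first unused character */

/* Table entry: the name is an offset into espacio_nombres, not a pointer. */
typedef struct {
    int nombre;                    /* offset of the identifier */
    int tipo;                      /* hypothetical attribute */
} entrada_tabla;

/* Copies s into the name space. Returns its offset, or -1 if it does not
   fit (the space is not unlimited). */
int ns_add(const char *s) {
    size_t len = strlen(s) + 1;    /* include the terminator */
    if (len > (size_t)(NS_MAX - ns_free))
        return -1;
    memcpy(espacio_nombres + ns_free, s, len);
    int start = ns_free;
    ns_free += (int)len;
    return start;
}

/* Returns the identifier stored at offset idx. */
const char *ns_get(int idx) {
    return espacio_nombres + idx;
}

/* Assumed reuse policy: discard every name added at or after mark,
   e.g. when the block that introduced them closes. */
void ns_release(int mark) {
    if (mark >= 0 && mark <= ns_free)
        ns_free = mark;
}
```

Because administration is local, releasing a block's names is a single assignment to `ns_free`. A general-purpose heap would instead free each name individually.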
[Figure: the names base, indice, comienzo and x stored one after another in the name space, followed by the first free position]

    struct {
      int nombre;
      ...
    } entrada_tabla;
    ....
    char espacio_nombres[MAX];

  – Administration is local.
  – Space can be reused.
  – Space is not "unlimited".

What is the appropriate size for the name space?
• Small: it may be insufficient.
• Large: space may be wasted.
• Solution: a segmented name space, i.e., a vector of T pointers to segments (arrays) of s characters each:
  – segment: name div s
  – index within the segment: name mod s
  – name = segment * s + index
• The overall size is limited by the size of the vector of pointers (T = 50 pointers to segments of 1024 chars = 50K).

3. Organization
• Three basic operations:
  – search()
  – include()
  – delete()

• Alternative 1: unordered list

    search()    O(n)
    include()   O(1)
    delete()    O(1)

  [Figure: table of M entries (indices 0 to M-1) showing where new symbols are included]

• Alternative 2: list ordered by name. Where should a new entry be inserted?

                ordered vector        ordered linked list
    search()    O(log2(n))            O(n)
    include()   O(log2(n)) + O(n)     O(n) + O(1)
    delete()    ?                     ?

  – The order of inclusion is lost!

• Alternative 3: binary trees

    search()    O(log2(n))            (only if balanced!)
    include()   O(log2(n)) + O(1)
    delete()    ?

  – In the worst case, each operation costs the same as with the ordered list.
  [Figure: the declarations var i, j, k, l, m : integer; and var i, m, j, l, k : integer; both produce degenerate trees]
  – There is no guarantee that the tree will be balanced (names are not random).

• Alternative 4: hash tables, a mechanism to randomly distribute an arbitrary number of items into a finite set of classes.
  – Hash function h: associates a character sequence with a hash code, i.e., an index in the table.
  – Collisions: two different sequences may be associated with the same index.
  [Figure: h('base') = 0, h('x') = i, h('comienzo') = j in a table of M entries; how are collisions managed?]
  – With an appropriate hash function and adequate collision management, search() runs in constant expected time.

The hash function h
• Desirable characteristics:
  – h(s) should depend only on s.
  – Efficiency: it must be simple and fast to compute.
  – Efficacy: it should produce short collision lists.
    » Uniform: all indexes should be assigned with equal probability.
    » Randomizing: similar names should go to different indexes.
• Birthday paradox:
  – Given N names and a table of size M,
  – let h be uniform and randomizing. The expected number of insertions before the first collision is approximately $\sqrt{\pi M / 2}$:

    M          insertions before a collision
    10         4
    365        24
    1,000      40
    10,000     125
    100,000    396

  – Random numbers between 0 and 100: 84 35 45 32 89 1 58 16 38 69 5 90 16 53 61 ... The first collision (16) occurs at the 13th number.

Examples
Division method:

    function hash_add(s : string; M : integer) : integer;
    var k, i : integer;
    begin
      k := 0;
      for i := 1 to length(s) do
        k := k + ord(s[i]);
      hash_add := k mod M;
    end;

[Figure: bucket-occupancy histograms of hash_add for table size 19 with identifiers A0...A199, and for table size 100 with identifiers A0...A99]
For A0...A99 the character sums take only 28 distinct values, so at most 28 of the 100 buckets are ever used.
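Below is a minimal C sketch of a hash-organized symbol table. It assumes separate chaining for collision management, since the slides leave the scheme open, and it reuses the additive division-method hash shown above. `TABLE_SIZE` (19, as in the first histogram), `entry`, `attr`, `st_search`, `st_include` and `st_delete` are hypothetical names. The last three stand for the search(), include() and delete() operations.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 19              /* table size from the first example */

/* One record per declared symbol; colliding names share a chain. */
typedef struct entry {
    char *name;
    int attr;                      /* hypothetical attribute */
    struct entry *next;            /* next entry with the same hash index */
} entry;

static entry *table[TABLE_SIZE];

/* Division method with additive folding, as in hash_add above. */
unsigned hash_add(const char *s) {
    unsigned k = 0;
    for (; *s != '\0'; s++)
        k += (unsigned char)*s;
    return k % TABLE_SIZE;
}

/* search(): returns the most recent declaration of name, or NULL. */
entry *st_search(const char *name) {
    for (entry *e = table[hash_add(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* include(): pushes at the head of the chain, so a newer declaration
   hides an older one with the same name. */
entry *st_include(const char *name, int attr) {
    size_t len = strlen(name) + 1;
    entry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->name = malloc(len);         /* portable stand-in for strdup */
    if (e->name == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->name, name, len);
    e->attr = attr;
    unsigned h = hash_add(name);
    e->next = table[h];
    table[h] = e;
    return e;
}

/* delete(): removes the most recent declaration of name.
   Returns 1 if one was found, 0 otherwise. */
int st_delete(const char *name) {
    for (entry **p = &table[hash_add(name)]; *p != NULL; p = &(*p)->next) {
        if (strcmp((*p)->name, name) == 0) {
            entry *dead = *p;
            *p = dead->next;
            free(dead->name);
            free(dead);
            return 1;
        }
    }
    return 0;
}
```

New entries go to the head of their chain. As a result, a duplicated identifier resolves to its innermost declaration, and deleting that declaration uncovers the outer one. Note also that `hash_add("ab") == hash_add("ba")`: an additive hash is not randomizing for anagrams.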