Chapter 6 Data Types

Topics Introduction Chapter 6 Type Information Data Type

n Specification of a Data Type Data Types Primitive Data Types n Numeric Data Types Integers Floating-Point Numbers Fixed-Point Real Numbers

n Boolean Types

n Character Types

Chapter 6: Data Types 2

Topics Topics

Vectors Composite Data Types Arrays Character Strings Slices User-Defined Ordinal Type Associative Arrays n Enumerations Records n Subranges Unions Structure Types Lists Sets Pointers

Chapter 6: Data Types 3 Chapter 6: Data Types 4

Introduction Introduction

Every program uses data, either explicitly Basic differences among languages exist or implicitly, to arrive at a result. in the types of data allowed, in the types of operations available, and in the All programs specify a set of operations mechanism provided for controlling the that are to be applied to certain data in a sequence in which the operations are certain sequence. applied to the data. Data in its most primitive forms inside a Most programming languages provide a computer is just a collection of bits. set of simple data entities as well as mechanism for constructing new data entities from these.

Chapter 6: Data Types 5 Chapter 6: Data Types 6

1 Type Information Type Information: implicit

Program data can be classified n Implicit type information includes the types of according to their types. constants and values, types that can be inferred from a name convention, and types can be inferred from context. Type information can be contained in a Example: number 2 is implicitly an integer in most program either implicitly or explicitly. languages Example: TRUE is Boolean Example: variable I in FORTRAN is, in the absence of other information, an integer variable.

Chapter 6: Data Types 7 Chapter 6: Data Types 8

Type Information: explicit Data Type

n Explicit type information is primarily A data type is a set of values that can be specified contained in declarations. in many ways:

Variables can be declared to be of specific n Explicitly listed

types. n Enumerated

Example: var x: array[1..10] of integer; n Given as a subrange of known values

Example: var b: boolean; n Borrowed from mathematics A set of values also has a group of operations that can be applied to the values.

n These operations are often not mentioned explicitly with the type, but are part of its definition.

Chapter 6: Data Types 9 Chapter 6: Data Types 10

Data Type: definition Specification of a Data Type

A data type is a set of values, together Basic elements of a specification of a with a set of operations on those values data type: having certain properties. 1. The attributes that distinguish data objects Every language comes with a set of and types. predefined types from which all other 2. The values that data objects of that type types are constructed. may have, and 3. The operations that defined the possible n Provide facility to allow the programmer defined new data types. manipulations of data objects of that type.

Chapter 6: Data Types 11 Chapter 6: Data Types 12

2 Specification of a Data Type: Primitive Data Types example Specification of an array: Algol-like languages (Pascal, Algol68, C,

1. The attributes might include the number of Modula-2, Ada, C++), all classify types dimensions, the subscript range for each dimension, according to a basic scheme, with minor and the data type of the components. variations. n 2. The values would be the sets of numbers that form Names used are often different, even though the valid values for array components. concepts are the same.

3. The operations would include subscripting to select Primitive types are also called base types or individual array components and possibly other scalar types or unstructured types or elementary operations to create arrays, change their shape, types. access attributes such as upper and lower bounds A scalar type is a type whose elements consist of subscripts, and perform arithmetic on pair of of indivisible entities (a single data value or arrays. attribute). Chapter 6: Data Types 13 Chapter 6: Data Types 14

Numeric Data Types Integers: specification

Some form of numeric data is found in The set of integer values defined for the almost every programming language. type forms an ordered subset, within some finite bounds, of the infinite set of integers n Integers studied in mathematics. n Floating-Point Real Numbers n The maximum integer value is sometimes n Fixed-Point Real Numbers represented as a defined constant Example: In Pascal, the constant maxint The range of values is defined to be from –maxint to maxint

n Some languages, such as C, have different integer specifications: short, long

Chapter 6: Data Types 15 Chapter 6: Data Types 16

Integers: specification Integers: specification

Operations on integer data objects include n Relational Operations: Includes equal, not equal, less-than, greater-than, the main groups: less-than-or-equal, and greater-than-or-equal. n Arithmetic Operations: RelOp: integer x integer à Boolean

Binary arithmetic operations such as addition (+), n Assignment subtraction (-), multiplication (*), division (/ or div), assignment: integer x integer à integer remainder (mod). n Bit Operations BinOp: integer x integer à integer In some languages, integers fulfill many roles. Unary arithmetic operations such as negation (-) or In C, integer also play the role of Boolean values. identity (+). C includes operations to and the bits together (&), UnaryOp: integer à integer or the bits together (|), and shift the bits (<<).

Chapter 6: Data Types 17 Chapter 6: Data Types 18

3 Integers: implementation Integers: implementation

Implemented using hardware-defined No descriptor integer storage representation and set of n Only the value is stored. hardware possible storage n Possible if the language provides representations for integers. declarations and static type checking for Some possible storage representation integer data objects. for integers. Sign bit

Binary integer

Chapter 6: Data Types 19 Chapter 6: Data Types 20

Integers: implementation Integers: implementation Descriptor separated Descriptor in the same word

n Stores the description in a separate memory location, n Descriptor and value are stored in a single memory with a pointer to the full-word integer value. location by shortening the size of the integer sufficiently

n Often used in LISP. to provide space for the descriptor. Storage is conserved. n Disadvantage: It may double the storage required. Hardware operations: clearing the descriptor from the integer n Advantage: because the value is stored using the build- data object, perform the arithmetic, and then reinserting the in hardware representation, the hardware arithmetic descriptor. operations may be used. Arithmetic is inefficient. Sign bit Type descriptor Sign bit Type description Binary integer Binary integer

Chapter 6: Data Types 21 Chapter 6: Data Types 22

Floating-Point Numbers: Floating-Point Numbers: specification specification Sometimes called real, as in FORTRAN. Same arithmetic, relational, and assignment Also called float, as in C. operations as for integers. As with type integer, the values form an n Boolean are sometimes restricted. ordered sequence from some hardware- n Due to roundoff issues: equality between two determined minimum negative value to a real numbers is rarely achieved. maximum value, but the values are not Equality may be prohibited by the language designer. n Most languages provide other operations as distributed evenly across this range. build-in functions:

n Sine: Sin: real à real

n Maximum value: max: real x real à real

Chapter 6: Data Types 23 Chapter 6: Data Types 24

4 Floating-Point Numbers: Floating-Point Numbers: implementation implementation Storage representation based on an This model emulates scientific notation k underlying hardware representation. n Number N can be expressed as N=mx2 n IEEE Standard 754 n Mantissa (i.e. the significant digits of the 32 and 64 bit standard number) n Numbers consist of 3 fields: n Exponent S : a one-bit sign field. 0 is positive E: Values (8 bits) range from 0 to 355 corresponding to Exponent sign bit exponents of 2 that ranges from –127 to 128 Sign bit M: a mantissa of 23 bits. The first number is always 1 then it is Exponent Mantissa inserted automatically by the hardware yielding an extra 24th bit of precision.

S exponent mantissa

Chapter 6: Data Types 25 Chapter 6: Data Types 26

Floating-Point Numbers: Fixed-Point Real Numbers: implementation specification A double-precision form is available in A fixed-point number is represented as a many programming languages. digit sequence of fixed length, with the

n Additional memory word is used to store an decimal point position at a given point extended mantissa. between two digits.

8 bits 23 bits n Example: In COBOL:

S exponent mantissa X PICTURE 999V99 declares X as a fixed-point variable with 3 11 bits 52 bits digits before the decimal and 2 digits after.

S exponent mantissa

Chapter 6: Data Types 27 Chapter 6: Data Types 28

Fixed-Point Real Numbers: Fixed-Point Real Numbers: implementation implementation

A fixed-point type may be directly n If X has the value 103.421, then r-value of X will be supported by the hardware or may be 103 421, and the object X will have an attribute scale factor (SF) of three (the decimal point is 3 places to simulated by software. the left).

n Example: In PL/I, fixed data are of type Fixed n The statement Z = X + Y Decimal. Shift Y left one position (equivalent to multiplying the integral r-value of Y by 10) DECLARE X FIXED DECIMAL (10,3), The sum will have 3 decimal digits (SF=3) Y FIXED DECIMAL (10,2), Because Z has only 2 decimal places (SF=2) and the sum Z FIXED DECIMAL (10,2); has 3, remove one place (divide by 10). Data is stored as integers, with the decimal point Subtraction and division are handled in an being an attribute of the data object. analogous manner.

Chapter 6: Data Types 29 Chapter 6: Data Types 30

5 Boolean Types: specification Boolean Types: implementation

Data objects having one of two values (true of Storage is a single bit providing no descriptor false). Because single bits may not be separately The most common operations on Boolean types addressable in memory, its storage is extended to include assignment and the following logical be a single addressable unit such as a byte. operations: The values true and false might be represented in 2 n and: Boolean x Boolean à Boolean (conjunction) ways within this storage unit: n or: Boolean x Boolean à Boolean (inclusive disjunction) n A particular bit is used for the value (often, the sign bit of n not: Boolean à Boolean (negation or complement) the number representation), with 0=false and 1=true , and Other Boolean operations: equivalence, the rest of the byte is ignored. exclusive or, implication, nand (not-and), and n A zero value in the entire storage unit represents false, nor (not-or). and any other nonzero value represents true. Chapter 6: Data Types 31 Chapter 6: Data Types 32

Programming Language Character Types: specification Problem The set of possible character values is Find the right mechanisms to allow the usually taken to be a language-defined programmer to create and manipulate object appropriate to the problem at hand. enumeration corresponding to the standard n Language design: simplicity, efficiency, generality, character sets supported (I.e. ASCII). etc. Operations on character data include only A PL is strongly typed if all type checking can be done at compile time. the relational operations, assignments, and A PL is type complete if all objects in the sometimes operations to test whether a language have equal status. character is one of the special classes letter, n In some languages objects of certain types are digit, or special character. restricted.

Chapter 6: Data Types 33 Chapter 6: Data Types 34

Structured Types Composite Data Types A structured type is a compound type. Usually considered elementary data n Arrays, records, tuples, lists, sets, functions, etc. objects. n Two kinds of structured types: Their implementation usually involves a Heterogeneous: elements of different types. Homogeneous: elements of the same type. complex data structure organization by the compiler. homogeneous heterogeneous Multiple attributes are often given for each static record data type. dynamic array

Dynamic selection à homogeneous à compiler will not know which one will be selected at run time Chapter 6: Data Types 35 Chapter 6: Data Types 36

6 Character Strings: specification Character Strings and syntax Data objects composed of a sequence of At least 3 different treatments: characters. 1. Fixed declared length. It is important in most languages Value assigned: a character string of a n Used for data input and output. certain length. Design Issues Assignment of a new string value results in a length adjustment of the new string through n Should strings be a special kind of character truncation of excess characters or addition array or a primitive type (no array-style of blank characters to produce a string of the subscripting operations)? corre4ct length. n Should string have static or dynamic length? Storage allocation is determined at translation time. Chapter 6: Data Types 37 Chapter 6: Data Types 38

Character Strings: specification Character Strings: C and syntax 2. Variable length to a declared bound. Strings are arrays of characters (no string The string may have a maximumlength that is declared declaration). previously. The actual value stored may be a string of shorter Convention: null character (“\0”) follows length (even the empty string). the last character of a string. During execution, the length of the string value may Every string, when stored in an array, will vary, but it is truncated if it exceeds the bound. have the null character appended by the C Storage allocation is determined at translation time. translator. 3. Unbound length. The string may have a string value of any length. n Programmers have to manually include the The length may vary dynamically during execution with final null character to strings made from no bound (beyond available memory). programmer-defined arrays.

Dynamic storageChapter allocation 6: Data Types at run time. 39 Chapter 6: Data Types 40

Character Strings: operations Character Strings: operations

A wide variety of operations are usually 2. Relational operations on strings.

provided. n Usual relational operations (equal, less-than,

1. Concatenation. greater-than, etc) may be extended to strings.

Operation of joining two strings to make one n Lexicographic (alphabetic) order long string Example: String A is less than String B if either

Example: if || is the symbol used for n The first character of A is less than the first character of concatenation, “BLOCK” || “HEAD” gives B

“BLOCKHEAD” n If both characters are equal and the second character of A is less than the second character of B, and so on.

n A shorter string is extended with blank character (spaces) to the length of the longer.

Chapter 6: Data Types 41 Chapter 6: Data Types 42

7 Character Strings: operations Character Strings: operations

3. Substring selection using positioning 4. Input-output formatting.

subscripts. n Formatting data for output.

n Some languages provide an operation for n Breaking up formatted input data into smaller selecting a substring by giving the position of data items. its first and last characters 5. Substring selection using pattern matching. Or first character position and length of the substring. n Often the position of a desired substring within Example: In Fortran: Next = STR(6:10) a larger string is not know. Its relation to other substrings is know. n Some problem could arise if substring selection appears on both sides of an assignment Examples: n A sequence of digits followed by a decimal point Example: In Fortran, STR(!:5) = STR(I:I+4) n The word following the word THE.

Chapter 6: Data Types 43 Chapter 6: Data Types 44

Recursive Definition of a Regular Character Strings: operations Expression

n Patter matching operation takes two arguments: Individual terminals are regular A pattern data structure

n The pattern specifies the form of the substring desired and expressions possibly other substrings that should adjoin it. If a and b are regular expressions so are n A string with a substring that matches the specified pattern. The most common pattern matching mechanism are n a | b choice regular expressions. Some languages have pattern matching built into the n ab sequence language (Perl, Python, Ruby, …). n (a) grouping Some languages implement pattern matching via external libraries or classes n a* zero or more repetitions n Java has Pattern and Matcher classes Nothing else is a regular expression

Chapter 6: Data Types 45 Chapter 6: Data Types 46

Examples Pattern Symbols

Symbol Meaning Identifiers . Any single character (except '\n') n letter(letter | digit)* * 0 or more occurrences Binary strings + 1 or more occurrences

n (0 | 1)(0 | 1)* ? 0 or 1 occurrences of previous character Binary strings divisible by 2 [abc] One of enclosed characters [^abc] None of enclosed characters n (0 | 1)*0 {i, j} Between i and j occurrences | Choice () Grouping \i Case insensitive Chapter 6: Data Types 47 Chapter 6: Data Types 48

8 Character Classes Anchors

There are several classes of characters Used to specify position within a string that have special names Symbol Position ^ Beginning of string Match Exclude $ End of string \d Any digit \D \b Word boundary \w Any letter, \W \B Not at word boundary digit, or \bpattern\b matches the word pattern but not underscore patterned \s Any whitespace \S

Chapter 6: Data Types 49 Chapter 6: Data Types 50

Character Strings: Storage Representation for implementation Strings

Each of the 3 methods for handling Fixed declared length Variable length with bound character string utilizes a different storage R E L A 10 14 R E

representation. T I V I L A T I

Hardware support for the simple fixed- T Y V I T Y length representation is usually available but other representations for strings must Strings stored 4 characters per word Current and maximum usually be software simulated. padded with blanks string length stored at header of string

Chapter 6: Data Types 51 Chapter 6: Data Types 52

Storage Representation for Strings: evaluation Strings

Unbounded with fixed allocations Strings stored at 4 String types are important to writability and characters per block. 10 R E L Length at header of readability. string A T I V Not costly (language or compiler complexity): add strings as a primitive I T Y type.

n Standard libraries is as convenient as using Unbounded with variable allocations strings as primitive types. String stored as contiguous R E L A T I V I T Y Array of characters. Terminated by null character

Chapter 6: Data Types 53 Chapter 6: Data Types 54

9 User-Defined Ordinal Type Enumerations

A variable can take only one of a small number of Ordinal type: the range of possible values symbolic values. can be easily associated with the set of n Example: a variable StudentClass might have only four positive integers. possible values representing freshman, sophomore, junior, and senior. n Enumerations n Example: a variable EmployeeSex might have only two n Subrange values representing male and female.

n In older languages (Fortran, Cobol), an enumeration variable is given the data type integer. The values are represented as distinct, arbitrary chosen integers. Example: Freshman = 1, Sophomore = 2, and so on. Example: Male = 0, Female = 1 Chapter 6: Data Types 55 Chapter 6: Data Types 56

Enumerations: specification Enumerations: specification

Modern languages include an enumeration data Ordinarily many variables of the same type. enumeration type are used in a program. An enumeration is an ordered list of distinct Define the enumeration in a separate type values. definition The programmer defines both the literal names n In Pascal: type Class = {Fresh, Soph, Junior, Senior}; to be used for the values and their ordering followed by declarations for variables: using a declaration. StudentClass: Class; TransferStudent: Class; n Example: In C, n The type definition introduces the type name enum StudentClass {Fresh, Soph, Junior, Senior}; Class, which may be used as a primitive. enum EmployeeSex{Male, Female};

Chapter 6: Data Types 57 Chapter 6: Data Types 58

Enumerations: specification Enumerations: operations

n It also introduces the literals of Fresh, Soph, Basic operations Junior, and Senior which may be used instead n Relational operations (equal, less-than, etc) of the corresponding integers. Defined for enumerations types because the set of Example: if StudentClass = Junior then … values is given an ordering in the type definition. instead of the less undertandable n Assignment if StudentClass = 3 then … n Successor and predecessor Static type checking by the compiler could Gives the next and previous value, respectively, in find programming errors. the sequence of literals defining the enumeration.

n Example: if StudentClass = Male then … Undefined for the last and first values, respectively

Chapter 6: Data Types 59 Chapter 6: Data Types 60

10 Enumerations: implementation Enumerations: implementation

Storage representation is straightforward Example: The previous type Class has only four possible values 0 = Fresh, 1 = Soph, 2 = Junior, and 3 = n Each value in the enumeration sequence is Senior. represented at run time by one of the integer 0, Only 2 bits are required to represent these 4 1, …. possible values in memory.

n Only a small set is involved and the values are Successor & predecessor: add or subtracts one never negative. form the integer representing the value and check if the result is within the proper range. n The usual integer representation is often n In C, the programmer may override this shortened to omit the sign bit and use only default and set any values desired. enough bits for the range of values required. Example: enum class {Fresh=14, Soph=36, Junior=4, Senior=42}; Chapter 6: Data Types 61 Chapter 6: Data Types 62

Enumerations: evaluation Subrange Types: specification

Provides advantages in A subrange is a contiguous sequence of

n Readability an ordinal type within some restricted Named values are easily recognized. range.

n Reliability n Example: In Pascal, A: 1..10 In languages such as C#, Ada, Java there are two In Ada, A: integer range 1..10 advantages:

n No arithmetic operations are legal on enumeration types. A subrange type allows the same set of

n No enumeration variable can be assigned a value operations to be used as for the outside its defined range. corresponding ordinal type. In C there are no advantages (treats enumerations as integers).

Chapter 6: Data Types 63 Chapter 6: Data Types 64

Subrange Types: Subrange Types: implementation implementation

Two important effects on implementations: 2. Better type checking. More precise type checking to be performed on the 1. Smaller storage requirements. values assigned to the variable. Because a smaller range of values is possible, a Example: if variable is , then the subrange value can usually be stored in fewer bits Month Month: 1..12 than a general integer value. assignment Month := 0 is invalid and can be detected at compile time. If Month is declared to be of integer n Example: the subrange1..10 requires only 4 bits whereas a full integer requires 16 or 32. type, then the assignment is valid and the error must be found by the programmer during testing. n Because arithmetic operations on shortened integers may need software simulation for their execution (slower), Some subrange type checks cannot be performed subranges values are often represented as the smallest at compile time, i.e in Month := Month + 1 run time number of bits for which the hardware implements arithmetic operations (generally 8 or 16). checking is needed to determine whether the new value is within the bounds declared.

Chapter 6: Data Types 65 Chapter 6: Data Types 66

11 Subrange Types: evaluation Structured Types

Enhance readability by showing clearly A structured type is a compound type that that variables of subtypes can store only contains other data objects as its elements certain ranges of values. or components. Increase reliability because possible n Arrays n Records values that are outside of a range can be n Tuples detected faster and easier. n Lists

No contemporary language except Ada95 n Sets has subrange types. n Functions n Stacks

Chapter 6: Data Types 67 Chapter 6: Data Types 68

Structure Types: specification Structure Types: specification

The major attributes for specifying 2. Type of each component structure types include: n Two kinds of structured types: 1. Number of components Heterogeneous: elements of different types.

n Fixed size: the number of components is Homogeneous: elements of the same type. invariant during its lifetime. Arrays and records. homogeneous heterogeneous n Variable size: the number of components static record, list changes dynamically. dynamic array, set, file Usually define operations that insert and delete components. Dynamic selection à homogeneous à compiler will not Stacks, lists, sets, tables, and files. know which one will be selected at run time

Chapter 6: Data Types 69 Chapter 6: Data Types 70

Structure Types: specification Structure Types: specification

3. Names to be used for selecting 4. Maximum number of components. components. For variable-size data structures, a maximum size for the structure in terms of number of components may be n Needs a selection mechanism for identifying specified. individual components of the data structure 5. Organization of the components. Array: name of an individual component may be an integer subscript or sequence of subscripts. n The most common organization is a simple linear Table: the name may be a programmer-defined identifier. sequence of components. Record: name is usually a programmer-defined identifier. Vectors (one-dimensional arrays), records, stacks, lists, and Some data structures (stacks and files) allow access to only a files. particular component (top or current component) at any time. Array, record, and list types are usually extended to multidimensional forms: multidimensional arrays, records whose components are records, lists whose components are lists. Chapter 6: Data Types 71 Chapter 6: Data Types 72

12 Structure Types: operations Structure Types: operations

1. Component selection operations. 2. Whole-data structure operations. Processing data often proceeds by Operations may take entire data structures retrieving each component of the structure. as arguments and produce new data Two types of selection operations: structures as results. Random selection: an arbitrary component of Most languages provide a limited set. the data structure is accessed. Addition of two arrays. Sequential selection: components are selected Assignment of one record to another. in a predetermined order. Union operation on sets.

Chapter 6: Data Types 73 Chapter 6: Data Types 74

Structure Types: Structure Types: operations implementation 3. Insertion/deletion of components. The storage representation includes (1) Operations that change the number of storage for the components of the components in a data structure. structure, and (2) an optional descriptor 4. Creation/destruction of data structures that store some or all of the attributes of Operations that create and destroy data the structure. structures. There are two basic representations:

n Sequential

n Linked

Chapter 6: Data Types 75 Chapter 6: Data Types 76

Structure Types: Structure Types: implementation implementation 1. Sequential representation. 2. Linked representation. The data structure is stored in a single The data structure is stored in several non- contiguous block of storage that includes contiguous blocks of storage, with the blocks both descriptor and components. linked together through pointers.

Component Descriptor

Component

. Component . . Component Component …

Chapter 6: Data Types 77 Chapter 6: Data Types 78

13 Structure Types: Vectors and Arrays implementation Vectors and arrays are the most common types Sequential representations are used for of data structures in programming languages. fixed-size structures and sometimes for A vector is a data structure composed of a fixed homogeneous variable-size structures number of components of the same type organized as a simple linear sequence. such as character strings or stacks. A component of a vector is selected by giving its Linked representations are commonly subscript, an integer (or enumeration value) used for variable-sized structures such indicating the position of the component in the as lists. sequence. A vector is also called a one-dimensional array or linear array.

Chapter 6: Data Types 79 Chapter 6: Data Types 80

Vectors Vectors: subscripts

The attributes of a vector are: Subscripts may be either a range of values 1. Number of components: usually indicated as -5...5 or an upper bound with an implied implicitly by giving a sequence of subscript lower bound, as A(10). ranges, one for each dimension. Examples: 2. Data type of each component, which is a single data type, because the components n In Pascal, V: array [-5 .. 5] of real; are all of the same type. Defines a vector of 11 components, each a real number, where 3. Subscript to be used to select each the components are selected by the subscripts, -5, -4, … 5.

component: usually given as a range of n In C, float a[10]; integers, with the first integer designating the first component, and so on. Defines a vector of 10 components with subscripts ranging from 0 to 9.

Chapter 6: Data Types 81 Chapter 6: Data Types 82

Vectors: subscripts Vectors: operations

Subscript ranges need not begin at 1. Subscripting: the operation that selects a Subscript ranges need not even be a subrange component from a vector. of integers; it may be any enumeration (or a n It is usually written as the vector name subsequence of an enumeration) followed by the subscript of the component to Example: be selected.

n In Pascal, type class = (Fresh, Soph, Junior, Senior); V[2] or ClassAverage[Soph]

var ClassAverage: array [class] of real; n It may be a computed value (an expression that computes the subscript) V[I + 2]

Chapter 6: Data Types 83 Chapter 6: Data Types 84

14 Vectors: other operations Vectors: implementation

Operations to create and destroy vectors. Storage and accessing of individual Assignment to components of a vector. components are straightforward:

Operations that perform arithmetic n Homogeneity of components operations on pairs of vectors of the same The size and structure of each component is the size (i.e. addition of two vectors). same. Insertions and deletions of components n Fixed size are not allowed The number and position of each component of a vector are invariant through its lifetime. n Only the value of a component may be modified. A sequential storage is appropriate.

Chapter 6: Data Types 85 Chapter 6: Data Types 86

Vectors: implementation Vectors: access function

An access function is used to map array Vector Data type subscripts to addresses. LB Lower subscript bound Descriptor UB Upper subscript bound can be addressed by skipping I-1 Integer Data type of component components. E Size of component A[LB] n If E is the size of each component, then skip A[LB+1] Storage (I-1) x E memory locations. representation n If LB is the lower bound on the subscript A[UB] range, then the number of such components to skip is I-LB or (I-LB) x E memory locations.

Chapter 6: Data Types 87 Chapter 6: Data Types 88

Vectors: access function Multidimensional Arrays

n If the first element of the vector begins at location a, An array is a homogeneous collection of data the access function is: elements in which an element is identified by its address(A[I]) = a + (I – LB) x E position in the collection, relative to the first which can be rewritten as: element address(A[I]) = (a – LB x E) + (I x E) Indexing is a mapping from indices to elements n Once the storage for the vector is allocated, (a – LB x E) map(array_name, index_value_list) ® an element is a constant (K) and the accessing formula reduces to Indexes are also known as subscripts. address(A[I]) = K + I x E Index Syntax

n Example: access function of a C vector n FORTRAN, PL/I, Ada use parentheses address(A[I]) = address(array[0]) + i*element_size n Most other languages use brackets

Chapter 6: Data Types 89 Chapter 6: Data Types 90

15 Arrays: subscript types Arrays: subscript issues

What type(s) are allowed for defining array In some languages the lower bound of the subscripts? subscript range is implicit

n FORTRAN, C, C++, and Java allow integer n C, C++, Java—fixed at 0

subscripts only. n FORTRAN—fixed at 1

n Pascal allows any ordinal type n VB (0 by default, could be configured to 1) int, boolean, char, enum Other languages require programmer to n Ada allows integer or enumeration types specify the subscript range. Including boolean and char

Chapter 6: Data Types 91 Chapter 6: Data Types 92

Arrays: 4 categories Static Arrays

There are 4 categories of arrays based on Static arrays are those in which n Range of subscripts is statically bound (at compile subscript range bindings and storage time). binding: n Storage bindings are static (initial program load time) Examples: n Static n FORTRAN77, global arrays in C, static arrays n Fixed stack-dynamic (C/C++), some arrays in Ada.

n Stack dynamic Advantage: n Execution efficiency since no dynamically n Heap-dynamic allocation/deallocation is required Disadvantages: n Size must be known at compile time. n Bindings are fixed for entire program. Chapter 6: Data Types 93 Chapter 6: Data Types 94

Fixed stack dynamic Arrays Stack dynamic Arrays

Fixed stack-dynamic arrays are those in which A stack-dynamic array is one in which n Subscript ranges are statically bound. n Subscript ranges are dynamically bound n Storage allocation is done at runtime n Allocation is done at declaration elaboration time (on the stack). n Both remain fixed during the lifetime of the variable Examples: Advantage: flexibility - size need not be known until the n Pascal locals, most Java locals, and C locals that are array is about to be used not static. Disadvantge: once created, array size is fixed. Advantage is space efficiency Example: n Storage is allocated only while block in which array is n Ada arrays can be stack dynamic: declared is active. Get(List_Len); Declare n Using stack memory means the space can be reused List : array (1..List_Len) of Integer; when array lifetime ends. Begin Disadvantage … End; n Size must be knownChapter at 6:compile Data Types time. 95 Chapter 6: Data Types 96

16 Heap dynamic Arrays Summary: Array Bindings

Storage is allocated on the heap Binding times for Array A heap-dynamic array is one in which

n Subscript range binding is dynamic Subscript range Storage n Storage allocation is dynamic Static Compile time Compile time Examples:

n In APL, Perl and JavaScript, arrays grow and shrink Fixed stack dynamic Compile time Declaration elaboration as needed time Stack dynamic Runtime but fixed Runtime but fixed n C and C++ allow heap-dynamic arrays using pointers thereafter thereafter n In Java, all arrays are objects (heap dynamic) Dynamic Runtime Runtime n C# provides both heap-dynamic and fixed-heap dynamic

Chapter 6: Data Types 97 Chapter 6: Data Types 98

Arrays: attributes Arrays: initialization

Number of scripts Examples of array initialization:

n FORTRAN I allowed up to three 1. FORTRAN - uses the DATA statement, or put the

n FORTRAN 77 allows up to seven values in / ... / on the declaration

n Others languages have no limits. Integer, Dimension (4) :: stuff = (/2, 4, 6, 8/)

n Other languages allow just one, but elements 2. Java, C and C++ - put the values in braces; can let themselves can be arrays. the compiler count them Array Initialization e.g. int stuff [] = {2, 4, 6, 8}; n Usually just a list of values that are put in the array in 3. For strings (which are treated as arrays in C and the order in which the array elements are stored in C++), an alternate form of initialization is provided. memory char* names[] = {"Bob", "Mary", "Joe"};

Chapter 6: Data Types 99 Chapter 6: Data Types 100

Arrays: initialization Arrays: implementation

3. Ada provides two mechanisms A matrix is implemented by considering it - List in the order in which they are stored. as a vector of vectors; a three-dimensional - Positions for the values can be specified. arrays is a vector whose elements are stuff : array (1..4) of Integer := (2,4,6,8); vectors, and so on. SCORE : array (1..14, 1..2) of Integer := (1 => (24, 10), 2 => (10, 7), 3 =>(12, 30), others => (0, 0)); n All subvectors must have the same number of 4. Pascal does not allow array initialization elements of the same type. Matrix:

n Column of rows vs. row of columns

Chapter 6: Data Types 101 Chapter 6: Data Types 102

17 Arrays: Row- vs. Column-major Arrays: implementation order Row-major order (column of rows)

n The array is first divided into a vector of subvectors for each element in the range of the first subscript, then each of these subvectors is subdivided into subvectors for each element in the range of the second subscript, and so on. Column-major order (single row of columns)

Chapter 6: Data Types 103 Chapter 6: Data Types 104

Arrays: storage representation Arrays: implementation

VO Virtual Origin Storage representation follows directly LB 1 (= 1) Lower bound on subscript 1 from that for a vector. Descriptor UB 1 (= 3) Upper bound on subscript 1 LB 2 (= -1) Lower bound on subscript 2 n For a matrix, the data objects in the first row UB 2 (= 1) Upper bound on subscript 2 (assuming row -major order) followed by the E (= 1) Size of component data objects in the second row, and so on. M[1,-1] n Result: a single sequential block of memory M[1,0] First row containing all the components of the array in M[1,1] sequence. Storage M[2,-1] representation M[2,0] Second row The descriptor is the same as that for the vector, M[2,1] except that an upper and lower bound for the M[3,-1] subscript range of each dimension are needed. M[3,0] Third row M[3,1] Chapter 6: Data Types 105 Chapter 6: Data Types 106

Arrays: accessing function Arrays: accessing function

The accessing function is similar to that for If A is a matrix with M rows and N columns, the vectors: location of element A[I,J] is given by: n A is stored in row-major order n Determine the number of rows to skip over location(A[I,J]) = a + (I - LB1) x S + (J - LB2) x E (I-LB1) where S = length of a row = (UB2 - LB2 + 1) x E n Multiply by the length of a row to get the n A is stored in column-major order location of the start of the Ith row location(A[I,J]) = a + (J - LB2) x S + (I - LB1) x E th n Find the location of the J component in that where S = length of a row = (UB1 - LB1 + 1) x E row, as for a vector where a = base address

LB1 = lower bound on first subscript

LB2, UB2 = lower and upper bounds on the second subscript

Chapter 6: Data Types 107 Chapter 6: Data Types 108

18 Arrays: Row-major access Arrays: Column-major access function function

location(A[I,J]) = a + (I - LB1) x S + (J - LB2) x E location(A[I,J]) = a + (J - LB2) x S + (I - LB1) x E

S = (UB 2 - LB2 + 1) x E S = length of a row = (UB1 - LB 1 + 1) x E

Chapter 6: Data Types 109 Chapter 6: Data Types 110

Slices Example Slices in FORTRAN 90 A slice is some substructure of an array

n Nothing more than a referencing mechanism

n A way of designating a part of the array Slices are only useful in languages for operations that can be done on a whole array. 1. In FORTRAN 90 INTEGER MAT (1:4, 1:4) MAT(1:4, 1) - the first column MAT(2, 1:4) - the second row Chapter 6: Data Types 111 Chapter 6: Data Types 112

Slices: examples Associative Arrays

2. Ada - single-dimensioned arrays only An associative array is an unordered LIST(4..10) collection of data elements that are 3. Java has something like slices for multi- indexed by an equal number of values called keys dimensioned arrays The keys are stored in the structure int [][]array = array[1] - gets the second row Thus, element is a (key, value) pair PL/I was one of the earliest languages to Design Issues: implement slices. 1. What is the form of references to elements? 2. Is the size static or dynamic?

Chapter 6: Data Types 113 Chapter 6: Data Types 114

19 Associative Arrays Associative Arrays

Structure and Operations in Perl Structure and Operations in Perl

n Names begin with % n Subscripting is done using braces and keys %hi_temps = ("Monday" => 77, $hi_temps{"Wednesday"}; "Tuesday" => 79, #returns the value 83

"Wednesday" => 83); n A new elements is added by

n Alternative notation $hi_temps{"Thursday"} = 91;

%hi_temps = ("Monday", 77, n Elements can be removed with delete "Tuesday", 79, delete $hi_temps{"Thursday"}; "Wednesday", 83);

Chapter 6: Data Types 115 Chapter 6: Data Types 116

Records Records: attributes

A data structure composed of a fixed Records have 3 main attributes: number of components of different types. 1. The number of components Vectors vs. Records 2. The data type of each component 3. The selector used to name each component 1. The components of records may be heterogeneous, of mixed data types, rather than homogeneous. n The components of a records are often called fields, and the component names 2. The components of records are named with then are field names. symbolic names (identifiers) rather than n Records are sometimes called structures (as indexed with subscripts. in C).

Chapter 6: Data Types 117 Chapter 6: Data Types 118

Records: definition syntax Records: definition syntax

1. COBOL uses level numbers to show 2. Other languages use recursive definitions

nested records type Employee_Name_Type is record 01 EMPLOYEE-RECORD First : String(1..20); 02 EMPLOYEE-NAME Middle: String (1..10); 05 FIRST PICTURE IS X(20). Last : String (1..20); 05 MIDDLE PICTURE IS X(10). end record; 05 LAST PICTURE IS X(20). type Employee_Record_Type is record 02 HOURLY-RATE PICTURE IS 99V99 Employee_Name: Employee_Name_Type; Hourly_Rate: Float; end record; Employee_record: Employee_Record_Type;

Chapter 6: Data Types 119 Chapter 6: Data Types 120

20 Record Field References Records

1. COBOL Fully qualified references must include all field_name OF record_name_1 OF ... OF record_name_n record names Example: Elliptical references allow leaving out MIDDLE OF EMPLOYEE-NAME OF EMPLOYEE-RECORD record names as long as the reference is unambiguous (Cobol only) 2. Others (dot notation) record_name_1.record_name_2. ... record_name_n.field_name Pascal and Modula-2 provide a with Example: clause to abbreviate references

Employee_Record.Employee_Name.Middle

Chapter 6: Data Types 121 Chapter 6: Data Types 122

Records: operations Records: initialization

Assignment Define & initialize with struct student s1 = n Pascal, Ada, and C++ allow it if the types are list of variables {"Ted","Tanaka", identical. 22, 2.22}; Initialization Define & initialize n Allowed in Ada, using an aggregate. struct student s2; using the dot operator Comparison strcpy(s.first, (structure member "Sally"); n In Ada, = and /=; one operand can be an aggregate operator) Move Corresponding s.last="Suzuki"; s.age = 33; n In COBOL: it moves all fields in the source record to fields with the same names in the destination record. s.gpa = 3.33;

Chapter 6: Data Types 123 Chapter 6: Data Types 124

Records: implementation Records: descriptors

The storage representation for a record consists of a single sequential block of memory in which the components are stored in sequence. Individual components may need descriptors to indicate their data type and other attributes.

n No runtime descriptor for the record is required.

Chapter 6: Data Types 125 Chapter 6: Data Types 126

21 Unions Unions: examples

1. FORTRAN - with EQUIVALENCE A union is a type whose variables are EQUIVALENCE (A, B, C, D), (X(1), Y(1)) allowed to store different type values at n Free Unions: different times during execution No tag variable is required. No type checking Design Issues for unions: C/C++ have free unions 1. What kind of type checking, if any, must be 2. Pascal: variant records done? Contain one or more components that are common 2. Should unions be integrated with records? to all variants. Each variant has several other components with names and data types that are unique to each variant.

Chapter 6: Data Types 127 Chapter 6: Data Types 128

Unions: examples Unions: type checking issues

type PayType = (Salaried, Hourly); var Employee: record System must check value of flag before each ID: integer; variable access Dept: array [1..3] of char; Age: integer; Employee. PayClass := Salaried; case PayClass: PayType of Employee. MontlyRate := 1973.30; Salaried: (MontlyRate: real; … StartDate: integer): print(Employee.Overtime); -- error Hourly: (HourRate: real; Still not good enough! Reg: integer; Overtime: integer) Employee. PayClass := Salaried; Employee. MontlyRate := 1973.30; end Employee. StartDate := 626; The component PayClass is called the tag (Pascal) or discriminant (Ada) because it serves to indicate which variant of the record exists Employee. PayClass := Hourly; at a given point during program execution. print(Employee.Overtime); -- this should be an error

Chapter 6: Data Types 129 Chapter 6: Data Types 130

Unions: selection operation Ada Union Types

Selection operation: same as that for an Similar to Pascal, except ordinary record. n No free union Tag must be specified with union declaration n For ordinary records: each component exists throughout the lifetime of the record. n When tag is changed, all appropriate fileds

n For variant records/unions: the component must be set too. may exist at one point during execution (when Employee.PayClass:= Hourly; the tag component has a particular value), Employee.HourRate := 8.75; Employee.Reg := 8; may later cease of exist (when the value of Employee.Overtime := 2; the tag changes to indicate a different n Ada union types are safe variant), and later may reappear (if the tag Ada systems required to check the tag of all changes back to its original value). references to variants.

Chapter 6: Data Types 131 Chapter 6: Data Types 132

22 Unions: type checking Unions: implementation

Problem with Pascal’s design During translation, the amount of storage

n Type checking is ineffective. required for the components of each variant is

n User can create inconsistent unions (because the tag determined

can be individually assigned) n Storage is allocated in the record for the largest n Also, the tag is optional (free union). possible variant. Ada discriminant union n Each variant describes a different layout for the block in terms of number and types of components. n Tag must be present During execution, no special descriptor is n All assignments to the union must include the tag value –tag cannot be assigned by itself. needed for a variant record because the tag

n It is impossible for the user to create an inconsistent component is considered just another union. component of the record.

Chapter 6: Data Types 133 Chapter 6: Data Types 134

Unions: storage representation Unions: storage representation

ID Dept Age PayClass MontlyRate HourRate

StartDate Reg Overtime

Storage if Storage if PayClass = Salaried PayClass = Hourly

Chapter 6: Data Types 135 Chapter 6: Data Types 136

Union: evaluation Pointers

A pointer type is a type in which the range of Useful values consists of memory addresses and a Potentially unsafe in most languages special value (nil, or null) Ada, Algol 68 provide safe versions Pointers are useful for n Addressing flexibility

n Dynamic storage management Pointer operations

n Assignment of an address to a pointer int *p, x = 5; p = &x; n Dereferencing *p = 12;

Chapter 6: Data Types 137 Chapter 6: Data Types 138

23 Pointers Pointers: problems

1. Dangling pointers (dangerous)

n A pointer points to a heap-dynamic variable that has been deallocated

n Creating one (with explicit deallocation): a. Allocate a heap-dynamic variable and set a pointer to point at it b. Set a second pointer to the value of the first pointer c. Deallocate the heap-dynamic variable, using the first pointer The assignment operation j = *ptr

Chapter 6: Data Types 139 Chapter 6: Data Types 140

Pointers: problems Pointers: problems

n Dangling pointers b. p1 is later set to point to another newly created heap- int *p, *q; dynamic variable p = (int*)malloc(sizeof(int* 5); q = p; n The process of losing heap-dynamic variables is free (p); called memory leakage 2. Lost Heap-Dynamic Variables ( wasteful) n Lost heap-dynamic variables (garbage) n A heap-dynamic variable that is no longer p = (int*)malloc(5*sizeof(int)); referenced by any program pointer p = new int(20);

n Creating one: The process of losing heap-dynamic variables is called memory leakage . (cannot free the first chunk a. Pointer p1 is set to point to a newly created of memory) heap-dynamic variable

Chapter 6: Data Types 141 Chapter 6: Data Types 142

Pointers: examples Pointers: examples

C and C++ pointers C++ reference types

n Used for dynamic storage management and n Constant pointers that are implicitly addressing dereferenced: n Explicit dereferencing (*) and address-of operator (&) float x = 1.0; n Can do pointer arithmetic float &y = x; float arr[100]; y = 2.2; à sets x to 2.2 float *p = arr; n Used for reference parameters: *(p+5) º arr[5] º p[5] *(p+i) º arr[i] º p[i] Advantages of both pass-by-reference and pass- by-value n void* can point to any data type but cannot be dereferenced

Chapter 6: Data Types 143 Chapter 6: Data Types 144

24 Pointers: examples Lists A data structure composed of an ordered Java - Only references (no pointers) sequence of data structures. n No pointer arithmetic List are similar to vectors in that they consist of an

n Can only point at objects (which are all on the ordered sequence of objects. heap) Lists vs. Vectors 1. Lists are rarely of fixed length. Lists are often used to n No explicit deallocator (garbage collection is represent arbitrary data structures, and typically lists used) grow and shrink during program execution.

n Means there can be no dangling references 2. Lists are rarely homogeneous. The data type of each member of a list may differ from its neighbour. n Dereferencing is always implicit 3. Languages that use lists typically declares such data implicitly without explicit attributes for list members.

Chapter 6: Data Types 145 Chapter 6: Data Types 146

Variations on Lists Variations on Lists

Stacks and queues Trees

n A stack is a list in which component selection, n A list in which the components may be lists as insertion, and deletion are restricted to one well as elementary data objects, provided that end. each list is only a component of at most one

n A queue is a list in which component selection other list. and deletion are restricted to one end and Directed graphs insertion is restricted to the other end. n A data structure in which the components n Both sequential and linked storage may be linked together using arbitrary linkage representations for stacks and queues are patterns (rather than just linear sequences of common. components).

Chapter 6: Data Types 147 Chapter 6: Data Types 148

Variations on Lists Sets

Property lists A set is a data object containing an

n A record with a varying number of unordered collection of distinct values. components, if the number of components Basic operations on sets: may vary without restriction 1. Membership. n The component names (property names) and 2. Insertion and deletion of single values. their values (property values) must be stored. 3. Union of sets n A common representation is an ordinary linked list with the property names and their 4. Intersection of sets values alternating in a single long sequence. 5. Difference of sets

Chapter 6: Data Types 149 Chapter 6: Data Types 150

25 Programming Language Problem Find the right mechanisms to allow the programmer to create and manipulate object appropriate to the problem at hand.

n Language design: simplicity, efficiency, generality, etc. A PL is strongly typed if all type checking can be done at compile time. A PL is type complete if all objects in the language have equal status.

n In some languages objects of certain types are restricted.

Chapter 6: Data Types 151