Chapter 3 Representing Data Elements

This chapter relates the block model of secondary storage that we covered in Section 2.3 to the requirements of database management systems. We begin by looking at the way that relations or sets of objects are represented in secondary storage.

• Attributes need to be represented by fixed- or variable-length sequences of bytes, called "fields."
• Fields, in turn, are put together in fixed- or variable-length collections called "records," which correspond to tuples or objects.
• Records need to be stored in physical blocks. Various data structures are useful, especially if blocks of records need to be reorganized when the database is modified.
• A collection of records that forms a relation or the extent of a class is stored as a collection of blocks, called a file.1 To support efficient querying and modification of these collections, we put one of a number of "index" structures on the file; these structures are the subject of Chapters 4 and 5.

3.1 Data Elements and Fields

We shall begin by looking at the representation of the most basic data elements: the values of attributes found in relational or object-oriented database systems. These are represented by "fields." Subsequently, we shall see how fields are put together to form the larger elements of a storage system: records, blocks, and files.

1 The database notion of a "file" is somewhat more general than the "file" in an operating system. While a database file could be an unstructured stream of bytes, it is more common for the file to consist of a collection of blocks organized in some useful way, with indexes or other specialized access methods. We discuss these organizations in Chapter 4.

3.1.1 Representing Relational Database Elements

Suppose we have declared a relation in an SQL system, by a CREATE TABLE statement such as that of Fig. 3.1. The DBMS has the job of representing and storing the relation described by this declaration. Since a relation is a set of tuples, and tuples are similar to records or "structs" (the C or C++ term), we may imagine that each tuple will be stored on disk as a record. The record will occupy (part of) some disk block, and within the record there will be one field for every attribute of the relation.

    CREATE TABLE MovieStar(
        name CHAR(30) PRIMARY KEY,
        address VARCHAR(255),
        gender CHAR(1),
        birthdate DATE
    );

Figure 3.1: An SQL table declaration

While the general idea appears simple, the "devil is in the details," and we shall have to discuss a number of issues:

1. How do we represent SQL datatypes as fields?

2. How do we represent tuples as records?

3. How do we represent collections of records or tuples in blocks of memory?

4. How do we represent and store relations as collections of blocks?

5. How do we cope with record sizes that may be different for different tuples or that do not divide the block size evenly, or both?

6. What happens if the size of a record changes because some field is updated? How do we find space within its block, especially when the record grows?

The first item is the subject of this section. The next two items are covered in Section 3.2. We shall discuss the last two in Sections 3.4 and 3.5, respectively. The fourth question, representing relations so their tuples can be accessed efficiently, will be studied in Chapter 4. Further, we need to consider how to represent certain kinds of data that are found in modern object-relational or object-oriented systems, such as object identifiers (or other pointers to records) and "blobs" (binary, large objects, such as a 2-gigabyte MPEG video). These matters are addressed in Sections 3.3 and 3.4.

3.1.2 Representing Objects

Today, many database systems support "objects." These systems include pure object-oriented DBMS's, where an object-oriented language like C++, extended with an object-oriented query language such as OQL,2 is used as the query and host language. They also include object-relational extensions of the classical relational systems; these systems support objects as values of attributes in a relation. To a first approximation, an object is a tuple, and its fields or "instance variables" are attributes. However, there are two important differences:

1. Objects can have methods or special-purpose functions associated with them. The code for these functions is part of the schema for a class of objects.

2. Objects may have an object identifier (OID), which is an address in some global address space that refers uniquely to that object. Moreover, objects can have relationships to other objects, and these relationships are represented by pointers or lists of pointers. Relational data does not have addresses as values, although we shall see that "behind the scenes" the implementation of relations requires the manipulation of addresses or pointers in many ways. The matter of representing addresses is complex, both for large relations and for classes with large extents. We discuss the matter in Section 3.3.

Example 3.1: We see in Fig. 3.2 an ODL definition of a class Star. It represents movie stars, although the information is somewhat different from that in the relation MovieStar of Fig. 3.1. In particular, we do not represent gender or birthdate of stars, but we have a relationship between stars and the movies they starred in. This relationship is represented by starredIn from stars to their movies, and its inverse, stars, from a movie to its stars. We do not show the definition of the class Movie, which is involved in this relationship.

A Star object can be represented by a record. This record will have fields for attributes name and address. Since the latter is a structure, we might prefer to use two fields, named street and city, in place of a field named address. More problematic is the representation of the relationship starredIn. This relationship is a set of references to Movie objects. We need a way to represent the locations of these Movie objects, which normally means we must specify the place on the disk of some machine where they are stored. Techniques for representing such addresses are discussed in Section 3.3. We also need the ability to represent arbitrarily long lists of movies for a given star; this problem of "variable-length records" is the subject of Section 3.4. □

    interface Star {
        attribute string name;
        attribute Struct Addr {
            string street, string city} address;
        relationship Set<Movie> starredIn
            inverse Movie::stars;
    };

Figure 3.2: The ODL definition of a movie star class

2 OQL is the standard object-oriented query language described in R. G. G. Cattell (ed.), The Object Database Standard: ODMG, third edition, Morgan-Kaufmann, San Francisco, 1998. Its companion language, ODL, is used to describe database schemas in object-oriented terms.

3.1.3 Representing Data Elements

Let us begin by considering how the principal SQL datatypes are represented as fields of a record. Ultimately, all data is represented as a sequence of bytes. For example, an attribute of type INTEGER is normally represented by two or four bytes, and an attribute of type FLOAT is normally represented by four or eight bytes. The integers and real numbers are represented by bit strings that are specially interpreted by the machine's hardware so the usual arithmetic operations can be performed on them.
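To make the byte-level view concrete, here is a minimal Python sketch. It assumes a big-endian 4-byte INTEGER and an 8-byte IEEE 754 FLOAT. The function names are hypothetical, and a real DBMS chooses its own widths and byte order.

```python
import struct

# Assumed encodings: 4-byte signed integer and 8-byte IEEE 754 double,
# both big-endian ('>'). Real systems may differ in width and byte order.
def encode_integer(value: int) -> bytes:
    return struct.pack(">i", value)


def decode_integer(field: bytes) -> int:
    return struct.unpack(">i", field)[0]


def encode_float(value: float) -> bytes:
    return struct.pack(">d", value)


def decode_float(field: bytes) -> float:
    return struct.unpack(">d", field)[0]


print(len(encode_integer(1948)), len(encode_float(3.25)))   # 4 8
print(encode_integer(1948).hex())                           # 0000079c
```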

Fixed-Length Character Strings

The simplest kind of character strings to represent are those described by the SQL type CHAR(n). These are fixed-length character strings of length n. The field for an attribute with this type is an array of n bytes. Should the value for this attribute be a string of length shorter than n, then the array is filled out with a special pad character, whose 8-bit code is not one of the legal characters for SQL strings.

Example 3.2: If an attribute A were declared to have type CHAR(5), then the field corresponding to A in all tuples is an array of five characters. If in one tuple the component for attribute A were 'cat', then the value of the array would be:

c a t ⊥ ⊥

Here, ⊥ is the "pad" character, which occupies the fourth and fifth bytes of the array. Note that the quote marks, which are needed to indicate a character string in SQL programs, are not stored with the value of the string. □
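The CHAR(n) encoding of Example 3.2 can be sketched as follows. This assumes ASCII strings and uses the byte 0xFF as the pad character (any code outside the legal character set would do). `encode_char` and `decode_char` are illustrative names, not part of any DBMS API.

```python
PAD = b"\xff"   # assumed pad byte; never a legal (ASCII) string character


def encode_char(value: str, n: int) -> bytes:
    """Encode value as a CHAR(n) field: exactly n bytes, right-padded."""
    data = value.encode("ascii")
    if len(data) > n:
        raise ValueError(f"string longer than CHAR({n})")
    return data + PAD * (n - len(data))


def decode_char(field: bytes) -> str:
    """Strip the trailing pad bytes; quote marks were never stored."""
    return field.rstrip(PAD).decode("ascii")


field = encode_char("cat", 5)
print(field)               # b'cat\xff\xff'
print(decode_char(field))  # cat
```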

A Note on Terminology

Depending on whether you have experience with file systems, conventional programming languages like C, with relational database languages (SQL in particular), or object-oriented languages (e.g., Smalltalk, C++, or the object-oriented database language OQL), you may know different terms for essentially the same concepts. The following table summarizes the correspondence, although there are some differences, e.g., a class can have methods; a relation cannot.

                Data Element             Record    Collection
Files           field                    record    file
C               field                    struct    array, file
SQL             attribute                tuple     relation
OQL             attribute, relationship  object    extent (of a class)

We shall tend to use file-system terms (fields and records) unless we are referring to specific uses of these concepts in database applications. In the latter case we shall use relational and/or object-oriented terms.

Variable-Length Character Strings

Sometimes the values in a column of a relation are character strings whose length may vary widely. The SQL type VARCHAR(n) is often used as the type of such a column. However, there is an intended implementation of attributes declared this way, in which n + 1 bytes are dedicated to the value of the string regardless of how long it is. Thus, the SQL VARCHAR type actually represents fields of fixed length, although its value has a length that varies. We shall examine character strings whose representation's length varies in Section 3.4. There are two common representations for VARCHAR strings:

1. Length plus content. We allocate an array of n + 1 bytes. The first byte holds, as an 8-bit integer, the number of bytes in the string. The string cannot exceed n characters, and n itself cannot exceed 255, or we shall not be able to represent the length in a single byte.3 The second and subsequent bytes hold the characters of the string. Any bytes of the array that are not used, because the string is shorter than the maximum possible, are ignored. These bytes cannot possibly be construed as part of the value, because the first byte tells us when the string ends.

2. Null-terminated string. Again allocate an array of n + 1 bytes for the value of the string. Fill this array with the characters of the string, followed by a null character, which is not one of the legal characters that can appear in character strings. As with the first method, unused positions of the array cannot be construed as part of the value; here the null terminator warns us not to look further, and it also makes the representation of VARCHAR strings compatible with that of character strings in C.

3 Of course we could use a scheme in which two or more bytes are dedicated to the length.

Example 3.3: Suppose attribute A is declared VARCHAR(10). We allocate an array of 11 characters in each tuple's record for the value of A. Suppose 'cat' is the string to represent. Then in method 1, we would put 3 in the first byte to represent the length of the string, and the next three characters would be the string itself. The final seven positions are irrelevant. Thus, the value appears as:

3cat

Note that the "3" is the 8-bit integer 3, i.e., 00000011, not the character '3'. In the second method, we fill the first three positions with the string; the fourth is the null character (for which we use the symbol ⊥, as we did for the "pad" character), and the remaining seven positions are irrelevant. Thus,

c a t ⊥

is the representation of 'cat' as a null-terminated string. □
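The two VARCHAR(n) representations of Example 3.3 can be sketched side by side. This assumes ASCII strings and zero-fills the irrelevant trailing bytes, which the text leaves unspecified. The function names are hypothetical.

```python
def encode_length_prefixed(value: str, n: int) -> bytes:
    """Method 1: a length byte, then the characters, in an array of n + 1 bytes."""
    data = value.encode("ascii")
    if n > 255 or len(data) > n:
        raise ValueError("string too long, or n does not fit in one length byte")
    # Bytes after the string are irrelevant; this sketch zero-fills them.
    return bytes([len(data)]) + data + bytes(n - len(data))


def decode_length_prefixed(field: bytes) -> str:
    length = field[0]  # the first byte tells us where the string ends
    return field[1:1 + length].decode("ascii")


def encode_null_terminated(value: str, n: int) -> bytes:
    """Method 2: the characters, then a null byte, in an array of n + 1 bytes."""
    data = value.encode("ascii")
    if len(data) > n or b"\x00" in data:
        raise ValueError("string too long, or contains the null character")
    return data + b"\x00" + bytes(n - len(data))


def decode_null_terminated(field: bytes) -> str:
    end = field.index(b"\x00")  # the terminator warns us not to look further
    return field[:end].decode("ascii")


# Example 3.3: 'cat' as VARCHAR(10) occupies 11 bytes either way.
print(encode_length_prefixed("cat", 10)[:4])   # b'\x03cat'
print(encode_null_terminated("cat", 10)[:4])   # b'cat\x00'
```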

Dates and Times

A date is usually represented as a fixed-length character string, following some format. Thus, a date can be represented just as we would represent any other fixed-length character string.

Example 3.4: As an example, the SQL2 standard has dates represented by a 10-character string of the form YYYY-MM-DD. That is, the first four characters are digits representing the year, the fifth is a hyphen, the sixth and seventh are digits representing the month, with a leading 0 if necessary, the eighth character is another hyphen, and last come two digits representing the day, with a leading 0 if necessary. For instance, the character string '1948-05-14' represents May 14, 1948. □
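The 10-character date field of Example 3.4 can be sketched with Python's standard `datetime.date`. The encoder and decoder names are hypothetical.

```python
from datetime import date


def encode_date(d: date) -> bytes:
    """SQL2-style DATE field: the 10-character string YYYY-MM-DD."""
    return f"{d.year:04d}-{d.month:02d}-{d.day:02d}".encode("ascii")


def decode_date(field: bytes) -> date:
    year, month, day = field.decode("ascii").split("-")
    return date(int(year), int(month), int(day))


field = encode_date(date(1948, 5, 14))
print(field, len(field))   # b'1948-05-14' 10
```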

Times may similarly be represented as if they were character strings. For example, the SQL2 standard represents times that are integral numbers of seconds by an 8-character string of the form HH:MM:SS. That is, the first two characters are the hour, represented on a 24-hour clock, with a leading 0 if needed. Thus, 7AM is represented by the digits 07, and 7PM is represented by the digits 19. Following a colon are two digits representing the minutes, another colon, and two digits representing the seconds. Both minutes and seconds require a leading 0 if necessary to make two digits. For instance, '20:19:02' represents two seconds after 8:19 PM.

The "Year 2000" Problem

Many database systems and other application programs have a representation for dates that involves only two digits for the year, for example YYMMDD. Since these applications never had to deal with dates outside the 1900's, the "19" could be understood, and a date like May 14, 1948 would be represented as '480514'. The problem is that these applications can take advantage of the fact that if date d1 is earlier than date d2, then d1 is represented by a string that is lexicographically less than the string that represents d2. This observation allows us to write queries like

    SELECT name FROM MovieStar WHERE birthdate < '980601'

to select (from the relation MovieStar declared in Fig. 3.1) those movie stars who were born before June 1, 1998. When our database starts getting some child stars born in the third millennium, their birthdates will be lexicographically less than '980601'. For example, a star born on Aug. 31, 2001 has a birthdate value of '010831', which is lexicographically less than '980601'. The only way to avoid this problem (at least until the year 10,000) in systems that compare dates is to use a four-digit year, as the SQL2 standard does.
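The comparison failure in the sidebar is easy to reproduce. This sketch compares the sidebar's two dates as strings under both formats. The helper names are hypothetical.

```python
from datetime import date


def yymmdd(d: date) -> str:
    """Two-digit-year format of the sidebar, e.g. '480514' for May 14, 1948."""
    return f"{d.year % 100:02d}{d.month:02d}{d.day:02d}"


def yyyy_mm_dd(d: date) -> str:
    """SQL2 four-digit-year format, e.g. '1948-05-14'."""
    return f"{d.year:04d}-{d.month:02d}-{d.day:02d}"


cutoff = date(1998, 6, 1)
child_star = date(2001, 8, 31)

# Two-digit years: '010831' < '980601', so the 2001 birthdate wrongly
# compares as earlier than the 1998 cutoff.
print(yymmdd(child_star) < yymmdd(cutoff))           # True (wrong)
# Four-digit years: string order agrees with date order again.
print(yyyy_mm_dd(child_star) < yyyy_mm_dd(cutoff))   # False (correct)
```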

A time such as '20:19:02' is easily represented as a fixed-length character string of length 8. However, the SQL2 standard also allows a value of type TIME to include fractions of a second. We follow the 8 characters described above by a period, and as many digits as needed to describe the fraction of the second. For instance, two and a quarter seconds after 8:19 PM is represented in SQL2 by '20:19:02.25'. Since such strings are of arbitrary length, we have two choices:

1. The system can put a limit on the precision of times, and times can then be stored as if they were type VARCHAR(n), where n is the greatest length a time can have: 9 plus the number of fractional digits allowed in seconds.

2. Times can be stored as true variable-length values and dealt with as discussed in Section 3.4.
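The TIME format, together with the length bound from choice 1, can be sketched as follows. The default limit of two fractional digits is an arbitrary assumption, and the function names are hypothetical.

```python
def encode_time(hour: int, minute: int, second: int,
                fraction: str = "", max_fraction_digits: int = 2) -> bytes:
    """SQL2-style TIME: HH:MM:SS on a 24-hour clock, optionally followed by a
    period and fraction digits. max_fraction_digits models the precision limit
    of choice 1; its default of 2 is an assumption."""
    if not (0 <= hour < 24 and 0 <= minute < 60 and 0 <= second < 60):
        raise ValueError("time component out of range")
    if fraction and (not fraction.isdigit() or len(fraction) > max_fraction_digits):
        raise ValueError("fraction must be at most max_fraction_digits digits")
    text = f"{hour:02d}:{minute:02d}:{second:02d}"
    if fraction:
        text += "." + fraction
    return text.encode("ascii")


def max_time_length(max_fraction_digits: int) -> int:
    """The n of VARCHAR(n) under choice 1: 8 characters, a period, the digits."""
    return 9 + max_fraction_digits


print(encode_time(7, 0, 0))            # b'07:00:00'
print(encode_time(20, 19, 2, "25"))    # b'20:19:02.25'
print(max_time_length(2))              # 11
```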

Bits

A sequence of bits, that is, data described in SQL2 by the type BIT(n), can be packed eight to a byte. If n is not divisible by 8, then we are best off ignoring the unused bits of the last byte. For instance, the bit sequence 010111110011 might be represented by 01011111 as the first byte and 00110000 as the second; the final four 0's are not part of any field. As a special case, we can represent a boolean value, that is, a single bit, as 10000000 for true and 00000000 for false. However, it may in some contexts be easier to test a boolean if we make the distinction appear in all bits; i.e., use 11111111 for true and 00000000 for false.

Packing Fields Into a Single Byte

One may be tempted to take advantage of fields that have small enumerated types or that are boolean-valued, to pack several fields into a single byte. For instance, if we had three fields that were a boolean, a day of the week, and one of four colors, respectively, we could use one bit for the first, 3 bits for the second, and two bits for the third, put them all in a single byte and still have two bits left over. There is no impediment to doing so, but it makes retrieval of values from one of the fields, or the writing of new values for one of the fields, more complex and error-prone. Such packing of fields used to be more important when storage space was more expensive. Today, we do not advise it in common situations.
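The BIT(n) packing and the all-bits boolean encoding from the Bits discussion can be sketched as follows. For readability, bit strings are written as Python strings of '0' and '1'. The helper names are hypothetical.

```python
def pack_bits(bits: str) -> bytes:
    """Pack a BIT(n) value, written as a string of '0'/'1', eight bits to a byte.
    If n is not divisible by 8, the last byte is filled out with 0 bits that
    are not part of any field."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


def unpack_bits(data: bytes, n: int) -> str:
    """Recover the first n bits, ignoring the unused bits of the last byte."""
    return "".join(f"{byte:08b}" for byte in data)[:n]


# Booleans stored with the distinction in all bits, as the text suggests.
TRUE_BYTE = 0b11111111
FALSE_BYTE = 0b00000000


def is_true(byte: int) -> bool:
    return byte != FALSE_BYTE   # testing for any nonzero bit suffices


packed = pack_bits("010111110011")
print([f"{b:08b}" for b in packed])   # ['01011111', '00110000']
print(unpack_bits(packed, 12))        # 010111110011
```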

Enumerated Types

Sometimes it is useful to have an attribute whose values take on a small, fixed set of values. These values are given symbolic names, and the type consisting of all those names is an enumerated type. Common examples of enumerated types are days of the week, e.g., {SUN, MON, TUE, WED, THU, FRI, SAT}, or a set of colors, e.g., {RED, GREEN, BLUE, YELLOW}. We can represent the values of an enumerated type by integer codes, using only as many bytes as needed. For instance, we could represent RED by 0, GREEN by 1, BLUE by 2, and YELLOW by 3. These integers can all be represented by two bits, 00, 01, 10, and 11, respectively. It is more convenient, however, to use full bytes for representing integers chosen from a small set. For example, YELLOW is represented by the integer 3, which is 00000011 as an eight-bit byte. Any enumerated type with up to 256 values can be represented by a single byte. If the enumerated type has up to 2^16 values, a short integer of two bytes will suffice, and so on.
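The one-byte integer coding of an enumerated type can be sketched with Python's `enum.IntEnum`. The `Color` codes follow the text. The `width` parameter and the function names are assumptions made for illustration.

```python
from enum import IntEnum


class Color(IntEnum):
    """The enumerated type of the text, with codes 0 through 3."""
    RED = 0
    GREEN = 1
    BLUE = 2
    YELLOW = 3


def encode_enum(value: IntEnum, width: int = 1) -> bytes:
    """Store an enumerated value as an unsigned integer code of `width` bytes:
    1 byte for up to 256 values, 2 bytes for up to 2**16 values, and so on."""
    return int(value).to_bytes(width, "big")


def decode_enum(cls, field: bytes):
    return cls(int.from_bytes(field, "big"))


print(f"{encode_enum(Color.YELLOW)[0]:08b}")   # 00000011
print(decode_enum(Color, b"\x02").name)        # BLUE
```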

3.2 Records

We shall now begin the discussion of how fields are grouped together into records. The study continues in Section 3.4, where we look at variable-length fields and records. In general, each type of record used by a database system must have a schema, which is stored by the database. The schema includes the names and data types of fields in the record, and their offsets within the record. The schema is consulted when it is necessary to access components of the record.

3.2.1 Building Fixed-Length Records

Tuples are represented by records consisting of the sorts of fields discussed in Section 3.1.3. The simplest situation occurs when all the fields of the record have a fixed length. We may then concatenate the fields to form the record.

Example 3.5: Consider the declaration of the MovieStar relation in Fig. 3.1. There are four fields:

1. name, a 30-byte string of characters.

2. address, of type VARCHAR(255). This field will be represented by 256 bytes, using the scheme discussed in Example 3.3.

3. gender, a single byte, which we suppose will always hold either the character 'F' or the character 'M'.

4. birthdate, of type DATE. We shall assume that the 10-byte SQL2 representation of dates is used for this field.

Thus, a record of type MovieStar takes 30 + 256 + 1 + 10 = 297 bytes. It looks as suggested in Fig. 3.3. We have indicated the offset of each field, which is the number of bytes from the beginning of the record at which the field itself begins. Thus, field name begins at offset 0; address begins at offset 30, gender at 286, and birthdate at offset 287. □

Figure 3.3: A MovieStar record
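The unaligned 297-byte record of Example 3.5 can be built with Python's `struct` module. This sketch assumes ASCII data and 0xFF as the CHAR pad byte. The sample name and address are made-up values.

```python
import struct

# name CHAR(30) | address VARCHAR(255) as 1 length byte + 255 bytes |
# gender CHAR(1) | birthdate DATE as 10 characters.
# '>' selects standard sizes with no alignment padding between fields.
MOVIESTAR_FORMAT = ">30sB255sc10s"
PAD = b"\xff"   # assumed pad character, outside the legal ASCII characters


def pack_moviestar(name: str, address: str, gender: str, birthdate: str) -> bytes:
    name_b = name.encode("ascii")
    addr_b = address.encode("ascii")
    date_b = birthdate.encode("ascii")
    if len(name_b) > 30 or len(addr_b) > 255 or len(date_b) != 10 or gender not in ("F", "M"):
        raise ValueError("value does not fit the MovieStar schema")
    return struct.pack(
        MOVIESTAR_FORMAT,
        name_b.ljust(30, PAD),   # CHAR(30): pad out to exactly 30 bytes
        len(addr_b),             # VARCHAR length byte at offset 30
        addr_b,                  # struct zero-fills to 255 bytes; those bytes are ignored
        gender.encode("ascii"),  # offset 286
        date_b,                  # offset 287
    )


def unpack_moviestar(record: bytes):
    name_b, addr_len, addr_b, gender_b, date_b = struct.unpack(MOVIESTAR_FORMAT, record)
    return (name_b.rstrip(PAD).decode("ascii"),
            addr_b[:addr_len].decode("ascii"),
            gender_b.decode("ascii"),
            date_b.decode("ascii"))


record = pack_moviestar("Jane Doe", "12 Example St., Springfield", "F", "1948-05-14")
print(len(record), record[286:287], record[287:])   # 297 b'F' b'1948-05-14'
```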

Some machines allow more efficient reading and writing of data that begins at a byte of main memory whose address is a multiple of 4 (or 8 if the machine has a 64-bit processor). Certain types of data, such as integers, may be absolutely required to begin at an address that is a multiple of 4, while others, such as double-precision reals, may need to begin with a multiple of 8. While the tuples of a relation are stored on disk and not in main memory, we have to be aware of this issue. The reason is that when we read a block from disk to main memory, the first byte of the block will surely be placed at a memory address that is a multiple of 4, and in fact will be a multiple of some high power of 2, such as 2^12 if blocks and pages have length 4096 = 2^12. Requirements that certain fields be loaded into a main-memory position whose first byte address is a multiple of 4 or 8 thus translate into the requirement that those fields have an offset within their block that has the same divisor. For simplicity, let us assume that the only requirement on data is that fields start at a main-memory byte whose address is a multiple of 4. Then it is sufficient that

a) Each record start at a byte within its block that is a multiple of 4, and

b) All fields within the record start at a byte that is offset from the beginning of the record by a multiple of 4.

Put another way, we round all field and record lengths up to the next multiple of 4.

Example 3.6: Suppose that the tuples of the MovieStar relation need to be represented so each field starts at a byte that is a multiple of 4. Then the offsets of the four fields would be 0, 32, 288, and 292, and the entire record would take 304 bytes. The format is suggested by Fig. 3.4.

For instance, the first field, name, takes 30 bytes, but we cannot start the second field until the next multiple of 4, or offset 32. Thus, address has offset 32 in this record format. The second field is of length 256 bytes, which means the first available byte following address is 288. The third field, gender, needs only one byte, but we cannot start the last field until a total of 4 bytes later, at 292. The fourth field, birthdate, being 10 bytes long, ends at byte 301, which makes the record length 302 (notice that the first byte is 0). However, if all fields of all records must start at a multiple of 4, the bytes numbered 302 and 303 are useless, and effectively, the record consumes 304 bytes. We shall assign bytes 302 and 303 to the birthdate field, so they do not get used for any other purpose accidentally. □

Figure 3.4: The layout of MovieStar tuples when fields are required to start at multiple of 4 bytes
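The offset arithmetic of Examples 3.5 and 3.6 follows one rule: round each field's starting position, and the final length, up to the alignment. This sketch (the function name is hypothetical) reproduces both layouts.

```python
def layout(field_sizes, alignment=1):
    """Return (offsets, record_length) when every field must start at a
    multiple of `alignment` bytes and the record length is rounded up too."""
    def round_up(x):
        return (x + alignment - 1) // alignment * alignment

    offsets, pos = [], 0
    for size in field_sizes:
        pos = round_up(pos)      # skip to the next allowed starting byte
        offsets.append(pos)
        pos += size
    return offsets, round_up(pos)


MOVIESTAR_FIELDS = [30, 256, 1, 10]      # name, address, gender, birthdate
print(layout(MOVIESTAR_FIELDS))          # ([0, 30, 286, 287], 297)
print(layout(MOVIESTAR_FIELDS, 4))       # ([0, 32, 288, 292], 304)
```

With `alignment=1` the function degenerates to plain concatenation, so the same code covers the unaligned case.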

The Need for a Record Schema

We might wonder why we need to indicate the record schema in the record itself, since currently we are only considering fixed-format records. For example, fields in a "struct," as used in C or similar languages, do not have their offsets stored when the program is running; rather the offsets are compiled into the application programs that access the struct.

However, there are several reasons why the record schema must be stored and accessible to the DBMS. For one, the schema of a relation (and therefore the schema of the records that represent its tuples) can change. Queries need to use the current schema for these records, and so need to know what the schema currently is. In other situations, we may not be able to tell immediately what the record type is simply from its location in the storage system. For example, some storage organizations permit tuples of different relations to appear in the same block of storage.

3.2.2 Record Headers

There is another issue that must be raised when we design the layout of a record. Often, there is information that must be kept in the record but that is not the value of any field. For example, we may want to keep in the record:

1. The record schema, or more likely, a pointer to a place where the DBMS stores the schema for this type of record,
2. The length of the record,
3. Timestamps indicating the time the record was last modified, or last read,

among other possible pieces of information. Thus, many record layouts include a header of some small number of bytes to provide this additional information.

The database system maintains schema information, which is essentially what appears in the CREATE TABLE statement for that relation:

1. The attributes of the relation,
2. Their types,
3. The order in which attributes appear in the tuple,
4. Constraints on the attributes and the relation itself, such as primary key declarations, or a constraint that some integer attribute must have a value in a certain range.

We do not have to put all this information in the header of a tuple's record. It is sufficient to put there a pointer to the place where the information about the tuple's relation is stored. Then all this information can be obtained when needed.

As another example, even though the length of the tuple may be deducible from its schema, it may be convenient to have the length in the record itself. For instance, we may not wish to examine the record contents, but just find the beginning of the next record quickly. A length field lets us avoid accessing the record's schema, which may involve a disk I/O.

Example 3.7: Let us modify the layout of Example 3.6 to include a header of 12 bytes. The first four bytes are the type. It is actually an offset in an area where the schemas for all the relations are kept. The second is the record length, a 4-byte integer, and the third is a timestamp indicating when the tuple was inserted or last updated. The timestamp is also a 4-byte integer. The resulting layout is shown in Fig. 3.5. The length of the record is now 316 bytes. □


Figure 3.5: Adding some header information to records representing tuples of the MovieStar relation
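The 12-byte header of Example 3.7 can be sketched as three big-endian 4-byte integers placed in front of the aligned 304-byte body. The `next_record` helper shows how the length field lets a scan skip to the following record without consulting the schema. The names and the sample schema offset are hypothetical.

```python
import struct

HEADER_FORMAT = ">III"   # schema offset, record length, timestamp: three 4-byte integers
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)   # 12
BODY_SIZE = 304                                # aligned MovieStar body of Example 3.6


def with_header(body: bytes, schema_offset: int, timestamp: int) -> bytes:
    """Prefix a record body with the 12-byte header of Example 3.7."""
    if len(body) != BODY_SIZE:
        raise ValueError("expected an aligned 304-byte MovieStar body")
    length = HEADER_SIZE + len(body)
    return struct.pack(HEADER_FORMAT, schema_offset, length, timestamp) + body


def read_header(data: bytes, offset: int = 0):
    """Return (schema_offset, length, timestamp) for the record at `offset`."""
    return struct.unpack_from(HEADER_FORMAT, data, offset)


def next_record(data: bytes, offset: int) -> int:
    """Skip to the next record using only the length field, not the schema."""
    _, length, _ = read_header(data, offset)
    return offset + length


r = with_header(bytes(BODY_SIZE), schema_offset=64, timestamp=0)
print(len(r), read_header(r))   # 316 (64, 316, 0)
```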

3.2.3 Packing Fixed-Length Records into Blocks

Records representing tuples of a relation are stored in blocks of the disk and moved into main memory (along with their entire block) when we need to access or update them. The layout of a block that holds records is suggested in Fig. 3.6.

Figure 3.6: A typical block holding records

There is an optional block header that holds information such as:

1. Links to one or more other blocks that are part of a network of blocks such as those described in Chapter 4 for creating indexes to the tuples of a relation.

2. Information about the role played by this block in such a network.

3. Information about which relation the tuples of this block belong to.

4. A "directory" giving the offset of each record in the block.

5. A "block ID"; see Section 3.3.

6. Timestamp(s) indicating the time of the block's last modification and/or access.

By far the simplest case is when the block holds tuples from one relation, and the records for those tuples have a fixed format. In that case, following the header, we pack as many records as we can into the block and leave the remaining space unused.

Example 3.8: Suppose we are storing records with the layout developed in Example 3.7. These records are 316 bytes long. Suppose also that we use 4096-byte blocks. Of these bytes, say 12 will be used for a block header, leaving 4084 bytes for data. In this space we can fit twelve records of the given 316-byte format, and 292 bytes of each block are wasted space. □
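The arithmetic of Example 3.8 can be written as a tiny helper. It also gives the offset of the i-th record when fixed-length records follow the block header back to back. Both function names are hypothetical.

```python
def records_per_block(block_size: int, header_size: int, record_size: int):
    """How many fixed-length records fit after the block header, and how many
    bytes are left over as wasted space."""
    usable = block_size - header_size
    count = usable // record_size
    return count, usable - count * record_size


def record_offset(header_size: int, record_size: int, i: int) -> int:
    """Byte offset of the i-th record (0-based) when records follow the header."""
    return header_size + i * record_size


print(records_per_block(4096, 12, 316))   # (12, 292)
print(record_offset(12, 316, 11))         # 3488
```

Because both 12 and 316 are multiples of 4, every record in this block still starts at a multiple of 4, so the alignment assumption of Example 3.6 is preserved.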

3.2.4 Exercises for Section 3.2

* Exercise 3.2.1: Suppose a record has the following fields in this order: A character string of length 15, an integer of 2 bytes, an SQL2 date, and an SQL2 time (no decimal point). How many bytes does the record take if:

a) Fields can start at any byte.

b) Fields must start at a byte that is a multiple of 4.

c) Fields must start at a byte that is a multiple of 8.

Exercise 3.2.2: Repeat Exercise 3.2.1 for the list of fields: A real of 8 bytes, a character string of length 17, a single byte, and an SQL2 date.

* Exercise 3.2.3: Assume fields are as in Exercise 3.2.1, but records also have a record header consisting of two 4-byte pointers and a character. Calculate the record length for the three situations regarding field alignment (a) through (c) in Exercise 3.2.1.

Exercise 3.2.4: Repeat Exercise 3.2.2 if the records also include a header consisting of an 8-byte pointer, and ten 2-byte integers.

* Exercise 3.2.5: Suppose records are as in Exercise 3.2.3, and we wish to pack as many records as we can into a block of 4096 bytes, using a block header that consists of ten 4-byte integers. How many records can we fit in the block in each of the three situations regarding field alignment (a) through (c) of Exercise 3.2.1?

Exercise 3.2.6: Repeat Exercise 3.2.5 for the records of Exercise 3.2.4, assuming that blocks are 16,384 bytes long, and that block headers consist of three 4-byte integers and a directory that has a 2-byte integer for every record in the block.

3.3 Representing Block and Record Addresses

Before proceeding with the study of how records with more complex structure are represented, we must consider how addresses, pointers, or references to records and blocks can be represented, since these pointers often form part of complex records. There are other reasons for knowing about secondary-storage address representation as well. When we look at efficient structures for representing files or relations in Chapter 4, we shall see several important uses for the address of a block or the address of a record.

The address of a block when it is loaded into a buffer of main memory can be taken to be the virtual-memory address of its first byte, and the address of a record within that block is the virtual-memory address of the first byte of that record. However, in secondary storage, the block is not part of the application's virtual-memory address space. Rather, a sequence of bytes describes the location of the block within the overall system of data accessible to the DBMS: the device ID for the disk, the cylinder number, and so on. A record can be identified by giving its block and the offset of the first byte of the record within the block.

To complicate further the matter of representing addresses, a recent trend toward "object brokers" allows independent creation of objects by many cooperating systems. These objects may be represented by records that are part of an object-oriented DBMS, although we can think of them as tuples of relations without losing the principal idea.
However, the capability for independent creation of objects or records puts additional stress on the mechanism that maintains addresses of these records. In this section, we shall begin with a discussion of address spaces, especially as they pertain to the common "client-server" architecture for DBMS's. We then discuss the options for representing addresses, and finally look at "pointer swizzling," the ways in which we can convert addresses in the data server's world to the world of the client application programs.

3.3.1 Client-Server Systems

Commonly, a database consists of a server process that provides data from secondary storage to one or more client processes that are applications using the data. The server and client processes may be on one machine, or the server and the various clients can be distributed over many machines.

The client application uses a conventional "virtual" address space, typically 32 bits, or about 4 billion different addresses. The operating system or DBMS decides which parts of the address space are currently located in main memory, and hardware maps the virtual address space to physical locations in main memory. We shall not think further of this virtual-to-physical translation, and shall think of the client address space as if it were main memory itself.

The server's data lives in a database address space. The addresses of this space refer to blocks, and possibly to offsets within the block. There are several ways that addresses in this address space can be represented:

1. Physical Addresses. These are byte strings that let us determine the place within the secondary storage system where the block or record can be found. One or more bytes of the physical address are used to indicate each of:

(a) The host to which the storage is attached (if the database is stored across more than one machine),
(b) An identifier for the disk or other device on which the block is located,
(c) The number of the cylinder of the disk,
(d) The number of the track within the cylinder (if the disk has more than one surface),
(e) The number of the block within the track,
(f) (In some cases) the offset of the beginning of the record within the block.

2. Logical Addresses. Each block or record has a "logical address," which is an arbitrary string of bytes of some fixed length. A map table, stored on disk in a known location, relates logical to physical addresses, as suggested in Fig. 3.7.
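As a concrete illustration of the two address forms, here is a minimal Python sketch. The field widths chosen for the physical address (2-byte host, 1-byte device, 2-byte cylinder, 1-byte track, 1-byte block, 2-byte offset) and the 4-byte counter used as a logical address are assumptions made for illustration, as are the names `pack_physical` and `MapTable`.

```python
import struct

# Assumed (hypothetical) widths for components (a)-(f) of a physical address:
# host 2 bytes, device 1, cylinder 2, track 1, block 1, offset 2 = 9 bytes.
PHYS_FMT = ">HBHBBH"  # big-endian, no padding between fields


def pack_physical(host, device, cylinder, track, block, offset=0):
    """Pack the components of a physical address into a byte string."""
    return struct.pack(PHYS_FMT, host, device, cylinder, track, block, offset)


def unpack_physical(addr):
    """Recover (host, device, cylinder, track, block, offset)."""
    return struct.unpack(PHYS_FMT, addr)


class MapTable:
    """Relates fixed-length logical addresses to physical addresses.

    A logical address here is just a 4-byte counter value (an assumption);
    in a real system the table itself would be stored on disk.
    """

    def __init__(self):
        self._entries = {}
        self._next_id = 0

    def assign(self, physical):
        logical = self._next_id.to_bytes(4, "big")
        self._next_id += 1
        self._entries[logical] = physical
        return logical

    def resolve(self, logical):
        return self._entries[logical]

    def relocate(self, logical, new_physical):
        # Only the map-table entry changes; logical pointers stay valid.
        self._entries[logical] = new_physical
```

Under these assumed widths a physical address occupies nine bytes, in line with the remark below that about eight bytes is a practical minimum.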

Notice that physical addresses are long. Eight bytes is about the minimum we could use if we incorporate all the listed elements, and some systems use up to 16 bytes. For example, imagine a database of objects that is designed to last for 100 years. In the future, the database may grow to encompass one million machines, and each machine might be fast enough to create one object every nanosecond. This system would create around 2^81 objects (10^6 machines times 10^9 objects per second times roughly 3.2 x 10^9 seconds), which requires a minimum of eleven bytes to represent addresses. Since we would probably prefer to reserve some bytes to represent the host, others to represent the storage unit, and so on, a rational address notation would probably use considerably more than 11 bytes for a system of this scale.

Figure 3.7: A map table translates logical to physical addresses

3.3.2 Logical and Structured Addresses

One might wonder what the purpose of logical addresses could be. All the information needed for a physical address is found in the map table, and following logical pointers to records requires consulting the map table and then going to the physical address. However, the level of indirection involved in the map table allows us considerable flexibility. For example, many data organizations require us to move records around, either within a block or from block to block. If we use a map table, then all pointers to the record refer to this map table, and all we have to do when we move or delete the record is to change the entry for that record in the table.

Many combinations of logical and physical addresses are possible as well, yielding structured address schemes. For instance, one could use a physical address for the block (but not the offset within the block), and add the key value for the record being referred to. Then, to find a record given this structured address, we use the physical part to reach the block containing that record, and we examine the records of the block to find the one with the proper key.

Of course, to survey the records of the block, we need enough information to locate them. The simplest case is when the records are of a known, fixed-length type, with the key field at a known offset. Then, we only have to find in the block header a count of how many records are in the block, and we know exactly where to find the key fields that might match the key that is part of the address. However, there are many other ways that blocks might be organized so that we could survey the records of the block; we shall cover others shortly.
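The lookup through a structured address can be sketched in Python as a pair (physical block address, key). The sketch assumes the simplest layout just described: a 4-byte record count in the block header, followed by fixed-length records whose key is a 4-byte integer at offset 0. The 16-byte record length and the function names are hypothetical.

```python
import struct

RECORD_LEN = 16   # assumed fixed record length
KEY_OFFSET = 0    # key field at a known offset within each record
HEADER_LEN = 4    # block header: a 4-byte count of records


def make_block(records):
    """records: list of (key, payload), each payload RECORD_LEN - 4 bytes."""
    for _, payload in records:
        assert len(payload) == RECORD_LEN - 4, "records must be fixed-length"
    body = b"".join(struct.pack(">i", k) + p for k, p in records)
    return struct.pack(">I", len(records)) + body


def find_by_structured_address(blocks, addr):
    """addr = (block_address, key); blocks maps block addresses to bytes."""
    block_address, key = addr
    block = blocks[block_address]               # physical part: reach the block
    (count,) = struct.unpack_from(">I", block, 0)
    for i in range(count):                      # survey the records of the block
        start = HEADER_LEN + i * RECORD_LEN
        (k,) = struct.unpack_from(">i", block, start + KEY_OFFSET)
        if k == key:
            return block[start:start + RECORD_LEN]
    return None
```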
A similar, and very useful, combination of physical and logical addresses is to keep in each block an offset table that holds the offsets of the records within the block, as suggested in Fig. 3.8. Notice that the table grows from the front end of the block, while the records are placed starting at the end of the block. This strategy is useful when the records need not be of equal length. Then, we do not know in advance how many records the block will hold, and we do not have to allocate a fixed amount of the block header to the table initially.

Figure 3.8: A block with a table of offsets telling us the position of each record within the block

The address of a record is now the physical address of its block plus the offset of the entry in the block's offset table for that record. This level of indirection within the block offers many of the advantages of logical addresses, without the need for a global map table.

• We can move the record around within the block, and all we have to do is change the record's entry in the offset table; pointers to the record will still be able to find it.

• We can even allow the record to move to another block, if the offset table entries are large enough to hold a "forwarding address" for the record.

• Finally, we have an option, should the record be deleted, of leaving in its offset-table entry a tombstone, a special value that indicates the record has been deleted. Prior to its deletion, pointers to this record may have been stored at various places in the database. After record deletion, following a pointer to this record leads to the tombstone, whereupon the pointer can either be replaced by a null pointer, or otherwise modified to reflect the deletion of the record. Had we not left the tombstone, the pointer might lead to some new record, with surprising, and erroneous, results.
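The offset-table organization of Fig. 3.8, with forwarding addresses and tombstones, might be modeled as follows. This is a simplified Python sketch, not a real page format: the offset table is kept as a Python list rather than serialized into the block's bytes, and only its space (an assumed 2 bytes per entry) is charged against the block. The class and method names are hypothetical.

```python
class SlottedBlock:
    """Offset table at the front of the block, records packed from the end."""

    def __init__(self, size=4096, entry_size=2):
        self.size = size
        self.entry_size = entry_size   # bytes charged per offset-table entry
        self.data = bytearray(size)
        self.table = []                # slot number -> entry (tagged tuple)
        self.free_end = size           # records occupy data[free_end:size]

    def free_space(self):
        # Unused gap between the end of the table and the first record.
        return self.free_end - len(self.table) * self.entry_size

    def insert(self, record):
        if len(record) + self.entry_size > self.free_space():
            raise ValueError("no room in block")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.table.append(("rec", self.free_end, len(record)))
        return len(self.table) - 1     # record address = (this block, slot)

    def read(self, slot):
        kind, *rest = self.table[slot]
        if kind == "rec":
            off, length = rest
            return ("rec", bytes(self.data[off:off + length]))
        if kind == "fwd":
            return ("fwd", rest[0])    # caller follows the forwarding address
        return ("tomb", None)          # record was deleted

    def delete(self, slot):
        # The slot is never reused, so old pointers find the tombstone
        # instead of some unrelated new record.
        self.table[slot] = ("tomb",)

    def forward(self, slot, new_address):
        self.table[slot] = ("fwd", new_address)
```

Because every external pointer names a slot rather than a byte offset, a record can be rearranged inside the block, forwarded, or tombstoned by changing only its table entry.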

3.3.3 Pointer Swizzling

Often, pointers or addresses are part of records. This situation is not common for records that represent tuples of a relation, but it is for tuples that represent objects. Also, modern object-relational database systems allow attributes of pointer type (called references), so even relational systems need the ability to represent pointers in tuples. Finally, index structures are composed of blocks that usually have pointers within them. Thus, we need to study the management of pointers as blocks are moved between main and secondary memory; we do so in this section.

Ownership of Memory Address Spaces

In this section we have presented a view of the transfer between secondary and main memory in which each client owns its own memory address space, and the database address space is shared. This model is common in object-oriented DBMS's. However, relational systems often treat the memory address space as shared; the motivation is to support recovery and concurrency as we shall discuss in Chapters 8 and 9.

A useful compromise is to have a shared memory address space on the server side, with copies of parts of that space on the clients' side. That organization supports recovery and concurrency, while also allowing processing to be distributed in a "scalable" way: the more clients the more processors can be brought to bear.

As we mentioned earlier, every block, record, object, or other referenceable data item has two forms of address:

1. Its address in the server's database address space, which is typically a sequence of eight or so bytes locating the item in the secondary storage of the system. We shall call this address the database address of the item.

2. An address in virtual memory (provided that item is currently buffered in virtual memory). These addresses are typically four bytes. We shall refer to such an address as the memory address of the item.

When in secondary storage, we surely must use the database address of the item. However, when the item is in the main memory, we can refer to the item by either its database address or its memory address. It is more efficient to put memory addresses wherever an item has a pointer, because these pointers can be followed using single machine instructions.

In contrast, following a database address is much more time-consuming. We need a table that translates from all those database addresses that are currently in virtual memory to their current memory address. Such a translation table is suggested in Fig. 3.9. It may be reminiscent of the map table of Fig. 3.7 that translates between logical and physical addresses. However:

a) Logical and physical addresses are both representations for the database address. In contrast, memory addresses in the translation table are for copies of the corresponding object in memory.

b) All addressable items in the database have entries in the map table, while only those items currently in memory are mentioned in the translation table.

Figure 3.9: The translation table turns database addresses into their equivalents in memory

To avoid the cost of translating repeatedly from database addresses to memory addresses, several techniques have been developed that are collectively known as pointer swizzling. The general idea is that when we move a block from secondary to main memory, pointers within the block may be "swizzled," that is, translated from the database address space to the virtual address space. Thus, a pointer actually consists of:

1. A bit indicating whether the pointer is currently a database address or a (swizzled) memory address.

2. The database or memory pointer, as appropriate. The same space is used for whichever address form is present at the moment. Of course, not all the space may be used when the memory address is present, because it is typically shorter than the database address.
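The two-part pointer just described (the swizzled bit plus an address field), and the extra work needed to follow a database address, can be sketched in Python. Main memory is modeled as a dictionary from integer memory addresses to items, and the translation table as a dictionary from database addresses to memory addresses; these stand-ins and the names `Pointer` and `follow` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pointer:
    swizzled: bool  # the bit: True means `value` is a memory address
    value: object   # a database address (bytes) or a memory address (int)


def follow(ptr, translation, memory):
    """Return the item `ptr` refers to.

    translation: database address -> memory address, for items in memory.
    memory: memory address -> item (a stand-in for main memory).
    """
    if ptr.swizzled:
        return memory[ptr.value]             # direct dereference
    mem_addr = translation.get(ptr.value)    # extra table lookup
    if mem_addr is None:
        raise LookupError("target's block must first be read into memory")
    return memory[mem_addr]
```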

Example 3.9: Figure 3.10 shows a simple situation in which Block 1 has a record with pointers to a second record on the same block and to a record on another block. The figure also shows what might happen when Block 1 is copied to memory. The first pointer, which points within Block 1, can be swizzled so it points directly to the memory address of the target record.

However, if Block 2 is not in memory at this time, then we cannot swizzle the second pointer; it must remain unswizzled, pointing to the database address of its target. Should Block 2 be brought to memory later, it becomes theoretically possible to swizzle the second pointer of Block 1. Depending on the swizzling strategy used, there may or may not be a list of such pointers that are in memory, referring to Block 2; if so, then we have the option of swizzling the pointer at that time. □

Figure 3.10: Structure of a pointer when swizzling is used

There are several strategies we can use to determine when to swizzle pointers.

Automatic Swizzling

As soon as a block is brought into memory, we locate all its pointers and addresses and enter them into the translation table if they are not already there. These pointers include both the pointers from records in the block to elsewhere and the addresses of the block itself and/or its records, if these are addressable items. We need some mechanism to locate the pointers within the block. For example:

1. If the block holds records with a known schema, the schema will tell us where in the records the pointers are found.

2. If the block is used for one of the index structures we shall discuss in Chapter 4, then the block will hold pointers at known locations.

3. We may keep within the block header a list of where the pointers are.

When we enter into the translation table the addresses for the block just moved into memory, and/or its records, we know where in memory the block has been buffered. We may thus create the translation-table entry for these database addresses straightforwardly. When we insert one of these database addresses A into the translation table, we may find it in the table already, because its block is currently in memory. In this case, we replace A in the block just moved to memory by the corresponding memory address, and we set the "swizzled" bit to true. On the other hand, if A is not yet in the translation table, then its block has not been copied into main memory. We therefore cannot swizzle this pointer and leave it in the block as a database pointer.

If we try to follow a pointer P from a block, and we find that pointer P is still unswizzled, i.e., in the form of a database pointer, then we need to make sure the block B containing the item that P points to is in memory (or else why are we following that pointer?). We consult the translation table to see if database address P currently has a memory equivalent. If not, we copy block B into a memory buffer. Once B is in memory, we can "swizzle" P by replacing its database form by the equivalent memory form.
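The automatic strategy, including what happens when an unswizzled pointer is followed, can be sketched with a toy buffer manager in Python. The model is hypothetical: a database address is a pair (block id, slot), memory addresses are consecutive integers, and blocks are deep-copied on load so that the on-disk versions keep their database pointers.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Pointer:
    swizzled: bool   # the "swizzled" bit
    value: object    # (block_id, slot) if unswizzled, else an int memory address


@dataclass
class Item:
    pointers: list = field(default_factory=list)


class AutoSwizzlingBuffer:
    def __init__(self, disk):
        self.disk = disk          # block_id -> list of Items (never modified)
        self.translation = {}     # database address -> memory address
        self.memory = {}          # memory address -> Item
        self.loaded = set()

    def load(self, block_id):
        if block_id in self.loaded:
            return
        self.loaded.add(block_id)
        items = copy.deepcopy(self.disk[block_id])
        # Enter the addresses of the block's own items into the table.
        for slot, item in enumerate(items):
            mem_addr = len(self.memory)
            self.translation[(block_id, slot)] = mem_addr
            self.memory[mem_addr] = item
        # Swizzle each pointer whose target is already in memory;
        # the rest stay as database pointers.
        for item in items:
            for p in item.pointers:
                if not p.swizzled and p.value in self.translation:
                    p.value, p.swizzled = self.translation[p.value], True

    def follow(self, p):
        if not p.swizzled:
            # Make sure the target's block B is in memory, then swizzle P.
            self.load(p.value[0])
            p.value, p.swizzled = self.translation[p.value], True
        return self.memory[p.value]
```

Loading Block 1 of Example 3.9 in this model swizzles the pointer that stays within Block 1 and leaves the pointer into Block 2 unswizzled until it is followed.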

Swizzling on Demand

Another approach is to leave all pointers unswizzled when the block is first brought into memory. We enter its address, and the addresses of its pointers, into the translation table, along with their memory equivalents. If and when we follow a pointer P that is inside some block of memory, we swizzle it, using the same strategy that we followed when we found an unswizzled pointer using automatic swizzling.

The difference between on-demand and automatic swizzling is that the latter tries to get all the pointers swizzled quickly and efficiently when the block is loaded into memory. The possible time saved by swizzling all of a block's pointers at one time must be weighed against the possibility that some swizzled pointers will never be followed. In that case, any time spent swizzling and unswizzling the pointer will be wasted.

An interesting option is to arrange that database pointers look like invalid memory addresses. If so, then we can allow the computer to follow any pointer as if it were in its memory form. If the pointer happens to be unswizzled, then the memory reference will cause a hardware trap. If the DBMS provides a function that is invoked by the trap, and this function "swizzles" the pointer in the manner described above, then we can follow swizzled pointers in single instructions, and only need to do something more time consuming when the pointer is unswizzled.
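For comparison, here is a hypothetical on-demand variant of a toy buffer manager in Python. Loading a block only enters the block's own items into the translation table (a simplification of the description above) and touches no pointers; a pointer is swizzled the first time it is followed. The `swizzles` counter makes visible that pointers never followed cost no swizzling work.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Pointer:
    swizzled: bool
    value: object     # (block_id, slot) or an int memory address


@dataclass
class Item:
    pointers: list = field(default_factory=list)


class OnDemandBuffer:
    def __init__(self, disk):
        self.disk = disk          # block_id -> list of Items
        self.translation = {}     # database address -> memory address
        self.memory = {}          # memory address -> Item
        self.loaded = set()
        self.swizzles = 0         # swizzling work actually performed

    def load(self, block_id):
        if block_id in self.loaded:
            return
        self.loaded.add(block_id)
        for slot, item in enumerate(copy.deepcopy(self.disk[block_id])):
            mem_addr = len(self.memory)
            self.translation[(block_id, slot)] = mem_addr
            self.memory[mem_addr] = item
        # No pointer inside the block is swizzled here.

    def follow(self, p):
        if not p.swizzled:
            self.load(p.value[0])    # bring in the target's block if needed
            p.value, p.swizzled = self.translation[p.value], True
            self.swizzles += 1
        return self.memory[p.value]
```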

No Swizzling

Of course it is possible never to swizzle pointers. We still need the translation table, so the pointers may be followed in their unswizzled form. This approach does offer the advantage that records cannot be pinned in memory, as discussed in Section 3.3.5, and decisions about which form of pointer is present need not be made.

Programmer Control of Swizzling

In some applications, it may be known by the application programmer whether the pointers in a block are likely to be followed. This programmer may be able to specify explicitly that a block loaded into memory is to have its pointers swizzled, or the programmer may call for the pointers to be swizzled only as needed. For example, if a programmer knows that a block is likely to be accessed heavily, such as the root block of a B-tree (discussed in Section 4.3), then the pointers would be swizzled. However, blocks that are loaded into memory, used once, and then likely dropped from memory, would not be swizzled.

3.3.4 Returning Blocks to Disk

When a block is moved from memory back to disk, any pointers within that block must be "unswizzled"; that is, their memory addresses must be replaced by the corresponding database addresses. The translation table can be used to associate addresses of the two types in either direction, so in principle it is possible to find, given a memory address, the database address to which the memory address is assigned.

However, we do not want each unswizzling operation to require a search of the entire translation table. While we have not discussed the implementation of this table, we might imagine that the table of Fig. 3.9 has appropriate indexes. If we think of the translation table as a relation, then the problem of finding the memory address associated with a database address x can be expressed as the query:

SELECT memAddr FROM TranslationTable WHERE dbAddr = x;

For instance, a hash table using the database address as the key might be appropriate for an index on the dbAddr attribute; Chapter 4 suggests many possible data structures.

If we want to support the reverse query,

SELECT dbAddr FROM TranslationTable WHERE memAddr = y;

then we need to have an index on attribute memAddr as well. Again, Chapter 4 suggests data structures suitable for such an index. Also, Section 3.3.5 talks about linked-list structures that in some circumstances can be used to go from a memory address to all main-memory pointers to that address.

3.3.5 Pinned Records and Blocks

A block in memory is said to be pinned if it cannot at the moment be safely written back to disk. A bit telling whether or not a block is pinned can be located in the header of the block. There are many reasons why a block could be pinned, including requirements of a recovery system as discussed in Chapter 8. Pointer swizzling introduces an important reason why certain blocks must be pinned.

If a block B1 has within it a swizzled pointer to some data item in block B2, then we must be very careful about moving block B2 back to disk and reusing its main-memory buffer. The reason is that, should we follow the pointer in B1, it will lead us to the buffer, which no longer holds B2; in effect, the pointer has become dangling. A block, like B2, that is referred to by a swizzled pointer from somewhere else is therefore pinned.

When we write a block back to disk, we not only need to "unswizzle" any pointers in that block. We also need to make sure it is not pinned. If it is pinned, we must either unpin it, or let the block remain in memory, occupying space that could otherwise be used for some other block. To unpin a block that is pinned because of swizzled pointers from outside, we must "unswizzle" any pointers to it. Consequently, the translation table must record, for each database address whose data item is in memory, the places in memory where swizzled pointers to that item exist. Two possible approaches are:

1. Keep the list of references to a memory address as a linked list attached to the entry for that address in the translation table.

2. If memory addresses are significantly shorter than database addresses, we can create the linked list in the space used for the pointers themselves. That is, each space used for a database pointer is replaced by

(a) The swizzled pointer, and
(b) Another pointer that forms part of a linked list of all occurrences of this pointer.

Figure 3.11 suggests how all the occurrences of a memory pointer y could be linked, starting at the entry in the translation table for database address x and its corresponding memory address y.
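Approach 1, together with the reverse index on memory addresses discussed in Section 3.3.4, can be sketched in Python. Each translation-table entry keeps the list of swizzled pointers aimed at its item, so an item's block is pinned exactly when that list is nonempty. The class names, the use of Python object references in place of memory locations, and per-item (rather than per-block) bookkeeping are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)   # compare pointers by identity, not by field values
class Pointer:
    swizzled: bool
    value: object       # database address, or int memory address if swizzled


@dataclass
class Entry:
    mem_addr: int
    refs: list = field(default_factory=list)  # swizzled Pointers to this item


class TranslationTable:
    def __init__(self):
        self.by_db = {}    # database address -> Entry
        self.by_mem = {}   # memory address -> database address (reverse index)

    def add(self, db_addr, mem_addr):
        self.by_db[db_addr] = Entry(mem_addr)
        self.by_mem[mem_addr] = db_addr

    def swizzle(self, p):
        entry = self.by_db[p.value]
        entry.refs.append(p)              # remember where the pointer lives
        p.value, p.swizzled = entry.mem_addr, True

    def is_pinned(self, db_addr):
        return bool(self.by_db[db_addr].refs)

    def unswizzle(self, p):
        """Restore one swizzled pointer to its database form."""
        db_addr = self.by_mem[p.value]    # the reverse query: memAddr -> dbAddr
        entry = self.by_db[db_addr]
        entry.refs = [r for r in entry.refs if r is not p]
        p.value, p.swizzled = db_addr, False

    def unpin(self, db_addr):
        """Unswizzle every pointer to db_addr so its block may leave memory."""
        for p in list(self.by_db[db_addr].refs):
            self.unswizzle(p)
```

Approach 2 would store the same list inside the pointer fields themselves; the sketch keeps it in the entry because Python references make the space saving moot.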

3.3.6 Exercises for Section 3.3

* Exercise 3.3.1: If we represent physical addresses for the Megatron 747 disk by allocating a separate byte or bytes to each of the cylinder, track within a cylinder, and block within a track, how many bytes do we need? Make a reasonable assumption about the maximum number of blocks on each track; recall that the Megatron 747 has a variable number of sectors/track.

Figure 3.11: A linked list of occurrences of a swizzled pointer

Exercise 3.3.2: Repeat Exercise 3.3.1 for the Megatron 777 disk described in Exercise 2.2.1.

* Exercise 3.3.3: If we wish to represent record addresses as well as block addresses, we need additional bytes. Assuming we want addresses for a single Megatron 747 disk as in Exercise 3.3.1, how many bytes would we need for record addresses if we:

* a) Included the number of the byte within a block as part of the physical address.

b) Used structured addresses for records. Assume that the stored records have a 4-byte integer as a key.

Exercise 3.3.4: Today, IP addresses have four bytes. Suppose that block addresses for a world-wide address system consist of an IP address for the host, a device number between 1 and 1000, and a block address on an individual device (assumed to be a Megatron 747 disk). How many bytes would block addresses require?

Exercise 3.3.5: In the future, IP addresses will use 16 bytes. In addition, we may want to address not only blocks, but records, which may start at any byte of a block. However, devices will have their own IP address, so there will be no need to represent a device within a host, as we suggested was necessary in Exercise 3.3.4. How many bytes would be needed to represent addresses in these circumstances, again assuming devices were Megatron 747 disks?

! Exercise 3.3.6: Suppose we wish to represent the addresses of blocks on a Megatron 747 disk logically, i.e., using identifiers of k bytes for some k. We also need to store on the disk itself a map table, as in Fig. 3.7, consisting of pairs of logical and physical addresses. The blocks used for the map table itself are not part of the database, and therefore do not have their own logical addresses in the map table. Assuming that physical addresses use the minimum possible number of bytes for physical addresses (as calculated in Exercise 3.3.1), and logical addresses likewise use the minimum possible number of bytes for logical addresses, how many blocks of 4096 bytes does the map table for the disk occupy?

*! Exercise 3.3.7: Suppose that we have 4096-byte blocks in which we store records of 100 bytes. The block header consists of an offset table, as in Fig. 3.8, using 2-byte pointers to records within the block. On an average day, two records per block are inserted, and one record is deleted. A deleted record must have its pointer replaced by a "tombstone," because there may be dangling pointers to it. For specificity, assume the deletion on any day always occurs before the insertions. If the block is initially empty, after how many days will there be no room to insert any more records?

! Exercise 3.3.8: Repeat Exercise 3.3.7 on the assumption that each day there is one deletion and 1.1 insertions on the average.

Exercise 3.3.9: Repeat Exercise 3.3.7 on the assumption that instead of deleting records, they are moved to another block and must be given an 8-byte forwarding address in their offset-table entry. Assume either:

! a) All offset-table entries are given the maximum number of bytes needed in an entry.
!! b) Offset-table entries are allowed to vary in length in such a way that all entries can be found and interpreted properly.

* Exercise 3.3.10: Suppose that if we swizzle all pointers automatically, we can perform the swizzling in half the time it would take to swizzle each one separately. If the probability that a pointer in main memory will be followed at least once is p, for what values of p is it more efficient to swizzle automatically than on demand?

! Exercise 3.3.11: Generalize Exercise 3.3.10 to include the possibility that we never swizzle pointers. Suppose that the important actions take the following times, in some arbitrary time units:

i. On-demand swizzling of a pointer: 30.
ii. Automatic swizzling of pointers: 20 per pointer.
iii. Following a swizzled pointer: 1.
iv. Following an unswizzled pointer: 10.

Suppose that in-memory pointers are either not followed (probability 1 - p) or are followed k times (probability p). For what values of k and p do no-swizzling, automatic-swizzling, and on-demand-swizzling each offer the best average performance?

3.4 Variable-Length Data and Records

Until now, we have made the simplifying assumption that every data item has a fixed length, that records have a fixed schema, and that the schema is a list of fixed-length fields. However, in practice, life is rarely so simple. We may wish to represent:

1. Data items whose size varies. For instance, in Fig. 3.1 we considered a MovieStar relation that had an address field of up to 255 bytes. While there might be some addresses that long, the vast majority of them will probably be 50 bytes or less. We could probably save more than half the space used for storing MovieStar tuples if we used only as much space as the actual address needed.

2. Repeating fields. In Example 3.1 we discussed a class of movie-star objects that contained a relationship to a set of movies in which the star appeared. The number of movies varies from star to star, so the amount of space needed to store such a star object as a record would vary, with no obvious limit.

3. Variable-format records. Sometimes we do not know in advance what the fields of a record will be, or how many occurrences of each field there will be. For example, some movie stars also direct movies, and we might want to add fields to their record referring to the movies they directed. Likewise, some stars produce movies or participate in other ways, and we might wish to put this information into their record as well. However, since most stars are neither producers nor directors, we would not want to reserve space for this information in every star's record.

4. Enormous fields. Modern DBMS's support attributes whose value is a very large data item. For instance, we might want to include a picture attribute with a movie-star record that is a GIF image of the star. A movie record might have a field that is a 2-gigabyte MPEG encoding of the movie itself, as well as more mundane fields such as the title of the movie. These fields are so large that our intuition that records fit within blocks is contradicted.

3.4.1 Records With Variable-Length Fields

If one or more fields of a record have variable length, then the record must contain enough information to let us find any field of the record. A simple but effective scheme is to put all fixed-length fields ahead of the variable-length fields. We then place in the record header:

1. The length of the record.

2. Pointers to (i.e., offsets of) the beginnings of all the variable-length fields. However, if the variable-length fields always appear in the same order, then the first of them needs no pointer; we know it immediately follows the fixed-length fields.

Example 3.10: Suppose that we have movie-star records with name, address, gender, and birthdate. We shall assume that the gender and birthdate are fixed-length fields, taking 4 and 12 bytes, respectively. However, both name and address will be represented by character strings of whatever length is appropriate. Figure 3.12 suggests what a typical movie-star record would look like. We shall always put the name before the address. Thus, no pointer to the beginning of the name is needed; that field will always begin right after the fixed-length portion of the record. □
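The layout of Example 3.10 can be sketched in Python with the standard `struct` module. The 4-byte header fields (record length and offset of the address), the zero padding of gender and birthdate, and the sample values are assumptions; only the ordering (fixed-length fields first, then name, then address, with no pointer to the name) follows the example.

```python
import struct

HEADER = struct.Struct(">II")   # (record length, offset of address), assumed widths
FIXED_LEN = 4 + 12              # gender (4 bytes) + birthdate (12 bytes)


def encode_star(gender, birthdate, name, address):
    g = gender.encode("ascii").ljust(4, b"\0")
    b = birthdate.encode("ascii").ljust(12, b"\0")
    n, a = name.encode("utf-8"), address.encode("utf-8")
    name_off = HEADER.size + FIXED_LEN       # name always follows fixed part
    addr_off = name_off + len(n)
    length = addr_off + len(a)
    return HEADER.pack(length, addr_off) + g + b + n + a


def decode_star(rec):
    length, addr_off = HEADER.unpack_from(rec, 0)
    fixed = rec[HEADER.size:HEADER.size + FIXED_LEN]
    gender = fixed[:4].rstrip(b"\0").decode("ascii")
    birthdate = fixed[4:].rstrip(b"\0").decode("ascii")
    # No pointer is stored for the name: it starts where the fixed part ends.
    name = rec[HEADER.size + FIXED_LEN:addr_off].decode("utf-8")
    address = rec[addr_off:length].decode("utf-8")
    return gender, birthdate, name, address
```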

Figure 3.12: A MovieStar record with name and address implemented as variable-length character strings

3.4.2 Records With Repeating Fields

A similar situation occurs if a record contains a variable number of occurrences of a field F, but the field itself is of fixed length. It is sufficient to group all occurrences of field F together and put in the record header a pointer to the first. We can locate all the occurrences of the field F as follows. Let the number of bytes devoted to one instance of field F be L. We then add to the offset for the field F all integer multiples of L, starting at 0, then L, 2L, 3L, and so on. Eventually, we reach the offset of the field following F, whereupon we stop.

Example 3.11: Suppose that we redesign our movie-star records to hold only the name and address (which are variable-length strings) and pointers to all the movies of the star. Figure 3.13 shows how this type of record could be represented. The header contains pointers to the beginning of the address field (we assume the name field always begins right after the header) and to the first of the movie pointers. The length of the record tells us how many movie pointers there are. □
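Example 3.11's record, and the rule of stepping through a repeating field in multiples of L, can be sketched in Python. The 4-byte header fields, the 8-byte movie pointers (so L = 8), and the function names are assumptions for illustration.

```python
import struct

HEADER = struct.Struct(">III")  # record length, offset of address, offset of movie pointers
PTR_LEN = 8                     # L: bytes per movie pointer (assumed)


def encode_record(name, address, movie_ptrs):
    n, a = name.encode(), address.encode()
    addr_off = HEADER.size + len(n)          # name begins right after the header
    movies_off = addr_off + len(a)
    ptrs = b"".join(struct.pack(">Q", p) for p in movie_ptrs)
    length = movies_off + len(ptrs)
    return HEADER.pack(length, addr_off, movies_off) + n + a + ptrs


def name_and_address(rec):
    _, addr_off, movies_off = HEADER.unpack_from(rec, 0)
    return rec[HEADER.size:addr_off].decode(), rec[addr_off:movies_off].decode()


def movie_pointers(rec):
    length, _, movies_off = HEADER.unpack_from(rec, 0)
    # Occurrences lie at movies_off + 0, L, 2L, ...; we stop at the end of the
    # record, which here plays the role of "the field following F".
    return [struct.unpack_from(">Q", rec, off)[0]
            for off in range(movies_off, length, PTR_LEN)]
```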

Representing Null Values

Tuples often have fields that may be NULL. The record format of Fig. 3.12 offers a convenient way to represent NULL values. If a field such as address is null, then we put a null pointer in the place where the pointer to an address goes. Then, we need no space for an address, except the place for the pointer. This arrangement can save space on average, even if address is a fixed-length field but frequently has the value NULL.

Figure 3.13: A record with a repeating group of references to movies

An alternative representation is to keep the record of fixed length, and put the variable-length portion (be it fields of variable length or fields that repeat an indefinite number of times) on a separate block. In the record itself we keep:

1. Pointers to the place where each repeating field begins, and

2. Either how many repetitions there are, or where the repetitions end.

Figure 3.14 shows the layout of a record for the problem of Example 3.11, but with the variable-length fields name and address, and the repeating field starredIn (a set of movie references) kept on a separate block or blocks. There are advantages and disadvantages to using indirection for the variable-length components of a record:

• Keeping the record itself fixed-length allows records to be searched more efficiently, minimizes the overhead in block headers, and allows records to be moved within or among blocks with minimum effort.

Figure 3.14: Storing variable-length fields separately from the record

• On the other hand, storing variable-length components on another block increases the number of disk I/O's needed to examine all components of a record.

A compromise strategy is to keep in the fixed-length portion of the record enough space for:

1. Some reasonable number of occurrences of the repeating fields,

2. A pointer to a place where additional occurrences could be found, and

3. A count of how many additional occurrences there are.

If there are fewer than this number, some of the space would be unused. If there are more than can fit in the fixed-length portion, then the pointer to additional space will be nonnull, and we can find the additional occurrences by following this pointer.
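The compromise strategy can be sketched in Python as follows. The choice of three in-record slots, the dictionary standing in for the separate overflow block, and the names `FixedPart`, `build_record`, and `all_occurrences` are all hypothetical.

```python
SLOTS = 3  # "some reasonable number" of in-record occurrences (assumed)


class FixedPart:
    """Fixed-length portion of a record holding a repeating field."""

    def __init__(self):
        self.slots = [None] * SLOTS   # in-record occurrences; None = unused
        self.overflow_ptr = None      # pointer to additional occurrences (null)
        self.overflow_count = 0       # how many additional occurrences exist


def build_record(values, overflow_block):
    """overflow_block: a dict standing in for space on another block."""
    rec = FixedPart()
    inline, extra = values[:SLOTS], values[SLOTS:]
    rec.slots[:len(inline)] = inline
    if extra:
        ptr = len(overflow_block)     # hypothetical location on the other block
        overflow_block[ptr] = list(extra)
        rec.overflow_ptr, rec.overflow_count = ptr, len(extra)
    return rec


def all_occurrences(rec, overflow_block):
    found = [v for v in rec.slots if v is not None]
    if rec.overflow_ptr is not None:  # nonnull: follow it (an extra disk I/O)
        found += overflow_block[rec.overflow_ptr][:rec.overflow_count]
    return found
```

Records with few occurrences never touch the second block, while long lists still fit, at the cost of one extra access when the pointer is nonnull.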

3.4.3 Variable-Format Records

An even more complex situation occurs when records do not have a fixed schema. That is, the fields or their order are not completely determined by

the relation or class whose tuple or object the record represents. The simplest representation of variable-format records is a sequence of tagged fields, each of which consists of:

1. Information about the role of this field, such as:

(a) The attribute or field name,

(b) The type of the field, if it is not apparent from the field name and some readily available schema information, and

(c) The length of the field, if it is not apparent from the type.

2. The value of the field.

There are at least two reasons why tagged fields would make sense.

1. Information-integration applications. Sometimes, a relation has been constructed from several earlier sources, and these sources have different kinds of information; see Section 11.1 for a discussion. For instance, our movie-star information may have come from several sources, one of which records birthdates and the others do not, some give addresses, others not, and so on. If there are not too many fields, we are probably best off leaving NULL those values we do not know. However, if there are many sources, with many different kinds of information, then there may be too many NULL's, and we can save significant space by tagging and listing only the nonnull fields.

2. Records with a very flexible schema. If many fields of a record can repeat and/or not appear at all, then even if we know the schema, tagged fields may be useful. For instance, medical records may contain information about many tests, but there are thousands of possible tests, and each patient has results for relatively few of them.

Example 3.12: Suppose some movie stars have information such as movies directed, former spouses, restaurants owned, and a number of other fixed but unusual pieces of information. In Fig. 3.15 we see the beginning of a hypothetical movie-star record using tagged fields. We suppose that single-byte codes are used for the various possible field names and types. Appropriate codes are indicated on the figure, along with lengths for the two fields shown, both of which happen to be of type string. □
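A tagged-field encoding in the spirit of Example 3.12 can be sketched as below. The single-byte codes, the 2-byte length, and the field names are all hypothetical choices for illustration; they are not the codes shown in Fig. 3.15. Each field is written as name code, type code, length, and value, and absent fields take no space at all.

```python
import struct

# Hypothetical single-byte codes; the codes in Fig. 3.15 may differ.
FIELD_CODES = {"name": b"N", "restaurant_owned": b"R", "birth_year": b"Y"}
FIELD_NAMES = {code: name for name, code in FIELD_CODES.items()}
STRING, INT = b"S", b"I"  # hypothetical type codes


def encode(fields):
    """fields: list of (field_name, value); only the fields present are stored."""
    out = bytearray()
    for fname, value in fields:
        if isinstance(value, str):
            payload, tcode = value.encode("utf-8"), STRING
        else:
            payload, tcode = struct.pack("<i", value), INT
        # name code (1 byte), type code (1 byte), length (2 bytes), value
        out += FIELD_CODES[fname] + tcode + struct.pack("<H", len(payload)) + payload
    return bytes(out)


def decode(data):
    pos, result = 0, []
    while pos < len(data):
        fcode, tcode = data[pos:pos + 1], data[pos + 1:pos + 2]
        (length,) = struct.unpack_from("<H", data, pos + 2)
        payload = data[pos + 4:pos + 4 + length]
        if tcode == STRING:
            value = payload.decode("utf-8")
        else:
            value = struct.unpack("<i", payload)[0]
        result.append((FIELD_NAMES[fcode], value))
        pos += 4 + length
    return result
```

With these assumptions, a record holding a 14-character name and a 16-character restaurant name occupies 4 + 14 + 4 + 16 = 38 bytes.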

3.4.4 Records That Do Not Fit in a Block

We shall now address another problem whose importance has been increasing as DBMS's are more frequently used to manage datatypes with large values: often values do not fit in one block. Typical examples are video or audio "clips."

Figure 3.15: A record with tagged fields

Often, these large values have a variable length, but even if the length is fixed for all values of the type, we need to use some special techniques to represent these values. In this section we shall consider a technique called "spanned records" that can be used to manage records that are larger than blocks. The management of extremely large values (megabytes or gigabytes) is addressed in Section 3.4.5.

Spanned records also are useful in situations where records are smaller than blocks, but packing whole records into blocks wastes significant amounts of space. For instance, the waste space in Example 3.8 was only 7%, but if records are just slightly larger than half a block, the wastage can approach 50%. The reason is that then we can pack only one record per block.

For both these reasons, it is sometimes desirable to allow records to be split across two or more blocks. The portion of a record that appears in one block is called a record fragment. A record with two or more fragments is called spanned, and records that do not cross a block boundary are unspanned.

If records can be spanned, then every record and record fragment requires some extra header information:

1. Each record or fragment header must contain a bit telling whether or not it is a fragment.

2. If it is a fragment, then it needs bits telling whether it is the first or last fragment for its record.

3. If there is a next and/or previous fragment for the same record, then the fragment needs pointers to these other fragments.
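The three kinds of header information listed above can be packed into a small fixed-size header. The sketch below assumes one flags byte and two 4-byte fragment pointers, with `0xFFFFFFFF` standing for a null pointer; these sizes and encodings are illustrative choices, not taken from the text.

```python
import struct

IS_FRAGMENT, IS_FIRST, IS_LAST = 0x1, 0x2, 0x4   # bits of the flags byte
NULL = 0xFFFFFFFF                                 # assumed null-pointer value
HEADER = struct.Struct("<BII")                    # flags, next ptr, prev ptr


def pack_header(is_fragment, first=False, last=False, next_ptr=None, prev_ptr=None):
    flags = ((IS_FRAGMENT if is_fragment else 0)   # 1. fragment or not
             | (IS_FIRST if first else 0)          # 2. first fragment?
             | (IS_LAST if last else 0))           # 2. last fragment?
    return HEADER.pack(flags,                      # 3. pointers to neighbors
                       NULL if next_ptr is None else next_ptr,
                       NULL if prev_ptr is None else prev_ptr)


def unpack_header(raw):
    flags, nxt, prv = HEADER.unpack(raw)
    return {
        "fragment": bool(flags & IS_FRAGMENT),
        "first": bool(flags & IS_FIRST),
        "last": bool(flags & IS_LAST),
        "next": None if nxt == NULL else nxt,
        "prev": None if prv == NULL else prv,
    }
```

Under these assumptions every header costs 9 bytes; an unspanned record still pays for it unless the pointers are made optional.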

Example 3.13: Figure 3.16 suggests how records that were about 60% of a block in size could be stored with three records for every two blocks. The header for record fragment 2a contains an indicator that it is a fragment, an indicator that it is the first fragment for its record, and a pointer to the next fragment, 2b. Similarly, the header for 2b indicates it is the last fragment for its record and holds a back-pointer to the previous fragment 2a. □

Figure 3.16: Storing spanned records across blocks
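The arithmetic of Example 3.13 can be checked with a small packing sketch. It assumes records are placed in order, and it ignores the space taken by fragment headers (Exercise 3.4.5 shows how that overhead changes the picture). Each block is a list of `(record_id, bytes_in_this_block)` pairs.

```python
def pack_unspanned(record_sizes, block_size):
    """Whole records only; assumes every record fits in one block."""
    blocks, free = [[]], block_size
    for rid, size in enumerate(record_sizes):
        if size > free:              # record does not fit: start a new block
            blocks.append([])
            free = block_size
        blocks[-1].append((rid, size))
        free -= size
    return blocks


def pack_spanned(record_sizes, block_size):
    """Records may be split into fragments across consecutive blocks."""
    blocks, free = [[]], block_size
    for rid, size in enumerate(record_sizes):
        remaining = size
        while remaining > 0:
            if free == 0:            # current block is full: start a new one
                blocks.append([])
                free = block_size
            piece = min(remaining, free)
            blocks[-1].append((rid, piece))
            free -= piece
            remaining -= piece
    return blocks
```

With 100-byte blocks and three 60-byte records, the unspanned layout needs three blocks (60% utilization), while the spanned layout needs two (90%), with the middle record split into two fragments as in Fig. 3.16.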

3.4.5 BLOBS

Now, let us consider the representation of truly large values for records or fields of records. The common examples include images in various formats (e.g., GIF, or JPEG), movies in formats such as MPEG, or signals of all sorts: audio, radar, and so on. Such values are often called binary, large objects, or BLOBS. When a field has a BLOB as value, we must rethink at least two issues.

Storage of BLOBS

A BLOB must be stored on a sequence of blocks. Often we prefer that these blocks are allocated consecutively on a cylinder or cylinders of the disk, so the BLOB may be retrieved efficiently. However, it is also possible to store the BLOB on a linked list of blocks.

Moreover, it is possible that the BLOB needs to be retrieved so quickly (e.g., a movie that must be played in real time), that storing it on one disk does not allow us to retrieve it fast enough. Then, it is necessary to stripe the BLOB across several disks, that is, to alternate blocks of the BLOB among these disks. Thus, several blocks of the BLOB can be retrieved simultaneously, increasing the retrieval rate by a factor approximately equal to the number of disks involved in the striping.
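Striping can be sketched as a simple placement function. Reading "alternate blocks of the BLOB among these disks" as round-robin placement is an assumption of this sketch: block i goes to disk i mod n, at position i div n on that disk. If all disks read in parallel, fetching m blocks takes about ceil(m/n) rounds instead of m.

```python
def stripe(num_blocks, num_disks):
    """Round-robin placement: BLOB block i -> (disk, position on that disk)."""
    return [(i % num_disks, i // num_disks) for i in range(num_blocks)]


def parallel_rounds(num_blocks, num_disks):
    """Block-read rounds needed when every disk reads one block per round."""
    return -(-num_blocks // num_disks)  # ceiling division
```

For a 5-block BLOB on 2 disks, the placement is `[(0, 0), (1, 0), (0, 1), (1, 1), (0, 2)]`, and the whole BLOB can be read in 3 rounds rather than 5.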

Retrieval of BLOBS

Our assumption that when a client wants a record, the block containing the record is passed from the database server to the client in its entirety may not hold. We may want to pass only the "small" fields of the record, and allow the client to request blocks of the BLOB one at a time, independently of the rest of the record. For instance, if the BLOB is a 2-hour movie, and the client requests to have the movie played, the movie could be shipped several blocks at a time to the client, at just the rate necessary to play the movie.

In many applications, it is also important that the client be able to request interior portions of the BLOB without having to receive the entire BLOB. Examples would be a request to see the 45th minute of a movie, or the ending of an audio clip. If the DBMS is to support such operations, then it requires a suitable index structure, e.g., an index by seconds on a movie BLOB.
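One possible form of an "index by seconds" is sketched below. It assumes we know how many bytes of the BLOB each second of playback occupies (which may vary from second to second) and that the BLOB's blocks are numbered consecutively from 0. The index records the byte offset at which each second begins, and a lookup turns a time interval into the range of blocks to fetch.

```python
from itertools import accumulate


def build_second_index(bytes_per_second):
    """offsets[s] = byte offset where second s begins; offsets[-1] = BLOB length."""
    return [0] + list(accumulate(bytes_per_second))


def blocks_for_interval(offsets, block_size, start_s, end_s):
    """Block numbers to fetch in order to play seconds [start_s, end_s)."""
    first_byte = offsets[start_s]
    last_byte = offsets[end_s] - 1
    return range(first_byte // block_size, last_byte // block_size + 1)
```

With 4096-byte blocks and seconds of 4096, 4096, 8192, and 4096 bytes, playing second 2 alone requires blocks 2 and 3, and the client never has to receive blocks 0 and 1.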

3.4.6 Exercises for Section 3.4

* Exercise 3.4.1: A patient record consists of the following fixed-length fields: the patient's date of birth, social-security number, and patient ID, each 10 bytes long. It also has the following variable-length fields: name, address, and patient history. If pointers within a record require 4 bytes, and the record length is a 4-byte integer, how many bytes, exclusive of the space needed for the variable-length fields, are needed for the record? You may assume that no alignment of fields is required.

* Exercise 3.4.2: Suppose records are as in Exercise 3.4.1, and the variable-length fields name, address, and history each have a length that is uniformly distributed. For the name, the range is 10-50 bytes; for address it is 20-80 bytes, and for history it is 0-1000 bytes. What is the average length of a patient record?

Exercise 3.4.3: Suppose that the patient records of Exercise 3.4.1 are augmented by an additional repeating field that represents cholesterol tests. Each cholesterol test requires 16 bytes for a date and an integer result of the test. Show the layout of patient records if:

a) The repeating tests are kept with the record itself.

b) The tests are stored on a separate block, with pointers to them in the record.

Exercise 3.4.4: Starting with the patient records of Exercise 3.4.1, suppose we add fields for tests and their results. Each test consists of a test name, a date, and a test result. Assume that each such test requires 40 bytes. Also, suppose that for each patient and each test a result is stored with probability p.

a) Assuming pointers and integers each require 4 bytes, what is the average number of bytes devoted to test results in a patient record, assuming that all test results are kept within the record itself, as a variable-length field?

b) Repeat (a), if test results are represented by pointers within the record to test-result fields kept elsewhere.

! c) Suppose we use a hybrid scheme, where room for k test results is kept within the record, and additional test results are found by following a pointer to another block (or chain of blocks) where those results are kept. As a function of p, what value of k minimizes the amount of storage used for test results?

!! d) The amount of space used by the repeating test-result fields is not the only issue. Let us suppose that the figure of merit we wish to minimize is the number of bytes used, plus a penalty of 10,000 if we have to store some results on another block (and therefore will require a disk I/O for many of the test-result accesses we need to do). Under this assumption, what is the best value of k as a function of p?

*!! Exercise 3.4.5: Suppose blocks have 1000 bytes available for the storage of records, and we wish to store on them fixed-length records of length r, where 500 < r < 1000. The value of r includes the record header, but a record fragment requires an additional 16 bytes for the fragment header. For what values of r can we improve space utilization by spanning records?

! Exercise 3.4.6: Recall from Example 2.3 that the transfer rate of the Megatron 747 disk is 1/2 millisecond per 4096-byte block. An MPEG movie uses about one gigabyte per hour of play. If we organize the blocks of an MPEG movie as best we can on a Megatron 747, can we play the movie in real time? If not, how many Megatron disks would we need? How could we organize the movie blocks so that the movie could be played with only a very small delay?

3.5 Record Modifications

Insertions, deletions, and updates of records often create special problems. These problems are most severe when the records change their length, but they come up even when records and fields are all of fixed length.

3.5.1 Insertion

First, let us consider insertion of new records into a relation (or equivalently, into the current extent of a class). If the records of a relation are kept in no particular order, we can just find a block with some empty space, or get a new block if there is none, and put the record there. Usually, there is some mechanism for finding all the blocks holding tuples of a given relation or objects of a class, but we shall defer the question of how to keep track of these blocks until Section 4.1.

There is more of a problem when the tuples must be kept in some fixed order, such as sorted by their primary key. There is good reason to keep records sorted, since it facilitates answering certain kinds of queries, as we shall see in Section 4.1. If we need to insert a new record, we first locate the appropriate block for that record. Fortuitously, there may be space in the block to put the new record. Since records must be kept in order, we may have to slide records around in the block to make space available at the proper point.

If we need to slide records, then the block organization that we showed in Fig. 3.8, which we reproduce here as Fig. 3.17, is useful. Recall from our discussion in Section 3.3.2 that we may create an "offset table" in the header of each block, with pointers to the location of each record in the block. A pointer to a record from outside the block is a "structured address," that is, the block address and the location of the entry for the record in the offset table.
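The offset-table idea can be sketched as a toy block class. The assumptions here are ours, not the text's: the offset table is modeled as a Python list whose space is charged at a hypothetical `SLOT_BYTES` per entry, the sort key is the record's own byte string, and records are packed at the end of the block in key order. The property the sketch demonstrates is that records slide on every insertion while each structured address `(block, slot)` handed out earlier stays valid, because only offset-table entries change.

```python
SLOT_BYTES = 4  # assumed size of one offset-table entry in the block header


class Block:
    """Toy block: offset table in the header, records at the end in key order."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray(size)
        self.table = []  # slot -> (offset, length); slot numbers never change

    def used(self):
        return sum(n for _, n in self.table) + SLOT_BYTES * len(self.table)

    def get(self, slot):
        off, n = self.table[slot]
        return bytes(self.data[off:off + n])

    def insert(self, record):
        """Insert in sorted byte-string order; return the new slot, or None if full."""
        if self.used() + len(record) + SLOT_BYTES > self.size:
            return None  # caller must try a nearby block or an overflow block
        new_slot = len(self.table)
        live = [(self.get(s), s) for s in range(new_slot)] + [(record, new_slot)]
        self.table.append((0, 0))  # placeholder entry, filled in by _slide
        self._slide(sorted(live))
        return new_slot

    def _slide(self, live):
        # Lay records out contiguously at the end of the block, largest key last.
        # Only offset-table entries are rewritten; slot numbers stay the same.
        end = self.size
        for rec, slot in reversed(live):
            end -= len(rec)
            self.data[end:end + len(rec)] = rec
            self.table[slot] = (end, len(rec))
```

Inserting `b"carol"`, `b"alice"`, and `b"bob"` into a 64-byte block yields slots 0, 1, and 2; the records end up physically in key order, yet slot 0 still refers to `b"carol"`.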

Figure 3.17: An offset table lets us slide records within a block to make room for new records

If we can find room for the inserted record in the block at hand, then we simply slide the records within the block and adjust the pointers in the offset table. The new record is inserted into the block, and a new pointer to the record is added to the offset table for the block.

However, there may be no room in the block for the new record, in which case we have to find room outside the block. There are two major approaches to solving this problem, as well as combinations of these approaches.

1. Find space on a "nearby" block. For example, if block B1 has no available space for a record that needs to be inserted in sorted order into that block, then look at the following block B2 in the sorted order of the blocks. If there is room in B2, move the highest record(s) of B1 to B2, and slide the records around on both blocks. However, if there are external pointers to records, then we have to be careful to leave a forwarding address in the offset table of B1 to say that a certain record has been moved to B2 and where its entry in the offset table of B2 is. Allowing forwarding addresses typically increases the amount of space needed for entries of the offset table.

2. Create an overflow block. In this scheme, each block B has in its header a place for a pointer to an overflow block where additional records that theoretically belong in B can be placed. The overflow block for B can point to a second overflow block, and so on. Figure 3.18 suggests the situation. We show the pointer for overflow blocks as a nub on the block, although it is in fact part of the block header.


Figure 3.18: A block and its first overflow block
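The overflow-block approach can be sketched as a chain of blocks. For simplicity this sketch assumes each block holds at most a fixed number of records (`capacity`), and it models the overflow pointer in the block header as a Python reference, with `None` meaning "no overflow block yet".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChainBlock:
    capacity: int                                   # assumed max records per block
    records: List[int] = field(default_factory=list)
    overflow: Optional["ChainBlock"] = None         # pointer kept in the header


def insert(block, record):
    """Place the record in the first block of B's chain that has room,
    creating a new overflow block at the end of the chain if all are full."""
    while len(block.records) >= block.capacity:
        if block.overflow is None:
            block.overflow = ChainBlock(block.capacity)
        block = block.overflow
    block.records.append(record)


def chain_length(block):
    n = 0
    while block is not None:
        n, block = n + 1, block.overflow
    return n
```

With a capacity of 2, inserting five records that all belong in B produces B plus two overflow blocks; every lookup for those records may then have to follow the chain.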

3.5.2 Deletion

When we delete a record, we may be able to reclaim its space. If we use an offset table as in Fig. 3.17 and records can slide around the block, then we can compact the space in the block so there is always one unused region in the center, as suggested by that figure.

If we cannot slide records, we should maintain an available-space list in the block header. Then we shall know where, and how large, the available regions are if a new record is inserted into the block. Note that the block header normally does not need to hold the entire available-space list. It is sufficient to put the list head in the block header, and use the available regions themselves to hold the links in the list, much as we did in Fig. 3.11.

When a record is deleted, we may be able to do away with an overflow block. If the record is deleted either from a block B or from any block on its overflow chain, we can consider the total amount of used space on all the blocks of that chain. If the records can fit on fewer blocks, and we can safely move records among blocks of the chain, then a reorganization of the entire chain can be performed.

However, there is one additional complication involved in deletion, which we must remember regardless of what scheme we use for reorganizing blocks. There may be pointers to the deleted record, and if so, we don't want these pointers to dangle or wind up pointing to a new record that is put in the place of the deleted record. The usual technique, which we pointed out in Section 3.3.2, is to place a tombstone in place of the record. This tombstone is permanent; it must exist until the entire database is reconstructed.

Where the tombstone is placed depends on the nature of record pointers. If pointers go to fixed locations from which the location of the record is found, then we put the tombstone in that fixed location. Here are two examples:

1. We suggested in Section 3.3.2 that if the offset-table scheme of Fig. 3.17 were used, then the tombstone could be a null pointer in the offset table, since pointers to the record were really pointers to the offset table entries.

2. If we are using a map table, as in Fig. 3.7, to translate logical record addresses to physical addresses, then the tombstone can be a null pointer in place of the physical address.

If we need to replace records by tombstones, it would be wise to have at the very beginning of the record header a bit that serves as a tombstone; i.e., it is 0 if the record is not deleted, while 1 means that the record has been deleted. Then, only this bit must remain where the record used to begin, and subsequent bytes can be reused for another record, as suggested by Fig. 3.19.⁴ When we follow a pointer to the deleted record, the first thing we see is the "tombstone" bit telling us that the record was deleted. We then know not to look at the following bytes.

Figure 3.19: Record 1 can be replaced, but the tombstone remains; record 2 has no tombstone and can be seen by following a pointer to it
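The tombstone bit can be sketched over a raw byte area. For simplicity the sketch spends a whole header byte on the tombstone (the text needs only one bit) and uses bit 0 of that byte; both choices are assumptions. Deleting a record sets the bit and leaves the byte in place, so the following bytes can be reused, and a pointer that still refers to the old record finds the tombstone first.

```python
TOMBSTONE = 0x01  # assumed: bit 0 of the first header byte


def write_record(area, offset, payload):
    area[offset] = 0  # tombstone bit is 0: the record is live
    area[offset + 1:offset + 1 + len(payload)] = payload


def delete_record(area, offset):
    area[offset] |= TOMBSTONE  # only this byte must survive; the rest is reusable


def follow(area, offset, length):
    """Dereference a pointer to a record; None means the record was deleted."""
    if area[offset] & TOMBSTONE:
        return None  # do not look at the following bytes
    return bytes(area[offset + 1:offset + 1 + length])
```

After `delete_record(area, 0)`, a new record may be written starting at offset 1, yet following the old pointer to offset 0 still reports the deletion instead of returning the new record.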

3.5.3 Update

When a fixed-length record is updated, there is no effect on the storage system, because we know it can occupy exactly the same space it did before the update. However, when a variable-length record is updated, we have all the problems associated with both insertion and deletion, except that it is never necessary to create a tombstone for the old version of the record.

If the updated record is longer than the old version, then we may need to create more space on its block. This process may involve sliding records or even the creation of an overflow block. If variable-length portions of the record are stored on another block, as in Fig. 3.14, then we may need to move elements around that block or create a new block for storing variable-length fields. Conversely, if the record shrinks because of the update, we have the same opportunities as with a deletion to recover or consolidate space, or to eliminate overflow blocks.
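The cases above can be summarized in a small decision function. It is a simplified sketch: `used` is assumed to count every occupied byte of the block (header, offset table, and records, including the old version of the record being updated), and the returned labels are hypothetical names for the outcomes described in the text. No case creates a tombstone, since the record keeps its identity.

```python
def plan_update(block_size, used, old_len, new_len):
    """Classify the storage effect of updating one record in a block."""
    delta = new_len - old_len
    if delta == 0:
        return ("in_place", 0)     # e.g., any fixed-length record
    if delta < 0:
        return ("reclaim", -delta)  # space to recover or consolidate
    if used + delta <= block_size:
        return ("slide", delta)     # slide neighbors to open delta bytes
    return ("overflow", delta)      # move data to an overflow or other block
```

In a 4096-byte block with 4000 bytes in use, growing a 100-byte record to 180 bytes can be handled by sliding, while growing it to 200 bytes cannot.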

3.5.4 Exercises for Section 3.5

Exercise 3.5.1: Suppose we have blocks of records sorted by their sort key field and partitioned among blocks in order. Each block has a range of sort keys that is known from outside (the sparse-index structure in Section 4.1.3 is an example of this situation). There are no pointers to records from outside, so it is possible to move records between blocks if we wish. Here are some of the ways we could manage insertions and deletions.

⁴However, the field-alignment problem discussed in Se

i. Split blocks whenever there is an overflow. Adjust the range of sort keys for a block when we do.

ii. Keep the range of sort keys for a block fixed, and use overflow blocks as needed. Keep for each block and each overflow block an offset table for the records in that block alone.

iii. Same as (ii), but keep the offset table for the block and all its overflow blocks in the first block (or overflow blocks if the offset table needs the space). Note that if more space for the offset table is needed, we can move records from the first block to an overflow block to make room.

iv. Same as (ii), but keep the sort key along with a pointer in the offset tables.

v. Same as (iii), but keep the sort key along with a pointer in the offset table.

Answer the following questions:

* a) Compare methods (i) and (ii) for the average numbers of disk I/O's needed to retrieve the record, once the block (or first block in a chain with overflow blocks) that could have a record with a given sort key is found. Are there any disadvantages to the method with the fewer average disk I/O's?

b) Compare methods (ii) and (iii) for their average numbers of disk I/O's per record retrieval, as a function of b, the total number of blocks in the chain. Assume that the offset table takes 10% of the space, and the records take the remaining 90%.

! c) Include methods (iv) and (v) in the comparison from part (b). Assume that the sort key is 1/9 of the record. Note that we do not have to repeat the sort key in the record if it is in the offset table. Thus, in effect, the offset table uses 20% of the space and the remainders of the records use 80% of the space.

Exercise 3.5.2: Relational database systems have always preferred to use fixed-length tuples if possible. Give three reasons for this preference.

3.6 Summary of Chapter 3

◆ Fields: Fields are the most primitive data elements. Many, such as integers or fixed-length character strings, are simply given an appropriate number of bytes in secondary storage. Variable-length character strings are encoded either with a fixed-length block and an endmarker, or stored in an area for varying strings, with a length indicated by an integer at the beginning or an endmarker at the end.

◆ Records: Records are composed of several fields plus a record header. The header contains information about the record, possibly including such matters as a timestamp, schema information, and a record length.

◆ Variable-Length Records: If records contain one or more variable-length fields or contain an unknown number of repetitions of a field, then additional structure is necessary. A directory of pointers in the record header can be used to locate variable-length fields within the record. Alternatively, we can replace the variable-length or repeating fields by (fixed-length) pointers to a place outside the record where the field value is kept.

◆ Blocks: Records are generally stored within blocks. A block header, with information about that block, consumes some of the space in the block, with the remainder occupied by one or more records.

◆ Spanned Records: Generally, a record exists within one block. However, if records are longer than blocks, or we wish to make use of leftover space within blocks, then we can break records into two or more fragments, one on each block. A fragment header is then needed to link the fragments of a record.

◆ BLOBS: Very large values, such as images and videos, are called BLOBS (binary, large objects). These values must be stored across many blocks. Depending on the requirements for access, it may be desirable to keep the BLOB on one cylinder, to reduce the access time for the BLOB, or it may be necessary to stripe the BLOB across several disks, to allow parallel retrieval of its contents.

◆ Offset Tables: To support insertions and deletions of records, as well as records that change their length due to modification of varying-length fields, we can put in the block header an offset table that has pointers to each of the records in the block.

◆ Overflow Blocks: Also to support insertions and growing records, a block may have a link to an overflow block or chain of blocks, wherein are kept some records that logically belong in the first block.

◆ Database Addresses: Data managed by a DBMS is found among several storage devices, typically disks. To locate blocks and records in this storage system, we can use physical addresses, which are a description of the device number, cylinder, track, sector(s), and possibly byte within a sector. We can also use logical addresses, which are arbitrary character strings that are translated into physical addresses by a map table.

◆ Structured Addresses: We may also locate records by using part of the physical address, e.g., the location of the block whereon a record is found, plus additional information such as a key for the record or a position in the offset table of a block that locates the record.

◆ Pointer Swizzling: When disk blocks are brought to main memory, the database addresses need to be translated to memory addresses, if pointers are to be followed. The translation is called swizzling, and can be done either automatically, when blocks are brought to memory, or on-demand, when a pointer is first followed.

◆ Tombstones: When a record is deleted, it may cause pointers to it to