Chapter 3 Representing Data Elements

This chapter relates the block model of secondary storage that we covered in Section 2.3 to the requirements of database management systems. We begin by looking at the way that relations or sets of objects are represented in secondary storage.

• Attributes need to be represented by fixed- or variable-length sequences of bytes, called "fields."
• Fields, in turn, are put together in fixed- or variable-length collections called "records," which correspond to tuples or objects.
• Records need to be stored in physical blocks. Various data structures are useful, especially if blocks of records need to be reorganized when the database is modified.
• A collection of records that forms a relation or the extent of a class is stored as a collection of blocks, called a file.¹ To support efficient querying and modification of these collections, we put one of a number of "index" structures on the file; these structures are the subject of Chapters 4 and 5.

¹ The database notion of a "file" is somewhat more general than the "file" in an operating system. While a database file could be an unstructured stream of bytes, it is more common for the file to consist of a collection of blocks organized in some useful way, with indexes or other specialized access methods. We discuss these organizations in Chapter 4.

3.1 Data Elements and Fields

We shall begin by looking at the representation of the most basic data elements: the values of attributes found in relational or object-oriented database systems. These are represented by "fields." Subsequently, we shall see how fields are put together to form the larger elements of a storage system: records, blocks, and files.

3.1.1 Representing Relational Database Elements

Suppose we have declared a relation in an SQL system, by a CREATE TABLE statement such as that of Fig. 3.1.
The DBMS has the job of representing and storing the relation described by this declaration. Since a relation is a set of tuples, and tuples are similar to records or "structs" (the C or C++ term), we may imagine that each tuple will be stored on disk as a record. The record will occupy (part of) some disk block, and within the record there will be one field for every attribute of the relation.

    CREATE TABLE MovieStar(
        name CHAR(30) PRIMARY KEY,
        address VARCHAR(255),
        gender CHAR(1),
        birthdate DATE
    );

Figure 3.1: An SQL table declaration

While the general idea appears simple, the "devil is in the details," and we shall have to discuss a number of issues:

1. How do we represent SQL datatypes as fields?
2. How do we represent tuples as records?
3. How do we represent collections of records or tuples in blocks of memory?
4. How do we represent and store relations as collections of blocks?
5. How do we cope with record sizes that may be different for different tuples or that do not divide the block size evenly, or both?
6. What happens if the size of a record changes because some field is updated? How do we find space within its block, especially when the record grows?

The first item is the subject of this section. The next two items are covered in Section 3.2. We shall discuss the last two in Sections 3.4 and 3.5, respectively. The fourth question, representing relations so their tuples can be accessed efficiently, will be studied in Chapter 4.

Further, we need to consider how to represent certain kinds of data that are found in modern object-relational or object-oriented systems, such as object identifiers (or other pointers to records) and "blobs" (binary, large objects, such as a 2-gigabyte MPEG video). These matters are addressed in Sections 3.3 and 3.4.

3.1.2 Representing Objects

Today, many database systems support "objects."
These systems include pure object-oriented DBMS's, where an object-oriented language like C++, extended with an object-oriented query language such as OQL,² is used as the query and host language. They also include object-relational extensions of the classical relational systems; these systems support objects as values of attributes in a relation.

To a first approximation, an object is a tuple, and its fields or "instance variables" are attributes. However, there are two important differences:

1. Objects can have methods or special-purpose functions associated with them. The code for these functions is part of the schema for a class of objects.
2. Objects may have an object identifier (OID), which is an address in some global address space that refers uniquely to that object. Moreover, objects can have relationships to other objects, and these relationships are represented by pointers or lists of pointers. Relational data does not have addresses as values, although we shall see that "behind the scenes" the implementation of relations requires the manipulation of addresses or pointers in many ways. The matter of representing addresses is complex, both for large relations and for classes with large extents. We discuss the matter in Section 3.3.

² OQL is the standard object-oriented query language described in R. G. G. Cattell (ed.), The Object Database Standard: ODMG, third edition, Morgan-Kaufmann, San Francisco, 1998. Its companion language, ODL, is used to describe database schemas in object-oriented terms.

Example 3.1: We see in Fig. 3.2 an ODL definition of a class Star. It represents movie stars, although the information is somewhat different from that in the relation MovieStar of Fig. 3.1. In particular, we do not represent gender or birthdate of stars, but we have a relationship between stars and the movies they starred in. This relationship is represented by starredIn from stars to their movies, and its inverse, stars, from a movie to its stars. We do not show the definition of the class Movie, which is involved in this relationship.

A Star object can be represented by a record. This record will have fields for attributes name and address. Since the latter is a structure, we might prefer to use two fields, named street and city, in place of a field named address. More problematic is the representation of the relationship starredIn. This relationship is a set of references to Movie objects. We need a way to represent the locations of these Movie objects, which normally means we must specify the place on the disk of some machine where they are stored. Techniques for representing such addresses are discussed in Section 3.3. We also need the ability to represent arbitrarily long lists of movies for a given star; this problem of "variable-length records" is the subject of Section 3.4. □

    interface Star {
        attribute string name;
        attribute Struct Addr
            {string street, string city} address;
        relationship Set<Movie> starredIn
            inverse Movie::stars;
    };

Figure 3.2: The ODL definition of a movie star class

3.1.3 Representing Data Elements

Let us begin by considering how the principal SQL datatypes are represented as fields of a record. Ultimately, all data is represented as a sequence of bytes. For example, an attribute of type INTEGER is normally represented by two or four bytes, and an attribute of type FLOAT is normally represented by four or eight bytes. The integers and real numbers are represented by bit strings that are specially interpreted by the machine's hardware, so the usual arithmetic operations can be performed on them.

Fixed-Length Character Strings

The simplest kind of character strings to represent are those described by the SQL type CHAR(n). These are fixed-length character strings of length n. The field for an attribute with this type is an array of n bytes. Should the value for this attribute be a string of length shorter than n, then the array is filled out with a special pad character, whose 8-bit code is not one of the legal characters for SQL strings.
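
The field encodings described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how any particular DBMS lays out its fields: the function names are hypothetical, the zero byte is assumed to serve as the pad character, strings are assumed to use one byte per character (ASCII), and big-endian byte order is chosen arbitrarily. The sketch packs an INTEGER into four bytes, a FLOAT into eight bytes, and a CHAR(n) value into an array of exactly n bytes.

```python
import struct

# Assumed pad byte. The text only requires an 8-bit code that is not a
# legal character in SQL strings; the zero byte is our choice here.
PAD = b"\x00"


def encode_char(value: str, n: int) -> bytes:
    """Encode a CHAR(n) value as exactly n bytes, padded on the right."""
    raw = value.encode("ascii")  # one byte per character (assumption)
    if len(raw) > n:
        raise ValueError(f"value needs {len(raw)} bytes, field holds only {n}")
    return raw + PAD * (n - len(raw))


def decode_char(field: bytes) -> str:
    """Recover the string by dropping the trailing pad bytes."""
    return field.rstrip(PAD).decode("ascii")


def encode_integer(value: int) -> bytes:
    """Encode an INTEGER as four bytes (signed, big-endian by assumption)."""
    return struct.pack(">i", value)


def encode_float(value: float) -> bytes:
    """Encode a FLOAT as eight bytes (IEEE 754 double, big-endian)."""
    return struct.pack(">d", value)


if __name__ == "__main__":
    print(encode_char("Cat", 5))                  # b'Cat\x00\x00'
    print(len(encode_char("Carrie Fisher", 30)))  # 30
    print(len(encode_integer(1999)), len(encode_float(2.5)))  # 4 8
```

Because a CHAR(n) field always occupies n bytes whatever value it holds, a record built only from such fixed-length fields has the same length for every tuple, and each field sits at a known offset within it.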
