PhysicalPhysical SchemaSchema DesignDesign

Storage structures Indexing techniques in DBS

1 PhysicalPhysical Design:Design: GoalGoal andand influenceinfluence factorsfactors

Physical schema design goals — Effective data organization

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 : Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Effective access to disc — Effectiveness = performance

Quality Measures — Throughput: # transactions / sec — Turn-around-time: time for answering an individual query (e.g. average)

Parameters — Application dependent factors: size of database, typical operations, frequency of operations, isolation level — System related factors: storage of data, access path, index structures, optimization, … 2 PhysicalPhysical Design:Design: StorageStorage DevicesDevices

Memory Hierarchy: DBMS Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr

Archive storage Tertiary storage

Secondary storage Disk

Main memory Primary storage Cache

3 PhysicalPhysical Design:Design: StorageStorage DevicesDevices DBMS Tertiary storage

Disk

Cache Main memory Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Fastest, most costly form of storage Cache — Size very small — Usage managed by OS/Hardware

Main Memory — Data available to be operated on — Too small for entire DB — Data lost after power failure or a system crash

4 PhysicalPhysical Design:Design: StorageStorage DevicesDevices DBMS Tertiary storage

Disk

Disk Main memory Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Where entire DB is stored Cache — Direct access (not sequential) — Data operations: disk -> main memory -> disk — (Usually) survives power failures/system crashes

Tape storage (tertiary storage) —Cheaper than disk — slower access: tape read sequentially from the beginning (sequential-access storage). — Backups and archival data — Used for recovery from disk failures — Less complex than disk => more reliable 5 PhysicalPhysical Design:Design: StorageStorage DevicesDevices

Access time vs capacity: Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Bytes Y 13 Tertiary storage 12 Capacity in 10 11 Secondary storage 10 9 Main memory 8 Zip disk 7 6 Floppy disk Cache 5 2 1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9 Access time in 10X sec

Source: Garcia-Molina, Ullman, Widom “Database systems”, 2002 6 PhysicalPhysical Design:Design: SecondarySecondary storagestorage

Mechanics Disk heads Cylinder Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr

Platter = 2 surfaces

track gap

sector 7 PhysicalPhysical Design:Design: SecondarySecondary storagestorage

“Typical” Characteristics — Rotation: 5400 –10,000 RPM

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Capacity: 360 KB – 100+ GB — Price (per GB): $2 - $2.5 — Platter diameter: 2.5 to 25 cm — Platters: 1 - 15 — Tracks (per surface): 40 – 20,000 — Sector size: 512 B – 50 KB — Sectors (per track): 32-500

Holds: Virtual memory (handled by OS) File system (access via OS) DBMS stuff (bypass OS)

8 PhysicalPhysical Design:Design: SecondarySecondary storagestorage

Block: — Neighboring byte sequence on single track of one platter

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Smallest data unit moved between disk and main memory — work with blocks — Fixed size — Size determined by physical properties of disk + OS — Size: from 512 to 4096 bytes

track gap

sector

blocks 9 PhysicalPhysical Design:Design: I/OI/O costcost

Data transfer disk - main memory Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr —In blocks — Bytes transferred at constant speed — Information flow (if): between 120 Kb/s and 5 Mb/s

Seek time: Time for repositioning the arm to right cylinder Move disk heads to the right cylinder: Start (constant), Move (variable), Stop (constant) 0 if arm there, otherwise long (between 8 to 10 ms) Track-to-track seek time: 0.5ms –2ms

10 PhysicalPhysical Design:Design: I/OI/O costcost

Rotate time (disk latency): — Time until interesting sector positioned under the head

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Access to all data within a cylinder within rotate time — 12 to 6 ms per rotation — Average: 6 to 3 ms rotational latency. — => store related information close by

Transfer time (read time): Depends on # bytes to be transferred

Total time to transfer T bytes: Seek time + Rotational time + T/if

11 PhysicalPhysical Design:Design: I/OI/O costcost

Typical access time:

Disk access time = SeekTime 6 ms Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr + RotateTime 3 ms + TransferTime 1 ms

Seek time dominates ! Compare: RAM 3-10 nsec

Random Disk / RAM: ~10 * 10-3 / 10 * 10-9 = 106 ~ Distance Hamilton - moon vs. distance Hamilton – Raglan

Sequential disk read ("scan") much faster

12 PhysicalPhysical Design:Design: I/OI/O costcost

Disk characteristics: year Capacity $/GB Scan Scan Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr GB Sequential Random 1988 0.25 20,000 2 min 20 min 1998 18 50 20 min 5 hrs 2003 200 5 2 hrs 1.2 days

Source: The Rebirth of Database Machines; talk by Bitton/Gray at VLDB'98

Random access favorable only if small percentage of the data accessed frequently Rule of thumb: less than 15 % on a large table

13 PhysicalPhysical Design:Design: I/OI/O costcost

 The Myth: seek time dominates  The Reality: Wait Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr 1. Queuing dominates (Queuing) 2. Transfer dominates BLOBs 3. Disk seeks often short

 Implication: many cheap servers better than one fast expensive server Transfer — Shorter queues — Parallel transfer Rotate — Lower cost/access and cost/byte

Seek

Source: “Storage: Alternate Futures”;Futures” talk by J.Gray at NetStore99, Seattle 14 PhysicalPhysical Design:Design: I/OI/O costcost

 Goal: Accelerate access to secondary storage

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr  Strategies 1. Place blocks that are accessed together on same cylinder (avoids seek time) 2. Divide data between smaller disks (independent heads increase # block accesses) 3. Mirror a disk: simultaneous access to several blocks 4. Disk-scheduling algorithm: selects order of block access 5. Prefetch blocks in main memory

 Strategy depends on situation/application # processes, parallel access, shared disks, (un)predictable queries, budget,…

15 PhysicalPhysical Design:Design: RAIDRAID

RAID Technology — Redundant array of independent disks; originally redundant

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr array of inexpensive disks — Reduces access time and queue length — Fault tolerance by "Parity disks" — Principle technique: striping

A B C D E Large disk: F G H Long queue, Long transfer RAID 0:

A E B F C G D H Striping (interleaving)

16 PhysicalPhysical Design:Design: RAIDRAID

RAID levels:

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr RAID 1 Mirroring and Duplexing: mirror without stripping

A B ====A B C D C D E F E F G H G H

RAID 0+1 High Data Transfer Performance

A E ====A E B F B F C G C G D H D H

17 PhysicalPhysical Design:Design: RAIDRAID

RAID levels:

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr RAID 2 Byte (Bit) level striping + error correcting disks

(A0..A3) = word A (B0..B3) = word B A0 B0 A1 B1 A2 B2 A3 B3 EA EB

RAID 3 Block stripping with parity

Byte level

A0 B0 A1 B1 A2 B2 A3 B3 PA PB

stripe 0 stripe 1 stripe 2 stripe 3 18 PhysicalPhysical Design:Design: RAIDRAID

RAID levels:

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr RAID 4 Independent Data disks with shared Parity disk

Block level A0 B0 A1 B1 A2 B2 A3 B3 PA PB

block 0 block 1 block 2 block 3

RAID 5 Independent Data disks with distrib. parity blocks

A0 A1 P2 B0 P1 B2 P0 C1 C2

A block B block C block 19 PhysicalPhysical Design:Design: RAIDRAID

Levels 0, 1, 2, 3, 4, 5, 6, 7, 10, 1+0, 53

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr RAID controller provides OS / DBS with standard disk interface

Performance: — gains for read operations — Writes need recomputation of parity: Main cause of parity disk bottleneck in RAID-4 architecture

Further info: http://www.raid.com

20 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata

Database data — Stored in disk files

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Files provided by the underlying OS

File: — Logical sequence of records — Records mapped onto disk blocks

Records — Composed of fields (byte sequences for attributes) —Fixed or variable length — BLOBs stored separately in pool of disk blocks — Record points to BLOB

21 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata

 Goal: Mapping the database to files

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr  Problem: — Blocks with fixed size but varying record sizes (tuples of distinct relations of different size)

 Approaches: 1. Use several files and store records of only one fixed length in any given file 2. Structure the files such that they can deal with multiple lengths for records (more difficult than fixed length).

22 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata -- OracleOracle

Oracle storage hierarchy

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Logical Physical

Database

Tablespace Data file

Segment

Extent

Oracle O/S block block

23 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata -- OracleOracle

Database block Header − contains the data block address, Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr table directory, row directory, … − grows from the top down.

Free space − middle of the block − initially contiguous − may be fragmented

Data space − inserted into from the bottom up

Control of space usage in blocks: PCTUSED and PCTFREE 24 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata -- OracleOracle

PCTUSED: defines minimum percentage of used space the server tries

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr to maintain for each data block

Example: PCTUSED=40

Inserts 40%

1

25 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata -- OracleOracle

PCTFREE: specifies percentage of space in each data block reserved

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr for growth resulting from updates to rows in the block

Example: PCTUSED=40 PCTFREE=20

Inserts 80% Inserts 40%

1 2

26 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata -- OracleOracle

Example: — 3: block data percentage sinks below 100%-PCTFREE

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — 4: block data percentage sinks below PCTUSED

PCTUSED=40 PCTFREE=20

80% Inserts Inserts 40%

3 4

27 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata -- OracleOracle

Row structure: Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr

Row header Column length Database block Column value

28 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata -- OracleOracle free space Row access: Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr

header, row directory, variable length rows

RowId: (Pagenumber, offset): physical pointers OOOOOO FFF BBBBBB RRR:

Segment No, data file No (in table space) Block in file, row in block

Disadvantage: uniform page address space -> large page number 29 PhysicalPhysical Design:Design: BufferBuffer ManagementManagement

Database Buffer: — Main memory part for storage copies of disk blocks

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — (older) block copies always available on disk — Managed by the buffer manager.

Buffer manager: — Allocation of space in main memory — Intercepts all system requests for blocks of the DB

Goal: — Minimize number of blocks to access — Keep as many blocks as possible in main memory

30 PhysicalPhysical Design:Design: BufferBuffer ManagementManagement

Strategies: — Replacement strategy:

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr No room left in the buffer: make some by removing blocks

— Pinned blocks: (Important) blocks not allowed to be written back are pinned

— Forced output of blocks: make sure to write a block back to the disk

Buffer vs disk — Buffer: faster — Disk: safer

31 PhysicalPhysical Design:Design: PhysicalPhysical datadata modelmodel

Physical — Mapping logical design structures onto files

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Usually: each relation in a separate file

Large scale DB: — Usually do not rely on OS for file management — One (large) OS file is assigned to the DBS — All relations in this file

Organizing records into blocks Block should contain related records (rows) Interesting to access several records using 1 block access Random assignment: different block for each record Heap (unsorted) or sequential file (sorted) 32 PhysicalPhysical Design:Design: PhysicalPhysical datadata modelmodel

Example:

095 Psycho suspense 1960 Hitchcock 2.00 ... Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr 100 Jaws horror 1975 Spielberg 1.50 ... 112 ET comedy 1982 Spielberg 1.50 ... 222 Psycho suspense 1998 Van Sant 2.20 ... 290 Star Wars IV ScFi 1997 Lucas 2.00 ... 345 Star Wars I SciFi 1999 Lucas 2.00 ......

Sequential file processing following pointers After several inserts/deletes:  Order doesn't match physical order of the records  Sequential processing not efficient → Reorganize the file 33 IndexIndex Structures:Structures: BasicsBasics

Goal: Accelerate access to secondary storage

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Index: — Optional for fast access to individual data items in the DB — Defined for one or more attributes of a table — Locates table-rows in an efficient way

Goal of DB designer: — Define index for schema to increase performance — Use one of the implementations supplied by DBS

Goal of DBS implementer: — Find efficient data structure for index 34 IndexIndex Structures:Structures: BasicsBasics

Indexing technique depends on application

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Criteria: — Access time — Insertion / deletion time (data + structure update) — Space overhead: Space occupied by index structure

Trade off — High #indexes: DB updates to costly — Low #indexes: no efficient data search

Efficiency measure: # blocks to access on disk

35 IndexIndex Structures:Structures: BasicsBasics

Search key: — Attribute set used to look up records in file

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Each index associated with particular search key — Fast random access to records in a file

Index Records

A A B A C A B B C C

Search key

36 IndexIndex Structures:Structures: BasicsBasics

Primary (unique) index — Index whose search key specifies sequential order in file

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Maps key values to physical locations — Usually corresponds to primary key — ∀ values in attribute A: at most one row r with r.A=value — Such files: index-sequential files

Example:

095 095 Psycho suspense 1960 Hitchcock 2.00 ... 100 100 Jaws horror 1975 Spielberg 1.50 ... 112 112 ET comedy 1982 Spielberg 1.50 ... 222 222 Psycho suspense 1998 Van Sant 2.20 ... 290 290 Star Wars IV ScFi 1997 Lucas 2.00 ... 345 345 Star Wars I SciFi 1999 Lucas 2.00 ...... 37 IndexIndex Structures:Structures: BasicsBasics

Secondary index — Every other not primary index

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — In most cases not unique — ∀ value in attribute A: one or more rows r with r.A=value

Example:

comedy 112 095 Psycho suspense 1960 horror 100 Jaws horror 1975 ... 112 ET comedy 1982 SciFi ... suspense 222 Psycho suspense 1998 095 290 Star Wars IV ScFi 1997 222 345 Star Wars I SciFi 1999 ......

38 IndexIndex Structures:Structures: BasicsBasics

A Dense Index A A B A — Index record for every search C B B Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr key value in the file C — Faster to locate a record than sparse index C

A A A C A Sparse Index B B Index records for only C some of the records C Access: Find less or equal index value, follow the pointers Less space and easier to maintain than dense index

Trade-of access time and space overhead Compromise: sparse index with one entry per block 39 IndexIndex Structures:Structures: IndexIndex typestypes

Index Trees — Tree defines search way to attribute values

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Hash Index — Uses hash functions to encode attribute values Bitmap Index — Bits indicate the attribute values Cluster Index — Store "logically related data" in physical neighborhood

Important task in physical schema design: Deciding which indexes to create (= which index types on which attributes)

40 IndexIndex Structures:Structures: ISAMISAM

ISAM (Index-sequential Access method) — Index and data ordered according search key

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — "sequential“: rows may be read in key sequence

K1 K2 … Kn Index pages ... P1 P2 Pn ... Data pages

D1 D2 … Di-1 Di Di+1 Di+2 … Dn-1 Dn

—Keys Ki, Data tuple Di, Pi pointer to data Dj: Ki-1< Dj.key ≤Ki

— Performance degrades as the file grows

41 IndexIndex Structures:Structures: B+B+ --TreesTrees Important concept B+ -Trees ( ~ B* - Trees) (Knuth, D. The Art of Computer Programming, vol. 3. Sorting and Searching. Addison-Wesley, Menlo Park, CA., 1973)

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Standard for most DBS — Like B-Trees, inner nodes contain only keys and pointers

K1 K.. …K..

K.. K.. …K...... K.. K.. …K..

..... K.. K.. …K.. K.. K.. …K...... data pointer

— maintains its efficiency despite insertion/deletion of data 42 IndexIndex Structures:Structures: B+B+ --TreesTrees

Self-organized balanced tree Every path from root to a leaf is of same length Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Degree/order n: — All nodes have min n, max 2n keys, — Root nodes have max 2n keys — Every node with m keys has m+1 pointers

Alternative definitions: [Elmasri/Navathe]: nodes have ⎡k/2⎤ … k keys [Garcia-Molina et al]: nodes have ⎡(n+1)/2⎤ … n+1 pointers

B+ = nodes min 1/2 full , B* = nodes min 2/3 full

43 IndexIndex Structures:Structures: B+B+ --TreesTrees

Structure of a inner node:

— 2n search-key values K1, K2,…, K2n,

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — 2n+1 pointers P1, P2,…, P2n+1

Search-key values sorted: if i < j, then Ki < Kj

Pi points to sub-tree with Keys X:

—Ki-1 ≤ X < Ki (1< i < 2n+1)

—X ≤ Ki (i=1), Ki-1 < X (i = 2n+1)

K1 K2 K3 K4 n=2 P1 P2 P3 P4 P5

44 IndexIndex Structures:Structures: B+B+ --TreesTrees

Node: same size as a disk block (10 ≤ 2n ≤ 100)

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Chained leaves provide sequential access Structure of leaf node:

—Pi (1 ≤ i ≤ 2n) points to file record with search-key value Ki

K1 K2 K3 K4 P1 P2 P3 P4 ...... K3 ......

45 IndexIndex Structures:Structures: B+B+ --TreesTrees

Example: — Degree 2 B* Index Tree on Movie

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr 112

95 100 112 222 290 345

095 Psycho suspense 1960 Hitchcock 2.00 ...

100 Jaws horror 1975 Spielberg 1.50 ...

112 ET comedy 1982 Spielberg 1.50 ...

222 Psycho suspense 1998 Van Sant 2.20 ...

290 Star Wars IV ScFi 1997 Lucas 2.00 ...

345 Star Wars I SciFi 1999 Lucas 2.00 ...

...... 46 IndexIndex Structures:Structures: B+B+ --TreesTrees

Non-unique index (if search key is not primary key)

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Different implementations:

— Leaf node pointers Pi (1 ≤ i ≤ 2n, ) point to a bucket of pointers pointing to a file record with search-key value Ki

— Oracle implementation: key values repeated for each row-id with this key

—DB2: key value followed by list of row-ids

47 IndexIndex Structures:Structures: B+B+ --TreesTrees –– SearchSearch algorithmalgorithm

Search in unique tree: one path from root to a leaf

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Start: root node already in main memory

Lookup Algorithm:

lookup(page p, item x) if p is a leaf page if x==p.item[k] return p.pointer[k] else return NULL else if x

48 IndexIndex Structures:Structures: B+B+ --TreesTrees –– DataData insertioninsertion

Insert in unique tree: values only once inserted

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr 1. Node does not become too large:

Insertion Algorithm: — Locate the leaf page where the item should appear — If leaf page is not full: include item in the page

112

95 100 101 112 222 290 345

49 IndexIndex Structures:Structures: B+B+ --TreesTrees –– DataData insertioninsertion

2. Node does become too large:

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr If the leaf page has 2n+1 items after insertion then — create new page with n(+1) items, leave other with n items — insert the pointer of the new page and the first item in the parent directory

112 260

95 100 101

112 222

260 290 345 50 IndexIndex Structures:Structures: B+B+ --TreesTrees –– DataData insertioninsertion

Apply splitting recursively if necessary Eventually split root and create one more level Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr 260

101 112 302 345

95 100 260 290

101 102 105 302 340

112 222 345 400 401

51 IndexIndex Structures:Structures: B+B+ --TreesTrees –– DeletionDeletion

1. Case: No Combining Pages

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr if leaf page B where item x appears has >= n+1 items — if x is not the first in the page, simply delete it — else find the parent of the leaf page where x appears and update it

112 260

95 100 101

112 222

260 290 345 52 IndexIndex Structures:Structures: B+B+ --TreesTrees –– DeletionDeletion

2. Case: Transferring Items From Siblings

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr if the leaf page B where x appears has n items if there is a neighbor B’ with >n items — then transfer the first (or the last item) of B’ to B and update the appropriate ancestors of B 101 260

95 100 101

101 222

260 290 53 IndexIndex Structures:Structures: B+B+ --TreesTrees –– DeletionDeletion

2. Case: Transferring Items From Siblings

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr if the leaf page B where x appears has n items if there is a neighbor B’ with >n items — then … (see previous page) — else combine B with a neighbor B’, delete from parent of B the first item of the rightmost of B and B’

101

95 100 101 222 290

Deletion often not fully implemented 54 IndexIndex Structures:Structures: B+B+ --TreesTrees ImplementationsImplementations

Oracle: — Standard index form ( called B-tree index) Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr —Syntax: CREATE [ UNIQUE ]INDEX ON () [ TABLESPACE ] [ PCTFREE integer ] [...]

MySQL: Standard index form Syntax: CREATE [UNIQUE] INDEX ON (attribute_list>)

55 IndexIndex Structures:Structures: IndexIndex TreesTrees withwith datadata leavesleaves

Leaf of B+ index: additional I/O to access row

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr rowId rowId ...... rowId ......

Header Key value

Data leaf: row in leaf node instead of row-id

.... Non key values Key value 56 IndexIndex Structures:Structures: IndexIndex TreesTrees withwith datadata leavesleaves

Key value not repeated in row data No pointer (row-id) in leaf pages Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr

Advantages: —Space reduction — Very good performance if key is long and row is short to medium, otherwise frequent splits

Primary key indexes: Leaves containing row data Secondary indexes: Use primary key as "pointer"

57 IndexIndex Structures:Structures: IndexIndex TreesTrees withwith datadata leavesleaves

Oracle: Called index organized tables (IOT) Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr CREATE TABLE ORGANIZATION INDEX TABLESPACE ;

Differences to regular table: Table IOT Row-id is identifier Primary key is identifier Physical row-id Logical row-id Row-id based access Primary key based access Unique Constraints allowed Unique constraints not allowed LONG and LOB supported LOBs supported, not LONG MySQL: not supported

58 IndexIndex Structures:Structures: HashHash indexindex

Apply hash function on search key h:K→B B: numbering of n buckets , |K| >> |B| Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Assign keys to buckets according h(K)

K Buckets 0 4 Record with key value Ki (blocks) hash function h 1 K1

2 K2 K3

Advantage: Efficient access, if inserts infrequent

59 K IndexIndex Structures:Structures: HashHash indexindex 0 4

Disadvantages 1 K1

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr K No sequential scan 2 2 K3 Static: No dynamic increase of space Full buckets → pointers to overflow buckets

Point queries efficient Range queries inefficient Non unique index: retrieval has to scan rehash chain

⇒ Most DBS don't use simple hash

60 IndexIndex Structures:Structures: ExtensibleExtensible HashHash indexindex Important concept Dynamic growth of hash table Principle idea: Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — h(K) is n-bit Bit-string, n large, e.g. 32 bits — Not all bits used for addressing buckets, e.g., 232 Blocks — Use as many bits as necessary to address used blocks — Use Bits right or left ordered (here right-ordered)

Global depth 1 Local depth 1 110 ***.. .. * 0 ***.. .. * 1 1 1 Bit used 111 Directory 1001 Records

61 IndexIndex Structures:Structures: ExtensibleExtensible HashHash indexindex

Indirect addressing of blocks by hash values and directory (= pointer array PA)

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr PA may grow, doubles each time Same block may have more than one pointer

Row access(key K) — Take last i Bits of h(K), i= global depth

—PA[bi...b1] is Pointer to block — Access block, search sequentially for row

One I/O if directory in main memory already

62 IndexIndex Structures:Structures: ExtensibleExtensible HashHash indexindex

Example — (Primary) Extensible Hash Index on Movie

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — h(id)=id mod 7, bucket size = 3 2

095 Psycho suspense 1960 Hitchcock 2.00 ...

112 ET comedy 1982 Spielberg 1.50 ... 2 0 0 2 0 1 100 Jaws horror 1975 Spielberg 1.50 ... 1 0 345 Star Wars I SciFi 1999 Lucas 2.00 ... 1 1 1

222 Psycho suspense 1998 Van Sant 2.20 ...

290 Star Wars IV ScFi 1997 Lucas 2.00 ...

63 IndexIndex Structures:Structures: ExtensibleExtensible HashHash indexindex

Hash key inserts: — Split buckets and increase local depth if necessary

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Increase global depth if necessary

Hash key deletions Delete empty buckets Shrink directory if all local depths smaller than global depth

Variant: ‘start buckets’ not deleted or merged Variant: Merge 2 neighboring buckets if content fits in one, neighbors: same local depth, similar hash values

64 IndexIndex Structures:Structures: BitmapBitmap IndexIndex

Useful if few different values in a large table

...... File 3, block 10-12: ...... Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr ...... Bitmap index: start end Key ROWID ROWID bitmap 65 IndexIndex Structures:Structures: BitmapBitmap IndexIndex

Efficient implementation of set operations

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Example: SELECT x,y,z FROM people WHERE (color = 'Blue' OR color = 'Red' ) AND sex = ‘f'

( OR ) AND

66 IndexIndex Structures:Structures: BitmapBitmap IndexIndex

Advantage — If few values and many rows e.g. sex, marital status,…

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Compression of bit lists saves space — Efficient processing of OR / AND queries Disadvantage — Updates expensive: bitmaps and data blocks must be locked — B+ -tree : one row is locked during update

Oracle Syntax: CREATE BITMAP INDEX ON )> [TABLESPACE ] [PCTFREE integer];

67 IndexIndex Structures:Structures: FunctionFunction--basedbased indexesindexes

Indexes functions and expressions Useful if operation performed frequently Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr

Example: — Indexes on from_date,until_date not sufficient SELECT mem_no FROM rental WHERE DAYS_BETWEEN(from_date,until_date)>10;

Oracle Syntax: CREATE INDEX ON ();

68 IndexIndex Structures:Structures: ClusteringClustering Important concept

Principle: put related data into a group (a cluster) Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Goal: efficient access to related ("clustered") data

Typical statistical technique No statistics available during DB design

Two approaches: Clustering of heterogeneous objects (table clustering) Index clustering

Be careful with different cluster notions 69 IndexIndex Structures:Structures: TableTable ClusteringClustering

Tables frequently accessed together (join)

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Example: — Access to a Movie record is often followed by access to a tape containing this movie: SELECT t.id, t.format, m.id, m.title FROM Tape t, Movie m WHERE m.id = t.movie_id;

— Tape- and movie records with the same movie_id should be placed in one block

Cluster: set of blocks with interleaving (related) rows of more than one table 70 IndexIndex Structures:Structures: TableTable ClusteringClustering

Example: — Cluster key: movie_id Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr

095 Psycho suspense 1960 Hitchcock 2.00 ... 0001 DVD

100 Jaws horror 1975 Spielberg 1.50 ...

112 ET comedy 1982 Spielberg 1.50 ...

0002 DVD

222 Psycho suspense 1998 Van Sant 2.20 ... 0003 VHS

290 Star Wars IV ScFi 1997 Lucas 2.00 ...

345 Star Wars I SciFi 1999 Lucas 2.00 ...

0004 DVD

0005 VHS

0009 VHS

71 IndexIndex Structures:Structures: TableTable ClusteringClustering

Clusters are defined by common cluster key Cluster key often primary / foreign key Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Need to chain the records, otherwise more block accesses required than unclustered

Table cluster definition: — First create a cluster CREATE CLUSTER mtc(id NUMBER);

— Then create the tables in the cluster CREATE TABLE Movie(…) CLUSTER mtc(id); CREATE TABLE Tape(…) CLUSTER mtc(movie_id);

72 IndexIndex Structures:Structures: IndexIndex ClusteringClustering

Row-ids in a leaf page normally not ordered Sequential index scan means random access to rows Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr

Example: Heap Storage, index without cluster

Root K.. K.. …K..

...... Leaf nodes K.. K.. …K.. K.. K.. …K.. ... K.. K.. …K...... Data records

73 IndexIndex Structures:Structures: IndexIndex ClusteringClustering

Clustering index: — Physical order on non-key attribute

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Obvious: only one cluster per table — Faster access to rows with same value in cluster attribute

Example: K.. K.. …K..

...... K.. K.. …K.. K.. K.. …K.. ... K.. K.. …K......

74 IndexIndex Structures:Structures: ClusteringClustering implementationimplementation

Oracle: — Table cluster supported

Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Index cluster supported (only hash index)

CREATE INDEX ON [...];

Postgres: — Table cluster not supported — Index cluster supported CLUSTER ON

MySQL: table and index cluster not supported

75 PhysicalPhysical DesignDesign ++ IndexIndex Structures:Structures: SummarySummary

Data stored on disk Access time crucial in querying Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — I/Os is most important cost measure — Access Time: Seek time + Rotational time + Transfer time

Indexes accelerate access to secondary storage — B+ tree is standard in most DBs — Often extensible hashing supported — Clustering: related data in physical neighborhood

Great differences in physical organization in DBS Indexing not standardized

76