PhysicalPhysical SchemaSchema DesignDesign
Storage structures Indexing techniques in DBS
1 PhysicalPhysical Design:Design: GoalGoal andand influenceinfluence factorsfactors
Physical schema design goals — Effective data organization
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Effective access to disc — Effectiveness = performance
Quality Measures — Throughput: # transactions / sec — Turn-around-time: time for answering an individual query (e.g. average)
Parameters — Application dependent factors: size of database, typical operations, frequency of operations, isolation level — System related factors: storage of data, access path, index structures, optimization, … 2 PhysicalPhysical Design:Design: StorageStorage DevicesDevices
Memory Hierarchy: DBMS Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr
Archive storage Tertiary storage
Secondary storage Disk
Main memory Primary storage Cache
3 PhysicalPhysical Design:Design: StorageStorage DevicesDevices DBMS Tertiary storage
Disk
Cache Main memory Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Fastest, most costly form of storage Cache — Size very small — Usage managed by OS/Hardware
Main Memory — Data available to be operated on — Too small for entire DB — Data lost after power failure or a system crash
4 PhysicalPhysical Design:Design: StorageStorage DevicesDevices DBMS Tertiary storage
Disk
Disk Main memory Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Where entire DB is stored Cache — Direct access (not sequential) — Data operations: disk -> main memory -> disk — (Usually) survives power failures/system crashes
Tape storage (tertiary storage) —Cheaper than disk — slower access: tape read sequentially from the beginning (sequential-access storage). — Backups and archival data — Used for recovery from disk failures — Less complex than disk => more reliable 5 PhysicalPhysical Design:Design: StorageStorage DevicesDevices
Access time vs capacity: Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Bytes Y 13 Tertiary storage 12 Capacity in 10 11 Secondary storage 10 9 Main memory 8 Zip disk 7 6 Floppy disk Cache 5 2 1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9 Access time in 10X sec
Source: Garcia-Molina, Ullman, Widom “Database systems”, 2002 6 PhysicalPhysical Design:Design: SecondarySecondary storagestorage
Mechanics Disk heads Cylinder Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr
Platter = 2 surfaces
track gap
sector 7 PhysicalPhysical Design:Design: SecondarySecondary storagestorage
“Typical” Characteristics — Rotation: 5400 –10,000 RPM
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Capacity: 360 KB – 100+ GB — Price (per GB): $2 - $2.5 — Platter diameter: 2.5 to 25 cm — Platters: 1 - 15 — Tracks (per surface): 40 – 20,000 — Sector size: 512 B – 50 KB — Sectors (per track): 32-500
Holds: Virtual memory (handled by OS) File system (access via OS) DBMS stuff (bypass OS)
8 PhysicalPhysical Design:Design: SecondarySecondary storagestorage
Block: — Neighboring byte sequence on single track of one platter
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Smallest data unit moved between disk and main memory — Databases work with blocks — Fixed size — Size determined by physical properties of disk + OS — Size: from 512 to 4096 bytes
track gap
sector
blocks 9 PhysicalPhysical Design:Design: I/OI/O costcost
Data transfer disk - main memory Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr —In blocks — Bytes transferred at constant speed — Information flow (if): between 120 Kb/s and 5 Mb/s
Seek time: Time for repositioning the arm to right cylinder Move disk heads to the right cylinder: Start (constant), Move (variable), Stop (constant) 0 if arm there, otherwise long (between 8 to 10 ms) Track-to-track seek time: 0.5ms –2ms
10 PhysicalPhysical Design:Design: I/OI/O costcost
Rotate time (disk latency): — Time until interesting sector positioned under the head
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Access to all data within a cylinder within rotate time — 12 to 6 ms per rotation — Average: 6 to 3 ms rotational latency. — => store related information close by
Transfer time (read time): Depends on # bytes to be transferred
Total time to transfer T bytes: Seek time + Rotational time + T/if
11 PhysicalPhysical Design:Design: I/OI/O costcost
Typical access time:
Disk access time = SeekTime 6 ms Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr + RotateTime 3 ms + TransferTime 1 ms
Seek time dominates ! Compare: RAM 3-10 nsec
Random Disk / RAM: ~10 * 10-3 / 10 * 10-9 = 106 ~ Distance Hamilton - moon vs. distance Hamilton – Raglan
Sequential disk read ("scan") much faster
12 PhysicalPhysical Design:Design: I/OI/O costcost
Disk characteristics: year Capacity $/GB Scan Scan Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr GB Sequential Random 1988 0.25 20,000 2 min 20 min 1998 18 50 20 min 5 hrs 2003 200 5 2 hrs 1.2 days
Source: The Rebirth of Database Machines; talk by Bitton/Gray at VLDB'98
Random access favorable only if small percentage of the data accessed frequently Rule of thumb: less than 15 % on a large table
13 PhysicalPhysical Design:Design: I/OI/O costcost
The Myth: seek time dominates The Reality: Wait Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr 1. Queuing dominates (Queuing) 2. Transfer dominates BLOBs 3. Disk seeks often short
Implication: many cheap servers better than one fast expensive server Transfer — Shorter queues — Parallel transfer Rotate — Lower cost/access and cost/byte
Seek
Source: “Storage: Alternate Futures”;Futures” talk by J.Gray at NetStore99, Seattle 14 PhysicalPhysical Design:Design: I/OI/O costcost
Goal: Accelerate access to secondary storage
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Strategies 1. Place blocks that are accessed together on same cylinder (avoids seek time) 2. Divide data between smaller disks (independent heads increase # block accesses) 3. Mirror a disk: simultaneous access to several blocks 4. Disk-scheduling algorithm: selects order of block access 5. Prefetch blocks in main memory
Strategy depends on situation/application # processes, parallel access, shared disks, (un)predictable queries, budget,…
15 PhysicalPhysical Design:Design: RAIDRAID
RAID Technology — Redundant array of independent disks; originally redundant
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr array of inexpensive disks — Reduces access time and queue length — Fault tolerance by "Parity disks" — Principle technique: striping
A B C D E Large disk: F G H Long queue, Long transfer RAID 0:
A E B F C G D H Striping (interleaving)
16 PhysicalPhysical Design:Design: RAIDRAID
RAID levels:
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr RAID 1 Mirroring and Duplexing: mirror without stripping
A B ====A B C D C D E F E F G H G H
RAID 0+1 High Data Transfer Performance
A E ====A E B F B F C G C G D H D H
17 PhysicalPhysical Design:Design: RAIDRAID
RAID levels:
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr RAID 2 Byte (Bit) level striping + error correcting disks
(A0..A3) = word A (B0..B3) = word B A0 B0 A1 B1 A2 B2 A3 B3 EA EB
RAID 3 Block stripping with parity
Byte level
A0 B0 A1 B1 A2 B2 A3 B3 PA PB
stripe 0 stripe 1 stripe 2 stripe 3 18 PhysicalPhysical Design:Design: RAIDRAID
RAID levels:
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr RAID 4 Independent Data disks with shared Parity disk
Block level A0 B0 A1 B1 A2 B2 A3 B3 PA PB
block 0 block 1 block 2 block 3
RAID 5 Independent Data disks with distrib. parity blocks
A0 A1 P2 B0 P1 B2 P0 C1 C2
A block B block C block 19 PhysicalPhysical Design:Design: RAIDRAID
Levels 0, 1, 2, 3, 4, 5, 6, 7, 10, 1+0, 53
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr RAID controller provides OS / DBS with standard disk interface
Performance: — gains for read operations — Writes need recomputation of parity: Main cause of parity disk bottleneck in RAID-4 architecture
Further info: http://www.raid.com
20 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata
Database data — Stored in disk files
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Files provided by the underlying OS
File: — Logical sequence of records — Records mapped onto disk blocks
Records — Composed of fields (byte sequences for attributes) —Fixed or variable length — BLOBs stored separately in pool of disk blocks — Record points to BLOB
21 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata
Goal: Mapping the database to files
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Problem: — Blocks with fixed size but varying record sizes (tuples of distinct relations of different size)
Approaches: 1. Use several files and store records of only one fixed length in any given file 2. Structure the files such that they can deal with multiple lengths for records (more difficult than fixed length).
22 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata -- OracleOracle
Oracle storage hierarchy
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Logical Physical
Database
Tablespace Data file
Segment
Extent
Oracle O/S block block
23 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata -- OracleOracle
Database block Header − contains the data block address, Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr table directory, row directory, … − grows from the top down.
Free space − middle of the block − initially contiguous − may be fragmented
Data space − inserted into from the bottom up
Control of space usage in blocks: PCTUSED and PCTFREE 24 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata -- OracleOracle
PCTUSED: defines minimum percentage of used space the server tries
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr to maintain for each data block
Example: PCTUSED=40
Inserts 40%
1
25 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata -- OracleOracle
PCTFREE: specifies percentage of space in each data block reserved
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr for growth resulting from updates to rows in the block
Example: PCTUSED=40 PCTFREE=20
Inserts 80% Inserts 40%
1 2
26 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata -- OracleOracle
Example: — 3: block data percentage sinks below 100%-PCTFREE
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — 4: block data percentage sinks below PCTUSED
PCTUSED=40 PCTFREE=20
80% Inserts Inserts 40%
3 4
27 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata -- OracleOracle
Row structure: Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr
Row header Column length Database block Column value
28 PhysicalPhysical Design:Design: StorageStorage ofof DBDB datadata -- OracleOracle free space Row access: Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr
header, row directory, variable length rows
RowId: (Pagenumber, offset): physical pointers OOOOOO FFF BBBBBB RRR:
Segment No, data file No (in table space) Block in file, row in block
Disadvantage: uniform page address space -> large page number 29 PhysicalPhysical Design:Design: BufferBuffer ManagementManagement
Database Buffer: — Main memory part for storage copies of disk blocks
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — (older) block copies always available on disk — Managed by the buffer manager.
Buffer manager: — Allocation of space in main memory — Intercepts all system requests for blocks of the DB
Goal: — Minimize number of blocks to access — Keep as many blocks as possible in main memory
30 PhysicalPhysical Design:Design: BufferBuffer ManagementManagement
Strategies: — Replacement strategy:
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr No room left in the buffer: make some by removing blocks
— Pinned blocks: (Important) blocks not allowed to be written back are pinned
— Forced output of blocks: make sure to write a block back to the disk
Buffer vs disk — Buffer: faster — Disk: safer
31 PhysicalPhysical Design:Design: PhysicalPhysical datadata modelmodel
Physical data model — Mapping logical design structures onto files
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Usually: each relation in a separate file
Large scale DB: — Usually do not rely on OS for file management — One (large) OS file is assigned to the DBS — All relations in this file
Organizing records into blocks Block should contain related records (rows) Interesting to access several records using 1 block access Random assignment: different block for each record Heap (unsorted) or sequential file (sorted) 32 PhysicalPhysical Design:Design: PhysicalPhysical datadata modelmodel
Example:
095 Psycho suspense 1960 Hitchcock 2.00 ... Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr 100 Jaws horror 1975 Spielberg 1.50 ... 112 ET comedy 1982 Spielberg 1.50 ... 222 Psycho suspense 1998 Van Sant 2.20 ... 290 Star Wars IV ScFi 1997 Lucas 2.00 ... 345 Star Wars I SciFi 1999 Lucas 2.00 ......
Sequential file processing following pointers After several inserts/deletes: Order doesn't match physical order of the records Sequential processing not efficient → Reorganize the file 33 IndexIndex Structures:Structures: BasicsBasics
Goal: Accelerate access to secondary storage
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Index: — Optional data structure for fast access to individual data items in the DB — Defined for one or more attributes of a table — Locates table-rows in an efficient way
Goal of DB designer: — Define index for schema to increase performance — Use one of the implementations supplied by DBS
Goal of DBS implementer: — Find efficient data structure for index 34 IndexIndex Structures:Structures: BasicsBasics
Indexing technique depends on application
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Criteria: — Access time — Insertion / deletion time (data + structure update) — Space overhead: Space occupied by index structure
Trade off — High #indexes: DB updates to costly — Low #indexes: no efficient data search
Efficiency measure: # blocks to access on disk
35 IndexIndex Structures:Structures: BasicsBasics
Search key: — Attribute set used to look up records in file
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Each index associated with particular search key — Fast random access to records in a file
Index Records
A A B A C A B B C C
Search key
36 IndexIndex Structures:Structures: BasicsBasics
Primary (unique) index — Index whose search key specifies sequential order in file
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Maps key values to physical locations — Usually corresponds to primary key — ∀ values in attribute A: at most one row r with r.A=value — Such files: index-sequential files
Example:
095 095 Psycho suspense 1960 Hitchcock 2.00 ... 100 100 Jaws horror 1975 Spielberg 1.50 ... 112 112 ET comedy 1982 Spielberg 1.50 ... 222 222 Psycho suspense 1998 Van Sant 2.20 ... 290 290 Star Wars IV ScFi 1997 Lucas 2.00 ... 345 345 Star Wars I SciFi 1999 Lucas 2.00 ...... 37 IndexIndex Structures:Structures: BasicsBasics
Secondary index — Every other not primary index
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — In most cases not unique — ∀ value in attribute A: one or more rows r with r.A=value
Example:
comedy 112 095 Psycho suspense 1960 horror 100 Jaws horror 1975 ... 112 ET comedy 1982 SciFi ... suspense 222 Psycho suspense 1998 095 290 Star Wars IV ScFi 1997 222 345 Star Wars I SciFi 1999 ......
38 IndexIndex Structures:Structures: BasicsBasics
A Dense Index A A B A — Index record for every search C B B Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr key value in the file C — Faster to locate a record than sparse index C
A A A C A Sparse Index B B Index records for only C some of the records C Access: Find less or equal index value, follow the pointers Less space and easier to maintain than dense index
Trade-of access time and space overhead Compromise: sparse index with one entry per block 39 IndexIndex Structures:Structures: IndexIndex typestypes
Index Trees — Tree defines search way to attribute values
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Hash Index — Uses hash functions to encode attribute values Bitmap Index — Bits indicate the attribute values Cluster Index — Store "logically related data" in physical neighborhood
Important task in physical schema design: Deciding which indexes to create (= which index types on which attributes)
40 IndexIndex Structures:Structures: ISAMISAM
ISAM (Index-sequential Access method) — Index and data ordered according search key
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — "sequential“: rows may be read in key sequence
K1 K2 … Kn Index pages ... P1 P2 Pn ... Data pages
D1 D2 … Di-1 Di Di+1 Di+2 … Dn-1 Dn
—Keys Ki, Data tuple Di, Pi pointer to data Dj: Ki-1< Dj.key ≤Ki
— Performance degrades as the file grows
41 IndexIndex Structures:Structures: B+B+ --TreesTrees Important concept B+ -Trees ( ~ B* - Trees) (Knuth, D. The Art of Computer Programming, vol. 3. Sorting and Searching. Addison-Wesley, Menlo Park, CA., 1973)
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Standard for most DBS — Like B-Trees, inner nodes contain only keys and pointers
K1 K.. …K..
K.. K.. …K...... K.. K.. …K..
..... K.. K.. …K.. K.. K.. …K...... data pointer
— maintains its efficiency despite insertion/deletion of data 42 IndexIndex Structures:Structures: B+B+ --TreesTrees
Self-organized balanced tree Every path from root to a leaf is of same length Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Degree/order n: — All nodes have min n, max 2n keys, — Root nodes have max 2n keys — Every node with m keys has m+1 pointers
Alternative definitions: [Elmasri/Navathe]: nodes have ⎡k/2⎤ … k keys [Garcia-Molina et al]: nodes have ⎡(n+1)/2⎤ … n+1 pointers
B+ = nodes min 1/2 full , B* = nodes min 2/3 full
43 IndexIndex Structures:Structures: B+B+ --TreesTrees
Structure of a inner node:
— 2n search-key values K1, K2,…, K2n,
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — 2n+1 pointers P1, P2,…, P2n+1
Search-key values sorted: if i < j, then Ki < Kj
Pi points to sub-tree with Keys X:
—Ki-1 ≤ X < Ki (1< i < 2n+1)
—X ≤ Ki (i=1), Ki-1 < X (i = 2n+1)
K1 K2 K3 K4 n=2 P1 P2 P3 P4 P5
44 IndexIndex Structures:Structures: B+B+ --TreesTrees
Node: same size as a disk block (10 ≤ 2n ≤ 100)
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Chained leaves provide sequential access Structure of leaf node:
—Pi (1 ≤ i ≤ 2n) points to file record with search-key value Ki
K1 K2 K3 K4 P1 P2 P3 P4 ...... K3 ......
45 IndexIndex Structures:Structures: B+B+ --TreesTrees
Example: — Degree 2 B* Index Tree on Movie
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr 112
95 100 112 222 290 345
095 Psycho suspense 1960 Hitchcock 2.00 ...
100 Jaws horror 1975 Spielberg 1.50 ...
112 ET comedy 1982 Spielberg 1.50 ...
222 Psycho suspense 1998 Van Sant 2.20 ...
290 Star Wars IV ScFi 1997 Lucas 2.00 ...
345 Star Wars I SciFi 1999 Lucas 2.00 ...
...... 46 IndexIndex Structures:Structures: B+B+ --TreesTrees
Non-unique index (if search key is not primary key)
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Different implementations:
— Leaf node pointers Pi (1 ≤ i ≤ 2n, ) point to a bucket of pointers pointing to a file record with search-key value Ki
— Oracle implementation: key values repeated for each row-id with this key
—DB2: key value followed by list of row-ids
47 IndexIndex Structures:Structures: B+B+ --TreesTrees –– SearchSearch algorithmalgorithm
Search in unique tree: one path from root to a leaf
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Start: root node already in main memory
Lookup Algorithm:
lookup(page p, item x) if p is a leaf page if x==p.item[k] return p.pointer[k] else return NULL else if x
48 IndexIndex Structures:Structures: B+B+ --TreesTrees –– DataData insertioninsertion
Insert in unique tree: values only once inserted
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr 1. Node does not become too large:
Insertion Algorithm: — Locate the leaf page where the item should appear — If leaf page is not full: include item in the page
112
95 100 101 112 222 290 345
49 IndexIndex Structures:Structures: B+B+ --TreesTrees –– DataData insertioninsertion
2. Node does become too large:
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr If the leaf page has 2n+1 items after insertion then — create new page with n(+1) items, leave other with n items — insert the pointer of the new page and the first item in the parent directory
112 260
95 100 101
112 222
260 290 345 50 IndexIndex Structures:Structures: B+B+ --TreesTrees –– DataData insertioninsertion
Apply splitting recursively if necessary Eventually split root and create one more level Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr 260
101 112 302 345
95 100 260 290
101 102 105 302 340
112 222 345 400 401
51 IndexIndex Structures:Structures: B+B+ --TreesTrees –– DeletionDeletion
1. Case: No Combining Pages
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr if leaf page B where item x appears has >= n+1 items — if x is not the first in the page, simply delete it — else find the parent of the leaf page where x appears and update it
112 260
95 100 101
112 222
260 290 345 52 IndexIndex Structures:Structures: B+B+ --TreesTrees –– DeletionDeletion
2. Case: Transferring Items From Siblings
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr if the leaf page B where x appears has n items if there is a neighbor B’ with >n items — then transfer the first (or the last item) of B’ to B and update the appropriate ancestors of B 101 260
95 100 101
101 222
260 290 53 IndexIndex Structures:Structures: B+B+ --TreesTrees –– DeletionDeletion
2. Case: Transferring Items From Siblings
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr if the leaf page B where x appears has n items if there is a neighbor B’ with >n items — then … (see previous page) — else combine B with a neighbor B’, delete from parent of B the first item of the rightmost of B and B’
101
95 100 101 222 290
Deletion often not fully implemented 54 IndexIndex Structures:Structures: B+B+ --TreesTrees ImplementationsImplementations
Oracle: — Standard index form ( called B-tree index) Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr —Syntax: CREATE [ UNIQUE ]INDEX
MySQL: Standard index form Syntax: CREATE [UNIQUE] INDEX
55 IndexIndex Structures:Structures: IndexIndex TreesTrees withwith datadata leavesleaves
Leaf of B+ index: additional I/O to access row
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr rowId rowId ...... rowId ......
Header Key value
Data leaf: row in leaf node instead of row-id
.... Non key values Key value 56 IndexIndex Structures:Structures: IndexIndex TreesTrees withwith datadata leavesleaves
Key value not repeated in row data No pointer (row-id) in leaf pages Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr
Advantages: —Space reduction — Very good performance if key is long and row is short to medium, otherwise frequent splits
Primary key indexes: Leaves containing row data Secondary indexes: Use primary key as "pointer"
57 IndexIndex Structures:Structures: IndexIndex TreesTrees withwith datadata leavesleaves
Oracle: Called index organized tables (IOT) Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr CREATE TABLE
Differences to regular table: Table IOT Row-id is identifier Primary key is identifier Physical row-id Logical row-id Row-id based access Primary key based access Unique Constraints allowed Unique constraints not allowed LONG and LOB supported LOBs supported, not LONG MySQL: not supported
58 IndexIndex Structures:Structures: HashHash indexindex
Apply hash function on search key h:K→B B: numbering of n buckets , |K| >> |B| Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Assign keys to buckets according h(K)
K Buckets 0 4 Record with key value Ki (blocks) hash function h 1 K1
2 K2 K3
Advantage: Efficient access, if inserts infrequent
59 K IndexIndex Structures:Structures: HashHash indexindex 0 4
Disadvantages 1 K1
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr K No sequential scan 2 2 K3 Static: No dynamic increase of space Full buckets → pointers to overflow buckets
Point queries efficient Range queries inefficient Non unique index: retrieval has to scan rehash chain
⇒ Most DBS don't use simple hash
60 IndexIndex Structures:Structures: ExtensibleExtensible HashHash indexindex Important concept Dynamic growth of hash table Principle idea: Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — h(K) is n-bit Bit-string, n large, e.g. 32 bits — Not all bits used for addressing buckets, e.g., 232 Blocks — Use as many bits as necessary to address used blocks — Use Bits right or left ordered (here right-ordered)
Global depth 1 Local depth 1 110 ***.. .. * 0 ***.. .. * 1 1 1 Bit used 111 Directory 1001 Records
61 IndexIndex Structures:Structures: ExtensibleExtensible HashHash indexindex
Indirect addressing of blocks by hash values and directory (= pointer array PA)
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr PA may grow, doubles each time Same block may have more than one pointer
Row access(key K) — Take last i Bits of h(K), i= global depth
—PA[bi...b1] is Pointer to block — Access block, search sequentially for row
One I/O if directory in main memory already
62 IndexIndex Structures:Structures: ExtensibleExtensible HashHash indexindex
Example — (Primary) Extensible Hash Index on Movie
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — h(id)=id mod 7, bucket size = 3 2
095 Psycho suspense 1960 Hitchcock 2.00 ...
112 ET comedy 1982 Spielberg 1.50 ... 2 0 0 2 0 1 100 Jaws horror 1975 Spielberg 1.50 ... 1 0 345 Star Wars I SciFi 1999 Lucas 2.00 ... 1 1 1
222 Psycho suspense 1998 Van Sant 2.20 ...
290 Star Wars IV ScFi 1997 Lucas 2.00 ...
63 IndexIndex Structures:Structures: ExtensibleExtensible HashHash indexindex
Hash key inserts: — Split buckets and increase local depth if necessary
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Increase global depth if necessary
Hash key deletions Delete empty buckets Shrink directory if all local depths smaller than global depth
Variant: ‘start buckets’ not deleted or merged Variant: Merge 2 neighboring buckets if content fits in one, neighbors: same local depth, similar hash values
64 IndexIndex Structures:Structures: BitmapBitmap IndexIndex
Useful if few different values in a large table
...... File 3, block 10-12: ...... Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr ...... Bitmap index: start end Key ROWID ROWID bitmap
Efficient implementation of set operations
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Example: SELECT x,y,z FROM people WHERE (color = 'Blue' OR color = 'Red' ) AND sex = ‘f'
(
66 IndexIndex Structures:Structures: BitmapBitmap IndexIndex
Advantage — If few values and many rows e.g. sex, marital status,…
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Compression of bit lists saves space — Efficient processing of OR / AND queries Disadvantage — Updates expensive: bitmaps and data blocks must be locked — B+ -tree : one row is locked during update
Oracle Syntax: CREATE BITMAP INDEX
67 IndexIndex Structures:Structures: FunctionFunction--basedbased indexesindexes
Indexes functions and expressions Useful if operation performed frequently Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr
Example: — Indexes on from_date,until_date not sufficient SELECT mem_no FROM rental WHERE DAYS_BETWEEN(from_date,until_date)>10;
Oracle Syntax: CREATE INDEX
68 IndexIndex Structures:Structures: ClusteringClustering Important concept
Principle: put related data into a group (a cluster) Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Goal: efficient access to related ("clustered") data
Typical statistical technique No statistics available during DB design
Two approaches: Clustering of heterogeneous objects (table clustering) Index clustering
Be careful with different cluster notions 69 IndexIndex Structures:Structures: TableTable ClusteringClustering
Tables frequently accessed together (join)
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Example: — Access to a Movie record is often followed by access to a tape containing this movie: SELECT t.id, t.format, m.id, m.title FROM Tape t, Movie m WHERE m.id = t.movie_id;
— Tape- and movie records with the same movie_id should be placed in one block
Cluster: set of blocks with interleaving (related) rows of more than one table 70 IndexIndex Structures:Structures: TableTable ClusteringClustering
Example: — Cluster key: movie_id Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr
095 Psycho suspense 1960 Hitchcock 2.00 ... 0001 DVD
100 Jaws horror 1975 Spielberg 1.50 ...
112 ET comedy 1982 Spielberg 1.50 ...
0002 DVD
222 Psycho suspense 1998 Van Sant 2.20 ... 0003 VHS
290 Star Wars IV ScFi 1997 Lucas 2.00 ...
345 Star Wars I SciFi 1999 Lucas 2.00 ...
0004 DVD
0005 VHS
0009 VHS
71 IndexIndex Structures:Structures: TableTable ClusteringClustering
Clusters are defined by common cluster key Cluster key often primary / foreign key Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr Need to chain the records, otherwise more block accesses required than unclustered
Table cluster definition: — First create a cluster CREATE CLUSTER mtc(id NUMBER);
— Then create the tables in the cluster CREATE TABLE Movie(…) CLUSTER mtc(id); CREATE TABLE Tape(…) CLUSTER mtc(movie_id);
72 IndexIndex Structures:Structures: IndexIndex ClusteringClustering
Row-ids in a leaf page normally not ordered Sequential index scan means random access to rows Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr
Example: Heap Storage, index without cluster
Root K.. K.. …K..
...... Leaf nodes K.. K.. …K.. K.. K.. …K.. ... K.. K.. …K...... Data records
73 IndexIndex Structures:Structures: IndexIndex ClusteringClustering
Clustering index: — Physical order on non-key attribute
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Obvious: only one cluster per table — Faster access to rows with same value in cluster attribute
Example: K.. K.. …K..
...... K.. K.. …K.. K.. K.. …K.. ... K.. K.. …K......
74 IndexIndex Structures:Structures: ClusteringClustering implementationimplementation
Oracle: — Table cluster supported
Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — Index cluster supported (only hash index)
CREATE INDEX
Postgres: — Table cluster not supported — Index cluster supported CLUSTER MySQL: table and index cluster not supported 75 PhysicalPhysical DesignDesign ++ IndexIndex Structures:Structures: SummarySummary Data stored on disk Access time crucial in querying Dr A. Hinze, University of Waikato, New Zealand,2006, COMP 329 :Database Administration :Database UniversityWaikato, NewZealand,2006, COMP 329 of Hinze, A. Dr — I/Os is most important cost measure — Access Time: Seek time + Rotational time + Transfer time Indexes accelerate access to secondary storage — B+ tree is standard in most DBs — Often extensible hashing supported — Clustering: related data in physical neighborhood Great differences in physical organization in DBS Indexing not standardized 76