Chapter 16 Disk Storage, Basic File Structures, Hashing, and Modern Storage

- Databases are stored as files of records on disks
- Physical database file structures
- Physical level of the three-schema architecture

- The collection of data in a database must be stored on some storage medium. The DBMS software can retrieve, update, and process this data as needed
- Storage media form a hierarchy:
  - primary, secondary, and tertiary storage
  - offline storage for archiving databases (larger capacity, lower cost, slower access, not directly accessible by the CPU)

Memory Hierarchies and Storage Devices
- Cache, static RAM (prefetch, pipeline)
- Dynamic RAM (main memory)

Secondary and Tertiary Storage
- Magnetic disks, CDs, DVDs (capacity measured in KB, MB, GB, TB, PB)
- Programs are in main memory (DRAM)
- Permanent databases reside in secondary storage
- Main memory buffers are used to read from and write to secondary storage
- Flash memory: nonvolatile, NAND- and NOR-based
- Optical disks: CDs (700 MB), DVDs (4.5 to 15 GB), Blu-ray (54 GB)
- Magnetic tapes and jukeboxes

Depending on the intended use and application requirements, data is kept in one or more levels of the hierarchy.

Storage Organization of Databases
- Databases hold large amounts of data that must persist for a long period of time (persistent data)
- Parts of this data are accessed and processed repeatedly during the storage period
- Transient data exists only during program execution
- Most databases are stored on secondary storage (magnetic disks) because:
  - the database is too large to fit in main memory
  - permanent loss of data is less likely on disk than in main memory
  - disk storage costs less than primary storage

- A range of cylinders has the same number of sectors per arc
- A common sector size is 512 bytes
- The division of a track into equal-sized disk blocks (or pages) is set by the OS during formatting
- The block size is fixed and cannot be changed dynamically
- Block sizes range from 512 to 8192 bytes
- Blocks are separated by fixed-size interblock gaps
- Storage capacity and transfer rates keep improving while cost keeps falling (about $100/TB)

Disk
- A random access addressable device
- Transfer between disk and main memory is in units of blocks
- The hardware address of a block consists of (cylinder#, track#, block#)
- Modern disks use a single number called the LBA (logical block address)
- An LBA in the range 0 to n-1 is mapped to the right block on the disk
- A read or write also names a buffer, a contiguous area in main memory that holds the block
- One block or a cluster of blocks is transferred at a time
- A disk controller controls the disk drive
- One standard interface from a computer to a disk is SCSI (Small Computer System Interface)
- HDDs, CDs, and DVDs connect to a computer through SATA (Serial AT Attachment, named after the 16-bit IBM AT bus), at 1.5 to 6 Gbps
- A newer interface is NL-SAS (nearline SAS)
- The controller accepts high-level I/O commands and takes appropriate action to position the arm and cause the read/write
- Seek time: 5-10 msec
- Rotational latency: 4 msec
- Block transfer time
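The three time components listed above (seek time, rotational latency, block transfer time) can be combined into a rough cost estimate. The following is a minimal sketch, not a model of any particular drive; the function names are hypothetical and every time value is supplied by the caller.

```c
#include <assert.h>

/* Time to read k consecutive blocks from one track or cylinder:
 * one seek and one rotational delay, then k block transfers.
 * All times are in milliseconds and come from the caller. */
double consecutive_read_ms(double seek_ms, double latency_ms,
                           double transfer_ms, int k)
{
    assert(k >= 0);
    if (k == 0)
        return 0.0;
    return seek_ms + latency_ms + k * transfer_ms;
}

/* Time to read k blocks scattered at random over the disk:
 * every block pays its own seek and rotational delay. */
double random_read_ms(double seek_ms, double latency_ms,
                      double transfer_ms, int k)
{
    assert(k >= 0);
    return k * (seek_ms + latency_ms + transfer_ms);
}
```

With an assumed 8 msec seek, 4 msec latency, and 1 msec transfer, ten consecutive blocks cost 22 msec while ten random blocks cost 130 msec.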

- Transferring several consecutive blocks on the same track or cylinder is more efficient: seek time and rotational latency are paid only for the first block (9-60 msec to locate and transfer a block, 0.4 to 2 msec for each subsequent block)
- Locating data on a disk is a major bottleneck, so efficient techniques are needed

Making Data Access More Efficient on Disk
(1) Buffering
  a. Compensates for the mismatch in speed between CPU and disks
  b. The application uses current data while I/O fetches new data into the buffer
(2) Organization
  a. Use contiguous cylinders and tracks
  b. Avoid arm movement and seek time
(3) Prefetch
  a. Read data ahead of the request
  b. Read consecutive blocks on tracks or cylinders even though they are not yet needed
  c. May not be efficient for randomly accessed data
(4) Scheduling
  a. Proper scheduling of I/O requests
  b. Efficient scheduling algorithms (e.g., the elevator algorithm)
(5) Use Log Disks
  a. A log disk holds data temporarily
  b. A single disk is dedicated to logging writes
  c. All blocks go to the log disk sequentially, avoiding seek time
  d. Data and log files are placed on the log disk
  e. Not possible for most applications

(6) Use Flash Memory
  a. Use SSDs or flash memory instead of hard disks
  b. Do writes and updates to battery-backed DRAM
  c. Save to hard disk later
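The elevator algorithm named under Scheduling can be sketched as follows. This is one simple variant under stated assumptions: the arm starts at cylinder `head` and is moving toward higher cylinder numbers, it serves every pending request on the way up, then reverses. The function name `elevator_order` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending integers. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Reverses a[0..n-1] in place. */
static void reverse(int *a, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        int tmp = a[i];
        a[i] = a[n - 1 - i];
        a[n - 1 - i] = tmp;
    }
}

/* Reorders the requested cylinders cyl[0..n-1] into service order:
 * requests at or above `head` in increasing order (upward sweep),
 * then the remaining requests in decreasing order (downward sweep). */
void elevator_order(int *cyl, size_t n, int head)
{
    assert(cyl != NULL || n == 0);
    if (n == 0)
        return;
    qsort(cyl, n, sizeof cyl[0], cmp_int);

    /* Count the requests that lie below the current head position. */
    size_t below = 0;
    while (below < n && cyl[below] < head)
        below++;

    /* Sorted layout is [low ascending][high ascending].
     * Reversing everything gives [high descending][low descending];
     * reversing the first part again gives [high ascending][low descending]. */
    reverse(cyl, n);
    reverse(cyl, n - below);
}
```

Because the requests are served in sweep order rather than arrival order, the arm crosses the disk at most twice instead of jumping back and forth.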

Solid-State Device (SSD) Storage
- Uses flash memory as intermediate storage
- Enterprise flash drives (EFDs)

Magnetic Tape Storage Devices
- Sequential access devices: to access the nth block on a tape, the preceding blocks must be scanned first
- A read/write head is used to access the tape
- Used for backup and recovery

Buffering of Blocks
When several blocks need to be transferred to memory and all the block addresses are known, several buffers can be reserved in memory to speed up the transfer. While one buffer is being read or written by I/O, the CPU can process data in another buffer.

- Processes A and B run concurrently in an interleaved fashion; processes C and D run in parallel
- The use of two buffers is shown in Fig. 16.4: File A is in one buffer and File B is in another buffer (double buffering)
- Double buffering permits continuous reading or writing of data on consecutive disk blocks, which avoids seek time and rotational delay for all but the first block

Buffer Management

- It is impossible to bring all data into memory at the same time
- A buffer is a part of main memory that is available to receive blocks or pages of data from disk
- The buffer manager is a software component of a DBMS that manages buffers. It decides which pages to bring in and which buffer to use
- The size of the shared buffer pool is a DBMS parameter controlled by the DBA

Two kinds of buffer management:
1. The DBMS controls main memory directly (RDBMS)
2. The DBMS allocates buffers in virtual memory under OS control (OODBMS)

Goals:
1. Maximize the probability that a requested page is found in main memory
2. Use an efficient page replacement algorithm

The buffer manager keeps two pieces of information for each buffer:
1. A pin count (number of requests or number of current users). If the count is 0, the page is unpinned. A pinned page should not be written to disk
2. A dirty bit, which is set when the page is updated by any application program

Handling a page request:
a. Make sure the number of buffers fits in main memory
b. If the requested amount exceeds the buffer pool, use page replacement
c. If the buffer space is in virtual memory, OS thrashing may happen
d. If the requested page is already in the buffer pool, increment its pin count
e. If the page is not in the buffer pool:
   i. choose a page to replace
   ii. if the dirty bit of the replaced page is on (the copy on disk is old), write the page back to disk; then read the new page into the freed slot and release the buffer to the application

Buffer Replacement Strategies
1. LRU (least recently used): maintain a timestamp of last use; the least recently used page is replaced
2. Clock policy: a round-robin variant of LRU. Each slot has a flag of 0 or 1. If the flag is 0, the slot is used for replacement; if it is 1, it is reset to 0 and the next slot is examined. If the dirty bit of the chosen page is set, the page is written to disk first
3. FIFO
   a. Notes the time each page was loaded into memory and replaces the oldest
   b. Simple approach
   c. A block that is needed continuously may be replaced and then have to be brought back immediately

LRU and clock generally suit DB applications better than FIFO, although they perform poorly on sequential scans of a file that does not fit in the buffer pool.
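The clock policy described above can be sketched as a victim-selection routine. This is a minimal illustration, not the buffer manager of any real DBMS: `struct frame`, its field names, and `clock_pick_victim` are all hypothetical, and writing a dirty victim back to disk is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One slot of the buffer pool (hypothetical layout). */
struct frame {
    int  page_id;    /* page currently held in this slot */
    int  pin_count;  /* number of current users; 0 means unpinned */
    bool dirty;      /* page was updated since it was read from disk */
    bool ref;        /* clock flag: set to 1 whenever the page is used */
};

/* Returns the index of the slot to replace, or -1 if every slot is pinned.
 * *hand is the clock position; it is advanced past each examined slot. */
int clock_pick_victim(struct frame *pool, size_t n, size_t *hand)
{
    assert(pool != NULL && hand != NULL && n > 0);

    /* Two sweeps are enough: the first can clear every flag,
     * the second must then find a flag that is 0. */
    for (size_t step = 0; step < 2 * n; step++) {
        size_t i = *hand;
        *hand = (*hand + 1) % n;

        if (pool[i].pin_count > 0)
            continue;              /* pinned pages are never replaced */
        if (pool[i].ref) {
            pool[i].ref = false;   /* flag 1: reset to 0 and move on */
            continue;
        }
        return (int)i;             /* flag 0: use this slot */
    }
    return -1;
}
```

If the returned slot has `dirty` set, the caller would write the old page to disk before loading the new page into the slot.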

Placing File Records on Disk
Records are organized into files.

Records and Record Types:
- Data is stored in the form of records
- Each record consists of a collection of related data values or items; each value corresponds to a field
- A record type is the collection of field names and their data types
- A record usually describes an entity
- A data type is associated with each field
- Standard data types: integer, long, float, char, ...
- Other data types: date, time, ...

    struct employee {
        char name[30];
        char ssn[9];
        int  salary;
        int  job_code;
        char department[20];
    };

Databases also have to store unstructured data such as digital images and videos as binary large objects (BLOBs); the record includes a pointer to the BLOB.

Files, Fixed and Variable Lengths
- A file usually holds records of the same record type
- If every record is the same size, the file has fixed-length records
- If different records have different lengths, the file has variable-length records
  o Variable-length fields (for example, name)
  o Repeating fields or repeating groups of fields
  o Different types of records
  o Separator characters are used for variable-length fields
  o If the record type has many fields but a typical record contains few of them, the <field-name, field-value> format is used
- Repeating fields: one character separates values, one character separates fields, and one character terminates the record (=, ||, #)
- These characters are part of the file system but hidden from the programmer (0x0d and 0x0a)

Record Blocking
Records are stored in blocks (sectors).
Block size: B
Record size: R
The unit of transfer from disk to memory is a block.

If B ≥ R, the blocking factor is bfr = ⌊B/R⌋ records per block (integer division).
If R does not divide B evenly, the unused space in each block is:

B - (bfr * R) bytes

To use this space, a record may span two blocks (spanned organization). Spanning is required when R > B.
Number of blocks needed for a file of r records (unspanned): b = ⌈r/bfr⌉ blocks (rounded up to the next integer)
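The blocking formulas above translate directly into integer arithmetic. The sketch below assumes unspanned records; the function names are hypothetical.

```c
#include <assert.h>

/* Blocking factor for unspanned records: bfr = floor(B / R). */
long blocking_factor(long B, long R)
{
    assert(R > 0 && B >= R);
    return B / R;                  /* integer division truncates */
}

/* Unused bytes left in each block: B - (bfr * R). */
long unused_bytes(long B, long R)
{
    return B - blocking_factor(B, R) * R;
}

/* Blocks needed for a file of r records: b = ceil(r / bfr). */
long blocks_needed(long r, long bfr)
{
    assert(r >= 0 && bfr > 0);
    return (r + bfr - 1) / bfr;    /* rounds up without floating point */
}
```

For example, with an assumed block size B = 512 and record size R = 100, bfr is 5 and 12 bytes per block are unused; a file of 550 records with bfr = 20 needs 28 blocks.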

Allocation of Files on Disk
- Contiguous
- Linked
- Indexed
- (clusters and extents)

File Headers
- A file header contains information about the file (disk addresses, record format descriptions)
- Records are copied into memory and searched one block at a time

Contiguous Allocation

Linked Allocation

Indexed Allocation

Operations on Files
- Retrieval
- Updates

Simple or compound selection conditions are used to choose records:
  Ssn = ‘12345678’
  Department = ‘Research’
  Salary > 30000
Complex conditions must be decomposed into simple conditions to locate records on the disk.

High-level programs such as DBMS software use file operations such as:
- Open
- Reset
- Find (or Locate)
- Read (or Get)
- FindNext
- Delete
- Modify
- Insert
- Close
- Scan (returns the first or next record)
- FindAll
- FindChar
(These are record-at-a-time operations, except Reset and Close.)

Files of Unordered Records (Heap)
- Records are placed in the file in the order in which they are inserted; new records go at the end. This arrangement is called a heap
- To insert, the last disk block is copied into a buffer, the record is added to the buffer, and the buffer is written back to disk; the address of the last file block is kept in the file header
- Inserting a record is very efficient (new records go at the end)
- Searching requires a linear search (on average b/2 blocks for a file of b blocks)
- To delete a record, find the block that holds it, copy the block to a buffer, delete the record, and write the buffer back to disk
- This leaves unused (wasted) space in the disk block
- Another method: keep a deletion marker (a bit or byte) in each record; searches consider only valid records
- Both methods require periodic reorganization of the file to remove deleted records
- The space of deleted records can be reused for new records, but this needs more bookkeeping
- Sorting the file (for example, to read records in order of a field) is possible but expensive

Direct or Relative File
- Fixed-length records
- Unspanned blocks
- Contiguous allocation

File records are numbered 0, 1, 2, ..., r-1
Records in each block are numbered 0, 1, 2, ..., bfr-1
The ith record of the file is located in block ⌊i/bfr⌋ and is record (i mod bfr) in that block.

Example: number of records = 550, bfr = 20, i = 221
Record i is in block ⌊221/20⌋ = 11 and is record (221 mod 20) = 1 in that block (both counted from 0).
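The address computation for a relative file can be sketched in a few lines. The struct and function names are hypothetical; numbering starts at 0 for both blocks and records within a block, as in the notes.

```c
#include <assert.h>

/* Position of a record in a relative file (hypothetical type). */
struct record_addr {
    long block;   /* block number, counted from 0 */
    long slot;    /* record position inside the block, counted from 0 */
};

/* Locates record i in a file with bfr fixed-length records per block. */
struct record_addr locate_record(long i, long bfr)
{
    assert(i >= 0 && bfr > 0);
    struct record_addr a = { i / bfr, i % bfr };
    return a;
}
```

Calling `locate_record(221, 20)` yields block 11 and slot 1, matching the example above.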

Files of Ordered Records
- The records are ordered on the value of one of their fields (the ordering field)
- This leads to an ordered or sequential file
- If the ordering field is also a key field of the file (a unique value in each record), it is called the ordering key (for example, the employee name used as a key)

Advantages of ordered files:
1. Reading the records in order of the ordering key requires no sorting
2. The search condition can be a key value or a range of key values
3. Finding the next record from the current record is easy
4. Binary search can be used to speed up the search (about log2 b block accesses for b blocks)

Disadvantages and workarounds:
1. Inserting and deleting are expensive
2. Unused space can be kept in each block for insertions, but the same problem returns once that space is used up
3. Another approach is to use a master file and an overflow file; the overflow file can be sorted and merged with the master file during file reorganization
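Binary search on an ordered file works on blocks rather than on individual records. The sketch below simulates the file as a sorted in-memory array split into blocks of `bfr` keys and counts how many blocks are examined; the function name and the in-memory layout are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Searches a sorted key array that is laid out as blocks of bfr keys.
 * Returns the record index of `key`, or -1 if the key is absent.
 * *reads receives the number of blocks examined. */
long ordered_file_search(const int *keys, long r, long bfr,
                         int key, int *reads)
{
    assert(reads != NULL && r >= 0 && bfr > 0);

    long b = (r + bfr - 1) / bfr;      /* number of blocks in the file */
    long lo = 0, hi = b - 1;
    *reads = 0;

    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        long first = mid * bfr;        /* first record of block mid */
        long last = first + bfr - 1;   /* last record of block mid */
        if (last >= r)
            last = r - 1;              /* the final block may be partly full */

        (*reads)++;                    /* one block "read" from disk */
        if (key < keys[first]) {
            hi = mid - 1;
        } else if (key > keys[last]) {
            lo = mid + 1;
        } else {
            /* The key can only be in this block: scan it in memory. */
            for (long i = first; i <= last; i++)
                if (keys[i] == key)
                    return i;
            return -1;
        }
    }
    return -1;
}
```

Only the block counter matters for cost: each loop iteration halves the range of candidate blocks, which is where the log2 b estimate comes from.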

Hashing Techniques
The search condition is on a single field, called the hash field. In most cases, the hash field is also a key field.

Internal Hashing: hashing used as an internal file structure within a program
- The hash table has m slots, numbered 0 to m-1, whose addresses correspond to the array indexes
- A hash function translates the hash field value into a value between 0 and m-1

One common function: h(k) = k mod m

Other functions, such as folding, apply an arithmetic function such as addition or a logical function such as xor to portions of the hash field value.

A collision occurs when two values hash to the same address, and it has to be resolved:
- Open addressing (checks the subsequent addresses for an available one)
- Chaining
- Multiple hashing
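Internal hashing with h(k) = k mod m and open addressing can be sketched as below. The table size `M`, the `EMPTY` marker, and the function names are hypothetical; the sketch assumes non-negative integer keys and does not handle deletion.

```c
#include <assert.h>

#define M 7          /* number of slots (assumed small for illustration) */
#define EMPTY (-1)   /* marker for an unused slot; keys must be >= 0 */

/* h(k) = k mod m */
static int hash(int k) { return k % M; }

/* Marks every slot as unused. */
void table_init(int table[M])
{
    for (int i = 0; i < M; i++)
        table[i] = EMPTY;
}

/* Inserts k, checking subsequent slots when a collision occurs.
 * Returns the slot used, or -1 if the table is full. */
int table_insert(int table[M], int k)
{
    assert(k >= 0);
    for (int probe = 0; probe < M; probe++) {
        int slot = (hash(k) + probe) % M;
        if (table[slot] == EMPTY || table[slot] == k) {
            table[slot] = k;
            return slot;
        }
    }
    return -1;
}

/* Returns the slot holding k, or -1 if k is not in the table. */
int table_find(const int table[M], int k)
{
    assert(k >= 0);
    for (int probe = 0; probe < M; probe++) {
        int slot = (hash(k) + probe) % M;
        if (table[slot] == EMPTY)
            return -1;             /* an empty slot ends the probe sequence */
        if (table[slot] == k)
            return slot;
    }
    return -1;
}
```

Keys 10, 17, and 24 all hash to slot 3 when m = 7, so they end up in slots 3, 4, and 5.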

External Hashing for Disk Files
Hashing for disk files is called external hashing. It is adapted to suit the characteristics of disk storage.

- The target address space is made of buckets
- Each bucket holds multiple records
- A bucket can be one disk block or a cluster of disk blocks
- The hashing function maps a key into a relative bucket number rather than assigning an absolute block address to the bucket
- A table maintained in the file header converts the bucket number to the corresponding disk block address (Fig. 16.9)
- Hashing with a fixed number of buckets is called static hashing

SKIP 16.8.3

Files of Mixed Records

So far we have assumed that all records of a particular file are of the same record type. In most databases, numerous types of records are related to one another, so there is a need for files of mixed records.

- Numerous types of entities are related in various ways
- Related records need to be clustered in the same block or blocks
- OODBs also need clustering of objects
- Other data structures, such as B-trees, are also used to store DB data for efficient access

Parallelizing Disk Access Using RAID Technology

- Redundant array of independent disks
- Provides reliability and high performance
- A large array of independent disks acts as a single logical disk

Data striping: distributes data across many disks so that they appear as a single logical disk
Bit-level striping: the individual bits of each byte are split among the disks
Block-level striping: the blocks of a file are spread across the disks
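Block-level striping can be illustrated with a simple placement rule. The sketch assumes round-robin placement, which is one common layout and not the only one; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Physical position of a striped block (hypothetical type). */
struct stripe_addr {
    long disk;     /* which disk holds the block, 0 .. n_disks-1 */
    long offset;   /* block position on that disk, counted from 0 */
};

/* Round-robin block-level striping: logical block i goes to
 * disk (i mod n) at position floor(i / n). */
struct stripe_addr stripe_locate(long logical_block, long n_disks)
{
    assert(logical_block >= 0 && n_disks > 0);
    struct stripe_addr a = { logical_block % n_disks,
                             logical_block / n_disks };
    return a;
}
```

With four disks, logical blocks 0 to 3 land on disks 0 to 3, so a read of four consecutive blocks can proceed on all disks in parallel.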

Reliability with RAID

- For n disks, the likelihood of a failure is n times that of a single disk
- With an MTBF of 200,000 hours per disk, an array of 100 disks has an MTBF of 200,000/100 = 2,000 hours, or about 83 days
- To improve reliability, mirroring (shadowing) and other RAID techniques are used
- With redundancy and an MTTR (mean time to repair) of 24 hours, the mean time to data loss becomes (200,000)^2 / (2 * 24) ≈ 8.33 × 10^8 hours, roughly 95,000 years
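The reliability arithmetic can be checked with two small functions. This is a sketch of the two formulas used in the notes, assuming independent disk failures; the function names are hypothetical.

```c
#include <assert.h>

/* MTBF of an array of n independent disks: the single-disk MTBF divided by n. */
double array_mtbf_hours(double disk_mtbf_hours, int n)
{
    assert(n > 0);
    return disk_mtbf_hours / n;
}

/* Mean time to data loss with mirroring: MTBF^2 / (2 * MTTR). */
double mirrored_mttdl_hours(double disk_mtbf_hours, double mttr_hours)
{
    assert(mttr_hours > 0);
    return (disk_mtbf_hours * disk_mtbf_hours) / (2.0 * mttr_hours);
}
```

For a 200,000-hour MTBF, 100 disks give 2,000 hours (about 83 days). With mirroring and a 24-hour repair time the result is about 8.33 × 10^8 hours, which is on the order of 95,000 years at 8,760 hours per year.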

Modern Storage Architecture

(1) Storage Area Networks (SAN)
(2) Network Attached Storage (NAS)
(3) iSCSI and other network-based protocols (SCSI commands are encapsulated and put into IP packets using this protocol)
(4) Automated storage tiering (moves data between different storage types such as SATA, SAS, etc.)
(5) Object-based storage (objects are used instead of files; Facebook and other big data applications use object storage). It also uses SCSI commands to transmit objects on the Internet (Microsoft Azure, OpenStack Swift protocol)

Storage Area Networks

Network Attached Storage
