<<

Crash Recovery

Chapter 18

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 1

Review: The ACID properties

 A tomicity: All actions in the Xact happen, or none happen.  C onsistency: If each Xact is consistent, and the DB starts consistent, it ends up consistent.  I solation: Execution of one Xact is isolated from that of other Xacts.  D urability: If a Xact commits, its effects persist.

 The Recovery Manager guarantees Atomicity & Durability.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 2 Motivation

 Atomicity: . Transactions may abort (“Rollback”).  Durability: . What if DBMS stops running? (System crashes or media failures) Desired Behavior after  crash! system restarts: T1 – T1, T2 & T3 should be T2 durable. T3 – T4 & T5 should be T4 aborted (effects not seen). T5

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 3

Assumptions

is in effect. . Strict 2PL, in particular.  Updates are happening “in place”. . i.e. data is overwritten on (deleted from) the disk.

 A simple scheme to guarantee Atomicity & Durability?

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 4 ARIES

 When the recovery manager is invoked after a crash, restart proceeds in three phrases: . Analysis: identifies dirty pages and active transactions . Redo: Repeats some actions necessary . Undo: Undoes actions which did not commit.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 5

Principles

 Write-Ahead Logging . Changes are written ahead in a stable storage.  Repeating History During Redo . When restart, bring system back to the state when crash happened.  Logging Changes During Undo . Changes during undo is also logged.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 6 Basic Idea: Logging

 The log is a history of actions executed by the DBMS.  Physically, the log is stored in stable storage to survive crashes.  The most recent portion of the log, the log tail, is kept in memory and is periodically forced to stable storage.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 7

Basic Idea: Logging

 Record REDO and UNDO information, for every update, in a log. . Sequential writes to log (put it on a separate disk). . Minimal info (diff) written to log, so multiple updates fit in a single log page.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 8 Log Record

 Every log record has a unique id – Log Sequence Number (LSN) and LSN always increases. . It could be the memory address of the log record.  For recovery, every page in the database has the LSN of the most recent log record that describes a change to this page – pageLSN.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 9

Log Records A linked list of one transaction

LogRecord fields: Possible log record types: prevLSN  Update XID  Commit type  Abort  End Additional Fields  Compensation Log Records (CLRs) . for UNDO actions

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 10 Additional Fields of Log Record

 Commit (Commit record) . Transaction id is included and it force writes the log tail to a stable storage.  Abort (Abort record) . Transaction id is included. Undo is initiated for the transaction.  End (End record) . After additional actions after Abort/Commit. The transaction id is included.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 11

Update Log Records LSN:  pageID: modified page id.  length: length of the change in bytes  offset: offset of the change  before-image: the value of the changed bytes before the change (for undo)  after-image: the value after the change (for redo)  Redo-only update vs. undo-only update

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 12 Transaction

 One entry for each active transaction.  Each entry includes . transaction id, . the status (in progress, committed, or aborted), . lastLSN (the LSN of the most recent log record for this transaction). transID Status lastLSN T100 In progress 4 T200 In progress 3

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 13

pageID recLSN Dirty Page Table P500 1 P600 2 P505 4  Dirty page: pages with changes not reflected on disk.  One entry for each dirty page in the system’s buffer pool.  Each entry includes . recLSN (the LSN of the first log record that caused the page dirty, which is also the earliest log record to be redone)

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 14 pageID recLSN P500 1 Dirty Page Table P600 2 P505 4

LSN Prev transID Type pageID Length Offset Before- After- LSN image image 1 T100 Update P500 3 21 ABC DEF

2 null T200 Update P600 3 41 HIJ KLM

3 2 T200 Update P500 3 20 GDE QRS

4 1 T100 Update P505 3 21 TUV WXY LOG transID Status lastLSN T100 In progress 4 Transaction Table T200 In progress 3

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 15

Write-Ahead Logging (WAL) Protocol

 The Write-Ahead Logging Protocol: 1. Must write the log record for an update before the corresponding data page gets to disk. (Guarantee Atomicity) 2. Must write all log records to stable storage for a Xact before commit. (Guarantee Durability)  WAL is the fundamental rule to ensure a record of every change to the database is available while recovering a crash.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 16 The Big Picture: What’s Stored Where

LOG RAM DB LogRecords prevLSN Xact Table XID Data pages lastLSN type each status pageID with a length pageLSN Dirty Page Table offset recLSN before-image Master Record after-image flushedLSN . System keeps track of flushedLSN, The max LSN flushed so far. Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 17

Transaction DB RAM Commits LSNs pageLSNs flushedLSN

 System keeps track of flushedLSN. Log records  WAL: Before a page is written, flushed to disk . pageLSN flushedLSN  Write commit record to log.  All log records up to Xact’s lastLSN are flushed. . Guarantees that flushedLSN  lastLSN. . Note that log flushes are sequential, synchronouspageLSN writes to disk. “Log tail” in RAM . Many log records per log page.  Write end record to log.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 18 Simple Transaction Abort

 For now, consider an explicit abort of a Xact. . No crash involved.  We want to “play back” the log in reverse order, UNDOing updates. . Get lastLSN of Xact from Xact table. . Can follow chain of log records backward via the prevLSN field. . Before starting UNDO, write an Abort log record. • For recovering from crash during UNDO!

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 19

Abort, cont.

 To perform UNDO, must have a on data!  Before restoring old value of a page, write a CLR: . You continue logging while you UNDO!! . CLR has one extra field: undoNextLSN • Points to the next LSN to undo (i.e. the prevLSN of the record we’re currently undoing). . CLRs never Undone (but they might be Redone when repeating history: guarantees Atomicity!)  At end of UNDO, write an “end” log record.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 20 Example of CLR

LSN Prev transID Type pageI Length Offset Before- After- LSN D image image 1 Null T100 Update P500 3 21 ABC DEF

2 Null T200 Update P600 3 41 HIJ KLM

3 2 T200 Update P500 3 20 GDE QRS

4 1 T100 Update P505 3 21 TUV WXY

 To undo the last Update, a CLR is written as

LSN: 21 , WXY , TUV , 1 >

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 21

Checkpointing

 Periodically, the DBMS creates a checkpoint, in order to minimize the time taken to recover in the event of a system crash. Write to log: . begin_checkpoint record: Indicates when chkpt began. . end_checkpoint record: Contains current Xact table and dirty page table. This is a `fuzzy checkpoint’: • Other Xacts continue to run; so these tables accurate only as of the time of the begin_checkpoint record. • No attempt to force dirty pages to disk; effectiveness of checkpoint limited by oldest unwritten change to a dirty page. (So it’s a good idea to periodically flush dirty pages to disk!) . Store LSN of chkpt record in a safe place (master record).

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 22 Crash Recovery: Big Picture

Oldest log rec. of Xact  Start from a checkpoint (found active at crash via master record). Smallest  Three phases. Need to: recLSN in dirty page – Figure out which Xacts table after committed since checkpoint, Analysis which failed (Analysis). – REDO all actions. Last chkpt • (repeat history) – UNDO effects of failed Xacts. CRASH A R U Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 23

Recovery: The Analysis Phase

 Analysis phase identifies . Where to start the Redo pass (the point in the log) . What are dirty (pages in the buffer pool at the time of crash). . Who are active ( transactions should be undone).

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 24 Recovery: The Analysis Phase

 Reconstruct state at checkpoint. . Initializing D.P.T. and Xact Table via end_checkpoint record.  Scan log forward from checkpoint. . End record: Remove T from Xact table. . Other records: Add T to Xact table, set lastLSN=LSN, change T status to Last commit/To be Undone. chkpt . Update record: If P not in Dirty Page CRASH Table, add P to D.P.T., set its A R U recLSN=LSN.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 25

Recovery: The Analysis Phase

 At the end, transaction table has a list of all transactions with status U (they were active while crash).  At the end, the dirty page table has all dirty pages that were dirty during crash, and also some pages were written to disk.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 26 Example of Analysis

LSN Prev transID Type pageID Length Offset Before- After- LSN image image 1 Null T100 Update P500 3 21 ABC DEF 2 Null T200 Update P600 3 41 HIJ KLM 3 2 T200 Update P500 3 20 GDE QRS 4 1 T100 Update P505 3 21 TUV WXY 5 3 T200 Commit 6 5 T200 End  The dirty page table and transaction table is lost in the crash. Two tables are empty now and are recreated as:

pageID recLSN transID Status lastLSN P500 1 T100 To be Undone 4 P600 2 P505 4 Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 27

Recovery: The REDO Phase

 Repeating History to reconstruct state at crash by checking Dirty Page Table: . Reapply all updates (even of aborted Xacts!), Smallest redo CLRs. recLSN . Scan forward from log rec containing smallest in dirty page recLSN in D.P.T. table . after For each CLR or update log rec LSN, REDO the Analysis action unless: • Affected page is not in the Dirty Page Table, or • Affected page is in D.P.T., but has recLSN > LSN, or Last • pageLSN (in DB) LSN. chkpt  To REDO an action: CRASH . Reapply logged action. A R U . Set pageLSN to LSN. No additional logging!

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 28 Example of Redo LSN Prev transID Type pageID LSN page page pageID recLSN 1 Null T100 Update P500 ID LSN P500 1 2 Null T200 Update P600 P600 2 P600 2 3 2 T200 Update P500 P500 3 P505 4 4 1 T100 Update P505 P505 4 5 3 T200 Commit 6 5 T200 End 1. Start from LSN No.1, check Log Rec which action should be redone. 2. LSN No.1 should be redone. The PageLSN of P500 is set to 1. 3. LSN No.2 should not be redone since the PageLSN of the affected page in the database P600 is equal to LSN No.2 4. LSN No. 3 should be redone. The PageLSN of P500 is set to 3. 5. LSN No.4 should be redone. The PageLSN of P505 is set to 4.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 29

Recovery: The UNDO Phase

Oldest log rec. of Xact  Scan backward from the end of the active at log rec to abort transactions in crash Transaction table (loser Smallest recLSN in transactions). dirty page  Undo actions of loser transactions table after Analysis in the reverse order.

Last chkpt

CRASH A R U Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 30 Recovery: The UNDO Phase

ToUndo={ l | l a lastLSN of a “loser” Xact} Repeat: . Scan backward. Choose largest LSN among ToUndo. . If this LSN is a CLR and undonextLSN==NULL • Write an End record for this Xact. . If this LSN is a CLR, and undonextLSN != NULL • Add undonextLSN to ToUndo . Else this LSN is an update. Undo the update, write a CLR, add prevLSN to ToUndo. Until ToUndo is empty.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 31

Example of Undo LSN Prev transID Type pageID LSN transID Status lastLSN 1 Null T100 Update P500 T100 To be Undone 4 2 Null T200 Update P600 3 2 T200 Update P500 4 1 T100 Update P505 5 3 T200 Commit 6 5 T200 End

1. Start from LSN No.4, the Update is undone, a CLR is written with undoNextLSN equal to LSN No.1. 2. LSN No.1, the Update is undone, a CLR is written with undoNextLSN equal to Null. 3. The end log record is written for T100.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 32 Summary of Logging/Recovery

 Recovery Manager guarantees Atomicity & Durability.  Use WAL to allow STEAL/NO-FORCE w/o sacrificing correctness.  LSNs identify log records; linked into backwards chains per transaction (via prevLSN).  pageLSN allows comparison of data page and log records.

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 33

Summary, Cont.

 Checkpointing: A quick way to limit the amount of log to scan on recovery.  Recovery works in 3 phases: . Analysis: Forward from checkpoint. . Redo: Forward from oldest recLSN. . Undo: Backward from end to first LSN of oldest Xact alive at crash.  Upon Undo, write CLRs.  Redo “repeats history”: Simplifies the logic!

Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 34