Logging and Recovery Review: the ACID Properties Motivation Assumptions

Motivation • Atomicity: – Transactions may abort (“Rollback”). Logging and • Durability: Recovery – What if DBMS stops running? (Causes?) Chapter 18 Y Desired Behavior after crash! If you are going to be in the system restarts: T1 logging business, one of the – T1, T2 & T3 should be T2 things that you have to do is to durable. T3 learn about heavy equipment. – T4 & T5 should be T4 - Robert VanNatta, aborted (effects not seen). T5 Logging History of Columbia County Review: The ACID properties Assumptions • A tomicity: All actions in the Xact happen, or none happen. • Concurrency control is in effect. • C onsistency: If each Xact is consistent, and the DB starts – Strict 2PL, in particular. consistent, it ends up consistent. • Updates are happening “in place”. • I solation: Execution of one Xact is isolated from that of other – i.e. data is overwritten on (deleted from) the disk. Xacts. • D urability: If a Xact commits, its effects persist. • A simple scheme to guarantee Atomicity & Durability? • The Recovery Manager guarantees Atomicity & Durability. 1 Handling the Buffer Pool Basic Idea: Logging • Force write to disk at • Record REDO and UNDO information, for every commit? No Steal Steal update, in a log. – Poor response time. – Sequential writes to log (put it on a separate disk). – But provides durability. Force Trivial – Minimal info (diff) written to log, so multiple updates • Steal buffer-pool frames fit in a single log page. from uncommited Xacts? • Log: An ordered list of REDO/UNDO actions – If not, poor throughput. No Force Desired – Log record contains: – If so, how can we ensure <XID, pageID, offset, length, old data, new data> atomicity? – and additional control info (which we’ll see soon). More on Steal and Force Write-Ahead Logging (WAL) •STEAL (why enforcing Atomicity is hard) • The Write-Ahead Logging Protocol: – To steal frame F: Current page in F (say P) is c Must force the log record for an update before written to disk; some Xact holds lock on P. the corresponding data page gets to disk. • What if the Xact with the lock on P aborts? d Must write all log records for a Xact before • Must remember the old value of P at steal time (to support UNDOing the write to page P). commit. •NO FORCE(why enforcing Durability is hard) • #1 guarantees Atomicity. – What if system crashes before a modified page is • #2 guarantees Durability. written to disk? – Write as little as possible, in a convenient place, at • Exactly how is logging (and recovery!) done? commit time,to support REDOing modifications. – We’ll study the ARIES algorithms. 2 DB RAM WAL & the Log LSNs pageLSNs flushedLSN Other Log-Related State • Each log record has a unique Log Sequence Number (LSN). Log records • Transaction Table: – LSNs always increasing. flushed to disk – One entry per active Xact. • Each data page contains a pageLSN. –Contains XID, status (running/commited/aborted), – The LSN of the most recent log record and lastLSN. for an update to that page. • Dirty Page Table: • System keeps track of flushedLSN. – One entry per dirty page in buffer pool. – The max LSN flushed so far. –ContainsrecLSN -- the LSN of the log record which pageLSN • WAL: Before a page is written, “Log tail” first caused the page to be dirty. in RAM –pageLSN≤ flushedLSN Log Records Normal Execution of an Xact Possible log record types: LogRecord fields: • Update • Series of reads & writes, followed by commit or abort. prevLSN •Commit XID •Abort – We will assume that page write is atomic on disk. type •End (signifies end of commit • In practice, additional details to deal with non-atomic writes. pageID or abort) • Strict 2PL. update length • Compensation Log Records • STEAL, NO-FORCE buffer management, with Write- records offset (CLRs) Ahead Logging. only before-image – for UNDO actions after-image – (and some other tricks!) 3 Checkpointing Simple Transaction Abort • Periodically, the DBMS creates a checkpoint, in order to minimize the time taken to recover in the event of a system • For now, consider an explicit abort of a Xact. crash. Write to log: –No crash involved. – begin_checkpoint record: Indicates when chkpt began. • We want to “play back” the log in reverse – end_checkpoint record: Contains current Xact table and dirty page order, UNDOing updates. table. This is a `fuzzy checkpoint’: –GetlastLSN of Xact from Xact table. • Other Xacts continue to run; so these tables only known to reflect some mix of state after the time of the begin_checkpoint record. – Can follow chain of log records backward via the • No attempt to force dirty pages to disk; effectiveness of checkpoint prevLSN field. limited by oldest unwritten change to a dirty page. (So it’s a good idea – Note: before starting UNDO, could write an Abort to periodically flush dirty pages to disk!) log record. – Store LSN of chkpt record in a safe place (master record). •Why bother? The Big Picture: What’s Stored Where Abort, cont. LOG DB RAM • To perform UNDO, must have a lock on data! – No problem! LogRecords • Before restoring old value of a page, write a CLR: prevLSN Xact Table Data pages lastLSN – You continue logging while you UNDO!! XID – CLR has one extra field: undonextLSN type each status • Points to the next LSN to undo (i.e. the prevLSN of the record we’re pageID with a pageLSN Dirty Page Table currently undoing). length –CLR contains REDO info recLSN offset –CLRsnever Undone before-image master record flushedLSN • Undo needn’t be idempotent (>1 UNDO won’t happen) after-image • But they might be Redone when repeating history (=1 UNDO guaranteed) • At end of all UNDOs, write an “end” log record. 4 Transaction Commit Recovery: The Analysis Phase • Write commit record to log. • Reconstruct state at checkpoint. • All log records up to Xact’s lastLSN are flushed. –via end_checkpoint record. – Guarantees that flushedLSN ≥ lastLSN. • Scan log forward from begin_checkpoint. – Note that log flushes are sequential, synchronous –Endrecord: Remove Xact from Xact table. writes to disk. – Other records: Add Xact to Xact table, set – Many log records per log page. lastLSN=LSN, change Xact status on commit. • Make transaction visible – Update record: If P not in Dirty Page Table, – Commit() returns, locks dropped, etc. • Add P to D.P.T., set its recLSN=LSN. • Write end record to log. Crash Recovery: Big Picture Recovery: The REDO Phase • We repeat History to reconstruct state at crash: Oldest log –Reapply all updates (even of aborted Xacts!), redo rec. of Xact Y Start from a checkpoint (found active at crash CLRs. via master record). • Scan forward from log rec containing smallest Smallest Y Three phases. Need to: recLSN in D.P.T. For each CLR or update log rec LSN, recLSN in dirty page – Figure out which Xacts REDO the action unless: table after committed since checkpoint, – Affected page is not in the Dirty Page Table, or Analysis which failed (Analysis). – Affected page is in D.P.T., but has recLSN > LSN, or – REDO all actions. –pageLSN(in DB) ≥ LSN. Last chkpt X (repeat history) • To REDO an action: – UNDO effects of failed Xacts. – Reapply logged action. CRASH –SetpageLSN to LSN. No additional logging! A RU 5 Recovery: The UNDO Phase Example: Crash During Restart! LSN LOG ToUndo={ l | l a lastLSN of a “loser” Xact} 00,05 begin_checkpoint, end_checkpoint Repeat: RAM 10 update: T1 writes P5 – Choose largest LSN among ToUndo. 20 update T2 writes P3 undonextLSN – If this LSN is a CLR and undonextLSN==NULL Xact Table 30 T1 abort lastLSN 40,45 CLR: Undo T1 LSN 10, T1 End •Write an End record for this Xact. status – If this LSN is a CLR, and undonextLSN != NULL Dirty Page Table 50 update: T3 writes P1 recLSN 60 update: T2 writes P5 •AddundonextLSN to ToUndo flushedLSN CRASH, RESTART • (Q: what happens to other CLRs?) 70 CLR: Undo T2 LSN 60 ToUndo – Else this LSN is an update. Undo the update, 80,85 CLR: Undo T3 LSN 50, T3 end write a CLR, add prevLSN to ToUndo. CRASH, RESTART Until ToUndo is empty. 90 CLR: Undo T2 LSN 20, T2 end Example of Recovery Additional Crash Issues LSN LOG • What happens if system crashes during Analysis? During REDO? RAM 00 begin_checkpoint 05 end_checkpoint • How do you limit the amount of work in REDO? Xact Table 10 update: T1 writes P5 prevLSNs – Flush asynchronously in the background. lastLSN 20 update T2 writes P3 – Watch “hot spots”! status 30 T1 abort Dirty Page Table • How do you limit the amount of work in UNDO? recLSN 40 CLR: Undo T1 LSN 10 – Avoid long-running Xacts. flushedLSN 45 T1 End 50 update: T3 writes P1 ToUndo 60 update: T2 writes P5 CRASH, RESTART 6 Logical vs. Physical Logging Nested Top Actions • Roughly, ARIES does: • Trick to support physical operations you do not –Physical REDO want to ever be undone –Logical UNDO –Example? • Why? • Basic idea – At end of the nested actions, write a dummy CLR • Nothing to REDO in this CLR – Its UndoNextLSN points to the step before the nested action. Logical vs. Physical Logging, Cont. Summary of Logging/Recovery • Page-oriented REDO logging • Recovery Manager guarantees Atomicity & – Independence of REDO (e.g. indexes & tables) Durability. – Not quite physical, but close • Use WAL to allow STEAL/NO-FORCE w/o • Can have logical operations like increment/decrement sacrificing correctness. (“escrow transactions”) • LSNs identify log records; linked into • Logical UNDO backwards chains per transaction (via – To allow for simple management of physical prevLSN). structures that are invisible to users • pageLSN allows comparison of data page and – To allow for logical operations log records. • Handles escrow transactions 7 Summary, Cont. • Checkpointing: A quick way to limit the amount of log to scan on recovery.

Logging and Recovery Review: the ACID Properties Motivation Assumptions

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support