6.005 Elements of Software Construction Fall 2008
Total Page:16
File Type:pdf, Size:1020Kb
MIT OpenCourseWare http://ocw.mit.edu 6.005 Elements of Software Construction Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 12/5/2008 Databases Database servers sit behind big web sites like Amazon and eBay ¾Databases are the standard way to maintain the state of a web site Databases are embedded in many applications ¾Firefox browsing history is stored as a database on disk ¾Subversion stores your source code in a database Relational Databases Embedded database is an alternative to saving and loading a file format Rob Miller ¾Instead of saving Java heap objects to a file with a textual format like XML, you can store the data in a database instead Fall 2008 © Robert Miller 2008 © Robert Miller 2008 Benefits of Using a Database Relational Databases Persistence A relational database is a set of named tables ¾Databases are persistent by default – updates to the database are ¾A table has a fixed set of named columns (aka fields or attributes) and a immediately stored on disk varying set of unnamed rows (aka records or tuples) ¾Usually robust to program crashes and hardware reboots Person table 3 columns ¾Contrast with objects in the Java heap, which disappear on a crash First Name LastName Email Query performance 2 rows Daniel Jackson [email protected] ¾Databases build and maintain indexes to answer complex queries quickly, Rob Miller [email protected] e.g. “find books written by Stephen King in 2004” Concurrency ¾Each cell in the table stores a value of a primitive data type ¾Databases provide an effective synchronization mechanism, transactions, • e.g. string, integer, date, time that allows safe concurrent updates to a pile of relational data • object references are represented by integer IDs A table represents a relation ¾In general, a mathematical relation is a set of n-tuples (a binary relation is special case, which is a set of pairs) © Robert Miller 2008 © Robert Miller 2008 1 12/5/2008 Example Pure Relational View An object model we want to store in a database One table per binary relation songName relation duration relation albumTracks songId songName songId duration Song Album 1 Mr. Brightside 1 4:17 2 Somebody Told Me 2 5:57 songName duration albumName albumArtist 3 Girlfriend 3 3:42 ! lyric !! lyric relation ! songId lyric Integer ? String Artist artistName 3Hey, hey, you, you, I don’t like your girlfriend... albumName relation albumTracks relation albumId albumName albumId songId 101 Hot Fuss 101 1 102 The Best Damn Thing 101 2 102 3 © Robert Miller 2008 © Robert Miller 2008 Class/Relation View Bad Designs Often all the exactly-one (!) relations for a class are Relations with other multiplicities (+, *, ?) generally combined into a single table should not be combined Song relation ¾Otherwise, ? relation would force columns to have empty cells songId songName duration songId songName lyric 1Mr. Brightside 4:17 1Mr. Brightside 2 Somebody Told Me 5:57 2 Somebody Told Me 3 Girlfriend 3:24 3 Girlfriend Hey, hey, you, you, ... • Sometimes this is done anyway for performance reasons, just like nulls ¾This table actually represents the Song class are sometimes useful for Java field values ¾Analogy to obj ects on the J ava heap ¾Multiplicity + and * would force columns to become arrays • id column is the object’s address in memory, and other columns are albumId albumName albumTracks fields of the object 101 Hot Fuss 1, 2, 3, ... ¾The id column is usually automatically generated by the database system so that all songs have a unique ID albumId albumName track1 track2 track3 track4 • Analogy: Java’s new operator automatically generates a fresh address 101 Hot Fuss1234 © Robert Miller 2008 © Robert Miller 2008 2 12/5/2008 Querying a Relational Database Relational Algebra SQL (“Structured Query Language”) SELECT is based on a few simple operations that can ¾SQL is a standard language for querying (and mutating) a relational be performed on relations database ¾Each operation takes one or more relations and produces a relation ¾Most database systems support some flavor of SQL PROJECT filters the columns ¾SQL’s SELECT statement offers a compact language for retrieving subsets SELECT filters the rows of relational data PRODUCT adjoins columns from two relations • Find all songs longer than 5 minutes RENAME renames columns SELECT songName FROM Song WHERE duration > 300 ¾A relation is a set of rows, so the usual set operations also apply ¾If you know nothing else about SQL, you should know about SELECT UNION • Note that SQL is case-insensitive, so SELECT and select are the same, INTERSECTION as are songName and songname DIFFERENCE © Robert Miller 2008 © Robert Miller 2008 Projection Selection Projection keeps a set of named columns and discards Selection keeps the subset of rows that match a the rest predicate and discards the rest SELECT songId, duration SELECT * FROM Song FROM Song WHERE duration > 300 songId duration 1 4:17 songId songName duration 2 4:57 2 Somebody Told Me 5:57 3 5:57 ¾Like filtering on the rows © Robert Miller 2008 © Robert Miller 2008 3 12/5/2008 Product Joins Cartesian product A join is a special case of Cartesian product ¾The Cartesian product of two relations R1 and R2 is the result of ¾When the two relations share a column, we only want to concatenate concatenating each row in R1 with all rows in R2 rows that have the same value for that column SELECT * SELECT * use the table name to disambiguate FROM Song, Album FROM Song, AlbumTracks columns with the same name WHERE Song.songId = AlbumTracks.songId songId songName duration albumId albumName songId songName duration albumId songId 1Mr. Brightside 4:17 101 Hot Fuss 1Mr. Brightside 4:17 101 1 2 Somebody Told Me 5:57 101 Hot Fuss 2 Somebody Told Me 5:57 101 1 3 Girlfriend 3:24 101 Hot Fuss 3 Girlfriend 3:24 101 1 1 Mr. Brightside 4:17 102 The Best Damn Thing 1Mr. Brightside 4:17 101 2 2 Somebody Told Me 5:57 102 The Best Damn Thing 2 Somebody Told Me 5:57 102 2 3 Girlfriend 3:24 102 The Best Damn Thing 3 Girlfriend 3:24 102 2 ¾Join can be represented by a product followed by a selection © Robert Miller 2008 © Robert Miller 2008 Question Other Set Operations How do I get a list of songName, albumName pairs? Union, intersection, difference of relations ¾Find songs longer than 5 minutes or containing “midnight” in the lyric songName albumName SELECT songId FROM Song WHERE duration > 300 Mr. Brightside Hot Fuss UNION SELECT songId FROM Lyrics WHERE lyric LIKE ‘%midnight%’ Somebody Told Me Hot Fuss ¾Find songs longer than 5 minutes for which we have the lyrics Girlfriend The Best Damn Thing SELECT songId FROM Song WHERE duration > 300 INTERSECT SELECT songId FROM Lyrics SELECT songName, albumName ¾Find albums that don’t have any tracks FROM Song, AlbumTracks, Album WHERE Song.songId = AlbumTracks.songId SELECT albumId FROM Album AND AlbumTracks.albumId = Album.albumId EXCEPT SELECT albumId FROM AlbumTracks ¾These operations are rarely used in practice, because select predicates can usually do the job, and database systems are good at optimizing SELECT © Robert Miller 2008 © Robert Miller 2008 4 12/5/2008 Aggregate Functions Grouping Accumulating a column of data into a single value GROUP BY computes aggregate functions on subsets ¾How long is the album Thriller? of the tuples SELECT SUM(duration) ¾How long is each album? FROM Song, Album, AlbumTracks SELECT albumName, SUM(duration) WHERE Song.songId = AlbumTracks.songId FROM Song, Album, AlbumTracks AND Album.albumId = AlbumTracks.albumId WHERE Song.songId = AlbumTracks.songId AND Album.albumName = “Thriller” AND Album.albumId = AlbumTracks.albumId GROUP BY albumName SUM 10:14 albumName SUM Hot Fuss 10:14 Other aggregate functions The Best Damn Thing 3:47 ¾AVG ¾COUNT ¾MAX ¾MIN © Robert Miller 2008 © Robert Miller 2008 Exercise Mutating the Database Write SELECT statements for the following queries Insert a row ¾Find the name of the album with the song named “Girlfriend” INSERT INTO Song VALUES (4, “Thriller”, 6:02) Update rows ¾Find names of albums for which we have lyrics (for at least one song) UPDATE Song SET songName=“Smile Like You Mean It”, duration=4:57 WHERE songId = 1 Delete rows ¾List all albums,,g showing album name and number of songs DELETE FROM Song songId songName duration WHERE songName = “Girlfriend” 1Mr. Brightside 4:17 2 Somebody Told Me 5:57 3 Girlfriend 3:24 © Robert Miller 2008 © Robert Miller 2008 5 12/5/2008 Concurrency in Databases Transaction Example Transactions allow concurrent database modifications Transfer money between bank accounts ¾A transaction is a block of SQL statements that need to execute together BEGIN TRANSACTION SELECT balance FROM Account WHERE accountId = 1 Transactions implement ACID semantics and put it in local variable balance1 ¾Atomicity – either the full effects of a transaction are recorded or no trace SELECT balance FROM Account WHERE accountId = 2 of it will be found and call it balance2 ¾Consistency – a transaction is recorded only if it preserves invariants balance1 -= 100 • e.g., every AlbumTrack row must contain an albumId that exists in balance2 += 100 Album and a songId that exists in Song UPDATE Account SET balance=balance1 WHERE accountId = 1 ¾Isolation – if two transactions operate on the same data, the outcome will UPDATE Account SET balance=balance2 WHERE accountId = 2 always be same as executing them sequentially one after the other COMMIT ¾Durability – if the transaction completes, its effects will never be lost © Robert Miller 2008 © Robert Miller 2008 Transactions vs. Locks Summary Transaction is tentative until successful commit Relations as database tables ¾COMMIT fails if a simultaneous transaction changed the same rows and ¾Relational database is a relation-centric implementation of an object model managed to commit first Normal form ¾If commit fails, the transaction is rolled back – i.e., it has no effect on the ¾All rows are unique, no entries can be null database ¾Your program can retry the transaction if the commit failed Relational algebra for querying Database handles low-level concurrency mechanisms ¾Project, select, and join operators combine relations ¾SQL select statement uses all three operators ¾e.g.