Database Systems I Foundations of Databases

Summer term 2010

Melanie Herschel [email protected]

Database Systems Group, University of Tübingen


Chapter 0 Overview

• Overview

• Administrativa

• A little bit of History

Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen

Credit: Michael Marcol, http://www.freedigitalphotos.net/images/view_photog.php?photogid=371

Welcome Everyone!

First of all, let me introduce myself...

• Grew up in Bavaria & Lorraine
• 2000 - 2003: Student at the University of Cooperative Education Stuttgart (Information Technology)
• 2003 - 2007: Research Assistant at Humboldt University in Berlin & at the Hasso-Plattner-Institute Potsdam (Data quality & data integration)
• 2007: PhD defense
• 2008 - 2009: Post-Doc researcher at the IBM Almaden Research Center (Data provenance / query understanding)
• Since 06/2009: Research assistant at Tübingen University (“Debugging” queries with Nautilus)

Melanie Herschel
B315, Sand 13
Tel: +49 7071 29-75481
Email: [email protected]
Web: http://www-db.informatik.uni-tuebingen.de/team/herschel


Welcome Everyone!

... and now it is your turn.

Locals or “Neigschmeckte” (newcomers)?

Which semester?

Computer science, bioinformatics, other studies?

Bachelor vs. Diplom?

Prior experience with databases?

Where to Meet Databases


What is this course about?

• Convince you that there is more to database technology than just open-file(), read()/write(), and close-file().

• Make you see how versatile the strictly tabular data model supported by relational databases can be.

• Make you best friends with SQL, the principal language spoken by relational database systems.

• We will encounter a healthy mix of good, clean theory and highly relevant CS practice.

What is this course about? Structure

• What are databases?
  ‣ Motivation, history, data independence, database usage
• Modeling databases using the Entity-Relationship Model
  ‣ Entities, relationships, cardinalities, diagrams
• Developing relational databases
  ‣ Relational model, ER -> relational, normal forms
• Relational algebra
  ‣ Criteria for query languages, operators
• SQL
  ‣ SQL DDL, SQL DML, SELECT... FROM... WHERE...
• Programming for databases
  ‣ JDBC

Chapters:
1. Introduction
2. ER-Modeling
3. Relational model(ing)
4. Relational algebra
5. SQL
6. Programming


What is this course about? ER-Modeling

ER diagram: [Customer] --1,1-- <owns> --1,n-- [Account]
Customer attributes: Firstname, Lastname, DOB
Account attributes: Number, Type, Balance

• We want to model customers and their accounts (saving account, checking account, credit card, ...)
• Customers and accounts have attributes, e.g., a person has a firstname, lastname, and date of birth.
• A customer can have one or more accounts.
• We assume that an account can only have one owner.

What is this course about? Relational Model

ER diagram: [Customer] --1,1-- <owns> --1,n-- [Account]
Customer attributes: Firstname, Lastname, DOB
Account attributes: Number, Type, Balance

Customer
Cust_ID | Firstname | Lastname | DOB

Account
Number | Type | Balance | Cust_ID

CREATE TABLE Account (
  Number  INTEGER,
  Type    CHAR(25),
  Balance DOUBLE,
  Cust_ID INTEGER,
  PRIMARY KEY (Number),
  FOREIGN KEY (Cust_ID) REFERENCES Customer
)

What is this course about? Relational Model

Customer
Cust_ID | Firstname | Lastname | DOB
1       | John      | Doe      | 1.1.1970
2       | Jane      | Smith    | 2.2.1977
4       | Peter     | Miller   | 3.3.1983
...

Account
Number | Type     | Balance | Cust_ID
123    | Checking | 2000    | 1
124    | Saving   | 5000    | 1
987    | Checking | 100     | 2
975    | Credit   | 500     | 2
777    | Saving   | 6000    | 4
...
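The foreign key from Account to Customer says that every Cust_ID stored in Account must also occur in Customer. The following is a minimal Java sketch of that check over the sample rows; the names `ForeignKeySketch` and `danglingAccounts` are illustrative assumptions, and a real DBMS enforces the constraint itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: a DBMS enforces FOREIGN KEY constraints itself.
class ForeignKeySketch {

    // Customer keys (Cust_ID) from the sample table.
    static final List<Integer> CUSTOMER_IDS = Arrays.asList(1, 2, 4);

    // Account rows as {Number, Cust_ID}; Type and Balance are omitted here.
    static final int[][] ACCOUNTS = {
        {123, 1}, {124, 1}, {987, 2}, {975, 2}, {777, 4}
    };

    // Returns the numbers of all accounts whose Cust_ID has no matching customer.
    static List<Integer> danglingAccounts(List<Integer> customerIds, int[][] accounts) {
        Set<Integer> known = new HashSet<>(customerIds);
        List<Integer> dangling = new ArrayList<>();
        for (int[] account : accounts) {
            if (!known.contains(account[1])) {
                dangling.add(account[0]);
            }
        }
        return dangling;
    }

    public static void main(String[] args) {
        // The sample data satisfies the constraint, so this prints [].
        System.out.println(danglingAccounts(CUSTOMER_IDS, ACCOUNTS));
        // An account that refers to the non-existent customer 3 violates it: prints [555].
        int[][] withViolation = { {123, 1}, {555, 3} };
        System.out.println(danglingAccounts(CUSTOMER_IDS, withViolation));
    }
}
```

Note that Cust_ID 3 is deliberately absent from the sample Customer table, which is why an account pointing at it is reported.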

What is this course about? Relational Algebra & SQL

Declarative queries
• Not “How do I generate the query result?”
• But “What data does the query result contain?”

Natural language query
Name of account owner and account number of all accounts with a balance of more than 1000 Euros.

Relational Algebra
π_{lastname, number}(σ_{balance > 1000}(σ_{c.cust_id = a.cust_id}(customer × account)))

SQL
SELECT c.lastname, a.number
FROM Customer c, Account a
WHERE a.balance > 1000 AND a.cust_id = c.cust_id
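To see what the relational algebra expression computes, it can be evaluated literally, operator by operator, on the sample Customer and Account rows. The Java sketch below is only an illustration under assumptions: rows are plain `Object[]` arrays, the helper names (`crossProduct`, `select`, `project`, `query`) are invented here, and a real DBMS would choose a much more efficient evaluation strategy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch: evaluates the expression literally, operator by operator.
class RelationalAlgebraSketch {

    // Customer rows: {cust_id, firstname, lastname, dob}
    static final List<Object[]> CUSTOMER = Arrays.asList(
        new Object[] {1, "John", "Doe", "1.1.1970"},
        new Object[] {2, "Jane", "Smith", "2.2.1977"},
        new Object[] {4, "Peter", "Miller", "3.3.1983"});

    // Account rows: {number, type, balance, cust_id}
    static final List<Object[]> ACCOUNT = Arrays.asList(
        new Object[] {123, "Checking", 2000, 1},
        new Object[] {124, "Saving", 5000, 1},
        new Object[] {987, "Checking", 100, 2},
        new Object[] {975, "Credit", 500, 2},
        new Object[] {777, "Saving", 6000, 4});

    // Cross product: every left row concatenated with every right row.
    static List<Object[]> crossProduct(List<Object[]> left, List<Object[]> right) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] l : left) {
            for (Object[] r : right) {
                Object[] row = Arrays.copyOf(l, l.length + r.length);
                System.arraycopy(r, 0, row, l.length, r.length);
                out.add(row);
            }
        }
        return out;
    }

    // Selection (sigma): keep only the rows that satisfy the condition.
    static List<Object[]> select(List<Object[]> rows, Predicate<Object[]> condition) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : rows) {
            if (condition.test(row)) {
                out.add(row);
            }
        }
        return out;
    }

    // Projection (pi): keep only the given column positions.
    // Unlike the set-based operator, this sketch does not remove duplicate rows.
    static List<Object[]> project(List<Object[]> rows, int... cols) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : rows) {
            Object[] projected = new Object[cols.length];
            for (int i = 0; i < cols.length; i++) {
                projected[i] = row[cols[i]];
            }
            out.add(projected);
        }
        return out;
    }

    // Columns of a combined row: 0 c.cust_id, 1 firstname, 2 lastname, 3 dob,
    // 4 number, 5 type, 6 balance, 7 a.cust_id.
    static List<Object[]> query() {
        List<Object[]> product = crossProduct(CUSTOMER, ACCOUNT);
        List<Object[]> joined = select(product, row -> row[0].equals(row[7]));
        List<Object[]> rich = select(joined, row -> ((Integer) row[6]) > 1000);
        return project(rich, 2, 4);
    }

    public static void main(String[] args) {
        for (Object[] row : query()) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

On the sample data, the rows that survive both selections are (Doe, 123), (Doe, 124), and (Miller, 777). The code describes how to compute the result step by step, which is exactly the burden a declarative query language takes off the programmer.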

What is this course about? Programming (if time permits)

• Accessing the database from an external program
• For instance, from a Java program, we can communicate with a database via JDBC.
  ‣ JDBC Driver
  ‣ Open Connection
  ‣ Querying the database
  ‣ Processing query results

...
ResultSet r = statement.executeQuery(myQueryString);
while (r.next()) {
    // Columns are read by position, starting at 1.
    String lastname = r.getString(1);
    int accountNumber = r.getInt(2);   // getInt returns an int, not a String
    Address a = getAddress(lastname, accountNumber);
    initCommercial(a);
}

Chapter 0 Overview

• Overview

• Administrativa

• A little bit of History


Administrativa Course schedule

Lectures
When?                   | Where?
Monday, 10:15 - 11:45   | Sand 6/7, kleiner Hörsaal (small lecture hall)
Tuesday, 10:15 - 11:45  | Sand 6/7, großer Hörsaal (large lecture hall)

Practicals
When?                   | Where?
Thursday, 14:15 - 15:45 | Sand 13, A104

• First assignment available on April 15, 2010.

• First assignment will be discussed April 22, 2010.

Administrativa Keep up to date

http://www-db.informatik.uni-tuebingen.de/teaching/ss10/db1 Please visit regularly, as the latest slides and news will be posted there.

http://twitter.com/DBatUTuebingen Stay tuned to the latest database group news.

https://cis.informatik.uni-tuebingen.de/db1-ss-10/ Register in order to view assignments and access your scores.


Administrativa Evaluating your Performance

End-term exam
• 90 mins. examination on Monday, July 12th, from 10:15 - 11:45.
• No supplemental material is allowed.
• Passing earns you 6 ECTS.

Assignments and grading
• We will distribute, collect, and grade weekly assignments.
• Assignments will be available on our website or via CIS.
• You have one week to complete each assignment.
• You may - and you should - work in teams of two.
• You should hand in your assignments in paper form at Manuel Mayr’s office.
• Scoring 2/3 of the overall points in the assignments earns you an additional 2 ECTS.
• Your scores will be available via CIS only.

These slides...

• Quizzes
• Definition
• Examples
• Code snippets


Read a book, write some SQL

Any introductory book is fine; two suggestions are
• Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems. McGraw-Hill. (The lecture closely follows this book.)
• Alfons Kemper and André Eickler. Datenbanksysteme: Eine Einführung. Oldenbourg Verlag.

Install IBM DB2 V9.5 Express-C
• We will bring it with us for almost any lecture.
• Download at http://www-01.ibm.com/software/data/db2/express/

Further literature


Questions and Feedback

• Questions anytime!
  ‣ During the lecture
  ‣ In dedicated office hour: Monday, 15:00 - 16:00
  ‣ Email, phone
• Feedback and suggestions are highly appreciated
  ‣ Slides
  ‣ Information on the Website
  ‣ ...

Before we get started... Commercials

Student research projects and Bachelor's / Diplom / Master's theses: http://www-db.informatik.uni-tuebingen.de/teaching/studentische-arbeiten


Selected Fun Problems of the ACM Programming Contest

Proseminar, summer term 2010
www-db.informatik.uni-tuebingen.de
Preliminary meeting: 15.04.2010, 09:30, B305b

Chapter 0 Overview

• Overview

• Administrativa

• A little bit of History


A little bit of History Data Collection and Storage Technology

: “punch card tabulating machine”
• Companies (precursors of IBM)
  ‣ Tabulating Machine Corp - 1896
  ‣ Computing-Tabulating-Recording Company (C-T-R) - 1911
  ‣ International Business Machines Corporation (IBM) - 1924

Credit: IBM Archive, Computer History Museum, Mountain View, CA

A little bit of History Hard-Drive Technology

• Magnetic Disc Drive: RAMAC 350 by IBM - 1955/56
  ‣ “Random Access Method of Accounting and Control”
  ‣ Developed in San Jose, CA
  ‣ Technical lead: Reynold B. Johnson (1906-1998)
• Since then, HDD development has followed Moore’s Law
  ‣ Cost per MB has decreased by half approx. every 18 months.
  ‣ The areal density (bits/inch²) has doubled approx. every 18 months.

Credit: IBM Archive, Computer History Museum, Mountain View, CA

A little bit of History Timeline

Timeline figure (axis: 1960, 1970, 1980, 1990, 2000) with the labels:
• Database systems based on hierarchical model, network model
• Database systems based on the relational model
• Object-oriented databases
• Specialization to new types of data
• Scaling database systems to the very large and the very small

A little bit of History Database Systems in The Sixties

Early 1960s
• Data is stored in files.
• Application-dependent organization of the data
• Integrated Data Store (IDS) by General Electric

Late 1960s
• File management systems (SAM, ISAM)
• Basic operations on the data are possible, e.g., sorting
• Information Management System (IMS) by IBM
  ‣ Still in use on mainframes today, with more than 1 Billion $ revenue.


A little bit of History Relational Database Management Systems

1970s
• Database systems emerge
• Ted Codd (IBM): relational data model as conceptual basis for relational database systems - 1970.
• System R (IBM): first prototype of a relational database management system - 1974
  ‣ Roughly 80,000 lines of code (PL/1, PL/S, Assembler)
  ‣ SEQUEL as a query language
  ‣ First installation in 1977.
• Ingres (University of California, Berkeley) - 1975
  ‣ QUEL as a query language
  ‣ Precursor of Postgres, Sybase, ...
• Oracle Version 2 - 1979

A little bit of History Relational Database Management Systems

Credit: Prof. Freytag, Ringvorlesung 2005

A little bit of History Database Development in the 1980s and 1990s

• Systems become smaller and smaller.
  ‣ DBMS run on smaller hardware.
  ‣ DBMS become part of standard installations.
• Systems process more and more data.
  ‣ Gigabyte, Terabyte
  ‣ Large and complex (Multimedia-) objects
  ‣ Persistent storage using hard drives and tertiary storage (tapes, DVDs)
  ‣ Distributed and parallel processing
• Object-oriented database systems

A little bit of History Database Development in the new Millennium

• Support for new types of data
  ‣ XML- and semi-structured data
  ‣ Multimedia (Pictures, Audio, Video)
• Federated databases: integrating data from heterogeneous sources (databases, files, web-sources)
• Mobile databases
  ‣ Managing data on handheld devices (PDA, mobile phone, ...)
• Data Warehouses: From transaction processing to analytical processing
• Information retrieval
• Distributed processing
• ...