The Relational Data Model
Chapter 4

Data and Its Structure

• Data is actually stored as bits, but it is difficult to work with data at this level.
• It is convenient to view data at different levels of abstraction.
• Schema: Description of data at some abstraction level. Each level has its own schema.
• We will be concerned with three schemas: physical, conceptual, and external.


Three Schemas in the Relational Data Model

[Figure: three levels. External schemas: View 1, View 2, View 3 (payroll, billing, records). Conceptual schema: set of tables (relations). Physical schema: how the data (tables) is stored.]

Physical Data Level

• Physical schema describes details of how data is stored: tracks, cylinders, indices etc.
• Early applications worked at this level and explicitly dealt with details.
• Problem: Routines were hard-coded to deal with physical representation.
  – Changes to data structure difficult to make.
  – Application code becomes complex since it must deal with details.
  – Rapid implementation of new features impossible.

Conceptual Data Level

• Hides details.
  – In the relational model, the conceptual schema presents data as a set of tables.
• DBMS maps from conceptual to physical schema automatically.
• Physical schema can be changed without changing application:
  – DBMS would change mapping from conceptual to physical transparently
  – This property is referred to as physical data independence

Conceptual Data Level (cont'd)

[Figure: the Application works with the conceptual view of data; the DBMS maps it to the physical view of data.]
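Physical data independence can be observed directly. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in DBMS; the index name idx_student_status and the sample rows are illustrative assumptions. The same query is run before and after a purely physical change (adding an index) and returns the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Id INTEGER PRIMARY KEY, Name TEXT, Status TEXT)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [
        (11111111, "John", "freshman"),
        (12345678, "Mary", "sophomore"),
        (87654321, "Pat", "sophomore"),
    ],
)

# The application's query is written against the conceptual schema only.
QUERY = "SELECT Name FROM Student WHERE Status = 'sophomore' ORDER BY Name"

before = conn.execute(QUERY).fetchall()

# Change only the physical schema: add an index (hypothetical name).
conn.execute("CREATE INDEX idx_student_status ON Student (Status)")

after = conn.execute(QUERY).fetchall()

print(before == after)  # the query text and its answer are unchanged
```

The index may change how the query is evaluated, but nothing in the application had to be rewritten.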


External Data Level

• In the relational model, the external schema also presents data as a set of relations.
• An external schema specifies a view of the data in terms of the conceptual level. It is tailored to the needs of a particular category of users.
  – Portions of stored data should not be seen by some users.
    • Students should not see their files in full.
    • Faculty should not see billing data.
  – Information that can be derived from stored data might be viewed as if it were stored.
    • GPA not stored, but calculated when needed.

External Data Level (cont'd)

• Application is written in terms of an external schema.
• A view is computed when accessed (not stored).
• Different external schemas can be provided to different categories of users.
• Translation from external to conceptual done automatically by DBMS at run time.
• Conceptual schema can be changed without changing application:
  – Mapping from external to conceptual must be changed.
• Referred to as conceptual data independence.
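The GPA example can be sketched as a view that is computed on access. This is an illustrative sketch with Python's sqlite3 module; the view name StudentGpa, the numeric grade encoding, and the course codes are assumptions made for the example, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transcript (
    StudId INTEGER, CrsCode TEXT, Semester TEXT,
    Grade REAL,  -- assumption: grades stored as grade points
    PRIMARY KEY (StudId, CrsCode, Semester)
);
INSERT INTO Transcript VALUES (11111111, 'CS305',  'F2023', 4.0);
INSERT INTO Transcript VALUES (11111111, 'MAT123', 'F2023', 3.0);
INSERT INTO Transcript VALUES (12345678, 'CS305',  'F2023', 2.0);

-- External schema: GPA looks like stored data but is derived.
CREATE VIEW StudentGpa AS
    SELECT StudId, AVG(Grade) AS Gpa FROM Transcript GROUP BY StudId;
""")

gpa_before = dict(conn.execute("SELECT StudId, Gpa FROM StudentGpa"))

# A new grade arrives; the view is recomputed the next time it is read.
conn.execute("INSERT INTO Transcript VALUES (12345678, 'MAT123', 'F2023', 4.0)")
gpa_after = dict(conn.execute("SELECT StudId, Gpa FROM StudentGpa"))

print(gpa_before)
print(gpa_after)
```

Because the view is not stored, no extra step is needed to keep the GPA consistent with Transcript.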

Data Model

• Tools and language for describing:
  – Conceptual and external schema (a schema: description of data at some level, e.g., tables, attributes, constraints, domains)
    • Data definition language (DDL), as in SQL
  – Integrity constraints, domains (DDL)
  – Operations on data
    • Data manipulation language (DML)
  – Optional: Directives that influence the physical schema (affects performance, not semantics)
    • Storage definition language (SDL)

Relational Model

• A particular way of structuring data (using relations)
• Simple
• Mathematically based
  – Expressions (≡ queries) can be analyzed by DBMS
  – Queries are transformed to equivalent expressions automatically (query optimization)
    • Optimizers have limits (=> programmer needs to know how queries are evaluated and optimized)


Relation Instance

• Relation is a set of tuples
  – Tuple ordering immaterial
  – No duplicates
  – Cardinality of relation = number of tuples
• All tuples in a relation have the same structure; constructed from the same set of attributes
  – Attributes are named (ordering is immaterial)
  – Value of an attribute is drawn from the attribute's domain
    • There is also a special value null (value unknown or undefined), which belongs to no domain
  – Arity of relation = number of attributes

Instance (Example)

Student
Id         Name   Address      Status
11111111   John   123 Main     freshman
12345678   Mary   456 Cedar    sophomore
44433322   Art    77 So. 3rd   senior
87654321   Pat    88 No. 4th   sophomore
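The set-of-tuples definition can be modelled in a few lines of Python. This is only a sketch: a NamedTuple fixes an attribute order for convenience, whereas in the relational model attribute ordering is immaterial.

```python
from typing import NamedTuple


class Student(NamedTuple):
    """One tuple of the Student relation; fields are the named attributes."""
    Id: int
    Name: str
    Address: str
    Status: str


# A relation instance is a set, so the repeated tuple below is absorbed.
student = {
    Student(11111111, "John", "123 Main", "freshman"),
    Student(12345678, "Mary", "456 Cedar", "sophomore"),
    Student(44433322, "Art", "77 So. 3rd", "senior"),
    Student(87654321, "Pat", "88 No. 4th", "sophomore"),
    Student(11111111, "John", "123 Main", "freshman"),  # duplicate
}

cardinality = len(student)    # number of tuples
arity = len(Student._fields)  # number of attributes

print(cardinality, arity)  # 4 4
```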

Relation Schema

• Relation name
• Attribute names & domains
• Integrity constraints like
  – The values of a particular attribute in all tuples are unique
  – The values of a particular attribute in all tuples are greater than 0
• Default values

Relational Database

• Finite set of relations
• Each relation consists of a schema and an instance
• Database schema = set of relation schemas + constraints among relations (inter-relational constraints)
• Database instance = set of (corresponding) relation instances


Database Schema (Example)

• Student (Id: INT, Name: STRING, Address: STRING, Status: STRING)
• Professor (Id: INT, Name: STRING, DeptId: DEPTS)
• Course (DeptId: DEPTS, CrsName: STRING, CrsCode: COURSES)
• Transcript (CrsCode: COURSES, StudId: INT, Grade: GRADES, Semester: SEMESTERS)
• Department (DeptId: DEPTS, Name: STRING)

Integrity Constraints

• Part of schema
• Restriction on state (or on sequence of states) of database
• Enforced by DBMS
• Intra-relational - involve only one relation
  – Part of relation schema
  – e.g., all Ids are unique
• Inter-relational - involve several relations
  – Part of relation schema or database schema


Constraint Checking

• Automatically checked by DBMS
• Protects database from errors
• Enforces enterprise rules

Kinds of Integrity Constraints

• Static – restricts legal states of database
  – Syntactic (structural)
    • e.g., all values in a column must be unique
  – Semantic (involve meaning of attributes)
    • e.g., cannot register for more than 18 credits
• Dynamic – limitation on sequences of database states
  • e.g., cannot raise salary by more than 5%
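A dynamic constraint compares the old and new state, so a plain column constraint cannot express it. One possible sketch, using an SQLite trigger through Python's sqlite3 module: the Employee table, the trigger name LimitRaise, and integer salaries are assumptions for illustration, and other DBMSs use different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    Id INTEGER PRIMARY KEY,
    Salary INTEGER NOT NULL CHECK (Salary > 0)  -- static constraint
);

-- Dynamic constraint: compares the state before and after the update.
CREATE TRIGGER LimitRaise
BEFORE UPDATE OF Salary ON Employee
FOR EACH ROW
WHEN NEW.Salary * 100 > OLD.Salary * 105
BEGIN
    SELECT RAISE(ABORT, 'raise exceeds 5%');
END;

INSERT INTO Employee VALUES (1, 1000);
""")


def try_raise(new_salary):
    """Attempt the update; return True if the DBMS accepted it."""
    try:
        conn.execute("UPDATE Employee SET Salary = ? WHERE Id = 1", (new_salary,))
        return True
    except sqlite3.DatabaseError:
        return False


small_ok = try_raise(1040)  # 4% raise
big_ok = try_raise(1200)    # more than 5% above 1040
final = conn.execute("SELECT Salary FROM Employee WHERE Id = 1").fetchone()[0]

print(small_ok, big_ok, final)
```

The integer arithmetic in the WHEN clause avoids floating-point rounding at the 5% boundary.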


Key Constraint

• A key constraint is a sequence of attributes A1,…,An (n=1 possible) of a relation schema, S, with the following property:
  – A relation instance s of S satisfies the key constraint iff at most one row in s can contain a particular set of values, a1,…,an, for the attributes A1,…,An
  – Minimality: no proper subset of A1,…,An is a key constraint
• Key
  – Set of attributes mentioned in a key constraint
    • e.g., Id in Student
    • e.g., (StudId, CrsCode, Semester) in Transcript
  – It is minimal: no proper subset of a key is a key
    • (Id, Name) is not a key of Student

Key Constraint (cont'd)

• Superkey - set of attributes containing key
  – (Id, Name) is a superkey of Student
• Every relation has a key
• Relation can have several keys:
  – primary key: Id in Student (can't be null)
  – candidate key: (Name, Address) in Student
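The uniqueness and minimality conditions can be checked against a concrete instance. The helper names below are hypothetical, and the sketch comes with a caveat: a key is declared on the schema, so a single instance can only refute a proposed key, never prove it.

```python
from itertools import combinations


def satisfies_key_constraint(rows, attrs):
    """True iff no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values in seen:
            return False
        seen.add(values)
    return True


def is_minimal_on_instance(rows, attrs):
    """True iff attrs is unique on rows and no proper non-empty subset is."""
    if not satisfies_key_constraint(rows, attrs):
        return False
    return not any(
        satisfies_key_constraint(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


student = [
    {"Id": 11111111, "Name": "John", "Address": "123 Main", "Status": "freshman"},
    {"Id": 12345678, "Name": "Mary", "Address": "456 Cedar", "Status": "sophomore"},
    {"Id": 44433322, "Name": "Art", "Address": "77 So. 3rd", "Status": "senior"},
    {"Id": 87654321, "Name": "Pat", "Address": "88 No. 4th", "Status": "sophomore"},
]

print(satisfies_key_constraint(student, ("Id",)))        # True
print(satisfies_key_constraint(student, ("Status",)))    # False: two sophomores
print(is_minimal_on_instance(student, ("Id", "Name")))   # False: superkey, Id alone suffices
```

Note that Name also happens to be unique in these four rows; that is an accident of the instance, which is why keys are stated as constraints rather than inferred from data.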

Foreign Key Constraint

• Referential integrity: Item named in one relation must refer to tuples that describe that item in another
  – Transcript(CrsCode) references Course(CrsCode)
  – Professor(DeptId) references Department(DeptId)
• Attribute A1 is a foreign key of R1 referring to attribute A2 in R2, if whenever there is a value v of A1, there is a tuple of R2 in which A2 has value v, and A2 is a key of R2
  – This is a special case of referential integrity: A2 must be a candidate key of R2 (e.g., CrsCode is a key of Course in the above)
  – If no row exists in R2 => violation of referential integrity
  – Not all rows of R2 need to be referenced: relationship is not symmetric (e.g., some course might not be taught)
  – Value of a foreign key might not be specified (DeptId column of some professor might be null)

Foreign Key Constraint (Example)

[Figure: R1 with foreign key column A1 holding v1, v2, v3, v4, -- (null), v3; R2 with candidate key column A2 holding v3, v5, v1, v6, v2, v7, v4. Every non-null value of A1 appears in A2; v5, v6, and v7 in A2 are not referenced.]
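The three cases for a foreign key value (matches a row, is null, matches nothing) can be tried out as follows. This is a sketch with Python's sqlite3 module; the professor names and the 'EE' department code are made up, and SQLite checks foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: off by default
conn.executescript("""
CREATE TABLE Department (DeptId TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE Professor (
    Id INTEGER PRIMARY KEY,
    Name TEXT,
    DeptId TEXT REFERENCES Department (DeptId)
);
INSERT INTO Department VALUES ('CS',  'Computer Science');
INSERT INTO Department VALUES ('MAT', 'Mathematics');  -- never referenced: allowed
""")


def insert_professor(prof_id, name, dept_id):
    """Return True if the DBMS accepted the row, False on a constraint violation."""
    try:
        conn.execute("INSERT INTO Professor VALUES (?, ?, ?)", (prof_id, name, dept_id))
        return True
    except sqlite3.IntegrityError:
        return False


ok_known = insert_professor(1, "Smith", "CS")    # DeptId exists in Department
ok_null = insert_professor(2, "Jones", None)     # foreign key left unspecified
ok_unknown = insert_professor(3, "Brown", "EE")  # no such department

print(ok_known, ok_null, ok_unknown)  # True True False
```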


Foreign Key (cont'd)

• Names of A1 and A2 need not be the same.
  – With tables:
      Teaching(CrsCode: COURSES, Sem: SEMESTERS, ProfId: INT)
      Professor(Id: INT, Name: STRING, DeptId: DEPTS)
    ProfId attribute of Teaching references Id attribute of Professor
• R1 and R2 need not be distinct.
  – Employee(Id: INT, MgrId: INT, …)
    • Employee(MgrId) references Employee(Id)
  – Every manager is also an employee and hence has a unique row in Employee

Foreign Key (cont'd)

• Foreign key might consist of several columns
  – (CrsCode, Semester) of Transcript references (CrsCode, Semester) of Teaching
• R1(A1, …, An) references R2(B1, …, Bn)
  – There exists a 1-1 correspondence between A1,…,An and B1,…,Bn
  – Ai and Bi have same domains (although not necessarily the same names)
  – B1,…,Bn is a candidate key of R2
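Both variations, a multi-column foreign key and a table that references itself, can be sketched with Python's sqlite3 module. The sample course codes, semesters, and ids are assumptions, and the top-level employee is given a null MgrId so that the first row can be inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: off by default
conn.executescript("""
CREATE TABLE Teaching (
    CrsCode TEXT, Semester TEXT, ProfId INTEGER,
    PRIMARY KEY (CrsCode, Semester)  -- the referenced columns form a key
);
CREATE TABLE Transcript (
    StudId INTEGER, CrsCode TEXT, Semester TEXT, Grade TEXT,
    PRIMARY KEY (StudId, CrsCode, Semester),
    FOREIGN KEY (CrsCode, Semester) REFERENCES Teaching (CrsCode, Semester)
);
CREATE TABLE Employee (
    Id INTEGER PRIMARY KEY,
    MgrId INTEGER REFERENCES Employee (Id)  -- R1 and R2 are the same table
);
INSERT INTO Teaching VALUES ('CS305', 'F2023', 7);
INSERT INTO Employee VALUES (1, NULL);
""")


def accepted(sql, params):
    """Return True if the statement ran, False on a constraint violation."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


INSERT_TRANSCRIPT = "INSERT INTO Transcript VALUES (?, ?, ?, ?)"
INSERT_EMPLOYEE = "INSERT INTO Employee VALUES (?, ?)"

taught = accepted(INSERT_TRANSCRIPT, (11111111, "CS305", "F2023", "A"))
not_taught = accepted(INSERT_TRANSCRIPT, (11111111, "CS305", "S2024", "A"))
has_manager = accepted(INSERT_EMPLOYEE, (2, 1))
no_such_manager = accepted(INSERT_EMPLOYEE, (3, 99))

print(taught, not_taught, has_manager, no_such_manager)  # True False True False
```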


Inclusion Dependency

• Referential integrity constraint that is not a foreign key constraint
• Teaching(CrsCode, Semester) references Transcript(CrsCode, Semester) (no empty classes allowed)
• Target attributes do not form a candidate key in Transcript (StudId missing)
• No simple enforcement mechanism for inclusion dependencies in SQL (requires assertions -- later)

SQL

• Language for describing database schema and operations on tables
• Data Definition Language (DDL): sublanguage of SQL for describing schema
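Although an inclusion dependency cannot be declared as a foreign key, violations can at least be detected with a query. The sketch below (Python's sqlite3 module, with made-up sample rows) lists the classes that are taught but have no Transcript row. It only detects violations after the fact; it does not enforce the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Teaching (
    CrsCode TEXT, Semester TEXT, ProfId INTEGER,
    PRIMARY KEY (CrsCode, Semester)
);
CREATE TABLE Transcript (
    StudId INTEGER, CrsCode TEXT, Semester TEXT, Grade TEXT,
    PRIMARY KEY (StudId, CrsCode, Semester)
);
INSERT INTO Teaching VALUES ('CS305',  'F2023', 7);
INSERT INTO Teaching VALUES ('MAT123', 'F2023', 9);
INSERT INTO Transcript VALUES (11111111, 'CS305', 'F2023', 'A');
INSERT INTO Transcript VALUES (12345678, 'CS305', 'F2023', 'B');
""")

# (CrsCode, Semester) is not a key of Transcript: two students share CS305/F2023,
# so Teaching cannot declare it as a foreign key target.
EMPTY_CLASSES = """
SELECT T.CrsCode, T.Semester
FROM Teaching T
WHERE NOT EXISTS (
    SELECT 1 FROM Transcript R
    WHERE R.CrsCode = T.CrsCode AND R.Semester = T.Semester
)
"""

violations = conn.execute(EMPTY_CLASSES).fetchall()
print(violations)  # [('MAT123', 'F2023')]
```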


Tables

• SQL entity that corresponds to a relation
• An element of the database schema
• SQL-92 is currently the most supported standard but is now superseded by SQL:1999
• Database vendors generally deviate from standard, but eventually converge

Table Declaration

    CREATE TABLE Student (
        Id INTEGER,
        Name CHAR(20),
        Address CHAR(50),
        Status CHAR(10)
    )

Student
Id          Name   Address       Status
101222333   John   10 Cedar St   Freshman
234567890   Mary   22 Main St    Sophomore

Primary/Candidate Keys

    CREATE TABLE Course (
        CrsCode CHAR(6),
        CrsName CHAR(20),
        DeptId CHAR(4),
        Descr CHAR(100),
        PRIMARY KEY (CrsCode),
        UNIQUE (DeptId, CrsName) -- candidate key
    )

Things that start with 2 dashes are comments

Null

• Problem: Not all information might be known when row is inserted (e.g., Grade might be missing from Transcript)
• A column might not be applicable for a particular row (e.g., MaidenName if row describes a male)
• Solution: Use place holder – null
  – Not a value of any domain (although called null value)
    • Indicates the absence of a value
  – Not allowed in certain situations
    • Primary keys and columns constrained by NOT NULL
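A short sketch of how the candidate key and null behave in practice, using Python's sqlite3 module. The sample course rows are invented, and CrsCode is marked NOT NULL explicitly because SQLite, unlike the SQL standard, does not imply it for a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Course (
    CrsCode CHAR(6) NOT NULL,  -- explicit: SQLite does not imply it here
    CrsName CHAR(20),
    DeptId  CHAR(4),
    Descr   CHAR(100),
    PRIMARY KEY (CrsCode),
    UNIQUE (DeptId, CrsName)   -- candidate key
);
INSERT INTO Course VALUES ('CS305', 'Databases', 'CS', NULL);  -- Descr unknown
""")

# A second course with the same (DeptId, CrsName) violates the candidate key.
try:
    conn.execute("INSERT INTO Course VALUES ('CS306', 'Databases', 'CS', 'duplicate name')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

# null is not a value of any domain, so '=' never matches it; IS NULL does.
eq_null = conn.execute("SELECT COUNT(*) FROM Course WHERE Descr = NULL").fetchone()[0]
is_null = conn.execute("SELECT COUNT(*) FROM Course WHERE Descr IS NULL").fetchone()[0]

print(duplicate_accepted, eq_null, is_null)  # False 0 1
```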

Default Value

• Value to be assigned if attribute value in a row is not specified

    CREATE TABLE Student (
        Id INTEGER,
        Name CHAR(20) NOT NULL,
        Address CHAR(50),
        Status CHAR(10) DEFAULT 'freshman',
        PRIMARY KEY (Id)
    )

Semantic Constraints in SQL

• Primary key and foreign key are examples of structural (syntactic) constraints
• Semantic constraints
  – Express the logic of the application at hand:
    • e.g., number of registered students ≤ maximum enrollment

