Possible and Certain SQL Keys

Total Page:16

File Type:pdf, Size:1020Kb

Possible and Certain SQL Keys Possible and Certain SQL Keys Henning Kohler¨ Sebastian Link Xiaofang Zhou School of Engineering & Department of The University of Queensland Advanced Technology Computer Science Brisbane, Australia; and Massey University The University of Auckland Soochow University Palmerston Nth, New Zealand Auckland, New Zealand Suzhou, China [email protected] [email protected] [email protected] ABSTRACT tial for many other data models, including semantic models, Driven by the dominance of the relational model, the re- object models, XML and RDF. They are fundamental in quirements of modern applications, and the veracity of data, many classical areas of data management, including data we revisit the fundamental notion of a key in relational modeling, database design, indexing, transaction process- databases with NULLs. In SQL database systems primary ing, query optimization. Knowledge about keys enables us key columns are NOT NULL by default. NULL columns to i) uniquely reference entities across data repositories, ii) may occur in unique constraints which only guarantee unique- minimize data redundancy at schema design time to process ness for tuples which do not feature null markers in any of updates efficiently at run time, iii) provide better selectiv- the columns involved, and therefore serve a different func- ity estimates in cost-based query optimization, iv) provide a tion than primary keys. We investigate the notions of pos- query optimizer with new access paths that can lead to sub- sible and certain keys, which are keys that hold in some or stantial speedups in query processing, v) allow the database all possible worlds that can originate from an SQL table, re- administrator (DBA) to improve the efficiency of data ac- spectively. Possible keys coincide with the unique constraint cess via physical design techniques such as data partitioning of SQL, and thus provide a semantics for their syntactic def- or the creation of indexes and materialized views, vi) enable inition in the SQL standard. Certain keys extend primary access to the deep Web, and vii) provide new insights into keys to include NULL columns, and thus form a sufficient application data. Modern applications raise the importance and necessary condition to identify tuples uniquely, while of keys even further. They can facilitate the data integra- primary keys are only sufficient for that purpose. In ad- tion process and prune schema matches. Keys can further dition to basic characterization, axiomatization, and simple help with the detection of duplicates and anomalies, provide discovery approaches for possible and certain keys, we inves- guidance in repairing and cleaning data, and provide con- tigate the existence and construction of Armstrong tables, sistent answers to queries over dirty data. The discovery of and describe an indexing scheme for enforcing certain keys. keys from data is one of the core activities in data profiling. Our experiments show that certain keys with NULLs do oc- According to a Gartner forecast, the NoSQL market will cur in real-world databases, and that related computational be worth 3.5 billion US dollars annually by 2018, but by that problems can be solved efficiently. Certain keys are therefore time the market for relational database technology will be semantically well-founded and able to maintain data quality worth more than ten times that number, namely 40 billion in the form of Codd's entity integrity rule while handling the US dollars annually [12]. This underlines that \relational requirements of modern applications, that is, higher volumes databases are here to stay, and that there is no substitute for of incomplete data from different formats. important data" [12]. For these reasons relational databases must meet basic requirements of modern applications, in- clusive of big data. On the one hand, relational technology 1. INTRODUCTION must handle increases in the volume, variety and velocity Keys have always been a core enabler for data manage- of data. On the other hand, veracity and quality of this ment. They are fundamental for understanding the struc- data must still be enforced, to support data-driven decision ture and semantics of data. Given a collection of entities, a making. As a consequence, it is imperative that relational key is a set of attributes whose values uniquely identify an databases can acquire as much data as possible without sac- entity in the collection. For example, a key for a relational rificing reasonable levels of data quality. Nulls constitute table is a set of columns such that no two different rows have a convenient relational tool to accommodate high volumes matching values in each of the key columns. Keys are essen- of incomplete data from different formats. Unfortunately, nulls oppose the current golden standard for data quality in terms of keys, that is, Codd's rule of entity integrity. Entity This work is licensed under the Creative Commons Attribution• NonCommercial•NoDerivs 3.0 Unported License. To view a copy of this li• integrity is one of Codd's integrity rules which states that cense, visit http://creativecommons.org/licenses/by•nc•nd/3.0/. Obtain per• every table must have a primary key and that the columns mission prior to any use beyond those covered by the license. Contact which form the primary key must be unique and not null copyright holder by emailing [email protected]. Articles from this volume [11]. The goal of entity integrity is to ensure that every tu- were invited to present their results at the 41st International Conference on ple in the table can be identified efficiently. In SQL, entity Very Large Data Bases, August 31st • September 4th 2015, Kohala Coast, integrity is enforced by adding a primary key clause to a Hawaii. Proceedings of the VLDB Endowment, Vol. 8, No. 11 schema definition. The system enforces entity integrity by Copyright 2015 VLDB Endowment 2150•8097/15/07. 1118 not allowing operations, that is inserts and updates, that title author journal produce an invalid primary key. Operations that create a The uRNA database Zwieb C Nucleic Acids 1997 duplicate primary key or one containing nulls are rejected. The uRNA database Zwieb C Nucleic Acids 1996 Consequently, relational databases exhibit a trade-off be- Genome wide detect. Ragh. R Nucleic Acids 1997 tween their main mechanism to accommodate the require- ments of modern applications and their main mechanism to Table 3: Possible World w2 of snippet I in Table 1 guarantee entity integrity. We illustrate this trade-off by a powerful example. Con- sider the snapshot I of the RFAM (RNA families) data set This approach naturally suggests two semantics of keys in Table 1, available at http://rfam.sanger.ac.uk/. The over SQL tables. A possible key p hXi is satisfied by an SQL data set violates entity integrity as every potential primary table I if and only if there is some possible world of I in key over the schema is violated by I. In particular, column which the key X is satisfied. A certain key c hXi is satisfied journal carries the null marker ?. Nevertheless, every tuple by an SQL table I if and only if the key X is satisfied in in I can be uniquely identified by a combination of column every possible world of I. In particular, the semantics of pairs, such as (title, journal) or (author, journal). This is certain keys does not prevent the entry of incomplete tuples not possible with SQL unique constraints which cannot al- that can still be identified uniquely, independently of which ways uniquely identify tuples in which null markers occur data null marker occurrences represent. For example, the in the columns involved. We conclude that the syntactic snapshot I in Table 1 satisfies the possible key p hjournali requirements of primary keys are sufficient to meet the goal as witnessed by world w in Table 2, but violates the cer- of entity integrity. However, the requirements are not nec- 1 tain key c hjournali as witnessed by world w in Table 3. essary since they prohibit the entry of some data without 2 Moreover, I satisfies the certain keys c htitle; journali and good reason. The inability to insert important data into the c hauthor; journali. The example illustrates that primary database may force organizations to abandon key validation keys cannot enforce entity integrity on data sets that orig- altogether, exposing their future database to less data qual- inate from modern applications, while certain keys can. In ity, inefficiencies in data processing, waste of resources, and fact, primary keys are only sufficient to uniquely identify poor data-driven decision making. Finding a notion of keys tuples in SQL tables, while certain keys are also necessary. that is sufficient and necessary to meet the goal of entity Our observations provide strong motivation to investigate integrity will bring forward a new database technology that possible and certain keys in detail. In particular, our re- accommodates the requirements of modern applications bet- search provides strong evidence that certain keys can meet ter than primary keys do. the golden standard of entity integrity and meet the require- ments of modern applications. The contributions of our re- title author journal search can be summarized as follows: The uRNA database Zwieb C Nucleic Acids 1997 1. We propose a possible world semantics for keys over SQL The uRNA database Zwieb C Nucleic Acids 1996 tables. Possible keys hold in some possible world, while cer- ? Genome wide detect. Ragh. R tain keys hold in all possible worlds. Hence, certain keys identify tuples without having to declare all attributes NOT Table 1: Snippet I of the RFAM data set NULL. While primary keys provide a sufficient condition to identify tuples, the condition is not necessary. Certain keys These observations have motivated us to investigate keys provide a sufficient and necessary condition, thereby not pro- over SQL tables from a well-founded semantic point of view.
Recommended publications
  • Referential Integrity in Sqlite
    CS 564: Database Management Systems University of Wisconsin - Madison, Fall 2017 Referential Integrity in SQLite Declaring Referential Integrity (Foreign Key) Constraints Foreign key constraints are used to check referential integrity between tables in a database. Consider, for example, the following two tables: create table Residence ( nameVARCHARPRIMARY KEY, capacityINT ); create table Student ( idINTPRIMARY KEY, firstNameVARCHAR, lastNameVARCHAR, residenceVARCHAR ); We can enforce the constraint that a Student’s residence actually exists by making Student.residence a foreign key that refers to Residence.name. SQLite lets you specify this relationship in several different ways: create table Residence ( nameVARCHARPRIMARY KEY, capacityINT ); create table Student ( idINTPRIMARY KEY, firstNameVARCHAR, lastNameVARCHAR, residenceVARCHAR, FOREIGNKEY(residence) REFERENCES Residence(name) ); or create table Residence ( nameVARCHARPRIMARY KEY, capacityINT ); create table Student ( idINTPRIMARY KEY, firstNameVARCHAR, lastNameVARCHAR, residenceVARCHAR REFERENCES Residence(name) ); or create table Residence ( nameVARCHARPRIMARY KEY, 1 capacityINT ); create table Student ( idINTPRIMARY KEY, firstNameVARCHAR, lastNameVARCHAR, residenceVARCHAR REFERENCES Residence-- Implicitly references the primary key of the Residence table. ); All three forms are valid syntax for specifying the same constraint. Constraint Enforcement There are a number of important things about how referential integrity and foreign keys are handled in SQLite: • The attribute(s) referenced by a foreign key constraint (i.e. Residence.name in the example above) must be declared UNIQUE or as the PRIMARY KEY within their table, but this requirement is checked at run-time, not when constraints are declared. For example, if Residence.name had not been declared as the PRIMARY KEY of its table (or as UNIQUE), the FOREIGN KEY declarations above would still be permitted, but inserting into the Student table would always yield an error.
    [Show full text]
  • The Unconstrained Primary Key
    IBM Systems Lab Services and Training The Unconstrained Primary Key Dan Cruikshank www.ibm.com/systems/services/labservices © 2009 IBM Corporation In this presentation I build upon the concepts that were presented in my article “The Keys to the Kingdom”. I will discuss how primary and unique keys can be utilized for something other than just RI. In essence, it is about laying the foundation for data centric programming. I hope to convey that by establishing some basic rules the database developer can obtain reasonable performance. The title is an oxymoron, in essence a Primary Key is a constraint, but it is a constraint that gives the database developer more freedom to utilize an extremely powerful relational database management system, what we call DB2 for i. 1 IBM Systems Lab Services and Training Agenda Keys to the Kingdom Exploiting the Primary Key Pagination with ROW_NUMBER Column Ordering Summary 2 www.ibm.com/systems/services/labservices © 2009 IBM Corporation I will review the concepts I introduced in the article “The Keys to the Kingdom” published in the Centerfield. I think this was the inspiration for the picture. I offered a picture of me sitting on the throne, but that was rejected. I will follow this with a discussion on using the primary key as a means for creating peer or subset tables for the purpose of including or excluding rows in a result set. The ROW_NUMBER function is part of the OLAP support functions introduced in 5.4. Here I provide some examples of using ROW_NUMBER with the BETWEEN predicate in order paginate a result set.
    [Show full text]
  • Keys Are, As Their Name Suggests, a Key Part of a Relational Database
    The key is defined as the column or attribute of the database table. For example if a table has id, name and address as the column names then each one is known as the key for that table. We can also say that the table has 3 keys as id, name and address. The keys are also used to identify each record in the database table . Primary Key:- • Every database table should have one or more columns designated as the primary key . The value this key holds should be unique for each record in the database. For example, assume we have a table called Employees (SSN- social security No) that contains personnel information for every employee in our firm. We’ need to select an appropriate primary key that would uniquely identify each employee. Primary Key • The primary key must contain unique values, must never be null and uniquely identify each record in the table. • As an example, a student id might be a primary key in a student table, a department code in a table of all departments in an organisation. Unique Key • The UNIQUE constraint uniquely identifies each record in a database table. • Allows Null value. But only one Null value. • A table can have more than one UNIQUE Key Column[s] • A table can have multiple unique keys Differences between Primary Key and Unique Key: • Primary Key 1. A primary key cannot allow null (a primary key cannot be defined on columns that allow nulls). 2. Each table can have only one primary key. • Unique Key 1. A unique key can allow null (a unique key can be defined on columns that allow nulls.) 2.
    [Show full text]
  • Pizza Parlor Point-Of-Sales System CMPS 342 Database
    1 Pizza Parlor Point-Of-Sales System CMPS 342 Database Systems Chris Perry Ruben Castaneda 2 Table of Contents PHASE 1 1 Pizza Parlor: Point-Of-Sales Database........................................................................3 1.1 Description of Business......................................................................................3 1.2 Conceptual Database.........................................................................................4 2 Conceptual Database Design........................................................................................5 2.1 Entities................................................................................................................5 2.2 Relationships....................................................................................................13 2.3 Related Entities................................................................................................16 PHASE 2 3 ER-Model vs Relational Model..................................................................................17 3.1 Description.......................................................................................................17 3.2 Comparison......................................................................................................17 3.3 Conversion from E-R model to relational model.............................................17 3.4 Constraints........................................................................................................19 4 Relational Model..........................................................................................................19
    [Show full text]
  • Data Definition Language
    1 Structured Query Language SQL, or Structured Query Language is the most popular declarative language used to work with Relational Databases. Originally developed at IBM, it has been subsequently standard- ized by various standards bodies (ANSI, ISO), and extended by various corporations adding their own features (T-SQL, PL/SQL, etc.). There are two primary parts to SQL: The DDL and DML (& DCL). 2 DDL - Data Definition Language DDL is a standard subset of SQL that is used to define tables (database structure), and other metadata related things. The few basic commands include: CREATE DATABASE, CREATE TABLE, DROP TABLE, and ALTER TABLE. There are many other statements, but those are the ones most commonly used. 2.1 CREATE DATABASE Many database servers allow for the presence of many databases1. In order to create a database, a relatively standard command ‘CREATE DATABASE’ is used. The general format of the command is: CREATE DATABASE <database-name> ; The name can be pretty much anything; usually it shouldn’t have spaces (or those spaces have to be properly escaped). Some databases allow hyphens, and/or underscores in the name. The name is usually limited in size (some databases limit the name to 8 characters, others to 32—in other words, it depends on what database you use). 2.2 DROP DATABASE Just like there is a ‘create database’ there is also a ‘drop database’, which simply removes the database. Note that it doesn’t ask you for confirmation, and once you remove a database, it is gone forever2. DROP DATABASE <database-name> ; 2.3 CREATE TABLE Probably the most common DDL statement is ‘CREATE TABLE’.
    [Show full text]
  • 3 Data Definition Language (DDL)
    Database Foundations 6-3 Data Definition Language (DDL) Copyright © 2015, Oracle and/or its affiliates. All rights reserved. Roadmap You are here Data Transaction Introduction to Structured Data Definition Manipulation Control Oracle Query Language Language Language (TCL) Application Language (DDL) (DML) Express (SQL) Restricting Sorting Data Joining Tables Retrieving Data Using Using ORDER Using JOIN Data Using WHERE BY SELECT DFo 6-3 Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 3 Data Definition Language (DDL) Objectives This lesson covers the following objectives: • Identify the steps needed to create database tables • Describe the purpose of the data definition language (DDL) • List the DDL operations needed to build and maintain a database's tables DFo 6-3 Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 4 Data Definition Language (DDL) Database Objects Object Description Table Is the basic unit of storage; consists of rows View Logically represents subsets of data from one or more tables Sequence Generates numeric values Index Improves the performance of some queries Synonym Gives an alternative name to an object DFo 6-3 Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 5 Data Definition Language (DDL) Naming Rules for Tables and Columns Table names and column names must: • Begin with a letter • Be 1–30 characters long • Contain only A–Z, a–z, 0–9, _, $, and # • Not duplicate the name of another object owned by the same user • Not be an Oracle server–reserved word DFo 6-3 Copyright © 2015, Oracle and/or its affiliates. All rights reserved. 6 Data Definition Language (DDL) CREATE TABLE Statement • To issue a CREATE TABLE statement, you must have: – The CREATE TABLE privilege – A storage area CREATE TABLE [schema.]table (column datatype [DEFAULT expr][, ...]); • Specify in the statement: – Table name – Column name, column data type, column size – Integrity constraints (optional) – Default values (optional) DFo 6-3 Copyright © 2015, Oracle and/or its affiliates.
    [Show full text]
  • Chapter 9 – Designing the Database
    Systems Analysis and Design in a Changing World, seventh edition 9-1 Chapter 9 – Designing the Database Table of Contents Chapter Overview Learning Objectives Notes on Opening Case and EOC Cases Instructor's Notes (for each section) ◦ Key Terms ◦ Lecture notes ◦ Quick quizzes Classroom Activities Troubleshooting Tips Discussion Questions Chapter Overview Database management systems provide designers, programmers, and end users with sophisticated capabilities to store, retrieve, and manage data. Sharing and managing the vast amounts of data needed by a modern organization would not be possible without a database management system. In Chapter 4, students learned to construct conceptual data models and to develop entity-relationship diagrams (ERDs) for traditional analysis and domain model class diagrams for object-oriented (OO) analysis. To implement an information system, developers must transform a conceptual data model into a more detailed database model and implement that model in a database management system. In the first sections of this chapter students learn about relational database management systems, and how to convert a data model into a relational database schema. The database sections conclude with a discussion of database architectural issues such as single server databases versus distributed databases which are deployed across multiple servers and multiple sites. Many system interfaces are electronic transmissions or paper outputs to external agents. Therefore, system developers need to design and implement integrity controls and security controls to protect the system and its data. This chapter discusses techniques to provide the integrity controls to reduce errors, fraud, and misuse of system components. The last section of the chapter discusses security controls and explains the basic concepts of data protection, digital certificates, and secure transactions.
    [Show full text]
  • Sqlite/Sqlite Primary Key.Htm Copyright © Tutorialspoint.Com
    SSQQLL -- PPRRIIMMAARRYY KKEEYY http://www.tutorialspoint.com/sqlite/sqlite_primary_key.htm Copyright © tutorialspoint.com A primary key is a field in a table which uniquely identifies the each rows/records in a database table. Primary keys must contain unique values. A primary key column cannot have NULL values. A table can have only one primary key, which may consist of single or multiple fields. When multiple fields are used as a primary key, they are called a composite key. If a table has a primary key defined on any fields, then you can not have two records having the same value of that fields. Note: You would use these concepts while creating database tables. Create Primary Key: Here is the syntax to define ID attribute as a primary key in a COMPANY table. CREATE TABLE COMPANY( ID INT PRIMARY KEY , NAME TEXT NOT NULL, AGE INT NOT NULL UNIQUE, ADDRESS CHAR (25) , SALARY REAL , ); To create a PRIMARY KEY constraint on the "ID" column when COMPANY table already exists, use the following SQLite syntax: ALTER TABLE COMPANY ADD PRIMARY KEY (ID); For defining a PRIMARY KEY constraint on multiple columns, use the following SQLite syntax: CREATE TABLE COMPANY( ID INT PRIMARY KEY , NAME TEXT NOT NULL, AGE INT NOT NULL UNIQUE, ADDRESS CHAR (25) , SALARY REAL , ); To create a PRIMARY KEY constraint on the "ID" and "NAMES" columns when COMPANY table already exists, use the following SQLite syntax: ALTER TABLE COMPANY ADD CONSTRAINT PK_CUSTID PRIMARY KEY (ID, NAME); Delete Primary Key: You can clear the primary key constraints from the table, Use Syntax: ALTER TABLE COMPANY DROP PRIMARY KEY ; Loading [MathJax]/jax/output/HTML-CSS/fonts/TeX/fontdata.js.
    [Show full text]
  • Data Definition Language (Ddl)
    DATA DEFINITION LANGUAGE (DDL) CREATE CREATE SCHEMA AUTHORISATION Authentication: process the DBMS uses to verify that only registered users access the database - If using an enterprise RDBMS, you must be authenticated by the RDBMS - To be authenticated, you must log on to the RDBMS using an ID and password created by the database administrator - Every user ID is associated with a database schema Schema: a logical group of database objects that are related to each other - A schema belongs to a single user or application - A single database can hold multiple schemas that belong to different users or applications - Enforce a level of security by allowing each user to only see the tables that belong to them Syntax: CREATE SCHEMA AUTHORIZATION {creator}; - Command must be issued by the user who owns the schema o Eg. If you log on as JONES, you can only use CREATE SCHEMA AUTHORIZATION JONES; CREATE TABLE Syntax: CREATE TABLE table_name ( column1 data type [constraint], column2 data type [constraint], PRIMARY KEY(column1, column2), FOREIGN KEY(column2) REFERENCES table_name2; ); CREATE TABLE AS You can create a new table based on selected columns and rows of an existing table. The new table will copy the attribute names, data characteristics and rows of the original table. Example of creating a new table from components of another table: CREATE TABLE project AS SELECT emp_proj_code AS proj_code emp_proj_name AS proj_name emp_proj_desc AS proj_description emp_proj_date AS proj_start_date emp_proj_man AS proj_manager FROM employee; 3 CONSTRAINTS There are 2 types of constraints: - Column constraint – created with the column definition o Applies to a single column o Syntactically clearer and more meaningful o Can be expressed as a table constraint - Table constraint – created when you use the CONTRAINT keyword o Can apply to multiple columns in a table o Can be given a meaningful name and therefore modified by referencing its name o Cannot be expressed as a column constraint NOT NULL This constraint can only be a column constraint and cannot be named.
    [Show full text]
  • Functional Dependency and Normalization for Relational Databases
    Functional Dependency and Normalization for Relational Databases Introduction: Relational database design ultimately produces a set of relations. The implicit goals of the design activity are: information preservation and minimum redundancy. Informal Design Guidelines for Relation Schemas Four informal guidelines that may be used as measures to determine the quality of relation schema design: Making sure that the semantics of the attributes is clear in the schema Reducing the redundant information in tuples Reducing the NULL values in tuples Disallowing the possibility of generating spurious tuples Imparting Clear Semantics to Attributes in Relations The semantics of a relation refers to its meaning resulting from the interpretation of attribute values in a tuple. The relational schema design should have a clear meaning. Guideline 1 1. Design a relation schema so that it is easy to explain. 2. Do not combine attributes from multiple entity types and relationship types into a single relation. Redundant Information in Tuples and Update Anomalies One goal of schema design is to minimize the storage space used by the base relations (and hence the corresponding files). Grouping attributes into relation schemas has a significant effect on storage space Storing natural joins of base relations leads to an additional problem referred to as update anomalies. These are: insertion anomalies, deletion anomalies, and modification anomalies. Insertion Anomalies happen: when insertion of a new tuple is not done properly and will therefore can make the database become inconsistent. When the insertion of a new tuple introduces a NULL value (for example a department in which no employee works as of yet).
    [Show full text]
  • CA RC/Query for DB2 for Z/OS User Guide
    CA RC/Query® for DB2 for z/OS User Guide Version 17.0.00, Third Edition This Documentation, which includes embedded help systems and electronically distributed materials, (hereinafter referred to as the “Documentation”) is for your informational purposes only and is subject to change or withdrawal by CA at any time. This Documentation is proprietary information of CA and may not be copied, transferred, reproduced, disclosed, modified or duplicated, in whole or in part, without the prior written consent of CA. If you are a licensed user of the software product(s) addressed in the Documentation, you may print or otherwise make available a reasonable number of copies of the Documentation for internal use by you and your employees in connection with that software, provided that all CA copyright notices and legends are affixed to each reproduced copy. The right to print or otherwise make available copies of the Documentation is limited to the period during which the applicable license for such software remains in full force and effect. Should the license terminate for any reason, it is your responsibility to certify in writing to CA that all copies and partial copies of the Documentation have been returned to CA or destroyed. TO THE EXTENT PERMITTED BY APPLICABLE LAW, CA PROVIDES THIS DOCUMENTATION “AS IS” WITHOUT WARRANTY OF ANY KIND, INCLUDING WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT. IN NO EVENT WILL CA BE LIABLE TO YOU OR ANY THIRD PARTY FOR ANY LOSS OR DAMAGE, DIRECT OR INDIRECT, FROM THE USE OF THIS DOCUMENTATION, INCLUDING WITHOUT LIMITATION, LOST PROFITS, LOST INVESTMENT, BUSINESS INTERRUPTION, GOODWILL, OR LOST DATA, EVEN IF CA IS EXPRESSLY ADVISED IN ADVANCE OF THE POSSIBILITY OF SUCH LOSS OR DAMAGE.
    [Show full text]
  • Discussion #3 SQL Primary and Foreign
    DS 100/200: Principles and Techniques of Data Science Date: Feb 7, 2020 Discussion #3 Name: SQL Primary and Foreign Key Definitions: • Primary key: The column or minimal set of columns that uniquely determines the values in all the remaining columns. This is a statement about the schema and should hold for all data that could be put in the table. Below are some constraints on the primary key: { The data within these columns must be unique. { No value in the columns can be NULL. • Foreign key: A set of one or more columns in a table that refers to the primary key in another table. Foreign keys have the following properties: { We can have NULL values in foreign keys. { We can have non-unique foreign keys in a table. { The foreign key is not null it should reference a particular primary key in another table. 1. Examples? What might be good examples of common primary keys and how might they be referenced as foreign keys. 2. Consider the following sample of the baby names table. What is the primary key? 1 Discussion #3 2 SQL Practice 3. For this question, we will be working with the UC Berkeley Undergraduate Career Survey dataset. Each year, the UC Berkeley career center surveys graduating seniors for their plans after graduating. Below is a sample of the full dataset. The full dataset contains many thousands of rows. j name c name c location m name Llama Technician Google MOUNTAIN VIEW EECS Software Engineer Salesforce SF EECS Open Source Maintainer Github SF Computer Science Big Data Engineer Microsoft REDMOND Data Science Data Analyst Startup BERKELEY Data Science Analyst Intern Google SF Philosophy Table 1: survey Table Each record of the survey table is an entry corresponding to a student.
    [Show full text]