Natural Versus Surrogate Keys. Performance and Usability

Database Systems Journal vol. II, no. 2/2011 55 Natural versus Surrogate Keys. Performance and Usability Dragos-Paul POP Academy of Economic Studies, Bucharest, ROMANIA [email protected] Choosing the primary key for a table proves to be one of the most important steps in database design. But what happens when we have to pick between a natural or a surrogate key? Is there any performance issue that we must have in mind? Does the literature have a preferred pick? Is usability a concern? We’ll have a look at the advantages and disadvantages of both natural and surrogate keys and the performance and usability issues they address. Keywords: primary keys, natural keys, surrogate keys, superkey, candidate key, unique key, performance, usability. Introduction sometimes called a minimal superkey, 1 Choosing a primary key is really because a candidate key is, in fact, a important because it affects the database minimal set of attributes necessary to at the performance and usability levels. uniquely identify a tuple. In other words, a The literature speaks of both natural and set of attributes is said to be a candidate key surrogate keys and gives reasons for if there are no two tuples with the same choosing one over the other. value for the key and there is no other subset Before we get to talk about natural and of these attributes that can form a candidate surrogate keys in a relational key. This is where the “minimal” property of transactional table, we must define a few the candidate key derives from. A table can other key concepts used in the relational have multiple candidate keys. database model architecture. The Another concept related to the superkey is concepts to be defined are the superkey, the unique key. Again, just like a superkey the candidate key, the unique key and the and a candidate key, the unique key can primary key. uniquely identify each row in a table. 2 Key concepts Although this is not a rule, unique keys tend The superkey is defined as being a set of to only comprise a single column. The attributes of a given table that verifies difference between candidate keys and one main condition. The condition in unique keys is that, in practice, unique keys question is that there are no two distinct do not enforce the NOT NULL constraint. tuples in that table with an identical value This means they can contain the NULL for the superkey set. Also, the attributes value and still uniquely identify table rows. comprising the superkey set are said to be Why? Because of the way NULL is treated functionally dependent. This makes true by the database management systems. the following statement: if S is a NULL is not a value, but the absence of a superkey for the relation R, than the value, so the unique key concept holds true cardinality of the projection of R over S even for rows with NULL for the unique is the same as the cardinality of R. After key. This is because identification of two a table has gone through normalization, equal keys is done based on their values, and we can say that all its attributes form a since NULL is not a value, two keys superkey, because there are no two tuples containing NULL are not considered to be that are identical for all the values of the equal. This is by no means to be taken as a set. rule, because it differs in implementation From the superkey concept, we can across database management systems. A define the candidate key. This is better definition of a unique key is that two 56 Natural Versus Surrogate Keys. Performance and Usability tuples cannot have the same value for the record in a table. But this is where the unique key if NULL values are not used. similarity stops. They are keys that don’t So, a unique key only uniquely identifies have a natural relationship with the rest of rows that contain a value other than the columns in a table. The surrogate key is NULL for the key. As for the candidate just a value that is generated and then stored key, a table can have multiple unique with the rest of the columns in a record. The keys. key value is typically generated at run time The primary key is probably the most right before the record is inserted into a important concept in database design. A table. It is sometimes also referred to as a primary key is, basically, one of the dumb key, because there is no meaning candidate keys in a table. It is a unique associated with the value. Surrogate keys key that does not contain (and never will) are commonly a numeric number. NULL values can also be made a primary An important distinction between a key. For some tables, even a superkey surrogate and a primary key depends on can be a primary key (but that is a little whether the database is a current database or odd). So how and why is a primary key a temporal database. Since a current different form all the others? A table can database stores only currently valid data, only have one primary key and this key is there is a one-to-one correspondence the preferred way of identifying between a surrogate in the modeled world individual tuples. and the primary key of some object in the 3 Natural and surrogate keys database. In this case the surrogate may be Choosing the primary key has proven to used as a primary key, resulting in the term be the difficult part in database design. surrogate key. In a temporal database, This is because there are two types of however, there is a many-to-one relationship primary keys: natural and surrogate. between primary keys and the surrogate. The natural key, also called a domain key Since there may be several objects in the or an intelligent key, is a candidate key database corresponding to a single surrogate, that is logically related to the table. That we cannot use the surrogate as a primary is, it has business meaning, or business key; another attribute is required, in addition value. It is something that can be found in to the surrogate, to uniquely identify each nature, it makes sense. object. A natural key is a single column or set of Authors have argued that a surrogate should columns that uniquely identifies a single have the following characteristics: record in a table, where the key columns • the value is unique system‐wide, are made up of real data. When I say hence never reused “real data” I mean data that has meaning • the value is system generated and occurs naturally in the world of data. • the value is not modifiable by the user A natural key is a column value that has a or application relationship with the rest of the column • the value contains no semantic values in a given data record. Here are some examples of natural keys values: meaning Social Security Number, ISBN, and • the value is not visible to the user or TaxId. application On the other hand, the surrogate key is • the value is not composed of several not derived from real data; it does not values from different domains have any business meaning or logic. It is In practice, the surrogate key is frequently a a key most often generated by the number generated by the database database or made up using an algorithm. management system. For example, Oracle A surrogate key like a natural key is a uses sequences to accomplish this task, column that uniquely identifies a single while SQL server gives the “identity Database Systems Journal vol. II, no. 2/2011 57 column” option. PostgreSQL users have that are created from a hash of the ID the “serial” option, and MySQL ones use of your Ethernet card, or an an auto_increment attribute. Having the equivalent software representation, key independent of all other columns and the current datetime of your insulates the database relationships from computer system. The algorithm for changes in data values or database design doing this is defined by the Open (making the database more agile) and Software Foundation guarantees uniqueness. (www.opengroup.org). 4 Surrogate Key Implementation • Strategies Globally unique identifiers (GUIDs). There are several common options for GUIDs are a Microsoft standard that implementing surrogate keys: extend UUIDs, following the same • Key values assigned by the strategy if an Ethernet card exists and database. Most of the leading if not then they hash a software ID database vendors – companies and the current datetime to produce a such as Oracle, Microsoft and IBM value that is guaranteed unique to the – implement a surrogate key machine that creates it. strategy called incremental keys. • High‐low strategy. The basic idea is The basic idea is that they maintain that your key value, often called a a counter within the database persistent object identifier (POID) or server, writing the current value to simply an object identified (OID), is in a hidden system table to maintain two logical parts: A unique HIGH value consistency, which they use to that you obtain from a defined source assign a value to newly created and an N‐digit LOW value that your table rows. Every time a row is application assigns itself. Each time created the counter is incremented that a HIGH value is obtained the LOW and that value is assigned as the value will be set to zero. For example, key value for that row.

Natural Versus Surrogate Keys. Performance and Usability

Genesys Info Mart Physical Data Model for an Oracle Database

Create Table Identity Primary Key Sql Server

Alter Table Column Auto Increment Sql Server

Keys Are, As Their Name Suggests, a Key Part of a Relational Database

Rfgen Users Guide

CANDIDATE KEYS a Candidate Key of a Relation Schema R Is a Subset X

Attunity Compose 3.1 Release Notes - April 2017

There Are Only Four Types of Relational Keys Candidate Key As Stated

Data Vault Data Modeling Specification V 2.0.2 Focused on the Data Model Components

All Keys in Dbms with Example

3 Data Definition Language (DDL)

CSC 443 – Database Management Systems Data and Its Structure