Session: G12 Null and Void? Dealing with Nulls in DB2

Craig S. Mullins President & Principal Consultant Mullins Consulting, Inc. http://www.CraigSMullins.com

Thursday, May 11, 2006 • 08:30 a.m. – 09:40 a.m. Platform: DB2 for z/OS

1 Agenda • Definition • Some History • Types of Nulls • Inapplicable versus Applicable Data • Nulls and Keys • Distinguished Nulls • Using Nulls in DB2 • Problems with Nulls • Guidance and Advice

Mullins Consulting, Inc. http://www.CraigSMullins.com 2 © 2006, Mullins Consulting, Inc.

2 What is a NULL?

• NULL represents the absence of a value. • It is not the same as zero or an empty string. • A null is not a “null value” – there is no value. • Maybe a “Null Lack of Value”

Mullins Consulting, Inc. http://www.CraigSMullins.com 3 © 2006, Mullins Consulting, Inc.

3 What is the Difference, You Ask?

• Consider the following columns: • TERMINATION_DATE – null or a valid date? • SALARY – null or zero? • SSN – non-US resident? • ADDRESS – different composition by country, so let some components be null? • HAIR_COLOR – what about bald men?

Mullins Consulting, Inc. http://www.CraigSMullins.com 4 © 2006, Mullins Consulting, Inc.

When are nulls useful? Well, defining a column as NULL provides a place holder for data you might not yet know. For example, when a new employee is hired and is inserted into the EMP table, what should the employee termination date column be set to? I don’t know about you, but I wouldn’t want any valid date to be set in that column for my employee record. Instead, null can be used to specify that the termination date is currently unknown.

Let’s consider another example. Suppose that we also capture employee’s hair color when they are hired. Consider three potential entity occurrences: a man with black hair, a woman with unknown hair color, and a bald man. The woman with the unknown hair color and the bald man both could be assigned as null, but for different reasons. The woman’s hair color would be null meaning presently unknown; the bald man’s hair color could be null too, in this case meaning not applicable.

How could you handle this without using nulls? You would need to create special values for the HairColor column that mean “bald” and “unknown.” This is possible for a CHAR column like HairColor. But what about a DB2 DATE column? All occurrences of a column assigned as a DATE data type are valid dates. It might not be possible to use a special date value to mean “unknown.” This is where using nulls can be practical.

4 Ted Codd on Nulls

• Dr. E.F. Codd built the notion of nulls into the relational model, but not in the original 1970 paper • The extensions to RM/V1 to support nulls weren't defined until the 1979 paper Extending the Database Relational Model to Capture More Meaning, ACM TODS 4, No. 4 (December 1979). • He later defined two types of nulls in RM/V2 • These were called marks • A-mark: applicable, but unknown • I-mark: inapplicable (known to be inapplicable)

Mullins Consulting, Inc. http://www.CraigSMullins.com 5 © 2006, Mullins Consulting, Inc.

A null allows a column to be specifically empty. A null is distinct from an empty string or a number with a value of zero. Of course, a null cannot apply to primary keys, which must contain values. Most database implementations support the concept of a non-null field constraint that prevents nulls in a specific table column… as does DB2.

5 Inapplicable versus Applicable

• Inapplicable: • These are perhaps better dealt with by creating better data models • Social Security Number for a European • Consulting fee for a full-time employee • Applicable, but unknown: • These are the ones that are “true” nulls • Future dates not yet known • Information not supplied • Not everyone has a middle name

Mullins Consulting, Inc. http://www.CraigSMullins.com 6 © 2006, Mullins Consulting, Inc.

6 Modeling Fixes Inapplicable Columns

Mullins Consulting, Inc. http://www.CraigSMullins.com 7 © 2006, Mullins Consulting, Inc.

Super-type – sub-type modeling can be a better approach to inapplicable nulls. Consider the approach shown in the slide. Wouldn’t it be better to have only those columns which are always applicable in each entity/table? For example, the alternate here would be to have just the Supplier entity, but it would have a nullable Postal Code and a nullable Zip Code. For domestic suppliers postal code would be null (inapplicable); for international suppliers zip code would be blank (inapplicable).

By creating the sub-types of Supplier we can model our way out of inapplicable nulls.

7 Nulls and Keys

• Primary Key • A primary key cannot be null • Foreign Key(s) • Foreign keys can be null • ON DELETE SET NULL – when the PK that refers to the FK row is deleted, the FK is set to null

Mullins Consulting, Inc. http://www.CraigSMullins.com 8 © 2006, Mullins Consulting, Inc.

8 Distinguished Nulls: Not all Nulls are Equal • Sometimes, based on domains and knowledge of the data a null may not really be a null. • Consider the COLOR column • With a domain of BLACK, WHITE, and RED • With a UNIQUE constraint on it • Now, say we have three rows in the table, one of which has “BLACK” for the COLOR and the other two are null. • Are those nulls really completely null? We know something about them. • What if one of the nulls is changed to WHITE? That last null is not really null, is it? • But DB2 does not automatically change it to RED… • Can you (or your program)?

Mullins Consulting, Inc. http://www.CraigSMullins.com 9 © 2006, Mullins Consulting, Inc.

Create a domain called Tricolor that is limited to the values 'Red', 'White', and 'Black' and a column in a table drawn from that domain with a UNIQUE constraint on it. If my table has a 'Red' and two NULL values in that column, I have some information about the two NULLs. I know they will be either ('White', 'Black') or ('Black', 'White') when their rows are resolved. This is what Chris Date calls a "distinguished NULL", which means we have some information in it.

If my table has a 'Red', a 'White', and a NULL value in that column, can I change the last NULL to 'Black' because it can only be 'Black' under the rule? Or do I have to wait until I see an actual value for that row? There is no clear way to handle this in SQL. Multiple values cannot be put in a column, nor can the database automatically change values as part of the column declaration.

9 A Lot of Arguments

• Some relational experts have argued against nulls (e.g. Date, Darwen, Pascal) • There are many reasons; many of them centered around the “quirks” of how nulls are implemented and the difficulty of “three-valued” logic. • Also, nulls violate “relational” – at least according to some • A null is not a value, and therefore cannot exist in a mathematical relation

Mullins Consulting, Inc. http://www.CraigSMullins.com 10 © 2006, Mullins Consulting, Inc.

10 What About DB2?

• Well, DB2 is not a “pure” relational DBMS • It supports and uses nulls

• Let’s take a look at how DB2 uses nulls and how you use DB2 to specify nulls and manipulate nulls in your data • …as well as some of the “oddities” this can create.

Mullins Consulting, Inc. http://www.CraigSMullins.com 11 © 2006, Mullins Consulting, Inc.

11 Nulls in DB2

• All columns are nullable unless defined with: • NOT NULL or NOT NULL WITH DEFAULT • DB2 uses a null-indicator to set a column to null • One byte • Zero or positive value means not null • Negative value means null • Not stored in column, but associated with column

Mullins Consulting, Inc. http://www.CraigSMullins.com 12 © 2006, Mullins Consulting, Inc.

All data types include the null “value”. Distinct from all non-null values, a null is a special “value” that denotes the absence of a value. Although all data types include the null, some sources of values cannot provide for nulls. For example, constants, columns that are defined as NOT NULL, and special registers cannot contain nulls; the COUNT and COUNT_BIG functions cannot return a null value; and ROWID columns cannot store a null although a null can be returned for a ROWID column as the result of a query.

12 Nulls in DB2 (continued)

• Nulls do NOT save space – ever! • Nulls are not variable columns but… • Can be used with variable columns to (perhaps) save space • Programs have to be coded to explicitly deal with the possibility of null • (example on next slide)

Mullins Consulting, Inc. http://www.CraigSMullins.com 13 © 2006, Mullins Consulting, Inc.

Let’s take a moment to clear up a common misunderstanding right here: nulls NEVER save storage space in DB2 for OS/390 and z/OS. Every nullable column requires one additional byte of storage for the null indicator. So, a CHAR(10) column that is nullable will require 11 bytes of storage per row – 10 for the data and 1 for the null indicator. This is the case regardless of whether the column is set to null or not.

In some DBMS products a null column is treated as variable length. If the column is set to null it will not take up storage space (in other words, the row becomes variable length). This is NOT the case with DB2.

13 Using Null Indicator Variables

You can use null indicator variables with corresponding host variables in the following situations: • SET clause of the UPDATE statement • VALUES clause of the INSERT statement • INTO clause of the SELECT or FETCH statement

Mullins Consulting, Inc. http://www.CraigSMullins.com 14 © 2006, Mullins Consulting, Inc.

DB2 represents null in a special “hidden” column known as an indicator variable. An indicator variable is defined to DB2 for each column that can accept nulls. The indicator variable is transparent to an end user, but must be provided for when programming in a host language (such as COBOL or PL/I). The null indicator is used by DB2 to track whether its associated column is null or not. A positive value or a value of 0 means the column is not null and any actual value stored in the column is valid. If a CHAR column is truncated on retrieval because the host variable is not large enough, the indicator value will contain the original length of the truncated column. A negative value indicates that the column is set to null. If the value is -2 then the column was set to null as the result of a data conversion error.

14 Using a Null Indicator in SQL

EXEC SQL SELECT EMPNO, SALARY INTO :EMPNO, :SALARY:SALARY-IND FROM EMP WHERE EMPNO = '000100' END-EXEC.

Mullins Consulting, Inc. http://www.CraigSMullins.com 15 © 2006, Mullins Consulting, Inc.

15 Defining Null Indicators

01 EMP-INDICATORS. 10 WORKDEPT-IND PIC S9(4) USAGE COMP. 10 PHONENO-IND PIC S9(4) USAGE COMP. 10 HIREDATE-IND PIC S9(4) USAGE COMP. 10 JOB-IND PIC S9(4) USAGE COMP. 10 EDLEVEL-IND PIC S9(4) USAGE COMP. 10 SEX-IND PIC S9(4) USAGE COMP. 10 BIRTHDATE-IND PIC S9(4) USAGE COMP. 10 SALARY-IND PIC S9(4) USAGE COMP. 10 BONUS-IND PIC S9(4) USAGE COMP. 10 COMM-IND PIC S9(4) USAGE COMP.

Mullins Consulting, Inc. http://www.CraigSMullins.com 16 © 2006, Mullins Consulting, Inc.

Null indicators should be defined in the WORKING-STORAGE section of your COBOL program as computational variables, with a picture clause specification of PIC S9(4). The null indicator variables for the DSN8810.EMP table look like this: 01 EMP-INDICATORS. 10 WORKDEPT-IND PIC S9(4) USAGE COMP. 10 PHONENO-IND PIC S9(4) USAGE COMP. 10 HIREDATE-IND PIC S9(4) USAGE COMP. 10 JOB-IND PIC S9(4) USAGE COMP. 10 EDLEVEL-IND PIC S9(4) USAGE COMP. 10 SEX-IND PIC S9(4) USAGE COMP. 10 BIRTHDATE-IND PIC S9(4) USAGE COMP. 10 SALARY-IND PIC S9(4) USAGE COMP. 10 BONUS-IND PIC S9(4) USAGE COMP. 10 COMM-IND PIC S9(4) USAGE COMP.

This structure contains the null indicators for all the nullable columns of the EMP table.

16 Defining a Null Indicator Structure

• Null indicator structures can be coded instead of individual null indicators. A null indicator structure is defined as a null indicator variable with an OCCURS clause. • Null indicator structures enable host structures to be used when nullable columns are selected. The variable should occur once for each column in the corresponding host structure, as shown in the following section of code:

01 IDEPT PIC S9(4) USAGE COMP OCCURS 5 TIMES.

Mullins Consulting, Inc. http://www.CraigSMullins.com 17 © 2006, Mullins Consulting, Inc.

17 Using a Null Indicator Structure

EXEC SQL SELECT DEPTNO, DEPTNAME, MGRNO, ADMRDEPT, LOCATION FROM DEPT INTO :DCLDEPT:DEPT-IND WHERE DEPTNO = 'A00' END-EXEC.

Mullins Consulting, Inc. http://www.CraigSMullins.com 18 © 2006, Mullins Consulting, Inc.

Null indicator structures can be coded in much the same way as the host structures discussed previously. Null indicator structures enable host structures to be used when nullable columns are selected. A null indicator structure is defined as a null indicator variable with an OCCURS clause. The variable should occur once for each column in the corresponding host structure, as shown in the following section of code: 01 IDEPT PIC S9(4) USAGE COMP OCCURS 5 TIMES. This null indicator structure defines the null indicators needed for retrieving rows from the DSN8810.DEPT table using a host structure. There are five columns in the DCLDEPT host structure, so the IDEPT null indicator structure occurs five times. When using a host structure for a table where any column is nullable, one null indicator per column in the host structure is required. These host structure and null indicator structure can be used together as follows: EXEC SQL SELECT DEPTNO, DEPTNAME, MGRNO, ADMRDEPT, LOCATION FROM DEPT INTO :DCLDEPT:DEPT-IND WHERE DEPTNO = 'A00' END-EXEC.

18 And What If…

• …You do not specify a null indicator or structure for a nullable column? • SQLCODE -305 • SQLSTATE 22002 THE NULL VALUE CANNOT BE ASSIGNED TO OUTPUT HOST VARIABLE NUMBER position-number BECAUSE NO INDICATOR VARIABLE IS SPECIFIED

Mullins Consulting, Inc. http://www.CraigSMullins.com 19 © 2006, Mullins Consulting, Inc.

Use a null indicator variable when referencing a nullable column. Failure to do so results in a —305 SQLCODE.

19 SQL Coding for Nulls

• Predicates not the same when checking for NULL • Remember, a null is a lack of value, so the normal operators are not applicable • That means no >, <, =, <>, etc. when checking for null • Instead use: IS [NOT] NULL

Mullins Consulting, Inc. http://www.CraigSMullins.com 20 © 2006, Mullins Consulting, Inc.

20 This is Correct SELECT EMPNO, WORKDEPT, SALARY FROM EMP WHERE SALARY IS NULL; This is Incorrect SELECT EMPNO, WORKDEPT, SALARY FROM EMP WHERE SALARY = NULL;

Mullins Consulting, Inc. http://www.CraigSMullins.com 21 © 2006, Mullins Consulting, Inc.

21 But, This is Correct, Too

• In UPDATE statements you can set a (nullable) column to be null using = • For example, this statement sets the MGRNO to null for every department managed by manager number ‘000010’:

UPDATE DSN8810.DEPT SET MGRNO = NULL WHERE MGRNO = '000010';

Mullins Consulting, Inc. http://www.CraigSMullins.com 22 © 2006, Mullins Consulting, Inc.

22 Converting Nulls

• Sometimes you will want to convert a null to a “normal” value to better be able to deal with it in the context of your application. • DB2 offers three functions to help you manage and/or convert nulls in your SQL: • NULLIF • COALESCE • IFNULL

Mullins Consulting, Inc. http://www.CraigSMullins.com 23 © 2006, Mullins Consulting, Inc.

Since host languages do not support NULLs, you will either need to use null indicators to support nulls, or replace the null with a valid value that the host languange understands.

There are three options to replace nulls with expressions: NULLIF, COALESCE, and IFNULL.

Do not confuse NULLIF and IFNULL (good luck, there)! We’ll explain the differences in upcoming slides.

23 NULLIF

• The NULLIF function returns null if the two arguments are equal; otherwise, it returns the value of the first argument. • Assume that host variables PROFIT, EXPENSES, and INCOME have decimal data types with the values of 2500.00, 1500.00, and 4000.00 respectively. • The following function will then return a null: NULLIF (:INCOME - :EXPENSES , :PROFIT)

Mullins Consulting, Inc. http://www.CraigSMullins.com 24 © 2006, Mullins Consulting, Inc.

The NULLIF function supports two arguments and it returns null if the two arguments are equal; otherwise, it returns the value of the first argument. The two arguments must be compatible and comparable.

NULLIF(V1, V2) is equivalent to the following CASE expression: NULLIF(V1, V2) := CASE WHEN (V1 = V2) THEN NULL ELSE V1 END;

Neither argument can be a BLOB, CLOB, DBCLOB, or distinct type. Character- string arguments are compatible and comparable with datetime values. The attributes of the result are the attributes of the first argument. For example, if the result of the first argument is a character string, the result of the other must also be a character string; if the result of the first argument is a number, the result of the other must also be a number. The result of using NULLIF(e1,e2) is the same as using the CASE expression: CASE WHEN e1=e2 THEN NULL ELSE e1 END When e1=e2 evaluates to unknown because one or both arguments is null, CASE expressions consider the evaluation not true. In this case, NULLIF returns the value of the first argument.

24 NULLIF Gotcha?

• Consider the following function spec:

NULLIF (x1, x2) • But think about these situations: • x1 , x2 <372> (or any other value) • x1 , x2 • x1 <0>, x2 (a – a)

• What is the result of the function for these?

Mullins Consulting, Inc. http://www.CraigSMullins.com 25 © 2006, Mullins Consulting, Inc.

The first situation returns NULL as a result even though the two arguments do not match. This is so because the first argument is NULL, so it is returned because the two do not match (the match is unknown, therefore not true, because one of the arguments is NULL)

The second situation returns NULL, too. And it is not because the arguments match, it is because they do not match (unknown), so the first argument is returned.

The third situation is a little tricky, too. You would think that a minus a is always zero so the two should match and return a null. But if a is null, then null minus null is null, and null does not match zero. So a null is returned. All other scenarios for a cause 0 to be returned, though.

25 COALESCE

• The COALESCE function returns the value of the first non-null expression. • Here is an example from the sample database that comes with DB2. • The HIREDATE column of DSN8810.EMP is nullable. • The following query selects all employee rows for which HIREDATE is either unknown or earlier than Jan 1, 1960.

SELECT * FROM DSN8810.EMP WHERE COALESCE(HIREDATE,DATE(’1959-12-31’)) < ’1960-01-01’;

Mullins Consulting, Inc. http://www.CraigSMullins.com 26 © 2006, Mullins Consulting, Inc.

The COALESCE function returns the value of the first non-null expression. The arguments must be compatible.

The arguments can be of either a built-in or user-defined data type. The arguments are evaluated in the order in which they are specified, and the result of the function is the first argument that is not null. The result can be null only if all arguments can be null. The result is null only if all arguments are null. The selected argument is converted, if necessary, to the attributes of the result.

The COALESCE function can also handle a subset of the functions provided by CASE expressions. The result of using COALESCE(e1,e2) is the same as using the expression: CASE WHEN e1 IS NOT NULL THEN e1 ELSE e2 END VALUE can be specified as a synonym for COALESCE.

Another example: Assume that SCORE1 and SCORE2 are SMALLINT columns in table GRADES, and that nulls are allowed in SCORE1 but not in SCORE2. Select all the rows in GRADES for which SCORE1 + SCORE2 > 100, assuming a value of 0 for SCORE1 when SCORE1 is null. SELECT * FROM GRADES WHERE COALESCE(SCORE1,0) + SCORE2 > 100;

26 COALESCE (continued)

• COALESCE may contain more than two arguments. • For example:

SELECT COALESCE(C1, C2, “X”)

• In this case, if both C1 and C2 are null, then the constant value “X” is returned.

Mullins Consulting, Inc. http://www.CraigSMullins.com 27 © 2006, Mullins Consulting, Inc.

If the COALESCE function has more than two arguments, the rules are applied to the first two arguments to determine a candidate result type. The rules are then applied to that candidate result type and the third argument to determine another candidate result type. This process continues until all arguments are analyzed and the final result type is determined.

27 Common Usage of COALESCE

• To convert nulls in a SELECT list where there are columns that have to be added, but one can be a NULL

SELECT EMPNO, FIRSTNME, LASTNME, (COALESCE(SALARY, 0) + COALESCE(COMM, 0) + COALESCE(BONUS, 0)) AS TOTAL_COMP FROM DSN8810.EMP;

Mullins Consulting, Inc. http://www.CraigSMullins.com 28 © 2006, Mullins Consulting, Inc.

28 IFNULL

• IFNULL is almost identical to the COALESCE function… • …except that IFNULL is limited to two arguments instead of multiple arguments. • Do not confuse IFNULL with NULLIF; they are two different functions with different purposes.

Mullins Consulting, Inc. http://www.CraigSMullins.com 29 © 2006, Mullins Consulting, Inc.

29 Comparing Nulls

• Inconsistent Treatment? • Not equal in predicates • That is, WHERE NULL = NULL is not true, but unknown • “Equal” in terms of sort order though • That is, all nulls group together for ORDER BY and GROUP BY • Is this inconsistent? • Would you really want a separate row for each null in your output for GROUP BY and ORDER BY?

Mullins Consulting, Inc. http://www.CraigSMullins.com 30 © 2006, Mullins Consulting, Inc.

30 GROUP BY and ORDER BY

• When a nullable column participates in an ORDER BY or GROUP BY clause, the returned nulls are grouped together at the high end of the sort order.

Mullins Consulting, Inc. http://www.CraigSMullins.com 31 © 2006, Mullins Consulting, Inc.

When a nullable column participates in an ORDER BY or GROUP BY clause, the returned nulls are grouped at the high end of the sort order.

31 Grouping Nulls

• Nulls group together (“equal” for sort order) • What if you want to GROUP BY an alternate column if the first is null? • Could use the COALESCE function (where x can be null and y cannot):

SELECT COALESCE(x, y), . . . FROM Table1 GROUP BY 1

• Of course, if y is a constant, then all of the nulls still group together…

Mullins Consulting, Inc. http://www.CraigSMullins.com 32 © 2006, Mullins Consulting, Inc.

SELECT x, y, … FROM Table1 WHERE x IS NOT NULL GROUP BY x UNION ALL SELECT x, y, … FROM Table1 WHERE x IS NULL GROUP BY y;

32 When is a Null Equal to a Null?

• In an ORDER BY or GROUP BY clause • When duplicates are eliminated by SELECT DISTINCT or COUNT(DISTINCT column) • In unique indexes without the WHERE NOT NULL specification

Mullins Consulting, Inc. http://www.CraigSMullins.com 33 © 2006, Mullins Consulting, Inc.

When a nullable column participates in an ORDER BY or GROUP BY clause, the returned nulls are grouped at the high end of the sort order. Nulls are considered to be equal when duplicates are eliminated by SELECT DISTINCT or COUNT (DISTINCT column). A unique index considers nulls to be equivalent and disallows duplicate entries because of the existence of nulls, unless the WHERE NOT NULL clause is specified in the index. For comparison in a SELECT statement, two null columns are not considered equal. When a nullable column participates in a predicate in the WHERE or HAVING clause, the nulls that are encountered cause the comparison to evaluate to UNKNOWN.

33 The Logic of Nulls?

• Two- Versus Three-Valued Logic

Mullins Consulting, Inc. http://www.CraigSMullins.com 34 © 2006, Mullins Consulting, Inc.

Two-valued logic, also known as Boolean algebra, was developed by George Boole. Of course, with nulls, two-valued logic no longer applied. Therefore, SQL requires three-valued logic: TRUE, FALSE, and UNKNOWN. UNKNOWN is a logical value and is not the same as a NULL, which is a data “value” marker. Here are the tables for the three operators: x NOT ======TRUE FALSE UNK UNK FALSE TRUE

AND | TRUE UNK FALSE ======TRUE | TRUE UNK FALSE UNK | UNK UNK FALSE FALSE | FALSE FALSE FALSE

OR | TRUE UNK FALSE ======TRUE | TRUE TRUE TRUE UNK | TRUE UNK UNK FALSE | TRUE UNK FALSE

34 Problems with Nulls

• These “problem” areas will be covered in the ensuing slides: • Arithmetic with Nulls • Functions and Nulls • EXISTS and Nulls • IN and Nulls

Mullins Consulting, Inc. http://www.CraigSMullins.com 35 © 2006, Mullins Consulting, Inc.

35 Arithmetic with Nulls • Any time a null participates in an arithmetic expression the results is null. • For example: • 5 + NULL • NULL / 501324 • 102 – NULL • 51235 * NULL • NULL**3 • NULL + NULL

Mullins Consulting, Inc. http://www.CraigSMullins.com 36 © 2006, Mullins Consulting, Inc.

The basic rule for math with NULLs is that they propagate. An arithmetic operation with a NULL will return a NULL.

Since a NULL is a missing value, you cannot determine the results of a mathematical expression containing a NULL.

36 Here’s a Strange One

• What is the result of NULL/0? • At first glance a mathematician would say it should return an error because dividing any number by zero is invalid. • But, the result in DB2 will be NULL. • As it would be in any SQL compliant DBMS.

Mullins Consulting, Inc. http://www.CraigSMullins.com 37 © 2006, Mullins Consulting, Inc.

However, the expression (NULL / 0) looks strange to people. The first thought is that a division by zero should return an error; if NULL is a true missing value, there is no value to which it can resolve and make that expression valid. However, SQL propagates the NULL, while a non-NULL value divided by zero will cause a runtime error.

37 Functions and Null

• AVG, COUNT DISTINCT, SUM, MAX and MIN omit column occurrences that are set to null. • So nulls are discounted by these functions • This can cause some “interesting” anomalies

Mullins Consulting, Inc. http://www.CraigSMullins.com 38 © 2006, Mullins Consulting, Inc.

38 SUM and Nulls

• Functions can complicate things. Consider this SQL:

SELECT SUM(SALARY) FROM EMP WHERE DEPTNO > 999;

• Returning NULL • Even if SALARY is defined as NOT NULL this can return a NULL • If no DEPTNO is greater than 999

Mullins Consulting, Inc. http://www.CraigSMullins.com 39 © 2006, Mullins Consulting, Inc.

39 The Effect of Null on COUNT()

• Consider the following three statements: • SELECT COUNT(*) FROM DSN8810.EMP; • SELECT COUNT(EMPNO) FROM DSN8810.EMP; • SELECT COUNT(HIREDATE) FROM DSN8810.EMP; • Along with the following data:

EMPNO . . . LASTNAME . . . HIREDATE ′000010′ HAAS 1995-10-23 ′000030′ KWAN 1980-04-03 ′000555′ MULLINS < null >

Mullins Consulting, Inc. http://www.CraigSMullins.com 40 © 2006, Mullins Consulting, Inc.

40 AVG and Nulls

• Sometimes what seems1 to be the logical result is not so with nulls. Consider: • AVG(col) • SUM(col)/COUNT(*) • Although it would seem that these two functions should return the same result, they may not. • AVG and SUM eliminate NULLs, but COUNT(*) does not

1 at least in our two-valued logic world Mullins Consulting, Inc. http://www.CraigSMullins.com 41 © 2006, Mullins Consulting, Inc.

The AVG, COUNT DISTINCT, SUM, MAX, and MIN functions omit column occurrences set to null. The COUNT(*) function, however, does not omit columns set to null because it operates on rows. Thus, AVG is not equal to SUM/COUNT(*) when the average is being computed for a column that can contain nulls. To clarify with an example, if the COMM column is nullable (and nulls exist), the result of the following query: SELECT AVG(COMM) FROM DSN8810.EMP; is not the same as for this query: SELECT SUM(COMM)/COUNT(*) FROM DSN8810.EMP;

41 EXISTS • Although a null is not a value, it does exist. • So, unknowns do exist in an EXISTS statement. • Consider, for example, an unknown celebrity birthday in the following SQL. • The birthday is unknown, so it may be the same!

SELECT P1.emp_name, 'born on a day w/o a famous person' FROM Personnel AS P1 WHERE NOT EXISTS (SELECT * FROM Celebrities AS C1 WHERE P1.birthday = C1.birthday);

Mullins Consulting, Inc. Source: Joe Celko, SQL for Smarties, 3rd ed., Morgan Kaufmann Publishers, 2005 http://www.CraigSMullins.com 42 © 2006, Mullins Consulting, Inc.

For example, what if we want to find all Personnel who were not born on the same day as a famous person. This can be answered with the following query, like this: SELECT P1.emp_name, ' was born on a day without a famous person!' FROM Personnel AS P1 WHERE NOT EXISTS (SELECT * FROM Celebrities AS C1 WHERE P1.birthday = C1.birthday); But assume that among the Celebrities, we have a movie star who will not admit her age, shown in the row ('Gloria Glamour', NULL). A new SQL programmer might expect that Ms. Glamour would not match to anyone, since we do not know her birthday yet. Actually, she will match to everyone, since there is a chance that they may match when some tabloid newspaper finally gets a copy of her birth certificate. But work out the subquery in the usual way to convince yourself: ... WHERE NOT EXISTS (SELECT * FROM Celebrities WHERE P1.birthday = NULL); becomes ...

WHERE NOT EXISTS (SELECT * FROM Celebrities WHERE UNKNOWN); becomes ... WHERE TRUE; and you will see that the predicate tests to UNKNOWN because of the NULL comparison, and therefore fails whenever we look at Ms. Glamour. Another problem with NULLs is found when you attempt to convert IN predicates to EXISTS predicates. Using our example of matching our Personnel to famous people, the query can be rewritten as: SELECT P1.emp_name, ' was born on a day without a famous person!' FROM Personnel AS P1 WHERE P1.birthday NOT IN (SELECT C1.birthday FROM Celebrities AS C1); However, consider a more complex version of the same query, where the celebrity has to have been born in City. The IN 42 predicate would be IN

• Nulls can be confusing when using the IN predicate • Consider two tables: • TableA contains: 20, 40, 60, 80 • TableB contains: 20, , 40 • What is the result of the following query?

SELECT * Answer FROM TableA Empty set WHERE Column NOT IN (SELECT Column FROM TableB);

Mullins Consulting, Inc. http://www.CraigSMullins.com 43 © 2006, Mullins Consulting, Inc.

NULLs can create problems in a NOT IN() predicate with a subquery.

To execute this query, think about it like this: SELECT * FROM TableA WHERE NOT ((Col = 20) OR (Col = NULL) OR (Col = 40));

Which is equivalent to this: SELECT * FROM TableA WHERE ( (Col <> 20) AND (Col <> NULL) AND (Col <> 40) );

The NULL reduces the predicate UNKNOWN so the results are always empty.

43 IN (continued)

• OK, let’s reverse that. Consider two tables: • TableC contains: 20, 40, , 80 • TableD contains: 20, 40, 60 • What is the result of the similar query?

SELECT * FROM TableC Answer WHERE Column NOT IN 80 (SELECT Column FROM TableD);

Mullins Consulting, Inc. http://www.CraigSMullins.com 44 © 2006, Mullins Consulting, Inc.

In this example, where there are no NULLs in the subquery, we get the answer we might expect.

SELECT * FROM TableC WHERE ((20 <> 20) AND (20 <> 40) AND (20 <> 60)) – FALSE UNION ALL SELECT * FROM TableC WHERE ((40 <> 20) AND (40 <> 40) AND (40 <> 60)) – FALSE UNION ALL SELECT * FROM TableC WHERE ((NULL <> 20) AND (NULL <> 40) AND (NULL <> 60)) – UNKNOWN UNION ALL SELECT * FROM TableC WHERE ((80 <> 20) AND (80 <> 40) AND (80 <> 60)) – TRUE!

The result is one row Æ 80.

44 Outer Joins

• NULLs are generated when you issue outer joins

SELECT PART, SUPPLIER, A.PRODNO, PRODUCT FROM Parts A FULL OUTER JOIN Products P ON A.PRODNO = P.PRODNO;

• When there is a Part w/o a Product • PART, SUPPLIER, and PRODNO are NULL • When there is a Product w/o a Part • PRODUCT is NULL • Unless you convert them using COALESCE

Mullins Consulting, Inc. http://www.CraigSMullins.com 45 © 2006, Mullins Consulting, Inc.

The NULLs that are generated by the OUTER JOIN can occur in columns derived from source table columns that have been declared to be NOT NULL. Even if you tried to avoid all the problems with NULLs by making every column in every table of your database schema NOT NULL, they could still occur in OUTER JOIN and function results.

45 Outer Joins (continued)

• NULLs are generated when you issue outer joins • Unless you convert them using COALESCE

SELECT COALESCE(PART, “No Part”), COALESCE(SUPPLIER, “No Supplier”), COALESCE(A.PRODNO, P.PRODNO) COALESCE(PRODUCT, “No Product”) FROM Parts A FULL OUTER JOIN Products P ON A.PRODNO = P.PRODNO;

Mullins Consulting, Inc. http://www.CraigSMullins.com 46 © 2006, Mullins Consulting, Inc.

46 Advice on Nulls

• Define the meaning of each null for each column • Use data modeling to avoid inapplicable nulls • Try to avoid when arithmetic is involved • But do not bury your head in the sand • Study the theory • Know the reality

Mullins Consulting, Inc. http://www.CraigSMullins.com 47 © 2006, Mullins Consulting, Inc.

47 Define Their Meaning

• Define the meaning of each null, for each column • If a column is to be nullable, the application and database designers must define what it means when the column is null. • The best “meaning” for a null is usually applicable, but unknown, but you may have exceptions. • If you have exceptions, be sure to document the meaning and train users appropriately.

Mullins Consulting, Inc. http://www.CraigSMullins.com 48 © 2006, Mullins Consulting, Inc.

Yet humans do need to deal with nulls, and they have the ability to ask questions that the database server cannot. What does it mean to have a null salary? Does it mean that salary doesn't apply? Or does it mean that an employee has "no salary" in the sense that that person's salary is US$0? Or does it mean that a value would be applicable, but we simply don't know that value? Or could it mean any of the foregoing, depending on which employee we are talking about? Although it is sometimes necessary, you can begin to see that human interpretation of nulls can be quite dangerous.

48 Use Data Modeling to Remove Inapplicable Nulls • Whenever possible try to create entities where every attribute is applicable • In doing so, then the only nulls will be applicable, but unknown • Recall data modeling discussion from slide 7

Mullins Consulting, Inc. http://www.CraigSMullins.com 49 © 2006, Mullins Consulting, Inc.

49 Try to Avoid When…

• If possible, avoid nullable columns when they will be involved in arithmetic. • If nulls are involved in equations and functions then: • You will need to use COALESCE to avoid null or realize that the result for every equation or math function involving a null will be null. • Remember the situation with AVG and SUM/COUNT(*)

Mullins Consulting, Inc. http://www.CraigSMullins.com 50 © 2006, Mullins Consulting, Inc.

Whenever possible, avoid nulls in columns that must participate in arithmetic logic (for example, DECIMAL money values), and especially when functions will be used. The AVG, COUNT DISTINCT, SUM, MAX, and MIN functions omit column occurrences set to null. The COUNT(*) function, however, does not omit columns set to null because it operates on rows. Thus, AVG is not equal to SUM/COUNT(*) when the average is being computed for a column that can contain nulls.

50 Do Not “Bury Your Head”

• Learn what nulls are and how they work • Even if you define every column as not nullable your queries can still return a null • Remember the slides on functions

Mullins Consulting, Inc. http://www.CraigSMullins.com 51 © 2006, Mullins Consulting, Inc.

51 Study the Theory, Know the Reality

• Understand the legitimate “problems” that experts like Chris Date have with nulls • Read the literature on the web and database books for guidance and enlightenment • Details on next two slides • Understand the reality that existing SQL database systems like DB2 are created with nullability built into them • As long as you use them, you need to understand how they use nulls to deal with missing information Mullins Consulting, Inc. http://www.CraigSMullins.com 52 © 2006, Mullins Consulting, Inc.

52 Literature on Nulls on the Web

• Codd’s 12 Rules http://www.itworld.com/nl/db_mgr/05072001/ • On Nulls and Empty Sets (Fabian Pascal, Chris Date, ) http://www.dbdebunk.com/page/page/2296478.htm • Old Approach, True Implementations by Fabian Pascal http://www.dbazine.com/ofinterest/oi-articles/pascal28 • The Final Null in the Coffin by Fabian Pascal http://www.dbdebunk.com/page/page/1396241.htm • How To Handle Missing Information Without Using Nulls http://web.onetel.com/~hughdarwen/TheThirdManifesto/Missing-info-without- nulls.pdf • There’s Only One Relational Model by Chris Date http://www.dbdebunk.com/page/page/622839.htm • An Interview with Chris Date http://www.oreillynet.com/pub/a/network/2005/07/29/cjdate.html?page=1 • Nulls – A Conversion Headache by John Hindmarsh http://www.embarcadero.com/resources/tech_papers/nullsconversion.html • Using Nulls in DB2 by Craig S. Mullins http://www.craigsmullins.com/bp7.htm

Mullins Consulting, Inc. http://www.CraigSMullins.com 53 © 2006, Mullins Consulting, Inc.

53 Books for Additional Research • Codd, E.F., The Relational Model for Database Management Version 2, Reading, MA: Addison-Wesley (1990): ISBN 0201141922 (out of print)

• Date, C.J., An Introduction to Database Systems, 8th ed., Reading, MA: Addison-Wesley (2003): ISBN 0321197844 • Date, C.J., Database In-Depth, Reading, MA: Addison-Wesley (2005): ISBN: 0596100124 • Date, C.J., The Database Relational Model, Reading, MA: Addison- Wesley (2001): ISBN 0201612941

• Celko, Joe, SQL For Smarties: Advanced SQL Programming, 3rd ed., , CA: Morgan Kaufmann (2005): ISBN 0123693799

• Web links to amazon.com for these books available at: http://www.craigsmullins.com/booklnks.htm

Mullins Consulting, Inc. http://www.CraigSMullins.com 54 © 2006, Mullins Consulting, Inc.

54 Session G12 NULL and Void? Dealing with Nulls in DB2

Craig S. Mullins Mullins Consulting, Inc. [email protected]

www.craigsmullins.com www.DB2portal.com

Mullins Consulting, Inc. http://www.CraigSMullins.com 55 © 2006, Mullins Consulting, Inc.

55