IDUG NA 2006 Craig Mullins: Null and Void? Dealing with Nulls In

Total Page:16

File Type:pdf, Size:1020Kb

IDUG NA 2006 Craig Mullins: Null and Void? Dealing with Nulls In Session: G12 Null and Void? Dealing with Nulls in DB2 Craig S. Mullins President & Principal Consultant Mullins Consulting, Inc. http://www.CraigSMullins.com Thursday, May 11, 2006 • 08:30 a.m. – 09:40 a.m. Platform: DB2 for z/OS 1 Agenda • Definition • Some History • Types of Nulls • Inapplicable versus Applicable Data • Nulls and Keys • Distinguished Nulls • Using Nulls in DB2 • Problems with Nulls • Guidance and Advice Mullins Consulting, Inc. http://www.CraigSMullins.com 2 © 2006, Mullins Consulting, Inc. 2 What is a NULL? • NULL represents the absence of a value. • It is not the same as zero or an empty string. • A null is not a “null value” – there is no value. • Maybe a “Null Lack of Value” Mullins Consulting, Inc. http://www.CraigSMullins.com 3 © 2006, Mullins Consulting, Inc. 3 What is the Difference, You Ask? • Consider the following columns: • TERMINATION_DATE – null or a valid date? • SALARY – null or zero? • SSN – non-US resident? • ADDRESS – different composition by country, so let some components be null? • HAIR_COLOR – what about bald men? Mullins Consulting, Inc. http://www.CraigSMullins.com 4 © 2006, Mullins Consulting, Inc. When are nulls useful? Well, defining a column as NULL provides a place holder for data you might not yet know. For example, when a new employee is hired and is inserted into the EMP table, what should the employee termination date column be set to? I don’t know about you, but I wouldn’t want any valid date to be set in that column for my employee record. Instead, null can be used to specify that the termination date is currently unknown. Let’s consider another example. Suppose that we also capture employee’s hair color when they are hired. Consider three potential entity occurrences: a man with black hair, a woman with unknown hair color, and a bald man. The woman with the unknown hair color and the bald man both could be assigned as null, but for different reasons. The woman’s hair color would be null meaning presently unknown; the bald man’s hair color could be null too, in this case meaning not applicable. How could you handle this without using nulls? You would need to create special values for the HairColor column that mean “bald” and “unknown.” This is possible for a CHAR column like HairColor. But what about a DB2 DATE column? All occurrences of a column assigned as a DATE data type are valid dates. It might not be possible to use a special date value to mean “unknown.” This is where using nulls can be practical. 4 Ted Codd on Nulls • Dr. E.F. Codd built the notion of nulls into the relational model, but not in the original 1970 paper • The extensions to RM/V1 to support nulls weren't defined until the 1979 paper Extending the Database Relational Model to Capture More Meaning, ACM TODS 4, No. 4 (December 1979). • He later defined two types of nulls in RM/V2 • These were called marks • A-mark: applicable, but unknown • I-mark: inapplicable (known to be inapplicable) Mullins Consulting, Inc. http://www.CraigSMullins.com 5 © 2006, Mullins Consulting, Inc. A null allows a column to be specifically empty. A null is distinct from an empty string or a number with a value of zero. Of course, a null cannot apply to primary keys, which must contain values. Most database implementations support the concept of a non-null field constraint that prevents nulls in a specific table column… as does DB2. 5 Inapplicable versus Applicable • Inapplicable: • These are perhaps better dealt with by creating better data models • Social Security Number for a European • Consulting fee for a full-time employee • Applicable, but unknown: • These are the ones that are “true” nulls • Future dates not yet known • Information not supplied • Not everyone has a middle name Mullins Consulting, Inc. http://www.CraigSMullins.com 6 © 2006, Mullins Consulting, Inc. 6 Modeling Fixes Inapplicable Columns Mullins Consulting, Inc. http://www.CraigSMullins.com 7 © 2006, Mullins Consulting, Inc. Super-type – sub-type modeling can be a better approach to inapplicable nulls. Consider the approach shown in the slide. Wouldn’t it be better to have only those columns which are always applicable in each entity/table? For example, the alternate here would be to have just the Supplier entity, but it would have a nullable Postal Code and a nullable Zip Code. For domestic suppliers postal code would be null (inapplicable); for international suppliers zip code would be blank (inapplicable). By creating the sub-types of Supplier we can model our way out of inapplicable nulls. 7 Nulls and Keys • Primary Key • A primary key cannot be null • Foreign Key(s) • Foreign keys can be null • ON DELETE SET NULL – when the PK that refers to the FK row is deleted, the FK is set to null Mullins Consulting, Inc. http://www.CraigSMullins.com 8 © 2006, Mullins Consulting, Inc. 8 Distinguished Nulls: Not all Nulls are Equal • Sometimes, based on domains and knowledge of the data a null may not really be a null. • Consider the COLOR column • With a domain of BLACK, WHITE, and RED • With a UNIQUE constraint on it • Now, say we have three rows in the table, one of which has “BLACK” for the COLOR and the other two are null. • Are those nulls really completely null? We know something about them. • What if one of the nulls is changed to WHITE? That last null is not really null, is it? • But DB2 does not automatically change it to RED… • Can you (or your program)? Mullins Consulting, Inc. http://www.CraigSMullins.com 9 © 2006, Mullins Consulting, Inc. Create a domain called Tricolor that is limited to the values 'Red', 'White', and 'Black' and a column in a table drawn from that domain with a UNIQUE constraint on it. If my table has a 'Red' and two NULL values in that column, I have some information about the two NULLs. I know they will be either ('White', 'Black') or ('Black', 'White') when their rows are resolved. This is what Chris Date calls a "distinguished NULL", which means we have some information in it. If my table has a 'Red', a 'White', and a NULL value in that column, can I change the last NULL to 'Black' because it can only be 'Black' under the rule? Or do I have to wait until I see an actual value for that row? There is no clear way to handle this in SQL. Multiple values cannot be put in a column, nor can the database automatically change values as part of the column declaration. 9 A Lot of Arguments • Some relational experts have argued against nulls (e.g. Date, Darwen, Pascal) • There are many reasons; many of them centered around the “quirks” of how nulls are implemented and the difficulty of “three-valued” logic. • Also, nulls violate “relational” – at least according to some • A null is not a value, and therefore cannot exist in a mathematical relation Mullins Consulting, Inc. http://www.CraigSMullins.com 10 © 2006, Mullins Consulting, Inc. 10 What About DB2? • Well, DB2 is not a “pure” relational DBMS • It supports and uses nulls • Let’s take a look at how DB2 uses nulls and how you use DB2 to specify nulls and manipulate nulls in your data • …as well as some of the “oddities” this can create. Mullins Consulting, Inc. http://www.CraigSMullins.com 11 © 2006, Mullins Consulting, Inc. 11 Nulls in DB2 • All columns are nullable unless defined with: • NOT NULL or NOT NULL WITH DEFAULT • DB2 uses a null-indicator to set a column to null • One byte • Zero or positive value means not null • Negative value means null • Not stored in column, but associated with column Mullins Consulting, Inc. http://www.CraigSMullins.com 12 © 2006, Mullins Consulting, Inc. All data types include the null “value”. Distinct from all non-null values, a null is a special “value” that denotes the absence of a value. Although all data types include the null, some sources of values cannot provide for nulls. For example, constants, columns that are defined as NOT NULL, and special registers cannot contain nulls; the COUNT and COUNT_BIG functions cannot return a null value; and ROWID columns cannot store a null although a null can be returned for a ROWID column as the result of a query. 12 Nulls in DB2 (continued) • Nulls do NOT save space – ever! • Nulls are not variable columns but… • Can be used with variable columns to (perhaps) save space • Programs have to be coded to explicitly deal with the possibility of null • (example on next slide) Mullins Consulting, Inc. http://www.CraigSMullins.com 13 © 2006, Mullins Consulting, Inc. Let’s take a moment to clear up a common misunderstanding right here: nulls NEVER save storage space in DB2 for OS/390 and z/OS. Every nullable column requires one additional byte of storage for the null indicator. So, a CHAR(10) column that is nullable will require 11 bytes of storage per row – 10 for the data and 1 for the null indicator. This is the case regardless of whether the column is set to null or not. In some DBMS products a null column is treated as variable length. If the column is set to null it will not take up storage space (in other words, the row becomes variable length). This is NOT the case with DB2. 13 Using Null Indicator Variables You can use null indicator variables with corresponding host variables in the following situations: • SET clause of the UPDATE statement • VALUES clause of the INSERT statement • INTO clause of the SELECT or FETCH statement Mullins Consulting, Inc.
Recommended publications
  • Database Management Systems Ebooks for All Edition (
    Database Management Systems eBooks For All Edition (www.ebooks-for-all.com) PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sun, 20 Oct 2013 01:48:50 UTC Contents Articles Database 1 Database model 16 Database normalization 23 Database storage structures 31 Distributed database 33 Federated database system 36 Referential integrity 40 Relational algebra 41 Relational calculus 53 Relational database 53 Relational database management system 57 Relational model 59 Object-relational database 69 Transaction processing 72 Concepts 76 ACID 76 Create, read, update and delete 79 Null (SQL) 80 Candidate key 96 Foreign key 98 Unique key 102 Superkey 105 Surrogate key 107 Armstrong's axioms 111 Objects 113 Relation (database) 113 Table (database) 115 Column (database) 116 Row (database) 117 View (SQL) 118 Database transaction 120 Transaction log 123 Database trigger 124 Database index 130 Stored procedure 135 Cursor (databases) 138 Partition (database) 143 Components 145 Concurrency control 145 Data dictionary 152 Java Database Connectivity 154 XQuery API for Java 157 ODBC 163 Query language 169 Query optimization 170 Query plan 173 Functions 175 Database administration and automation 175 Replication (computing) 177 Database Products 183 Comparison of object database management systems 183 Comparison of object-relational database management systems 185 List of relational database management systems 187 Comparison of relational database management systems 190 Document-oriented database 213 Graph database 217 NoSQL 226 NewSQL 232 References Article Sources and Contributors 234 Image Sources, Licenses and Contributors 240 Article Licenses License 241 Database 1 Database A database is an organized collection of data.
    [Show full text]
  • Missing Data in the Relational Model
    Virginia Commonwealth University VCU Scholars Compass Theses and Dissertations Graduate School 2013 Missing Data in the Relational Model Marion Morrissett Virginia Commonwealth University Follow this and additional works at: https://scholarscompass.vcu.edu/etd Part of the Engineering Commons © The Author Downloaded from https://scholarscompass.vcu.edu/etd/3004 This Dissertation is brought to you for free and open access by the Graduate School at VCU Scholars Compass. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of VCU Scholars Compass. For more information, please contact [email protected]. c Marion R. Morrissett, 2013 All Rights Reserved Dedication This research is dedicated to content, data with missing values that represent the always-complete real world. And to structure, the relational model created and developed by the scientists, researchers, teachers, and practitioners who populate my test case database. MISSING DATA IN THE RELATIONAL MODEL A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Virginia Commonwealth University. by MARION R. MORRISSETT Bachelor of Arts, University of Virginia, 1972 Mathematical Sciences Certificate in Computer Science, Virginia Commonwealth University, 1987 Master of Science, Virginia Commonwealth University, 1994 Doctor of Philosophy, Virginia Commonwealth University, 2013 Director: LORRAINE M. PARKER ASSOCIATE PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE Virginia Commonwealth University Richmond, Virginia May, 2013 ii Acknowledgments Many people have provided help and support during this work. My friends and family listened to my dissertation status reports, the programmers among them heard the technical details and all were patient. John Cookson, Tom Nicholls, and Paul Bruggeman shared their experience with problems created and solved by computers.
    [Show full text]
  • Oral History of C. J. Date
    Oral History of C. J. Date Interviewed by: Thomas Haigh Recorded: June 13, 2007 Mountain View, California CHM Reference number: X4090.2007 © 2007 Computer History Museum Table of Contents BACKGROUND AND EDUCATION..............................................................................................4 FIRST JOB IN COMPUTING: WORKING FOR LEO....................................................................7 JOINING IBM ................................................................................................................................9 DATABASE MANAGEMENT AND PL/I ......................................................................................12 DESIGNING A DATABASE LANGUAGE ...................................................................................13 FIRST CONTACTS WITH TED CODD.......................................................................................16 THE INTRODUCTION TO DATABASE SYSTEMS BOOK.........................................................17 THE ACM DEBATE ....................................................................................................................21 THE RELATIONAL MODEL........................................................................................................24 SQL.............................................................................................................................................27 EVOLUTION OF THE INTRODUCTION TO DATABASE SYSTEMS BOOK .............................32 LEAVING IBM .............................................................................................................................34
    [Show full text]
  • Missing Data in the Relational Model Marion Morrissett Virginia Commonwealth University
    Virginia Commonwealth University VCU Scholars Compass Theses and Dissertations Graduate School 2013 Missing Data in the Relational Model Marion Morrissett Virginia Commonwealth University Follow this and additional works at: http://scholarscompass.vcu.edu/etd Part of the Engineering Commons © The Author Downloaded from http://scholarscompass.vcu.edu/etd/3004 This Dissertation is brought to you for free and open access by the Graduate School at VCU Scholars Compass. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of VCU Scholars Compass. For more information, please contact [email protected]. c Marion R. Morrissett, 2013 All Rights Reserved Dedication This research is dedicated to content, data with missing values that represent the always-complete real world. And to structure, the relational model created and developed by the scientists, researchers, teachers, and practitioners who populate my test case database. MISSING DATA IN THE RELATIONAL MODEL A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Virginia Commonwealth University. by MARION R. MORRISSETT Bachelor of Arts, University of Virginia, 1972 Mathematical Sciences Certificate in Computer Science, Virginia Commonwealth University, 1987 Master of Science, Virginia Commonwealth University, 1994 Doctor of Philosophy, Virginia Commonwealth University, 2013 Director: LORRAINE M. PARKER ASSOCIATE PROFESSOR, DEPARTMENT OF COMPUTER SCIENCE Virginia Commonwealth University Richmond, Virginia May, 2013 ii Acknowledgments Many people have provided help and support during this work. My friends and family listened to my dissertation status reports, the programmers among them heard the technical details and all were patient. John Cookson, Tom Nicholls, and Paul Bruggeman shared their experience with problems created and solved by computers.
    [Show full text]