Data Links: Managing Files Using DB2
Understand the Data Links architecture, unleashed for the first time
Explore planning, migration, the Reconcile utility, and recovery
Learn about HSM and HACMP support
Rodolphe Michel, Amit Arora, Kevin Crooks, Aman Lalla, David Shields
ibm.com/redbooks
International Technical Support Organization
Data Links: Managing Files Using DB2
December 2001
SG24-6280-00
Take Note! Before using this information and the product it supports, be sure to read the general information in “Special notices”.
First Edition (December 2001)
This edition applies to IBM DB2 Universal Database EE V7 and Data Links V7.
Comments may be addressed to: IBM Corporation, International Technical Support Organization Dept. QXXE Building 80-E2 650 Harry Road San Jose, California 95120-6099
When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.
© Copyright International Business Machines Corporation 2001. All rights reserved. Note to U.S Government Users – Documentation related to restricted rights – Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.
Contents
Figures
Tables
Preface
  The team that wrote this redbook
  Special notice
  IBM trademarks
  Comments welcome
Chapter 1. Introduction
  1.1 Why Data Links
  1.2 Data Links overview
    1.2.1 Data Links File Manager (DLFM)
    1.2.2 Data Links File System Filter (DLFF)
    1.2.3 The DATALINK data type
  1.3 Applications that use Data Links
    1.3.1 Link Integrity+
    1.3.2 VPM with DB2 Data Links
Chapter 2. Technical architecture
  2.1 Overview of the Data Links architecture
    2.1.1 Data Links server
    2.1.2 DB2 Universal Database server
    2.1.3 DB2 client
  2.2 DATALINK data type
    2.2.1 Attributes of DATALINK type
    2.2.2 Scalar functions for DATALINK data type
    2.2.3 DATALINK options
  2.3 Security/authentication
    2.3.1 Concept of tokenized file names
    2.3.2 Database configuration parameters
    2.3.3 How access tokens work
  2.4 Data Links on UNIX and Windows
    2.4.1 Data Links File Manager (DLFM)
    2.4.2 Data Links File System Filter (DLFF)
    2.4.3 Linking and unlinking files
    2.4.4 Transaction support
  2.5 Data Links on DCE-DFS
    2.5.1 Data Links File Manager (DLFM)
    2.5.2 Data Manager Application (DMAPP)
    2.5.3 Data Links File System Cache Manager (DLFS-CM)
Chapter 3. Application development
  3.1 Choosing suitable applications for using Data Links
  3.2 Transactional semantics for files in the application
  3.3 Data Links versus LOBs
    3.3.1 Using LOBs
    3.3.2 Using Data Links
  3.4 Application development tasks
    3.4.1 Application deployment considerations
    3.4.2 Checking whether Data Links has been enabled
    3.4.3 Choosing DATALINK options
    3.4.4 Changing DATALINK options
    3.4.5 Querying DATALINK options
  3.5 Coding considerations
    3.5.1 Host variable declaration
    3.5.2 Creating and linking a new file
    3.5.3 Reading a linked file
    3.5.4 Updating a linked file
    3.5.5 Unlinking a file
    3.5.6 Scalar functions used with the DATALINK data type
    3.5.7 Error handling
  3.6 Using multiple file servers
    3.6.1 Supporting multiple links to the same file
  3.7 Migrating existing applications to use Data Links
    3.7.1 Migrating an application that uses files
    3.7.2 Migrating an application that uses LOBs
Chapter 4. Planning Data Links deployment
  4.1 Deployment options
    4.1.1 Single server implementation
    4.1.2 Single Universal Database and multiple DLFMs
    4.1.3 Multiple Universal Databases and single DLFM
    4.1.4 Multiple DLFMs on a single host
    4.1.5 Multiple DB2s and multiple DLFMs
  4.2 File systems and sizing
    4.2.1 The DLFM backup (archive directory)
    4.2.2 Data Links controlled file systems
    4.2.3 Using NFS and NIS
  4.3 Planning the backup of the DLFM_DB database
  4.4 Performance tuning tips
    4.4.1 Optimum logging levels
    4.4.2 Location of file servers
    4.4.3 Number of files per directory
    4.4.4 Token algorithms
    4.4.5 DLFM backup, home, and log directories
Chapter 5. Data Links Manager administration
  5.1 Identifying the tables and servers in Data Links
  5.2 Checking for Data Links control over a file system
  5.3 Other useful DLFM commands
Chapter 6. Using Tivoli Storage Manager
  6.1 Introduction to Tivoli Storage Manager
    6.1.1 Storage device concepts
    6.1.2 Policy concepts
    6.1.3 Security concepts
    6.1.4 Communication methods
  6.2 Data Links with the Backup-Archive Client
  6.3 Data Links and Tivoli Space Manager
    6.3.1 Overview of Tivoli Space Manager
    6.3.2 Tools, processes, and interfaces
    6.3.3 Data Links support for HSM
    6.3.4 Current restrictions
Chapter 7. High Availability support on AIX
  7.1 Introduction
  7.2 HACMP cluster configuration for hot standby
    7.2.1 Hot standby setup for a host DB2 server
    7.2.2 Hot standby setup for a Data Links server
  7.3 HACMP cluster configuration for mutual takeover
    7.3.1 Configuration
    7.3.2 Sequence of events
  7.4 The scripts
    7.4.1 Additional considerations for DB2 Universal Database Version 6
    7.4.2 Final considerations
Chapter 8. Creating a new database
  8.1 Overview
  8.2 Backup
  8.3 EXPORT (dlfm_export)
  8.4 The db2look command
  8.5 The restore command
  8.6 Copying the linked files
  8.7 DLFM commands
  8.8 Running the Import utility
  8.9 Running the Load utility
Chapter 9. Data replication
  9.1 Overview of DB2 replication
  9.2 Why replicate linked files
  9.3 Supported platforms
  9.4 Replication components
    9.4.1 Change-capture
    9.4.2 Apply
    9.4.3 Subscription sets and subscription set members
  9.5 Data Links replication
    9.5.1 Capturing DATALINK values
    9.5.2 How Apply handles DATALINK values
  9.6 Implementing replication with Data Links
    9.6.1 Before we begin
    9.6.2 Defining the replication source
    9.6.3 Defining the subscription set and subscription set member
    9.6.4 Configuring the source database
    9.6.5 Binding the Capture and Apply programs
    9.6.6 Creating the password file for the Apply program
    9.6.7 Configuration files used by ASNDLCOPY
    9.6.8 Configuration files used by ASNDLCOPYD
    9.6.9 Starting and stopping the Capture and Apply programs
Chapter 10. The Reconcile utility
  10.1 Overview
  10.2 When to run the Reconcile utility
  10.3 Situations that require the Reconcile utility
    10.3.1 Reconcile algorithm
Chapter 11. Recovery
  11.1 Overview
    11.1.1 Crash recovery
    11.1.2 Version or full database recovery
    11.1.3 Restore and rollforward recovery
  11.2 DLFM backup considerations
    11.2.1 Environment backup considerations
  11.3 DLFM restore considerations
  11.4 Recovery history file
    11.4.1 Events recorded in the history file
    11.4.2 Data recorded in the history file
  11.5 Restoring an offline backup without rollforward
  11.6 Restoring and rolling forward to a point in time
  11.7 Tablespace recovery
  11.8 Recovering the dlfm_db to a point in time
Chapter 12. Garbage collection
  12.1 Garbage collection
  12.2 Garbage collection scenario
Chapter 13. Migrating to DB2 UDB Version 7
  13.1 Migration options
    13.1.1 DB2IMIGR and MIGRATE database commands
    13.1.2 Migrating the DB2 UDB V6.x database server
    13.1.3 Migrating databases using an offline backup
Chapter 14. Moving a Data Links file system to a new disk
  14.1 Migrating a DLFS-enabled file system (AIX)
  14.2 Migrating a DLFS-enabled file system (Solaris)
Chapter 15. Replacing or upgrading a machine
  15.1 Replacing or upgrading a DB2 machine
    15.1.1 Assumption
    15.1.2 Steps to perform
  15.2 Replacing or upgrading a DLFM machine
    15.2.1 Steps to perform
Chapter 16. Problem determination
  16.1 Solving problems
    16.1.1 Problem solving process
    16.1.2 Information needed to analyze a problem
    16.1.3 DB2 Universal Database or DLFM ‘hang’ situations
    16.1.4 DB2 Universal Database or DLFM crash
    16.1.5 The DB2 Trace
  16.2 Solutions to common problems
    16.2.1 Available resources
    16.2.2 DLFM server problems
    16.2.3 DB2 server problems
    16.2.4 File system problems
    16.2.5 Frequently Asked Questions (FAQs)
Appendix A. BNF specifications for DATALINK
Appendix B. Overview of DCE-DFS on AIX
  Distributed Computing Environment (DCE)
  Distributed File Service (DFS)
Appendix C. VPM and Data Links
  Installation overview
  Installing DB2 Data Links Manager 6.1 GA
  Preliminary installation steps
  Data Links post-installation
  Making Data Links work with VPM
  VPM and Data Link tokens
  Adapting VPM to work with Data Links
  Writing a model
  Additional information
Appendix D. Logging priorities for DLFF and DLFSCM
  Modifying the DLFF logging priorities on AIX
  Modifying the DLFSCM logging priorities in DCE-DFS (on AIX)
  Modifying the DLFF logging priorities on Solaris
  Modifying the DLFF logging level on Windows
Related publications
  IBM Redbooks
  Other resources
  Referenced Web sites
  How to get IBM Redbooks
  IBM Redbooks collections
Special notices
Index
Figures
1-1 Architecture of the Data Links technology
2-1 Data Links overview in UNIX and Windows environments
2-2 Data Links overview in a DCE-DFS environment
2-3 DATALINK data type
2-4 Retrieving the Data Link value
2-5 Accessing Data Linked files through a browser
2-6 DATALINK column definition syntax
2-7 Relationship between DB2 servers and Data Links servers
2-8 DLFM process model: DB2 server
2-9 DLFM process model: Data Links Manager
2-10 DLFM process model: Complete picture
2-11 Attributes before the link operation
2-12 Attributes after the link operation
2-13 Overview of Data Links implementation
2-14 Link-file operation
2-15 Control flow of SQL insert statement
2-16 Unlink process
2-17 Commit processing transactions
2-18 DLFMs in a single DCE cell
2-19 The DMAPP implementation
2-20 Data Links architecture on DCE-DFS
3-1 DATALINK access token
3-2 DATALINK options stored in SYSCOLPROPERTIES table
3-3 Using multiple DLFM file servers
3-4 Externalizing LOB data
3-5 Moving LOB table data to DATALINK table
4-1 Single server implementation
4-2 Single UDB and one to many DLFMs
4-3 Multiple UDBs and a single DLFM
4-4 Multiple DB2 and multiple DLFMs
5-1 Select from sysibm.syscolproperties
5-2 List databases and Data Links Managers
5-3 The dlfs file systems
6-1 Storage management
6-2 Policy concepts
6-3 Tivoli Space Manager overview
6-4 Data Links and Tivoli Space Manager
6-5 Selective Migration of READ PERMISSION DB file
6-6 dostatfs.c
6-7 VFS numbers of DLFS and FSM
6-8 Result of dostatfs on /dlfsfsmtest
6-9 dsmls utility behavior
7-1 Host DB2 (or) Data Links File Manager cluster
7-2 Mutual takeover environment
7-3 The /var/db2 files show the global variables and instances
7-4 The dlfs_cfg file must exist on both servers
7-5 The contents of /etc/vfs
7-6 List of dlfm_ programs
8-1 The steps used to create the new database
8-2 Backup database command
8-3 Quiesce and export to the IXF file type
8-4 Contents of the export control file
8-5 Sample dlfm_export
8-6 Export using delimited output
8-7 Delimited file before and after editing
8-8 The db2look command and the output it produced
8-9 Restore command, get dbm cfg, and list datalinks managers
8-10 Sample dlfm_import
8-11 The dlfm add_db and dlfm add_prefix commands
8-12 Import delimited file with DATALINK column type
8-13 The Load utility
9-1 Change Capture
9-2 Defining a replication source
9-3 Apply program data flow
9-4 Subscription set and subscription set members
9-5 DATALINK values before and after replication
9-6 File reference mapping
9-7 SOURCE.MANAGERS table
9-8 SOURCE.MANAGERS table contents
9-9 Environment before replication
9-10 Defining a replication source
9-11 Selecting columns to be replicated
9-12 Saving the replication source definition
9-13 SQL to define the replication source
9-14 Defining the replication source by running an SQL file
9-15 Viewing the replication source
9-16 Defining the replication subscription
9-17 Define replication subscription dialog
9-18 Changing the target table name
9-19 Selecting the primary key for the target
9-20 Restricting replicated rows
9-21 Subscription timing
9-22 Saving the replication subscription
10-1 Reconcile warning when DLFM server is not available
10-2 Extract of a lock snapshot for a table being reconciled
10-3 Output of the db2_recon_aid utility with the CHECK option
10-4 Extract of db2diag.log showing a table in DRP state
10-5 Extract of a DB2DART report showing a table in DRP state
10-6 Determining when to run the Reconcile utility
11-1 Two-phase commit
11-2 Version or full database recovery
11-3 Rollforward recovery
11-4 Asynchronous archive request
11-5 Processing that takes place during a backup
11-6 Environment backup considerations
11-7 Processing that takes place during a restore
11-8 Restore with the WITHOUT DATALINK option
11-9 Restore without specifying the WITHOUT DATALINK option
11-10 Selecting results prior to insert and restore
11-11 The ls results of the Data Link file system prior to insert
11-12 Inserting and selecting after a new link
11-13 List files after the link operation has completed
11-14 Restore command and files that were unlinked
11-15 Restore of an offline backup
11-16 List history to find backup and point in time
11-17 Restore with rolling forward and rollforward pending status
11-18 Rollforward to obtain minimum CUT time
11-19 Rollforward and log messages
11-20 Select statement with warning message
11-21 Reconcile command and log messages
11-22 Restore and rollforward to a point-in-time
11-23 Removing dlfm_backup files and removing a Data Linked file
11-24 Tablespace restore and rollforward
11-25 Using db2dart to see the table status of DRP
11-26 Selecting the data before reconcile is run
11-27 Reconcile and the exceptions
11-28 The ddl to create the exception table for reconcile
11-29 Information from the exception table for the reconcile
11-30 Selecting the data after reconcile has run
11-31 Tablespace recovery scenario
11-32 Restore command and dlfm stop
11-33 Rollforward and messages
11-34 The list registered databases output
11-35 The db2_recon_aid utility and output
11-36 DLFM_DB database point-in-time recovery
12-1 Expired database backups
12-2 Four database backups are taken
12-3 Active database backup being restored
12-4 Database backups taken with a new log sequence number
12-5 Backup (BK1) is marked as expired
12-6 New log sequence created after restore of backup (BK6)
12-7 Garbage collection marks backup BK2 as expired
12-8 All backups prior to and including BK5 are marked as expired
12-9 Inactive databases may become active because they are retained
13-1 DB2DART utility output reporting no errors
13-2 Verifying that the database can be migrated with the db2ckmig utility
13-3 Instance migration using the db2imigr utility
13-4 Connecting to a database that requires migration
13-5 Successful migration of the database using the migrate command
13-6 Verifying that the database can be migrated with the db2ckmig utility
13-7 Instance migration using the db2imigr utility
13-8 Successful migration of the DLFM instance
13-9 Output of the db2set command
13-10 DB2DART utility output reporting no errors
13-11 Stopping DB2 Services on Windows NT
13-12 Verifying that the database can be migrated with the db2ckmig utility
13-13 Extract of a recovery history file
13-14 Restoring into an existing database
13-15 Rollforward completing with a warning
16-1 Extract of an entry written to the db2diag.log file
16-2 Information about each component in the db2diag.log file
16-3 Extract of a trap file
16-4 Extract of a trace entry in the formatted trace file
16-5 Information about each component in a formatted trace file
16-6 An SQL1036 error message when connecting to the database
16-7 Extract of the DB2DIAG.LOG with the SQL1036 error message
16-8 Output of the DB2 Trace format command
16-9 Function flow structure
16-10 Extract of the trace flow file
16-11 Extract of trace flow showing the SQL1036 error
16-12 Trace format file
B-1 DCE architecture
B-2 CDS entry in DNS format
C-1 DB2 V6 Fixpak 5
C-2 Interpreting DL_FEATURES values
C-3 Creating a model in VPM
C-4 Creating and saving a model
C-5 Confirm Write
C-6 Saved model in VPM
C-7 Read-Only file
C-8 Opening a model in CATIA
C-9 A model in CATIA
C-10 File under Data Links control now
C-11 Backup directory
C-12 Files backed up under the Backup directory
Tables
2-1 Arguments to the SQLBuildDataLink function
2-2 Possible combinations of DATALINK attributes
2-3 DLFM results and corresponding actions by DLFF
3-1 DATALINK options
3-2 Host language variable declaration for DATALINKS data type
4-1 Parameters that can affect the size of the archive directory
B-1 Some commonly used terms in DCE-DFS environment
C-1 Creating your file systems
Preface
The amount of data that is stored digitally is growing rapidly because computer systems and storage systems have become very affordable. The file paradigm is very common for such data types as video, image, text, graphics, and engineering drawings because capture, edit, and delivery tools use the file paradigm for these data types. A large number of applications store, retrieve, and manipulate data in files. Many of these applications need search capabilities to find the data in the files. These search capabilities, however, do not require physically bringing the data into the database system, because their raw content is not needed for the query.
Typically, you would extract features of an image or a video and store them in the database for performing a search on the extracted features. The applications combine the search capabilities of SQL with the advantages of working directly with files to manipulate the raw data. In general, the approach involves the ability to store a reference to such files, along with parametric data that describes their contents.
Data Links is a new feature of DB2 Universal Database (UDB) that extends the management umbrella of the relational database management system (RDBMS) to data stored in external operating system files, as if the data were stored directly in the database. Data Links provides several levels of control over external data, such as referential integrity, access control, coordinated backup and recovery, and transaction consistency.
This IBM Redbook provides you with sufficient information to effectively deploy Data Links in a complex environment. First it describes the technical architecture of Data Links, developing applications in a Data Links environment, and planning a deployment of Data Links. Then, it covers administering a Data Links environment, setting up Tivoli Storage Manager as a backup server with Data Links, and implementing high-availability cluster multiprocessing (HACMP) with Data Links. It includes a full chapter on data replication and, in particular, the replication of Data Linked files. It then describes the Reconcile utility and how the DB2 backup and recovery mechanism supports Data Links. This redbook concludes by providing some hints and tips for problem determination in a Data Links environment.
This IBM Redbook is intended to be read by anyone who requires both introductory and detailed information on Data Links. Prior to reading this redbook, you should have a good understanding of DB2 Universal Database, and in particular, be familiar with data replication, database backup, and recovery concepts.
The team that wrote this redbook
This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization (ITSO), San Jose Center.
Rodolphe Michel is a Senior Data Management Specialist for DB2 UDB on UNIX and Windows NT at the ITSO, San Jose Center, where he conducts projects on all areas of DB2 UDB. He writes extensively and teaches IBM classes and workshops worldwide on all areas of DB2 Universal Database.
Amit Arora is a Sr. Software Engineer in IBM India Software Labs. He has two years of experience as a developer in the Data Links Project. He holds a Bachelor of Engineering (Honors) degree in Computer Science from REC Durgapur, India. His areas of expertise include UNIX internals and Data Links technology.
Kevin Crooks is a Database Administrator for the Boeing Company in Seattle, Washington (USA). He has 12 years of experience on DB2 for OS/390 and four years of expertise in the DB2 Universal Database field. He has worked at Boeing for 15 years. His areas of expertise include Data Links and DB2 UDB on AIX. He is also an IBM certified DB2 UDB database administrator (DBA).
Aman Lalla is a DB2 UDB Engine Support Specialist at the IBM Toronto Laboratory in Canada. He has five years of experience with DB2 on the UNIX and Intel platforms. His areas of expertise include database recovery and problem determination. He has two years of Data Links experience. Prior to joining the IBM Toronto Lab, he was part of IBM Global Services South Africa, providing on-site DB2 Common Server/UDB customer support.
David Shields is a DB2 Database Administrator for the Boeing Company in Seattle, Washington (USA). He has worked with DB2 for five years, including two years on OS/390 and three years on AIX. He provides database support to the Boeing engineering communities in Seattle and St. Louis, Missouri. He also worked as an IMS DBA for nine years prior to working with DB2.
Thanks to the following people for their contributions to this project:
Nagraj Alur, Karen Brannon, Vitthal Gogate, Joshua W Hui, Inderpal Narang (Inventor of the Data Links technology), Ajay Sood, Mahadevan Subramanian, Parag Tijare
IBM Almaden Research Center, San Jose, USA

Poorna Ambati, Frank Butt, Steven Elliot (Manager of the DB2 UDB Data Links Development), Graziela Kunde, Bomma Shashidhar, Mohan V Singamshetty, S R Sreejith
IBM Silicon Valley Lab, San Jose, USA

Suparna Bhattacharya, Amit Das
IBM Software Labs, Bangalore, India
Brian Baker and Amr Roushdi, of IBM Dassault Systèmes International Competency Center (IDSICC), Paris, France, who gave us permission to reproduce their report “Installing & Configuring VPM with DB2 Data Links” in Appendix C, “VPM and Data Links” on page 307.
Special notice
This publication is intended to help database developers, database administrators, and system administrators to deploy a Data Links environment. The information in this publication is not intended as the specification of any programming interfaces that are provided by DB2 Universal Database or Data Links. See the PUBLICATIONS section of the IBM Programming Announcement for the above products for more information about what publications are considered to be product documentation.
IBM trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States and/or other countries:
e (logo)®, AFS®, AIX®, AS/400®, DataPropagator™, DB2®, DB2 Universal Database™, DFS™, DPI®, DRDA®, IBM®, IBM.COM™, Informix™, MVS™, Redbooks Logo, OS/2®, OS/390®, Perform™, Redbooks™, RETAIN®, S/390®, SP™, Tivoli®, TME®, Lotus®, Lotus Notes®, Notes®, Domino™
Comments welcome
Your comments are important to us!
We want our IBM Redbooks to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:
- Use the online Contact us review redbook form found at: ibm.com/redbooks
- Send your comments in an Internet note to: [email protected]
- Mail your comments to the address on page ii.
Chapter 1. Introduction
DB2 is the IBM family of relational database management system (RDBMS) products, with DB2 Universal Database (UDB) being the company's flagship for the implementation of object-relational extensibility. Data Links is a new feature of DB2 UDB that extends the management umbrella of the RDBMS to data stored in external operating system files, as if the data were stored directly in the database. DB2 Data Links is available in the following environments:
- Journaled File System (JFS) on IBM AIX
- File System Migrator (FSM) on IBM AIX
- Distributed File Service (DFS) in Transarc’s Distributed Computing Environment (DCE) on IBM AIX
- UNIX File System (UFS) on Sun Solaris
- NTFS-formatted drive on Windows NT
- Integrated File System (IFS) on IBM eServer iSeries (AS/400)
Data Links provides several levels of control over external data such as referential integrity, access control, coordinated backup and recovery, and transaction consistency.
Referential integrity is supported with Data Links to ensure that users cannot delete or rename any external file as long as it is referenced in the database. Access control is enhanced in that DB2 permissions can be used to grant or deny a user the ability to read a referenced external file; this read access control is optional. With coordinated backup and recovery, the DBMS is responsible for backup and recovery of external data in synchronization with the associated database; this type of control over external data is also optional. Transaction consistency requires that changes that affect both the database and an external file be executed within a transactional context to preserve the logical integrity and consistency of the data.
1.1 Why Data Links
The amount of data stored digitally is growing rapidly because computer systems and storage systems have become very affordable. The file paradigm is very common for such data types as video, image, text, graphics, and engineering drawings because capture, edit, and delivery tools use the file paradigm for these data types. A large number of applications store, retrieve, and manipulate data in files.
These applications may use files to store their data for one or more of the following reasons:
- Cost: You should consider the expense required to rewrite applications that use standard file I/O semantics to use a database as a repository. Also, your applications may use existing tools that work with the file paradigm. Replacing these tools can be expensive.
- Performance: The store and forward model of data is unacceptable for performance reasons. For example, it may be unacceptable for the database manager to materialize a Binary Large Object (BLOB) into a file, and the reverse, each time the data needs to be accessed as a file. Also, data is captured in high volumes, and you do not want to store it in the database.
- Network considerations: You want to access data directly from a file server that is physically close to a workstation. For example, the file server can be configured so that the network distance to the user is much shorter than the distance to the database where all the BLOBs are stored. The number of bytes that flow for a large object is much larger than the number of bytes for the answer to an SQL query. Network distance between resources is, therefore, a significant consideration.
- Isochronous delivery: The application uses a stream server because it has real-time requirements for delivery and capture. The data is expected to be large, and you may require isochronous delivery. An example of isochronous delivery is a video server that delivers high-quality (or “jitter-free”) video to a client workstation in real time. In these kinds of applications, it is likely that such data will not be moved into the database as a BLOB, but will rather stay on the file server.
Many of these applications need search capabilities to find the data in the files. These search capabilities, however, do not require physically bringing the data into the database system, because the raw content of the files is not needed for the query. Typically, you would extract features of an image or a video and store them in the database for performing a search on the extracted features. Examples of features that can be extracted from an image are color, shape, and texture. The IBM DB2 Universal Database Extender for Image product supports extraction and search functions on such features.
In general, these applications store a reference to such files along with parametric data that describes their contents. This approach combines the search capabilities of SQL with the advantages of working directly with files to manipulate the raw data. The DB2 relational extenders for text, voice, image, and so on provide this functionality. The extenders allow you to specify whether the object itself is maintained inside or outside the database.
Currently, the DB2 relational extenders do not provide referential integrity between files on a server and their references in databases. Therefore, it is possible to independently delete either the reference or the file. Moreover, the extenders do not provide access control to the related files or coordinated backup and recovery schemes for a database and its associated files.
DB2 Data Links technology solves these problems and provides the functionality required by such applications. Future releases of the DB2 relational extenders will use Data Links technology.
1.2 Data Links overview
By extending the reach of the RDBMS to operating system files, Data Links gives users flexibility to store data inside or outside the database as appropriate. To store and reference data outside of a DBMS, a database application developer declares a column of DATALINK data type when creating an SQL table. The value stored in the DATALINK column is then used to represent and reference data in an external file.
Figure 1-1 illustrates the architecture of the Data Links technology. As shown in this figure, Data Links has two components:
Data Links engine
Data Links Manager
[Figure 1-1 diagram. Labeled elements: DB2 Application; Archive Server; SQL Access Path; Standard File Access Protocol; Control Path for Data Links Integrity; DB2 Server with Data Links Ext.; db2agents; Data Links Manager on File Server; Data Links File Manager (DLFM); DLFM_DB (meta data repository); Data Links File System Filter (DLFF); Native File System: JFS, NTFS, UFS (Solaris), DFS-DCE/AIX; Storage]
Figure 1-1 Architecture of the Data Links technology
The Data Links engine resides on the host database server and is implemented as part of the database (DB2) engine code. It is responsible for processing SQL requests involving DATALINK columns such as table creation, and select, insert, delete, and update of records with a DATALINK column.
The Data Links Manager consists of two components:
Data Links File Manager (DLFM)
Data Links File System Filter (DLFF)
At a high level, DLFM applies constraints on the files that are referenced by the host database, and DLFF enforces the constraints when file system commands or operations affect these files. For example, a file rename or delete would be rejected if that file was referenced by the database.
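The division of labor described above can be illustrated with a toy model. The following is only a minimal sketch: the class names (ToyFileManager, ToyFilter, LinkedFileError), the file paths, and the in-memory "file system" are all hypothetical and say nothing about how DLFM or DLFF are actually implemented. It shows just the idea that one component records which files are referenced and another consults that record before allowing a rename or delete.

```python
class LinkedFileError(PermissionError):
    """Raised when an operation would break a database reference (hypothetical)."""


class ToyFileManager:
    """Stand-in for the DLFM role: records which paths the database references."""

    def __init__(self):
        self._linked = set()

    def link(self, path):
        self._linked.add(path)

    def unlink(self, path):
        self._linked.discard(path)

    def is_linked(self, path):
        return path in self._linked


class ToyFilter:
    """Stand-in for the DLFF role: checks file operations against the registry."""

    def __init__(self, manager, files):
        self._manager = manager
        self._files = set(files)  # simulated file system contents

    def _check(self, path):
        # Reject any operation on a file that the database still references.
        if self._manager.is_linked(path):
            raise LinkedFileError(f"{path} is referenced by the database")

    def rename(self, old, new):
        self._check(old)
        self._files.remove(old)
        self._files.add(new)

    def delete(self, path):
        self._check(path)
        self._files.remove(path)

    def exists(self, path):
        return path in self._files


manager = ToyFileManager()
fs = ToyFilter(manager, ["/data/xray1.img", "/data/notes.txt"])
manager.link("/data/xray1.img")

fs.delete("/data/notes.txt")      # not linked: the delete goes through
try:
    fs.delete("/data/xray1.img")  # linked: the delete is rejected
    rejected = False
except LinkedFileError:
    rejected = True
```

In this sketch the linked file survives the delete attempt, while the unlinked file is removed as usual.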
1.2.1 Data Links File Manager (DLFM)
The Data Links File Manager resides with the file server, which can be local or remote to the host database server, and plays a key role in managing external files. It is responsible for executing the link/unlink operations with transactional semantics within the file system. To do this, DLFM maintains its own DB2 repository about files that are linked to (referenced in) the database. When a file is initially linked to the database, the DLFM applies the constraints for referential integrity, access control, and backup and recovery as specified in the DATALINK column definition. If the DBMS controls read access, for example, the DLFM changes the owner of the file to the DBMS and marks the file “read only” as well.
All these changes to the DLFM repository and to the file system are applied as part of the same DBMS transaction as the initiating SQL statement. If the SQL statement is rolled back, the changes made by the DLFM on the file system side are undone as well.
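The rollback behavior can be sketched with an undo log. This is a hedged illustration only: ToyFile, ToyLinkTransaction, the "dbms" owner name, and the file path are hypothetical, and the real DLFM coordinates with the DB2 transaction rather than keeping a Python list. The sketch records the prior owner and read-only flag before a link changes them, so that a rollback can restore both the repository and the file attributes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFile:
    owner: str
    read_only: bool = False


@dataclass
class ToyLinkTransaction:
    files: dict        # path -> ToyFile, a simulated file system
    repository: set    # paths currently recorded as linked
    _undo: list = field(default_factory=list)

    def link(self, path, dbms_owner="dbms"):
        if path in self.repository:
            raise ValueError(f"{path} is already linked")
        f = self.files[path]
        # Remember the previous attributes so the change can be undone.
        self._undo.append((path, f.owner, f.read_only))
        self.repository.add(path)
        f.owner, f.read_only = dbms_owner, True

    def commit(self):
        # Changes become permanent; nothing is left to undo.
        self._undo.clear()

    def rollback(self):
        # Undo in reverse order: repository entry first, then file attributes.
        while self._undo:
            path, owner, read_only = self._undo.pop()
            self.repository.discard(path)
            f = self.files[path]
            f.owner, f.read_only = owner, read_only


files = {"/cad/part7.model": ToyFile(owner="engineer")}
repo = set()
tx = ToyLinkTransaction(files, repo)

tx.link("/cad/part7.model")
during = (files["/cad/part7.model"].owner,
          files["/cad/part7.model"].read_only,
          "/cad/part7.model" in repo)

tx.rollback()  # the initiating statement was rolled back
after = (files["/cad/part7.model"].owner,
         files["/cad/part7.model"].read_only,
         "/cad/part7.model" in repo)
```

After the rollback, the file has its original owner and is writable again, and the repository no longer lists it.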
The DLFM is also responsible for coordinating backup and recovery of external files with the database. When the DBMS transaction that includes a Link File operation commits and the DBMS is responsible for recovery of the file, the DLFM initiates a backup of the newly linked file. This file backup is done asynchronously and is not part of the database transaction for performance reasons.
Because the referenced file has already been backed up in this way, the database backup itself is not slowed down. This is particularly important in the case of very large files. Coordinated backup and recovery of external files with DB2 data can be done directly to disk or through an archive server supported by DB2 UDB, such as Tivoli Storage Manager.
1.2.2 Data Links File System Filter (DLFF)
The Data Links File System Filter is a thin, database-control layer on the file system that intercepts certain file system calls (for example, file-open, file-rename, and file-delete) issued by the application. If the file is referenced in a database, the DLFF is responsible for enforcing referential integrity constraints and access-control requirements defined for the file. This ensures that any access request meets DBMS security and integrity requirements.
The DLFF will, for example, reject a user-level request to rename or delete a file referenced by the database. This avoids “dangling pointers” in which a file is referenced by the database, but the actual file does not exist. DLFF also validates any authorization token embedded in the file pathname for a file-open operation.
Data Links provides a new and innovative DBMS capability. By providing tight integration of file system data with the object-relational DBMS, Data Links allows DB2 UDB to guarantee the integrity of data whether it is stored inside or outside the database. Although companies in the CAD/CAM application marketplace were the early supporters of Data Links, Data Links applies to application problems in a wide variety of market segments, especially as it relates to content management. Web, Internet, and e-commerce applications are important examples of these new market segments.
1.2.3 The DATALINK data type
Data Links technology includes the DATALINK data type, an SQL data type in DB2 Universal Database whose values reference an object stored outside the database.
You use the DATALINK data type, just like any other SQL data type, to define columns in tables. In NT File System (NTFS) and JFS environments, the DATALINK values encode the name of a Data Links server containing the file and the filename in terms of a Uniform Resource Locator (URL). The DATALINK value is robust in terms of integrity, access control, and recovery. DB2 treats a DATALINK value as if the object were stored in the database. You register a set of known Data Links servers. The only Data Links server names that you can specify in a DATALINK value are those that have been registered to a DB2 database.
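To make the URL encoding concrete, the sketch below splits a URL-style reference into its scheme, server name, and file path, and rejects servers that are not in a registered set. It is only an illustration using Python's standard urllib.parse module: the function name split_datalink_url, the server name dlfs1.example.com, the http scheme, and the path are all assumptions made for the example, not values or interfaces taken from DB2.

```python
from urllib.parse import urlsplit


def split_datalink_url(url, registered_servers):
    """Split a URL-style file reference into (scheme, server, path).

    Hypothetical helper: raises ValueError if the server name is missing
    or is not in the set of registered server names (given in lowercase).
    """
    parts = urlsplit(url)
    server = parts.hostname  # urlsplit returns the host name in lowercase
    if server is None or server not in registered_servers:
        raise ValueError(f"server not registered: {server!r}")
    return parts.scheme, server, parts.path


# Hypothetical registration list and reference.
registered = {"dlfs1.example.com"}
scheme, server, path = split_datalink_url(
    "http://dlfs1.example.com/data/xrays/patient42.img", registered
)

try:
    split_datalink_url("http://other.example.com/data/file.img", registered)
    unregistered_rejected = False
except ValueError:
    unregistered_rejected = True
```

Here the first reference is accepted and split into its parts, while the second is rejected because its server name was never registered.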
In Distributed Computing Environment-Distributed File Service (DCE-DFS) environments, the Data Links Manager is registered for the entire cell, and linked files are referred to in terms of a URL with the dfs scheme and the DFS pathname of the file.
Even though the DATALINK value represents an object that is stored outside the database system, you can use SQL queries to search parametric data to obtain the file name that corresponds to the query result. You can create indexes on files containing video, images, text, or other media formats, and store those attributes in tables along with the DATALINK value. With a central repository of files on a file server and DATALINK data types in a database, you can obtain answers to questions like: What do I have? Where can I find what I’m looking for?
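The pattern of searching parametric data to obtain a file reference can be sketched as follows. This example deliberately uses SQLite from Python's standard library so that it runs anywhere, with a plain TEXT column standing in for a DATALINK column; it therefore shows only the query pattern, not DB2 syntax and not any of the integrity guarantees Data Links adds. The table name, column names, URLs, and rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# image_ref is a plain TEXT stand-in for a DATALINK column (assumption).
conn.execute(
    "CREATE TABLE xray ("
    " patient_id INTEGER,"
    " body_part TEXT,"
    " taken_on TEXT,"
    " image_ref TEXT)"
)

rows = [
    (42, "knee", "2001-03-14", "http://dlfs1.example.com/xrays/p42_knee.img"),
    (42, "chest", "2001-05-02", "http://dlfs1.example.com/xrays/p42_chest.img"),
    (57, "knee", "2001-01-09", "http://dlfs1.example.com/xrays/p57_knee.img"),
]
conn.executemany("INSERT INTO xray VALUES (?, ?, ?, ?)", rows)

# Search the attributes; the answer is the reference to the file,
# which the application then opens directly on the file server.
matches = conn.execute(
    "SELECT image_ref FROM xray WHERE body_part = ? ORDER BY taken_on",
    ("knee",),
).fetchall()

conn.close()
```

The query touches only the attribute columns; the image files themselves are never read by the database.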
These are examples of applications that can use the DATALINK data type:
Medical applications, in which X-rays are stored on the file server and the attributes are stored in a database.
Entertainment industry applications that perform asset management of video clips. The video clips are stored on a file server, but attributes about the clips are stored in a database. Access control is required for accessing the video clips based on database privileges of accessing the meta information.
World Wide Web applications that manage millions of files and allow access control based on database privileges.
Financial applications, which require distributed capture of check images and a central location for those images.
CAD/CAM applications, where the engineering drawings are kept as files, and the attributes are stored in the database. Queries are run against the drawing attributes.
1.3 Applications that use Data Links
Two applications illustrate the wide range of uses that can benefit greatly from Data Links:
Link Integrity+
Dassault Systemes' VPM product
1.3.1 Link Integrity+
Link Integrity+ is a Web asset integrity solution from the IBM Almaden Research Center in San Jose, California. It exploits IBM DB2 UDB’s unique Data Links technology to guarantee the referential integrity (RI) of an intranet’s Web objects such as Web pages, hyperlinks, images, server-side programs, and templates.
While there are many products in the marketplace that report on broken links and missing images “after-the-fact”, Link Integrity+ proactively prevents the occurrence of broken links and the irksome “404 file not found” message. It does this by inhibiting any malicious or accidental changes to Web pages that could compromise the referential integrity of Web assets.
Link Integrity+'s architecture supports a two-phase approach to delivering Web content:
Phase 1: Validates the referential integrity of hyperlinks, images, server-side programs, and templates
Phase 2: “Installs” the Web content on the Web site in atomic fashion, with minimal transient problems
Link Integrity+ also supports the enforcement of an organization's guidelines for Web content such as the inclusion of appropriate headers, footers, and disclaimers. A critical Link Integrity+ function is its support of multiple independent webmaster domains within a geographically distributed intranet of heterogeneous Web servers. It includes an e-mail and pager notification system that alerts webmasters when deletions or updates of Web pages in another webmaster's domain affect Web pages in their own domain.
Link Integrity+ exploits IBM DB2 UDB Data Links, Java JDBC, JavaMail, JavaBeans Activation Framework, Java Native Interface (JNI), Structured Query Language (SQL), and Extensible Markup Language (XML) technologies in its implementation.
The Link Integrity+ architecture provides the ability to deliver a significantly higher level of Web asset integrity to an organization's intranet. It synergistically integrates the IBM DB2 UDB unique Data Links technology with innovative application design. Link Integrity+ is a prototype that demonstrates that its architecture is capable of supporting the “real world” environment of geographically distributed heterogeneous Web sites with multiple webmasters managing multiple domains or sub-domains. Its staging area approach enables the enforcement of referential integrity of Web assets and the enforcement of an organization's guidelines for Web content.
Because it is the main conduit for getting content on the Web, Link Integrity+ can be made sensitive to information of interest to specific individuals. In other words, Link Integrity+ can be integrated with personalization and information delivery systems to notify individuals and deliver information to them in a timely fashion, based on available user profiles and subscription information. The Link Integrity+ trigger mechanism for use by content developers significantly enhances the productivity of webmasters by taking over routine and mundane activities and alerting them only when problems are detected. By guaranteeing the integrity of an intranet's hyperlinks, Link Integrity+ minimizes the chance of an end user encountering the “404 file not found” message, which contributes to a positive experience for the user visiting the Web site. Note that an end user may still experience the “404 file not found” message due to caching of pages that may occur in the browser, Internet Service Provider (ISP), proxy, and other caches.
For more information, refer to “Link Integrity+: A Web Asset Integrity Solution”, Nagraj Alur, Ramani Ranjan Routray, IBM Almaden Research Center paper.
1.3.2 VPM with DB2 Data Links
This application demonstrates how IBM middleware (DB2 and Data Links) can provide solutions for data archiving and restoration across a large enterprise. It applies specifically when working with IBM and Dassault Systemes CATIA and VPM.
Data Links technology has been supported in VPM since the general availability (GA) of VPM 1.2. This technology support provides four primary capabilities:
Logical data consistency: For example, an engineer cannot delete or rename a file that is referenced by its corresponding part description in the database.
Transaction consistency: If a transaction is rolled back in the database, the link to the appropriate version of the file at this site is maintained.
Security and access: Files controlled by Data Links can either be totally protected by the database, preventing unauthorized file system access, or opened, to allow file system access.
Synchronized backup and recovery: Using DB2 with Data Links ensures consistent backup and recovery of ENOVIAVPM meta data and the associated CATIA models. This makes the overall process more automatic and less database administrator (DBA)-intensive. In the past, administrative tasks were performed outside of the CATIA environment. This required a separate backup strategy for external CATIA files, which introduced a large risk of inconsistencies between the database and related external files.
For additional information, refer to Appendix C, “VPM and Data Links” on page 307.
Chapter 2. Technical architecture
This chapter provides a detailed description of the Data Links technical architecture. The following topics are discussed:
Overview of the Data Links architecture
The SQL data type DATALINK
How Data Links maintains security
The different components of Data Links on AIX, Solaris, and Windows
The different components of Data Links on DCE-DFS
2.1 Overview of the Data Links architecture
DB2 Data Links can be installed in the following environments:
Journaled File System (JFS) on IBM AIX
File System Migrator (FSM) on IBM AIX
Note: FSM is the file system filter for Tivoli Space Manager client (also known as Hierarchical Storage Manager (HSM)), which provides the space management capabilities. Data Links support for Tivoli Space Manager is discussed in Chapter 6, “Using Tivoli Storage Manager” on page 107.