Managing Files Using DB2


Data Links: Managing Files Using DB2

Understand the Data Links architecture, unleashed for the first time

Explore planning, migration, the Reconcile utility, and recovery

Learn about HSM and HACMP support

Rodolphe Michel, Amit Arora, Kevin Crooks, Aman Lalla, David Shields

ibm.com/redbooks

International Technical Support Organization

Data Links: Managing Files Using DB2

December 2001

SG24-6280-00

Take Note! Before using this information and the product it supports, be sure to read the general information in “Special notices” on page 343.

First Edition (December 2001)

This edition applies to IBM DB2 Universal Database EE V7 and Data Links V7.

Comments may be addressed to: IBM Corporation, International Technical Support Organization, Dept. QXXE Building 80-E2, 650 Harry Road, San Jose, California 95120-6099

When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.

© Copyright International Business Machines Corporation 2001. All rights reserved. Note to U.S. Government Users – Documentation related to restricted rights – Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.

Contents

Figures ...... ix

Tables ...... xv

Preface ...... xvii
The team that wrote this redbook ...... xviii
Special notice ...... xix
IBM trademarks ...... xx
Comments welcome ...... xx

Chapter 1. Introduction ...... 1
1.1 Why Data Links ...... 2
1.2 Data Links overview ...... 3
1.2.1 Data Links File Manager (DLFM) ...... 4
1.2.2 Data Links File System Filter (DLFF) ...... 5
1.2.3 The DATALINK data type ...... 6
1.3 Applications that use Data Links ...... 7
1.3.1 Link Integrity+ ...... 7
1.3.2 VPM with DB2 Data Links ...... 8

Chapter 2. Technical architecture ...... 11
2.1 Overview of the Data Links architecture ...... 12
2.1.1 Data Links server ...... 12
2.1.2 DB2 Universal Database server ...... 15
2.1.3 DB2 client ...... 15
2.2 DATALINK data type ...... 18
2.2.1 Attributes of DATALINK type ...... 23
2.2.2 Scalar functions for DATALINK data type ...... 24
2.2.3 DATALINK options ...... 26
2.3 Security/authentication ...... 30
2.3.1 Concept of tokenized file names ...... 30
2.3.2 Database configuration parameters ...... 31
2.3.3 How access tokens work ...... 32
2.4 Data Links on UNIX and Windows ...... 33
2.4.1 Data Links File Manager (DLFM) ...... 34
2.4.2 Data Links File System Filter (DLFF) ...... 43
2.4.3 Linking and unlinking files ...... 47
2.4.4 Transaction support ...... 53
2.5 Data Links on DCE-DFS ...... 57

Chapter 3. Application development ...... 65
3.1 Choosing suitable applications for using Data Links ...... 66
3.2 Transactional semantics for files in the application ...... 66
3.3 Data Links versus LOBs ...... 67
3.3.1 Using LOBs ...... 67
3.3.2 Using Data Links ...... 68
3.4 Application development tasks ...... 69
3.4.1 Application deployment considerations ...... 69
3.4.2 Checking whether Data Links has been enabled ...... 70
3.4.3 Choosing DATALINK options ...... 71
3.4.4 Changing DATALINK options ...... 74
3.4.5 Querying DATALINK options ...... 74
3.5 Coding considerations ...... 75
3.5.1 Host variable declaration ...... 75
3.5.2 Creating and linking a new file ...... 76
3.5.3 Reading a linked file ...... 77
3.5.4 Updating a linked file ...... 79
3.5.5 Unlinking a file ...... 79
3.5.6 Scalar functions used with the DATALINK data type ...... 80
3.5.7 Error handling ...... 81
3.6 Using multiple file servers ...... 82
3.6.1 Supporting multiple links to the same file ...... 83
3.7 Migrating existing applications to use Data Links ...... 84
3.7.1 Migrating an application that uses files ...... 84
3.7.2 Migrating an application that uses LOBs ...... 85

Chapter 4. Planning Data Links deployment ...... 91
4.1 Deployment options ...... 92
4.1.1 Single server implementation ...... 92
4.1.2 Single Universal Database and multiple DLFMs ...... 92
4.1.3 Multiple Universal Databases and single DLFM ...... 93
4.1.4 Multiple DLFMs on a single host ...... 94
4.1.5 Multiple DB2s and multiple DLFMs ...... 94
4.2 File systems and sizing ...... 95
4.2.1 The DLFM backup (archive directory) ...... 96
4.2.2 Data Links controlled file systems ...... 97
4.2.3 Using NFS and NIS ...... 97
4.3 Planning the backup of the DLFM_DB database ...... 98
4.4 Performance tuning tips ...... 98
4.4.1 Optimum logging levels ...... 98
4.4.2 Location of file servers ...... 98
4.4.3 Number of files per directory ...... 99
4.4.4 Token algorithms ...... 99
4.4.5 DLFM backup, home, and log directories ...... 99

Chapter 5. Data Links Manager administration ...... 101
5.1 Identifying the tables and servers in Data Links ...... 102
5.2 Checking for Data Links control over a file system ...... 103
5.3 Other useful DLFM commands ...... 104

Chapter 6. Using Tivoli Storage Manager ...... 107
6.1 Introduction to Tivoli Storage Manager ...... 108
6.1.1 Storage device concepts ...... 110
6.1.2 Policy concepts ...... 112
6.1.3 Security concepts ...... 114
6.1.4 Communication methods ...... 114
6.2 Data Links with the Backup-Archive Client ...... 114
6.3 Data Links and Tivoli Space Manager ...... 116
6.3.1 Overview of Tivoli Space Manager ...... 116
6.3.2 Tools, processes, and interfaces ...... 123
6.3.3 Data Links support for HSM ...... 125
6.3.4 Current restrictions ...... 127

Chapter 7. High Availability support on AIX ...... 131
7.1 Introduction ...... 132
7.2 HACMP cluster configuration for hot standby ...... 132
7.2.1 Hot standby setup for a host DB2 server ...... 134
7.2.2 Hot standby setup for a Data Links server ...... 135
7.3 HACMP cluster configuration for mutual takeover ...... 136
7.3.1 Configuration ...... 137
7.3.2 Sequence of events ...... 141
7.4 The scripts ...... 142
7.4.1 Additional considerations for DB2 Universal Database Version 6 ...... 146
7.4.2 Final considerations ...... 147

Chapter 8. Creating a new database ...... 149
8.1 Overview ...... 150
8.2 Backup ...... 151
8.3 EXPORT (dlfm_export) ...... 152
8.4 The db2look command ...... 154
8.5 The restore command ...... 155
8.6 Copying the linked files ...... 156
8.7 DLFM commands ...... 157
8.8 Running the Import utility ...... 157
8.9 Running the Load utility ...... 158

Chapter 9. Data replication ...... 161
9.1 Overview of DB2 replication ...... 162
9.2 Why replicate linked files ...... 162
9.3 Supported platforms ...... 163
9.4 Replication components ...... 164
9.4.1 Change-capture ...... 164
9.4.2 Apply ...... 166
9.4.3 Subscription sets and subscription set members ...... 167
9.5 Data Links replication ...... 168
9.5.1 Capturing DATALINK values ...... 169
9.5.2 How Apply handles DATALINK values ...... 169
9.6 Implementing replication with Data Links ...... 172
9.6.1 Before we begin ...... 172
9.6.2 Defining the replication source ...... 174
9.6.3 Defining the subscription set and subscription set member ...... 178
9.6.4 Configuring the source database ...... 182
9.6.5 Binding the Capture and Apply programs ...... 183
9.6.6 Creating the password file for the Apply program ...... 183
9.6.7 Configuration files used by ASNDLCOPY ...... 184
9.6.8 Configuration files used by ASNDLCOPYD ...... 187
9.6.9 Starting and stopping the Capture and Apply programs ...... 188

Chapter 10. The Reconcile utility ...... 191
10.1 Overview ...... 192
10.2 When to run the Reconcile utility ...... 194
10.3 Situations that require the Reconcile utility ...... 196
10.3.1 Reconcile algorithm ...... 197

Chapter 11. Recovery ...... 201
11.1 Overview ...... 202
11.1.1 Crash recovery ...... 202
11.1.2 Version or full database recovery ...... 205
11.1.3 Restore and rollforward recovery ...... 207
11.2 DLFM backup considerations ...... 208
11.2.1 Environment backup considerations ...... 210
11.3 DLFM restore considerations ...... 211
11.4 Recovery history file ...... 214
11.4.1 Events recorded in the history file ...... 214
11.4.2 Data recorded in the history file ...... 215
11.5 Restoring an offline backup without rollforward ...... 215
11.6 Restoring and rolling forward to a point in time ...... 219
11.7 Tablespace recovery ...... 224
11.8 Recovering the dlfm_db to a point in time ...... 231

Chapter 12. Garbage collection ...... 235
12.1 Garbage collection ...... 236
12.2 Garbage collection scenario ...... 238

Chapter 13. Migrating to DB2 UDB Version 7 ...... 243
13.1 Migration options ...... 244
13.1.1 DB2IMIGR and MIGRATE database commands ...... 244
13.1.2 Migrating the DB2 UDB V6.x database server ...... 250
13.1.3 Migrating databases using an offline backup ...... 254

Chapter 14. Moving a Data Links file system to a new disk ...... 259
14.1 Migrating a DLFS-enabled file system (AIX) ...... 260
14.2 Migrating a DLFS-enabled file system (Solaris) ...... 262

Chapter 15. Replacing or upgrading a machine ...... 265
15.1 Replacing or upgrading a DB2 machine ...... 266
15.1.1 Assumption ...... 266
15.1.2 Steps to perform ...... 266
15.2 Replacing or upgrading a DLFM machine ...... 267
15.2.1 Steps to perform ...... 267

Chapter 16. Problem determination ...... 269
16.1 Solving problems ...... 270
16.1.1 Problem solving process ...... 270
16.1.2 Information needed to analyze a problem ...... 271
16.1.3 DB2 Universal Database or DLFM ‘hang’ situations ...... 273
16.1.4 DB2 Universal Database or DLFM crash ...... 275
16.1.5 The DB2 Trace ...... 276
16.2 Solutions to common problems ...... 286
16.2.1 Available resources ...... 287
16.2.2 DLFM server problems ...... 287
16.2.3 DB2 server problems ...... 290
16.2.4 File system problems ...... 292
16.2.5 Frequently Asked Questions (FAQs) ...... 294

Appendix A. BNF specifications for DATALINK ...... 297

Appendix B. Overview of DCE-DFS on AIX ...... 301
Distributed Computing Environment (DCE) ...... 302
Distributed File Service (DFS) ...... 303

Appendix C. VPM and Data Links ...... 307
Installation overview ...... 308
Installing DB2 Data Links Manager 6.1 GA ...... 309
Preliminary installation steps ...... 310
Data Links post-installation ...... 311
Making Data Links work with VPM ...... 312
VPM and Data Link tokens ...... 314
Adapting VPM to work with Data Links ...... 317
Writing a model ...... 320
Additional information ...... 329

Appendix D. Logging priorities for DLFF and DLFSCM ...... 331
Modifying the DLFF logging priorities on AIX ...... 332
Modifying the DLFSCM logging priorities in DCE-DFS (on AIX) ...... 334
Modifying the DLFF logging priorities on Solaris ...... 336
Modifying the DLFF logging level on Windows ...... 337

Related publications ...... 339
IBM Redbooks ...... 339
Other resources ...... 339
Referenced Web sites ...... 340
How to get IBM Redbooks ...... 341
IBM Redbooks collections ...... 341

Special notices ...... 343

Index ...... 345

Figures

1-1 Architecture of the Data Links technology ...... 4
2-1 Data Links overview in UNIX and Windows environments ...... 16
2-2 Data Links overview in a DCE-DFS environment ...... 17
2-3 DATALINK data type ...... 18
2-4 Retrieving the Data Link value ...... 19
2-5 Accessing Data Linked files through a browser ...... 20
2-6 DATALINK column definition syntax ...... 29
2-7 Relationship between DB2 servers and Data Links servers ...... 34
2-8 DLFM process model: DB2 server ...... 38
2-9 DLFM process model: Data Links Manager ...... 39
2-10 DLFM process model: Complete picture ...... 40
2-11 Attributes before the link operation ...... 42
2-12 Attributes after the link operation ...... 42
2-13 Overview of Data Links implementation ...... 46
2-14 Link-file operation ...... 49
2-15 Control flow of SQL insert statement ...... 50
2-16 Unlink process ...... 52
2-17 Commit processing transactions ...... 56
2-18 DLFMs in a single DCE cell ...... 59
2-19 The DMAPP implementation ...... 61
2-20 Data Links architecture on DCE-DFS ...... 63
3-1 DATALINK access token ...... 73
3-2 DATALINK options stored in SYSCOLPROPERTIES table ...... 75
3-3 Using multiple DLFM file servers ...... 83
3-4 Externalizing LOB data ...... 89
3-5 Moving LOB table data to DATALINK table ...... 90
4-1 Single server implementation ...... 92
4-2 Single UDB and one to many DLFMs ...... 93
4-3 Multiple UDBs and a single DLFM ...... 94
4-4 Multiple DB2 and multiple DLFMs ...... 95
5-1 Select from sysibm.syscolproperties ...... 102
5-2 List databases and Data Links Managers ...... 102
5-3 The dlfs file systems ...... 103
6-1 Storage management ...... 112
6-2 Policy concepts ...... 113
6-3 Tivoli Space Manager overview ...... 117
6-4 Data Links and Tivoli Space Manager ...... 125
6-5 Selective Migration of READ PERMISSION DB file ...... 127

6-6 dostatfs.c ...... 128
6-7 VFS numbers of DLFS and FSM ...... 128
6-8 Result of dostatfs on /dlfsfsmtest ...... 129
6-9 dsmls utility behavior ...... 129
7-1 Host DB2 (or) Data Links File Manager cluster ...... 133
7-2 Mutual takeover environment ...... 137
7-3 The /var/db2 files show the global variables and instances ...... 140
7-4 The dlfs_cfg file must exist on both servers ...... 140
7-5 The contents of /etc/vfs ...... 141
7-6 List of dlfm_ programs ...... 147
8-1 The steps used to create the new database ...... 151
8-2 Backup database command ...... 152
8-3 Quiesce and export to the IXF file type ...... 152
8-4 Contents of the export control file ...... 153
8-5 Sample dlfm_export ...... 153
8-6 Export using delimited output ...... 154
8-7 Delimited file before and after editing ...... 154
8-8 The db2look command and the output it produced ...... 155
8-9 Restore command, get dbm cfg, and list datalinks managers ...... 156
8-10 Sample dlfm_import ...... 157
8-11 The dlfm add_db and dlfm add_prefix commands ...... 157
8-12 Import delimited file with DATALINK column type ...... 158
8-13 The Load utility ...... 159
9-1 Change Capture ...... 165
9-2 Defining a replication source ...... 166
9-3 Apply program data flow ...... 167
9-4 Subscription set and subscription set members ...... 168
9-5 DATALINK values before and after replication ...... 169
9-6 File reference mapping ...... 170
9-7 SOURCE.MANAGERS table ...... 172
9-8 SOURCE.MANAGERS table contents ...... 173
9-9 Environment before replication ...... 173
9-10 Defining a replication source ...... 174
9-11 Selecting columns to be replicated ...... 175
9-12 Saving the replication source definition ...... 175
9-13 SQL to define the replication source ...... 176
9-14 Defining the replication source by running an SQL file ...... 176
9-15 Viewing the replication source ...... 177
9-16 Defining the replication subscription ...... 178
9-17 Define replication subscription dialog ...... 178
9-18 Changing the target table name ...... 179
9-19 Selecting the primary key for the target ...... 179
9-20 Restricting replicated rows ...... 180

9-21 Subscription timing ...... 181
9-22 Saving the replication subscription ...... 181
10-1 Reconcile warning when DLFM server is not available ...... 192
10-2 Extract of a lock snapshot for a table being reconciled ...... 193
10-3 Output of the db2_recon_aid utility with the CHECK option ...... 193
10-4 Extract of db2diag.log showing a table in DRP state ...... 194
10-5 Extract of a DB2DART report showing a table in DRP state ...... 194
10-6 Determining when to run the Reconcile utility ...... 196
11-1 Two-phase commit ...... 204
11-2 Version or full database recovery ...... 206
11-3 Rollforward recovery ...... 207
11-4 Asynchronous archive request ...... 209
11-5 Processing that takes place during a backup ...... 210
11-6 Environment backup considerations ...... 211
11-7 Processing that takes place during a restore ...... 212
11-8 Restore with the WITHOUT DATALINK option ...... 212
11-9 Restore without specifying the WITHOUT DATALINK option ...... 213
11-10 Selecting results prior to insert and restore ...... 216
11-11 The ls results of the Data Link file system prior to insert ...... 216
11-12 Inserting and selecting after a new link ...... 217
11-13 List files after the link operation has completed ...... 218
11-14 Restore command and files that were unlinked ...... 218
11-15 Restore of an offline backup ...... 219
11-16 List history to find backup and point in time ...... 220
11-17 Restore with rolling forward and rollforward pending status ...... 221
11-18 Rollforward to obtain minimum CUT time ...... 221
11-19 Rollforward and log messages ...... 222
11-20 Select statement with warning message ...... 222
11-21 Reconcile command and log messages ...... 223
11-22 Restore and rollforward to a point-in-time ...... 224
11-23 Removing dlfm_backup files and removing a Data Linked file ...... 225
11-24 Tablespace restore and rollforward ...... 225
11-25 Using db2dart to see the table status of DRP ...... 226
11-26 Selecting the data before reconcile is run ...... 227
11-27 Reconcile and the exceptions ...... 228
11-28 The ddl to create the exception table for reconcile ...... 228
11-29 Information from the exception table for the reconcile ...... 229
11-30 Selecting the data after reconcile has run ...... 230
11-31 Tablespace recovery scenario ...... 231
11-32 Restore command and dlfm stop ...... 232
11-33 Rollforward and messages ...... 232
11-34 The list registered databases output ...... 233
11-35 The db2_recon_aid utility and output ...... 233

11-36 DLFM_DB database point-in-time recovery ...... 234
12-1 Expired database backups ...... 237
12-2 Four database backups are taken ...... 238
12-3 Active database backup being restored ...... 239
12-4 Database backups taken with a new log sequence number ...... 239
12-5 Backup (BK1) is marked as expired ...... 240
12-6 New log sequence created after restore of backup (BK6) ...... 240
12-7 Garbage collection marks backup BK2 as expired ...... 241
12-8 All backups prior to and including BK5 are marked as expired ...... 241
12-9 Inactive databases may become active because they are retained ...... 242
13-1 DB2DART utility output reporting no errors ...... 245
13-2 Verifying that the database can be migrated with the db2ckmig utility ...... 246
13-3 Instance migration using the db2imigr utility ...... 246
13-4 Connecting to a database that requires migration ...... 247
13-5 Successful migration of the database using the migrate command ...... 247
13-6 Verifying that the database can be migrated with the db2ckmig utility ...... 248
13-7 Instance migration using the db2imigr utility ...... 248
13-8 Successful migration of the DLFM instance ...... 249
13-9 Output of the db2set command ...... 249
13-10 DB2DART utility output reporting no errors ...... 252
13-11 Stopping DB2 Services on Windows NT ...... 252
13-12 Verifying that the database can be migrated with the db2ckmig utility ...... 253
13-13 Extract of a recovery history file ...... 256
13-14 Restoring into an existing database ...... 256
13-15 Rollforward completing with a warning ...... 258
16-1 Extract of an entry written to the db2diag.log file ...... 273
16-2 Information about each component in the db2diag.log file ...... 273
16-3 Extract of a trap file ...... 274
16-4 Extract of a trace entry in the formatted trace file ...... 278
16-5 Information about each component in a formatted trace file ...... 279
16-6 An SQL1036 error message when connecting to the database ...... 280
16-7 Extract of the DB2DIAG.LOG with the SQL1036 error message ...... 281
16-8 Output of the DB2 Trace format command ...... 282
16-9 Function flow structure ...... 283
16-10 Extract of the trace flow file ...... 284
16-11 Extract of trace flow showing the SQL1036 error ...... 285
16-12 Trace format file ...... 286
B-1 DCE architecture ...... 302
B-2 CDS entry in DNS format ...... 305
C-1 DB2 V6 Fixpak 5 ...... 310
C-2 Interpreting DL_FEATURES values ...... 319
C-3 Creating a model in VPM ...... 321
C-4 Creating and saving a model ...... 322

C-5 Confirm Write ...... 322
C-6 Saved model in VPM ...... 323
C-7 Read-Only file ...... 324
C-8 Opening a model in CATIA ...... 325
C-9 A model in CATIA ...... 326
C-10 File under Data Links control now ...... 327
C-11 Backup directory ...... 328
C-12 Files backed up under the Backup directory ...... 328

Tables

2-1 Arguments to the SQLBuildDataLink function ...... 22
2-2 Possible combinations of DATALINK attributes ...... 29
2-3 DLFM results and corresponding actions by DLFF ...... 45
3-1 DATALINK options ...... 71
3-2 Host language variable declaration for DATALINKS data type ...... 76
4-1 Parameters that can affect the size of the archive directory ...... 96
B-1 Some commonly used terms in DCE-DFS environment ...... 306
C-1 Creating your file systems ...... 310

Preface

The amount of data that is stored digitally is growing rapidly because computer systems and storage systems have become very affordable. The file paradigm is very common for such data types as video, image, text, graphics, and engineering drawings because capture, edit, and delivery tools use the file paradigm for these data types. A large number of applications store, retrieve, and manipulate data in files. Many of these applications need search capabilities to find the data in the files. These search capabilities, however, do not require physically bringing the data into the database system, because their raw content is not needed for the query.

Typically, you would extract features of an image or a video and store them in the database for performing a search on the extracted features. The applications combine the search capabilities of SQL with the advantages of working directly with files to manipulate the raw data. In general, the approach involves the ability to store a reference to such files, along with parametric data that describes their contents.

Data Links is a new feature of DB2 Universal Database (UDB) that extends the management umbrella of the relational database management system (RDBMS) to data stored in external operating system files, as if the data were stored directly in the database. Data Links provides several levels of control over external data, such as referential integrity, access control, coordinated backup and recovery, and transaction consistency.

This IBM Redbook provides you with sufficient information to effectively deploy Data Links in a complex environment. First it describes the technical architecture of Data Links, developing applications in a Data Links environment, and planning a deployment of Data Links. Then, it covers administering a Data Links environment, setting up Tivoli Storage Manager as a backup server with Data Links, and implementing high-availability cluster multiprocessing (HACMP) with Data Links. It includes a full chapter on data replication and, in particular, the replication of Data Linked files. It then describes the Reconcile utility and how the DB2 backup and recovery mechanism supports Data Links. This redbook concludes by providing some hints and tips for problem determination in a Data Links environment.

This IBM Redbook is intended to be read by anyone who requires both introductory and detailed information on Data Links. Prior to reading this redbook, you should have a good understanding of DB2 Universal Database, and in particular, be familiar with data replication, database backup, and recovery concepts.

The team that wrote this redbook

This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization (ITSO), San Jose Center.

Rodolphe Michel is a Senior Data Management Specialist for DB2 UDB on UNIX and Windows NT at the ITSO, San Jose Center, where he conducts projects on all areas of DB2 UDB. He writes extensively and teaches IBM classes and workshops worldwide on all areas of DB2 Universal Database.

Amit Arora is a Sr. Software Engineer in IBM India Software Labs. He has two years of experience as a developer in the Data Links Project. He holds a Bachelor of Engineering (Honors) degree in Computer Science from REC Durgapur, India. His areas of expertise include UNIX internals and Data Links technology.

Kevin Crooks is a Database Administrator for the Boeing Company in Seattle, Washington (USA). He has 12 years of experience on DB2 for OS/390 and four years of expertise in the DB2 Universal Database field. He has worked at Boeing for 15 years. His areas of expertise include Data Links and DB2 UDB on AIX. He is also an IBM certified DB2 UDB database administrator (DBA).

Aman Lalla is a DB2 UDB Engine Support Specialist at the IBM Toronto Laboratory in Canada. He has five years of experience with DB2 on the UNIX and Intel platforms. His areas of expertise include database recovery and problem determination. He has two years of Data Links experience. Prior to joining the IBM Toronto Lab, he was part of IBM Global Services South Africa, providing on-site DB2 Common Server/UDB customer support.

David Shields is a DB2 Database Administrator for the Boeing Company in Seattle, Washington (USA). He has worked with DB2 for five years, including two years on OS/390 and three years on AIX. He provides database support to the Boeing engineering communities in Seattle and St. Louis, Missouri. He also worked as an IMS DBA for nine years prior to working with DB2.

Thanks to the following people for their contributions to this project:

Nagraj Alur
Karen Brannon
Vitthal Gogate
Joshua W Hui
Inderpal Narang (Inventor of the Data Links technology)
Ajay Sood
Mahadevan Subramanian
Parag Tijare
IBM Almaden Research Center, San Jose, USA

Poorna Ambati
Frank Butt
Steven Elliot (Manager of the DB2 UDB Data Links Development)
Graziela Kunde
Bomma Shashidhar
Mohan V Singamshetty
S R Sreejith
IBM Silicon Valley Lab, San Jose, USA

Suparna Bhattacharya
Amit Das
IBM Software Labs, Bangalore, India

Brian Baker and Amr Roushdi, of IBM Dassault Systèmes International Competency Center (IDSICC), Paris, France, who gave us permission to reproduce their report “Installing & Configuring VPM with DB2 Data Links” in Appendix C, “VPM and Data Links” on page 307.

Special notice

This publication is intended to help database developers, database administrators, and system administrators to deploy a Data Links environment. The information in this publication is not intended as the specification of any programming interfaces that are provided by DB2 Universal Database or Data Links. See the PUBLICATIONS section of the IBM Programming Announcement for the above products for more information about what publications are considered to be product documentation.

IBM trademarks

The following terms are trademarks of the International Business Machines Corporation in the United States and/or other countries:

e (logo)®, Redbooks Logo, AFS®, OS/2®, AIX®, OS/390®, AS/400®, Perform™, DataPropagator™, Redbooks™, DB2®, RETAIN®, DB2 Universal Database™, S/390®, DFS™, SP™, DPI®, Tivoli®, DRDA®, TME®, IBM®, Lotus®, IBM.COM™, Lotus Notes®, Informix™, Notes®, MVS™, Domino™

Comments welcome

Your comments are important to us!

We want our IBM Redbooks to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:

- Use the online Contact us review redbook form found at: ibm.com/redbooks
- Send your comments in an Internet note to: [email protected]
- Mail your comments to the address on page ii.


Chapter 1. Introduction

DB2 is the IBM family of relational database management system (RDBMS) products, with DB2 Universal Database (UDB) being the company's flagship for the implementation of object-relational extensibility. Data Links is a new feature of DB2 UDB that extends the management umbrella of the RDBMS to data stored in external operating system files, as if the data were stored directly in the database. DB2 Data Links is available in the following environments:

- Journaled File System (JFS) on IBM AIX
- File System Migrator (FSM) on IBM AIX
- Distributed File Service (DFS) in Transarc’s Distributed Computing Environment (DCE) on IBM AIX
- UNIX File System (UFS) on Sun Solaris
- NTFS-formatted drive on Windows NT
- Integrated File System (IFS) on IBM eServer iSeries (AS/400)

Data Links provides several levels of control over external data such as referential integrity, access control, coordinated backup and recovery, and transaction consistency.

Referential integrity is supported with Data Links to ensure that users cannot delete or rename any external file as long as it is referenced in the database. Access control is enhanced because DB2 permissions are used to grant or deny a user the ability to read a referenced external file; read access control is optional. With coordinated backup and recovery, the DBMS is responsible for backup and recovery of external data in synchronization with the associated database; this type of control over external data is also optional. Transaction consistency requires that changes that affect both the database and an external file be executed within a transactional context to preserve the logical integrity and consistency of the data.
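The referential integrity and transaction consistency guarantees described above can be pictured with a small toy model. The following Python sketch is illustrative only: `ToyFileServer`, `LinkedFileError`, and `insert_row` are hypothetical names invented for this example, and nothing here reflects how the Data Links components are actually implemented. It only shows the observable behavior: a linked file cannot be deleted or renamed, and a link is undone if the enclosing unit of work fails.

```python
class LinkedFileError(Exception):
    """Raised when an operation would orphan a database reference."""


class ToyFileServer:
    """Hypothetical in-memory file server that tracks linked files."""

    def __init__(self):
        self.files = {}       # path -> file content
        self.linked = set()   # paths currently referenced by a database row

    def link(self, path):
        if path not in self.files:
            raise FileNotFoundError(path)
        self.linked.add(path)

    def unlink(self, path):
        self.linked.discard(path)

    def delete(self, path):
        self._refuse_if_linked(path)
        del self.files[path]

    def rename(self, old, new):
        self._refuse_if_linked(old)
        self.files[new] = self.files.pop(old)

    def _refuse_if_linked(self, path):
        if path in self.linked:
            raise LinkedFileError(f"{path} is referenced in the database")


def insert_row(table, server, key, path, fail_before_commit=False):
    """Insert a row and link its file as one unit of work."""
    server.link(path)
    try:
        if fail_before_commit:
            raise RuntimeError("simulated failure before commit")
        table[key] = path      # "commit": the row and the link both persist
    except Exception:
        server.unlink(path)    # "rollback": the link is undone with the row
        raise


server = ToyFileServer()
server.files["/dlfs/video/clip1.mpg"] = b"raw video bytes"
table = {}
insert_row(table, server, 1, "/dlfs/video/clip1.mpg")

try:
    server.delete("/dlfs/video/clip1.mpg")
except LinkedFileError as err:
    print("refused:", err)
```

In this model, unlinking the file (the equivalent of deleting or updating the referencing row) is the only way to make it deletable again, which mirrors the rule stated in the text.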

1.1 Why Data Links

The amount of data stored digitally is growing rapidly because computer systems and storage systems have become very affordable. The file paradigm is very common for such data types as video, image, text, graphics, and engineering drawings because capture, edit, and delivery tools use the file paradigm for these data types. A large number of applications store, retrieve, and manipulate data in files.

These applications may use files to store their data for one or more of the following reasons:

- Cost: You should consider the expense required to rewrite applications that use standard file I/O semantics to use a database as a repository. Also, your applications may use existing tools that work with the file paradigm. Replacing these tools can be expensive.
- Performance: The store-and-forward model of data is unacceptable for performance reasons. For example, it may be unacceptable for the database manager to materialize a Binary Large Object (BLOB) into a file, and the converse, each time the data needs to be accessed as a file. Also, data is captured in high volumes, and you do not want to store it in the database.
- Network considerations: You want to access data directly from a file server that is physically close to a workstation. For example, the file server can be configured so that the network distance to the user is much shorter than the distance to the database where all the BLOBs are stored. The number of bytes that flows for a large object is much larger than the number of bytes for the answer to an SQL query. Network distance between resources is, therefore, a significant consideration.
- Isochronous delivery: The application uses a stream server because it has real-time requirements for delivery and capture. The data is expected to be large, and you may require isochronous delivery. An example of isochronous delivery is a video server that delivers high-quality (or “jitter-free”) video to a client workstation in real time. In these kinds of applications, it is likely that such data will not be moved into the database as a BLOB, but will rather stay on the file server.

Many of these applications need search capabilities to find the data in the files. These search capabilities, however, do not require physically bringing the data into the database system, because the raw content of the files is not needed for the query. Typically, you would extract features of an image or a video and store them in the database for performing a search on the extracted features. Examples of the features that can be extracted from an image are color, shape, and texture. The IBM DB2 Universal Database Extender for Image product supports extraction and search functions on such features.

The ability to store a reference to such files, along with parametric data that describes their contents is, in general, the approach used by these applications to combine the search capabilities of SQL with the advantages of working directly with files to manipulate the raw data. The DB2 relational extenders for text, voice, image (and so on) provide this functionality. The extenders allow you to specify whether the object itself is to be maintained either inside or outside the database.

Currently, the DB2 relational extenders do not provide referential integrity between files on a server and their references in databases. Therefore, it is possible to independently delete either the reference or the file. Moreover, the extenders do not provide access control to the related files or coordinated backup and recovery schemes for a database and its associated files.

DB2 Data Links technology solves these problems and provides the functionality required by such applications. Future releases of the DB2 relational extenders will use Data Links technology.

1.2 Data Links overview

By extending the reach of the RDBMS to operating system files, Data Links gives users flexibility to store data inside or outside the database as appropriate. To store and reference data outside of a DBMS, a database application developer declares a column of DATALINK data type when creating an SQL table. The value stored in the DATALINK column is then used to represent and reference data in an external file.

Figure 1-1 illustrates the architecture of the Data Links technology. As shown in this figure, Data Links has two components: the Data Links engine and the Data Links Manager.

[Diagram labels: DB2 application; SQL access path; standard file access protocol; DB2 server with Data Links extension and db2agents; control path for Data Links integrity; Data Links Manager on the file server, comprising the Data Links File Manager (DLFM), the DLFM_DB metadata repository, and the Data Links File System Filter (DLFF); native file system (JFS, NTFS, UFS on Solaris, DFS-DCE on AIX); storage; archive server]

Figure 1-1 Architecture of the Data Links technology

The Data Links engine resides on the host database server and is implemented as part of the database (DB2) engine code. It is responsible for processing SQL requests involving DATALINK columns such as table creation, and select, insert, delete, and update of records with a DATALINK column.

The Data Links Manager consists of two components: the Data Links File Manager (DLFM) and the Data Links File System Filter (DLFF).

At a high level, DLFM applies constraints on the files that are referenced by the host database, and DLFF enforces the constraints when file system commands or operations affect these files. For example, a file rename or delete would be rejected if that file was referenced by the database.

1.2.1 Data Links File Manager (DLFM)

The Data Links File Manager resides with the file server, which can be local or remote to the host database server, and plays a key role in managing external files. It is responsible for executing the link/unlink operations with transactional semantics within the file system. To do this, DLFM maintains its own DB2 repository about files that are linked to (referenced in) the database. When a file is initially linked to the database, the DLFM applies the constraints for referential integrity, access control, and backup and recovery as specified in the DATALINK column definition. If the DBMS controls read access, for example, the DLFM changes the owner of the file to the DBMS and marks the file “read only” as well.

All these changes to the DLFM repository and to the file system are applied as part of the same DBMS transaction as the initiating SQL statement. If the SQL statement is rolled back, the changes made by the DLFM on the file system side are undone as well.
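The rollback behavior described above can be pictured with a small Python sketch. It is only a toy model under stated assumptions: the class ToyLinkManager, its methods, and the in-memory set of linked paths are hypothetical names invented for illustration, and they do not represent the DLFM implementation or any DB2 interface. The sketch applies each link or unlink immediately and records an undo action, so that a rollback reverses the work of the open transaction.

```python
class ToyLinkManager:
    """Toy model of transactional link/unlink (hypothetical, not the real DLFM)."""

    def __init__(self):
        self.linked = set()   # paths currently treated as linked
        self._undo = []       # undo actions for the open transaction

    def link(self, path):
        if path in self.linked:
            raise ValueError(f"{path} is already linked")
        self.linked.add(path)
        # Remember how to reverse this link if the transaction rolls back.
        self._undo.append(lambda: self.linked.discard(path))

    def unlink(self, path):
        if path not in self.linked:
            raise KeyError(path)
        self.linked.remove(path)
        # Remember how to restore this link if the transaction rolls back.
        self._undo.append(lambda: self.linked.add(path))

    def commit(self):
        # The changes become permanent, so the undo log is dropped.
        self._undo.clear()

    def rollback(self):
        # Reverse the changes in the opposite order to which they were made.
        while self._undo:
            self._undo.pop()()


manager = ToyLinkManager()
manager.link("/datalinks/a.jpg")
manager.commit()
manager.unlink("/datalinks/a.jpg")
manager.link("/datalinks/b.jpg")
manager.rollback()   # both uncommitted changes are undone
```

After the rollback, only /datalinks/a.jpg remains linked, which mirrors the statement that changes made on the file system side are undone together with the SQL statement.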

The DLFM is also responsible for coordinating backup and recovery of external files with the database. When the DBMS transaction that includes a Link File operation commits and the DBMS is responsible for recovery of the file, the DLFM initiates a backup of the newly linked file. This file backup is done asynchronously and is not part of the database transaction for performance reasons.

In addition, note that by doing it this way, the database backup itself is not slowed down because the referenced file has already been backed up. This is particularly important in the case of very large files. Coordinated backup and recovery of external files with DB2 data can be done directly to disk or through an archive server supported by DB2 UDB, such as Tivoli Storage Manager.

1.2.2 Data Links File System Filter (DLFF)

The Data Links File System Filter is a thin, database-control layer on the file system that intercepts certain file system calls (for example, file-open, file-rename, and file-delete) issued by the application. If the file is referenced in a database, the DLFF is responsible for enforcing referential integrity constraints and access-control requirements defined for the file. This ensures that any access request meets DBMS security and integrity requirements.

The DLFF will, for example, reject a user-level request to rename or delete a file referenced by the database. This avoids “dangling pointers” in which a file is referenced by the database, but the actual file does not exist. DLFF also validates any authorization token embedded in the file pathname for a file-open operation.
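As a rough illustration of this filtering rule, the following Python sketch rejects rename and delete requests for files that are referenced by the database. The function name check_operation, the operation names, and the set of linked paths are assumptions made for this example only; the actual DLFF works inside the file system and is not exposed as a Python API.

```python
# Operations that would leave a "dangling pointer" in the database.
PROTECTED_OPS = {"rename", "delete"}


def check_operation(op, path, linked_paths):
    """Return True if the toy filter lets the operation through.

    Raises PermissionError when the operation would break a database
    reference to the file (hypothetical model of the rule in the text).
    """
    if op in PROTECTED_OPS and path in linked_paths:
        raise PermissionError(f"{op} rejected: {path} is referenced by the database")
    return True


linked = {"/datalinks/celebs/images/salman.jpg"}
check_operation("delete", "/tmp/scratch.txt", linked)   # allowed: file is not linked
```

A call such as check_operation("rename", "/datalinks/celebs/images/salman.jpg", linked) raises PermissionError in this model, because the file is still referenced.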

Data Links provides a new and innovative DBMS capability. By providing tight integration of file system data with the object-relational DBMS, Data Links allows DB2 UDB to guarantee the integrity of data whether it is stored inside or outside the database. Although companies in the CAD/CAM application marketplace were the early supporters of Data Links, Data Links applies to application problems in a wide variety of market segments, especially as it relates to content management. Web, Internet, and e-commerce applications are important examples of these new market segments.

1.2.3 The DATALINK data type

Data Links technology includes the DATALINK data type, which is implemented as an SQL data type in DB2 Universal Database. A DATALINK value references an object that is stored external to a database.

You use the DATALINK data type, just like any other SQL data type, to define columns in tables. In NT File System (NTFS) and JFS environments, the DATALINK values encode the name of a Data Links server containing the file and the filename in terms of a Uniform Resource Locator (URL). The DATALINK value is robust in terms of integrity, access control, and recovery. DB2 treats a DATALINK value as if the object were stored in the database. You register a set of known Data Links servers. The only Data Links server names that you can specify in a DATALINK value are those that have been registered to a DB2 database.

In Distributed Computing Environment-Distributed File Service (DCE-DFS) environments, the Data Links Manager is registered for the entire cell, and linked files are referred to in terms of a URL that consists of the scheme dfs and the DFS path name of the file.

Even though the DATALINK value represents an object that is stored outside the database system, you can use SQL queries to search parametric data to obtain the file name that corresponds to the query result. You can create indexes on files containing video, images, text, or other media formats, and store those attributes in tables along with the DATALINK value. With a central repository of files on a file server and DATALINK data types in a database, you can obtain answers to questions like: What do I have? Where can I find what I’m looking for?

These are examples of applications that can use the DATALINK data type:

– Medical applications, in which X-rays are stored on the file server and the attributes are stored in a database.
– Entertainment industry applications that perform asset management of video clips. The video clips are stored on a file server, but attributes about the clips are stored in a database. Access control is required for accessing the video clips based on database privileges of accessing the meta information.
– World Wide Web applications that manage millions of files and allow access control based on database privileges.
– Financial applications, which require distributed capture of check images and a central location for those images.
– CAD/CAM applications, where the engineering drawings are kept as files, and the attributes are stored in the database. Queries are run against the drawing attributes.

1.3 Applications that use Data Links

Two applications illustrate the wide range of applications that can benefit from Data Links:

– Link Integrity+
– Dassault Systemes’ VPM product

1.3.1 Link Integrity+

Link Integrity+ is a Web asset integrity solution from the IBM Almaden Research Center in San Jose, California. It exploits IBM DB2 UDB’s unique Data Links technology to guarantee the referential integrity (RI) of an intranet’s Web objects such as Web pages, hyperlinks, images, server-side programs, and templates.

While there are many products in the marketplace that report on broken links and missing images “after-the-fact”, Link Integrity+ proactively prevents the occurrence of broken links and the irksome “404 file not found” message. It does this by inhibiting any malicious or accidental changes to Web pages that could compromise the referential integrity of Web assets.

Link Integrity+'s architecture supports a two-phase approach to delivering Web content:

– Phase 1: Validates the referential integrity of hyperlinks, images, server-side programs, and templates
– Phase 2: “Installs” the Web content on the Web site in atomic fashion, with minimal transient problems

Link Integrity+ also supports the enforcement of an organization's guidelines for Web content such as the inclusion of appropriate headers, footers, and disclaimers. A critical Link Integrity+ function is its support of multiple independent webmaster domains within a geographically distributed intranet of heterogeneous Web servers. It includes an e-mail and pager notification system that alerts webmasters when deletions or updates of Web pages in another webmaster's domain affect Web pages in their own domain.

Link Integrity+ exploits IBM DB2 UDB Data Links, Java JDBC, JavaMail, JavaBeans Activation Framework, Java Native Interface (JNI), Structured Query Language (SQL), and Extensible Markup Language (XML) technologies in its implementation.

The Link Integrity+ architecture provides the ability to deliver a significantly higher level of Web asset integrity to an organization's intranet. It synergistically integrates the IBM DB2 UDB unique Data Links technology with innovative application design. Link Integrity+ is a prototype that demonstrates that its architecture is capable of supporting the “real world” environment of geographically distributed heterogeneous Web sites with multiple webmasters managing multiple domains or sub-domains. Its staging area approach enables the enforcement of referential integrity of Web assets and the enforcement of an organization's guidelines for Web content.

Because it is the main conduit for getting content on the Web, Link Integrity+ can be educated to become sensitive to information of interest to specific individuals. In other words, Link Integrity+ can be integrated with personalization and information delivery systems to notify individuals, and deliver information to them in a timely fashion, based on available user profiles and subscription information. The Link Integrity+ trigger mechanism for use by content developers significantly enhances the productivity of webmasters by taking over routine and mundane activities, and only alerting them to get involved when problems are detected. By guaranteeing the integrity of an intranet's hyperlinks, the chance of an end user encountering the “404 file not found” message is minimized, which contributes to a positive experience for the user visiting the Web site. Note that an end user may still experience the “404 file not found” message due to caching of pages that may occur in the browser, Internet Service Provider (ISP), proxy, and other caches.

For more information, refer to “Link Integrity+: A Web Asset Integrity Solution”, Nagraj Alur, Ramani Ranjan Routray, IBM Almaden Research Center paper.

1.3.2 VPM with DB2 Data Links

This application demonstrates a methodology by which IBM middleware (DB2 and Data Links) can provide solutions for data archive and restoration across a large enterprise. It applies specifically when working with IBM and Dassault Systemes CATIA and VPM.

– Transaction consistency: If a transaction is rolled back in the database, the link to the appropriate version of the file at this site is maintained.
– Security and access: Files controlled by Data Links can either be totally protected by the database, preventing unauthorized file system access, or opened, to allow file system access.
– Synchronized backup and recovery: Using DB2 with Data Links ensures consistent backup and recovery of ENOVIAVPM meta data and the associated CATIA models. This makes the overall process more automatic and less database administrator (DBA)-intensive. In the past, administrative tasks were performed outside of the CATIA environment. This required a separate backup strategy for external CATIA files, which introduced a large risk of inconsistencies between the database and related external files.

For additional information, refer to Appendix C, “VPM and Data Links” on page 307.


Chapter 2. Technical architecture

This chapter provides a detailed description of the Data Links technical architecture. The following topics are discussed:

– Overview of the Data Links architecture
– The SQL data type DATALINK
– How Data Links maintains security
– The different components of Data Links on AIX, Solaris, and Windows
– The different components of Data Links on DCE-DFS

DB2 Data Links can be installed on the following environments:

– Journaled File System (JFS) on IBM AIX
– File System Migrator (FSM) on IBM AIX

Note: FSM is the file system filter for Tivoli Space Manager client (also known as Hierarchical Storage Manager (HSM)), which provides the space management capabilities. Data Links support for Tivoli Space Manager is discussed in Chapter 6, “Using Tivoli Storage Manager” on page 107.

– Distributed File Service (DFS) in Transarc’s Distributed Computing Environment (DCE) on IBM AIX

Note: Refer to Appendix B, “Overview of DCE-DFS on AIX” on page 301, for an overview of DCE-DFS.

– UNIX File System (UFS) on Sun Solaris
– NTFS-formatted drive on Windows NT
– Integrated File System (IFS) on IBM eServer iSeries (AS/400)

Note: Data Links on iSeries and AS/400 is out of the scope of this book. Refer to the IBM Redbook DB2 UDB for AS/400 Object Relational Support, SG24-5409, for Data Links implementation on the iSeries.

2.1 Overview of the Data Links architecture

A typical environment using DB2 Data Links Manager (DLM) has the following components:

– Data Links server
– DB2 Universal Database server
– DB2 client

The following sections provide a brief overview of these components.

2.1.1 Data Links server

A Data Links server consists of the following components:

– Data Links File Manager (DLFM)
– Either one of the following:
  – Data Links File System Filter (DLFF) in JFS, FSM, NTFS, and UFS environments
  – Data Manager Application (DMAPP) in DCE-DFS environments
– DB2 Logging Manager

Data Links File Manager (DLFM)

DLFM is a set of user-level processes that keeps track of all the files on a particular Data Links server that are linked to a DB2 database. The DLFM receives and processes link-file and unlink-file messages that arise from SQL INSERT, UPDATE, and DELETE statements that reference a DATALINK column.

For each linked file, the DLFM tracks:

– The database instance
– The fully qualified table name
– The column name referred to in the SQL statement

Another vital role of the DLFM is to respond to all the queries sent by the DLFF (in AIX, Solaris, and Windows NT) or the DMAPP (in a DCE-DFS environment). These queries can be requests for a file or token information.

Definition: A token is a dynamically generated string used to provide users access to read a file under READ PERMISSION DB control. The DLFF rejects any operation that tries to access the READ PERMISSION DB file without a valid token, unless it has been originated by the super-user. Refer to 2.3.3, “How access tokens work” on page 32, to better understand the concept of tokens.
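The access rule in this definition can be written down as a small decision function. The Python sketch below is a simplification under stated assumptions: the function name filter_allows_read and its three boolean parameters are hypothetical, token validation itself is not modeled, and for files that are not under READ PERMISSION DB control the sketch merely reports that the filter does not demand a token (ordinary file system permissions, not modeled here, would still apply).

```python
def filter_allows_read(read_permission_db, token_is_valid, is_superuser):
    """Toy decision rule for a read request (hypothetical model).

    read_permission_db: the file is linked under READ PERMISSION DB control
    token_is_valid:     the request carries a token that passed validation
    is_superuser:       the request originates from the super-user
    """
    if not read_permission_db:
        # Assumption of this sketch: no token is required by the filter.
        return True
    # Under READ PERMISSION DB, a valid token or super-user origin is needed.
    return token_is_valid or is_superuser


# A request without a token from an ordinary user is rejected.
print(filter_allows_read(True, False, False))
```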

For referential integrity, DLFF (on UNIX and Windows NT) and DMAPP (on DCE-DFS) should be able to recognize all the files which are under the control of Data Links.

When a table is created, it is possible to specify some options for the DATALINK column. One of these options is RECOVERY=YES. This option allows DB2 to provide point-in-time, rollforward recovery for any file that has a reference in this DATALINK column. Therefore, if this option is specified when the table is created, the DLFM keeps track not only of the currently linked files, but also of the previously linked files.
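To make the bookkeeping concrete, the following Python sketch models a registry that records, for each linked file, the three items listed earlier (database instance, fully qualified table name, and column name) and that retains records of unlinked files only when recovery is requested. The names LinkRecord and ToyRegistry, the field names, and the sample values are hypothetical; the real DLFM keeps this information in its own DB2 repository, whose layout is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkRecord:
    database: str   # database instance
    table: str      # fully qualified table name
    column: str     # DATALINK column name


class ToyRegistry:
    """Toy model of link tracking (hypothetical, not the DLFM repository)."""

    def __init__(self):
        self.current = {}    # path -> LinkRecord for currently linked files
        self.previous = []   # (path, LinkRecord) kept for recovery purposes

    def link(self, path, record):
        self.current[path] = record

    def unlink(self, path, recovery):
        record = self.current.pop(path)
        if recovery:
            # With recovery requested, the earlier link is remembered so that
            # a point-in-time restore could find the file version again.
            self.previous.append((path, record))


registry = ToyRegistry()
record = LinkRecord("SAMPLEDB", "MYSCHEMA.CELEBS", "PICTURE")
registry.link("/datalinks/celebs/images/salman.jpg", record)
registry.unlink("/datalinks/celebs/images/salman.jpg", recovery=True)
```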

Note: All the DATALINK options (for example, READ PERMISSION DB and RECOVERY=YES) are discussed in 2.2.3, “DATALINK options” on page 26.

Data Links File System Filter (DLFF)

This is a filter file system layer that sits on top of base file systems such as JFS, FSM, UFS, and NTFS. It is also known as Data Links File System (DLFS). DLFF maintains referential integrity by ensuring that linked files are not deleted or renamed, and that the file's attributes are not changed. In the case of READ PERMISSION DB files, it also filters commands to ensure that proper access authority exists. AIX and Solaris file systems under the control of a DLFF can be NFS-exported and mounted on a DB2 client. Windows NT file systems under the DLFF control can be netshared.

Data Manager Application (DMAPP)

DMAPP maintains referential integrity of Data Linked files in a DCE-DFS environment by filtering commands to ensure that files linked under Data Links control are not deleted or renamed, and that their attributes are not changed.

Note: Refer to Appendix B, “Overview of DCE-DFS on AIX” on page 301, for an introduction to DCE-DFS concepts and the various terms used here.

The DMAPP monitors file sets that reside in DMLFS aggregates that are Data Manager-enabled. Once an aggregate is Data Manager-enabled, the aggregate can contain file sets that may be brought under Data Links control. The DMAPP can then manage the data within these file sets after the aggregate is exported into the namespace. Making an LFS aggregate Data Manager-enabled is part of the Storage Management Toolkit (SMT) provided by Transarc.

Note: As you have seen, DLFF is the filter file system on the Data Links server on AIX, Solaris and Windows NT. In DCE-DFS environments, the file system operations at the file server are filtered by the DMAPP. Therefore, it is also known as DLFS-DMAPP.

DB2 Logging Manager

This is a component of DLFM that maintains the logging information in the DLFM_DB database. This DLFM_DB database contains registration information about the databases that can connect to a Data Links server. It also contains information about the mount points of the file systems on AIX or Solaris, or the sharename of the drives on Windows NT, that are managed by a DLFF. The DLFM_DB database also contains information about files that have been linked, unlinked, or backed up on a Data Links server or in a DCE cell. This database is created during the installation of DB2 Data Links Manager.

2.1.2 DB2 Universal Database server

The DB2 Universal Database server is the location of the main database where the Data Links Manager is registered. In NTFS, JFS, FSM, and UFS environments, more than one Data Links Manager can be registered with a database.

The database may have tables with columns of the DATALINK data type. In DCE-DFS environments, the DB2 server can only register one DCE cell. Also, the DFS client must be installed on the DB2 server in order to allow access to configuration information that is stored in DFS.

Note: Data Links implementation for more than one DCE cell is currently not available, but IBM is considering it as a future enhancement.

The DB2 server and the Data Links server use a reserved port to communicate with each other. The default value of this port is 50100; a different value can be configured at installation time.

2.1.3 DB2 client

The client connects to a remote DB2 server and accesses the tables with DATALINK columns. In UNIX environments, the remote client may directly access the Data Linked files by exporting the Data Links file system (file system under the Data Links control) from the Data Links server and mounting it through the Network File System (NFS) on the DB2 client. In Windows NT, the drive under Data Links control can be shared with the DB2 clients.

Figure 2-1 shows an overview of the interaction between a DB2 server, the DB2 Data Links Manager components, the backup media, and a remote client application in NTFS and JFS environments. The DB2 server has the user database (also known as the host database), which has the CELEBS table with a DATALINK column.

The client application performs the following actions to access a Data Linked file:

1. The client application issues a CONNECT statement to a database on a DB2 server.

2. The application then issues a SELECT statement on the table that contains a DATALINK column and receives the URL.

3. The application on the DB2 client uses a shared drive (on Windows NT) or an NFS-mount directory (on AIX or Solaris) to access the file from the file server (Data Links server).

These steps are shown in Figure 2-1 (in UNIX and Windows environments) and in Figure 2-2 (in DCE-DFS environments).
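Step 3 of this sequence, turning the URL returned by the SELECT statement into a path on the locally mounted file system, can be sketched in Python as follows. This is an illustration under stated assumptions: the function local_path_for, the mounts mapping, and the mount point /mnt/sol-e are hypothetical, and the sketch assumes that the URL path corresponds directly to the directory layout below the mount point, which depends on how the file system is exported and mounted in a given installation.

```python
import posixpath
from urllib.parse import urlsplit


def local_path_for(url, mounts):
    """Map a URL from a DATALINK column to a path under a local mount.

    mounts maps a lowercase file server name to the local directory where
    that server's Data Links file system is mounted (an assumption of this
    sketch, not something DB2 provides).
    """
    parts = urlsplit(url)
    host = parts.hostname          # urlsplit returns the host in lowercase
    if host is None or host not in mounts:
        raise LookupError(f"no local mount known for server in {url!r}")
    # Append the URL path, without its leading slash, to the mount point.
    return posixpath.join(mounts[host], parts.path.lstrip("/"))


mounts = {"sol-e": "/mnt/sol-e"}
path = local_path_for("HTTP://SOL-E/datalinks/celebs/images/salman.jpg", mounts)
print(path)   # /mnt/sol-e/datalinks/celebs/images/salman.jpg
```

The application could then open the resulting path with the native file system API, as described in step 3.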

[Diagram labels: DB2 server with the user database and the CELEBS table containing a DATALINK column; Data Links server with the DLFM (Data Links File Manager), the Logging Manager, the DLFM_DB database, the DLFF (Data Links Filesystem Filter), the native file system, and the file; backup media; DB2 client with a shared directory or NFS mount; arrows numbered (1), (2), and (3) for the steps above]

Figure 2-1 Data Links overview in UNIX and Windows environments

In DCE-DFS environments on AIX systems, the applications use the DFS client, which is also a DB2 client, to connect to the database and access the files. A DB2 Data Links DFS Client Enabler, also known as the DLFS Cache Manager (DLFS-CM), is required to access the files referenced by the DATALINK columns created with READ PERMISSION DB specified.

Figure 2-2 gives an overview of Data Links implementation in the DCE-DFS environment. It shows the steps that an application on the DB2 client (which is also a DFS client) follows to access a Data Linked file. You can see that only one of the Data Links servers in a DCE cell can have the DB2 Logging Manager (DLFM_DB). The other Data Links servers on different nodes connect to this DLFM_DB as a DB2 client.

[Diagram title: DB2 Data Links Manager in One DCE-DFS Cell. Diagram labels: DB2 server with the user database and the CELEBS table containing a DATALINK column; two DB2 Data Links servers (DFS servers), each with a DLFM, a DMAPP, a DMLFS, files, and backup media; the DLFM_DB database on one Data Links server, with the other server acting as a client to DLFM_DB; DB2/DFS client with the database application, the DB2 client, the DB2 DFS Client Enabler, and the DFS client; arrows numbered (1), (2), and (3)]

Figure 2-2 Data Links overview in a DCE-DFS environment

Note: Do not confuse DLFF with DLFS-CM. As mentioned earlier, DLFF is the file system filter layer on the Data Links server on AIX, Solaris, and Windows NT. DLFS-CM is the filter layer on a DB2 client in a typical DCE-DFS environment, from where applications access Data Linked files.

2.2 DATALINK data type

DB2 Data Links Manager introduces the base SQL data type DATALINK, which is now part of the ANSI, ISO, and ODBC standards. The DATALINK column allows you to store a reference to a file (in the form of a URL) that you want to put under the control of the Data Links File Manager (Figure 2-3). The files referred to by the DATALINK values are treated by DB2 as if they were stored inside the database, and therefore, can fully benefit from RDBMS properties like referential integrity, access control, and recovery.

[Diagram labels: DB2 UDB database with a table holding DATALINK values; one DATALINK value pointing to a file on a UNIX file server and another pointing to a file on a Windows NT file server]

Figure 2-3 DATALINK data type

Even though the DATALINK value represents an object that is stored outside the database system, SQL queries can be used to search meta data (related information stored in other columns along with the DATALINK values) to obtain the file name that corresponds to the query result. Indexes on files containing video, image, text, or media formats can be created and stored as attributes in tables along with the DATALINK values.

Typically, an application programmer would insert rows in these tables with meta-data about the file and its file reference (DATALINK value). The referenced file is said to be “linked” under Data Links control. The applications can then do a search based on this meta-data information and locate the files of interest. Next, the applications can access these files using native file system APIs (such as fopen and fread) or a Web browser.

For example, the CELEBS table contains a DATALINK column that has URLs to pictures of various celebrities. This table also stores some meta-data information about each celebrity. Using this information, a search (using an SQL SELECT statement) can be done on the table for your favorite celebrities. Figure 2-4 shows how you can access the DATALINK value of the picture for celebrities from India whose pictures are in the .jpg format.

Figure 2-4 Retrieving the Data Link value

Now the picture can be accessed via a Web browser using the URL (or the DATALINK value), as shown in Figure 2-5.

Figure 2-5 Accessing Data Linked files through a browser

Note: The HTTP protocol treats the semicolon (;) as a special character, and therefore, its equivalent escape sequence (“%3B”) is used instead. If an application selects from a table with a DATALINK column, receives a tokenized file name (in the READ PERMISSION DB case), and wants to access that file through a Web browser, the application itself must replace the “;” character with the “%3B” escape sequence.
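The substitution described in this note amounts to a one-line string replacement. The Python sketch below shows it under stated assumptions: the function name browser_safe is hypothetical, and the sample URL uses the literal placeholder TOKEN in place of a real access token, whose actual format and position are covered in 2.3, not here.

```python
def browser_safe(url):
    """Replace each semicolon with its escape sequence for use in a browser."""
    return url.replace(";", "%3B")


# TOKEN is a stand-in for an access token; its placement is illustrative only.
tokenized = "http://sol-e/datalinks/celebs/images/TOKEN;salman.jpg"
print(browser_safe(tokenized))
# http://sol-e/datalinks/celebs/images/TOKEN%3Bsalman.jpg
```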

For the applications to update or delete a file on the file system that is under Data Links control, they have to unlink the file by deleting the corresponding entry from the DB2 UDB table first. The reason is that Data Links enforces referential integrity on all the files under its control, and therefore, does not allow any delete or update operation on the file.

Note: The Update-in-place feature (which allows you to update a file while it is under the Data Links control) will be available in DB2 Universal Database V8.x.

Data Links has been designed to support a distributed computing environment, with capabilities that include the following:

– A DATALINK column in a DB2 UDB table can reference multiple file systems spread over one or more file servers associated with different operating systems, such as UNIX and Windows.
– A single Data Links Manager can be associated with DATALINK columns in one or more DB2 UDB databases.
– A DATALINK column can reference files residing in Transarc’s distributed file system DCE-DFS.
– Bi-directional coordinated replication of Data Linked files is supported in an atomic, automatic, and consistent way in conjunction with DB2 UDB’s database replication capabilities. Chapter 9, “Data replication” on page 161, discusses replication of DB2 UDB tables having DATALINK columns and the files referenced by them.

In JFS, UFS, and NTFS environments, the DATALINK values encode the name of a Data Links server that contains the file and the file name. The only Data Links server names that can be specified in the DATALINK value are those that have been registered to a DB2 database. In DCE-DFS environments, the Data Links Manager is registered for the entire cell.

A number of scalar functions (column functions) are provided with the DATALINK data type. They allow access to individual parts of the URL, such as the server address part. They also allow you, for example, to convert a VARCHAR (the normal way you would define a URL) into the DATALINK data type.

A DATALINK value can be assigned to (or inserted into) a column in any of the following ways:

– DLVALUE scalar function: This function can be used to create a new DATALINK value and assign it to a column. Unless the value contains only a comment or the URL is exactly the same, the assignment links the file.
– SQLBuildDataLink CLI function: A DATALINK value can be constructed as a CLI parameter of the CLI function SQLBuildDataLink. This value can then be assigned to a column. Unless the value contains only a comment or the URL is exactly the same, the assignment would link the file. The function interface is shown here:

    SQLRETURN SQLBuildDataLink(
        SQLHSTMT        StatementHandle,
        SQLCHAR FAR     *LinkType,
        SQLINTEGER      LinkTypeLength,
        SQLCHAR FAR     *DataLocation,
        SQLINTEGER      DataLocationLength,
        SQLCHAR FAR     *Comment,
        SQLINTEGER      CommentLength,
        SQLCHAR FAR     *DataLinkValue,
        SQLINTEGER      BufferLength,
        SQLINTEGER FAR  *StringLengthPtr);

The function arguments of SQLBuildDataLink() are explained in Table 2-1.

Table 2-1 Arguments to the SQLBuildDataLink function

| Data type | Argument | Use | Description |
|---|---|---|---|
| SQLHSTMT | StatementHandle | input | Used only for diagnostic reporting |
| SQLCHAR * | LinkType | input | Always set to SQL_DATALINK_URL |
| SQLINTEGER | LinkTypeLength | input | The length of the LinkType value |
| SQLCHAR * | DataLocation | input | The complete URL value to be assigned |
| SQLINTEGER | DataLocationLength | input | The length of the DataLocation value |
| SQLCHAR * | Comment | input | The comment, if any, to be assigned |
| SQLINTEGER | CommentLength | input | The length of the Comment value |
| SQLCHAR * | DataLinkValue | output | The DATALINK value that is created by the function |
| SQLINTEGER | BufferLength | input | Length of the DataLinkValue buffer |
| SQLINTEGER * | StringLengthPtr | output | A pointer to a buffer in which the length of *DataLinkValue (excluding the null-termination character) is returned. If DataLinkValue is a null pointer, no length is returned. If the number of bytes available to return is greater than BufferLength minus the length of the null-termination character, then SQLSTATE 01004 is returned. In this case, subsequent use of the DATALINK value may fail. |

The DATALINK value can be retrieved from the table by running a SELECT statement. Portions of a DATALINK value can be assigned to host variables by the use of scalar functions (such as DLLINKTYPE or DLURLPATH). These functions are discussed in 2.2.2, “Scalar functions for DATALINK data type” on page 24.

Note: In the READ PERMISSION DB case, the URL value returned as a result of an SQL SELECT query on a table with a DATALINK column has an access token attached to it. Therefore, in CLI, embedded programming, and JDBC, take care to assign sufficient storage to the variables in which DATALINK values are to be stored. This storage space should be sufficient to accommodate both the URL value and the access token embedded in it.

2.2.1 Attributes of DATALINK type

A DATALINK data type can have the following attributes:

– Link type: Currently the only supported type of link is the URL.
– Data location: The location of the file linked with a reference within DB2, in the form of a URL. The allowed scheme names for this URL are:
  – HTTP
  – FILE
  – UNC
  – DFS
  The other parts of the URL are:
  – The file server name for the HTTP, FILE, and UNC schemes
  – The cell name for the DFS scheme
  – The full file path name within the file server or cell
  See Appendix A, “BNF specifications for DATALINK” on page 297, for more information on the full Backus Naur Form (BNF) specifications for DATALINKs.
– Comment: Descriptive information (254 bytes maximum) can be specified. This is intended for application-specific uses such as further or alternative identification of the location of the data.

Note: Leading and trailing blank characters are trimmed while parsing data location attributes as URLs. Also, the scheme names (http, file, unc, dfs) and host are case-insensitive and are always stored in the database in uppercase. When a DATALINK value is fetched from a database, an access token is embedded within the URL attribute when appropriate. It is generated dynamically and is not a permanent part of the DATALINK value stored in the database.

A DATALINK value can possibly have only a comment attribute and an empty data location attribute. Such a value may even be stored in a column, but of course, no file will be linked to such a column. The total length of the comment and the data location attribute of a DATALINK value is currently limited to 200 bytes.
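The normalization rules stated in the note above (blanks trimmed, scheme and host stored in uppercase, path left unchanged) can be illustrated with a minimal Python sketch. This is not DB2 code: the function name normalize_data_location is hypothetical, and the sketch assumes the simple scheme://host/path shape only, so it does not cover every form allowed by the BNF in Appendix A.

```python
def normalize_data_location(url: str) -> str:
    """Illustrative model of how a data location is normalized before storage.

    Assumption: the URL has the simple form scheme://host/path.
    """
    url = url.strip(" ")              # leading and trailing blanks are trimmed
    if url == "":
        return ""                     # comment-only value: empty data location
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError("expected a URL of the form scheme://host/path")
    host, slash, path = rest.partition("/")
    # Scheme and host are case-insensitive and stored in uppercase;
    # the path keeps its original case.
    return scheme.upper() + "://" + host.upper() + slash + path


print(normalize_data_location("  http://sol-e/datalinks/celebs/images/am.jpg "))
```

The path is deliberately left untouched, because file names on the file server may be case-sensitive.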

Note: Data Links cannot be exchanged with a Distributed Relational Database Architecture (DRDA) server.

2.2.2 Scalar functions for DATALINK data type

Built-in scalar functions are provided for DATALINK data types. They are:

Function to build a DATALINK value:

– DLVALUE: The DLVALUE function returns a DATALINK value. When the function is on the right-hand side of a SET clause in an UPDATE statement or is in a VALUES clause in an INSERT statement, it usually also creates a link to a file. However, if only a comment is specified (in which case the data location is a zero-length string), the DATALINK value is created with empty linkage attributes so there is no file link. The following SQL statement inserts a row into the CELEBS table, which has two VARCHAR columns (country and pic_format) and one DATALINK column (picture):

  EXEC SQL INSERT INTO CELEBS VALUES ('India', 'jpg', DLVALUE('http://sol-e/datalinks/celebs/images/salman.jpg'));

Functions to extract the encapsulated values from a DATALINK value:

– DLCOMMENT: This function returns the comment value, if it exists, from a DATALINK value. The result of the function is VARCHAR(254). Given a DATALINK value that was inserted into the picture column of a row in the CELEBS table using the following scalar function, DLCOMMENT(picture) returns the value 'comment':

  DLVALUE('http://sol-e/datalinks/celebs/images/am.jpg','URL','comment')

– DLLINKTYPE: This function returns the linktype value from a DATALINK value. The result of the function is VARCHAR(4). Considering the DATALINK value as shown in Example 2-3, DLLINKTYPE(picture) will return the value 'URL'.

– DLURLCOMPLETE: The DLURLCOMPLETE function returns the data location attribute from a DATALINK value with a link type of URL. When appropriate, the value includes a file access token. If the DATALINK value only includes the comment, the result that is returned is a zero-length string. The result of the function is VARCHAR(254). The DLURLCOMPLETE(picture) function on the CELEBS table would return the complete URL:

  HTTP://SOL-E/datalinks/celebs/images/04E2_CluJ3k__x2rxA5IJl1Q;am.jpg

  Here, 04E2_CluJ3k__x2rxA5IJl1Q; is the access token attached to the file name, since picture has the attribute of READ PERMISSION DB.

– DLURLPATH: The DLURLPATH function returns the path and file name necessary to access a file within a given server from a DATALINK value with a linktype of URL. When appropriate, the value includes a file access token. If the DATALINK value only includes the comment, the result returned is a zero-length string. The result of the function is VARCHAR(254). The DLURLPATH(picture) function on the CELEBS table would return:

  /datalinks/celebs/images/04E2_Cln8Jk__xjh7A5Lkl0L;am.jpg

  Here, 04E2_Cln8Jk__xjh7A5Lkl0L; is the access token attached to the file name, since picture has the attribute of READ PERMISSION DB.

– DLURLPATHONLY: The DLURLPATHONLY function returns the path and file name necessary to access a file within a given server from a DATALINK value with a linktype of URL. The value returned never includes a file access token. Again, if the DATALINK value only includes the comment, the result returned is a zero-length string. The result of the function is VARCHAR(254). The DLURLPATHONLY(picture) function on the CELEBS table would return:

  /datalinks/celebs/images/am.jpg

  Note that the file name returned in the path does not have an access token attached, although the picture column has the attribute of READ PERMISSION DB.

– DLURLSCHEME: The DLURLSCHEME function returns the scheme from a DATALINK value with a linktype of URL. The value is always in uppercase. If the DATALINK value only includes the comment, the result returned is a zero-length string. The result of the function is VARCHAR(20). Here DLURLSCHEME(picture) would return the value 'HTTP'.

– DLURLSERVER: The DLURLSERVER function returns the file server from a DATALINK value with a linktype of URL. The value is always in uppercase. If the DATALINK value only includes the comment, the result returned is a zero-length string. The result of the function is VARCHAR(254). DLURLSERVER(picture) would return the name of the server as 'SOL-E'.

Note: The argument to all the above functions must be an expression that results in a value of the DATALINK data type. If the argument is null, the result is the null value.
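To make the relationship between the extraction functions concrete, the following Python sketch models what each one returns for the tokenized URL used in the examples above. It is an illustration only, not the DB2 implementation: the function names mirror the SQL functions but are hypothetical Python helpers, and the sketch assumes that the token always precedes the file name and is separated from it by a semicolon, as in the sample output.

```python
def _parts(complete_url: str):
    """Split SCHEME://SERVER/path into its three pieces ('' for an empty value)."""
    if not complete_url:
        return "", "", ""
    scheme, _, rest = complete_url.partition("://")
    server, slash, path = rest.partition("/")
    return scheme, server, slash + path


def dl_url_scheme(complete_url: str) -> str:
    return _parts(complete_url)[0]


def dl_url_server(complete_url: str) -> str:
    return _parts(complete_url)[1]


def dl_url_path(complete_url: str) -> str:
    # Path and file name, including the access token when one is present.
    return _parts(complete_url)[2]


def dl_url_path_only(complete_url: str) -> str:
    # Same as dl_url_path, but the "token;" prefix of the file name is dropped.
    directory, slash, name = dl_url_path(complete_url).rpartition("/")
    _token, sep, bare_name = name.partition(";")
    return directory + slash + (bare_name if sep else name)


url = "HTTP://SOL-E/datalinks/celebs/images/04E2_CluJ3k__x2rxA5IJl1Q;am.jpg"
print(dl_url_scheme(url), dl_url_server(url))
print(dl_url_path(url))
print(dl_url_path_only(url))
```

An empty string stands in for a DATALINK value that carries only a comment, in which case every helper returns a zero-length string.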

It is important to distinguish between these DATALINK references to files and the LOB file reference variables. The similarity is that both contain a representation of a file. However, consider these points:
– DATALINKs are retained in the database, and both the links and the data in the linked files can be considered as a natural extension of data in the database.
– File reference variables exist temporarily on the client, and they may be considered as an alternative to a host program buffer.

2.2.3 DATALINK options

You can define what level of control you want for a linked file by specifying certain options when defining the DATALINK column using the CREATE TABLE or ALTER TABLE ADD COLUMN SQL statements. Options include file access permission and the level of recovery support you want. You can specify the following options at the time you create the tables with DATALINK columns:

LINKTYPE URL
This defines the type of link as a URL.

NO LINK CONTROL
Specifies that no check is made to determine whether the file exists. Only the syntax of the URL is checked. There is no database manager control over the file.

FILE LINK CONTROL
Specifies that a check should be made for the existence of the file. Additional options may be used to give the database manager further control over the file. These options define the level of database manager control of the file link, and include:
– INTEGRITY: Specifies the level of integrity of the link between a DATALINK value and the actual file.
  • ALL: Any file specified as a DATALINK value is under the control of the database manager and may not be deleted or renamed using standard file system programming interfaces.
– READ PERMISSION: Specifies how the permission to read the file, specified in a DATALINK value, is determined.
  • FS: The read access permission is determined by the file system permissions. Such files can be accessed without retrieving the file name from the column.
  • DB: The read access permission is determined by the database. Access to the file is only allowed by passing a valid file access token, returned on retrieval of the DATALINK value from the table, in the open operation.
– WRITE PERMISSION: Specifies how permission to write to the file specified in a DATALINK value is determined.
  • FS: The write access permission is determined by the file system permissions. Such files can be accessed without retrieving the file name from the column.
  • BLOCKED: Write access is blocked. The file cannot be directly updated. In order to update a file, it should be copied, the copy should then be updated, and finally the DATALINK value should be updated to point to the new copy of the file.
– RECOVERY: Specifies whether DB2 will support point in time recovery of files referenced by values in this column.
  • YES: DB2 will support point in time recovery of files referenced by values in this column. This value can only be specified when INTEGRITY ALL and WRITE PERMISSION BLOCKED are also specified.
  • NO: Specifies that point in time recovery will not be supported.
– ON UNLINK: Specifies the action taken on a file when a DATALINK value is changed or deleted (unlinked). Note that this is not applicable when READ PERMISSION FS is used.
  • RESTORE: Specifies that when a file is unlinked, the Data Links File Manager attempts to return the file to the owner with the permissions that existed at the time the file was linked. In the case where the user is no longer registered with the file server, the result is product-specific. This can only be specified when INTEGRITY ALL and WRITE PERMISSION BLOCKED are also specified.
  • DELETE: Specifies that the file will be deleted when it is unlinked. This can only be specified when READ PERMISSION DB and WRITE PERMISSION BLOCKED are also specified.

MODE DB2OPTIONS
This mode defines a set of default file link options. The options defined by DB2OPTIONS are:
– INTEGRITY ALL
– READ PERMISSION FS
– WRITE PERMISSION FS
– RECOVERY NO

Note: In the DB2OPTIONS mode, since write access is under file system control, the ON UNLINK option is not applicable. This option is now valid only for files linked with DATALINK columns that have the READ PERMISSION DB and WRITE PERMISSION BLOCKED attributes.

Therefore, the DATALINK column definition syntax can be represented as shown in Figure 2-6.

DATALINK [(integer)] datalink-options-clause

datalink-options-clause:
  LINKTYPE URL { NO LINK CONTROL | FILE LINK CONTROL { file-link-options-clause | MODE DB2OPTIONS } }

file-link-options-clause:
  INTEGRITY ALL
  READ PERMISSION { FS | DB }
  WRITE PERMISSION { FS | BLOCKED }
  RECOVERY { NO | YES }
  ON UNLINK { RESTORE | DELETE }

Figure 2-6 DATALINK column definition syntax

Now using these options, a DATALINK column may have one of the sets of attributes shown in Table 2-2.

Table 2-2 Possible combinations of DATALINK attributes

Optn. #   Read   Write     Recovery   Unlink
1         FS     FS        No         N/A
2         FS     Blocked   No         N/A
3         FS     Blocked   Yes        N/A
4         DB     Blocked   No         Delete
5         DB     Blocked   Yes        Delete
6         DB     Blocked   No         Restore
7         DB     Blocked   Yes        Restore

The other combinations of the DATALINK attributes are not currently supported (in DB2 Universal Database V7.x). Some of them are for future enhancements (for example, WRITE PERMISSION DB, which would be available from DB2 V8.x), and others are not mutually compatible (for example, READ PERMISSION DB and WRITE PERMISSION FS together).

Note: Obviously the “write” operation is more of a security threat than the “read” operation on any file. Therefore, having the “read” control under DB and the “write” control under FS for a file does not make sense and is not supported.
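The seven supported combinations in Table 2-2 can be captured as a small lookup, which is a convenient way to check a planned column definition before writing the DDL. The sketch below is illustrative Python, not part of DB2; the names SUPPORTED_COMBINATIONS and is_supported are hypothetical, and None stands for the N/A entry in the Unlink column.

```python
# (READ PERMISSION, WRITE PERMISSION, RECOVERY, ON UNLINK) for options 1 to 7
# of Table 2-2. None models "N/A" (ON UNLINK does not apply).
SUPPORTED_COMBINATIONS = {
    ("FS", "FS", "NO", None),
    ("FS", "BLOCKED", "NO", None),
    ("FS", "BLOCKED", "YES", None),
    ("DB", "BLOCKED", "NO", "DELETE"),
    ("DB", "BLOCKED", "YES", "DELETE"),
    ("DB", "BLOCKED", "NO", "RESTORE"),
    ("DB", "BLOCKED", "YES", "RESTORE"),
}


def is_supported(read, write, recovery, on_unlink=None):
    """Return True if the combination appears in Table 2-2."""
    return (read, write, recovery, on_unlink) in SUPPORTED_COMBINATIONS


# READ PERMISSION DB with WRITE PERMISSION FS is rejected, as the note explains.
print(is_supported("DB", "FS", "NO"))
print(is_supported("DB", "BLOCKED", "YES", "RESTORE"))
```

The first tuple corresponds to MODE DB2OPTIONS, which fixes READ PERMISSION FS, WRITE PERMISSION FS, and RECOVERY NO.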

2.3 Security/authentication

We have seen that along with the file system benefits, Data Links also provides RDBMS benefits including referential integrity, coordinated backup and recovery, and access control. This section describes how Data Links provides access control.

2.3.1 Concept of tokenized file names

DB2 and the Data Links Manager together provide file access control. When a DATALINK column (with READ PERMISSION DB) is accessed (using a SELECT statement), DB2 generates an access token and embeds it in the pathname of the file.

There are several conditions for file access control to operate correctly:
– File access control is provided only if the READ PERMISSION DB option is specified on the DATALINK column when the table is created.
– To access a READ PERMISSION DB file, the user needs to have the SELECT privilege on the table (or the SQL view) containing the DATALINK column under which the file’s reference exists.
– Any file system API or command can be used to read the file. As shown earlier, files can also be accessed by the Web browser.
– Generation of the access token is shared secretly between DB2 and the Data Links File Manager. The DLFF contacts DLFM to validate a token.
– To be valid, an access token must be generated and used within a specified time interval as defined in the Data Links Access Token Expiry Interval (dl_expint) database configuration parameter.
– For each (SELECT statement) access, a new token is generated and remains valid for the time specified by dl_expint.
– Web addresses with embedded access tokens must be used by the application to access the files. Any attempt to open, read, or otherwise manipulate a file using Web addresses without the access token results in an access violation.

2.3.2 Database configuration parameters

The following DB2 database configuration parameters pertain to Data Links:

DATALINKS
This option is specified in the database manager configuration. Using this, you can enable or disable the support of Data Links. A value of YES specifies that Data Links support is enabled for Data Links Manager linking files stored in native file systems (for example, JFS on AIX). A value of NO specifies that Data Links support is not enabled. The default value is NO.

DL_EXPINT
This is a database configuration parameter that specifies the interval of time (in seconds) during which the generated file access control token is valid. The number of seconds the token is valid begins from the time it is generated. The Data Links File Filter checks the validity of the token against this expiry time. This parameter can have values ranging from 1 to 31,536,000. The largest value corresponds to one year (365 days). The default value for this parameter is 60 seconds. This parameter applies to the DATALINK columns that specify READ PERMISSION DB.

DL_NUM_COPIES
This database parameter specifies the number of additional copies of a file to be made in the archive server (such as a Tivoli Storage Manager server) when a file is linked to the database. Its value ranges from 0 to 15. The default value for this parameter is 0. This parameter applies to the DATALINK columns that specify RECOVERY YES.

Note: The products offered by Tivoli Storage Manager have now replaced the Adstar Distributed Storage Manager (ADSM) product set.

DL_DROP_TIME
This parameter specifies the interval of time (in days) that files would be retained on an archive server (such as an ADSM/Tivoli Storage Manager server) after a DROP DATABASE is issued. The value of this parameter ranges from 0 to 365. The default value for this parameter is 1 day. A value of 0 means that the files are deleted immediately from the archive server when the DROP command or statement is issued. (The actual file is not deleted unless the ON UNLINK DELETE parameter was specified for the DATALINK column.) This parameter applies to the DATALINK columns that specify RECOVERY YES.

DL_UPPER
The parameter indicates whether the file access control tokens use uppercase letters. A value of YES specifies that all letters in an access control token are uppercase. A value of NO specifies that the token can contain both uppercase and lowercase letters. The default value is NO. This parameter applies to the DATALINK columns that specify READ PERMISSION DB.

DL_TOKEN
This parameter specifies the algorithm used in the generation of DATALINK file access control tokens. It may have either of two values: MAC0 or MAC1 (message authentication code). The value of MAC1 generates a more secure message authentication code than MAC0, but also has more performance overhead. Again, this parameter applies to the DATALINK columns that specify READ PERMISSION DB.
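The ranges and permitted values listed for these parameters can be summarized in a short Python sketch that checks a proposed setting before it is applied. This is only an illustration of the documented limits: the names INT_RANGES, CHOICES, and is_valid_setting are hypothetical and do not correspond to any DB2 interface.

```python
# Documented numeric ranges (inclusive) for the Data Links parameters.
INT_RANGES = {
    "DL_EXPINT": (1, 31_536_000),   # seconds; the maximum is one year
    "DL_NUM_COPIES": (0, 15),       # additional archive copies
    "DL_DROP_TIME": (0, 365),       # days
}

# Parameters that take one of a fixed set of values.
CHOICES = {
    "DATALINKS": {"YES", "NO"},
    "DL_UPPER": {"YES", "NO"},
    "DL_TOKEN": {"MAC0", "MAC1"},
}


def is_valid_setting(name, value):
    """Return True if value lies within the documented limits for name."""
    if name in INT_RANGES:
        low, high = INT_RANGES[name]
        return isinstance(value, int) and low <= value <= high
    if name in CHOICES:
        return value in CHOICES[name]
    raise KeyError(f"not a Data Links parameter: {name}")


# 365 days expressed in seconds matches the DL_EXPINT upper bound.
print(365 * 24 * 60 * 60 == INT_RANGES["DL_EXPINT"][1])
print(is_valid_setting("DL_EXPINT", 60))
```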

2.3.3 How access tokens work

Depending on the configuration parameters discussed above, the DB2 engine creates a token for every DATALINK value (with READ PERMISSION DB) that qualifies for the SQL SELECT statement. The token embeds the time at which it expires. This expiry time is calculated by adding the expiry interval (specified by the DL_EXPINT database configuration parameter) to the current time (the time at which the SQL SELECT statement is issued on the DATALINK column).

Important: In spite of the token being valid, the application still may receive an error while trying to access the READ PERMISSION DB file on the Data Links server. This may happen if the system clocks of the DB2 server and the Data Links server are not synchronized with each other.
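The expiry calculation, and the clock synchronization problem described in the Important box, can be shown with simple arithmetic. The Python sketch below is a model only: the function names are hypothetical, times are plain seconds, and treating a token as still valid at the exact expiry instant is an assumption rather than documented behavior.

```python
def expiry_time(select_time: int, dl_expint: int) -> int:
    """Expiry embedded in the token: time of the SELECT plus DL_EXPINT."""
    return select_time + dl_expint


def token_is_valid(expiry: int, data_links_server_time: int) -> bool:
    """Check made on the Data Links server, using that server's own clock."""
    return data_links_server_time <= expiry


# The DB2 server issues the SELECT at t = 1000 s with the default DL_EXPINT of 60 s.
expiry = expiry_time(1000, 60)

# Synchronized clocks: the file is opened 10 s later and the token is accepted.
print(token_is_valid(expiry, 1010))

# The Data Links server clock runs 90 s ahead: the same open, 10 s after the
# SELECT, is seen at t = 1100 s and the token is rejected as expired.
print(token_is_valid(expiry, 1010 + 90))
```

When the skew exceeds DL_EXPINT, every token appears expired on arrival, which is why the two system clocks need to be kept in step.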

Applications and users use this tokenized file name to access the files in the file system. The Data Links File System Filter, which sits on top of the native file systems, checks for the token in the file name. If the file name has a token attached and the tokenized file name does not already exist (it is unlikely, but possible, that the tokenized file name exists as a normal or linked file), DLFF contacts one of the DLFM daemons (the Upcall daemon) and asks it to validate the token. DLFM then uses DB2 engine code to verify the token. According to the result of this verification, DLFM prepares and sends a response to DLFF.

This response can either be “allowed” or “not allowed”. Therefore, if DLFF receives a response of “allowed”, it calls the base file system operation and lets the operation complete. Otherwise, it returns an error to the application. DLFM returns “not allowed” in two cases:

– The user does not have the privilege to do the action.
– The request cannot be processed due to some error.

When a file is linked under Data Links control with the READ PERMISSION DB attribute, DLFM changes its owner to the DLFM user ID (dlfm UID), which is unique to each Data Links server. DLFM also changes the permissions of this file to “read only” by the owner (that is, the dlfm UID). If the owner of the file has “execute permission” before linking, it is retained even after linking the file.

When a user tries to access a READ PERMISSION DB file without a token, DLFF returns a “permission denied” error message and does not allow the operation to go through. It recognizes these files by the owner (as described earlier, the owner of a READ PERMISSION DB file is the dlfm UID).
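The decision sequence that DLFF follows when a file is opened, as described in this section, can be sketched as follows. This is a simplified Python model under stated assumptions, not the filter's actual code: the file system is a dictionary mapping path to owner, validate_token is a stand-in for the upcall to DLFM, the token is assumed to precede the file name and end with a semicolon, and treating an existing tokenized name as an ordinary file is an interpretation of the text above.

```python
import posixpath

DLFM_UID = "dlfm"  # owner assigned to READ PERMISSION DB files when linked


def dlff_open(name, files, validate_token):
    """Model of the DLFF check for an open request.

    files: dict mapping existing path -> owner (stand-in for the file system)
    validate_token: callable(token, real_path) -> bool (stand-in for the
    upcall to DLFM)
    """
    token, sep, bare_name = posixpath.basename(name).partition(";")
    if sep and name not in files:
        # Token present and no real file has that literal name: ask DLFM.
        real_path = posixpath.join(posixpath.dirname(name), bare_name)
        return "allowed" if validate_token(token, real_path) else "not allowed"
    if files.get(name) == DLFM_UID:
        # READ PERMISSION DB file accessed without a token.
        return "permission denied"
    # Ordinary file: pass the request through to the base file system.
    return "allowed"


files = {"/dl/images/am.jpg": DLFM_UID, "/dl/notes.txt": "alice"}
check = lambda token, path: token == "GOODTOKEN" and path in files

print(dlff_open("/dl/images/GOODTOKEN;am.jpg", files, check))
print(dlff_open("/dl/images/am.jpg", files, check))
```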

2.4 Data Links on UNIX and Windows

In 2.1, “Overview of the Data Links architecture” on page 12, you saw that the Data Links implementation on AIX, Solaris, and Windows NT has the following components:
– DB2 server
– DB2 client
– Data Links File Manager
– Data Links File System Filter

Of these components, DLFM and DLFF are part of the Data Links Manager (DLM). This section discusses these components in detail. Then it describes what happens when a file is linked or unlinked by an application.

Note: DB2 Logging Manager is a part of DLFM. Therefore, DLFM can be said to have two components:
– Logging Manager
– Daemon processes

Figure 2-7 shows how one DB2 server can be registered with one or more Data Links servers and vice versa. There is a many-to-many (M:N) relationship between DB2 servers and Data Links servers.

[Figure: DB2 Server-1, DB2 Server-2, through DB2 Server-M, each connected to Data Links Server-1 on AIX, Data Links Server-2 on Solaris, through Data Links Server-N on Windows NT]

Figure 2-7 Relationship between DB2 servers and Data Links servers

2.4.1 Data Links File Manager (DLFM)

DLFM is a sophisticated SQL application with a set of daemon processes that reside at a file server node. These processes work cooperatively with the host database servers to manage external files. DLFM receives DBMS calls to perform the link/unlink operations using unit of work consistency and maintains a list of all files under its control.

Unit of work consistency: This concept means that the link/unlink operation will be within the same commit scope as the other SQL performed in the same unit of work.

When a file is linked under the Data Links control in a database, the DLFM applies the constraints for referential integrity, access control, and backup and recovery as specified in the DATALINK column. If the DATALINK column specifies READ PERMISSION DB access, for example, the DLFM changes the owner of the file to the DBMS (a special user ID registered with DLFF on each Data Links server, known as dlfm UID) and marks the file “read only”, whenever any file is linked under it. All of these changes to the DLFM repository (the DLFM_DB database) and to the file system are applied as part of the same DBMS transaction as the initiating SQL statement. Therefore, if the transaction is rolled back, the changes made by the DLFM are undone as well.
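The effect of unit of work consistency can be pictured with a toy Python model in which link and unlink requests only become visible when the surrounding transaction commits. This is a deliberately simplified sketch: the UnitOfWork class is hypothetical, it stages changes instead of applying and undoing them, and it ignores the commit protocol that the real DLFM takes part in.

```python
class UnitOfWork:
    """Toy model: link/unlink requests share the fate of the transaction."""

    def __init__(self, linked_files):
        self.linked_files = linked_files   # set of paths currently linked
        self.pending = []                  # operations issued in this unit of work

    def link(self, path):
        self.pending.append(("link", path))

    def unlink(self, path):
        self.pending.append(("unlink", path))

    def commit(self):
        # All file changes take effect together with the SQL changes.
        for operation, path in self.pending:
            if operation == "link":
                self.linked_files.add(path)
            else:
                self.linked_files.discard(path)
        self.pending.clear()

    def rollback(self):
        # Nothing requested in this unit of work survives a rollback.
        self.pending.clear()


linked = {"/dl/images/old.jpg"}

uow = UnitOfWork(linked)
uow.link("/dl/images/am.jpg")
uow.rollback()
print(sorted(linked))   # unchanged: the link was rolled back

uow.link("/dl/images/am.jpg")
uow.unlink("/dl/images/old.jpg")
uow.commit()
print(sorted(linked))
```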

To reduce the number of messages between the database server and the DLFM, the DLFM maintains a set of meta-data on the file systems and the files that are under database control. Other information is stored by the DB2 Logging Manager in the DLFM_DB database.

DLFM handles interactions with the database backup/restore process. You can use the option for the DATALINK column to indicate whether a linked file should participate in point-in-time recovery of the DB2 table. DLFM can interface with Tivoli Storage Manager to take copies of the external files when the database backup occurs.

Note: Chapter 6, “Using Tivoli Storage Manager” on page 107, discusses using Tivoli Storage Manager for external file backup.

The actual copying is asynchronous to the database transaction triggering the backup of the file (via SQL SELECT or UPDATE statement), since the external file size can be quite large (tens of megabytes). Once the backup of the file has completed, DLFM signals to the calling database that the backup can be marked as complete. The completion of the asynchronous copy operation is checked when the database server performs a backup of the SQL tables involving DATALINKS columns.

The DLFM tracks different versions of a referenced file and maintains the backup status of each in order to support point-in-time recovery. The DBMS also provides the DLFM with a recovery ID (RECOV_ID) for a file, whenever it is linked or unlinked, to help synchronize recovery of files with data. This is important because a file with the same name but different content may be linked and unlinked several times. Without a separate recovery ID (RECOV_ID) for each link operation, DLFM would not be able to restore the file to match the database state.

Note: Chapter 11, “Recovery” on page 201, discusses backup and recovery in detail.

DB2 Logging Manager

This component maintains a database of information (or meta data) related to all the linked files. The name of the database in which all this information is maintained is DLFM_DB. It contains the following tables:

DFM_DBID: This table stores the database registration information. It has the following main columns:
– DBID: This field represents the unique combination of the host database name, instance name, and host machine name. This is the primary key for this table.
– HOSTNAME: The hostname registered with the DLFM.
– DBINST: The DB2 instance on the HOSTNAME.
– DBNAME: The database name (containing tables with the DATALINK column) on HOSTNAME, with schema DBINST.
– XN_ID: Once the database DBNAME is dropped, the entry for that database is marked as deleted, and the transaction identifier of the transaction doing this is stored in the XN_ID attribute.

DFM_PRFX: This table stores the prefix registration information. It has the following main columns:
– PRFX_ID: A unique identifier that represents this prefix. This is the primary key of this table.
– PRFX_NAME: This attribute maintains the name of the prefix.

Prefix: This is the mount point (or stub) where the DLFS is mounted.

DFM_ACCESS: This table is used for controlling user access to files. It also stores directory patterns in which a particular user can link/unlink files in a particular prefix (the mount point of the file system under Data Links control).

DFM_RCFILE: This table maintains a list of files received from the host DB2 (the database having the DATALINK column) during the RECONCILE process. It contains the list of files that are in the DB2 host database table. All the reconcile instances (running on one or more DB2 databases having tables with DATALINK columns) share this table. The main columns in this table are:
– PID: Process ID of the reconcile child on dlfm.
– DBID: The identifier for the database where reconcile is going on.
– PRFXID: The identifier for the file system (or the prefix) where the files are stored.
– STEMANAME: The name of the files.

DFM_BOOT: This table maintains the boot information needed during the DLFM startup.

DFM_GRP: This table consists of file group entries. Each group entry corresponds to a DATALINK column in an SQL table on the host database. An entry is placed in this table when the first file reference is inserted in the corresponding DATALINK column.

DFM_FILE: This is the most accessed table. It holds the information about linked and unlinked files on the file server. A new entry is created in this table whenever a file is linked in the host database registered with this DLFM. This table retains the unlinked file entries if files need to be restored in the future via the host database restore utility (for example, when the DATALINK column under which they were linked has the ON UNLINK RESTORE option set). If instead the ON UNLINK DELETE option was set, this table still retains the entry for the unlinked file until it is deleted by the Garbage Collector daemon. The columns of interest defined in this table are: DBID, FILENAME, TRANSACTION ID, RECOVERY ID, and FILE STATUS.

DFM_XNSTATE: This table keeps track of all the active DLFM transactions. Transaction state is maintained for each transaction as long as it is active. The transaction state is first kept in an in-memory table when the transaction starts. The entry is inserted into the SQL table when the transaction begins the first phase of the commit processing. Once the transaction is completed, its entry is removed from the table.

DFM_ARCHIVE: This table contains the file and group entries that need to be archived to the archive server. When the Load utility is used to insert a large number of datalink files into a DATALINK column on the host database, instead of replicating each file entry in the archive table, only a group entry is inserted.

Note: The DB2 Load utility is capable of efficiently moving large quantities of data into newly created tables, or into tables that already contain data. The utility can handle all data types, including large objects (LOBs), user-defined types (UDTs), and DATALINKs. The Load utility is faster than the Import utility, because it writes formatted pages directly into the database, while the Import utility performs SQL INSERTs.

The entry from the archive table (DFM_ARCHIVE) is processed to make a copy of one file or a set of files. The corresponding entry is deleted from this table once the copy is over.

DFM_BACKUP: This table stores information regarding backups. A new sequence number is assigned to each new backup that is taken. This table is mainly used by the Garbage Collector daemon.

DFM_DIR: This table maintains the directory hierarchy for each prefix (or mount point). This is used for disk crash recovery support on UNIX.

DFM_URL: This table is required for the extension of Link Integrity+ (refer to 1.3.1, “Link Integrity+” on page 7, for additional information). It contains mapping for Web-root directory support. This table is currently not being used.

DLFM process model

The DLFM has a main daemon that spawns a child agent (or process) when a connect request from a DB2 agent is received. The child agent then establishes a connection with the requesting DB2 agent. This child agent will serve all subsequent requests from the same connection. DLFM’s main daemon waits for another connect request from the same or different host DB2.

Applications on the host DB2 side would establish separate connections with the DLFM. Therefore, they are served by separate child agents on the DLFM side. In addition to the child agent, DLFM provides several other services implemented as daemons and they are also spawned by the main DLFM daemon.

The DLFM process model can be described from two angles:

How DLFM processes interact with the DB2 server

Figure 2-8 shows how DB agents on the DB2 server interact with DLFM daemons.

[Figure labels: SQL (SELECT, INSERT, DELETE); utilities (BACKUP/RESTORE, LOAD, IMPORT, EXPORT); a table with INT and DATALINK columns holding http://sol-e/datalinks/images/abc.gif; db2agent processes on the DB2 server; TCP/IP; DLFMD, DLFM_CHILD, Async Daemon, and DB on the Data Links Manager]

Figure 2-8 DLFM process model: DB2 server

How DLFM processes interact with the rest of the Data Links components and the archive server

Figure 2-9 shows how DLFM daemons interact with other components of Data Links, like DLFF and DB2 Logging Manager. It also shows how these daemons interact with the archive server (Tivoli Storage Manager, for example).

[Figure labels: db2agent; TCP/IP; DLFMD; DLFM_CHILD; Archive Subsystem (Retrieve Daemon, Copy Daemon, Archive Server: LOCAL DISK/TSM (ADSM)/XBSA); Change-Own Daemon; Upcall Daemon; Garbage Collection Daemon; Delete-group Daemon; Define-group Daemon; Metadata in DB2 tables; IPC; Object Access/Integrity Subsystem; DLFF (Streams driver, File System Driver, DMAPP); Native File System: JFS, Solaris, NTFS, DFS-DCE (AIX)]

Figure 2-9 DLFM process model: Data Links Manager

Figure 2-10 gives the complete picture of the DLFM process model.

[Figure labels: Host DBM; DB agent; SQL Appl.; DLFMD; Child agent; DelGrpd; Chownd; Retrieved; Upcalld; GCd; Copyd; Archive Server; DLFF; DB2 Logging Manager]

Figure 2-10 DLFM process model: Complete picture

The following sections describe the functionality and service provided by each of these daemons.

Delete Group daemon

Whenever a table is dropped on the host DB2 side, the corresponding file groups on the DLFM server, if any, also need to be deleted. There can be many files referenced by the DATALINK columns in the dropped table, and all those files need to be unlinked. During the forward progress of the transaction, the file groups are marked deleted by the current transaction in the GROUP table. During prepare processing, the child agent notes the number of groups deleted by this transaction and records it with the transaction entry in the transaction table. The commit processing checks if any groups are deleted (by checking the deleted group count in the transaction entry in the current transaction) and if so, it sends the transaction ID to the Delete Group daemon. Using the transaction ID, the Delete Group daemon finds all the groups deleted in this transaction and then unlinks all the files in each group.

The unlinking of the files by this daemon is asynchronous and the commit processing for the dropped table does not wait for it to complete. Note that the group entry is not deleted until all the files in that group have been unlinked. And as long as this transaction does not commit, the same file name is not allowed to be relinked. Therefore, if the DLFM fails before the Delete Group daemon has completed unlinking all the files from the deleted groups, then, after the DLFM restarts, the Delete Group daemon can still pick up all the committed transaction tables and resume its work.

Garbage Collector daemon

The Garbage Collector daemon is another asynchronous process. It cleans up the DLFM meta data. There are two types of cleanup:
– Cleanup triggered by the database backup
– Cleanup of the deleted groups whose lifetime has expired

The cleanup triggered by the database backup consists of cleaning up old backup entries according to the policy of keeping last N backups.

Note: The value of N can be specified in database configuration, by setting the NUM_DB_BACKUPS variable.

The Garbage Collector daemon removes the backup entries that are older than the last N backups, along with the corresponding unlinked file entries in the FILE table. It also removes the copies of those files from the archive server.
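The "keep the last N backups" policy can be illustrated with a minimal Python sketch. It is not DLFM code: the BackupEntry class, its fields, and the collect_garbage function are hypothetical names chosen for illustration, and the only thing taken from the text is that entries beyond the newest N are removed together with the unlinked file entries they protect.

```python
from dataclasses import dataclass


@dataclass
class BackupEntry:
    backup_id: int        # hypothetical identifier; a larger value means a more recent backup
    unlinked_files: list  # unlinked file entries that this backup still protects


def collect_garbage(backups, num_db_backups):
    """Split backup entries into (kept, removed), keeping the newest N."""
    ordered = sorted(backups, key=lambda b: b.backup_id, reverse=True)
    return ordered[:num_db_backups], ordered[num_db_backups:]


# Five backups exist and the policy keeps the last three.
backups = [BackupEntry(i, [f"file{i}.dat"]) for i in range(1, 6)]
kept, removed = collect_garbage(backups, num_db_backups=3)
kept_ids = [b.backup_id for b in kept]
# File entries (and archive copies) that the daemon would now remove.
removed_files = [name for b in removed for name in b.unlinked_files]
```

In this sketch the two oldest backups fall outside the policy, so only their file entries become candidates for removal.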

The cleanup of the deleted groups is based on the expiry of the lifetime. Each deleted file group is assigned a life span. Once the lifetime expires, the Garbage Collector daemon removes those deleted file group entries as well as the associated unlinked file entries from the DLFM meta data tables. If archive copies associated with the unlinked file entries exist, they are also deleted from the archive server.

The Upcall daemon services requests from DLFF to determine if a file is in the linked state. If it is, a user's request to delete, rename, or move the file via file system APIs is rejected by the DLFF. The main purpose of the Upcall daemon is to enforce referential integrity for the linked files.

Chown daemon

The Chown daemon is a special process whose effective user ID is root. The Chown daemon needs superuser privilege since it manipulates attributes (such as ownership and permissions) of files belonging to different users. A child agent communicates with the Chown daemon whenever it needs to access file information, such as the file system ID, the inode, the last modification time, the owner, and the group. During commit processing, the child agent sends a request with a file name to the Chown daemon either to take over the file (that is, to change the owner and access permissions, if required) or to release the file to the file system (that is, to restore the original owner and access permissions).

For example, suppose before linking a file under Data Links with the READ PERMISSION DB attribute, the attributes of the file look like the example in Figure 2-11.

Figure 2-11 Attributes before the link operation

After the link operation, the Chown daemon changes the owner and the permissions of the file. The attributes of the file (seen as root) would look like the example in Figure 2-12.

Figure 2-12 Attributes after the link operation

Note: For READ PERMISSION DB files, the Data Links implementation requires that the owner of the file (while it is under database control) be changed to the dlfm UID (configured at installation time).

Now if ON UNLINK RESTORE is one of the attributes of the DATALINK column, under which the “demofile” file is linked, then after the unlink operation, the Chown daemon restores the original attributes of the file (Figure 2-11).
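The take-over and release behavior can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names take_over and release are hypothetical, the sketch assumes a POSIX file system, and it changes only the permission bits, because changing the owner to the dlfm UID would require the superuser privilege that the real Chown daemon has.

```python
import os
import stat
import tempfile


def take_over(path):
    """Record the current attributes, then make the file read-only for its owner."""
    st = os.stat(path)
    saved = {"uid": st.st_uid, "gid": st.st_gid, "mode": stat.S_IMODE(st.st_mode)}
    # The real daemon would also change the owner to the dlfm UID (needs root).
    os.chmod(path, stat.S_IRUSR)
    return saved


def release(path, saved):
    """Restore the permissions recorded at link time (the ON UNLINK RESTORE case)."""
    os.chmod(path, saved["mode"])


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "demofile")
    with open(path, "w") as handle:
        handle.write("data")
    os.chmod(path, 0o644)                        # attributes before the link
    saved = take_over(path)
    linked_mode = stat.S_IMODE(os.stat(path).st_mode)
    release(path, saved)
    restored_mode = stat.S_IMODE(os.stat(path).st_mode)
```

The saved attributes play the role of the information that allows the original state to be restored after the unlink operation.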

Since the Chown daemon runs as superuser, it is important to guard against unauthorized requests. Therefore, the child agent communicates with the Chown daemon with proper authentication.

Copy daemon

The Copy daemon is responsible for copying linked files from the file system to an archive server or to disk. When a file is linked, it is copied asynchronously by the Copy daemon if DLFM is responsible for restoring the file after a database restore.

Retrieve daemon

The Retrieve daemon is responsible for restoring files from the archive server or from disk. When the host database is restored to a point in the past, the file system state may be out of sync with the new database state. As part of re-synchronization, files are restored by the Retrieve daemon from the archive server, if necessary.

Upcall daemon

The purpose of the Upcall daemon is to interact with the DLFF. The DLFF requests information from DLFM by making an upcall. On UNIX, the upcall mechanism uses a stream-based driver. On Windows NT, it uses asynchronous DeviceIoControl calls with buffered I/O.

2.4.2 Data Links File System Filter (DLFF)

The DLFF module supports the file system functionality required by Data Links. On UNIX platforms, it is implemented as a Virtual File System, which layers just above the native UNIX file system (JFS on AIX, UFS on Solaris, etc.). The approach is quite similar on Windows NT, except that the implementation is in the form of a filter file system driver layered over the native file system (NTFS).

Virtual File System (VFS): This is an abstraction of a physical file system implementation. It provides a consistent interface to multiple file systems, both local and remote. This consistent interface allows the user to view the directory tree on the running system as a single entity even when the tree is made up of a number of diverse file system types. Therefore, the VFS interface (also known as the VNODE interface) provides a bridge between the physical file system (which manages storage of data) and the logical file system (which provides support for the system call interface).

The Data Links File System Filter enforces data integrity by making sure that no file interaction is allowed that is incompatible with the information in the database. It is designed to interfere as little as possible with the actual application. The DLFF is responsible for the interception of certain file system calls, such as file open, rename, and delete. Other file system calls, such as read and write, are simply passed on to the underlying native file system. When a DATALINK value is retrieved from the database (assuming that the read permission is controlled by the database), an access token is automatically generated. When an application attempts to open the file, the DLFF intercepts the file open call and ensures that the access token is valid, which means that the application may indeed open the file.

If the token is invalid or has expired, DLFF returns an error to the application. If the token is valid, DLFF calls the base (native) file system's operations to complete the user’s request. At this stage, the DLFF no longer interferes with any of the read or write operations on the file, leaving performance virtually unaffected. Because the DLFF must integrate with the file system, it is different for each platform. However, the API is consistent across the platforms, which lets you mix and match database and file server implementations.

File system operations intercepted by DLFF

This section discusses how DLFF handles some of the important file system operations.

Open file

When an open request arrives for a file (for example, from cat on UNIX or type on Windows), DLFF looks for an embedded token in the file name by searching for the “;” character at a fixed location in the name. If it finds “;” in that position, it first checks whether a file with exactly that name (including the apparent token) exists. If such a file exists, DLFF treats the file name as having no token. If not, DLFF assumes that it is a token generated by the host database and strips it from the file name. DLFF then checks that the stripped file name exists. If it does not, DLFF returns an error (ENOENT on UNIX). If the file exists, the token (along with the file name and some other information) is passed to the DLFM as a query to check its validity.

DLFM then checks the validity of the token and returns the result to DLFF. If the result is an error, DLFF returns it to the application (or user) who had requested the “open” operation. If DLFM says that the operation is allowed, DLFF changes the effective UID of the process (trying to open the file) to the dlfm UID (owner of READ PERMISSION DB files), and then calls the base file system “open” operation to complete the user’s request.

Note: Even though dlfm user is the owner and has read permission on the READ PERMISSION DB file, it needs a valid token to access it. This feature avoids security threats coming from NFS clients having a UID equivalent to the dlfm user.

If the file name did not have a token embedded, the open request is passed directly to the base file system.
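The open-file decision described above can be sketched in Python. The sketch is an assumption-laden illustration, not the DLFF implementation: the token length (TOKEN_LEN), the placement of the token before the “;” character, the function names, and the returned strings are all hypothetical. The text only states that “;” is searched for at a fixed location in the file name.

```python
TOKEN_LEN = 8  # hypothetical token length; the real value is not given in the text


def split_token(name, existing_files):
    """Return (token, stripped_name); token is None when the name carries no token."""
    if len(name) > TOKEN_LEN and name[TOKEN_LEN] == ";":
        if name in existing_files:
            # A real file happens to have ";" at that position: treat as no token.
            return None, name
        return name[:TOKEN_LEN], name[TOKEN_LEN + 1:]
    return None, name


def open_file(name, existing_files, token_is_valid):
    """Decide how an open request is handled; token_is_valid stands in for the DLFM query."""
    token, stripped = split_token(name, existing_files)
    if token is None:
        return "pass to base file system"
    if stripped not in existing_files:
        return "ENOENT"
    if not token_is_valid(token, stripped):
        return "error from DLFM"
    return "open as dlfm UID"


files = {"report.txt", "ABCDEFGH;odd.txt"}


def check(token, name):
    # Hypothetical validity check; the real DLFM validates the token itself.
    return token == "ABCDEFGH"


outcome = open_file("ABCDEFGH;report.txt", files, check)
```

Note that the existence check on the full name comes first, so a file whose real name contains “;” at the token position is never mistaken for a tokenized name.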

Delete file

If DLFF receives a “delete file” request, it returns an error if the file is owned by the dlfm UID (for READ PERMISSION DB). If not, it contacts DLFM to check whether the corresponding file is linked under Data Links control. DLFM returns one of the results listed in Table 2-3.

Table 2-3 DLFM results and corresponding actions by DLFF

| Result from DLFM | Action by DLFF |
|---|---|
| File under Data Links control | DLFF returns an error to signify unauthorized access |
| File not under Data Links control | DLFF passes the request to the base file system |

Rename file

Rename file processing is similar to delete processing, except that in a rename the destination file is also checked, to confirm that it is not under Data Links control.

For example, suppose a user wants to rename file1 to file2. In this case, DLFF checks the access permissions for file1 and verifies that file2 (the destination file) is not under Data Links control, to maintain referential integrity.

Rename directory

Renaming the directory is not allowed in DLFF.

Set permissions

No permissions can be set/modified on a file that is under WRITE PERMISSION BLOCKED control. If the owner of the file is dlfm UID, the DLFF rejects any set/modify permission operation. Otherwise, the DLFF contacts the DLFM to check if the file is Data Linked with the WRITE PERMISSION BLOCKED attribute. If so, it disallows any setting of the “w” bit on the file.

Note: The superuser (root on UNIX and Administrator on Windows) can perform any operation on any file. DLFM is not contacted for any request originating from the superuser (except for logging purposes).
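The delete and rename rules above can be summarized in a short Python sketch. It is a simplified model rather than DLFF code: DLFM_UID, allow_delete, and allow_rename are hypothetical names, and the callables passed in stand for the upcall to DLFM, so that the sketch only "contacts DLFM" when the rules require it.

```python
DLFM_UID = 500  # hypothetical UID of the dlfm user


def allow_delete(file_owner_uid, caller_is_superuser, is_linked):
    """is_linked is a callable that models the query to DLFM."""
    if caller_is_superuser:
        return True          # superuser requests are not checked with DLFM
    if file_owner_uid == DLFM_UID:
        return False         # READ PERMISSION DB file: rejected without a query
    return not is_linked()   # otherwise ask DLFM whether the file is linked


def allow_rename(src_owner_uid, caller_is_superuser, src_is_linked, dst_is_linked):
    """A rename is a delete-style check on the source plus a check on the destination."""
    if caller_is_superuser:
        return True
    if not allow_delete(src_owner_uid, False, src_is_linked):
        return False
    return not dst_is_linked()


# Renaming an unlinked file1 onto a linked file2 is refused.
rename_allowed = allow_rename(1000, False, lambda: False, lambda: True)
```

Passing the DLFM query as a callable makes the short-circuit visible: for a file owned by the dlfm UID the query is never made.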

How DLFF interacts with DLFM

DLFF can recognize files under READ PERMISSION DB control, since they are owned by a unique user identifier (dlfm UID). However, when a file is linked under some other Data Links control option, DLFF prevents the rename and delete operations on the file too. In addition, for the files linked under WRITE PERMISSION BLOCKED, DLFF does not allow anyone (other than the superuser) to enable “w” access to the file. In order to do so, DLFS needs to find out whether a given file is linked under the Data Links control. Because this information is not directly available to the DLFS VFS, it is obtained by making an upcall from the kernel-mode VFS to the user-mode DLFM Upcall daemon.

Figure 2-13 shows how an application accesses files under Data Links control and how DLFF and DLFM interact with each other.

Figure 2-13 Overview of Data Links implementation (diagram: at user level, the application, the DLFM Upcall daemon, and the DB2 Logging Manager; at kernel level, the system call handler, the logical file system, and the VFS/vnode interface, below which the DLFS kernel extension with its streams driver and the DLFF VFS is layered over the native, NFS, and other VFSs, the device drivers, and the disk)

This section discusses the implementation of the upcall mechanism by which DLFF interacts with DLFM. This mechanism can be broken down into the following steps:

1. The DLFM Upcall daemon sleeps waiting for an upcall from DLFF.
2. The vnode operation for which an upcall needs to be made is invoked in the context of the thread that attempts to perform a rename, delete, or enable write permissions. Let us call this thread the “operation thread”.
3. DLFF makes an upcall by filling in a query buffer and waking up the DLFM Upcall daemon, and puts the operation thread to sleep waiting for the results of the upcall.
4. This results in a context switch to the DLFM Upcall daemon, which reads the query details and then queries the DB2 Logging Manager to service the upcall.
5. Once the results are available, the DLFM Upcall daemon sends them to the DLFF driver via a reply buffer.
6. At this stage, there is a context switch from the DLFM Upcall daemon back to the operation thread. The DLFM Upcall daemon is put to sleep waiting for the next upcall, while the operation thread wakes up, reads the results of the upcall from the reply buffer, and decides whether to allow the operation to proceed.
7. If another upcall needs to be made while the DLFM Upcall daemon is processing an upcall (for example, if a second rename request comes in before the results of an upcall made for an earlier rename are available), then the second upcall has to be queued until the DLFM Upcall daemon is ready to service it.
8. When the results of an upcall are available, there may be multiple operation threads that are waiting for the results of their requests. It is ensured that the correct thread is woken up and is passed the results of the upcall that it had requested.
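The steps above can be modeled in user space with a queue and one event per request. The following Python sketch is only an analogy: the real mechanism crosses the kernel/user boundary through a streams driver, whereas here ordinary threads stand in for the operation threads and the daemon, and linked_files is a hypothetical stand-in for the DLFM meta data.

```python
import queue
import threading

requests = queue.Queue()        # upcalls wait here while the daemon is busy (step 7)
linked_files = {"/data/file1"}  # hypothetical stand-in for the DLFM meta data


def upcall_daemon():
    while True:
        item = requests.get()                    # sleep until an upcall arrives (step 1)
        if item is None:                         # shutdown marker used only by this sketch
            break
        path, reply = item
        reply["linked"] = path in linked_files   # service the query (step 4)
        reply["done"].set()                      # wake only the requesting thread (steps 5, 8)


def make_upcall(path):
    """Runs in the context of the operation thread (steps 2, 3, and 6)."""
    reply = {"done": threading.Event()}          # private reply buffer for this request
    requests.put((path, reply))
    reply["done"].wait()                         # operation thread sleeps until answered
    return reply["linked"]


results = {}


def operation_thread(path):
    results[path] = make_upcall(path)


daemon = threading.Thread(target=upcall_daemon)
daemon.start()
workers = [threading.Thread(target=operation_thread, args=(p,))
           for p in ("/data/file1", "/data/file2")]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
requests.put(None)
daemon.join()
```

Giving each request its own reply buffer and event is one simple way to guarantee that a result reaches the thread that asked for it, which is the property that step 8 calls for.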

2.4.3 Linking and unlinking files

Linking a file and unlinking it from database control are the two most frequent operations; they correspond to SQL INSERT and DELETE, respectively. Whenever an application inserts a file reference into a DATALINK column, the corresponding file on the server is linked by the DLFM. Linking involves applying certain constraints on the file so that subsequent rename and deletion of the referenced file via normal file system APIs (or commands) are prevented, which preserves referential integrity with the host database.

Furthermore, the access control mode of the DATALINK column determines the partial or full takeover of the file. In full access control (READ PERMISSION DB), the file ownership is changed to the database (that is, to the dlfm user) and the file is marked read-only. Also, an access token assigned by the host database is needed to access such a file.

Note: Superusers (root on UNIX and Administrator on Windows) can access files under full access control, without any token. They can perform any operation on these files.

All the files linked to the host database are guarded against move, delete, and rename operations by the DLFM and the DLFF. When a file is linked, the DLFM puts a new entry in the FILE table (dfm_file). This entry consists of (among other information):
- Database ID (DBID)
- Transaction ID (XN_ID)
- File name (STEMNAME)
- Recovery ID (RECOV_ID)

The Recovery ID (RECOV_ID), which is generated at the host database, consists of the database ID and a timestamp. It is guaranteed to be unique and monotonically increasing. For every link-file operation, the DLFM makes the following two checks:
1. If a link entry already exists for the same file in the DLFM meta-data table (dfm_file table), it rejects the link-file operation since the file is already in the linked state.
2. If an unlink entry exists for the same file in the DLFM table whose unlink transaction has not committed (that is, it is in the in-flight or in-doubt state), it rejects the link-file operation since the outcome of the unlink transaction is still unknown.
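The two checks can be expressed as a small Python function. This is a sketch under assumptions: the dictionary keys (name, state, unlink_committed) and the returned strings are hypothetical, and a plain list stands in for the dfm_file table.

```python
def check_link(entries, filename):
    """Apply the two link-file checks to a list of dfm_file-like entries."""
    for entry in entries:
        if entry["name"] != filename:
            continue
        if entry["state"] == "linked":
            return "reject: already linked"            # check 1
        if entry["state"] == "unlinked" and not entry["unlink_committed"]:
            return "reject: unlink outcome unknown"    # check 2
    return "ok"


entries = [
    {"name": "a.dat", "state": "linked", "unlink_committed": False},
    {"name": "b.dat", "state": "unlinked", "unlink_committed": False},
    {"name": "c.dat", "state": "unlinked", "unlink_committed": True},
]
decision_for_c = check_link(entries, "c.dat")
```

A file whose earlier unlink has committed, such as c.dat here, can be linked again, which is why several unlinked entries may exist for one file.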

Figure 2-14 shows the steps followed by the DLFM when a link-file request is received.

Figure 2-14 Link-file operation (flowchart: on a link request, if a link entry for the file already exists in dfm_file, or an unlink entry exists whose transaction is not committed, the link operation is rejected; otherwise, if the link type is READ PERMISSION DB and WRITE PERMISSION BLOCKED, the owner of the file is changed to the DLFM admin uid and the file is made read-only; if the link type is only WRITE PERMISSION BLOCKED, the write permissions are removed from the file; finally, an entry for the file is put in the dfm_file table and the backup of the file is triggered)

Now let us look at all the steps followed by the various components of DB2 Data Links when a DB2 client issues an SQL INSERT that inserts a DATALINK value. Figure 2-15 shows the control flow for the entire process.

Figure 2-15 Control flow of SQL insert statement (diagram: the DB2 client sends (1) SQL Insert and (3) SQL Commit to the DB2 server with DataLink extensions; the db2agents on that server exchange steps (2)(a) through (2)(d) and (4)(a) through (4)(c) with the DLFM daemons of the Data Links Manager, which check the file, insert and harden meta data in DLFM_DB, and take over the file through the Data Links File System Filter)

In Figure 2-15, the following actions occur:
1. The application issues an SQL INSERT involving a DATALINK column.
2. In the connect phase, the following process begins:
   a. Based on the server name in the URL, the DB2 agent on the DB2 server issues a connect request to the corresponding DLFM. If the DB2 agent is already connected to this DLFM, then no connect request is issued. The DLFM checks whether this DB2 server is allowed to connect, depending on the existence of a corresponding entry in the dfm_dbid table.
      Note: An entry in the dfm_dbid table is created when the database is registered with the DLFM using the command: dlfm add_db
   b. The prefix name and the stem name are extracted from the URL. An invariant unique identifier, the prefix-ID, is returned for the prefix name. This supports variations between file systems with respect to mount points.
   c. A sub-transaction between DB2 and the DLFM is started. The DB2 server sends the transaction ID to the DLFM.
   d. At this stage, the DB2 server instructs the DLFM to link the file.
   e. The DLFM checks that the file exists and is not a symbolic link. It also makes some other checks as shown in Figure 2-14.
   f. The DLFM inserts certain file meta data into DLFM_DB.
3. The application on the DB2 client issues an SQL COMMIT statement.
4. The DB2 server performs a two-phase commit with the DLFMs involved in this transaction. It involves the following steps:
   a. The DB2 server sends out a prepare-to-commit request for the sub-transaction (first phase of the two-phase commit).
   b. The DLFM hardens the meta data in the DLFM_DB database.
   c. If all the DLFMs involved in this operation respond YES to prepare-to-commit, the DB2 server sends out the actual commit order to all the DLFMs (second phase of the two-phase commit). Otherwise, the user transaction is aborted. At this stage, the sub-transaction is finally committed.
   d. Constraints are applied to the file to support referential integrity. This may involve changing the attributes and ownership of the file.
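The two-phase commit in step 4 can be sketched in Python from the point of view of the DB2 server. This is a toy model, not the actual protocol implementation: the Dlfm class, its methods, and the can_prepare switch are hypothetical, and for simplicity the sketch sends the abort to every DLFM when any of them votes no.

```python
class Dlfm:
    """Hypothetical stand-in for one DLFM taking part in a sub-transaction."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.pending, self.linked, self.log = [], [], []

    def link_file(self, xn_id, path):
        self.pending.append(path)          # meta data inserted, file not yet taken over

    def prepare(self, xn_id):
        self.log.append("prepare")
        return self.can_prepare            # YES means the meta data is hardened

    def commit(self, xn_id):
        self.log.append("commit")
        self.linked.extend(self.pending)   # constraints are applied in the second phase
        self.pending.clear()

    def abort(self, xn_id):
        self.log.append("abort")
        self.pending.clear()


def commit_transaction(xn_id, dlfms):
    """Return True if the transaction committed on every DLFM, False if it aborted."""
    votes = [d.prepare(xn_id) for d in dlfms]     # phase one
    decision = all(votes)
    for d in dlfms:                               # phase two
        if decision:
            d.commit(xn_id)
        else:
            d.abort(xn_id)
    return decision


first, second = Dlfm("dlfm1"), Dlfm("dlfm2")
first.link_file(1, "/data/file1")
committed = commit_transaction(1, [first, second])
```

The file only moves from pending to linked in commit, mirroring the point that constraints are applied to the file after the second phase.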

During an unlink-file operation, the DLFM marks the table entry for the file as unlinked and records the unlink transaction ID and the unlink timestamp in the entry. At any given time, the DLFM FILE table (dfm_file) can have, at most, one linked entry for a given file. There can be multiple unlinked entries for a file because many successive link and unlink operations could have taken place for the same file.

The unlinked entry is used in the coordinated backup-and-restore operation to identify the correct version of the file from the archive server, if needed. In this case, the unlinked file entry is later removed by the Garbage Collector daemon (described in “DLFM process model” on page 38) when it is no longer needed. If file recovery is not needed, the unlinked entry is deleted in the second phase of the commit processing.

Note that the entry is deleted no earlier than the second phase of the commit, because the deletion could not be undone if the transaction were aborted after the first phase.

During the link-file operation, file-entry checking and insertion must be an atomic operation. Otherwise, there is a small time window during which two DLFM agents could both check for, and not find, a linked entry for a file and then both proceed to insert a linked entry for the same file.

To enforce the atomicity of the link operation, a unique index on the file name column and a new check-flag are defined. During the link-file operation, the check-flag attribute is set to zero, and during the unlink-file operation the check-flag is set to the recovery ID provided by the host database. This unique index prevents two linked entries from being inserted but allows multiple unlinked entries for the same file.
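The effect of the unique index can be demonstrated with a small Python sketch that uses SQLite as a stand-in for the DLFM local database. Apart from dfm_file and STEMNAME, which appear in the text, everything here is an assumption made for illustration: the column names check_flag and state, the index name, the text encoding of the check-flag, and the sample recovery IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dfm_file (stemname TEXT, check_flag TEXT, state TEXT)")
# One row per (file name, check-flag) pair: at most one row can carry the flag '0'.
conn.execute("CREATE UNIQUE INDEX one_link ON dfm_file (stemname, check_flag)")


def link(name):
    """Insert a linked entry; the index rejects a second linked entry atomically."""
    try:
        conn.execute("INSERT INTO dfm_file VALUES (?, '0', 'linked')", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


def unlink(name, recov_id):
    """Mark the linked entry as unlinked and replace the flag with the recovery ID."""
    conn.execute(
        "UPDATE dfm_file SET check_flag = ?, state = 'unlinked' "
        "WHERE stemname = ? AND check_flag = '0'",
        (recov_id, name),
    )


first_link = link("/data/file1")       # succeeds
duplicate_link = link("/data/file1")   # rejected by the unique index
unlink("/data/file1", "db1-0001")      # hypothetical recovery ID
relink = link("/data/file1")           # succeeds again after the unlink
```

Because recovery IDs are unique, each unlinked entry gets a distinct check-flag value, so any number of unlinked entries can coexist with at most one linked entry.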

Figure 2-16 shows the unlink process.

Figure 2-16 Unlink process (flowchart: on an unlink request, the table entry is marked as "unlinked" and the unlink transaction ID and timestamp are updated; if ON UNLINK RESTORE applies, the original permissions of the file are restored; otherwise, the file is deleted)

During the forward progress of a transaction, DLFM manipulates the entries in the FILE table as per link/unlink file operations. If the transaction needs to be rolled back, DLFM uses the recovery mechanism provided by the local database to undo the actions of the transaction. The file server, on the other hand, does not support transactional semantics in general. Therefore, actual takeover or release of the file from the file system is done during the second phase of the two-phase commit process and is done by the Chown daemon (as described in “DLFM process model” on page 38).

DLFM also supports the unlinking of a file from one datalink column and the re-linking of the same file to another datalink column within the same transaction. This is an important customer requirement where current and old versions of the file are maintained in separate tables.

When an error occurs during regular link or unlink processing, DLFM reports the error status to the host database, which results in either statement-level (savepoint) or transaction-level rollback at the host database. If a link or unlink file request is initiated by a savepoint rollback at the host database, then any error reported by the DLFM local database results in rolling back the entire transaction at the host database. This is because DLFM treats the local database as a “black box” and it is not possible to roll back a rollback.

In addition, if a severe error, such as deadlock, occurs in the local database, the host database rolls back the full transaction. This is because the current transaction has already been rolled back in the local database. Also since DLFM does not write recovery log records for its own link and unlink file operations, it is not possible to do a database-style rollback. In the design, undoing a link (or unlink) file operation is done by sending the DLFM another link (or unlink) file request but with a special in_backout flag set to true. For a link file request with in_backout set, DLFM deletes the linked file entry that was inserted by the current transaction. For an unlink request with the flag set, the unlinked file entry is restored back to linked state.
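The in_backout mechanism can be illustrated with the following Python sketch. Only the flag name in_backout comes from the text. The list-of-dictionaries table, the field names link_xn and unlink_xn, and the function signatures are hypothetical and serve only to show how the same request type is reused to undo an earlier request.

```python
def link_file(table, name, xn_id, in_backout=False):
    if in_backout:
        # Undo: delete the linked entry that this transaction inserted.
        table[:] = [e for e in table
                    if not (e["name"] == name and e["state"] == "linked"
                            and e["link_xn"] == xn_id)]
    else:
        table.append({"name": name, "state": "linked", "link_xn": xn_id})


def unlink_file(table, name, xn_id, in_backout=False):
    for entry in table:
        if entry["name"] != name:
            continue
        if in_backout:
            # Undo: put the entry unlinked by this transaction back in the linked state.
            if entry["state"] == "unlinked" and entry.get("unlink_xn") == xn_id:
                entry["state"] = "linked"
                del entry["unlink_xn"]
        elif entry["state"] == "linked":
            entry["state"] = "unlinked"
            entry["unlink_xn"] = xn_id


table = []
link_file(table, "file1", xn_id=1)
unlink_file(table, "file1", xn_id=2)
state_after_unlink = table[0]["state"]
unlink_file(table, "file1", xn_id=2, in_backout=True)   # savepoint rollback of the unlink
state_after_backout = table[0]["state"]
```

The undo is expressed as a forward operation on the table, which matches the design point that DLFM writes no recovery log records of its own.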

2.4.4 Transaction support

When a new transaction is started by the application, the host database assigns a new transaction ID. In the case of an XA transaction, the host database also generates a local transaction ID that is different from the global XA transaction ID.

Note: The X/Open Distributed Transaction Processing (DTP) standard is an open standard for OnLine Transaction Processing (OLTP). DTP can be described as middleware that allows an application (possibly transaction oriented) to be distributed across multiple machines in a heterogeneous environment. DTP comprises the Application, the Resource Manager, and the Transaction Manager. XA is the Resource Manager to Transaction Manager protocol defined by the DTP standard. For details on this, refer to Chapter 10 in the DB2 Administration Guide, which is available online at: http://www.ibm.com/cgi-bin/db2www/data/db2/udb/winos2unix/support/v7pubs.d2w/en_main

A transaction ID is associated with a particular database, so it does not matter if different databases use the same transaction ID. The transaction ID generated at a specific database is guaranteed to be monotonically increasing, which is essential. This ID is passed to the DLFM on each API invocation. The DLFM associates the transaction ID with each operation that changes the DLFM meta data and state. The reason is that the DLFM does not have logging services of its own, but uses a local database for persistence and logging (DB2 Logging Manager). By associating the transaction ID with each operation and storing both in the database tables, the DLFM can relate the actions performed by a particular transaction. This is important because:
- The actions done by a DLFM for a particular sub-transaction may need to be undone if the host transaction aborts after the sub-transaction completes the prepare phase (that is, the first phase of the two-phase commit protocol) in the DLFM.
- The DLFM records the transaction ID as persistent information along with other information in the FILE table, and the entries associated with a transaction are identified by this ID during the commit processing.
- Certain actions on the file system have to be performed during the second phase of the two-phase commit processing of the transaction.

DLFM uses the two-phase commit protocol to enforce the transactional semantics. Four APIs are provided by the DLFM for this purpose:
- Begin Transaction
- Prepare
- Commit
- Abort

A sub-transaction starts when the host database makes a Begin Transaction API call to a DLFM.

Note: It is possible that files may be linked or unlinked to multiple DLFMs in a given host database transaction. This implies that a host DB2 transaction may involve sub-transactions on multiple DLFMs. For the sake of clarity, we restrict the discussion of the transaction management to only one DLFM.

The transaction ID (XN_ID) generated at the host database is passed along with the Begin Transaction call. All subsequent API calls that the host database makes within the same transaction to link and unlink files are tagged with the same transaction ID and are processed within the same transaction context by the DLFM. Once all operations under the transaction are done, the host database sends a Prepare request to the DLFM as part of its commit processing. Prepare request processing on the DLFM ensures that all the operations on the file server are made persistent by issuing an SQL commit to the local database.

A separate transaction table keeps the transaction ID, its state, and other related information. The entry for the current transaction is not inserted into the transaction table until the Prepare request for the transaction arrives. After the Prepare request completes successfully on all DLFMs, the host database sends a Commit request to the DLFMs. If the Prepare request fails, an Abort request is sent to the DLFMs instead.

Important: When multiple DLFMs are involved in a transaction, if one of the DLFMs fails to prepare the transaction, the host database sends an Abort request to all the remaining DLFMs, even though they may have prepared successfully.

Normally the Prepare, Commit, and Abort APIs are invoked by the host database as part of an application’s SQL COMMIT. If the transaction is a branch of a global (distributed) transaction, the Prepare request to the DLFM is invoked as part of the global prepare processing and the Commit/Abort request is invoked only when the outcome of the global transaction is known.

It is assumed that the commit transaction processing should not fail on the DLFM side if the Prepare transaction processing is successful. But that is not always true because there is a major difference between the database’s SQL commit processing and the DLFM’s commit processing (refer to Figure 2-17).

Figure 2-17 Commit processing transactions (diagram: for an SQL transaction, Update R1 and Update R2 each write a log record, Prepare forces the logs, and Commit/Abort releases the locks; for a DLFM transaction, link file1 and link file2 each become an SQL insert that writes a log record, Prepare becomes an insert and commit that forces the logs, and Commit/Abort becomes a delete/update and commit that writes log records and releases the locks)

The SQL COMMIT processing does not acquire any new locks. It, in fact, releases all the locks acquired by the present transaction. On the other hand, the DLFM uses the SQL interface to update the meta data and its state stored in its local database during commit processing. For a Commit request, for example, DLFM retrieves entries from the FILE table and deletes an entry from the TRANSACTION table. This, in turn, requires additional locks to be acquired by the DLFM. Since deadlocks are always possible when new locks are acquired, retry logic is included in the commit processing, and it keeps retrying until it succeeds. However, if a deadlock occurs among committing or aborting transactions, retry does not break the deadlock. In our case, deadlocks have been found to occur between a committing transaction and one of the DLFM daemons, but not between two or more committing or aborting transactions. This is because table entries inserted or updated by two concurrent transactions are always disjoint.

Note: This is enforced by the corresponding locking of the host database.

Therefore, this retry logic can resolve deadlocks formed in the DLFM commit/abort processing.
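The retry logic can be sketched in Python as a loop around the commit work. The Deadlock exception, the commit_with_retry function, and the optional max_attempts limit are assumptions introduced for this sketch. The text itself describes retrying until the commit succeeds, which corresponds to leaving max_attempts unset.

```python
class Deadlock(Exception):
    """Hypothetical error raised when the local database rolls back the commit work."""


def commit_with_retry(do_commit, max_attempts=None):
    """Run do_commit until it succeeds; return the number of attempts that were needed."""
    attempts = 0
    while True:
        attempts += 1
        try:
            do_commit()
            return attempts
        except Deadlock:
            # Give up only if the caller asked for a limit (a safety valve for the sketch).
            if max_attempts is not None and attempts >= max_attempts:
                raise


state = {"calls": 0}


def flaky_commit():
    # Simulates two deadlocks with a daemon before the commit goes through.
    state["calls"] += 1
    if state["calls"] < 3:
        raise Deadlock()


attempts_needed = commit_with_retry(flaky_commit)
```

Retrying is safe here because, as explained above, the conflicting party is a DLFM daemon rather than another committing transaction, so the conflict eventually clears.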

During a Prepare transaction processing, the DLFM inserts an entry into the TRANSACTION table and marks the transaction as prepared. If the DLFM fails after the transaction is prepared, that transaction remains in an in-doubt state. It is the host database’s responsibility to resolve the in-doubt transactions with the DLFM. Either the host database restart processing does it, or if the DLFM is unavailable at the restart, the host database spawns a daemon whose sole task is to poll the DLFM periodically and resolve the in-doubts when the DLFM is up. In-doubt transactions are resolved based on the outcome of the parent transactions in the host database.
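The in-doubt resolution performed by the host database can be outlined with a short Python sketch. All names are hypothetical (FakeDlfm, is_up, resolve_in_doubt, host_outcomes), and the polling loop is a simplification of the daemon described above: it waits until the DLFM is reachable and then applies the outcome of each parent transaction.

```python
import time


class FakeDlfm:
    """Hypothetical DLFM that becomes reachable after a number of failed polls."""

    def __init__(self, failed_polls):
        self.failed_polls = failed_polls
        self.resolved = {}

    def is_up(self):
        self.failed_polls -= 1
        return self.failed_polls < 0

    def commit(self, xn_id):
        self.resolved[xn_id] = "committed"

    def abort(self, xn_id):
        self.resolved[xn_id] = "aborted"


def resolve_in_doubt(dlfm, in_doubt, host_outcomes, poll_interval=0.0):
    """Poll until the DLFM is up, then resolve each prepared transaction."""
    polls = 0
    while not dlfm.is_up():
        polls += 1
        time.sleep(poll_interval)
    for xn_id in in_doubt:
        # The outcome of the parent transaction in the host database decides.
        if host_outcomes[xn_id] == "committed":
            dlfm.commit(xn_id)
        else:
            dlfm.abort(xn_id)
    return polls


dlfm = FakeDlfm(failed_polls=2)
polls_before_up = resolve_in_doubt(dlfm, [7, 8], {7: "committed", 8: "aborted"})
```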

2.5 Data Links on DCE-DFS

Section 2.4, “Data Links on UNIX and Windows” on page 33, showed you how Data Links is implemented on UNIX and Windows NT. This section discusses Data Links implementation on DCE-DFS.

A DCE-DFS setup can involve a network of multiple nodes organized into separate cells or administrative domains, presenting a uniform location transparent file system namespace to any file system client within the environment. Data Links has been implemented for a single cell environment in DCE-DFS. A typical setup contains a few file server machines and several file system client machines.

Unlike the Data Links implementation on UNIX and Windows, which consists only of a server, Data Links in a DCE-DFS environment consists of both a server and a client. Every file server (where Data Linked files are stored) has a Data Links server on it, and every DFS client node from which READ PERMISSION DB Data Linked files have to be accessed has a Data Links client on it.

Data Links in the DCE-DFS environment has the following components:
- Data Links File Manager (DLFM)
- Data Manager Application (DMAPP)
- Data Links File System Cache Manager (DLFS-CM)

DLFM and DMAPP are part of the Data Links server. DLFS-CM is the Data Links client, which is also known as the DFS Client Enabler for Data Links.

Let us now look at the role of each of these components in detail.

2.5.1 Data Links File Manager (DLFM)

As on UNIX and Windows, DLFM has two components:
- DLFM daemons: The process model of these daemons is similar to the one described in “DLFM process model” on page 38.
- DB2 Logging Manager: DB2 Logging Manager maintains all the information regarding Data Linked files in the tables of a database of its own (DLFM_DB).

There may be more than one DLFM in a cell, but there is only one DB2 Logging Manager (that is, a single DLFM_DB database in the entire cell). Therefore, if any DLFM daemons have to access information from the Logging Manager, they do so by connecting to the node that has the DLFM_DB database (refer to Figure 2-18).

Note: Complete location transparency in DCE-DFS is implemented at the cell level, in the sense, that within a given cell, a portion of the namespace could be on a fileset that could reside on any fileserver within the cell without the user needing to be aware of it. In fact, the fileset could have read-only replicas on different fileservers, and a fileset could even be migrated from one fileserver to another transparently to a user even while it is actually in use by a file system client. For this reason, meta data (that is, information about linked files maintained by the Data Links Manager) is maintained in a per-cell common database, rather than separately for each file server.

(Figure: within a single DCE cell, Node 1 hosts a DLFM server and the DB2 Logging Manager (DLFM_DB); Node 2 through Node n each host a DLFM server and a DB2 client.)

Figure 2-18 DLFMs in a single DCE cell

2.5.2 Data Manager Application (DMAPP)

The core functionality of DLFS on the DFS file server is provided by DMAPP, which runs in user mode on the file server. DMAPP intercepts the events (data manager events) that are generated for various file system calls and provides the extra access control and referential integrity features for files linked under Data Links. The design of the DMAPP component is based on the DFS Storage Management Toolkit (SMT), an implementation of the Data Management Application Programming Interface (DMAPI) for DCE-DFS. DMAPI is a standard user-level programming interface for implementing logical extensions of the operating system in support of data management applications, which need to intercept file system operations in a manner that is transparent to file system applications and users. DFS SMT also implements some extensions to DMAPI to support DFS-specific aspects, including security and fileset-level operation notifications, among others.

With this approach, DMAPP is implemented as a user-level Data Management Application that receives and can respond to events corresponding to file system operations in a transparent manner, both for accesses via the file-exporter (through the DFS path) and via the local OS (through the local mount path).

Note: The file-exporter (also known as the DFS exporter) is the interface through which applications on the DFS client access the files on the DFS server.

This application can also invoke callbacks into the file system, through certain specific DM API calls, as required for performing Data Links specific processing. In addition, there are some DM API calls that are used for determining the security context of the caller (which is required for recognizing privileged users). Upcalls to DLFM are handled through a suitable user-level IPC mechanism with DLFM, since both applications are in user mode.

DMAPP process model

The main application runs as a single thread, waiting for events. As events are received, the DM application spawns a thread to handle each of them. At boot time, DMAPP is started and initialized for the aggregates on the file server host machine. A DM aggregate is an LFS aggregate on the server machine that has been converted to the DMLFS type by the dmaggr command, which activates the DFS SMT on it. Whenever a DM aggregate on the DFS file server is exported into the DCE namespace, the mount event is generated. At this point, the events to be intercepted for the aggregate are set, so that all future operations on files under that portion of the file server namespace are redirected to the DMAPP.

The DMAPP performs some pre-processing if required (depending on the operation requested), and then passes the operation on to the underlying file system to do the rest of the job (by responding to the events appropriately). Then, if required, the DMAPP does some post-processing on the results of the operation and also does some logging, mainly for disk-crash utility purposes. In some cases, however, when the DMAPP needs to disallow the operation (to provide Data Links functionality), it may directly return an error status instead of passing the operation on to the next layer. Because the mount event is intercepted for every aggregate that is exported, and all operations on file sets in that aggregate are intercepted from then on, DB-linked files are not accessible unless DMAPP can intercept the request. The events for an aggregate are reset on receiving an unexport request. Figure 2-19 shows the DMAPP implementation and how it interacts with SMT and DLFM.

Figure 2-19 The DMAPP implementation

In Figure 2-19, the implementation process flows like this:
1. The DB2 application connects to the DB2 server database and obtains the DATALINK value of the file.
2. The application then accesses the file either through the DFS mount path or through the local mount path.
3. The DMLFS aggregate generates events corresponding to the request from the application and passes them to DFS SMT.
4. DFS SMT passes on these events to the DMAPP.
5. The DMAPP interacts with DLFM (if required) and determines whether the request has to be allowed.

2.5.3 Data Links File System Cache Manager (DLFS-CM)

To improve performance, DFS implements enhanced caching at the DFS client. It is essential that users are not able to access READ PERMISSION DB files without a valid token. If there were no Data Links component on the DFS client to intercept a user’s request, anybody could access READ PERMISSION DB files from the cache without any token. DLFS-CM is also known as the DFS Client Enabler for Data Links.

The Data Links component that is required for this is known as the Data Links Cache Manager (DLFS-CM). DLFS-CM is a kernel-level file system filter (like DLFF in Data Links implementation on UNIX and Windows) that sits on top of the DFS Client Cache Manager and filters some operations for proper Data Links functionality.

Therefore, the main responsibility of DLFS-CM is to provide support for specialized database access control through encrypted tokens embedded in the file name, for READ PERMISSION DB files. The main functionality of DMAPP is to provide referential integrity.

Figure 2-20 explains the complete architecture of Data Links on DCE-DFS.

Figure 2-20 Data Links architecture on DCE-DFS

In Figure 2-20, the architecture flows as explained here:
1. The application on the DFS client accesses a file under the Data Links enabled file set.
2. The logical file system layer passes the request to the DLFS-CM VFS.
3. DLFS-CM passes the request on to the base file system (the DFS client VFS).
4. The DFS client VFS interacts with the DFS file-exporter (the interface between the DFS client and DFS server) and sends a request to access the file.

Finally, DMAPP is informed about the request through events. DMAPP interacts with the DLFM and authenticates the request (as discussed in Figure 2-19 on page 61).


Chapter 3. Application development

This chapter discusses how application programs interact with the Data Links File Manager, and how to plan and create those applications. It begins with a discussion of what types of applications might benefit from the use of Data Links. Next it discusses how Data Links gives you the ability to apply advanced database programming concepts to an environment based on files, while at the same time allowing the use of traditional file-based APIs. It also compares the pros and cons of using large objects (LOBs) with Data Linked files and then discusses the day-to-day tasks that an application developer must deal with when using files managed by Data Links.

Any application that creates or retrieves large numbers of files is an ideal candidate to use Data Links. The content of the files can be anything: MP3 audio, video clips, images of any kind, engineering drawings, HTML files, word processing documents, etc. Data such as these are, in many cases, not stored in a database. Many existing applications store data in files for the performance benefits of direct file access without the overhead of a DBMS. Data Links provides the benefits of a database to externally stored files (security, integrity, transactional consistency, recoverability) without impacting the performance of applications that access those files.

Some examples of industries that use large numbers of files that could be managed with Data Links are:
- Aerospace and automotive engineering: Three-dimensional part geometry files
- Banking: Check images
- E-commerce: HTML files and images of products
- Bio-technology: Genetic sequences stored in files
- Music industry: Downloadable MP3 and WAV files
- Insurance: Insurance policies
- Internet services: E-mail messages, HTML files, downloadable software
- Medical industry: X-ray images

3.2 Transactional semantics for files in the application

Data Links applies many of the benefits of relational database technology to the management of external files. The concepts of access control, referential integrity, and backup and recovery can now be applied to external files as well as to data stored in a database. This helps to ensure that all of the data used by the enterprise is secure, up to date, and recoverable. Data Links provides these capabilities by storing information about the files being managed in its own DB2 database (DLFM_DB). The information kept about the files depends on the options chosen when defining the corresponding DATALINK columns.

When a file is linked, Data Links enforces referential constraints (if they have been defined) by validating that the file exists on the specified server and in the specified location. If the DATALINK column has also been defined as recoverable (RECOVERY YES option), Data Links makes a backup copy of the file and records in the DLFM_DB database the attributes of the file, such as when the file was linked, the owner of the file, the file access permissions, and so on. This makes it possible to perform a coordinated point-in-time recovery of a DB2 database and the external files linked to the database. If any of the options to apply database access controls to the file have been selected, Data Links changes the owner of the file or the file access permissions.

All of the actions performed by Data Links (making the backup copy, changing the owner and access permissions, and recording the file attributes in DLFM_DB) occur within the scope of a database transaction. If an SQL statement that links or unlinks a file fails, or the transaction is rolled back for any other reason, the changes made by Data Links and the information recorded in the DLFM_DB database are also rolled back. This ensures that the state of a DB2 database and its linked files is always consistent.

3.3 Data Links versus LOBs

This section discusses the pros and cons of using large objects (LOBs) over Data Linked files.

3.3.1 Using LOBs

DB2 provides the ability to store large character or binary strings in what is generically referred to as a large object (LOB). Character strings can be stored in a Character Large Object (CLOB), while binary strings can be stored in a Binary Large Object (BLOB). Data requiring double-byte character sets, such as documents written in Kanji, can be stored in Double-Byte Character Large Objects (DBCLOBs).

LOBs can be used to store the actual content of files inside of a DB2 database. The advantage of doing this is that the data stored in an LOB can be recoverable; it is backed up when the db2 backup database command is run. This can also be a negative, because every time the database backup is performed, all of the LOB data must be written to the backup. If the volume of data stored in LOB columns is large, the database backup can take a long time to complete, as well as require a large amount of storage media to hold the output of the backup.

To be recoverable, the LOB column must be defined as LOGGED. This means that when the LOB column is populated, modified, or deleted, its data is written to the DB2 log file. Because DB2 logging is a serialized process, meaning that only one process at a time can write data to the log file while others wait, performance can suffer, particularly when logging very large LOBs. The performance impact due to logging activity for LOB columns can be avoided by defining the LOB column as NOT LOGGED, but doing so makes the data stored in the LOB column non-recoverable.

Another advantage of using the LOB data types is that access to the data can be controlled using table access privileges, for example, SQL GRANTs. Any user who has not been granted the SELECT privilege cannot read the data stored in an LOB column. Users must also be specifically given the privilege to issue the SQL INSERT, UPDATE, or DELETE statements.

Application programs use file reference variables to transfer data between an external file and an LOB. The file reference variable does not contain the data, but points to the data with a path name and file name. Data is inserted into an LOB column using the SQL INSERT statement with a file reference variable which points to an input file containing the data. Data is copied from an LOB column into an output file using the SQL SELECT statement with a file reference variable pointing to the output file. When using a file reference variable, the entire content of the LOB or the input file is copied. If the amount of data moving between the file and the LOB column is large, performance can suffer.

3.3.2 Using Data Links

Data Links provides many of the benefits of LOBs without the drawbacks. Files that are linked to a database can be recoverable (when the DATALINK column is defined using the RECOVERY YES option). When a file is linked, the file data is not written to the DB2 log file, as with LOBs, but instead a backup copy of the file is made and stored in a separate file system or directory. The process of making the backup copy occurs asynchronously. The application program that linked the file does not have to wait for the backup copy to be created. In addition, the files are backed up only once, when they are linked. This means that the db2 backup database command does not back up the same files repeatedly, so performance is improved, and the amount of media required to store the output of the backup is reduced.

Data Links also provides the flexibility to control access to the linked files through either table access privileges, for example, SQL GRANTs, or through file system access privileges. When a DATALINK column is defined using the READ PERMISSION DB option, access to the linked files is only granted to those users with the SQL SELECT privilege on the table containing the DATALINK column. Users who do not have this privilege cannot access the linked files. The advantage of this is increased security over the linked files. The disadvantage is that any legacy applications that use native file system APIs to read files may need to be changed to access a DB2 table before being able to access those files.

If the DATALINK column was defined with the READ PERMISSION FS option, access to the linked files is not controlled by table access privileges, but is instead controlled by the access permissions of the linked files. This means that the linked files can be accessed without accessing the DB2 table containing the DATALINK column. The advantage of this is that legacy applications do not need to be changed to access linked files. The disadvantage is that the linked files may be somewhat less secure.

One other advantage of using Data Links instead of LOBs is that data does not have to be transferred between a file and the database. Therefore, performance is improved. Applications can access the data in the linked files without the overhead of database logging, or the overhead of data transfer between the client application and the server, and yet still have the benefits of recoverability and database security controls.

3.4 Application development tasks

This section discusses the day-to-day tasks that an application developer must deal with when using files managed by Data Links.

3.4.1 Application deployment considerations

Applications that use Data Links need access to both the DB2 database that stores the URLs pointing to the linked files and the file system(s) where the files reside. Because the database server, the DLFM server and the file systems managed by it, and the application code can be hosted on different machines and different operating systems, some thought must be given to how these components interact. The database administrator who installs and configures Data Links on the DLFM server is responsible for ensuring that DLFM and any DB2 database that will use it can communicate (for details, see DB2 Data Links Manager Quick Beginnings, GC09-2966). The application programmer needs only to be concerned with how the program will access the DB2 database and the linked files.

Enabling access to a DB2 database

Any program that accesses a DB2 database must do so through the DB2 Client Application Enabler (CAE) software. Whether your program runs on each end-user workstation, or on an application server, or on the DB2 database server, it always connects to the database through the DB2 CAE. There is a version of the CAE available for each of the supported platforms (including Windows, OS/2, and many of the UNIX platforms including AIX, Solaris, Linux, HP-UX, among others). Programmers need the Application Development CAE, which provides tools for building applications which access DB2 databases. End users usually only need the Run-Time CAE. For a complete description of how to install and configure the DB2 client software, see DB2 Data Links Manager Quick Beginnings, GC09-2966, for your client platform.

Enabling access to linked files

An application program must also be able to access the file system on which the linked files reside. If the program is running on a Windows client, and DLFM and the linked files are on a Windows server, access to the linked files can be accomplished by sharing the drive on the server and mapping it on the client. NFS can be used to access the files on a UNIX file server (Data Links server). The FTP and HTTP protocols can be used, independent of the platforms, to access a Data Linked file.

3.4.2 Checking whether Data Links has been enabled

Before you can use the DATALINK data type in your applications, the DB2 database must be enabled for Data Links. Typically, the database administrator who installs and configures the DLFM server also registers the databases that will use that DLFM server, and registers the DLFM server with those databases. This registration process is described in detail in DB2 Data Links Manager Quick Beginnings, GC09-2966.

You can obtain a list of all the Data Links File Managers that are registered with the database using the list datalinks managers command:

    db2 list datalinks managers for database <database_name>

Here, <database_name> is the name of the database.

The command should return output similar to this:

    There are 1 DB2 Data Links Managers for database sample
    Type = Native
    Port = 50100
    Name = UNUNBIUM.ALMADEN.IBM.COM

You can also check that the DB2 instance has been enabled for Data Links support by checking that the database manager configuration parameter DATALINKS is set to YES. On AIX, you could use this command:

    db2 get database manager configuration | grep -i datalinks

The command should return output similar to this:

    Data Links support (DATALINKS) = YES

If the DLFM server is registered with the database and the DATALINKS configuration parameter is set to YES, you are ready to begin using the DATALINK data type in your applications.

3.4.3 Choosing DATALINK options

Before you create a table using the DATALINK data type, you must decide which options to use. Section 2.2.3, “DATALINK options” on page 26, describes in detail each of the valid options. This section discusses some considerations to make before you choose which DATALINK options to use. Table 3-1 lists the DATALINK options that are discussed, with a brief description of each.

Table 3-1 DATALINK options

DATALINK option     Description
LINK CONTROL        Validates file references
INTEGRITY ALL       Prevents deletion of linked file
READ PERMISSION     Allows/disallows file to be read outside of database control
WRITE PERMISSION    Allows/disallows file to be written to outside of database control
RECOVERY            Provides ability to restore unlinked files
ON UNLINK           Deletes or restores file when it is unlinked

How much control should the database have over the linked files?

The answer to this question alone may help you decide which DATALINK options to use. At first glance, it may seem obvious that using Data Links to manage files stored outside of the database would imply a desire to have database controls over those files.

The question is to what degree do you want to impose those controls?

Data Links gives you the ability to apply the concept of referential integrity to the files being managed. Before a reference to a file can be put into the database, the file must exist. This helps you ensure that there are no invalid file references. But perhaps the application does not care if the file really exists. Data Links allows you to choose whether you want to use referential integrity for your files through the use of the LINK CONTROL option. If the NO LINK CONTROL option is used, none of the remaining options apply, and indeed, cannot be used.

Do you want to allow files that are referenced by the database to be deleted outside the scope of database control?

The DATALINK option, INTEGRITY ALL, gives you another way to enforce referential integrity by disallowing the deletion of a file that has been linked.

How much control do you need over who reads your files?

Traditional file access controls can be used. Anyone who has permission to read the file does so by using standard file system APIs without being authorized by the database. This is particularly useful if you already have applications in place that access the files that will be managed with Data Links. By using READ PERMISSION FS these existing applications do not need to be modified to use SQL queries to retrieve a file reference before accessing a file. If, however, you want to use database access controls, you can use READ PERMISSION DB. This controls access to the linked files by requiring an access token to read the file. This access token is given to the user when the DATALINK value is read from the table. Any attempt to access a file without the access token is rejected.

Figure 3-1 shows the access token portion and the file name portion of the DATALINK value returned from a SELECT statement. Here is an example of trying to access the file without using the access token:

    copy World_Domination_Plan.doc /tmp
    copy: junk3.txt: The file access permissions do not allow the specified action

The correct way to access the file is to use the access token with the file name. Note that because the access token is separated from the file name by a semicolon, the access token and file name may need to be enclosed in quotation marks so that the operating system command interpreter does not treat the semicolon as a command separator. Here is an example of using the access token in a copy command:

    copy "042E2_Ckg9sE__A.hqFsTnJm_;World_Domination_Plan.doc" /tmp
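An application that needs the token and the file name separately can take the combined name apart with ordinary string handling. The following C fragment is only a sketch: it assumes the token;filename layout of the last path component as described above, and the function name split_access_token is hypothetical, not part of any DB2 or Data Links API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a DB2 API): splits "token;filename" at the
 * first semicolon and copies both parts into caller-supplied buffers.
 * Returns 0 on success, -1 if no semicolon is present or a buffer is
 * too small. */
int split_access_token(const char *name,
                       char *token, size_t token_size,
                       char *file, size_t file_size)
{
    const char *sep = strchr(name, ';');
    size_t token_len;

    if (sep == NULL)
        return -1;                      /* no access token present */

    token_len = (size_t)(sep - name);
    if (token_len >= token_size || strlen(sep + 1) >= file_size)
        return -1;                      /* a buffer would overflow */

    memcpy(token, name, token_len);
    token[token_len] = '\0';
    strcpy(file, sep + 1);              /* length was checked above */
    return 0;
}
```

The function expects only the file name component; a full path would first have to be reduced to its last component, because the token appears between the directory and the file name.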

Access to the file is controlled by granting the SELECT privilege on the table with the DATALINK column to authorized users. Any user who does not have the SELECT privilege on the table does not have access to the linked files. The disadvantage of using READ PERMISSION DB is that existing applications that access files need to be rewritten to access the database.

(Figure: the file /projects/World_Domination_Plan.doc is linked from the My_Projects table. The statement SELECT dlurlpath(datalink_column) INTO :V1 FROM My_Projects returns V1 = "/projects/04E2_Ckg9sE__A.hqFsTnJm_;World_Domination_Plan.doc", in which the access token precedes the file name and is separated from it by a semicolon.)

Figure 3-1 DATALINK access token

The WRITE PERMISSION option and the RECOVERY option must be considered together. If you want the ability to restore the database to a point-in-time and have the linked files automatically restored to the same point-in-time, you must use the RECOVERY YES option. When using RECOVERY YES, a copy is made of each file that is linked, and characteristics of the file such as location, file size, creator, etc. are recorded in the DLFM_DB database by the DLFM.

When a point-in-time recovery is performed on the DB2 database, DB2 requests DLFM to restore the files to the state they were in at that point-in-time. To provide this service, DLFM needs to have a backup copy of each different version of the file. If a file is changed outside the scope of database control, DLFM has no way of knowing what changes have been made to a file, and therefore, cannot support point-in-time recovery. The WRITE PERMISSION BLOCKED option forces the user to make changes to a copy of the original linked file, and then update the DATALINK value to point to the copy. This causes the original file to become unlinked, and the new file to become linked, and therefore, backed up by DLFM. With this method DLFM knows about each different version of the file and can participate in point-in-time recovery. This is why the WRITE PERMISSION BLOCKED option is required if you use the RECOVERY YES option.

If you are not interested in coordinated point-in-time recovery of the database and linked files, you can always use WRITE PERMISSION FS. This gives you the freedom to modify linked files without accessing the database and without updating the DATALINK values that point to the files.
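The dependencies between options described in this section can be written down as a small check. The following C sketch covers only the two rules stated in the text (NO LINK CONTROL excludes all remaining options, and RECOVERY YES requires WRITE PERMISSION BLOCKED). The structure and function names are hypothetical, and DB2 itself may enforce further restrictions that are not modeled here.

```c
#include <stdbool.h>

/* Hypothetical in-memory view of the options chosen for a DATALINK
 * column; this is an illustration, not a DB2 data structure. */
struct datalink_options {
    bool link_control;        /* FILE LINK CONTROL (true) or NO LINK CONTROL */
    bool integrity_all;       /* INTEGRITY ALL */
    bool read_permission_db;  /* READ PERMISSION DB (true) or FS (false) */
    bool write_blocked;       /* WRITE PERMISSION BLOCKED (true) or FS */
    bool recovery_yes;        /* RECOVERY YES (true) or NO */
};

/* Returns true when the combination obeys the two rules from the text. */
bool datalink_options_valid(const struct datalink_options *o)
{
    if (!o->link_control) {
        /* NO LINK CONTROL: none of the remaining options may be used. */
        return !o->integrity_all && !o->read_permission_db &&
               !o->write_blocked && !o->recovery_yes;
    }
    /* RECOVERY YES requires WRITE PERMISSION BLOCKED. */
    if (o->recovery_yes && !o->write_blocked)
        return false;
    return true;
}
```

A check of this kind could be run by a deployment script before it generates the CREATE TABLE statement, so that an invalid combination is caught before DB2 rejects it.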

You also must decide what to do with files that become unlinked. When you delete a row containing a DATALINK value that points to a linked file (that is, the DATALINK value is not NULL and not zero-length) or you set the DATALINK value to NULL or zero length, the linked file becomes unlinked. You can choose to have the unlinked file immediately deleted from the file system or directory by using the ON UNLINK DELETE option (note that if you are using the RECOVERY YES option, the backup copy of the file is not deleted). This can be a useful cleanup mechanism.

Unneeded files do not waste any disk space, and you do not have to implement a separate cleanup process. However, you may decide that it is better to keep the file. If you are using READ PERMISSION DB, the owner of the file is changed to dlfm and its access permissions are set to read-only when the file is linked. The ON UNLINK RESTORE option resets the file owner to the original owner, as well as resetting the access permissions to their original state.

3.4.4 Changing DATALINK options

Once a table is created with a DATALINK column and the DATALINK options are chosen, the only way to modify those options is to export the data from the table, drop the table, recreate the table with the new DATALINK options, and load or import the data into the new table. There is no command that changes the DATALINK options, so choose these options with care.

3.4.5 Querying DATALINK options

When a table is created with a DATALINK column, the DB2 system catalog records the DATALINK options that were used. These options are recorded in the SYSIBM.SYSCOLPROPERTIES table in the DL_FEATURES column. These options can be viewed using the DB2 Command Line Processor with the following query:

    db2 select colname, tabschema, tabname, dl_features from sysibm.syscolproperties

    COLNAME    TABSCHEMA    TABNAME     DL_FEATURES
    -------    ---------    --------    -----------
    PICTURE    TARGET       MGR_COPY    UFAFBYR

Figure 3-2 provides the meaning of each of the values stored in the DL_FEATURES column.

DL_FEATURES column from the SYSIBM.SYSCOLPROPERTIES table, example value UFAFBYR, read from left to right:

Position    Option              Values
1           Linktype            U=URL
2           Link Control        F=FILE, N=No
3           Integrity           A=ALL, N=None
4           Read Permission     F=FS, D=DB
5           Write Permission    F=FS, B=Blocked
6           Recovery            Y=Yes, N=No
7           On Unlink           R=Restore, D=Delete, N=Not applicable

Figure 3-2 DATALINK options stored in SYSCOLPROPERTIES table
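A program that reads DL_FEATURES from the catalog can translate each character into the option it stands for. The following C sketch follows the mapping in Figure 3-2 and assumes that the seven characters are read from left to right, starting with the link type, as the example value UFAFBYR suggests. The function name dl_feature_meaning is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: maps one character of DL_FEATURES to its meaning,
 * following Figure 3-2. pos is the zero-based position in the
 * seven-character string. Returns NULL for an unrecognized position or
 * character. */
const char *dl_feature_meaning(size_t pos, char c)
{
    switch (pos) {
    case 0: return c == 'U' ? "LINKTYPE URL" : NULL;
    case 1: return c == 'F' ? "FILE LINK CONTROL" :
                   c == 'N' ? "NO LINK CONTROL" : NULL;
    case 2: return c == 'A' ? "INTEGRITY ALL" :
                   c == 'N' ? "INTEGRITY NONE" : NULL;
    case 3: return c == 'F' ? "READ PERMISSION FS" :
                   c == 'D' ? "READ PERMISSION DB" : NULL;
    case 4: return c == 'F' ? "WRITE PERMISSION FS" :
                   c == 'B' ? "WRITE PERMISSION BLOCKED" : NULL;
    case 5: return c == 'Y' ? "RECOVERY YES" :
                   c == 'N' ? "RECOVERY NO" : NULL;
    case 6: return c == 'R' ? "ON UNLINK RESTORE" :
                   c == 'D' ? "ON UNLINK DELETE" :
                   c == 'N' ? "ON UNLINK not applicable" : NULL;
    default: return NULL;
    }
}
```

Calling the function for positions 0 through 6 of a value fetched from the catalog yields one description per option; a NULL result indicates a character that the figure does not list.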

3.5 Coding considerations

One of the advantages of using Data Links to manage files is that applications built to access those files can use traditional file system APIs to read and write the files. A programmer who is familiar with building applications that access DB2 databases only needs to learn a few new functions that deal with the DATALINK data type. For a complete discussion of how to build applications using DB2, see DB2 Application Development Guide, SC09-2949, and DB2 Application Building Guide, SC09-2948.

This section discusses how to declare a host variable for the DATALINK data type, how to link a file, read a linked file, update a linked file, and unlink a file. It also discusses the scalar functions used with the DATALINK data type, and it reviews some of the common programming errors encountered when using Data Links.

3.5.1 Host variable declaration

The DB2 host languages provide no host variable support for the DATALINK data type. This means that whenever dealing with a DATALINK column, programs must treat it as if it were a VARCHAR data type. Table 3-2 shows how to define a variable to hold a DATALINK value in each of the supported DB2 host languages.

Table 3-2 Host language variable declaration for DATALINK data type

Host language    DATALINK variable declaration
C                struct { short int length; char data[254]; } DATALINK_VAR_NAME;
                 or
                 char DATALINK_VAR_NAME[255];
JAVA             String DATALINK_VAR_NAME
PERL             Host variables are not used. DATALINK values are placed into an array column. See Chapter 22, “Programming in Perl”, in DB2 Application Development Guide, SC09-2949.
COBOL            77 DATALINK_VAR_NAME PIC X(254)
FORTRAN          SQL TYPE IS VARCHAR(254) DATALINK_VAR_NAME
REXX             No host variable declaration is necessary. Host variable data type and size are determined at run time.

Note that although the length of the host variable used to hold a DATALINK value is a maximum of 254 bytes, the maximum length of the data location portion plus the comment portion is 200 bytes. The additional 54 bytes are reserved for holding the access token that is generated by DLFM when retrieving a DATALINK value from a table defined using the READ PERMISSION DB option.
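The arithmetic behind these limits (254 bytes in total, of which 54 are reserved for the token, leaving 200) can be checked in the application before a value is inserted. The following C sketch is an illustration only; the macro and function names are hypothetical, and lengths are counted in bytes with strlen.

```c
#include <stdbool.h>
#include <string.h>

#define DATALINK_HOSTVAR_MAX 254  /* host variable length given in the text */
#define DATALINK_TOKEN_MAX    54  /* bytes reserved for the access token */
#define DATALINK_VALUE_MAX   (DATALINK_HOSTVAR_MAX - DATALINK_TOKEN_MAX)

/* Hypothetical helper: returns true if the data location (URL) plus the
 * comment fit within the 200 bytes available to them. */
bool datalink_value_fits(const char *url, const char *comment)
{
    return strlen(url) + strlen(comment) <= DATALINK_VALUE_MAX;
}
```

Checking the length up front lets the application report an overlong URL or comment itself, rather than relying on the error returned by the SQL statement.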

3.5.2 Creating and linking a new file

Application programs using Data Links create files in the same way that they always have, by using the standard file APIs provided by their programming language of choice. When using the C language, the programmer might use the fopen function to create a file. For DLFM to link the file, the file must reside in a file system or directory that is managed by DLFM.

After you create the file, you build the DATALINK value in the form of a URL that points to the file. The URL consists of four parts:
- The URL scheme
- The hostname on which the file resides
- The path name of the file
- The file name itself

You need to store these values in a string variable. Here is a sample C code fragment to do this:

    char URL[255];
    strcpy(URL, "HTTP://myhost.com/projects/World_Domination_Plan.doc");
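When the four parts of the URL are held in separate variables, they can be assembled with snprintf, which also guards against overrunning the buffer. The following C sketch is one possible way to do this; the function name build_datalink_url and its parameters are hypothetical, and the directory is assumed to begin with a slash and not to end with one.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: assembles scheme://host/dir/file into buf.
 * dir is assumed to start with '/' and not to end with one.
 * Returns 0 on success, -1 if the result does not fit in buf. */
int build_datalink_url(char *buf, size_t size, const char *scheme,
                       const char *host, const char *dir, const char *file)
{
    int n = snprintf(buf, size, "%s://%s%s/%s", scheme, host, dir, file);

    /* snprintf reports the length it needed; anything >= size was cut. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Called with "HTTP", "myhost.com", "/projects", and "World_Domination_Plan.doc", the function produces the same string as the strcpy fragment above.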

The last step to link the file is to create a row in a table with a DATALINK column using the SQL INSERT statement. The string variable containing the URL must be cast into the DATALINK data type by using the DLVALUE built-in scalar function. Here is what an SQL INSERT statement might look like:

    INSERT INTO MY.DOCS VALUES (:doc_id, DLVALUE(:URL))

If the INSERT is successful, the file is now linked. Depending on the options used to define the DATALINK column, the file might be backed up by DLFM (when using RECOVERY YES), and its access permissions and ownership may be changed. Although an application program can always depend on an sqlcode of “0” as an indication that the SQL INSERT statement was successful and that the file was linked, you can also retrieve the DATALINK value from the table with an SQL SELECT statement. If the DATALINK value is returned, DLFM has validated that the URL does indeed point to a valid, linked file (assuming that the DATALINK column was defined using the FILE LINK CONTROL option).

3.5.3 Reading a linked file

When you retrieve a DATALINK value from a table, the entire URL is returned, along with some leading characters, which indicate that what follows is a URL:

URL HTTP://myhost.com/projects/World_Domination_Plan.doc

To retrieve the URL without the leading characters, you can use the DLURLCOMPLETE scalar function. Here is a sample C code fragment to do this:

char my_URL[255];
EXEC SQL SELECT DLURLCOMPLETE(DATALINK_COLUMN_NAME) INTO :my_URL
FROM MY.TABLE WHERE FILE_ID = '123';

The SELECT would return:

HTTP://myhost.com/projects/World_Domination_Plan.doc

Web-enabled applications may be able to use this URL “as is”, directing a browser to open the file using the HTTP protocol. Many times, however, the application program needs to use standard file system APIs to open and read the file. In this case, you typically need only the path name and file name to access the file. You can use another scalar function, DLURLPATH, to retrieve the path name and file name. Here is a C code example of retrieving the path name and file name and opening the file for read access:

char thefile[255];
FILE *fp;

EXEC SQL SELECT DLURLPATH(DATALINK_COLUMN_NAME) INTO :thefile
FROM MY.TABLE WHERE FILE_ID = '123';

fp = fopen(thefile, "r");

Reading files linked with READ PERMISSION DB

When you retrieve the URL or path name to a file that was linked using READ PERMISSION DB, an access token is returned along with the URL or path name. This access token must be used as part of the file name when reading the file, for example:

char my_URL[255];
EXEC SQL SELECT DLURLCOMPLETE(DATALINK_COLUMN_NAME) INTO :my_URL
FROM MY.TABLE WHERE FILE_ID = '123';

The select would return:

HTTP://myhost.com/projects/04E2_CkKGxE;World_Domination_Plan.doc

The access token “04E2_CkKGxE” is the string of characters immediately preceding the file name and is delimited from the file name by a semicolon. When reading the file, the access token, semicolon, and the file name must all be used. Some operating systems have problems with the semicolon embedded as part of the file name, unless the entire file name is quoted:

ls -al "04E2_CkKGxE;World_Domination_Plan.doc"
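An application sometimes needs the plain file name, for example to display it or write it to a log, while still opening the file under its full tokenized name. The following sketch separates the two parts. The helper name split_access_token is hypothetical, and the code assumes that the first semicolon in the last path component is the delimiter described above.

```c
#include <string.h>

/* Hypothetical helper: given a path such as
   "/projects/04E2_CkKGxE;World_Domination_Plan.doc", copies the access
   token and the plain file name into the caller's buffers.
   Returns 1 if a token was found, 0 if the name carries no token,
   -1 if either buffer is too small. */
int split_access_token(const char *path,
                       char *token, size_t token_len,
                       char *file, size_t file_len)
{
    const char *name = strrchr(path, '/');
    name = (name != NULL) ? name + 1 : path;   /* last path component */

    const char *semi = strchr(name, ';');      /* token delimiter, if any */
    if (semi == NULL) {
        if (token_len == 0 || strlen(name) >= file_len) {
            return -1;
        }
        token[0] = '\0';
        strcpy(file, name);
        return 0;
    }

    size_t tlen = (size_t)(semi - name);
    if (tlen >= token_len || strlen(semi + 1) >= file_len) {
        return -1;
    }
    memcpy(token, name, tlen);
    token[tlen] = '\0';
    strcpy(file, semi + 1);
    return 1;
}
```

The value passed to fopen must remain the complete string returned by DB2, token included. The split result is only for presentation.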

When an application program retrieves a DATALINK value containing an access token, the access token is valid for a limited amount of time. The database configuration parameter DL_EXPINT determines the length of time for which the access token is valid. Any attempt to read a file with an expired access token is rejected, and it is necessary to access the table again to obtain a valid access token.

Key points to remember:
- The SQL SELECT statement is used to retrieve the complete URL or path name to the file.
- Once the application knows the location of the file, it can read it like any other file.
- Files linked with READ PERMISSION DB require a valid access token to read them.

3.5.4 Updating a linked file

If you are using the DATALINK option WRITE PERMISSION FS, you can update a linked file without having to do anything special with the DATALINK value. Because DLFM does not support recovery of linked files when using WRITE PERMISSION FS (see 3.4.3, “Choosing DATALINK options” on page 71, for an explanation why), you are free to change the file as you like. Anyone who has write permission on the file can rename, delete, or change the file outside the scope of database control.

To update a linked file when using WRITE PERMISSION BLOCKED, you must first unlink the file by setting the DATALINK value to NULL (if the column is nullable) or a zero-length URL, or by deleting the row that points to the file. When the file is unlinked, DLFM no longer controls access to the file, and you are free to modify it. After you change the file, you can relink it by updating the NULL DATALINK value or inserting a row with a DATALINK value pointing to the file, and it will once again be controlled by DLFM. One advantage to using this method is that the name of the file being linked does not need to change.

An alternative to unlinking and relinking the file is to make a copy of the linked file, modify the copy, and update the DATALINK value to point to the new file. This method has the advantage of using fewer SQL statements to do the job, but the disadvantage is that you require more disk space, because, at least temporarily, you have both the original file and a copy taking up space on the disk. Another disadvantage of this method is that the name of the linked file may need to change, and the application program would need to contain logic to generate a new file name.

3.5.5 Unlinking a file

A file can be unlinked by either deleting the row whose DATALINK column points to the file, or by updating the row and setting the DATALINK value to NULL or to point to another file.

When the file is unlinked, the action taken depends on the ON UNLINK option used to define the DATALINK column. The ON UNLINK DELETE option causes the file to be physically removed from the file system. This option must be used with care. Make sure you really want the file to be deleted. The ON UNLINK RESTORE option causes the file to be restored to the original ownership and access permissions that were in place when the file was linked. Remember that when using READ PERMISSION DB, DLFM changes the owner of the file to dlfm and the access permissions to read-only.

3.5.6 Scalar functions used with the DATALINK data type

Because there is currently no host language support for the DATALINK data type, application programs must use the built-in scalar functions to build a DATALINK value and to extract the various components of a DATALINK value. These functions are described in 2.2.2, “Scalar functions for DATALINK data type” on page 24.

Scalar functions with DB2 Call Level Interface

DB2 Call Level Interface (CLI) provides two scalar functions for the creation and retrieval of DATALINK values:
- SQLBuildDataLink: This function is used to create a DATALINK value and is the CLI equivalent of the DLVALUE function described in 2.2.2, “Scalar functions for DATALINK data type” on page 24.
- SQLGetDataLinkAttr: This function is used to extract the various components which make up a DATALINK value, and is the functional equivalent of the remaining DATALINK scalar functions described in Chapter 2, “Technical architecture” on page 11.

For a complete description of the syntax, function arguments and usage of these functions, refer to DB2 UDB Call Level Interface Guide and Reference, SC09-2950. For a complete sample program using CLI that connects to a database, creates a table with a DATALINK column, inserts a row into the table, and then fetches the row, see Appendix B, “CLI Example,” in DB2 Data Links Manager Quick Beginnings, GC09-2966.

Scalar functions with JDBC

The Sun JDBC 3.0 specification provides a DATALINK type code, which is defined in the java.sql.Types class, as well as methods for storing and retrieving references to externally stored data (for example, URLs pointing to linked files). At the time this redbook was written, the implementation of these methods (setURL and getURL) was not yet finalized. Application programs using JDBC should treat a DATALINK value as a String and use the getString and setString methods. Note that Java applications can still use embedded SQL, which uses the scalar functions described in 2.2.2, “Scalar functions for DATALINK data type” on page 24.

3.5.7 Error handling

When application programmers start to use a new data type, they usually find a whole new set of error conditions to deal with. The DATALINK data type is no exception to this rule. It is important to understand what can go wrong so that your application program can handle the condition gracefully. This section briefly discusses the two most common errors that programmers are likely to encounter and suggests ways of dealing with the errors.

Tip: DB2 returns error codes to applications through the sqlcode portion of the SQL communications area (SQLCA). The sqlcode is of the form -nnnn or +nnnn, where nnnn is a 4-digit number. A text description of the error can be obtained by using the DB2 command line processor “?” command. When using the “?” command, the four-digit error number should be prefixed with the characters SQL and followed by the character N (for error messages; the sqlcode is negative), W (for warnings; the sqlcode is positive), or I (for informational messages; the sqlcode is positive). For example, if the application program receives an error code of -0358, enter:

db2 ? SQL0358N

For an error code of +0100, enter:

db2 ? SQL0100W
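The naming rule in the tip can be applied in code when an application wants to log the message identifier alongside the raw sqlcode. The following is a sketch only: the helper name sqlcode_to_message_id is hypothetical, it handles only the four-digit codes the tip describes, and it always uses the suffix W for positive codes because the sqlcode alone does not show whether a positive code is a warning or an informational message.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats an sqlcode as the identifier expected by
   the "db2 ?" command, for example -358 becomes "SQL0358N".
   Negative codes get the suffix N. Positive codes get W (an assumption,
   since I is also possible for positive codes).
   Returns 0 on success, -1 if sqlcode is 0, has more than four digits,
   or does not fit in out. */
int sqlcode_to_message_id(long sqlcode, char *out, size_t outlen)
{
    long n = labs(sqlcode);
    if (sqlcode == 0 || n > 9999) {
        return -1;
    }
    int w = snprintf(out, outlen, "SQL%04ld%c", n, sqlcode < 0 ? 'N' : 'W');
    return (w < 0 || (size_t)w >= outlen) ? -1 : 0;
}
```

An error handler could print the resulting identifier so that whoever reads the log can pass it directly to the “?” command.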

SQL0358N - Unable to access file...

This is probably the most common error that you will encounter as you are developing a program to use Data Links. This error message is accompanied by a reason code (use the db2 ? SQL0358N command from the DB2 command line processor for a complete list of reason codes). The most common causes of this error are:
- Attempt to link a file that does not exist
- Format of the URL is bad (for example, does not begin with HTTP, etc.)
- File is already linked
- A linked file has been deleted

The cause of this error can usually be determined by looking closely at the DATALINK value and verifying that all of its components are valid:
- Is the protocol specified correct (HTTP:// or UNC:\)?
- Is the server name specified a valid Data Links server that is registered with the database?
- Is the path name to the file correct?
- Is the file name correct?

SQL0357N - Data Links Manager Not Available

This error occurs when attempting to access a DATALINK value in a table that has been defined with the INTEGRITY ALL option, and the Data Links server is down. When using INTEGRITY ALL, DB2 contacts DLFM to check the validity of the URL before returning the URL to the application program. If DB2 is unable to contact the File Manager for any reason, an sqlcode of -357 is returned.

The obvious way to correct this error is to start the File Manager. If this is the first time that a Data Links server is being used, check that it is correctly registered with the database using the command:

db2 list datalinks managers for database

3.6 Using multiple file servers

A single DB2 database can work with multiple DLFM file servers. Perhaps you want the database to control files stored on Windows NT and on AIX (see Figure 3-3). You can accomplish this by registering both file servers with the DB2 database, and by registering the DB2 database with each of the file servers. Note that the Data Links file servers can reside on any combination of the supported platforms. There are, however, a few restrictions.

First, if you register a Data Links file server that resides in a DFS cell, it can be the only file server registered with the database.

Next, the maximum number of file servers you can register with a database is 16.

Each of the DLFMs registered with a database must be on the same version and release level as the database. That means that when you apply fixpaks to the database or upgrade to a new version of DB2, you must also update all of the DLFM servers.

[Figure: DATALINK values in a table in a DB2 UDB database point to files on two file servers, one on AIX and one on Windows NT.]

Figure 3-3 Using multiple DLFM file servers

3.6.1 Supporting multiple links to the same file

If you attempt to insert a row with a DATALINK value pointing to a file that is already linked, you receive an error (SQL0358N - reason code 25). DLFM allows you to link a file once. All subsequent attempts to link the file will fail, unless the file is first unlinked.

What if the DB2 database has two DLFMs registered? Couldn’t you link the file to both of them?

The answer is no, you cannot. The reason for this is that any given file system can only be managed by one DLFM. The file system must reside on the same server on which DLFM is running, and there can be only one instance of DLFM running on any given server. Because of this, you would be unable to register a file system with more than one DLFM, and therefore unable to link a file to more than one file manager.

3.7 Migrating existing applications to use Data Links

Existing application programs that access files or LOBs can easily be migrated to use files that are managed by Data Links. Perhaps the biggest hurdle will be for programmers working with files who have never worked with DB2 or another relational database management system. Programmers unfamiliar with SQL should begin by reading DB2 UDB SQL Getting Started, SC09-2973. Those who are familiar with SQL but have never built applications using DB2 should read DB2 Application Development Guide, SC09-2949, and DB2 Application Building Guide, SC09-2948. These and other DB2 related publications are available online in HTML or PDF format, free of charge, at:

http://www.ibm.com/software/data/db2/library

3.7.1 Migrating an application that uses files

An application program that uses files usually has code that deals with the creation, modification, and deletion of those files. The program might prompt the user for information about which file name to open or create, or it might store or retrieve information about the file name and its attributes in a database. Using Data Links to manage files requires some initial preparation of the file server, perhaps some software installation and configuration on the client workstations, as well as changes in the application programs that access the files.

Changes on the file server

The first thing that must be done to implement Data Links control of the files is to install the Data Links software on the file server. DB2 Data Links Manager Quick Beginnings, GC09-2966, describes how to install and configure Data Links.

Next you must prepare the file system or drive where your files reside to be managed by Data Links. If the files are on a UNIX platform, you must convert the file system to a DLFS file system. If the files are on a Windows NT platform, they must be put on to an NTFS formatted drive that was specified as a Data Links managed drive during installation of Data Links on the Windows NT server. DB2 Data Links Manager Quick Beginnings, GC09-2966, provides instructions for doing this.

Changes on the client workstation

Any client workstation running applications that access DB2 databases must have the DB2 Client Application Enabler (CAE) installed. The CAE resolves the location and communication protocol(s) of a DB2 database when an application program issues a CONNECT command to connect to a database. Databases must be “cataloged” on the client to be accessible to the application. The Quick Beginnings manuals for UNIX (GC09-2970), Windows (GC09-2971), OS/2 (GC09-2968), and Linux (GC09-2972) each contain a section describing how to install and configure the DB2 CAE software. The CAE software for all supported clients is free and can be downloaded from:

http://www.software.ibm.com/data/db2/udb

Changes to the application program

The biggest change to any application program migrating from file access to using Data Links is that the program must interact with DB2. This means connecting to a DB2 database and using SQL to store and retrieve data from tables in the database. Any program that accesses files managed by Data Links must store and retrieve file references (URLs) in a DB2 table with a DATALINK column.

For a file to be linked (managed by Data Links), the program must perform an SQL INSERT operation with a DATALINK value that points to the file. When an application program wants to open a linked file, it does so by performing an SQL SELECT statement on a table with a DATALINK column and retrieving the path name and file name. Once the path name and file name are known, the program can use the standard file system APIs to read it. Depending on the DATALINK options chosen, the file may need to be unlinked before it can be changed. Once a file is unlinked, the standard file system APIs can be used to modify the file.

Section 3.5, “Coding considerations” on page 75, describes how files are linked and unlinked, how to read linked files, how to modify linked files, and how to use the DB2 built-in scalar functions to create and manipulate DATALINK values.

3.7.2 Migrating an application that uses LOBs

Applications that use LOBs have many of the same considerations when migrating to use Data Links as do applications that use files. The Data Links software needs to be installed, and a file system or shared drive must be configured to hold the files. The DB2 database containing the LOB data must be registered with DLFM, and the Data Links File Server that will manage the files must be registered with the DB2 database. If an application using LOBs is running on client workstations, the DB2 Client Application Enabler must already be installed and configured on those workstations.

Three additional things must occur for a program to migrate from using LOBs to use files managed by Data Links. First, you must externalize the LOB data, that is, put the data which is stored in an LOB column into files. Second, you must create a new table with a DATALINK column instead of an LOB column. You can then populate the new table with data from the existing table and establish links to the files. Third, the programs must be changed to use the new DATALINK column.

Externalizing LOB data using the Export utility

There are a number of ways to put data contained in an LOB column into files. Perhaps the easiest way is to use the DB2 Export utility. The Export utility writes LOB data to a user-defined path, and names the files with a user-defined base file name followed by a sequence number. Here is an example of using the Export utility to put LOB column data into files:

db2 export to my_table_data.del of del lobpath /datalinks/photos lobfile mylob1,mylob2 modified by lobsinfile select * from my.photo_table

In this example, all of the non-LOB data from the table my.photo_table is written to the delimited ASCII file named my_table_data.del, while the data from the LOB column is written to files named mylob1.001, mylob1.002, and so on, up to mylob1.999. If there are more than 999 LOBs, the Export utility uses the next base file name specified, mylob2, and generates file names of mylob2.001, mylob2.002, and so on. The files containing the externalized LOB data are written to the directory /datalinks/photos. You must be sure to specify enough base file names in the EXPORT command to handle the number of non-NULL LOB values.
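The naming scheme just described makes it possible to work out in advance how many base file names an export needs and which file a given LOB will land in. The following sketch does that arithmetic under the rule stated above (999 sequence numbers per base file name). The function names are hypothetical and the code does not call the Export utility.

```c
#include <stdio.h>

#define LOBS_PER_BASE_NAME 999   /* sequence numbers .001 to .999, per the text */

/* Number of base file names needed for lob_count non-NULL LOB values. */
long base_names_needed(long lob_count)
{
    if (lob_count <= 0) {
        return 0;
    }
    return (lob_count + LOBS_PER_BASE_NAME - 1) / LOBS_PER_BASE_NAME;
}

/* Hypothetical helper: file name for the n-th exported LOB (n starts at 1),
   given the base names in the order listed on the EXPORT command.
   Returns 0 on success, -1 if n is invalid, there are too few base names,
   or out is too small. */
int export_file_name(long n, const char *const *bases, long base_count,
                     char *out, size_t outlen)
{
    if (n < 1) {
        return -1;
    }
    long base_index = (n - 1) / LOBS_PER_BASE_NAME;
    long seq = (n - 1) % LOBS_PER_BASE_NAME + 1;
    if (base_index >= base_count) {
        return -1;   /* not enough base names were specified */
    }
    int w = snprintf(out, outlen, "%s.%03ld", bases[base_index], seq);
    return (w < 0 || (size_t)w >= outlen) ? -1 : 0;
}
```

With the two base names from the example, the 1000th LOB maps to mylob2.001, and any LOB beyond the 1998th has no file name available, which is the situation the last sentence above warns about.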

Linking the files

After exporting the data from a table with an LOB column and placing the LOB data in files, you are ready to build a new table with a DATALINK column in place of the LOB column. The easiest way to build the new table is to use the db2look utility to extract the table definition from the DB2 catalog, save it in a file, and modify the file by replacing the LOB column definition with a DATALINK column definition. Here is an example of using the db2look utility:

db2look -d SAMPLE -u MY -t PHOTO_TABLE -e -o table.def

This example connects to the SAMPLE database and extracts the DDL for the table MY.PHOTO_TABLE and places it in a file named table.def (see DB2 UDB Command Reference, SC09-2951, for a complete description of the db2look utility). The table.def file may appear as shown in Example 3-1.

Example 3-1 The table.def file

CREATE TABLE "MY "."PHOTO_TABLE" (
"LOBNO" CHAR(6) NOT NULL ,
"PHOTO_FORMAT" VARCHAR(10) NOT NULL ,
"PICTURE" BLOB(102400) LOGGED NOT COMPACT)
IN "USERSPACE1";

You need to change the definition of the column named PICTURE from BLOB to DATALINK. The modified table.def file may appear as shown in Example 3-2.

Example 3-2 Modified table.def file

CREATE TABLE "MY "."PHOTO_TABLE" (
"LOBNO" CHAR(6) NOT NULL ,
"PHOTO_FORMAT" VARCHAR(10) NOT NULL ,
"PICTURE" DATALINK LINKTYPE URL FILE LINK CONTROL INTEGRITY ALL READ PERMISSION FS WRITE PERMISSION BLOCKED RECOVERY YES ON UNLINK RESTORE)
IN "USERSPACE1" ;

Next you drop the old MY.PHOTO_TABLE and create a new table with the modified DDL. You are almost ready to populate the new table. But first you need to change the delimited ASCII file created by the Export utility. You need to have a full URL pointing to your files rather than just the file name. You need to change the file from this:

"000130","bitmap","mylob.001"
"000130","gif","mylob.002"
"000130","xwd","mylob.003"
"000140","bitmap","mylob.004"

to this:

"000130","bitmap","HTTP://MY.DLFM.SERVER.COM/datalinks/photos/mylob.001"
"000130","gif","HTTP://MY.DLFM.SERVER.COM/datalinks/photos/mylob.002"
"000130","xwd","HTTP://MY.DLFM.SERVER.COM/datalinks/photos/mylob.003"
"000140","bitmap","HTTP://MY.DLFM.SERVER.COM/datalinks/photos/mylob.004"

Remember that the value you supply for insertion into the DATALINK column must be in the form of a URL that points to the file.
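For more than a handful of rows, the change to the delimited file is easier to make with a small program than by hand. The following sketch rewrites one row at a time. The helper name prefix_last_field is hypothetical, and the code assumes that the file reference is the last field in the row, that it is enclosed in double quotation marks, and that it contains no embedded quotation marks.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrites the last double-quoted field of a
   delimited row so that url_prefix precedes the file name.
   Returns 0 on success, -1 on malformed input or if out is too small. */
int prefix_last_field(const char *row, const char *url_prefix,
                      char *out, size_t outlen)
{
    const char *close = strrchr(row, '"');   /* closing quote of last field */
    if (close == NULL || close == row) {
        return -1;
    }

    const char *open = close - 1;            /* scan back to the opening quote */
    while (open > row && *open != '"') {
        open--;
    }
    if (*open != '"') {
        return -1;
    }

    /* Copy the row up to and including the opening quote, then the prefix,
       then the original field content and everything after it. */
    int head = (int)(open - row) + 1;
    int w = snprintf(out, outlen, "%.*s%s%s", head, row, url_prefix, open + 1);
    return (w < 0 || (size_t)w >= outlen) ? -1 : 0;
}
```

A driver program would read my_table_data.del line by line with fgets, call prefix_last_field with the prefix HTTP://MY.DLFM.SERVER.COM/datalinks/photos/, and write each result to a new file for the Import or Load utility.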

You are now ready to populate the new, improved table with the data you exported from the original table by using either the DB2 Import utility or the Load utility:

db2 import from my_table_data.del of del insert into my.photo_table

or

db2 load from my_table_data.del of del insert into my.photo_table

For a complete discussion of using the import and load utilities, see 8.8, “Running the Import utility” on page 157, and 8.9, “Running the Load utility” on page 158.

If the volume of data in the LOB table is too large, you may need to run the Export utility multiple times, each time exporting a subset of the table. For example, if the table contains a million LOBs, Export would require more than 1000 base file names. An EXPORT command with 1000 base file names may be too large for DB2 to process. Instead, you may split the job into multiple EXPORT commands and supply a WHERE clause on the SELECT statements to limit the number of files created:

db2 export to my_table_data1.del of del lobpath /datalinks/photos lobfile mylob1,mylob2 modified by lobsinfile select * from my.photo_table where LOBNO < '500000'

and

db2 export to my_table_data2.del of del lobpath /datalinks/photos lobfile mylob3,mylob4 modified by lobsinfile select * from my.photo_table where LOBNO >= '500000'

Alternative to using EXPORT

If you want more control over the file names that are created than the Export utility gives you, you need to write code to extract the data stored in an LOB column. You can do this by using a file reference variable. The file reference variable represents a file, but does not contain the file data. A file reference variable can be used in an SQL SELECT statement to read data from an LOB column and place it in a file. Before invoking the SELECT statement, you need to set the attributes of the file reference variable. These include:
- The file name
- The file options (read, create, overwrite, etc.)

You create a file by placing a file name into the file reference variable, setting the file options to SQL_CREATE_FILE, and then executing a SELECT statement with the file reference variable. You need to do this for each row with a non-NULL LOB column. You can find examples of how to use file reference variables in DB2 UDB SQL Reference, SC09-2974.

There are two problems you must address. The first is how to name the files you are creating. The second problem is how to associate a row in the table with the LOB column to the external file.

Figure 3-4 shows an example of LOB data from the MANAGERS table being written to external files. In this example, the file names contain the key values from the MANAGERS table, ID. If you name your files with data from the primary key of the original table, you have a way to associate a row in the table to the external file. Why this is important will soon become apparent.

MANAGERS Table

ID    FNAME   LNAME   LOB_PHOTO   External file
B01   Bob     Smith   ......      file.B01
C72   Sue     Sims    ......      file.C72
RT3   Jim     James   ......      file.RT3
5G4   Annie   Aston   ......      file.5G4

Figure 3-4 Externalizing LOB data

Creating a table with DATALINK column

Next, you create a table with similar attributes as your table containing LOBs, except you will use a DATALINK column instead of an LOB column. For example, you might create a table named MANAGER_PHOTOS with the ID, FNAME, and LNAME columns, and a DATALINK column named DL_PHOTO.

The last step is to populate the new table by reading rows from the LOB table and inserting the data into the new table and supplying a DATALINK value that points to the appropriate file. This is illustrated in Figure 3-5.

MANAGERS Table

ID    FNAME   LNAME   LOB_PHOTO
B01   Bob     Smith   ......
C72   Sue     Sims    ......
RT3   Jim     James   ......
5G4   Annie   Aston   ......

1. SELECT (from the MANAGERS table)
2. INSERT (into the MANAGER_PHOTOS table)

MANAGER_PHOTOS Table

ID    FNAME   LNAME   DL_PHOTO
B01   Bob     Smith   ......file.B01
C72   Sue     Sims    ......file.C72
RT3   Jim     James   ......file.RT3
5G4   Annie   Aston   ......file.5G4

Figure 3-5 Moving LOB table data to DATALINK table

After you migrate your LOB data to files, create a new table structure using a DATALINK column, and link the files, you need to change all of the programs that accessed the LOB columns to now use the external files. Note that if the application programs run on client workstations, the file system or directory containing the linked files must be accessible to the client. See 3.4.1, “Application deployment considerations” on page 69, for a discussion of how to make the linked files accessible to clients.

Chapter 4. Planning Data Links deployment

This chapter discusses a number of different deployment options and the reasons why you may use them. It describes the most important file systems for the Data Links setup and the issues to consider when locating and sizing them.

Important: The host chosen to be the Data Links File Manager must have a hostname that will not be changed in the future, because the hostname is stored throughout the DLFM system. The hostname must not contain the underscore character “_”, because it causes problems for DLFM.

Before using DB2 Data Links Manager, you need to consider these items:
- Data Links Manager can be installed on systems with:
  – DB2 UDB EE (AIX)
  – DB2 PE, WE, EE (Windows NT)
  – DB2 EE V7 (Solaris)
- Data Links Manager cannot be used with DB2 Enterprise Extended Edition.
- DATALINK columns cannot be part of a unique index, primary key, or foreign key.
- A DB2 UDB server with a table containing the DATALINK data type can connect to a DB2 Data Links Manager on Windows NT, AIX, or Solaris.

There are a number of different ways that the Data Links File Manager can be deployed. The single server implementation is the easiest to start with. The most complex is multiple DB2 servers connecting to multiple Data Links File Managers. There are many other ways to deploy and use Data Links. Some of the most common are discussed in the following sections.

4.1.1 Single server implementation

The single server implementation is the easiest to install and maintain (Figure 4-1). It consists of DB2 Universal Database and Data Links Manager installed on a single host machine. The single server implementation is commonly used on test and development systems and when only a single host machine is available.

[Figure: a DB2 client, a DB2 server, and the Data Links File Manager (DLFM).]

Figure 4-1 Single server implementation

4.1.2 Single Universal Database and multiple DLFMs

A single Universal Database with one or more DLFMs on separate hosts seems to be the most common implementation for production systems (Figure 4-2). This implementation option can provide better performance by locating the Data Linked files geographically closer to the client. The Data Linked files are usually larger in size and therefore benefit from the closer network location, because the data moves over a local area network rather than a wide area network. The UDB database can be on Windows NT, Solaris, or AIX and the DLFM can also be on Windows NT, Solaris, or AIX. This implementation also can provide increased availability.

[Figure: a DB2 UDB server in Seattle holds a table with a DATALINK column. Client applications in San Jose and in Seattle each reach file data on a Data Links File Manager (DLFM) at their own site through a shared directory or NFS mount.]

Figure 4-2 Single UDB and one to many DLFMs

4.1.3 Multiple Universal Databases and single DLFM

The multiple Universal Databases and single DLFM option may be desirable when a development and a test database are on the same host and there is no other hardware available to install another DLFM. This implementation is not recommended, mainly because it greatly complicates recovery. Figure 4-3 shows the layout.

[Figure: DB2 UDB servers on host-2 and host-3, each with a table with a DATALINK column, use file data on a single Data Links File Manager on host-1. A DB2 client reaches the file data through a shared directory or NFS mount.]

Figure 4-3 Multiple UDBs and a single DLFM

4.1.4 Multiple DLFMs on a single host

The Data Links File System Filter (DLFF) has been designed so that only one Data Links File Manager is allowed per host. Multiple DLFMs on a single host are not supported today. For most applications, this should not be a problem because there are so many other deployment solutions.

4.1.5 Multiple DB2s and multiple DLFMs

The most complex way to implement Data Links is by creating an environment where multiple DB2 databases connect to multiple DLFM hosts. Figure 4-4 illustrates this concept.

[Figure: DB2 UDB servers on host-3, host-4, and host-5, each with a table with a DATALINK column, connect to file data on Data Links File Managers on host-1 and host-2.]

Figure 4-4 Multiple DB2 and multiple DLFMs

4.2 File systems and sizing

There are at least two very important planning items discussed in this section that pertain to file systems and sizing. The items deal with the DLFM backup directory and planning your file systems and directories where the Data Linked files will reside.

4.2.1 The DLFM backup (archive directory)

For a description of the DLFM_BACKUP_DIR_NAME, refer to Chapter 4, “Choosing a backup method”, in DB2 Data Links Manager Quick Beginnings, GC09-2966.

Note: When the default of disk copy is selected, the sizing of this directory can be important. If an initial load or import of data is being done on a table with a DATALINK column that is defined with RECOVERY YES, the backup file system or directory must have at least the same amount of space as all of the files to be linked. The space is required because a copy of each file that is inserted is placed in the DLFM_BACKUP_DIR_NAME directory.

Table 4-1 Parameters that can affect the size of the archive directory

DB CFG parameter    Description
RECOVERY YES        When specified for a DATALINK column type, allows DB2 to support point in time recovery of Data Linked files. A copy of the file is placed on the archive server for recovery.
DL_NUM_COPIES       The number of additional copies to be made on the archive server when a file is linked (0 to 15).
NUM_DB_BACKUPS      Number of most recent database backups to retain. This triggers garbage collection, which can delete old files from the archive server.
REC_HIS_RETENTN     Number of days historical information on backups is retained. Can also influence when garbage collection is triggered.

The DB2 database configuration parameter DL_NUM_COPIES can also affect the size of the backup file system. If the default of “0” is chosen, it has no bearing, but if a number from 1 to 15 is chosen the size required for the DLFM backup directory will increase proportionately.
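As a rough planning aid, the sizing guidance above can be sketched as shell arithmetic. The function name archive_kb_needed is hypothetical, and the formula (linked size multiplied by one plus DL_NUM_COPIES) is an assumption for illustration: the text says only that the required space grows proportionately, so treat the result as an estimate rather than a product-defined rule.

```sh
# Hypothetical helper: estimate the space (in KB) needed in the DLFM backup
# directory for an initial load with RECOVERY YES and disk copy.
# Assumption: one base copy per linked file plus DL_NUM_COPIES extra copies.
archive_kb_needed() {
    linked_kb=$1      # total size of the files to be linked, in KB
    num_copies=$2     # value of DL_NUM_COPIES (0 to 15)
    case $num_copies in
        ''|*[!0-9]*)
            echo "DL_NUM_COPIES must be an integer" >&2
            return 1 ;;
    esac
    if [ "$num_copies" -gt 15 ]; then
        echo "DL_NUM_COPIES must be between 0 and 15" >&2
        return 1
    fi
    echo $(( linked_kb * (1 + num_copies) ))
}
```

The first argument could come from a command such as du -sk run against the directory that holds the files to be linked. Compare the result with the free space reported by df for the backup file system before starting the load.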

Another database configuration parameter that can have an impact on the size of the DLFM backup directory is NUM_DB_BACKUPS. This parameter specifies the number of database backups to retain for a database. For more information, refer to Chapter 11, “Recovery” on page 201. When the specified number is reached, any corresponding file backups linked through a DB2 Data Links Manager are removed from the archive server or backup directory.

The database DLFM_DB, Data Links file system files, DLFM backup directory and the dlfm home directory should be placed on different file systems that do not share disks. The backup directory can also be the Tivoli Storage Manager archive server or XBSA (Net.Backup). For additional details on how to configure the archive server on NT, Solaris, and AIX, refer to Release Notes Version 7.2/Version 7.1 FixPack 3.

The DLFM backup directory can contain: Images of the DLFM_DB database Copies of linked files All updates when RECOVERY YES is specified for the DATALINK column and disk copy is the backup method

4.2.2 Data Links controlled file systems

You must carefully consider the number of files that will be placed in each directory.

Tip: We recommend that, on AIX, no more than 2000 to 3000 files be placed in a single directory. This helps the performance of the file system especially when inserting files.

The application developers need to take this into consideration when designing the data manipulation processes for the Data Linked files. This can also be a consideration when initially populating Data Linked columns.
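One way to check an existing directory tree against this guideline is sketched below. The function names are hypothetical, the default limit of 3000 simply reflects the tip above, and the script assumes file names without embedded newlines.

```sh
# Count the regular files directly inside one directory (not in subdirectories).
count_files_in_dir() {
    # POSIX idiom for "this directory only": prune everything below "$1/."
    n=$(find "$1/." ! -name . -prune -type f | wc -l)
    echo $(( n ))     # arithmetic strips the padding that some wc versions add
}

# Print "directory count" for every directory under a root that holds more
# regular files than the limit (default 3000, taken from the tip above).
report_crowded_dirs() {
    root=$1
    limit=${2:-3000}
    find "$root" -type d | while IFS= read -r dir; do
        count=$(count_files_in_dir "$dir")
        if [ "$count" -gt "$limit" ]; then
            echo "$dir $count"
        fi
    done
}
```

Running report_crowded_dirs against the mount point of a Data Links controlled file system before an initial load shows which directories may need to be split.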

4.2.3 Using NFS and NIS

The Network File System (NFS) is commonly used for sharing files between hosts. The file system that contains the linked files is usually exported from the DLFM server and mounted by the clients using NFS. We recommend that all of the other file systems used for DLFM be local to the host on which the DLFM is installed. They include:
- The dlfm home directory
- The database DLFM_DB log files
- The tablespace containers for the DLFM_DB database
- The archive file directory for the DLFM backups

The Network Information Service (NIS) is used as a single point of control for UNIX user IDs, groups, and a number of other files in the /etc directory. Do not have the dlfm user ID or UNIX group under control of NIS or any of the /etc files. These are best maintained as local files.

4.3 Planning the backup of the DLFM_DB database

Consider these points when planning the backup strategy for the database (DLFM_DB) that contains the metadata for all Data Linked files:
- Back up the DLFM_DB at the same time the DB2 Universal Database database is backed up.
- Make sure the user exit for archive logging is used for the DLFM_DB.
- Back up the file systems controlled by the Data Links Manager. They need to be unmounted, backed up (via the operating system), and then mounted again. See the DB2 Release Notes Version 7.2/Version 7.1 FixPack 3 “Backing up a Journalized File System on AIX”.

4.4 Performance tuning tips

There are a few tips to keep in mind that can result in better performance of Data Links.

4.4.1 Optimum logging levels

The lower the logging level, the better the performance. We recommend that you keep the logging levels at their minimum values unless you are debugging a problem. In a Data Links scenario, two types of logging are involved.

Logging by DB2

The recommended DB2 logging level is 3 (LOG_ERROR). For debugging purposes, it can be changed to 4 (LOG_DEBUG). It can be updated by issuing the following DB2 command:

db2 update dbm cfg using loglevel

Logging by DLFF (DLFSCM on DCE-DFS)

The logging by DLFF is tunable. You can even turn off the logging done by DLFF or DLFSCM. Refer to Appendix D, “Logging priorities for DLFF and DLFSCM” on page 331, to learn how to modify these logging levels.

4.4.2 Location of file servers

For high performance, locate the file servers close to the applications that use them. This reduces network traffic and improves Data Links performance.

4.4.3 Number of files per directory

On AIX and Solaris, the suggested number of files per directory is 3000 or less.

4.4.4 Token algorithms

Data Links supports two token algorithms: MAC0 and MAC1. MAC1 is more complex and more secure, but it incurs a performance overhead. Therefore, we recommend that you use MAC0 unless security is a major concern.

Note: This does not mean that MAC0 is not safe. It’s just that MAC1 is safer!

4.4.5 DLFM backup, home, and log directories

All of the DLFM directories should preferably be local to the Data Links server and not remote (NFS mounted on UNIX or on a shared drive on Windows). Remote directories can degrade performance drastically.


Chapter 5. Data Links Manager administration

This chapter discusses a number of general administration commands for working with the Data Links Manager.

To identify the tables that contain columns with the DATALINK data type, you can issue a SELECT statement as shown in Figure 5-1.

Figure 5-1 Select from sysibm.syscolproperties

The Data Links File Managers that are defined to a UDB database can be found by using the command shown in Figure 5-2. In this example, you first issue the list db directory command to retrieve the names of all of the databases. You need the database names for the list datalinks managers command.

Figure 5-2 List databases and Data Links Managers

5.2 Checking for Data Links control over a file system

To find out which file systems are controlled by Data Links, you can use the commands shown in Figure 5-3. The mount command displays information for the currently mounted file systems. We are searching for the virtual file system (VFS) type of dlfs. If the mount output shows no file system of type dlfs, then the Data Link files are not available. The following UNIX commands show whether any dlfs file systems are defined to the system in the /etc/filesystems file:

lsfs -v dlfs

or

lsfs | grep dlfs

The Data Link files must be defined and mounted to be available. The following command must also be successful for Data Links to be completely set up:

dlfm list registered prefixes

Before Data Links has control of a file system, the following actions must occur:
1. The utility /usr/lpp/db2_0n_0n/instance/dlfmfsmd creates a dlfs file system. The dlfmfsmd utility updates the /etc/filesystems and /etc/rc.dlfs files. Refer to DB2 Data Links Manager Quick Beginnings, GC09-2966, for the syntax. Verify the file system is of type dlfs by running:
   lsfs -v dlfs
2. The file systems are mounted by running the command:
   mount -v dlfs
   Use the mount command to verify.
3. The file system is defined to DLFM by:
   dlfm add_prefix
   Verify it with the command:
   dlfm list registered prefixes
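The mount check described above can be scripted. The following is a minimal sketch, assuming that the VFS type appears as its own whitespace-separated column in the mount output, as it does on AIX. The function names are hypothetical, and the output layout on other platforms may differ.

```sh
# Print only the lines of mount output (read from standard input) that carry
# the dlfs VFS type. Matching a whole field avoids false hits on path names
# that merely contain the string "dlfs".
list_dlfs_mounts() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "dlfs") { print; next } }'
}

# Succeeds only when at least one dlfs file system appears in the input.
has_dlfs_mount() {
    [ -n "$(list_dlfs_mounts)" ]
}
```

For example, mount | has_dlfs_mount || echo "no dlfs file systems mounted" reports the situation in which the Data Link files are not available.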

Figure 5-3 The dlfs file systems

5.3 Other useful DLFM commands

Most of the commands used to administer a DB2 File Manager are quite simple to use. The question is often which command to use. Here is a list of the most frequently used commands with an explanation of their syntax and what they do. Note that, unless otherwise stated, all of these commands are run on the DLFM server using the DLFM administrator user ID.
- dlfm: Lists all of the available DLFM commands with a brief explanation of what they do. Alternative forms are dlfm help or dlfm ?
- dlfm add_db: Registers a database with DLFM. Three input parameters are required: database name, instance name, and nodename. Here’s an example:
  dlfm add_db sample db2inst1 myhost.com
  This command populates the DLFM.DFM_DBID table in the DLFM_DB database. All parameters are converted to uppercase before they are stored in the DFM_DBID table.
- dlfm add_prefix: Registers a dlfs file system with the DLFM, for example:
  dlfm add_prefix
  This command populates the DLFM.DFM_PRFX table in the DLFM_DB database. Note that the file system name is case sensitive.
- dlfm bind: Binds executables used by DLFM to the DLFM_DB database. This command also updates DB2 statistics for the DLFM_DB database.

Important: The DB2 RUNSTATS utility should never be used to update statistics for DLFM_DB. Always use the dlfm bind command.

- dlfm drop_dlm: Unregisters a DB2 database with DLFM. It requires the three input parameters database name, instance name, and nodename. Here’s an example:
  dlfm drop_dlm SAMPLE db2inst1 myhost.com
- dlfm create: Creates all of the tables in the DLFM_DB database that are used by the File Manager. After the tables are created, the dlfm bind command is invoked to update DB2 statistics for the tables.
- dlfm create_db: Creates and configures the DLFM_DB database. Archive logging is turned on, and an offline backup of the database is performed.
- dlfm drop_db: Drops the DLFM_DB database.
- dlfm help: Lists all of the available DLFM commands with a brief explanation of what they do. An alternative form is dlfm ?

- dlfm list registered databases: Lists all databases registered with DLFM. It selects data from the DFM_DBID table in the DLFM_DB database. Also lists the instance name and hostname of each database.
- dlfm list registered prefixes: Lists all dlfs file systems that are registered with DLFM. Selects data from the DLFM.DFM_PRFX table.
- dlfm refresh key: Changes the key used for generating access control tokens for DATALINK files linked with READ PERMISSION DB. Outstanding tokens generated with the old key are invalidated. The Data Links Manager must be restarted and all connections to registered DB2 databases terminated for this to take effect. Can be used as a security mechanism to prevent hacking of access tokens.
- dlfm restart: Stops and starts the File Manager. Some commands, such as dlfm refresh key, take effect only after the File Manager has been restarted. If a change to DLFM is made and it does not appear to have taken effect, try the dlfm restart command.
- dlfm retrieve: Displays the status of all files managed by DLFM. This command presents an interactive dialog that prompts for hostname, database and instance name, and file system name. It also lists the status of all linked and unlinked files being tracked by DLFM that match the selection criteria.

Tip: The non-interactive retrieve_query command can be used instead of dlfm retrieve. This can be useful to capture the output of the command to a file. Consider this example: retrieve_query -o -h -d -i -p

Here is the name of the output file, is the name of the host on which the DB2 database resides, is the name of the DB2 database, is the name of the instance, and is the name of the dlfs file system. Note that the file system name supplied must exactly match the output of the dlfm list registered prefix command, that is, the file system name must end with a forward slash (/).

- dlfm see: Shows the DLFM processes running on the system. See Chapter 2, “Technical architecture” on page 11, for a description of each DLFM process.
- dlfm setup: Starts the database manager, creates the DLFM_DB database and the tables used by the File Manager, and stops the database manager. A file containing configuration options can be used as input.
- dlfm shutdown: Stops the File Manager and removes all Inter Process Communications (IPCs). This command tries to shut down DLFM cleanly, but if unable to do so, it kills the DLFM processes. This command can be useful for fixpak installations or version upgrades because it assures that all processes and IPCs have been terminated. Note that the dlfm stop command does not necessarily remove all IPCs.
- dlfm start: Starts DLFM and issues a message to check the db2diag.log file for an indication of success. The dlfm see command can also be used to verify that the DLFM processes are running.
- dlfm startdbm: Starts the database manager for instance dlfm. It’s the same as db2start.
- dlfm stop: Stops the File Manager. This ends all of the processes that are connected to the database DLFM_DB. The dlfm see command shows the processes that are stopped by dlfm stop.

Important: Note that the dlfm stop command does not necessarily cleanup IPCs used by DLFM. If performing a version upgrade or fixpak installation, it is important to make sure that all IPCs have been removed. Use the dlfm shutdown command followed by ipcs | grep dlfm to verify that all IPCs have been removed.
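The verification step in this note can be wrapped in a small check that fails loudly when leftover IPC resources are found. This is a sketch only: the function name no_dlfm_ipcs_left is hypothetical, and it simply looks for the string dlfm in whatever ipcs output it is given on standard input.

```sh
# Read ipcs output on standard input and fail if any entry mentions dlfm.
no_dlfm_ipcs_left() {
    if grep dlfm >/dev/null; then
        echo "IPC resources owned by dlfm are still present" >&2
        return 1
    fi
    return 0
}

# Intended use on the DLFM server (not executed here):
#   dlfm shutdown
#   ipcs | no_dlfm_ipcs_left && echo "no dlfm IPCs remain"
```

Because the check reads standard input, it can also be run against a saved copy of the ipcs output when reviewing an upgrade after the fact.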

- dlfm stopdbm: Stops the database manager for instance dlfm. It’s the same as db2stop.


Chapter 6. Using Tivoli Storage Manager

This chapter discusses compatibility of DB2 Data Links with Tivoli Space Manager and the Backup-Archive Client. It is intended for people who are:
- Using Data Links and want to exploit the features of Tivoli Storage Manager
- Using Tivoli Storage Manager and plan to use Data Links
- New to both Tivoli Storage Manager and Data Links, and want to explore the benefits of having both, side by side

This chapter offers:
- An introduction to Tivoli Storage Manager (the entire product set)
- Data Links with the Backup-Archive Client
- Data Links with Tivoli Space Manager

Tivoli Storage Manager provides a set of products for distributed data and storage management in an enterprise network environment. These products are highly business centric, application aware, and are considered among the most scalable, interoperable and robust products in the industry. Tivoli Storage Manager supports a wide variety of platforms for mobile, small and large systems, and delivers many data management functions. Tivoli Storage Manager V4.1 supports six server platforms: Windows NT, Windows 2000, IBM AIX, HP-UX, Sun Solaris, and IBM MVS OS/390 server series. It has the following main products:
- Backup-Archive Client
- Tivoli Space Manager
- Tivoli Data Protection (TDP) for applications
- Tivoli Disaster Recovery Manager (DRM)

Backup-Archive Client

This client helps in maintaining copies of files that may be required in the future for recovery purposes. The Tivoli Storage Manager (TSM) server maintains a separate repository that keeps track of different versions, the timestamp (for point-in-time recovery), and the location of the backup image. The number of backup versions is controlled by server definitions. DB2 provides the user with an option to asynchronously copy Data Linked files to disk (using the dlfm_copyd daemon) or to use the Backup-Archive Client of Tivoli Storage Manager to back up files to any secondary storage (may be the same disk).

Tivoli Space Manager

This client transparently moves less-frequently accessed data to lower cost storage media, giving the user the impression that the data is still on disk. Data Links (DB2 V7.2 onwards) can be used with Tivoli Space Manager (or HSM) to provide users with the ability to manage and store Data Linked files in secondary storage.

Tivoli Data Protection (TDP) for applications

Tivoli Data Protection for Applications is a group of solutions integrated with Tivoli Storage Manager that protect data used by business applications. These are interface programs between a storage management API provided by the vendor application, and the Tivoli Storage Manager data management API. Tivoli Data Protection is available for Lotus Notes, Lotus Domino, Lotus Domino for iSeries, MS Exchange, MS SQL Server, Informix, and Oracle.

Tivoli Disaster Recovery Manager (DRM)

Tivoli Disaster Recovery Manager assists with the technical steps that help in making data available to users after a widespread failure. It offers various options to configure, control, and automatically generate a disaster recovery plan containing the information, scripts, and procedures needed to automate restoration and to help ensure quick recovery of data after a disaster.

A typical configuration involving Backup-Archive and Tivoli Space Manager clients has the following components:
- Server
  – Provides storage management for client nodes
  – Maintains a database of information
  – Can be used in a network to allow you to manage them centrally and to balance storage resources
- Server Storage
  – Contains files that are backed up, archived, and migrated from client workstations
  – Consists of pools of random and sequential access media
- Administrative Client: Provides a command line and Java-based administration interface to the server
- BA Client: The Backup-Archive Client
- HSM Client: Hierarchical Storage Manager, the client for the Tivoli Space Management product

Note: Refer to Tivoli Storage Manager Version 3.7.3 and 4.1: Technical Guide, SG24-6110, to learn more about Tivoli Storage Manager.

The following sections describe some of the base concepts of Tivoli Storage Manager and then discuss how Data Links works with BA and HSM clients.

6.1.1 Storage device concepts

The Tivoli Storage Manager-managed client’s data are stored in the Tivoli Storage Manager storage repository, which can consist of different storage devices, such as disk, tape, or optical devices. Tivoli Storage Manager controls this repository. To do this, Tivoli Storage Manager uses its own model of storage to view, classify, and control these storage devices, and to implement its storage management functionality.

The main difference between the storage management approach of Tivoli Storage Manager and other commonly used systems is that Tivoli Storage Manager storage management concentrates on managing data objects instead of managing and controlling backup tapes. Data objects can be files, directories, or raw logical volumes that are backed up from the client systems. They can be objects like tables or records from database applications or simply a block of data that a client system wants to store on the server storage.

To store these data objects on storage devices and to implement storage management functions, Tivoli Storage Manager has defined some logical entities to classify the available storage resources. The most important one is the storage pool logical entity.

Storage pool

A storage pool describes a storage resource for one single type of media, such as a disk partition or a set of tape cartridges. Storage pools are the place where data objects are stored. A storage pool is built up from one or more storage pool volumes. For example, in the case of a tape storage pool, a volume would be a single physical tape cartridge. To describe how Tivoli Storage Manager can access those physical volumes to place the data objects on them, Tivoli Storage Manager has another logical entity called a device class. A device class is connected to a storage pool and specifies how volumes of this storage pool can be accessed.

Storage hierarchy

Tivoli Storage Manager organizes storage pools in one or more hierarchical structures. This storage hierarchy can span multiple server instances and is used to implement management functions that migrate data objects automatically (and completely transparently to the client) from one storage hierarchy level to another or, in other words, from one storage device to another. This function may be used, for example, to cache backup data (for performance reasons) onto a Tivoli Storage Manager server disk space before moving the data to tape cartridges. The actual location of all data objects is automatically tracked within the server database.

Tivoli Storage Manager has implemented additional storage management functions for moving data objects from one storage volume to another. Tivoli Storage Manager uses the progressive backup methodology to back up files to Tivoli Storage Manager storage.

The reorganization of the data and storage media for fast recovery happens completely within the server. For this purpose, Tivoli Storage Manager has implemented functions to relocate data objects from one volume to another and to collocate data objects that belong together, either at the client-system level or at the data-group level.

Another important storage management function implemented within the Tivoli Storage Manager server is the ability to copy data objects asynchronously and to store them in different storage pools or on different storage devices, either locally at the same server system or remotely on another server system. It is especially important for disaster recovery reasons to have – in the event of losing any storage media or the whole storage repository – a second copy of data available somewhere in a secure place. This function is fully transparent to the client, and can be performed automatically within the Tivoli Storage Manager server.

Figure 6-1 gives an overview of the TSM storage management. It shows how a data object on a TSM client can be migrated or recalled, or backed-up or recovered, from a TSM server storage repository. Two device classes are shown that have one or more storage pools. And each storage pool has one or more storage pool volumes. It shows how a data object is moved in the storage hierarchy.

[Figure: a data object on the TSM client moves over the WAN, LAN, or SAN to the TSM server storage repository, where device classes, storage pools, and storage pool volumes form the storage hierarchy; the operations shown are migrate, colocate, copy, and relocate]

Figure 6-1 Storage management

6.1.2 Policy concepts

A data storage management environment consists of three basic types of resources: client systems, rules, and data. The client systems contain the data to be managed, and the rules specify how the management must occur. In the case of backup, for example, the rules specify how many versions should be kept, where they should be stored, and so on.

Tivoli Storage Manager policies define the relationships between these three resources. Figure 6-2 illustrates this policy relationship. Depending on your actual needs for managing your enterprise data, these policies can be very simple or very complex.

[Figure: nodes (machines) belong to a policy domain; the policy domain contains a policy set; the policy set contains management classes; each management class contains copy groups (rules) that apply to data]

Figure 6-2 Policy concepts

Tivoli Storage Manager has certain logical entities that group and organize the storage resources and define relationships between them. Client systems, or nodes in Tivoli Storage Manager terminology, are grouped together with other nodes with common storage management requirements, into a policy domain.

The policy domain links the nodes to a policy set, a collection of storage management rules for different storage management activities. A policy set consists of one or more management classes. A management class contains the rule descriptions called copy groups, and links these to the data objects to be managed. A copy group is the place where all the storage management parameters, such as number of stored copies, retention period, storage media, and so on, are defined. When the data is linked to particular rules, it is said to be bound to the management class that contains those rules.

Another way to look at the components that make up a policy is to consider them in the hierarchical fashion in which they are defined. That is to say, consider the policy domain containing the policy set, the policy set containing the management classes, and the management classes containing the copy groups and the storage management parameters.

6.1.3 Security concepts

Security is a vital aspect for Tivoli Storage Manager because all the data of an enterprise are stored and managed by the storage repository of Tivoli Storage Manager. To ensure that data can only be accessed from the owning client or an authorized party, Tivoli Storage Manager implements, for authentication purposes, a mutual suspicion algorithm, which is similar to the methods used by Kerberos authentication.

Whenever a client wants to communicate with the server, an authentication has to take place. This authentication contains both-sides verification, which means that the client has to authenticate itself to the server, and the server has to authenticate itself to the client.

To do this, all clients have a password, which is stored at the server side as well as at the client side. In the authentication dialog, these passwords are used to encrypt the communication. The passwords are not sent over the network, to prevent hackers from intercepting them. A communication session is established only if both sides are able to decrypt the dialog. If the communication has ended, or if a time-out period without activity is passed, the session is automatically terminated, and a new authentication will be necessary.

6.1.4 Communication methods

The Tivoli Storage Manager server supports the following methods for communication with clients:
- Shared Memory (TCP/IP pre-requisite)
- TCP/IP
- HTTP (for a Web interface)
- SNMP DPI
- None (this option is selected to disallow any client from connecting to the server)

Note: Refer to Tivoli Storage Management Concepts, SG24-4877, for more information.

6.2 Data Links with the Backup-Archive Client

As an alternative to the disk backup, Tivoli Storage Manager can also be used to back up files that reside on a Data Links server.

Note: Disk copy is the default backup mechanism for backing up Data Linked files.

To use Tivoli Storage Manager as an archive server, you would use the following steps:
1. Install Tivoli Storage Manager on the Data Links server. For more information, refer to the Tivoli Storage Manager product documentation.
2. Register the Data Links server client application with the Tivoli Storage Manager server. For more information, refer to the Tivoli Storage Manager product documentation.
3. Add the following environment variables to the Data Links Manager administrator's db2profile or db2cshrc script files:
   For Bash, Bourne, or Korn shell:
   export DSMI_DIR=/usr/tivoli/tsm/client/ba/bin (On AIX)
   export DSMI_DIR=/opt/tivoli/tsm/client/ba/bin (On Solaris)
   export DSMI_CONFIG=$HOME/tsm/dsm.opt
   export DSMI_LOG=$HOME/dldump
   export PATH=$PATH:/usr/tivoli/tsm/client/ba/bin (On AIX)
   export PATH=$PATH:/opt/tivoli/tsm/client/ba/bin (On Solaris)
   For C shell:
   setenv DSMI_DIR /usr/lpp/tsm/bin
   setenv DSMI_CONFIG ${HOME}/tsm/dsm.opt
   setenv DSMI_LOG ${HOME}/dldump
   setenv PATH ${PATH}:/usr/tivoli/tsm/client/ba/bin (On AIX)
   setenv PATH ${PATH}:/opt/tivoli/tsm/client/ba/bin (On Solaris)
4. Ensure that the dsm.sys TSM system options file is located in the /usr/tivoli/tsm/client/ba/bin directory on AIX, or in the /opt/tivoli/tsm/client/ba/bin directory on Solaris.
5. Ensure that the dsm.opt TSM user options file is located in the $HOME/tsm directory, where $HOME is the home directory of the Data Links Manager administrator.
6. Set the PASSWORDACCESS option to Generate in the /usr/lpp/tsm/bin/dsm.sys Tivoli Storage Manager system options file.
7. Register the TSM password with the Generate option before you start the Data Links File Manager for the first time. This way a password will not be needed when the Data Links File Manager initiates a connection to the TSM server. For more information, refer to Tivoli Storage Manager for AIX Administrator’s Guide, GC35-0403, at:
   http://www.tivoli.com/support/public/Prodman/public_manuals/storage_mgr/v4pubs/v1_pdf/aix/guide/anragd40.pdf
8. Set the DLFM_BACKUP_TARGET registry variable to TSM by issuing the command:
   db2set -g DLFM_BACKUP_TARGET=TSM

This activates the Tivoli Storage Manager backup option. The value of the DLFM_BACKUP_DIR_NAME registry variable is ignored in this case.

Notes:
- If the setting of the DLFM_BACKUP_TARGET registry variable is changed from TSM to LOCAL at run time, the files that were previously archived to TSM are not moved to the new disk location. Only newly archived files are stored in the new location on the disk.
- To override the default TSM management class, set the registry variable DLFM_TSM_MGMTCLASS. If this registry variable is left unset, the default TSM management class is used.

9. Stop the Data Links File Manager by entering the command:
   dlfm stop
10. Start the Data Links File Manager by entering the command:
   dlfm start
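The Bourne-style settings from step 3 differ between AIX and Solaris only in the installation prefix, so they can be generated rather than typed by hand. The sketch below is one way to do that. The function name tsm_env_lines is hypothetical, and the output should be reviewed before it is appended to the administrator's db2profile.

```sh
# Print the Bourne-style export lines from step 3 for one platform.
# Assumption: only the /usr (AIX) versus /opt (Solaris) prefix differs.
tsm_env_lines() {
    case $1 in
        aix)     prefix=/usr ;;
        solaris) prefix=/opt ;;
        *)
            echo "usage: tsm_env_lines aix|solaris" >&2
            return 1 ;;
    esac
    bin=$prefix/tivoli/tsm/client/ba/bin
    echo "export DSMI_DIR=$bin"
    # Single quotes keep $HOME literal so it is expanded when the profile runs.
    echo 'export DSMI_CONFIG=$HOME/tsm/dsm.opt'
    echo 'export DSMI_LOG=$HOME/dldump'
    echo "export PATH=\$PATH:$bin"
}
```

Calling tsm_env_lines aix prints the four lines for AIX, and tsm_env_lines solaris prints the Solaris equivalents.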

6.3 Data Links and Tivoli Space Manager

DB2 V7.2 Data Links allows users to keep their Data Linked files in the file systems being managed by the Hierarchical Storage Manager (HSM) client of the Tivoli Space Manager. This section discusses the following topics:
- Overview of Tivoli Space Manager
- Various tools, processes and interfaces available with the TSM server and HSM client
- Data Links support for HSM: Overview and benefits
- Known restrictions of using Data Links with Tivoli Space Manager

6.3.1 Overview of Tivoli Space Manager

Tivoli Space Manager maximizes the usage of existing storage resources by transparently migrating data off workstation and file server hard drives based on size and age criteria, leaving only a stub file. If and when the migrated data is accessed, Tivoli Space Manager transparently migrates the data back onto the local disk. In doing so, Tivoli Space Manager relieves users of the task of manually deleting and archiving data on their workstations. Tivoli Space Manager is a complementary product that is available on all Tivoli Storage Manager servers. Therefore, the supported HSM clients can send data to any Tivoli Storage Manager server that also has Tivoli Space Manager installed. Tivoli Storage Manager is a prerequisite to implementing Tivoli Space Manager.

Tivoli Space Manager provides the basic functionality of space management by automatically migrating data based on file size, number of days since it was last accessed, or a combination of both. Once migrated, Tivoli Space Manager automatically recalls a file if it is accessed and restores it to its original location in the file system.

Figure 6-3 gives an overview of the functionality of Tivoli Space Manager.

[Figure: a user application (1) accesses the File System Migrator (FSM), which is layered over the physical file system; the dsmmonitord daemon or dsmmigrate tool (2) migrates files to the storage pool (3), and the dsmrecalld daemon or dsmrecall tool (4) recalls them]

Figure 6-3 Tivoli Space Manager overview

The flow in Figure 6-3 is explained here:
1. The users see more file system space than what is actually in the file system. This is due to the transparent migration and recall of the files. File System Migrator (FSM) VFS provides this illusion to the user.
2. The dsmmonitord daemon keeps track of the threshold values specified in /etc/adsm/SpaceMan/config/dsmmigfstab and the file systems registered with it. If any of the file system attributes (space usage, for example) crosses the threshold value, it starts migrating the files that are eligible for migration. Whether a file is eligible for migration depends on many factors, which include:
   – The file size (should be greater than the stub file size)
   – The Include-Exclude list (specified in the dsm.sys option file)
   – The timestamp of the file
   The file can also be migrated explicitly (selective migration) by using the dsmmigrate utility and passing the file name as an argument to it. This utility also requires that the file size be greater than the stub file size.
3. The migrated files are sent to the TSM server, which stores them in the storage pools depending on the policy sets.
4. When an application or user accesses a migrated file, FSM VFS checks whether there is any need to recall the file from the server. If it finds that the file has to be recalled, it triggers the recall operation. The file is finally recalled by the Recall daemon (dsmrecalld). This is known as the transparent recall. Files can also be recalled explicitly (known as selective recall) by using the dsmrecall tool.
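The size condition from step 2 is easy to check before attempting a selective migration. The following sketch covers only that one condition; it does not consult the Include-Exclude list or the file timestamp. The function name is hypothetical, and the stub size is passed in by the caller because the real value depends on the HSM configuration.

```sh
# Succeeds when the argument is a regular file larger than the stub size,
# one of the eligibility conditions listed above.
larger_than_stub() {
    file=$1
    stub_bytes=$2
    [ -f "$file" ] || return 1
    size=$(( $(wc -c < "$file") ))   # arithmetic strips any padding from wc
    [ "$size" -gt "$stub_bytes" ]
}
```

A wrapper script could call larger_than_stub for each candidate file and pass only the files that qualify to dsmmigrate.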

Tivoli Space Manager maintains data integrity and security of data by working closely with the operating system. It provides a graphical user interface and commands that can be used to display information about files, including whether they have been migrated.

Migration

Files are migrated by Tivoli Space Manager from the original file system to storage devices connected to a Tivoli Storage Manager server. Each file is copied to the server, and a stub file is placed in the original file’s location. Using the facilities of storage management on the server, the file is placed on various storage devices such as disk and tape.

Tivoli Space Manager migrates only regular files on locally mounted file systems. It does not migrate character special files, block special files, first-in/first-out (FIFO) special files (named pipes), or directories.

Note: Do not confuse HSM migration with storage pool migration. Storage pool migration is the process where client data (which could be backup, archive, or HSM data) moves through the Tivoli Storage Manager storage hierarchy, typically from disk to tape or optical. Storage pool migration happens entirely within the Tivoli Storage Manager server.

HSM migration is the process of moving data from an HSM client to the Tivoli Storage Manager server, where it is stored in a storage pool. Once a file has been migrated (HSM migration) from a client to the server, it can subsequently be migrated (storage pool migration) to another server storage pool.

There are two types of HSM migration:

Automatic: With automatic migration, Tivoli Space Manager monitors the amount of free space on your file systems. When it notices a shortage of free space, it migrates files off the local file system to the Tivoli Storage Manager server storage based on the space management options. Tivoli Space Manager monitors free space in two ways:
– Threshold: Threshold migration maintains your local file systems at a set level of free space. At an interval specified in the options file, Tivoli Space Manager checks the file system space usage. If the space usage exceeds the high threshold, files are migrated to the server, least-recently used (LRU) files first. When the file system space usage reaches the set low threshold, migration stops. Threshold migration can also be started manually.
– Demand: Tivoli Space Manager checks for an out-of-space condition on a file system every two seconds. If this condition is encountered, Tivoli Space Manager automatically starts migrating files until the low threshold is reached. As space is freed up, the process causing the out-of-space condition continues to run. You do not receive out-of-space error messages while this is happening.

Selective: You can tell Tivoli Space Manager to selectively migrate a file immediately to the server’s storage. As long as the file meets the space management options, it is migrated. The file does not need to meet age criteria, nor does the file system need to meet space threshold criteria.

Pre-migration

Migration can take a long time to free up significant amounts of space on the local file system. Files need to be selected and copied to the Tivoli Storage Manager server, which may involve a tape mount, and a stub file must be created in place of the original file. To speed up the migration process, Tivoli Space Manager can be told to implement a pre-migration policy.

After threshold or demand migration completes, Tivoli Space Manager continues to copy files from the local file system until the pre-migration percentage is reached. These copied files are not replaced with the stub file, but they are marked as pre-migrated.

The next time migration starts, the pre-migrated files are chosen as the first candidates to migrate. If the file has not changed since it was copied, the file is marked as migrated and the stub file is created in its place in the original file system. No copying of the file needs to happen, because the server already has a copy. In this manner, migration can free up space very quickly.

Recall

Recall is the process of bringing back a migrated file from Tivoli Storage Manager to its original place on the local file system. A recall can be either transparent or selective:

Transparent: From the perspective of a user or running process, all the files in the local file system are actually available. Directory listings and other commands that do not require access to the entire file appear exactly as they would if the HSM client was not installed. When a migrated file is needed by an application or command, the operating system initiates a transparent recall of the file from the Tivoli Storage Manager server. The process is temporarily halted while the file is automatically copied from the server’s storage to the original file system location. Once the recall is complete, the halted process continues without requiring any user intervention. In fact, depending on how long it takes to recall the file, the user may not even be aware that HSM is used. After a recall, the file contents are on both the original file system and the server storage. This allows Tivoli Space Manager to mark the file as pre-migrated and eligible for migration unless the file is changed.

Selective: Transparent recall only recalls files automatically as they are accessed. If you or a process needs to access a number of files, it may be more efficient to manually recall them prior to actually using them. This is done using selective recall. Tivoli Space Manager batches the recalled file list based on where the files are stored. It recalls the files stored on disk first and then recalls the files stored on sequential storage devices such as tape.

Advanced transparent recall: Advanced transparent recall is available only on AIX platforms. There are three recall modes: normal (which recalls a migrated file to its original file system), migrate-on-close, and read-without-recall:
– Migrate-on-close: When Tivoli Space Manager uses the migrate-on-close mode for recall, it copies the migrated file to the original file system, where it remains until the file is closed. When the file is closed and if it has not been modified, Tivoli Space Manager replaces the file with a stub and marks the file as migrated (since a copy of the file already exists on the server storage).
– Read-without-recall: When Tivoli Space Manager uses the read-without-recall mode, it does not copy the file back to the originating file system, but passes the data directly from the recall to the requesting process. This can only happen when the processes that access the file do not modify it and, if the file is executable, do not execute it. The file does not use any space on the original file system and remains migrated (unless the file is changed, in which case Tivoli Space Manager performs a normal recall).

Reconciliation

Tivoli Space Manager uses reconciliation to maintain synchronization of the local file system and Tivoli Storage Manager. Reconciliation also builds a migration candidates list.

Note: Do not confuse the Tivoli reconciliation with the Data Links Reconcile utility.

Reconciliation can be started manually or allowed to happen automatically at intervals set in the options file and prior to threshold migration if the migration candidate list is empty.

Synchronization: Synchronization involves keeping the Tivoli Space Manager database in sync with the actual files on the original file system. It ensures that a valid file copy is kept for every stub file, that there are no database entries for original (resident) files on the original file system, and that there is a database entry for every pre-migrated file. It also updates status fields in the database. For example, if you recall a file, change it, and immediately migrate it, Tivoli Space Manager has two copies of the file in its storage: the most recent one, which is valid, and an obsolete one. Reconciliation removes the obsolete copy after its expiration interval has passed.

Building a new migration candidates list: Tivoli Space Manager uses the reconciliation process to build a prioritized list of files on the original file system that are eligible for automatic migration. The list is created based on the management class criteria and minimum file size. It is ordered according to the number of days since the file was last used, the file size, and the migration factors set in the options file. During threshold and demand migration, the list is used to select files to migrate in prioritized order. As each file is selected, it is checked again to ensure that it still meets the migration criteria. A new migration candidate list is created each time reconciliation runs. The list can also be created at any time.
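As a rough illustration of how such a prioritized list could be built, the following sketch filters out files below a minimum size and orders the rest by a weighted score. The scoring formula, the build_candidates function, and the age_factor and size_factor weights are assumptions made for this example; the text above states only which inputs influence the ordering, not how they are combined.

```python
def build_candidates(files, min_size_kb, age_factor=1, size_factor=1):
    """Build a prioritized migration candidates list.

    files: iterable of (name, size_kb, days_since_last_use) tuples.
    Files not larger than min_size_kb are not eligible. Eligible files
    are ordered by a hypothetical weighted score, highest first.
    """
    eligible = [f for f in files if f[1] > min_size_kb]

    def score(f):
        _, size_kb, days = f
        return age_factor * days + size_factor * size_kb

    return [name for name, _, _ in sorted(eligible, key=score, reverse=True)]


files = [("a", 10, 100), ("b", 500, 1), ("c", 2, 400), ("d", 50, 30)]
print(build_candidates(files, min_size_kb=4))
print(build_candidates(files, min_size_kb=4, size_factor=0))
```

Setting one weight to zero shows how the same set of files can be ordered purely by age or purely by size.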

Options

Options to control Tivoli Space Manager are set in the client options file (dsm.sys). These options set items such as which Tivoli Storage Manager server to use for the Tivoli Space Manager functions, space management options, migration options, excluded file lists, and the assignment of management classes to files.

Backup/restore and archive/retrieve

Tivoli Space Manager should not be considered a replacement for backup. It should be viewed as a form of space extension of local disk storage. When a file is migrated to the HSM server, there is still only one copy of the file available, since the original is deleted on the client and replaced by the stub.

Also, Tivoli Space Manager maintains only the last copy of the file, giving no opportunity to store multiple versions. Therefore, the Tivoli Storage Manager backup-archive client must be used to back up or archive files before or after they are migrated by Tivoli Space Manager. You can specify that a file is not eligible for HSM migration unless a backup has been made first with the backup-archive client. If the file is migrated and the same Tivoli Storage Manager server destination is used for both backup and HSM, the server can copy the file from the migration storage pool to the backup destination without recalling the file.

Both files and stub files can be restored from a Tivoli Storage Manager backup. If you restore the entire file, it will become a normal resident file on the client, and the migrated copy will be deleted from the HSM pool during the next reconciliation. If you do not want to restore the actual file data, you can use options on the HSM client to restore only the stub file without re-creating the file contents. In this case, the file will still remain in its migrated state.

The Tivoli Storage Manager backup-archive client allows you to archive and retrieve copies of migrated files without performing a recall for the file first, provided that the same Tivoli Storage Manager server is used for both HSM and backup-archive. The file is simply copied from the HSM storage pool to the archive destination pool.

6.3.2 Tools, processes, and interfaces

The various Tivoli Storage Manager processes and utilities for the TSM server, the backup-archive client, and the HSM client are discussed here.

At the Tivoli Storage Manager server side:
– dsmserv: This is the main server process, which can be started either in the background or in the foreground. The modes in which dsmserv can be started are:
  • CLI mode: When started in this mode, dsmserv opens a command line interface. This mode is the default when dsmserv is started from a shell.
  • Quiet mode: This is the default mode when dsmserv is started at boot time. From a UNIX shell, dsmserv can be started in this mode by issuing the dsmserv quiet command.
– dsmadmc: The administrative command-line client is a program that runs on a file server, workstation, or mainframe and allows administrators to control and monitor the server through administrative commands. It can be started in any one of the following modes:
  • Console mode: This mode is used to monitor TSM activities as they occur or to capture processing messages to an output file. For example, it is possible to monitor migration processes and clients logging on to TSM. No administrative commands can be entered in this mode. To start the administrative client in this mode, enter the following command at the shell prompt:
      dsmadmc -consolemode
  • Mount mode: This mode is used to monitor removable media mount activities. No administrative commands can be entered in this mode. To start the dsmadmc client in this mode, enter the following command:
      dsmadmc -mountmode
  • Batch mode: This mode is used to enter a single administrative command. The administrative client session automatically ends when the command has been processed. To start an administrative client session in batch mode, enter the following command:
      dsmadmc -id= -password=
  • Interactive mode: This mode is used to enter a series of administrative commands. To start an administrative client session in interactive mode, a server session must be available. To start the administrative client in this mode, simply enter dsmadmc at the shell prompt without any parameters.
– Web interface: It is possible to do all the administration through a user-friendly Web interface.

At the backup-archive client of TSM:
– dsmc: This is a command line interface used for backup/restore and archive/retrieve operations. Users can query any file to see its backup or archive status.
– dsm: This is a GUI that provides all the basic functions provided by the dsmc client, as well as some extra functions such as setting include-exclude options (discussed later).

At Tivoli Space Manager (or Hierarchical Storage Manager):
– dsmmonitord daemon: This daemon monitors the file systems registered with HSM at regular intervals.
– dsmrecalld daemon: This daemon takes care of both transparent and selective recalls. It spawns a child process for every active recall operation.
– dsmmigfs: This is a tool used for registering file systems with HSM. It puts an entry in the /etc/adsm/SpaceMan/config/dsmmigfstab file, which maintains the information for each file system registered with HSM.
– dsmmigrate: This utility is used to selectively migrate files to the storage pool, guided by the policy of the TSM server with which this client is registered.
– dsmrecall: This tool is used for selective recall of migrated files.
– dsmls: This is a tool that displays information about the files in an HSM-managed file system. It is similar to the ls UNIX command and returns the following information:
  • Virtual size of the file (as it appears to the user)
  • Actual size of the file (the size of the stub, if the file is migrated)
  • State of the file (migrated, pre-migrated, or resident)
– dsmdu: This utility reports the virtual space usage of the file system objects (files, directories, and subdirectories). That is, it takes the virtual size of the migrated files into account and not the actual stub file size. The UNIX du utility, on the other hand, gives the space usage of the file system objects based on the actual size.
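The difference between the virtual size reported by dsmdu and the actual size reported by du can be made concrete with a small model. This sketch does not call either utility; the space_usage function and the file records are hypothetical, and it assumes that only a migrated file is represented locally by its stub.

```python
def space_usage(files):
    """Return (virtual_total, actual_total) for a list of file records.

    Each record is a dict with 'state' ('migrated', 'pre-migrated' or
    'resident'), 'size' (size seen by the user) and 'stub_size'.
    A migrated file occupies only its stub on the local file system;
    pre-migrated and resident files still occupy their full size.
    """
    virtual_total = sum(f["size"] for f in files)
    actual_total = sum(
        f["stub_size"] if f["state"] == "migrated" else f["size"]
        for f in files
    )
    return virtual_total, actual_total


files = [
    {"state": "migrated", "size": 1000, "stub_size": 4},
    {"state": "resident", "size": 200, "stub_size": 4},
    {"state": "pre-migrated", "size": 300, "stub_size": 4},
]
virtual, actual = space_usage(files)
print("virtual (dsmdu-style):", virtual)
print("actual (du-style):", actual)
```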

6.3.3 Data Links support for HSM

Data Links now supports HSM on AIX. In a typical scenario, DLFF sits over FSM (the VFS/Vnode layer of HSM), which in turn layers over the native file system, JFS. Any file system request coming to this file system is first trapped by DLFF. After DLFF has finished its preprocessing, it calls the corresponding operation of the base file system (in this case, FSM). FSM also does its own preprocessing required for space management functionality and calls the equivalent JFS file system operation. Figure 6-4 shows how and where the different components of Data Links and Tivoli Space Manager fit.

[Figure: a DB2 application with an SQL access path to the DB2 server and a standard file access protocol to the file server; db2agents on the DB2 server with a control path for Data Links integrity to the Data Links File Manager (DLFM) and its DLFM_DB metadata repository; on the file server, DLFF (Data Links Filesystem Filter) over FSM (HSM's VFS) over the native file system (AIX JFS) and storage; TSM processes on the file server connected to the TSM server (archive server)]

Figure 6-4 Data Links and Tivoli Space Manager

To use the functionality of both Data Links and Tivoli Space Manager, the file system should first be registered with HSM by using the following command:

dsmmigfs add

Then it should be DLFF-enabled by running the dlfmfsmd install script (under the /usr/lpp/db2_07_01/instance directory), and finally registered with DLFM by running the following command:

dlfm add_prefix

Register the prefix with DLFM only after the file system has been mounted as DLFS.

The /etc/filesystems file is modified twice in this process. Before registering the file system with either DLFF or HSM, the entry in /etc/filesystems for any JFS file system would typically look like the following example:

/myfilesystem:
        dev = /dev/lv02
        vfs = jfs
        log = /dev/hd8
        mount = true
        options = rw
        account = false

After registering the file system with HSM (by using dsmmigfs add /myfilesystem), the /etc/filesystems entry is modified to:

/myfilesystem:
        dev = /dev/lv02
        vfs = jfs
        log = /dev/hd8
        mount = false
        options = rw
        account = false
        adsmfsm = true

Note: Observe that the mount option is modified to false and an extra option (“adsmfsm = true”) is added.

After running the dlfmfsmd script, this entry would look like the following example:

/myfilesystem:
        dev = /dev/lv02
        vfs = dlfs
        log = /dev/hd8
        mount = false
        options = rw,Basefs=fsm
        account = false
        adsmfsm = true
        nodename = -

Note: On AIX, the dlfmfsmd script modifies the /etc/filesystems file to add or modify DLFS related options, corresponding to the file system name passed as an argument.

Note: Observe that the vfs option is changed to dlfs, the options option is modified to “rw,Basefs=fsm”, and an extra option (“nodename = -”) is added if it is not already there.
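The two modifications to the /etc/filesystems stanza can be summarized in a short model. This is a sketch only: it does not run dsmmigfs or dlfmfsmd, and the parse_stanza, register_with_hsm, and enable_dlff functions are hypothetical helpers that simply reproduce the attribute changes shown in the examples above.

```python
ORIGINAL = """
/myfilesystem:
        dev = /dev/lv02
        vfs = jfs
        log = /dev/hd8
        mount = true
        options = rw
        account = false
"""


def parse_stanza(text):
    """Parse one /etc/filesystems stanza into (name, attribute dict)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    name = lines[0].rstrip(":")
    attrs = {}
    for ln in lines[1:]:
        key, _, value = ln.partition("=")
        attrs[key.strip()] = value.strip()
    return name, attrs


def register_with_hsm(attrs):
    """Model the change seen after 'dsmmigfs add': no automatic mount, HSM flag set."""
    out = dict(attrs)
    out["mount"] = "false"
    out["adsmfsm"] = "true"
    return out


def enable_dlff(attrs):
    """Model the change seen after the dlfmfsmd script: DLFS layered over FSM."""
    out = dict(attrs)
    out["vfs"] = "dlfs"
    out["options"] = out.get("options", "rw") + ",Basefs=fsm"
    out.setdefault("nodename", "-")  # added only if not already present
    return out


name, attrs = parse_stanza(ORIGINAL)
final = enable_dlff(register_with_hsm(attrs))
print(name + ":")
for key, value in final.items():
    print(f"        {key} = {value}")
```

Keeping the two steps as separate functions mirrors the order in which the file is modified: first by the HSM registration, then by the DLFF enablement.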

6.3.4 Current restrictions

There are some restrictions when using Data Links with Tivoli Space Manager:

– Selective migration (dsmmigrate) of a file that is Data Linked under READ PERMISSION DB control should be done by the super-user (root) only. For example, suppose that an ordinary (non-root) user, amita, tries to migrate a READ PERMISSION DB file (fcfile), with or without a valid token. The attempt results in an error because the dsmmigrate utility expects the user to be the owner of the file. Since the owner of a READ PERMISSION DB file is the DLFM admin user and not amita, the error message shown in Figure 6-5 is returned.

Figure 6-5 Selective Migration of READ PERMISSION DB file

– The statfs or stat call on a file system that has both FSM and DLFS (DLFF) mounted on it shows the file system type to be FSM and not DLFS, although the DLFS VFS layers above the FSM VFS. This is done because the HSM Recall daemon (dsmrecalld) expects the file system type to be FSM, and it fails on finding some other file system type (DLFS in this case). In the following example, both DLFS and FSM are mounted (with DLFS on top) on the /dlfsfsmtest file system. The C code in Figure 6-6 does a statfs on the file system name (passed as an argument) and reports its VFS type from the FSID field of the statfs structure.

Figure 6-6 dostatfs.c

Figure 6-7 shows the VFS number of DLFS and FSM.

Figure 6-7 VFS numbers of DLFS and FSM

The second entry in /etc/vfs corresponding to a VFS name is its VFS number (or type). Therefore, you see that DLFS has the VFS number of 7 and FSM has the VFS number of 15.
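Reading the VFS number from the second field can be sketched as follows. The sample text is illustrative and is not a copy of Figure 6-7: only the dlfs and fsm numbers (7 and 15) come from the text above, while the other lines and the helper columns are assumptions about the general layout of the file. The parse_vfs function is a hypothetical helper.

```python
SAMPLE_VFS = """\
# illustrative /etc/vfs content; helper columns are placeholders
%defaultvfs jfs nfs
jfs   3   none  none
dlfs  7   none  none
fsm   15  none  none
"""


def parse_vfs(text):
    """Map each VFS name to its VFS number (the second field on the line)."""
    numbers = {}
    for line in text.splitlines():
        line = line.strip()
        # skip blank lines, comments, and control lines such as %defaultvfs
        if not line or line.startswith("#") or line.startswith("%"):
            continue
        fields = line.split()
        numbers[fields[0]] = int(fields[1])
    return numbers


numbers = parse_vfs(SAMPLE_VFS)
print("dlfs:", numbers["dlfs"], "fsm:", numbers["fsm"])
```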

/dlfsfsmtest has both DLFS and FSM mounted over it. Now, when dostatfs.c is compiled and run (with /dlfsfsmtest as an argument), it gives the output shown in Figure 6-8.

Figure 6-8 Result of dostatfs on /dlfsfsmtest

Note that the VFS number shown by dostatfs is that of FSM and not of DLFS, although DLFS is mounted on top of FSM.

– The dsmls utility does not show any output if the file having the minimum inode number (in that particular directory) is Data Linked under READ PERMISSION DB control. Figure 6-9 shows that the dsmls utility does not give any output, although there are three files (file, fcfile, and normalfile) in the directory. The reason is that the file with the minimum inode number in the directory (the file with inode number 6145) is a READ PERMISSION DB Data Linked file.

Figure 6-9 dsmls utility behavior


Chapter 7. High Availability support on AIX

This chapter describes how Data Links can be supported in a High Availability Cluster Multi-Processing (HACMP) environment on AIX. It discusses some common cluster configurations in a Data Links environment and specifies some of the key points required for Data Links to work in a high availability environment. Prior to reading this chapter, you should have some familiarity with the HACMP for AIX product.

In the context of HACMP, Data Links is an application that needs to be configured for high availability. In fact, Data Links consists of two sub-applications that need to be configured for high availability:
– Host DB2 server: This is a piece of software that is essentially a DB2 instance to which database client applications can connect.
– Data Links server: This is a piece of software that resides on the file server node, with which the host DB2 server communicates for linking and unlinking files. The Data Links File Manager (DLFM) and the Data Links File System Filter (DLFF) are two different pieces of software under the Data Links server. However, as far as HACMP is concerned, they can be treated as one integrated application, since both pieces need to run together on the same node.

To learn more on HACMP, refer to HACMP/ES Customization Examples, SG24-4498.

The High Availability Cluster Multi-Processing environment is built on the concept of clustering. In a cluster, multiple server processors cooperate to provide a set of services or resources to other entities. HACMP defines relationships among cooperating processors where peer cluster nodes provide the services offered by a cluster node that becomes disabled.

The HACMP Cluster Manager runs on each cluster node, monitoring local hardware and software subsystems, tracking the state of the cluster peers, and triggering cluster events when the cluster status changes. A cluster event represents a change in a cluster's operational state that the HACMP Cluster Manager recognizes and to which it can respond.

Cluster nodes exchange “keep-alive” messages with peer nodes so that the Cluster Manager can track the availability of the nodes in the cluster. If a node stops sending keep-alive messages, the peer nodes drive the recovery process. The peer nodes take the necessary actions to get critical applications up and running and to ensure that data is not corrupted.
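The idea of tracking availability through keep-alive messages can be sketched with a simple timeout check. This is a conceptual model and not how the HACMP Cluster Manager is implemented; the failed_nodes function, the timestamps, and the timeout value are all hypothetical.

```python
def failed_nodes(last_seen, now, timeout):
    """Return the nodes whose last keep-alive is older than the timeout.

    last_seen: dict mapping node name to the time (in seconds) at which
    the most recent keep-alive message from that node was received.
    """
    return sorted(node for node, seen in last_seen.items() if now - seen > timeout)


last_seen = {"Node-A": 100.0, "Node-B": 109.0}
print(failed_nodes(last_seen, now=110.0, timeout=5.0))
```

A peer that finds a node in this list would then drive the recovery actions described above.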

This relationship between nodes is the basis for a failover of services. A failover of services occurs when an HACMP cluster environment experiences a change that requires stopping services on one node and resuming those services on the standby or peer node.

The Data Links sub-applications mentioned above (the host DB2 server and the Data Links server) can be configured in the two basic HACMP cluster configurations:
– Hot Standby: In this configuration, the host DB2 server and the Data Links server belong to two different HACMP clusters. Each cluster has one active node that runs the host DB2 server or Data Links server in normal mode and one standby node that takes over the functionality of the active node in case of failure. The standby node in each cluster is mostly dedicated to failover operation of the active node and does not run any other major applications.
– Mutual Takeover: In this configuration, the host DB2 server node and the Data Links File Manager node back up each other and take over each other's functionality during failover. They both belong to the same HACMP cluster. This configuration is common to both the host DB2 and the Data Links File Manager applications.

7.2 HACMP cluster configuration for hot standby

Figure 7-1 shows the configuration for the Host DB2 cluster or the Data Links File Manager cluster. The cluster contains two nodes:

– Active Node-A
– Standby Node-B

Each node has its own local disk on the SCSI-0 adapter. Volume Group 1 (VG-1) is the shared resource group of disks and file systems. The disks in VG-1 are connected through separate SCSI adapters (for example, SCSI-1).

[Figure: hot standby configuration for the host DB2 (or Data Links File Manager) cluster. Active Node A (priority 1, cascading) and standby Node B (priority 2, cascading) each have a network adapter on the network, a local disk on SCSI-0, and a SCSI-1 connection to the shared disks of resource Volume Group 1 (VG-1)]

Figure 7-1 Host DB2 (or) Data Links File Manager cluster

The Active Node-A has priority 1, and the standby Node-B has priority 2 (less than the active node) for the shared resource group. In a cascading policy, active Node-A, whenever it is present in the cluster or rejoins the cluster, controls or takes over the shared resource Volume Group (VG-1) and acts as the host DB2 server. In case of a failure of active Node-A or its scheduled outage from the cluster, the shared resource Volume Group (VG-1) fails over to standby Node-B. The HACMP Cluster Manager detects the failure and then starts the applications (Host DB2 Server or Data Links server) on the standby Node-B. The sample scripts for starting and stopping the Host DB2 Server or Data Links server are given in 7.4, “The scripts” on page 142.
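The cascading behavior described above amounts to a simple rule: among the nodes currently in the cluster, the one with the best priority owns the shared resource group. The sketch below models only that rule; the resource_owner function is a hypothetical helper and does not reflect how HACMP is configured or implemented.

```python
def resource_owner(priorities, active_nodes):
    """Return the active node with the best priority (1 is highest), or None.

    priorities: dict mapping node name to its priority for the resource group.
    active_nodes: set of node names currently present in the cluster.
    """
    candidates = [node for node in priorities if node in active_nodes]
    if not candidates:
        return None
    return min(candidates, key=priorities.get)


priorities = {"Node-A": 1, "Node-B": 2}
print(resource_owner(priorities, {"Node-A", "Node-B"}))  # normal operation
print(resource_owner(priorities, {"Node-B"}))            # Node-A has failed
print(resource_owner(priorities, {"Node-A", "Node-B"}))  # Node-A has rejoined
```

Because the rule is evaluated against the current membership, the resource group returns to Node-A as soon as it rejoins the cluster, which is the takeover behavior of a cascading policy.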

Let us assume that the hostname of the active node is Node-A and its service IP address is IP-A. The association between Node-A and IP-A is registered in the DNS or in the /etc/hosts files on the client nodes. The standby node, with hostname Node-B, has IP-B as its IP address. The HACMP configuration should enable both IP address and hardware address takeover for the network adapters on Node-A and Node-B. When Node-A fails, HACMP makes Node-B’s network adapter release the IP address IP-B and take the new IP address IP-A. The network adapter on Node-B also assumes the hardware address (for example, the Ethernet address) of the network adapter of Node-A, so that the client application nodes do not need to flush their ARP caches.

7.2.1 Hot standby setup for a host DB2 server

The host DB2 server must have the following file systems under the shared resource Volume Group (VG-1). The file systems should be accessible under the same absolute path name on both nodes:
– The file system containing the DB2 instance home directory
– The file system containing the host database directory

You must install the host DB2 server software on both the nodes with the same install options and parameters. Let us assume /home and /dbfs are the file systems under VG-1. /home/db2 and /dbfs/db2 are the instance home and database directories respectively. While installing software on Node-A, let us assume that VG-1 is attached to Node-A, so that the installation will create the instance home and database directories on /home and /dbfs respectively.

While installing the software on the standby Node-B, you should create two temporary file systems on the local disk of Node-B and mount them as /home and /dbfs. After installation on Node-B, unmount the /home and /dbfs and then delete these temporary file systems from the local disk. After failover, the standby Node-B will use the /home and /dbfs file systems from VG-1 and will find the instance home and database directories created during installation on Node-A.

The user can have multiple instances and database directories on active Node-A, but all of them need to be set up as described above. That is, they need to be shared by standby Node-B in case of a failover.

The sample start/stop script provided in sqllib/samples/hacmp/rc.db2server.dls will be run by the Cluster Manager to start/stop the host DB2 instance. The script is also documented in 7.4, “The scripts” on page 142. The start script runs on Node-B in case of a failover and should set the hostname of Node-B to Node-A (this hostname setting is required for the Data Links application to work).

Attention: The sample scripts provided are not meant for direct use but need to be customized for the local environment (for example, nodenames, etc.).

7.2.2 Hot standby setup for a Data Links server

The setup for a Data Links server in a hot standby mode is much the same as the one for a host DB2 server described in 7.2.1, “Hot standby setup for a host DB2 server” on page 134. Some of the differences are mentioned in 7.4, “The scripts” on page 142.

The following file systems should be part of the shared resource Volume Group VG-1:
– The file system that contains the home directory of the DLFM’s local DB2 instance user (which by default is dlfm)
– The file system that contains the DLFM’s metadata database (which by default is DLFM_DB), if the DLFM_DB database is created in a different file system from the default one (the home directory of the dlfm instance user)
– The file system that contains the dlfm_backup directory, if the local disk backup option is used
– All the dlfs file systems

You must install the Data Links File Manager and Data Links File System software on both nodes with the same options and parameters, as mentioned above for the host DB2 server:

1. Install the software on the active node with VG-1 attached to this node. Once the installation is over on the active node, complete the DLFM administration work of registering the prefixes and host databases. This makes the active node ready for service.

2. Later, install the software on the standby node after creating temporary local file systems with the same path names as those in VG-1. This is just for the installation to succeed and for doing the other required setup on the standby node. After the installation is over, shut down the DLFS kernel extensions and the Data Links File Manager. Unmount and delete the temporary file systems. In case of failover, the HACMP Cluster Manager will start the Data Links File Manager, load the DLFS kernel extensions, and mount the dlfs file systems from the shared VG-1 on the standby node.

3. A sample script is provided in sqllib/samples/hacmp/rc.db2dls, which the Cluster Manager will run to start/stop the DLFM/DLFS. The script is also documented in 7.4, “The scripts” on page 142.

Attention: The sample scripts provided are not meant for direct use, but need to be customized for the local environment (for example, nodenames, etc.).

4. The standby Node-B gets the IP address or hardware address of the active Node-A’s network adapter during failover.

Important: In case of a Data Links server node failover, the hostname of standby Node-B does not need to be set to the hostname of Node-A. This is because, in case of failover, the DNS or clients’ local /etc/hosts files still have the name Node-A associated to IP-A. Since IP-A is now taken over by the network adapter of Node-B, all the network requests for Node-A automatically go to Node-B.

7.3 HACMP cluster configuration for mutual takeover

The mutual takeover configuration for Data Links is where the host DB2 server and the Data Links server back up each other to provide high availability. The mutual takeover environment is shown in Figure 7-2.

[Figure: mutual takeover configuration between the host DB2 server and the Data Links File Manager. Node A (host DB2) and Node B (Data Links File Manager) each have active and standby network adapters on the network, a local disk on SCSI-0 (holding /usr/lpp/db2_07_01, /var/db2/v71, /etc/vfs, and /etc/rc.dlfs), and SCSI-1 and SCSI-2 connections to the shared disks. Resource Volume Group 1 (VG-1; Node A priority 1, Node B priority 2) holds /home/db2inst1 and /db2/tabledata. Resource Volume Group 2 (VG-2; Node B priority 1, Node A priority 2) holds /home/dlfm, /dlff/files1, and /dlff/files2]

Figure 7-2 Mutual takeover environment

7.3.1 Configuration

In this configuration, Node-A is the active node and Node-B is the standby node for the host DB2 server; the roles are reversed for the Data Links server. The configuration consists of the following:

Resource Volume Group - 1 (VG-1): This is the host DB2 server shared resource group. It contains all the file systems holding the home directories of the DB2 instances and the database directories that are required to fail over to standby Node-B when the host DB2 active Node-A goes out of the HACMP cluster.
– The /sqllib directory must exist on a shared disk and must have the same path on both Node-A and Node-B.
– The database and database logs must exist on a shared disk and have the same path on Node-A and Node-B.
– Each instance must have a unique path for both the database and the logs.
– Node-A has priority 1, and Node-B has priority 2 for this shared resource group. Thus, in a cascading policy, whenever Node-A rejoins the cluster, it gets VG-1 back and takes over the functionality of the host DB2 server.

Resource Volume Group - 2 (VG-2): This is the Data Links server resource group. It contains the following file systems, which are required to fail over to standby Node-A when the Data Links server active Node-B goes out of the HACMP cluster:
– The file system containing the home directory of the DLFM’s local DB2 instance user (which by default is dlfm)
– The file system containing DLFM’s meta data database (which, by default, is DLFM_DB), if the DLFM_DB database is created in a different file system from the default one (under the home directory of the dlfm instance user)
– The file system containing the dlfm_backup directory if the local disk backup option is used
– All the dlfs file systems
Node-B has priority 1, and Node-A has priority 2 for this resource group. Therefore, in a cascading policy, whenever Node-B rejoins the cluster, it gets VG-2 back and takes over the functionality of the Data Links server.

Network adapters: Each node should have two network adapters, one active and one standby.
– In normal operation mode, the active adapter on Node-A is configured with the host DB2 service IP address (to which the DB2 client applications connect) and the standby adapter carries the boot IP address (IP-B1). In case of a Node-B failure, the Cluster Manager fails over the IP address and hardware address of the active adapter of Node-B to the standby adapter of Node-A.
– In normal operation mode, the active adapter on Node-B is configured with the Data Links File Manager service IP address (to which the host DB2 server connects) and the standby adapter carries the boot IP address (IP-B2). In case of a failure of Node-A, the Cluster Manager fails over the IP and hardware addresses of the active adapter of Node-A to the standby adapter of Node-B.
– The active adapters on both nodes should be configured with two network addresses: a boot address and a service address. The boot address is used when a failed node reboots. Then, the Cluster Manager revokes the service address from the standby node and assigns it to the boot address. Therefore, the boot address is needed to avoid a network address conflict during the startup of a failed node.

Both the host DB2 server software and the Data Links server software need to be installed and set up on both nodes (Node-A and Node-B), in the same way as described in 7.2.2, “Hot standby setup for a Data Links server” on page 135. The sample scripts can also be used by the HACMP Cluster Manager to start and stop the host DB2 server and the Data Links server on the nodes during a failover.

Important: When the host DB2 server Node-A goes out of cluster, the cluster manager starting the service on Node-B should set the hostname of Node-B as Node-A. This is a requirement for Data Links. Also since DNS or the /etc/hosts do not change the association between Node-B and its Service IP address, all the network requests to Node-B (despite its hostname change to Node-A) go to Node-B. Therefore, Data Links File Manager Service on Node-B is not affected by this hostname change.

A certain number of files must be the same on both nodes (Node-A and Node-B):
– /var/db2/v71/default.env
– /var/db2/v71/default.profiles.reg
– /usr/lpp/db2_07_01/cfg/dlfs_cfg
– /usr/lpp/db2_07_01
– /etc/vfs
– /etc/rc.dlfs

Figure 7-3 shows the default.env file and the profiles.reg file. The default.env file contains the DB2 global variables for the node. The profiles.reg file contains all of the registered instances on the node. They must reflect the information for both nodes.

Figure 7-3 The /var/db2 files show the global variables and instances

The dlfs_cfg file must reside on both nodes. The parameters of this file are explained in the Release Notes for Version 7.2/Version 7.1 FixPak 3, in the section “Minimize Logging for Data Links File System Filter (DLFF)”. This file must be the same on both nodes and does not fail over. Figure 7-4 shows the contents of the file.

Figure 7-4 The dlfs_cfg file must exist on both servers

Figure 7-5 shows the /etc/vfs file. It is required on both nodes so that the DLFF can be loaded by the strload command.

Figure 7-5 The contents of /etc/vfs

7.3.2 Sequence of events

Both instances, db2inst1 and dlfm, must be able to run simultaneously on either node. Resource Volume Group 1 contains all of the file systems for db2inst1, the instance home directories, and the database directories. Resource Volume Group 2 contains the dlfm instance home directory, the DLFM_DB database, the dlfm_backup directory, and all of the dlfs file systems. The Resource Groups are the resources that fail over. For more information on Resource Groups and HA, refer to HACMP Installation Guide, SC23-4278. Figure 7-2 on page 137 shows the Resource Groups.

When the failover is from Node-A (DB2 UDB) to Node-B (DLFM), you must perform the following steps:
1. On Node-A, run rc.db2server.dls with the stop option. This shuts down the DB2 instances on Node-A, using db2stop force, db2_kill, and killall.
2. Shut down the DB2 admin server if it exists.
3. On Node-B, mount the db2inst1 home directory, logs, and table data file systems (VG-1). This is part of the HA configuration.
4. On Node-B, run rc.db2server.dls with the start option. This sets uname and hostname to the Node-A hostname and starts the DB2 instance on Node-B.

When the failover is from Node-B (DLFM) to Node-A (DB2 UDB), you must perform the following steps:
1. On Node-B, run rc.db2dls with the stop option. This shuts down the DLFM on Node-B using dlfm stop and dlfm shutdown.
2. Unmount the dlfs file systems.
3. On Node-A, mount the DLFM home directory, logs, table data, and archive file systems (VG-2). This is part of the HA configuration.
4. On Node-A, run rc.db2dls with the start option. It calls /etc/rc.dlfs, which loads the DLFF and mounts the Data Links file systems, runs dlfm shutdown, and runs dlfm start.
5. Force applications off the DB2 instance (if this is not done, you may see an SQL0357 error message with return code rc=5 or rc=1).
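The two sequences above can be kept at hand as an ordered checklist. The following is a minimal dry-run sketch, not part of the product: the function name failover_plan, the direction keywords, and the <...> placeholders are our own illustrative assumptions. It only prints the steps listed above and executes none of them.

```shell
#!/bin/sh
# Dry-run sketch: print the failover steps in order; nothing is executed.
# failover_plan and its direction keywords are hypothetical names.
failover_plan() {
    case "$1" in
        db2-to-dlfm)   # Node-A (DB2 UDB) fails over to Node-B (DLFM)
            echo "1. Node-A: rc.db2server.dls <db2 instance user> stop"
            echo "2. Shut down the DB2 admin server if it exists"
            echo "3. Node-B: mount the VG-1 file systems (instance home, logs, table data)"
            echo "4. Node-B: rc.db2server.dls <db2 instance user> start"
            ;;
        dlfm-to-db2)   # Node-B (DLFM) fails over to Node-A (DB2 UDB)
            echo "1. Node-B: rc.db2dls <dlfm user> stop"
            echo "2. Unmount the dlfs file systems"
            echo "3. Node-A: mount the VG-2 file systems (DLFM home, logs, table data, archive)"
            echo "4. Node-A: rc.db2dls <dlfm user> start"
            echo "5. Force applications off the DB2 instance"
            ;;
        *)
            echo "usage: failover_plan db2-to-dlfm | dlfm-to-db2" >&2
            return 1
            ;;
    esac
}
```

Calling failover_plan dlfm-to-db2 prints the five steps of the second sequence. A printed plan of this kind can be reviewed by the system administrator before the real commands are placed in the HACMP event scripts.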

7.4 The scripts

The sample scripts are located in the /instance home/sqllib/samples/hacmp directory. The sample scripts can be modified to make them more robust in terms of checking the required environment present on the node before starting the services on the node.

These configuration scripts should ensure that the following conditions are met whether in normal or failover state:
– Data Links file systems are mounted properly with the correct options/characteristics.
– DLFM processes are running.
– DB2 processes are running for all the applicable DB2 instances.
– The DB2 server's hostname is established as the hostname of whichever node is currently running the DB2 processes.
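The conditions above lend themselves to small pre-flight checks that a customized script could run before starting services. The following is a minimal sketch under stated assumptions: the function names has_mount, has_process, and hostname_is are hypothetical, and the helpers read already-prepared text on standard input because the column layout of the mount and ps commands differs between platforms.

```shell
#!/bin/sh
# Pre-flight helper sketches. All function names are hypothetical.

# has_mount MOUNTPOINT VFSTYPE
# Reads "mountpoint vfstype" pairs on stdin; succeeds if the pair is present.
has_mount() {
    awk -v mp="$1" -v fs="$2" \
        '$1 == mp && $2 == fs { found = 1 } END { exit (found ? 0 : 1) }'
}

# has_process NAME
# Reads one process name per line on stdin; succeeds on an exact match.
has_process() {
    grep -qxF -- "$1"
}

# hostname_is EXPECTED
# Succeeds if the node's current short hostname equals EXPECTED.
hostname_is() {
    [ "$(hostname | cut -d. -f1)" = "$1" ]
}
```

A caller would first reduce the platform's mount output to "mountpoint vfstype" pairs and the process listing to bare process names, and then pipe those lists into the helpers. Which process names indicate a healthy DLFM or DB2 instance depends on the installation and is deliberately left to the administrator.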

Disclaimer: Sample scripts provided with the Data Links product for HA support should not be used as-is, but need to be modified to reflect the appropriate customer environment (user names, node names, etc.).

The sample script rc.db2dls (Example 7-1) either stops or starts the Data Links File Manager. When it stops DLFM, it also unmounts the Data Links file systems. When it starts DLFM, it runs /etc/rc.dlfs, which loads the Data Links File System Filter and mounts the Data Links file systems. It is called by the HACMP Cluster Manager.

Example 7-1 rc.db2dls sample script
#!/bin/ksh
#
# Licensed Materials - Property of IBM
#
# (C) COPYRIGHT International Business Machines Corp. 1990,1997
# All Rights Reserved
#
# US Government Users Restricted Rights - Use, duplication or
# disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
#
###################################################################################
#
# Name: rc.db2dls
#
# Description: Sample script to Start/Stop the Data Links File Manager Server.
#
# Arguments: $1 - instance: dlfm instance user (default "dlfm")
#            $2 - status: Either start or stop
#
# Returns: 0 success
#
###################################################################################

#
# Initialization of variables etc.
#
DB2user=$1
parm2=$2
typeset -u parm2
HOST=`/bin/hostname -s`
PROGID=`echo $0 | sed 's%/usr/bin/%%g'`
lnndir=`lsuser -c -a home $DB2user | awk -F":" '!/#/ { print $2}'`
echo "\n`date`"

#
# STOP the Datalinks Manager and Unload the DLFS
#
if [[ "$parm2" = "STOP" ]]
then
    echo "$PROGID - $HOST: Going to stop db2 "
    date
    # The command passed to su -c must be a single quoted argument
    su - $DB2user -c "dlfm stop"
    su - $DB2user -c "dlfm shutdown"
    sleep 60
    #
    # Unmount your datalinks file systems and Unload the DLFS kernel extension
    # (placeholder: substitute your own dlfs mount points)
    #
    umount /dlfsmountpoint(s)

    # Exit
    exit 0
fi

#
# START the Datalinks Manager and Load the DLFS
#
if [[ "$parm2" = "START" ]]
then
    echo "$PROGID - $HOST: Starting db2 "

    #
    # Load the DLFS kernel extension. Unmount and Mount all the dlfs file systems.
    # Execute dlfmfsmd for each dlfs mount point. It will create/update /etc/rc.dlfs file.
    # (placeholder: substitute your own dlfm home and dlfs mount points)
    #
    /dlfm-home/sqllib/int/instance/dlfmfsmd /dlfsmountpoint(s)

    #
    # Execute the rc.dlfs file created by /dlfm-home/sqllib/int/instance/dlfmfsmd
    #
    /etc/rc.dlfs

    #
    # Shutdown and Restart the DLFM server.
    #
    su - $DB2user -c "dlfm shutdown"
    su - $DB2user -c "dlfm start"

    exit 0
else
    echo "$PROGID ERROR:: rc.db2dls $*"
    echo "$PROGID SYNTAX:: rc.db2dls [DB2_USER] [ start | stop ]"
    exit 1
fi

------

The sample script rc.db2server.dls (Example 7-2) stops or starts DB2 and sets the uname and hostname. The uname and hostname must be set to the hostname that is registered to DLFM with the dlfm add_db command. It is called by the HACMP Cluster Manager.

Example 7-2 rc.db2server.dls sample script
#!/bin/ksh
#
# Licensed Materials - Property of IBM
#
# (C) COPYRIGHT International Business Machines Corp. 1990,1997
# All Rights Reserved
#
# US Government Users Restricted Rights - Use, duplication or
# disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
#
#########################################################################
#
# Name: rc.db2server.dls
#
# Description: Script for the HACMP Cluster Manager to Start/Stop the Host DB2 Server.
#
# Arguments: $1 - db2user: the user of the db2 instance
#            $2 - parm2: [start | stop] : Start or Stop option.
#            $3 - parm3: [standby | active] : Indicates whether the node on which
#                 the script is running is the active or standby node for DB2 Server
#
# Returns: 0 success
#
#########################################################################
#
# Initialization of variables etc.
# Change the Service_Host and Standby_Host with actual names.
#
Service_Host=Node-A     # Active Node for DB2 Server
Standby_Host=Node-B     # Standby node for DB2 Server

DB2user=$1
parm2=$2                # variable names are case-sensitive; tested below as $parm2
parm3=$3                # tested below as $parm3
typeset -u parm2
HOST=`/bin/hostname -s`
PROGID=`echo $0 | sed 's%/usr/bin/%%g'`
lnndir=`lsuser -c -a home $DB2user | awk -F":" '!/#/ { print $2}'`
echo "\n`date`"

#
# Stop the DB2 instance.
#
if [[ "$parm2" = "STOP" ]]
then
    echo "$PROGID - $HOST: Going to stop db2 "
    date
    su - $DB2user -c "$lnndir/sqllib/adm/db2stop force"
    date
    su - $DB2user -c "$lnndir/sqllib/bin/db2_kill"
    sleep 15
    su - $DB2user -c killall
    #
    # Set the uname and hostname back to the Standby_Host. Actually this must be done only
    # when script is run on Standby node to stop the DB2 server on Standby node.
    #
    if [[ "$parm3" = "standby" ]]
    then
        uname -S $Standby_Host
        hostname $Standby_Host
    fi

    # Exit
    exit 0
else
    #
    # Start the DB2 Instance.
    #
    if [[ "$parm2" = "START" ]]
    then
        #
        # Set the uname and hostname as DB2 Server's active node. Actually this setting of
        # hostname needs to be done only when script is run on Standby node during failover.
        #
        uname -S $Service_Host
        hostname $Service_Host
        date
        echo "$PROGID - $HOST: Starting db2 "
        su - $DB2user -c "$lnndir/sqllib/adm/db2start"
        exit 0
    else
        echo "$PROGID ERROR:: rc.db2server.dls $*"
        echo "$PROGID SYNTAX:: rc.db2server.dls [DB2_USER] [ start | stop ] [ standby | active ]"
        exit 1
    fi
fi
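Both sample scripts fold their second argument to upper case with typeset -u and then compare it with START or STOP. When customizing the scripts, it can help to validate the arguments in one place before any DB2 or DLFM command runs. The following is a minimal sketch of such a check; the helper names normalize_action and valid_role are our own assumptions, and tr is used in place of typeset -u so that the sketch also runs in a plain POSIX shell.

```shell
#!/bin/sh
# Argument-checking sketch. Helper names are hypothetical.

# normalize_action WORD
# Print START or STOP for any capitalization of start/stop; fail otherwise.
normalize_action() {
    action=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    case "$action" in
        START|STOP) printf '%s\n' "$action" ;;
        *) return 1 ;;
    esac
}

# valid_role WORD
# Succeed only for the two node roles that rc.db2server.dls documents.
valid_role() {
    case "$1" in
        active|standby) return 0 ;;
        *) return 1 ;;
    esac
}
```

A customized script could call these helpers at the top and print its syntax message as soon as either one fails, so that a mistyped argument never reaches the branch that changes the hostname.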

7.4.1 Additional considerations for DB2 Universal Database Version 6

If failover is being configured using DB2 Universal Database Version 6, there is an extra task that you need to perform. The files listed by the following command on the DLFM server must be copied onto the DB2 server:

ls -l /usr/lpp/db2_06_01/bin/dlfm_*

The permissions and links must be identical. This is not a problem on DB2 Universal Database Version 5 or Version 7 because the files are in the /dlfm instance/sqllib/adm directory and this fails over. Figure 7-6 shows the output of the ls command.

Figure 7-6 List of dlfm_ programs

Note: If a fixpack is installed for DB2 Universal Database Version 6, the fixpack installation process updates the /usr/lpp/db2_06_01/bin/dlfm_* files on the DLFM node, but not on the DB2 node. The files listed in Figure 7-6 need to be manually updated for failover to be successful after a fixpack upgrade.
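One way to notice that the two nodes have drifted apart after a fixpack is to compare a checksum listing of the dlfm_* files from each node. The following is a minimal sketch; the function name dlfm_manifest is hypothetical, and the listing covers file contents only, so permissions and links still have to be compared separately (for example, with the ls -l output shown in Figure 7-6).

```shell
#!/bin/sh
# dlfm_manifest DIR   (hypothetical helper)
# Print "checksum size name" for every regular file matching dlfm_* in DIR,
# sorted by name, so that listings from two nodes can be compared with diff.
dlfm_manifest() {
    (
        cd "$1" || exit 1
        for f in dlfm_*; do
            # Skip the literal pattern when nothing matches, and non-files
            if [ -f "$f" ]; then
                cksum "$f"
            fi
        done | sort -k 3
    )
}
```

On each node, the output of dlfm_manifest /usr/lpp/db2_06_01/bin could be saved to a file; identical files on both nodes indicate that the program contents match.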

7.4.2 Final considerations

We chose to remove /etc/rc.db2 from the /etc/inittab file because the cluster controls the startup of the system. We also found that we had to modify the dlfs file systems by using the /usr/lpp/db2_0n/0n/instance/dlfmfsmd script after the failover from the DLFM node to the DB2 node. The cluster would return these file systems to the jfs file system type.

To initially set up the DLFM environment for failover on Node-A, you can:
– Create a temporary file system for the dlfm instance /home/dlfm.
– Run the db2setup program to create the dlfm instance. This modifies /usr/lpp/db2_07_01/, /var/db2, and /etc/vfs.
– Unload the DLFF driver (DLFSDRV):
  strload -uf /usr/lpp/db2_07_01/cfg/dlfs_cfg
– Unmount and delete the temporary file system. This removes the /home/dlfm/sqllib, /home/dlfm/dlfm_backup, and /home/dlfm directories.

Attention: We recommend that a skilled System Administrator be highly involved when setting up the failover environment. Extensive testing must be conducted to ensure all of the failover scenarios function correctly.

Chapter 8. Creating a new database

This chapter describes a method to create a new DB2 database from an existing DB2 database that uses Data Links. The export, import, load, dlfm_export, and dlfm_import utilities are discussed, as well as other administration commands for Data Links. The examples are from an AIX environment. The process we follow is similar for Windows NT and Sun Solaris with some minor differences. The dlfm_export and dlfm_import differences are documented in Chapter 5 “Moving Data Links Manager Data” in Data Movement Utilities Guide and Reference, SC09-2955.

Creating a new database from an existing database is a task that Database Administrators (DBAs) are continually asked to do. The request to create this new database can arise for a number of reasons. For example, a test database is required that looks just like production, or a customer requires yesterday's backup to be restored to a new database so that the data can be reviewed. This task is relatively simple when Data Links is not involved. When Data Links is involved, the DLFM_DB database, which contains the Data Links database information, must be rebuilt.

We show how to create the new test system using our existing host machine. We create a new instance and restore a backup to the new database. We use the existing Data Links File Manager and add a new file system to store the linked values.

We use the following steps to create our new database with Data Links:
1. Run the Backup utility on the database.
2. Run the Export utility on the Data Links table data.
3. Capture the table DDL using db2look.
4. Run Restore on the database with the database manager configuration parameter DATALINKS set to NO.
5. Drop and recreate the table.
6. Copy files to be linked to a new directory on the DLFM server.
7. On the DLFM server, run dlfm add_db and dlfm add_prefix.
8. Run Import or Load to move the data into the table.

Figure 8-1 illustrates this process.

[Diagram: (1) db2 backup writes DB2 database dltest to a backup file, and (4) db2 restore creates DB2 database dlrestor from it. (2) db2 export writes the table (columns nr, desc, image, with DATALINK values such as http://fileserv1/images/p10.bmp) to the export file export_resident.del, and (8) db2 import loads it into the new database. (3) db2looks writes the table ddl to resident.ddl, which is used in (5) db2 drop table and create table. On the DLFM server, (6) the datalink files are copied from /dldata to /dldata2, and (7) dlfm add_db and dlfm add_prefix are run.]

Figure 8-1 The steps used to create the new database
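The eight steps can also be written out as a command outline. The following dry-run sketch only prints the commands; it runs none of them. The database, table, and file names are taken from Figure 8-1, whereas the backup directory, the <instance> and <hostname> placeholders, and the function name clone_plan are our own assumptions. Connect statements and command options specific to your environment are omitted.

```shell
#!/bin/sh
# Dry-run outline of the eight steps; it only prints commands.
SRC_DB=dltest            # source database (name from Figure 8-1)
NEW_DB=dlrestor          # new database (name from Figure 8-1)
TABLE=resident           # table with the DATALINK column
BACKUP_DIR=/backup       # hypothetical backup location
NEW_PREFIX=/dldata2      # new dlfs file system

# clone_plan is a hypothetical helper name.
clone_plan() {
    cat <<EOF
1. db2 backup database $SRC_DB to $BACKUP_DIR
2. db2 "export to export_$TABLE.del of del select * from $TABLE"
3. db2look -d $SRC_DB -e -t $TABLE -o $TABLE.ddl
4. db2 update dbm cfg using DATALINKS NO
   db2 restore database $SRC_DB from $BACKUP_DIR into $NEW_DB
5. db2 "drop table $TABLE"; db2 -tvf $TABLE.ddl
   db2 update dbm cfg using DATALINKS YES; db2stop; db2start
6. cp /dldata/sys_pics/* $NEW_PREFIX/sys_pics/
7. dlfm add_db $NEW_DB <instance> <hostname>
   dlfm add_prefix $NEW_PREFIX
8. db2 "import from export_$TABLE.del of del insert into $TABLE"
EOF
}
```

Running clone_plan prints the outline with the variable values filled in, which makes it easy to review the sequence before any command is typed. The sections that follow describe each step and its figures in detail.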

8.2 Backup

An offline backup is taken from a database that contains a table with the DATALINK data type. If an online backup is used, a copy of the logs must be available for the rollforward in the new database. The backup is used to restore the database and all of its contents to a new database. Figure 8-2 shows the backup command. This method is useful when you want to migrate the environment onto new hardware. For example, you are currently using DB2 Universal Database Version 5.2 and you want to migrate this environment to a Version 7.x environment on a new file system or machine that does not have DB2 Universal Database or DLFM installed.

Figure 8-2 Backup database command

8.3 EXPORT (dlfm_export)

We illustrate two types of exports: an export to an integrated exchange format (IXF) file and an export to a delimited ASCII format (DEL) file. We use the IXF file type if the Data Link files are on a new host machine and we do not need to change the path of the Data Link files. The steps for exporting are:
1. Run the quiesce command to ensure that you have a consistent copy of the table and the corresponding files when you run the export command.
2. Run the export command to create an IXF file and a control file.
3. Copy the control file to the Data Links server.
4. Run dlfm_export on the Data Links server. It creates a tar file that contains the Data Linked files that will be needed for the import.

Figure 8-3 illustrates the quiesce command and an export command to an IXF file type.

Figure 8-3 Quiesce and export to the IXF file type

The export to an IXF file creates a control file called ununbium.almaden.ibm.com. The control file is used as input for the dlfm_export command. The control file is placed in /tmp/dlfm, and the contents are shown in Figure 8-4.

Figure 8-4 Contents of the export control file

The dlfm_export command is run using the control file created by export as its input. The command creates a tar file that contains all of the files that are listed in the control file. The command and its output are shown in Figure 8-5.

Figure 8-5 Sample dlfm_export

For our example in Figure 8-1, we use an existing Data Links File Manager, so we have to change the path of the Data Linked files to avoid duplicates. To change the path names, we must edit the export file; therefore, we use the DEL file type, which is easier to edit than an IXF file. The following process is used:
1. Run export on the Data Link data to a file of type DEL.
2. Edit the file, changing the file system name from /dldata to /dldata2.
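The edit in step 2 can be done in any text editor, but for a large export file a stream edit is less error-prone. The following is a minimal sketch, assuming that each DATALINK value appears in the DEL file as a URL whose first path component is the file system name; the function name rewrite_prefix and the sample record in the usage note are hypothetical.

```shell
#!/bin/sh
# rewrite_prefix OLD NEW   (hypothetical helper)
# Filter stdin to stdout, replacing the first path component OLD with NEW
# in URLs of the form scheme://host/OLD/...
# OLD and NEW must be plain names without '#' or regular-expression characters.
rewrite_prefix() {
    sed "s#\(://[^/]*\)/$1/#\1/$2/#g"
}
```

For example, rewrite_prefix dldata dldata2 < export_resident.del > export_resident.new would turn a value such as http://fileserv1/dldata/sys_pics/p10.bmp into http://fileserv1/dldata2/sys_pics/p10.bmp. Because the pattern matches only /dldata/ directly after the host name, running the filter a second time leaves the already edited values unchanged.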

Figure 8-6 displays the export command using a DEL file.

Figure 8-6 Export using delimited output

Figure 8-7 displays the delimited export file before and after it was edited. The file system was changed from /dldata to /dldata2 because the files in /dldata already exist.

Figure 8-7 Delimited file before and after editing

8.4 The db2look command

The ddl for the table with DATALINK columns must be captured so that it can be run on the new database. We use db2look to capture this ddl. You can find Help for db2look by simply typing db2look -h. Figure 8-8 shows db2look and the ddl it created.

Figure 8-8 The db2look command and the output it produced

8.5 The restore command

When we restore the database to the new instance, we set the database manager configuration parameter DATALINKS=NO. The reason for this is so that we can easily drop and recreate the table with the DATALINK column. If we leave the DATALINKS parameter set to YES, the table is put in the Datalinks_Reconcile_Not_Possible (DRNP) state, which is difficult to clear up. The following steps pertain to the restore command:
1. Use the restore command to create the new database.
2. Make sure the database manager configuration parameter is set to DATALINKS=NO.
3. Drop and recreate the tables containing the DATALINK data type using the output from db2look in Figure 8-8.
4. Update the database manager configuration using DATALINKS=YES.
5. Stop and start DB2.
6. Use the list datalinks managers command to see if the Datalink Manager name and port number are correct. If the name or port number is incorrect, use drop datalinks manager and add datalinks manager to correct the configuration.
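Steps 2 and 4 depend on knowing the current value of the DATALINKS parameter. The following is a minimal sketch of a filter that extracts that value from the text produced by db2 get dbm cfg. It assumes that the parameter is reported on a line containing "(DATALINKS) =" followed by the value; verify this layout against your own output. The function name datalinks_setting is hypothetical.

```shell
#!/bin/sh
# datalinks_setting   (hypothetical helper)
# Read "db2 get dbm cfg" style text on stdin and print the value found to the
# right of "(DATALINKS) =". Prints nothing if no such line is present.
datalinks_setting() {
    sed -n 's/^.*(DATALINKS)[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p'
}
```

A typical use would be db2 get dbm cfg | datalinks_setting, with the result compared against NO before the tables are dropped and against YES before DB2 is restarted.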

Figure 8-9 shows the restore, get dbm cfg and list datalinks managers commands.

Figure 8-9 Restore command, get dbm cfg, and list datalinks managers

8.6 Copying the linked files

The dlfm_import utility can be used to extract the files from the archive created by the dlfm_export utility. The name of the archive is dlfm_files.tar. When dlfm_import is run, it should be run as root. The dlfm_import command extracts the files into the same directory name from which they were copied.

Note: Use caution with dlfm_import. If it is run on the same host as dlfm_export, the existing Data Link files will be overwritten.

Note: If dlfm_import is run on a different host, there is a way to make dlfm_import extract the files from the archive into a different path name. Symbolic links can be set up to do this. For our example, we could have used:

ln -s /dldata2 /dldata

The files would then be extracted into /dldata2 using the link /dldata.

A sample dlfm_import is shown in Figure 8-10.

Figure 8-10 Sample dlfm_import

For the example in Figure 8-1, we chose not to use dlfm_import. It would not allow us to change the path name for the Data Linked files because they are absolute path names in the tar file. We simply copied, as root, the files from one directory to another and changed the ownership to something other than dlfm:

cp /dldata/sys_pics/* /dldata2/sys_pics/

8.7 DLFM commands

We register our new instance and database with DLFM by using the dlfm add_db command. The dlfm add_prefix command is used to register our new dlfs file system with DLFM. Figure 8-11 shows the commands.

Figure 8-11 The dlfm add_db and dlfm add_prefix commands

8.8 Running the Import utility

The files have been copied and are now in a dlfs file system. The next step is to populate the table that contains the DATALINK column type. The Import utility can be used if the amount of data is not too large. The Import utility allows you to globally change the host name for the Data Link files by using the dl_url_replace_prefix clause. You can also use the dl_url_suffix clause, which appends the value associated with the clause to the path component of the URL part of the DATALINK value. For detailed information on the dl_url_suffix and dl_url_replace_prefix clauses, refer to DB2 UDB Command Reference, SC09-2951. In our sample, we modified the import file and changed the directory name from /dldata to /dldata2. Figure 8-12 shows a delimited file import.

Figure 8-12 Import delimited file with DATALINK column type

8.9 Running the Load utility

The Load utility can also be used instead of the Import utility to populate the table with DATALINK columns. The Load utility should be used for a large number of rows. Another advantage of load is that it does minimal logging.

Note: When loading a large number of rows that contain DATALINK values, you will not be able to back up the database until the asynchronous copies of each file have completed on the DLFM server. This only occurs when the corresponding DATALINK columns are defined with the RECOVERY YES option. The asynchronous copy operations can be a lengthy process.

There are some options that are not supported when loading a table with Data Links:
– The COPY option
– The REPLACE option
– The TERMINATE option
– The CPU_PARALLELISM option (the value is forced to 1)
In addition, the NONRECOVERABLE option should not be used when the DATALINK column is defined with FILE LINK CONTROL.

A sample load command is shown in Figure 8-13.

Figure 8-13 The Load utility


Chapter 9. Data replication

This chapter discusses how the DB2 data replication feature, otherwise known as DataPropagator Relational (DPropR), can be used to copy externally managed files from one location to another. Because it is necessary to understand the basics of replication in order to understand replication of DATALINK columns and their linked files, much of the material covered in this chapter is not limited to Data Links. This chapter explores many of the components of DPropR, how they interact, and how to use the DB2 Control Center to set up a replication environment involving externally managed files.

Replication is the process of automatically maintaining one or more copies of data so that it is kept synchronized with the original source data. As data is created, updated, or deleted at the source, the copy is also changed.

DB2 has for a long time supported the replication of traditional data types. The ability to replicate DATALINK columns was introduced in DB2 DataPropagator Version 7 and is included with DB2 Universal Database Version 7. The replication of the external files, which are controlled by the DB2 Data Links File Manager (DLFM), is not done by the DataPropagator product itself, but by a user exit routine invoked by the Apply program (a component of DPropR).

Replication Guide and Reference Version 7, SC26-9920, describes in detail all of the components of replication and how to plan, set up, and administer a replication environment. This chapter is not intended to replace Replication Guide and Reference Version 7, SC26-9920, but to supplement it by discussing items related to Data Links.

Data replication can be configured in many different ways. The simplest way to implement is known as data distribution, where updates to a source table are replicated to one or more read-only target tables. Data distribution uses most of the available features of replication and is the configuration discussed in this chapter. Replication of DATALINK values and their associated files can also be accomplished when using other replication configurations, although these configurations are not discussed in detail in this chapter. For an in-depth discussion of all possible replication configurations, refer to Replication Guide and Reference Version 7, SC26-9920.

Here are some of the supported replication configurations: Data consolidation: Data from multiple source tables is replicated to a common target table. Update anywhere: Target tables are read/write. Replication is bidirectional, that is, changes to a target table are also replicated to the source table. Note that conflict detection is not supported for DATALINK columns. Occasionally connected: Data from a primary source is copied to a target table on demand.

9.2 Why replicate linked files

The primary benefit of replicating files is improved performance when accessing those files from multiple remote sites.

Suppose your company stores engineering drawings in files that are managed by Data Links, and the DB2 database and the Data Links server are located in Los Angeles. Engineers in Los Angeles can access the linked files quickly. Delays caused by moving the file data over the network to the engineers' workstations are minimal, because the physical distance from the Data Links server to the workstations is small. Engineers in New York, on the other hand, experience significant delays when accessing the same files, because the file data must travel long distances over the network. Performance for the New Yorkers could be improved if the files were replicated to a server located in New York. The closer users are to where the file physically resides, the less time they have to wait for the file to be transferred over the network. With replication, when a file is created or changed at the source, it is copied to the target server asynchronously. By the time the end user needs to access the new file, it already resides on the server local to that user.

9.3 Supported platforms

Tables with DATALINK columns can be replicated between DB2 databases on the following operating systems:
– AIX
– OS/400 (AS/400)
– Solaris
– Windows NT

This chapter discusses replication with both the source and target databases using DB2 on AIX. Although DPropR supports replication between many other platforms, remember that DATALINK columns cannot be replicated to platforms that do not support them. Here is a list of current restrictions:
– DATALINK columns cannot be replicated between DB2 databases on iSeries and AS/400 and DB2 databases on other platforms.
– Replication of the “COMMENT” attribute of a DATALINK value is not supported on the iSeries platform.
– Update-anywhere replication with DATALINK columns must use a conflict-detection level of NONE. DB2 does not check update conflicts for external files pointed to by DATALINK columns.
– DB2 always replicates the most current version of an external file pointed to by a DATALINK column.
– Target tables that are base-aggregate or change-aggregate tables do not support DATALINK columns.

9.4 Replication components

DataPropagator performs two primary functions:
– Collecting data that has been created or changed
– Copying that data to one or more target servers

These processes are commonly referred to as Change-capture and Apply. The DB2 Control Center can also be considered a component of replication because it can be used to set up and administer the replication environment. In 9.6, “Implementing replication with Data Links” on page 172, we use it to step through the tasks needed to set up replication.

An alternative to the DB2 Control Center is the DB2 Data Joiner Replication Administration (DJRA) tool. DJRA is required to set up and administer replication on many of the non-IBM databases. See Figure 9-1 for a list of platforms that require DJRA. You can find instructions for installing DJRA and using it to set up and administer a replication environment in Replication Guide and Reference Version 7, SC26-9920.

DataPropagator stores information about what data to capture and what data to replicate in a set of tables called the control tables. Some of the control tables are used by the change-capture process and some are used by the Apply process.

All of the replication components reside on a logical server. The term “server” as used here refers to a database rather than a physical machine. There are three types of logical servers:
- Source server: The database where the source tables and the control tables used by the Capture program reside.
- Target server: The database where the target tables reside.
- Control server: The database where the control tables used by the Apply program reside.

9.4.1 Change-capture

Change-capture (Figure 9-1) is the process of collecting data as it is created or modified and storing it for later retrieval by the Apply process. The table whose data is to be captured is called the replication source. Whenever a transaction issues an INSERT, UPDATE, or DELETE statement, the affected data is copied by the Capture program (or a Capture trigger on non-IBM databases) to a control table called the change-data (CD) table. Data is written to the CD table before it is committed. Information about which transactions have been committed is stored in another control table called the unit-of-work (UOW) table. The Apply program joins the UOW table and the CD table to ensure that only committed changes are replicated.
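The join between the UOW table and the CD table can be pictured with a small in-memory sketch. This is only an illustration of the idea: the field names (uow_id, operation, key) and the function committed_changes are hypothetical and do not reflect the real control table columns or the Apply program's code.

```python
from typing import NamedTuple


class ChangeRow(NamedTuple):
    """One captured change. Field names are illustrative, not real CD table columns."""
    uow_id: str      # identifier of the unit of work that made the change
    operation: str   # "I", "U", or "D"
    key: str         # key of the affected source row


def committed_changes(cd_rows, uow_ids):
    """Keep only the CD rows whose unit of work is listed in the UOW table."""
    committed = set(uow_ids)
    return [row for row in cd_rows if row.uow_id in committed]


# The CD table holds changes that were written before commit.
cd_table = [
    ChangeRow("tx1", "I", "000010"),
    ChangeRow("tx2", "U", "000020"),  # tx2 has not committed yet
    ChangeRow("tx1", "D", "000030"),
]
# The UOW table lists only units of work that have committed.
uow_table = ["tx1"]

for row in committed_changes(cd_table, uow_table):
    print(row.operation, row.key)
```

Only the rows belonging to tx1 pass the filter, which mirrors why uncommitted changes in the CD table never reach the target.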

The Capture program runs on the same machine as the replication source. It collects the data to be replicated by reading the database log file and then copies that data to the CD table. The Capture program runs continuously on the source server and is normally started immediately after DB2 is started.

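[Diagram: within the source database, SQL INSERT, UPDATE, or DELETE statements change the replication source table and are recorded in the DB2 log; the Capture program reads the log and writes to the change-data table.]

Figure 9-1 Change Capture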

When you use the DB2 Control Center to define the replication source, the CD table is automatically created. Each table that is defined as a replication source has its own CD table.

When the replication source table contains a DATALINK column, the DATALINK URL is stored in the CD table, but the referenced file is not. The URL is used by an exit routine called by the Apply process, and the referenced file is copied at that time.

You may not want to capture all of the columns of your replication source table. When you define the replication source, you can restrict which columns are captured. This allows you to capture only the data that you want to be copied to the target table(s). Figure 9-2 shows the SOURCE.DEPARTMENT table in the SAMPLE database being defined as a replication source. Notice that we have chosen to capture only the data in the DEPTNO and DEPTNAME columns. The data in the MGRNO, ADMRDEPT, and LOCATION columns is not captured.

Note: One of the databases we use in our examples in this chapter is the SAMPLE database, which was created by a user ID of source. The tables in the SAMPLE database, therefore, have a schema name of SOURCE.

Figure 9-2 Defining a replication source

9.4.2 Apply

Figure 9-3 shows the actions performed by the Apply program. The Apply program populates the target tables by either reading data directly from the source table (for an initial load or full refresh of the target table), or by reading the CD table (1). In most cases, the Apply program runs at the target server, but it can be run from any machine on the network that has access to the source server, the target server, and the control server. The data read by the Apply program is stored in a spill file (2) and then copied to the target table (3 and 4).

[Diagram: the Apply program reads the replication source table or the change-data table in the source database (1), writes the data to a spill file (2), and copies it to the target table in the target database (3 and 4).]

Figure 9-3 Apply program data flow

9.4.3 Subscription sets and subscription set members

The mapping of the replication source data to the target table, as well as many of the parameters governing how the data is to be replicated, is defined by subscription sets and subscription set members. A subscription set is used to define the source server, the target server, the frequency of replication, and so on. Subscription sets are processed in a single transaction. This assures that changes are applied to all of the target tables defined in all of the subscription set members or to none of them.

A subscription set member is used to identify the source table and target table, which columns are to be replicated, and, through the use of SQL predicates, which rows are to be replicated. Figure 9-4 shows the relationship between subscription sets and subscription set members.

Figure 9-4 Subscription set and subscription set members

9.5 Data Links replication

Replication of DATALINK columns and their associated files requires a bit more work than replication of traditional data types. If you think about what the value contained in a DATALINK column represents, it is easy to see that you cannot just copy that value to another table. A DATALINK column contains the protocol used to access the file (HTTP or UNC), the name of the server where the file resides, and a fully qualified pathname to the file. If you replicate the column and its associated file to another server, the new DATALINK column needs to point to the target server and new pathname of the replicated file. This is illustrated in Figure 9-5.

[Diagram: file1 on the source server is referenced from the replication source table in the source database by the DATALINK value HTTP:source_server.ibm.com/source_files/file1. After replication, file2 on the target server is referenced from the target table in the target database by the DATALINK value HTTP:target_server.ibm.com/target_files/file2.]

Figure 9-5 DATALINK values before and after replication

Fortunately, DPropR provides a mechanism for changing the DATALINK value at the target server to point to the newly replicated file.

9.5.1 Capturing DATALINK values

The Capture program treats DATALINK columns no differently than any other data type. The DATALINK value is read from the log and written to the CD table, along with the data from other columns that were selected to be part of the replication source. Note that the file referenced by the DATALINK value is not copied to the CD table. Remember that the Capture program reads the log file to collect the change data, and the content of the linked files is not written to the log.

9.5.2 How Apply handles DATALINK values

The Apply program has two additional tasks to perform when dealing with DATALINK values. First, it must map the source file reference into the target file system. This might mean changing the server-name portion of the DATALINK value, as well as the pathname and filename so that they point to the correct location on the target server. Second, the source file must be copied to the target file system. Both of these functions are performed by a user exit program, ASNDLCOPY, which is called by the Apply program.

Note: The ASNDLCOPY user exit program is a sample program supplied with DB2 and resides in the sqllib/bin directory. The C language source code for the program is in sqllib/samples/repl/ASNDLCOPY.smp and can be modified to meet any unique user requirements. The prolog section of the code describes the program usage, default options, calling conventions, etc.

File-reference mapping function

The Apply program invokes a user exit program, ASNDLCOPY, to perform both the file mapping and file copy functions. Before calling the exit routine, the Apply program reads the CD table (or the replication source table during initial load) and writes DATALINK column values to a file. This file is then read by ASNDLCOPY, which transforms the original file references by applying mapping definitions (stored in a file named ASNDLSRVMAP), and the modified file references, now pointing to the target server and pathname, are written to an output file. See Figure 9-6. Section 9.6, “Implementing replication with Data Links” on page 172, examines the structure of the mapping definition file, ASNDLSRVMAP.

[Diagram: Apply reads the replication source table or the change-data table in the source database and passes the original file references to ASNDLCOPY, which applies the mapping definitions in ASNDLSRVMAP and returns the modified file references.]

Figure 9-6 File reference mapping

File copy function

After the file-reference mapping has completed, the ASNDLCOPY routine copies the files from the source server to the target server. For files linked with the DATALINK option READ PERMISSION FS, the user exit program uses the FTP protocol to physically transfer the files being replicated. ASNDLCOPY reads a file containing hostnames, port numbers, user IDs, and passwords. This file is named ASNDLUSER, and its contents are discussed in 9.6.7, “Configuration files used by ASNDLCOPY” on page 184.

This information is used to establish a connection to an FTP daemon. If the DATALINK column uses the READ PERMISSION FS option (or the NO LINK CONTROL option), and both file systems are NFS-mounted, the ASNDLCOPY user exit can be customized to use the UNIX cp command to copy the files instead of FTP.

File copy function with READ PERMISSION DB

If the files to be copied have been linked using READ PERMISSION DB, the ASNDLCOPY program can still use the FTP daemon to transfer files, but the user ID used to connect to FTP must have root access. This is usually not a desirable situation because the user ID and its password must be stored in a file that is accessible to ASNDLCOPY. Most system administrators do not allow this.

There is, however, an alternative. ASNDLCOPY can use the ASNDLCOPYD copy daemon instead of the FTP daemon to copy the files. ASNDLCOPYD is a sample file transfer program similar to FTP that is included with DB2. The C language source code for ASNDLCOPYD can be found in sqllib/samples/repl and an executable resides in sqllib/bin. ASNDLCOPYD provides a subset of FTP's functions for extracting file information (like file size, modification date, and time) and for extracting the contents of a linked file. The ASNDLCOPYD daemon can read files linked with the READ PERMISSION DB option because it runs with root authority, but it has an advantage over the FTP daemon because it uses two configuration files to control who can connect to it, and to control which directories can be accessed. These configuration files are discussed in 9.6.8, “Configuration files used by ASNDLCOPYD” on page 187.

For a complete description of how to set up and use ASNDLCOPYD, refer to Replication Guide and Reference Version 7, SC26-9920, or the prolog section of the sample program ASNDLCOPYD.smp in the sqllib/samples/repl directory.

9.6 Implementing replication with Data Links

Before you can begin to set up replication of DATALINK columns and their files, you need to create a source table that contains the DATALINK data type and to populate it with a few rows. We do not go into the details of how to do this, but we describe the environment that we use for the remainder of this chapter.

9.6.1 Before we begin

In this scenario, we use the Control Center to set up the replication source, replication subscription set, and subscription set member. We assume that you are at least somewhat familiar with the DB2 Control Center. We do not describe, in detail, how to navigate the Control Center. Figure 9-7 shows the table used as the replication source: SOURCE.MANAGERS. The table was created in the SAMPLE database, in the DB2 instance named SOURCE, on hostname NAPA.ALMADEN.IBM.COM.

Figure 9-7 SOURCE.MANAGERS table

Figure 9-8 shows the data we will replicate. We inserted four rows into the SOURCE.MANAGERS table. The column named PICTURE is the DATALINK column. We linked four files residing on hostname UNUNBIUM.ALMADEN.IBM.COM in the file system /dldata in the directory /source/pictures. From this, you can infer that the DLFM server managing these files is located on hostname UNUNBIUM.ALMADEN.IBM.COM and that the file system /dldata is a DLFS file system managed by that DLFM.

Figure 9-8 SOURCE.MANAGERS table contents

The replication target is also on hostname NAPA.ALMADEN.IBM.COM, in the DB2 instance named TARGET, and our target database is COPYDB. We create the target table TARGET.MGR_COPY when we create the subscription set member. COPYDB will also use the DLFM server on UNUNBIUM as its DLFM server, and the replicated files will be stored in the DLFS file system /dldata under a directory called /target/photos.

Figure 9-9 shows the state of the environment before starting replication. Note that no files are linked to the TARGET.MGR_COPY table in COPYDB, and no files reside in /dldata/target/photos.

[Diagram: on NAPA.ALMADEN.IBM.COM, the SAMPLE database contains SOURCE.MANAGERS and the COPYDB database contains TARGET.MGR_COPY. On UNUNBIUM.ALMADEN.IBM.COM, /dldata/source/pictures contains cathy.bmp, rachel.bmp, sayanna.bmp, and zack.bmp, and /dldata/target/photos contains no files.]

Figure 9-9 Environment before replication

9.6.2 Defining the replication source

You define the replication source using the Control Center. First, you expand the object tree to see the list of tables in the SAMPLE database. Right-click the SOURCE.MANAGERS table, select Define as Replication Source, and select Custom. See Figure 9-10.

Figure 9-10 Defining a replication source

You are next presented with a dialog that allows you to define which columns will be replicated. In this example, we use all of the columns of the SOURCE.MANAGERS table, so we leave all of the columns selected and do not make any other changes. See Figure 9-11.

Figure 9-11 Selecting columns to be replicated

When you click OK, you are given the option to run the generated SQL or save it to a file. Here, save the SQL so you can examine it, and later run it to define the replication source. We save it on the C: drive in a directory called scripts as a file named replsrc.sql. See Figure 9-12.

Figure 9-12 Saving the replication source definition

If you look at the generated SQL (Figure 9-13), you see that the SOURCE.MANAGERS table is altered to capture changes. Next you see that the change-data table was created and given a system-generated name of RMRES2.CD20010514922768. The table owner is RMRES2 because that was the user ID under which we ran the DB2 Control Center when we generated the SQL.

Figure 9-13 SQL to define the replication source

Now you are ready to run the generated SQL and actually define the replication source. You do this by going back to the Control Center object tree, right-clicking Replication Sources, and selecting Run SQL Files (see Figure 9-14).

Figure 9-14 Defining the replication source by running an SQL file

Important: If you are using DB2 Control Center Version 7.2.0.0 or earlier to define the replication source with a DATALINK column, a known defect in the Control Center causes it to incorrectly define the attributes of the column in the change data table that holds the DATALINK URL. This defect is resolved with APAR IY19523 (this APAR is scheduled to be included in DB2 Universal Database V7 Fixpak 4).

Therefore, it is necessary to change the definition of the CD table before you run the saved SQL file. The CREATE TABLE statement near the beginning of the file incorrectly defines the column that will capture the DATALINK URL as CHAR(254). You need to change this to VARCHAR(261). The column is defined correctly when using Data Joiner Replication Administration (DJRA).

After you locate and run the replsrc.sql file, the replication source is defined. You can verify this by clicking Replication Sources in the Control Center object tree. Figure 9-15 shows the replication source table SOURCE.MANAGERS.

Figure 9-15 Viewing the replication source

The first time that a replication source is defined for a database, the replication control tables are created. These tables have a schema name of ASN and table names beginning with IBMSNAP_. The generated SQL contains several INSERT statements that populate the replication control tables with information about the replication subscription you are creating. You are now ready to define the replication subscription set and subscription set member.

9.6.3 Defining the subscription set and subscription set member

Once a replication source is defined, you can define where you want the captured data to be copied. Do this by defining a replication subscription in the Control Center by right-clicking the name of the replication source and selecting Define Subscription. See Figure 9-16.

Figure 9-16 Defining the replication subscription

In Figure 9-17, you see that we have given our subscription a name of “MGRSUB”, and that the target server is COPYDB. We also gave the Apply qualifier the name “MGRSUB”. Notice that we selected the Create table check box. This causes the target table to be created at the target server when the subscription is defined.

Figure 9-17 Define replication subscription dialog

By default, the target table has the same creator and name as the source table. You can choose a different name for the target table by clicking the Change button. Figure 9-18 shows that we changed the creator and name to TARGET.MGR_COPY.

Figure 9-18 Changing the target table name

Next, we click the Advanced button and then the Target Columns tab. Select the Primary Key box next to the source column named “ID”. This causes the target table to be created with ID as the primary key. This dialog also allows you to rename or add columns to the target table definition (although we do not do so in this chapter). See Figure 9-19.

Figure 9-19 Selecting the primary key for the target

Perhaps you want to restrict which rows from the source table to replicate. By default, all of the rows in the source table are replicated to the target. You can change this by clicking the Rows tab in the Advanced subscription replication dialog. You can now enter SQL predicates that will be used by the Apply program to limit which rows are replicated. Figure 9-20 shows that we have only allowed rows with an ID greater than 000000 to be replicated.

Figure 9-20 Restricting replicated rows

When you click OK, you return to the Define replication subscription main dialog. Next you define how frequently you want the Apply program to run by selecting the Time-based check box. Figure 9-21 shows that we selected the Using relative timing radio button and then changed the replication frequency to once every minute by using the Minutes and Hours options.

Figure 9-21 Subscription timing

Clicking OK returns you once again to the Define replication subscription main dialog. Clicking OK one more time allows you to save the replication subscription to a file or to immediately run it. In either case, you need to specify where the subscription control information will be stored. This database is the control server. Here we chose COPYDB as the control server and then saved the subscription as a file named replsub.sql (see Figure 9-22).

Figure 9-22 Saving the replication subscription

The replication subscription is actually defined when running the file. You do this by right-clicking Replication Subscriptions from the Control Center object tree, selecting Run SQL Files..., and then selecting replsub.sql.

Important: If you are using the DB2 Control Center Version 7.2.0.0 or earlier to define the subscription set, a known defect in the Control Center causes it to incorrectly populate one of the control tables used for replication. This defect is resolved with APAR IY19523 (this APAR is scheduled to be included in DB2 Universal Database V7 Fixpak 4).

Therefore, it is necessary to change the generated subscription definition file before you run it. For each column in the target table, there is an INSERT statement into the ASN.IBMSNAP_SUBS_COLS table. Locate the INSERT statement for the DATALINK column being replicated. The value supplied for COL_TYPE is incorrectly set to “A”. This needs to be changed to “D”. If the subscription set is already defined, this column can be updated with the DB2 command line processor by connecting to the database containing the subscription definition and running this UPDATE statement:

    update ASN.IBMSNAP_SUBS_COLS set COL_TYPE='D' where TARGET_OWNER='<target_owner>' and TARGET_TABLE='<target_table>' and TARGET_NAME='<column_name>'

Here <target_owner> is the owner of the target table, <target_table> is the name of the target table, and <column_name> is the name of the DATALINK column in the target table.

9.6.4 Configuring the source database

When using replication, DB2 needs to retain log files. The Capture program reads the log files to collect the data to be replicated. You can tell DB2 to retain the log files by setting the database configuration parameter LOGRETAIN to RECOVERY or CAPTURE. This only needs to be done on the source server. You can use the Control Center to accomplish this by right-clicking the database object and selecting Configure and then clicking the Logs tab. Click the logretain parameter and select Recovery. When you click the OK button, a message appears that states that all applications must disconnect from the database for the parameter change to take effect. If the logretain parameter was previously set to “no”, it is necessary to take an offline backup of the database for the database to be accessible.

If you prefer to use the DB2 command line processor rather than the Control Center to make the change, the following commands accomplish the same thing:

    db2 connect to sample
    db2 update db cfg for sample using logretain recovery
    db2 connect reset
    db2 backup database sample to /backup_directory

Note: The logretain database configuration parameter should only be set to CAPTURE when the DB2 log files will not be used for rollforward recovery. When the logretain parameter is set to CAPTURE, the Capture program calls the PRUNE LOGFILE command to delete log files when the Capture program completes.

9.6.5 Binding the Capture and Apply programs

DB2 DPropR can automatically bind the Capture and Apply programs to the source and target databases on the UNIX, Windows, and OS/2 operating systems. You can also use the DB2 command line processor to manually bind the programs. Here are the commands we used to bind the programs to the SAMPLE and COPYDB databases:

    db2 connect to sample
    cd sqllib/bnd
    db2 bind @capture.lst isolation ur blocking all
    db2 bind @applyur.lst isolation ur blocking all
    db2 bind @applycs.lst isolation cs blocking all

    db2 connect to copydb
    db2 bind @applyur.lst isolation ur blocking all
    db2 bind @applycs.lst isolation cs blocking all

Note that you need to bind both the Capture and Apply programs to the source database, but only the Apply program to the target database. This is because Capture has no need to access the target database, while Apply needs to read from the source database and write to the target database.

9.6.6 Creating the password file for the Apply program

Like any other application, the Apply program needs to connect to the databases that it will access. This connection is authenticated in the same way that any other connection is authenticated, which means that Apply must provide a user ID and a password. These are stored in a password file. The password file contains the name of the database, the user ID to be used for the connection, and a password. Our password file looks like this:

    SERVER=SAMPLE USERID=source PWD=sourcepwd
    SERVER=COPYDB USERID=target PWD=targetpwd

The first line contains the name of the replication source database, SAMPLE. The user ID that is used to connect to the SAMPLE database is source, which happens to be the DB2 instance owner ID of our instance named source. The password of the user ID source is sourcepwd. The next line supplies the name of the target server and the user ID and password that will be used to access it.
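To make the layout of the password file concrete, here is a minimal sketch that reads entries of the form SERVER=... USERID=... PWD=... into a lookup table. The function name parse_password_file is hypothetical, and the sketch assumes one entry per line with no spaces inside any value; it is not how the Apply program itself reads the file.

```python
def parse_password_file(text):
    """Map each SERVER name to its (USERID, PWD) pair.

    Assumes one entry per line and whitespace-separated KEY=value tokens.
    """
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = dict(token.split("=", 1) for token in line.split())
        entries[fields["SERVER"]] = (fields["USERID"], fields["PWD"])
    return entries


sample_file = (
    "SERVER=SAMPLE USERID=source PWD=sourcepwd\n"
    "SERVER=COPYDB USERID=target PWD=targetpwd\n"
)
credentials = parse_password_file(sample_file)
user_id, password = credentials["COPYDB"]
print(user_id)
```

Keys and values are kept exactly as written, because the user ID and password are case sensitive on UNIX and Windows NT.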

Note: The server name supplied must match the name in the subscription set, and, on UNIX and Windows NT, the user ID and password values supplied are case sensitive.

Naming the password file

The password file must be named applyqual.pwd, where applyqual is the Apply qualifier defined in the subscription set. When we created the subscription set in 9.6.3, “Defining the subscription set and subscription set member” on page 178, we used MGRSUB as the Apply qualifier (see Figure 9-17 on page 178). Therefore, the password file needs to be named MGRSUB.pwd. The password file must reside in the same directory from which the Apply program will be started.

Note: Because the password file contains sensitive information, you may want to place it in a directory that is accessible only to authorized individuals.

9.6.7 Configuration files used by ASNDLCOPY

The ASNDLCOPY user exit program needs two more files to replicate DATALINK columns and their linked files: ASNDLSRVMAP and ASNDLUSER.

ASNDLSRVMAP

ASNDLSRVMAP contains information necessary to transform the URL stored in the source DATALINK value into the URL that will eventually point to the copied file on the target server. Consider one of the DATALINK values stored in our replication source table SOURCE.MANAGERS:

    HTTP://UNUNBIUM.ALMADEN.IBM.COM/dldata/source/pictures/cathy.bmp

We can break up this URL into four components:
- Protocol: HTTP
- Hostname: UNUNBIUM.ALMADEN.IBM.COM
- Pathname: /dldata/source/pictures
- Filename: cathy.bmp
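The four-part breakdown can be reproduced with a few string operations. The following is a small sketch for illustration only; the function name split_datalink_url is hypothetical and the code is not taken from the ASNDLCOPY sample.

```python
def split_datalink_url(url):
    """Split a DATALINK URL into (protocol, hostname, pathname, filename)."""
    protocol, separator, rest = url.partition("://")
    if not separator:
        raise ValueError("expected a URL of the form protocol://host/path/file")
    hostname, slash, path = rest.partition("/")
    if not slash:
        raise ValueError("URL does not contain a file path")
    # Everything before the last "/" is the directory, the rest is the file.
    pathname, _, filename = ("/" + path).rpartition("/")
    return protocol, hostname, pathname or "/", filename


parts = split_datalink_url(
    "HTTP://UNUNBIUM.ALMADEN.IBM.COM/dldata/source/pictures/cathy.bmp"
)
for label, value in zip(("Protocol", "Hostname", "Pathname", "Filename"), parts):
    print(label + ":", value)
```

Splitting the value this way shows which parts a mapping definition has to rewrite (protocol, hostname, pathname) and which part normally stays the same (filename).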

When we replicate this row to the target server, any of these four components may change. If we replicate to a target server running on the Windows NT operating system, the protocol might change from HTTP to UNC. In most cases, the hostname of the target server will be different from the source server. The pathname may change, and we may want to rename the copied file. The ASNDLSRVMAP file is used to define how to change the first three. The filename can be changed, but requires user-written code to do so. This can be done by modifying the ASNDLCOPY user exit program.

Here is the format of the ASNDLSRVMAP file:

    <source_server> <target_server> [<source_path> <target_path>]

The contents of the format are explained here:
- <source_server>: Contains the protocol name and hostname of the source server.
- <target_server>: Contains the protocol name and hostname of the target server.
- <source_path>: Is optional and contains the pathname of the source file.
- <target_path>: Is also optional and contains the pathname to the copied file on the target server.

Here is the content of a hypothetical ASNDLSRVMAP file:

    HTTP://host1.com HTTP://host2.com /data /files

The ASNDLCOPY program uses this to change a source DATALINK value of HTTP://host1.com/data/groovin.mp3 to a target DATALINK value of HTTP://host2.com/files/groovin.mp3. If the source path and target path are not supplied, ASNDLCOPY copies the source file to a like-named path on the target server. The ASNDLSRVMAP file may contain multiple lines to define mapping for multiple source and target server pairs and multiple source and target pathname pairs.
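The transformation described above can be sketched in a few lines. This is a hedged illustration of the mapping rule as the text states it, not the ASNDLCOPY implementation (which is a C sample program). The function names parse_srvmap and map_reference are hypothetical, and the sketch assumes whitespace-separated fields and exact, case-sensitive matching.

```python
def parse_srvmap(text):
    """Read mapping lines: source server, target server, optional path pair."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 2:
            rules.append((fields[0], fields[1], None, None))
        elif len(fields) == 4:
            rules.append(tuple(fields))
        else:
            raise ValueError("expected 2 or 4 fields: " + line)
    return rules


def map_reference(url, rules):
    """Rewrite a source file reference using the first rule that matches."""
    for src_server, tgt_server, src_path, tgt_path in rules:
        if not url.startswith(src_server + "/"):
            continue
        rest = url[len(src_server):]  # begins with "/"
        if src_path is None:
            return tgt_server + rest  # like-named path on the target server
        if rest.startswith(src_path + "/"):
            return tgt_server + tgt_path + rest[len(src_path):]
    raise LookupError("no mapping defined for " + url)


rules = parse_srvmap("HTTP://host1.com HTTP://host2.com /data /files")
print(map_reference("HTTP://host1.com/data/groovin.mp3", rules))
```

With the hypothetical mapping line from the text, the source value HTTP://host1.com/data/groovin.mp3 is rewritten to point at /files on host2.com, and a rule without a path pair keeps the original pathname.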

For our exercise, the ASNDLSRVMAP file looks like this:

    HTTP://UNUNBIUM.ALMADEN.IBM.COM HTTP://UNUNBIUM.ALMADEN.IBM.COM /dldata/source/pictures /dldata/target/photos

Important: Note that the ASNDLSRVMAP file used for our exercise contains only one line. Each source server and target server pair and any source pathname and target pathname must reside on a single line. The ASNDLSRVMAP file needs to reside in the same directory as the password file, that is, the directory from which the Apply program will be started.

ASNDLUSER

The ASNDLUSER file contains the hostnames, port numbers, user IDs, and passwords used by ASNDLCOPY to physically copy the file being replicated from the source server to the target server. Here is the format of the ASNDLUSER file:

    <source_hostname> <receive_port> <send_port> <user_ID> <password>
    <target_hostname> <receive_port> <send_port> <user_ID> <password>

The first line identifies the hostname where the source file resides, the port number used by ASNDLCOPY to receive files, the port number used by ASNDLCOPY to send files, the user ID used to transfer files from the source, and the password of that user ID. The second line contains the same information for the target server.

Note that if the source hostname and the target hostname are the same, only one line is needed. Because we are using the FTP daemon to transfer files, both the receive port and the send port will use the standard FTP communication port number 21.

Here is the content of the ASNDLUSER file we will use for our exercise:

    UNUNBIUM.ALMADEN.IBM.COM 21 21 target targetpwd

If we wanted to use the ASNDLCOPYD daemon instead of the standard FTP daemon (as is required to replicate files linked with the READ PERMISSION DB option), the receive port number specified in the ASNDLUSER file needs to match the port specified when starting ASNDLCOPYD. Typically we choose a port that we know is not being used. For example, if we started the ASNDLCOPYD daemon using port 9999, the ASNDLUSER file would need to look like this:

    UNUNBIUM.ALMADEN.IBM.COM 9999 21 target targetpwd

This entry would cause ASNDLCOPY to connect to ASNDLCOPYD on port number 9999 with a user ID of target and a password of targetpwd.
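As an illustration of how the five fields of an ASNDLUSER entry fit together, the sketch below parses entries into named records keyed by hostname. The names CopyLogin and parse_asndluser are hypothetical, and the sketch assumes whitespace-separated fields with no spaces inside a password; it does not reproduce ASNDLCOPY's own parsing.

```python
from typing import NamedTuple


class CopyLogin(NamedTuple):
    """One ASNDLUSER entry. The field names are illustrative."""
    hostname: str
    receive_port: int
    send_port: int
    user_id: str
    password: str


def parse_asndluser(text):
    """Return a dict of hostname -> CopyLogin for each non-blank line."""
    logins = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 5:
            raise ValueError("expected 5 fields: " + line)
        host, receive_port, send_port, user_id, password = fields
        logins[host] = CopyLogin(host, int(receive_port), int(send_port), user_id, password)
    return logins


logins = parse_asndluser("UNUNBIUM.ALMADEN.IBM.COM 9999 21 target targetpwd")
entry = logins["UNUNBIUM.ALMADEN.IBM.COM"]
print(entry.receive_port, entry.user_id)
```

A single entry is enough here because the source and target hostnames are the same, as in the exercise.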

Note: The ASNDLSRVMAP and ASNDLUSER files must reside in the directory from which the Apply program is started. Because the ASNDLUSER file contains sensitive password information, you may want to place it in a directory that is accessible only to authorized individuals.

9.6.8 Configuration files used by ASNDLCOPYD

The ASNDLUSERINFO and the <userid>.DIR configuration files enable the ASNDLCOPYD daemon to restrict which user IDs can transfer files and which directories are accessible by those user IDs. These files must reside in a common directory, and the ASNDLCOPYD daemon must be started using this directory name as an argument.

ASNDLUSERINFO

The configuration file called ASNDLUSERINFO contains a list of users who can log in to the ASNDLCOPYD daemon, and the login password for each user. The password stored in this file can be encrypted. Here is how an entry in the ASNDLUSERINFO file may appear:

    db2inst99 22OAoPwDj0g

When the ASNDLCOPY user exit program connects to the ASNDLCOPYD daemon, ASNDLCOPY must supply a user ID and password. ASNDLCOPY gets this information from the ASNDLUSER configuration file. ASNDLCOPYD validates this user ID and password by looking in the ASNDLUSERINFO configuration file. Any attempt to connect to ASNDLCOPYD with a user ID or password which is not listed in the ASNDLUSERINFO file will be rejected.

The ASNDLUSERINFO file is populated and updated by using the ASNDLCOPYD_CMD command.

Here is the syntax for the command:

    ASNDLCOPYD_CMD [-d <directory>] {ADDUSER | PASSWD | RMUSER} [<user_ID>]

- <directory>: A directory where the ASNDLUSERINFO file will be stored. If no directory is supplied, the ASNDLUSERINFO file is created or updated in the current directory (that is, the directory from which the command was run).
- ADDUSER: Adds a user ID to the ASNDLUSERINFO file and prompts for a password. <user_ID> must specify the user ID to be added.
- PASSWD: Is used to change the password of a user ID already in the ASNDLUSERINFO file. <user_ID> must specify the user ID whose password is to be changed.
- RMUSER: Removes a user from the ASNDLUSERINFO file. <user_ID> must specify the user ID to be removed.

<userid>.DIR

For each user listed in the ASNDLUSERINFO file, there must exist a file named <userid>.DIR (for example, db2inst99.DIR). This file contains a list of directories that are accessible by that user. The db2inst99.DIR file might look like this:

    /datalinks/data/photos
    /datalinks/data/mp3

When the ASNDLCOPY program connects to the ASNDLCOPYD daemon as user db2inst99, it has access to files in the /datalinks/data/photos directory (including any subdirectories) and /datalinks/data/mp3.
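The effect of such a directory list can be sketched as a simple prefix check. The source does not describe how ASNDLCOPYD performs this check internally, so the following is only an assumption about the behavior described above (a listed directory grants access to itself and its subdirectories); the function name is_accessible is hypothetical.

```python
import posixpath


def is_accessible(file_path, allowed_dirs):
    """Return True if file_path lies under one of the allowed directories."""
    # Normalize first so that ".." segments cannot escape an allowed directory.
    normalized = posixpath.normpath(file_path)
    for directory in allowed_dirs:
        prefix = posixpath.normpath(directory).rstrip("/") + "/"
        if normalized.startswith(prefix):
            return True
    return False


allowed = ["/datalinks/data/photos", "/datalinks/data/mp3"]
print(is_accessible("/datalinks/data/photos/2001/cathy.bmp", allowed))
print(is_accessible("/datalinks/data/private/payroll.dat", allowed))
```

Appending "/" to each directory before comparing keeps a sibling such as /datalinks/data/photos_old from matching /datalinks/data/photos by accident.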

The .DIR file is created and updated manually.

9.6.9 Starting and stopping the Capture and Apply programs

The Capture program usually runs on the source server. Capture can be started by issuing the following command: asnccp

We start the Capture program here with this command: asnccp SAMPLE COLD NOPRUNE

Using the COLD option causes Capture to clean up the CD table before it runs. Capture does not start collecting data from the replication source until the Apply program starts. Capture then does an initial load of the CD table from the replication source table.

The Apply program must be started from the directory containing the applyqual.pwd file. This directory should also contain the ASNDLSRVMAP and ASNDLUSERS configuration files. Apply is started by specifying the apply qualifier and the target database name: asnapply

To start Apply, use this command: asnapply MGRSUB COPYDB

To start the ASNDLCOPYD daemon, execute the ASNDLCOPYD command as root (or a user with superuser privilege on UNIX or administrator authority on Windows NT): ASNDLCOPYD []

To stop the Capture program, use the asncmd command: asncmd STOP

To stop the Apply program, use the asnastop command: asnastop

To stop the ASNDLCOPYD daemon on UNIX, log on as a user with root authority and kill the daemon process: kill -9

To stop the ASNDLCOPYD daemon on Windows NT, log on as a user with administrator authority, right-click the task bar, open Task Manager, select the asndlcopyd process, and click the End Process button.


Chapter 10. The Reconcile utility

This chapter describes the Reconcile utility. Reconcile is a validation process that takes place between the DB2 Universal Database server that has tables with DATALINK columns and the DLFM server. It validates that the files referenced or linked by the DATALINK columns on the DB2 Universal Database server exist on the DLFM server, that the links can be re-established, or that the files are in the proper state.

For example, if you were to insert a row into a table that has a DATALINK data type column, the insert would complete successfully if the file referenced (assuming that the file is not already linked) in the insert statement exists on the DLFM server. The file that is referenced will then be considered linked and “reconciled” since the table with the DATALINK column on the DB2 Universal Database server is in sync with the DLFM server. The purpose of the Reconcile utility is to ensure that the relationship described in the example given is maintained.

The Reconcile utility is initiated from the DB2 Universal Database server, and it involves all the Data Links servers running the DB2 Data Links Manager that are referenced by the DATALINK column values. When the Reconcile utility is initiated on the DB2 Database server, it communicates with the DLFM servers that are referenced by DATALINK column values. If the DLFM server is not available when the Reconcile utility is initiated, the warning message shown in Figure 10-1 is returned on the DB2 database server.

Figure 10-1 Reconcile warning when DLFM server is not available

In this example, we initiated the Reconcile utility on the DB2 Universal Database server with the following command: db2 reconcile resident dlreport recon_report

When the Reconcile utility completes, it generates two report files, one with an .exp extension and one with a .ulk extension. The .ulk file contains a list of files that were unlinked on the file server, and the .exp file contains a list of files that were in exception on the file server. If there were no exceptions or no files were unlinked, the corresponding report file will be empty.

In addition, the Reconcile utility provides an option for specifying an exception table. All exceptions that are encountered while the Reconcile utility runs are written to the exception table. In order for the exception table to be populated, it must have the same structure as the table that is being reconciled. This table can then be used by the Import or Load utilities. This is desirable because it can avoid the manual process of correcting exceptions by using the Reconcile report files (Figure 11-28 shows an example of creating an exception table). For more information on the Reconcile utility, refer to the RECONCILE command in DB2 UDB Command Reference, SC09-2951.

Note: When the Reconcile utility is initiated, it locks the table being reconciled with a “Z” or Super Exclusive lock. The table is locked until the Reconcile utility is complete. The “Z” lock prevents any access to the table. In our example, a snapshot for LOCKS on the table being reconciled showed the table being locked with a “Z” lock (Figure 10-2). This should be considered before you run the Reconcile utility, especially if the table has a large number of rows.

Figure 10-2 Extract of a lock snapshot for a table being reconciled

The Reconcile utility reconciles at a table level, that is, the Reconcile utility needs to be initiated for each table that has columns defined with the DATALINK data type. DB2 Universal Database provides a utility called db2_recon_aid, which provides a mechanism for checking and running reconcile on tables that are potentially inconsistent with the DATALINK file data on the DLFM file server.

The db2_recon_aid with the check option lists the tables that may need reconciliation. No reconciliation is performed. This is useful for determining which tables need to be reconciled in an environment where there are many tables in the database. In our example, we ran the db2_recon_aid with the check option on the database (Figure 10-3).

Note: You cannot specify individual DLFM file servers with the db2_recon_aid utility even though the utility provides the option.

Figure 10-3 Output of the db2_recon_aid utility with the CHECK option

The execution of the db2_recon_aid utility without the check option initiates the Reconcile utility for each of the tables that may require reconciliation. For more information on the db2_recon_aid utility, refer to DB2 Data Links Manager Quick Beginnings, GC09-2966.

10.2 When to run the Reconcile utility

You should run the Reconcile utility against tables that are in a Data Link Reconcile Pending (DRP) state to remove them from this state. To determine if a table is in a DRP state, you can examine the db2diag.log (Figure 10-4).

Figure 10-4 Extract of db2diag.log showing a table in DRP state

Alternatively, you can run the db2dart utility (Figure 10-5) against the database or tablespace.

Figure 10-5 Extract of a DB2DART report showing a table in DRP state

In some situations, utilities such as Import or Load may detect a problem with the meta data in a DLFM server. In these situations, DB2 Universal Database fails the utility. The table with the DATALINK column is then placed in a Data Link_Reconcile_Not_Possible state (DRNP). The Reconcile utility cannot be run against a table in this state. The table in the DRNP state must first be placed into a DRP state by using the SET INTEGRITY SQL statement.

Note: The SET INTEGRITY and SET CONSTRAINTS SQL statements are equivalent. SET INTEGRITY has replaced SET CONSTRAINTS, which is still available for compatibility with previous versions. For more information on the SET INTEGRITY statement, refer to DB2 UDB SQL Reference, SC09-2974.

Figure 10-6 shows a summary of when the Reconcile utility should be initiated, as listed here:
• If the tables are in a Data Link Reconcile Pending state (DRP), run the Reconcile utility to return the table from a DRP state to a normal state. Sometimes DB2 Universal Database automatically places tables into a DRP state if an inconsistency is suspected.
• If the tables are in a Data Link Reconcile Not Possible state (DRNP), you can:
  – Prevent access to a table with possibly inconsistent DATALINK column values by issuing the following two commands. First, place the table into a normal state with the command:
    SET INTEGRITY FOR tablename DATALINK RECONCILE PENDING IMMEDIATE UNCHECKED
    Then place the table into a DRP state with the command:
    SET INTEGRITY FOR tablename TO DATALINK RECONCILE PENDING
    When a table is in DRP state, you can only issue the SELECT SQL statement against the table. The DRP state on the table prevents INSERT, UPDATE, and DELETE SQL statements against the table. Run the Reconcile utility to remove the table from the DRP state.
  – Alternatively, you can update the DATALINK values of a table in DRNP state using either of the following methods:
    • Using the SQL UPDATE statement, set the data location part of a DATALINK column value to a zero-length URL if the column is not nullable, or to NULL if the column is nullable.
    • Restore the files on the appropriate Data Links servers. Then run an application that issues SELECT statements to read the DATALINK column values and issues UPDATE statements to update the DATALINK column with the same values. After the update operation completes, the files are marked as linked on the appropriate Data Links servers.

Note: The Data Link_Reconcile_Not_Possible state (DRNP) must be on while the DATALINK column values are being updated. You cannot update a table in DRP state.

After the UPDATE SQL statements are completed, you can reset the DRNP state by issuing the following SQL statement to bring the table to a normal state: SET INTEGRITY FOR tablename DATALINK RECONCILE PENDING IMMEDIATE UNCHECKED Then place the table into a DRP state with the following SQL statement: SET INTEGRITY FOR tablename TO DATALINK RECONCILE PENDING Run the Reconcile utility.
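The sequence of actions for each state can be summarized in a small Python sketch. The SET INTEGRITY statements are the ones quoted in this section; the function name `recovery_steps` and the "RUN RECONCILE" label are hypothetical placeholders for illustration, not DB2 commands or APIs.

```python
def recovery_steps(state, table):
    """Return the ordered actions for a table in NORMAL, DRP, or DRNP state.

    "RUN RECONCILE" is only a label for "run the Reconcile utility on the
    table"; it is not an executable statement.
    """
    reset_drnp = (
        f"SET INTEGRITY FOR {table} DATALINK RECONCILE PENDING IMMEDIATE UNCHECKED"
    )
    to_drp = f"SET INTEGRITY FOR {table} TO DATALINK RECONCILE PENDING"

    if state == "NORMAL":
        return []
    if state == "DRP":
        return ["RUN RECONCILE"]
    if state == "DRNP":
        # Reset DRNP to normal, place the table in DRP, then reconcile.
        return [reset_drnp, to_drp, "RUN RECONCILE"]
    raise ValueError(f"unknown state: {state}")


for step in recovery_steps("DRNP", "tablename"):
    print(step)
```

If the DATALINK values of a DRNP table are to be corrected with SQL UPDATE statements, those updates happen before the first step returned for "DRNP", while the table is still in the DRNP state.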

[Flowchart: Is the table in a DRP or DRNP state? For DRP, run Reconcile to remove the DRP state. For DRNP, you can either prevent access or update. To prevent access: SET INTEGRITY ... IMMEDIATE UNCHECKED (to reset DRNP to normal state), then SET INTEGRITY ... TO DRP (to prevent access to the table and place it in DRP state), then run Reconcile to remove the DRP state. To update: update the DATALINK column values using SQL, then issue the same two SET INTEGRITY statements and run Reconcile to remove the DRP state.]

Figure 10-6 Determining when to run the Reconcile utility

10.3 Situations that require the Reconcile utility

The following situations may lead to DB2 Universal Database placing the tables in a DRP or DRNP state, and you may need to run the Reconcile utility:
• The entire database is restored and rolled forward to a point in time. Because the entire database is rolled forward to a committed transaction, no tables will be in the check pending state (due to referential constraints or check constraints). All data in the database is brought to a consistent state. The DATALINK columns, however, may not be synchronized with the meta data in the DB2 Data Links Manager, and reconciliation is required. In this situation, tables with DATALINK columns will already be in the Datalink_Reconcile_Pending state. You should run the Reconcile utility for each of these tables.
• A particular Data Links server running the DB2 Data Links Manager lost track of its meta data. This can occur for different reasons:
  – The Data Links server was cold started.
  – The Data Links server meta data was restored to a back-level state.
  In some situations, such as SQL UPDATEs and DELETEs, DB2 may be able to detect a problem with the meta data in a Data Links server. In these situations, DB2 fails the SQL statement. You would put the table in the Datalink_Reconcile_Pending state by using the SET CONSTRAINTS statement, then run the Reconcile utility on that table.
• A file system is not available (for example, because of a disk crash) and is not restored to the current state. In this situation, files may be missing. An error like this is typically discovered by an application when it cannot access the file whose file reference it obtained from the database. You should put the table in the Datalink_Reconcile_Pending state and run the Reconcile utility on it. Some of the files may be restored from the archive server if their corresponding DATALINK columns had RECOVERY=YES. In any case, the Reconcile utility records the exceptions in the exception table or in the exception report. You can then restore those files or issue SQL UPDATEs to fix the column.

10.3.1 Reconcile algorithm

This section outlines the high level steps taken by the Reconcile utility to remove a table from DRP state.

The algorithms presented are for tables that have DATALINK columns with the RECOVERY=YES option and for tables that have DATALINK columns with the RECOVERY=NO option. A table can have both types of columns.

Algorithm with RECOVERY=YES

The Reconcile utility performs the following process for columns with RECOVERY=YES.
• If the file table entry does not exist for this file, then:
  a. Retrieve the file from the archive.
     • The file is retrieved only if the proper version of the file is not already in the file system.
     • If the modification time of the file on the file system is greater than the version required, then before retrieving, the existing file is renamed with the extension .MOD. (The DB2 side exception report file will have information indicating that this was done, irrespective of whether the exception table was specified.) If the .MOD file already exists, the file will not be retrieved; it is an exception (it will be reported in the exception report and table on the DB2 side).
     • The file is retrieved from the archive if the file is not found or if the modification time of the file on the file system is less than the version required.
  b. Do the "relink" processing.
• If the file table entry does exist for this file, then:
  – If the file is in a "linked" state as per the dfm_file table entry, then:
     • Check if the file exists in the file system. If YES, then check if the file is modified or the file size changed. If YES, then call the retrieve daemon to retrieve the file from the archive (see the conditions under a. in the first bullet above). If the file is not modified and the file size is unchanged, then check if inode, fsid, or cellid is changed. If YES, then update the dfm_file table entry with these new values.
     • If the file does not exist in the file system, then call the retrieve daemon to retrieve the file from the archive. After the retrieval of the file from the archive is successful, check if inode, fsid, or cellid of the retrieved file is different from the file table entry values. If YES, then update the dfm_file table entry with values from the retrieved file.
  – If the file is in the "unlinked" state as per the dfm_file table entry, then:
     i. Retrieve the file from the archive (if needed, and based on the conditions mentioned in a. under the first bullet).
     ii. Bring back the entry to the "link" state. If the retrieve was successful and inode, fsid, or cellid of the retrieved file is different from the dfm_file table entry values, then during the "link" processing, update the dfm_file table entry with the values of the retrieved file.

Algorithm with RECOVERY=NO

The Reconcile utility performs the following process for columns with the RECOVERY=NO option.
• If the file table entry does not exist for this file, then an exception is issued.
• If the file table entry exists for this file, then:
  – Check if the file exists in the file system. If YES, then: If Full Access applies and the file is modified or the file size changed, then an exception is issued. Otherwise, if inode, fsid, or cellid of the file on the file system is different from the dfm_file table entry values, then update the dfm_file table entry with these new values. If the file exists on the file system in the proper state and is in the "unlink" state as per the dfm_file table entry, then bring it back to the "link" state.
  – If the file does not exist in the file system, then an exception is issued.
  – For the exception cases when the file table entry exists, bring the corresponding dfm_file table entries to an unlink state.
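The RECOVERY=NO decision steps can be expressed as a short Python sketch. This is only an illustration of the outline above, not the DLFM implementation: the classes `FileTableEntry` and `FileOnDisk`, the `changed` flag, and the function `reconcile_no_recovery` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FileTableEntry:
    """Hypothetical stand-in for a dfm_file table row."""
    linked: bool
    ids: Tuple[int, int, int]  # (inode, fsid, cellid)


@dataclass
class FileOnDisk:
    """Hypothetical view of the file as found in the file system."""
    ids: Tuple[int, int, int]
    changed: bool  # file modified, or file size changed


def reconcile_no_recovery(entry: Optional[FileTableEntry],
                          on_disk: Optional[FileOnDisk],
                          full_access: bool) -> str:
    """Return "exception" or "ok"; the entry is updated in place."""
    if entry is None:
        return "exception"  # no file table entry for this file
    if on_disk is None or (full_access and on_disk.changed):
        # Exception cases with an existing entry are brought to unlink state.
        entry.linked = False
        return "exception"
    if on_disk.ids != entry.ids:
        entry.ids = on_disk.ids  # record the new inode, fsid, cellid
    entry.linked = True  # an "unlink" entry is brought back to "link"
    return "ok"


entry = FileTableEntry(linked=False, ids=(10, 1, 0))
print(reconcile_no_recovery(entry, FileOnDisk(ids=(11, 1, 0), changed=False), True))
print(entry)
```

Because RECOVERY=NO files have no archived copy, every branch that would trigger a retrieve in the RECOVERY=YES case ends in an exception here.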


Chapter 11. Recovery

This chapter describes recovery in a Data Links File Manager (DLFM) environment. Recovery in a DLFM environment may be required on the DLFM server, the DB2 Universal Database server and the file system storing the referenced DATALINK files. The best recovery strategy is one that is tested. It is important to plan ahead for disaster recovery.

This chapter looks at the following topics:
• Types of recovery
• Backup and restore considerations
• The database recovery history file
• The garbage collection process
• The Reconcile utility
• Example recovery scenarios

A database can become unusable because of hardware or software failure (or both), and the different failure situations may require different recovery actions. You should have a strategy in place to protect your database against the possibility of these failure situations. When designing a strategy, you should also rehearse it. This will allow you to detect any shortcomings in the plan and to avoid problems if you have to recover the database.

In general, recovery takes place after a failure, but there are cases where recovery is needed to go back in time to remove changes that were made to the database. An example of this is a user requesting a recovery to a point in time before the latest updates were made to the database. While recovery in a DLFM environment is similar to a standard database server recovery, there are some additional considerations in the DLFM environment.

There are three types of recovery that can take place in a DB2 UDB Database and a DLFM environment:
• Crash recovery
• Version or Full Database recovery
• Restore and Rollforward recovery

11.1.1 Crash recovery

The purpose of crash recovery is to bring the database back to a consistent state after a severe error or condition that causes the database manager to terminate abnormally. During the crash (for example, a power failure), there may have been database transactions that were not committed or were only partially completed. The database manager either commits or rolls back these transactions on the next CONNECT, ACTIVATE, or RESTART database command.

Crash recovery processes only the active transaction log files by performing a forward recovery followed by a backward or undo recovery before allowing access to the database. At the end of crash recovery, the database is in a consistent and usable state, as it was before the crash occurred.

Note: You can think of active transaction logs as the logs used by DB2 Universal Database during normal transaction processing. The active logs are allocated by DB2 Universal Database when the first database connection is made. The number of active logs allocated is determined by the LOGPRIMARY database configuration parameter.

In a DLFM environment, the two-phase commit protocol is implemented between the DB2 database manager and the DLFM servers to process a COMMIT or ROLLBACK statement issued by an application. Let us first define the two-phase commit, and then discuss how DB2 uses it for transactions with the DLFM.

Two-phase commit

The two-phase commit protocol is used in distributed transactions. It makes sure that the outcome of a transaction is consistent across all the resources involved in the transaction. As the name suggests, the protocol operates in two distinct phases to ultimately commit or abort a transaction.

In a Data Links scenario, the DB2 database manager is the coordinating transaction manager, and the DLFMs are the resource managers. The two phases in this case are (see Figure 11-1):
• Phase one: The DB2 database manager asks each DLFM involved in the transaction if it is ready to commit (1-a in Figure 11-1). If the DLFM is ready to commit the transaction, it puts the transaction in the PREPARED state and responds YES to the DB2 database manager. The transaction is committed only if all the DLFMs respond YES.
• Phase two: If all the DLFMs respond YES, the DB2 database manager instructs them, in return, to commit the transaction (see 2-a in Figure 11-1). If at least one of the DLFMs responds NO, or does not respond at all (due to network or machine failure), the DB2 database manager instructs all the DLFMs to roll back the transaction. Regardless of the instruction (commit or abort), each DLFM that is up and running complies and notifies the database manager when it is done (see 2-b in Figure 11-1).
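The two phases can be sketched in a few lines of Python. This is a simplified illustration of the general protocol under stated assumptions, not the DB2 or DLFM implementation: the `Dlfm` class, its `vote` and `reachable` attributes, and the `two_phase_commit` function are hypothetical names, and logging and retries are omitted.

```python
class Dlfm:
    """Hypothetical stand-in for a DLFM acting as a resource manager."""

    def __init__(self, name, vote=True, reachable=True):
        self.name = name
        self.vote = vote
        self.reachable = reachable
        self.state = "ACTIVE"

    def prepare(self):
        """Phase one: answer "are you ready to commit?"."""
        if not self.reachable:
            raise ConnectionError(self.name)
        if self.vote:
            self.state = "PREPARED"
        return self.vote

    def finish(self, decision):
        """Phase two: apply the coordinator's decision if it arrives."""
        if self.reachable:
            self.state = decision


def two_phase_commit(dlfms):
    """Coordinator logic: commit only if every DLFM responds YES."""
    votes = []
    for dlfm in dlfms:
        try:
            votes.append(dlfm.prepare())
        except ConnectionError:
            votes.append(False)  # no response counts as NO
    decision = "COMMITTED" if all(votes) else "ROLLED_BACK"
    for dlfm in dlfms:
        dlfm.finish(decision)
    return decision


print(two_phase_commit([Dlfm("DLFM-1"), Dlfm("DLFM-2")]))
print(two_phase_commit([Dlfm("DLFM-1"), Dlfm("DLFM-2", vote=False)]))
```

A DLFM that has voted YES and then becomes unreachable before the second phase keeps its transaction in the PREPARED state in this sketch, because `finish` never reaches it.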

[Diagram: Two-phase commit between the DB2 database manager and DLFM-1 through DLFM-n. First phase: "Are you ready to commit?" (1-a) and "YES" response (1-b). Second phase: Commit (2-a) and Notification (2-b), after which the transaction is committed.]

Figure 11-1 Two-phase commit

Three different situations may occur:
• One of the DLFMs prepared the transaction and went down before sending its YES response to the database manager.
• One of the DLFMs prepared the transaction and sent its YES response to the database manager, but its response could not reach the database manager due to, for instance, a network failure.
• One of the DLFMs prepared the transaction, and the database manager sends a final commit after receiving the YES response sent by all DLFMs (including this one), and at this stage, the DLFM goes down.

Each of these three cases would result in leaving the transaction at the DLFM side in PREPARED state. Such transactions are known as in-doubt transactions.

Whenever the database manager determines that a failure has potentially created in-doubt transactions on a Data Links server, it marks the state of the Data Links server as needing crash recovery. It disallows any SQL requests involving the Data Links server, while it is in this state. SQL0357N with reason code “03” is returned to the application that made the SQL request.

While a Data Links server configured to a DB2 UDB Database is in a state needing crash recovery, the database manager disallows SQL requests involving that particular Data Links server. SQL requests involving other data in the database are still allowed. The database manager starts a process that asynchronously attempts to complete crash recovery on each Data Links server requiring recovery. When the process successfully completes the crash recovery, the state of the Data Links server is marked as available, allowing further SQL requests that involve it.

11.1.2 Version or full database recovery

Version recovery using the BACKUP command in conjunction with the RESTORE command puts the database in a state that was previously saved (Figure 11-2). You use this recovery method with non-recoverable databases (that is, databases for which you do not have archived logs).

You can also use this method with recoverable databases by using the WITHOUT ROLLING FORWARD option. For example, if a full offline database backup is taken and the LOGRETAIN database configuration parameter is set to YES, the database will be placed into ROLLFORWARD PENDING after a restore. A ROLLFORWARD command with the STOP option removes the database from the ROLLFORWARD PENDING state and allows access to the database.

[Diagram: Timeline showing CREATE database, BACKUP database, units of work, BACKUP database, units of work, RESTORE database, and BACKUP database. Each BACKUP database operation creates a backup database image.]

Figure 11-2 Version or full database recovery

In a DLFM environment, note the following points when considering a version recovery:
• The DB2 UDB Database server can be a recoverable or a non-recoverable database. A database becomes recoverable when the LOGRETAIN and/or the USEREXIT database configuration parameters are turned on. A version recovery requires the use of a full offline database backup to recover.
• The DLFM_DB database on the DLFM server is a recoverable database and requires a full offline backup for a version recovery. The WITHOUT ROLLING FORWARD option must be used when restoring the offline backup to prevent the database from going into a ROLLFORWARD PENDING state. In the event that a restore is done without specifying the WITHOUT ROLLING FORWARD clause, a ROLLFORWARD STOP after the restore would make the database accessible.

11.1.3 Restore and rollforward recovery

Rollforward recovery builds on a restored database and allows you to restore a database to a particular time that is after the time that the database backup was taken (Figure 11-3). This point can be either the end of the logs, or a point between the time of the database backup and the end of the logs. The LOGRETAIN configuration parameter in the database configuration file must be set to YES to invoke log retention.

[Diagram: Timeline showing CREATE database, BACKUP database, units of work (updates), BACKUP database, units of work (updates), RESTORE database, and ROLLFORWARD of the changes in the logs (n archived logs and 1 active log).]

Figure 11-3 Rollforward recovery

The LOGRETAIN parameter indicates the use of log retention. The USEREXIT parameter indicates that a user exit program is used to archive and retrieve the log files. Log files are archived when the database manager closes the log file. They are retrieved when the ROLLFORWARD utility needs to use them to restore a database. If a user exit program is not used, then all the logs are kept in the current log path. This can lead to a file system full condition. We highly recommend you use a user exit program to archive log files to disk or Tivoli Storage Manager.

Important: We advise you not to copy log files manually from the log directory to free space on a file system. Some log files may be active and manually copying these files may corrupt them. Active log files are also required for crash recovery to complete.

After LOGRETAIN, or USEREXIT, or both of these parameters are enabled, you must make a full backup of the database. This state is indicated by the backup_pending flag parameter.

Log retention provides the following additional features over version recovery:
• The ability to take online database and tablespace backups
• Point-in-time recovery of databases and tablespaces
• The ability to restore and rollforward to the end of the logs

Note: Active logs are required for crash recovery. Archived logs can be used for database or tablespace recovery. You might use point in time recovery if an active or an archived log is not available.

In this situation, you could roll forward to the point where the log is missing. You might also roll forward to a point in time if a bad transaction was run against the database. In this situation, you would restore the database and then roll forward to just before the time that the bad transaction was run.

11.2 DLFM backup considerations

Using files externally introduces a new level of complexity. Because there are many parts involved, you must take care of all involved components. You do not have only one file as a backup image, but several database backup images and backups of the involved file systems.

In a Data Links Manager installed environment, there are at least two databases. One is the database on DB2 database server, and the other is the database that is used by Data Links Manager to store the meta data (DLFM_DB database). These databases are kept in sync; every time an INSERT/UPDATE/DELETE on a DATALINK column is run, the DLFM_DB is updated too.

No linked file can be renamed, updated, or deleted (setting WRITE PERMISSION BLOCKED in DATALINK column) without control of the DB2 database server. When files are linked, that is, when an insert or update is made on the DB2 database server on a DATALINK column, the files (if marked with RECOVERY=YES) are asynchronously copied into the backup directory or into the hierarchical storage. The working file that remains in the original directory is now protected through DB2. All involved components (Database on DB2 server, Database on Data Links server, and files and backups) are in sync.

When the Backup utility runs, DB2 ensures that all files scheduled for copying are copied. At the beginning of the backup process, DB2 also ensures that all Data Links Managers that are specified in the DB2 configuration file are running. If a Data Links Manager has one or more linked files, it must be available until the backup operation completes. If a DB2 Data Links Manager becomes unavailable before the backup operation completes, the backup operation is declared as incomplete.

The description that follows only applies to files that are linked by DATALINK columns that have the RECOVERY parameter set to YES. (Files that are referenced by DATALINK columns for which RECOVERY=NO is specified are not backed up or copied to the archive server.)

Figure 11-4 illustrates the actions that are performed during an INSERT in a Data Links environment.

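[Diagram: On the DB2 server, the SQL INSERT statement is processed by a db2agent, which sends the insert to dlfm_child on the DLFM server. dlfm_child sends an asynchronous archive request to the Copy Daemon, which copies the file to the archive server.]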

Figure 11-4 Asynchronous archive request

When an INSERT SQL statement is issued against a table with a DATALINK data type on the DB2 server, a DB2 agent attempts to process the request by communicating this request with the DLFM server (Figure 11-4).

On the DLFM server, a dlfm_child process receives the request, links the file, and makes a call to the COPY DAEMON (dlfm_copyd) process to make an asynchronous backup of the file referenced in the insert statement to the archive server. The archive server could be Tivoli Storage Manager, disk, or any XBSA type storage system.

Figure 11-5 illustrates the processing that takes place when the Backup utility is executed on the DB2 UDB database server.

[Diagram: The DB2 backup on the DB2 server sends a BackupVerify request to dlfm_child on the DLFM server, which ensures that the file backup is complete.]

Figure 11-5 Processing that takes place during a backup

When the database Backup utility runs on the DB2 server, DB2 ensures that all files scheduled for copying are copied. The database backup operation initiates an internal retry logic. After the retry logic iterations are completed, the backup fails if the data linked FILE backup is not complete.

When files are linked, the Data Links servers schedule them to be copied asynchronously to an archive server, such as ADSM, or to disk. When the Backup utility runs, DB2 ensures that all files scheduled for copying are copied. At the beginning of the Backup process, DB2 contacts all Data Links servers that are specified in the DB2 configuration file. If a Data Links server has one or more linked files and is not running, or stops running during the backup operation, the backup will not contain complete DATALINK information. The backup operation will complete successfully.

Before the Data Links server can be marked as available to the database again, the backup process for all outstanding backups must complete successfully. If a backup is initiated when there are already twice the value of num_db_backups outstanding backups waiting to be completed on the Data Links server, the backup operation will fail. That Data Links server must be restarted and the outstanding backups completed before additional backups are allowed.
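As a worked example of the limit described above, the following Python sketch checks whether a new backup would be accepted. The function name `backup_allowed` and the sample value of 3 for num_db_backups are assumptions chosen for illustration only.

```python
def backup_allowed(outstanding_backups, num_db_backups):
    """A new backup fails once outstanding backups reach twice num_db_backups."""
    return outstanding_backups < 2 * num_db_backups


# With num_db_backups = 3 (an illustrative value), the limit is 6.
for outstanding in (0, 5, 6):
    print(outstanding, backup_allowed(outstanding, 3))
```

With these sample values, a backup started while five backups are outstanding is accepted, and one started while six are outstanding fails.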

Note: A successful backup operation can cause the Data Links servers to clean up (Garbage Collect) the archived versions of files on the archive server (either disk or TSM). The num_db_backups database configuration parameter specifies the number of DB2 database backups before archived versions of the files (that were unlinked) are removed.

11.2.1 Environment backup considerations Figure 11-6 outlines the components that are involved when backing up the entire environment.

1. Make sure that all Data Links servers are up and running (unless you have specified the NO LINK CONTROL option in the DATALINK definition).
2. Back up the databases on the DB2 database servers.
3. Back up the DLFM_DB databases on the Data Links servers.
4. Back up the file systems used by the Data Links Manager. File systems need to be unmounted, backed up (via the operating system), and then mounted again.
5. Back up the DLFM backup directory, which holds images of the DLFM_DB database and, if the RECOVERY option is set to YES in the DATALINK column, copies of the linked files and all updates, to provide point-in-time rollforward recovery.

Figure 11-6 Environment backup considerations

11.3 DLFM restore considerations

The information that follows applies if you have a DATALINK column (or columns) defined with the RECOVERY=YES option for a table. If a table has a DATALINK column defined with the RECOVERY=NO option, the table is put in the Datalink_Reconcile_Pending (DRP) state at the end of the restore operation.

Figure 11-7 shows the processing that takes place when the restore utility is executed on the DB2 UDB database server.

Figure 11-7 Processing that takes place during a restore

Note: The DLFM process model may change in future releases of DB2.

When the restore utility is invoked on the DB2 server, the fast reconcile routine is invoked if the WITHOUT DATALINK option is not specified and if there is no break in the log sequence (LS) or log chain. The dlfm_child process may have to call the retrieve daemon (dlfm_retrieved) to retrieve a file from the archive server if the file to be linked is not available on the file system (Figure 11-7). Fast reconcile logic performs the following process:

- All files that were linked after the backup image that was used for the database restore are marked as unlinked (because they are not recorded in the backup image as being linked).
- All files that were unlinked after the backup image was taken, but were linked before the backup image was taken, are marked as linked (because they are recorded in the backup image as being linked).
- If the file was subsequently linked to another table in another database, the restored table is put into the Datalink_Reconcile_Pending state (Figure 11-8).
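The three fast reconcile cases can be pictured as set operations. The following Python sketch is only an illustration of the rules described above, not the DLFM implementation; the function name, its parameters, and the file names are hypothetical.

```python
def fast_reconcile(linked_in_backup, linked_now, linked_elsewhere=frozenset()):
    """Model the three fast reconcile cases for one restored table.

    linked_in_backup: files the backup image records as linked
    linked_now:       files currently linked to the table on the Data Links server
    linked_elsewhere: files unlinked after the backup that another table
                      in another database has linked since (assumed input)
    """
    linked_in_backup = set(linked_in_backup)
    linked_now = set(linked_now)

    # Linked after the backup was taken: not in the image, so unlink.
    to_unlink = linked_now - linked_in_backup

    # Linked before the backup but unlinked afterwards: candidates to link again.
    unlinked_since = linked_in_backup - linked_now

    # A file that now belongs to another table cannot simply be linked again;
    # the restored table goes to Datalink_Reconcile_Pending instead.
    conflicts = unlinked_since & set(linked_elsewhere)
    to_relink = unlinked_since - conflicts

    return {
        "unlink": to_unlink,
        "relink": to_relink,
        "reconcile_pending": bool(conflicts),
    }


# Hypothetical file names: pic6.bmp was linked after the backup,
# pic2.bmp was unlinked after the backup.
result = fast_reconcile(
    linked_in_backup={"pic1.bmp", "pic2.bmp", "pic3.bmp"},
    linked_now={"pic1.bmp", "pic3.bmp", "pic6.bmp"},
)
print(result)
```

In this model a single conflicting file is enough to flag the table for the Reconcile utility, which mirrors the third case in the list.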

Figure 11-8 Restore with the WITHOUT DATALINK option

If you use the restore utility with the WITHOUT DATALINK option, all tables with DATALINK columns are placed in the Datalink_Reconcile_Pending state, and no reconciliation is performed with the Data Links servers during the Restore operation. This option can be used when the DLFM server is unavailable (Figure 11-9).

Figure 11-9 Restore without specifying the WITHOUT DATALINK option

11.4 Recovery history file

Every DB2 UDB Database and the Data Links File Manager Database (dlfm_db) has a history file that records historical administrative operations. A recovery history file is created with each database and is automatically updated. During a database migration, the history file is migrated as well.

The history file can be accessed by issuing the following command: db2 list history all for

The database history file is invaluable in a recovery scenario.

The history file is individually restorable from any backup image. If the current database is unusable or not available, and the associated recovery history file is damaged or deleted, an option on the RESTORE command allows only the recovery history file to be restored. The recovery history file can then be reviewed to provide information on which backup to use to restore the database.

For example, we restore the history file for our sample database with: db2 restore database sample history file

Note: The size of the file is controlled by the REC_HIS_RETENTN database configuration parameter that specifies a retention period (in days) for the entries in the file. Even if the number for this parameter is set to 0, the most recent full database backup, plus its restore set, is kept. (The only way to remove this copy is to use the PRUNE with FORCE option.) The retention period has a default of 366 days. The period can be set to an indefinite number of days by using -1. In this case, explicit pruning of the file is required.

11.4.1 Events recorded in the history file

The following events are recorded in the history file:
- Backup
- Restore
- Rollforward
- Load
- Quiesce of a tablespace
- Alter tablespace
- Dropped table (when dropped table recovery is enabled)
- Reorganization of a table
- Update of table statistics

11.4.2 Data recorded in the history file

The following data is recorded in the history file:
- Object affected (database, tablespace, or table)
- Location and device type of output (backup image or load copy)
- The status of the backup: active, inactive, expired, or deleted
- Range of relevant log files
- Start and completion time of event
- Resulting SQLCA

Note: When an ONLINE backup is taken, the history file shows the EARLIEST LOG and the CURRENT LOG. These are the minimum range of logs required for the restore to complete. You have to rollforward through these logs to move the database out of the ROLLFORWARD PENDING state. Therefore, it is important to ensure that these logs are archived in a safe place and can be retrieved in the event of a recovery.

11.5 Restoring an offline backup without rollforward

A database can be restored without rolling forward by using a backup that was created with the offline option (this is the default). We show the restore and what happens to the files that were linked after the backup was taken. The steps are:
1. Select data in the Data Links table.
2. List files on the DLFM server.
3. Insert a new row into the Data Links table.
4. Select data in the Data Links table.
5. List files on the DLFM server.
6. Restore the backup taken before the new row was inserted.
7. Display a report of unlinked files from a fast reconcile.

We use a SELECT statement with a Data Link function to show the data in the table before inserting new data. The data shown in Figure 11-10 is the data that is on our backup image, for example: db2 'select dlurlpathonly(picture) from db2inst1.resident'

Figure 11-10 Selecting results prior to insert and restore

Figure 11-11 shows the files in the file system that is under the control of DLFF. Note the permissions before the insertion of a new row.

Figure 11-11 The ls results of the Data Link file system prior to insert

We insert a new row as shown in Figure 11-12 and display the contents of the table after the insert.

Figure 11-12 Inserting and selecting after a new link

In Figure 11-13, the file pic6.bmp is now linked and under the control of DLFM. The only way to access this file is as a root user or by using an access token generated by the DB2 database. Refer to 2.3.3, “How access tokens work” on page 32, for more information on access tokens. It may seem that the dlfm user would be able to access the files because the permissions show read for dlfm. In fact, the DLFF blocks access to the files from the dlfm user even though the files, when listed, show read permission.

Figure 11-13 List files after the link operation has completed

The next step is to restore the database from the backup. The file that was linked after the backup is unlinked and returned to its original state. If the DATALINK column had been defined with the ON UNLINK DELETE option, the file would have been deleted. In this example, we did not have to run the Reconcile utility because fast reconcile was run. Figure 11-14 shows the Restore and the message that says pic6.bmp was unlinked.

Figure 11-14 Restore command and files that were unlinked

The following steps summarize the restore of an offline backup:
1. An offline backup of the DB2 UDB database that has a table with a DATALINK column is taken (T1). See Figure 11-15.
2. A row is inserted into the table that has a DATALINK column (T2).
3. The backup image taken at time T1 is restored without rolling forward (T3).
4. Fast reconcile is run, and it unlinks the row inserted at T2 on the DLFM server (T4). This occurs because the row inserted at T2 is not in the backup image taken at T1.

Figure 11-15 Restore of an offline backup

11.6 Restoring and rolling forward to a point in time

The steps for a restore and rollforward to a point in time with Data Links are:
1. Find the most recent backup taken prior to the point in time to which we will restore.
2. Restore the database.
3. Rollforward to the backup time to obtain the minimum rollforward time.
4. Rollforward to the minimum CUT time plus 5 minutes.
5. Reconcile.

Note: The point in time recovery must be to a Coordinated Universal Time (CUT).

Note: Make sure that log-retain has been set to ON and RECOVERY is set to YES for the corresponding DATALINK column.

We use the db2 list history backup all for db dlrestor command to produce the results shown in Figure 11-16. The point in time we restore to is a time greater than the end time of the backup. The time we use for the rollforward is 2001-05-15-11.54.07.

Figure 11-16 List history to find backup and point in time

We restore the database using information from the list history command in Figure 11-16. This restore is to a backup and the rollforward is what will make it a point in time recovery. We leave the WITHOUT ROLLING FORWARD clause off the restore command. The type of backup we use for the restore is shown in Figure 11-16. It shows “F” under Type, which means offline. If the Type is “N”, this shows that the backup is an online backup. Restoring to an online backup always requires a rollforward command. In our example, we rollforward to a point in time just slightly after the backup was taken. A fast reconcile will not take place because it will not run when the database is in the rollforward pending state. We must run the rollforward command. Figure 11-17 shows the restore and its output.

Figure 11-17 Restore with rolling forward and rollforward pending status

Figure 11-18 shows a method to derive the CUT time to be used in the rollforward command. We actually use the rollforward command with the backup timestamp to come up with the minimum CUT time for the rollforward command. This may be easier than trying to figure out how many hours to add or subtract from the time zone you are in. In this case, CUT time is 8 hours greater than the backup time.

Figure 11-18 Rollforward to obtain minimum CUT time

Using the CUT time obtained in Figure 11-18, we add 5 minutes to the time for the rollforward. The user has requested the restore to be “right before noon” on 15 May 2001. The time we use is 2001-05-15-19.59.07. The rollforward in Figure 11-19 places the table in Datalink Reconcile Pending (DRP) status.
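The arithmetic behind the rollforward target can be checked with a short Python sketch. It assumes the 8-hour difference between local time and CUT that applies in this example and the 5-minute margin used above; the function name and its parameters are hypothetical and are not part of any DB2 interface.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the rollforward command in this chapter.
DB2_TIMESTAMP = "%Y-%m-%d-%H.%M.%S"


def rollforward_target(backup_local, hours_to_cut, margin_minutes=5):
    """Convert a local backup timestamp to CUT and add a safety margin."""
    local = datetime.strptime(backup_local, DB2_TIMESTAMP)
    cut = local + timedelta(hours=hours_to_cut)
    target = cut + timedelta(minutes=margin_minutes)
    return target.strftime(DB2_TIMESTAMP)


# Backup timestamp from the list history output, 8 hours behind CUT.
print(rollforward_target("2001-05-15-11.54.07", 8))  # 2001-05-15-19.59.07
```

Letting the rollforward command report the minimum CUT time, as shown in Figure 11-18, avoids having to know the offset at all; the sketch is useful only as a cross-check.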

Figure 11-19 Rollforward and log messages

Figure 11-20 shows the message received when a select is issued to retrieve data from the table that is in DRP.

Figure 11-20 Select statement with warning message

Now we run the reconcile command. The reconcile command removes the table from DRP status and makes the table fully accessible. Figure 11-21 shows the reconcile command and its output.

Figure 11-21 Reconcile command and log messages

To restore and rollforward to a point in time, the following steps were taken:
1. Restore a database backup that was taken earlier than T1 (Figure 11-22).
2. Rollforward the database to a point-in-time (T2). A scenario like this is useful if you want to recover to a point before a damaged log or before unwanted data was inserted into the database.
3. When the rollforward is complete, the table is placed into Data Links Reconcile Pending (DRP) state (T3). This occurs because there could be files that are linked on the DLFM server that need to be unlinked or vice versa.
4. Run the reconcile utility to synchronize the DB2 Database with the DLFM database (T4).

Figure 11-22 Restore and rollforward to a point-in-time

11.7 Tablespace recovery

In the next example, we simulate losing the backup or archive Data Link files and losing one of the Data Link files from the Data Link file system. This situation is unlikely, but we want to illustrate what the reconcile command does in this case. The steps we illustrate are:
1. Delete the files in dlfm_backup.
2. Delete one linked file from /dldata2/sys_pics.
3. Restore tablespace userspace1.
4. Rollforward the tablespace online.
5. Use db2dart to show the Data Link reconcile pending status.
6. Run reconcile.
7. Display the report.exp file.

In Figure 11-23, as the root user, we delete all of the files in the dlfm_backup directory. We also delete the file pic2.bmp, which is linked to the database. Without a backup of pic2.bmp in the dlfm_backup directory, this data is not recoverable by DB2. We must take manual steps to recover.

Figure 11-23 Removing dlfm_backup files and removing a Data Linked file

We restore the tablespace that contains the table with Data Links. Notice that when we list the /tmp/dlreport file, there were no files unlinked during restore. Unlike the earlier restore of an offline backup without rolling forward, this type of restore does not perform fast reconciliation. After the restore, we run the rollforward and see the message about the table being in the DRP/DRNP state. Figure 11-24 shows the restore and rollforward.

Figure 11-24 Tablespace restore and rollforward

An alternative to checking the db2diag.log for tables in DRP/DRNP status is to run the db2dart utility. We issue db2dart dlrestor after making sure there are no connections to the database. Figure 11-25 shows that table db2inst1.resident is in Datalink Reconcile Pending status.

Figure 11-25 Using db2dart to see the table status of DRP

Before we run the reconcile command, we select rows from the table and receive a warning message. We cannot use insert, update, or delete on the table at this point. To make the table usable, we must run reconcile. Figure 11-26 shows a SELECT statement and the DRP status.

Note: The tablespace is placed in the backup pending state after a rollforward to a point in time. We are allowed to run a database backup while the table is in the DRP state.

Figure 11-26 Selecting the data before reconcile is run

The Reconcile utility finds two rows that are exceptions. The row that links pic2.bmp is an exception because the file was deleted from the file system and backup directory. Reconcile sets the DATALINK value to null. To recover this row, we must first restore the file /dldata2/sys_pics/pic2.bmp from the daily backup of this file system. Then we must do an update on the DATALINK column that links the file again. The file pic1.bmp is also an exception. Even though the file exists, the permissions and time do not match the DLFM meta data, and therefore, the DATALINK value is set to null.

Figure 11-27 shows the reconcile command and the output it produced.

Figure 11-27 Reconcile and the exceptions

Note: We suggest that you use an exception table when you run reconcile. The exception table makes it easier to rebuild the data than the reconcile report does. Figure 11-28 illustrates a CREATE TABLE statement for the exception table.

Figure 11-28 The ddl to create the exception table for reconcile

The information in Figure 11-29 shows a SELECT statement that displays information from the exception table. The documentation for the msg column can be obtained by referring to DB2 UDB Command Reference, SC09-2951, under the Reconcile command.

Figure 11-29 Information from the exception table for the reconcile

Figure 11-30 shows the rows that have had the value set to null in the DATALINK column.

Figure 11-30 Selecting the data after reconcile has run

The steps to use for a tablespace recovery are:
1. A linked file on the Data Links File System (DLFS) is lost (T1) (Figure 11-31).
2. Restore a file system backup (T2) taken prior to T1 on the DLFM server.
3. Restore a tablespace backup to a point-in-time on the DB2 server (T3).
4. Run the Reconcile utility to synchronize the DB2 database with the DLFM database (T5).

Figure 11-31 Tablespace recovery scenario

11.8 Recovering the dlfm_db to a point in time

This section shows how to recover the dlfm_db to a point in time. We inadvertently deleted all of the rows from the dfm_file table in the dlfm_db. We must restore and rollforward to the point in time just prior to our delete and then reconcile all of the databases that are defined to the dlfm_db. This highlights one of the problems with defining multiple databases to one DLFM. Any files linked after the point in time that we recover to will be set to null by reconcile. Files that were unlinked after the point in time we restore to will appear as linked in the dlfm_db meta data until we run reconcile.

Figure 11-32 illustrates the restore command. We must issue dlfm stop before the restore command can work. DLFM maintains persistent connections to the database dlfm_db.

Figure 11-32 Restore command and dlfm stop

Figure 11-33 shows the rollforward command. After we rollforward, we must start DLFM. For this, we issue the dlfm start command.

Figure 11-33 Rollforward and messages

The next step is to reconcile all of the databases defined to dlfm. To find out what these are, we run the dlfm list registered databases command. Figure 11-34 shows this command and the output.

Figure 11-34 The list registered databases output

Using the list we obtained from Figure 11-34, we run reconcile for each registered database. As the instance owner, we run db2_recon_aid in check mode to find out which tables have DATALINK columns. Figure 11-35 shows db2_recon_aid with the -check option and also without the -check option that runs the reconcile utility.

Note: In this recovery scenario, we run the Reconcile utility to make the meta data in dlfm_db reflect what is in the tables with Data Links. We did not restore the four databases that contained DATALINK columns. Any changes to those databases that were done between the time of the rollforward and the present time will not be reflected in the dlfm_db until reconcile is run.

Figure 11-35 The db2_recon_aid utility and output

The steps taken to recover the dlfm_db database to a point in time are:
1. At time T1 (Figure 11-36), the rows in the DFM_FILE table in the dlfm_db database are deleted.
2. At time T2, we restore a database backup of the dlfm_db.
3. At time T3, we rollforward the dlfm_db database to a point-in-time before the rows in the table were deleted at T1.
4. At time T4, we run the db2_recon_aid or Reconcile utility on the DB2 UDB database servers that reference the DLFM server that was recovered at time T2.

Figure 11-36 DLFM_DB database point-in-time recovery

Chapter 12. Garbage collection

This chapter describes the garbage collection process in a Data Links File Manager (DLFM) environment.

Garbage collection is a process by which DB2 monitors database backups in the database history file. It marks each backup as active, inactive, or expired, and reclaims expired database backups.

An active backup can be used to restore and rollforward through the current database logs to bring the database to the current state. An active backup is associated with the current log sequence and should be retained.

An inactive backup cannot be restored and rolled forward to reach the current state of the database because it requires a different set of log files. An inactive backup is associated with a previous log sequence or log chain and should be retained.

Note: DB2 Universal Database maintains transaction log chains or Log Sequence (LS). A log chain represents a life of the database defined by a unique set of transaction logs. All of the log records in these logs have been applied to the database.

A new log chain or LSN is created by:
- A database rollforward to a point in time
- A database restore without rolling forward

After a new transaction log chain is created, there is a new version of the transaction log files. Since there can be more than one version of a log file, DB2 Universal Database must keep track of which log files belong to which chain. We recommend you make backup copies of log files before a point-in-time recovery or a restore without a rollforward recovery.

All database backups that are no longer needed are marked as “expired”. A backup is no longer needed when the number of more recent database backups reaches the value of the NUM_DB_BACKUPS database configuration parameter. For example, if you have NUM_DB_BACKUPS set to four and have taken four backups, the “oldest” backup will be marked as “expired” when you take the fifth backup (Figure 12-1).

Figure 12-1 Expired database backups

For all backups that are marked as “expired”, the related linked file backups and the related meta data on the Data Links File Manager server can be deleted because they are no longer needed.

Each Data Links server has its own garbage collector. DB2 garbage collection monitors the number of DB2 database backups that are kept.

Note: The DB2 garbage collector does not delete the physical database backups. The garbage collector deletes Data Link file backups and related meta data.

The DB2 garbage collector is invoked after each:
- Backup
- Restore
- Drop database or tablespace
- Drop table

When a database backup is taken on the DB2 UDB database server that has tables with DATALINK columns, the Garbage Collector daemon (the process that performs the garbage collection) on the DLFM server is invoked.

When a database backup and its related file backups are ready to be deleted (ready to be garbage collected), the DB2 garbage collection routine marks the history file entries for the database backup, all associated table space backups, and all associated load backup copies as “expired”. The routine also notifies all Data Links servers to delete all the associated files unlinked before this backup.

After every full database backup, the REC_HIS_RETENTN database configuration parameter is used to prune expired entries (that is, to delete them) from the history file. If a backup is pruned that is not expired, all Data Links servers are contacted to garbage collect the corresponding set of file backups.

The PRUNE HISTORY command prunes only backups that are marked as expired from the history file unless the WITH FORCE OPTION is used. If a backup is pruned that is not expired, all Data Links servers are contacted to garbage collect the corresponding set of file backups. The PRUNE HISTORY command allows pruning of just backups (to include database, table space, load copy, and log). Entries marked as “expired” are pruned.

DB2 garbage collection is also invoked when a database backup is restored (with or without rolling forward). If an active database backup is restored, but it was not the most recent database backup recorded in the history file, any subsequent database backups that belong to the same log sequence are marked as inactive. If an inactive database backup is restored, any inactive database backups that belong to the current log sequence are changed back to active state. DB2 garbage collection then contacts all Data Links Servers to make the same status changes to the corresponding set of file backups.

12.2 Garbage collection scenario

The following diagrams give an example of how DB2 garbage collection works.

Assume that the current value of NUM_DB_BACKUPS is 4. That is, DB2 must retain up to four database backups associated with the current log chain or log sequence (Figure 12-2).

Figure 12-2 Four database backups are taken

At time t1, take database backup BK1. Log sequence is LSN1. At time t2, take database backup BK2. Log sequence is LSN1. At time t3, take database backup BK3. Log sequence is LSN1. At time t4, take database backup BK4. Log sequence is LSN1. There are now four active database backups for log sequence LSN1.

A new log sequence or log chain is created when an active database backup is restored (Figure 12-3).

Figure 12-3 Active database backup being restored

At time t5, restore active database backup BK2 and roll forward to a point before database backup BK3. This breaks the current log sequence LSN1 and starts log sequence LSN2. There are two active database backups associated with log sequence LSN2: BK1 and BK2. DB2 garbage collection marks database backups BK3 and BK4 as inactive (because they are in a previous log sequence).

All backups taken after a restore have the new log sequence (Figure 12-4).

Figure 12-4 Database backups taken with a new log sequence number

At time t6, take database backup BK5. Log sequence is LSN2. At time t7, take database backup BK6. Log sequence is LSN2. There are now four active database backups for log sequence LSN2.

The DB2 garbage collector marks a backup as expired when the backup is older than the oldest active backup (Figure 12-5).

Figure 12-5 Backup (BK1) is marked as expired

At time t8, take database backup BK7. Log sequence is LSN2. DB2 garbage collection marks database backup BK1 as expired (because it is older than the oldest active backup).

A new log sequence is created when an earlier backup is restored (Figure 12-6).

Figure 12-6 New log sequence created after restore of backup (BK6)

At time t9, restore active database backup BK6 and roll forward to a point before database backup BK7. This breaks the current log sequence LSN2 and starts log sequence LSN3. There are three active database backups associated with log sequence LSN3: BK2, BK5, and BK6.

DB2 garbage collection marks database backup BK7 as inactive (because it is not in the active chain and is in a previous log sequence).

Backup (BK2) is marked as expired by the DB2 garbage collector when two additional backups are taken (Figure 12-7).

Figure 12-7 Garbage collection marks backup BK2 as expired

At time t10, take database backup BK8. Log sequence is LSN3. At time t11, take database backup BK9. Log sequence is LSN3. DB2 garbage collection marks database backup BK2 as expired. There are now four active database backups for log sequence LSN3: BK5, BK6, BK8, and BK9.

When a database backup falls out of the log sequence or active chain and becomes expired, all other backups before this database backup must then become expired too (Figure 12-8).

Figure 12-8 All backups prior to and including BK5 are marked as expired

At time t12, take database backup BK10 on log sequence LSN3. DB2 garbage collection marks the following database backups as expired: BK5, BK3, and BK4 (because BK5 falls out of the log sequence or active chain and becomes expired, all other backups before BK5 must then become expired too).
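The scenario from t1 to t12 can be replayed with a small Python model. This is a simplified sketch of the rules as they are described in this chapter, not DB2's actual garbage collection code; the class and method names are hypothetical, and restoring an inactive backup is deliberately not modeled.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    name: str
    log_seq: int
    state: str = "active"  # "active", "inactive", or "expired"


class BackupHistory:
    """Toy model of backup states in the history file."""

    def __init__(self, num_db_backups):
        self.num_db_backups = num_db_backups
        self.log_seq = 1
        self.backups = []  # kept in chronological order

    def take_backup(self, name):
        self.backups.append(Backup(name, self.log_seq))
        active = [b for b in self.backups if b.state == "active"]
        while len(active) > self.num_db_backups:
            oldest = active.pop(0)
            # The oldest active backup expires, and so does every older backup.
            cutoff = self.backups.index(oldest)
            for b in self.backups[: cutoff + 1]:
                b.state = "expired"

    def restore_active(self, name):
        """Restore an active backup and roll forward to before the next one."""
        idx = [b.name for b in self.backups].index(name)
        if self.backups[idx].state != "active":
            raise ValueError(f"{name} is not an active backup")
        # Later backups on the broken log sequence become inactive.
        for b in self.backups[idx + 1:]:
            if b.state == "active":
                b.state = "inactive"
        self.log_seq += 1  # the restore starts a new log sequence

    def states(self):
        return {b.name: b.state for b in self.backups}


history = BackupHistory(num_db_backups=4)
for name in ("BK1", "BK2", "BK3", "BK4"):   # t1 to t4, LSN1
    history.take_backup(name)
history.restore_active("BK2")               # t5, starts LSN2
for name in ("BK5", "BK6", "BK7"):          # t6 to t8
    history.take_backup(name)
history.restore_active("BK6")               # t9, starts LSN3
for name in ("BK8", "BK9", "BK10"):         # t10 to t12
    history.take_backup(name)
print(history.states())
```

After the last backup, the model leaves BK1 through BK5 expired, BK7 inactive, and BK6, BK8, BK9, and BK10 active, which matches the states described for Figure 12-8.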

Inactive database backups may become active because the backups are retained (Figure 12-9).

Figure 12-9 Inactive databases may become active because they are retained

In Figure 12-8, the database backup BK10 was taken. But let’s consider a different scenario here.

At time t12, we are not taking a backup, but we restore inactive database backup BK4 and rollforward to a point in time past the end of database backup BK5. DB2 garbage collection will mark database backups BK3 and BK4 as active and database backups BK5, BK6, BK7, BK8, and BK9 as inactive. There will be two active database backups in the new log sequence LSN4: BK3 and BK4. Notice that only inactive backups may become active because they are retained.

Chapter 13. Migrating to DB2 UDB Version 7

This chapter describes the process of migrating existing DB2 Universal Database Version 5.x and DB2 Universal Database Version 6.x databases to DB2 Universal Database V7.x in a Data Links Manager environment. Moving from Version 7.1 to Version 7.2 on the database server or the Data Links Manager server is not considered a migration, but rather an upgrade.

Before you attempt a migration, consider the following points:
- Data Links File Manager can be migrated to the current release on the AIX and Windows NT platforms.
- The Solaris version of Data Links Manager has only been made generally available with Version 7.1.
- The Windows version of Data Links Manager has only been made generally available with Version 6.1.
- The DB2 database server must have exactly the same fixpack level as the Data Links File Manager (DLFM) components.

There are two methods by which a migration can be performed:
- Migration of the UDB database server and Data Links server using the instance migration scripts (db2imigr) and the migrate database command
- Migration of the databases by using an offline DB2 database backup

13.1.1 DB2IMIGR and MIGRATE database commands

Migration to Version 7.x in a Data Links File Manager environment requires you to perform the following steps for each DB2 Universal Database server and Data Links Manager instance:
- Database instance and database migration on the DB2 UDB database server
- Data Links instance and database migration on the Data Links server

The steps in the following sections are required to perform a successful migration of the DB2 UDB database server and the Data Links File Manager (DLFM) server. In our example, we migrate a Version 5.x DB2 UDB database server and DLFM server to Version 7.2.

Migrating the DB2 UDB V5.x Database Server (AIX)

Perform the following steps to migrate a DB2 UDB Version 5.x database server to Version 7.2:
1. Log on to the DB2 UDB database server as the db2 instance owner.
2. To ensure that you are attached to the instance that contains the database that has tables with DATALINK columns, issue the command:
   db2 get instance
   In our example, the db2 get instance command returns db2inst1.

Note: You can issue the db2ilist command to list the names of all the instances on the DB2 UDB database server. In our example, there are two instances, namely db2inst1 and target. We do not have to migrate the target instance at this time.

3. To ensure that there are no applications connected to the database that you want to migrate, issue the command:
   db2 list applications
   If all applications are disconnected from the database, the following warning should be returned:

   SQL1611W No data was returned by the Database System Monitor SQLSTATE=00000
4. Issue the db2stop command and run the db2dart utility against the database to be migrated in inspection mode. On our system, we invoked the utility as follows:
   db2dart sample /db
   Here sample is the name of the database and /db is an argument to inspect the entire database. The db2dart utility can be found in /instancehome/sqllib/adm. The db2dart utility generates a report file and an error file. The report file is generated in the path in which the db2dart utility is executed and follows the dbname.rpt naming convention.

Note: The db2dart utility verifies that the architectural integrity of the database is correct. For example, this tool confirms that:
– The control information is correct.
– There are no discrepancies in the format of the data.
– The data pages are the correct size and contain the correct column types.
– Indexes are valid.

It is important that you run the db2dart utility against the database while there are no connections to the database.

In our example, our report file is called SAMPLE.RPT. Once the database inspection is complete, open the report file with an editor and examine the contents. The bottom of the report should contain the entry shown in Figure 13-1 if no problems were found in the database.

Figure 13-1 DB2DART utility output reporting no errors

5. Start the database manager instance with the db2start command.
6. Take an offline backup of the database that will be migrated. In our example, we take offline database backups to disk:
   db2 backup database sample to /dbbackups
   The migration process does not migrate database transaction logs.
7. Install the DB2 Universal Database EE Version 7.x software. In our example, we installed DB2 Universal Database Version 7.2. For details on installing on the AIX platform, refer to IBM DB2 UDB for UNIX Quick Beginnings, GC09-2970.
8. Log on as the DB2 UDB Version 5.x database instance owner and ensure that the database manager is stopped. Issue the db2stop command if necessary. Change to the /usr/lpp/db2_07_01/bin directory.
9. Execute the db2ckmig utility to verify that the database can be migrated. Do not execute the utility as root. Usage notes for the utility can be found by running db2ckmig with no arguments. We run db2ckmig on the sample database (Figure 13-2).

Figure 13-2 Verifying that the database can be migrated with the db2ckmig utility

The migrate.log file in our example is empty since the db2ckmig utility completed without any errors.
10. Log on as a user with root authority.
11. Execute the db2imigr utility to migrate the Version 5.x instance to a Version 7.x instance. The db2imigr utility can be found in the /usr/lpp/db2_07_01/instance directory. Usage notes for the utility can be displayed by running the db2imigr utility without any arguments. We run db2imigr on the db2inst1 instance (Figure 13-3):
   db2imigr -u db2fenc1 db2inst1

Figure 13-3 Instance migration using the db2imigr utility

12. Log on as the instance owner on the DB2 UDB database server. Any attempt to connect to the database returns the SQL5035N error message (Figure 13-4).

Figure 13-4 Connecting to a database that requires migration

13. Migrate the database using the migrate database command. We run migrate database on our sample database:
   db2 migrate database sample
   A successful migration results in the message shown in Figure 13-5.

Figure 13-5 Successful migration of the database using the migrate command

14. Update the database manager configuration parameter to enable Data Links functionality:
   db2 update database manager configuration using datalinks yes

Note: In Version 5.x, the DATALINKS configuration parameter was exported as an environment variable. In Version 6.x and later, the DATALINKS configuration parameter is incorporated in the database manager configuration file.
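As a recap, the commands named in steps 2 through 14 can be collected into a dry-run plan. The sketch below only prints the commands in the order the steps give them; it executes nothing. The variable names and the print_plan function are ours, the values mirror the example in this section, and the db2ckmig invocation is left out because its arguments appear only in Figure 13-2.

```shell
# Dry-run recap of the DB2 UDB V5.x server migration commands (AIX).
# The values below mirror the example in this section; adjust them for your system.
DB=sample
INSTANCE=db2inst1
FENCED_USER=db2fenc1
BACKUP_DIR=/dbbackups

# print_plan (a hypothetical helper) prints one command per line and runs nothing.
# db2imigr must be run as root; the other commands are run as the instance owner.
print_plan() {
  cat <<EOF
db2 get instance
db2 list applications
db2stop
db2dart $DB /db
db2start
db2 backup database $DB to $BACKUP_DIR
db2stop
db2imigr -u $FENCED_USER $INSTANCE
db2 migrate database $DB
db2 update database manager configuration using datalinks yes
EOF
}

print_plan
```

Printing the plan first makes it easy to review the sequence against the numbered steps before anything is run against a production instance.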

Migrating the V5.x Data Links File Manager (AIX)

Complete the following steps to migrate the V5.x Data Links File Manager:
1. Log on to the Data Links File Manager server as a user with root authority. Install the DB2 Data Links File Manager Version 7.x software. In our example, we install DB2 Universal Database Version 7.2. For details on installing on the AIX platform, refer to DB2 Data Links Manager Quick Beginnings, GC09-2966.
2. Log on as the Data Links Administrator on the Version 5.x Data Links File Manager instance.
3. To ensure that there are no applications connected to the Data Links File Manager database, issue the command:
   db2 list applications
   If all applications are disconnected from the database, the following warning should be returned:
   SQL1611W No data was returned by the Database System Monitor SQLSTATE=00000

4. Issue the db2stop command and run the db2dart utility against the database to be migrated in inspection mode. On our system, we invoked the utility as:
   db2dart dlfm_db /db
   Here dlfm_db is the name of the database, and /db is an argument to inspect the entire database. The db2dart utility can be found in /instancehome/sqllib/adm. The db2dart utility generates a report file and an error file. The report file is generated in the path in which the db2dart utility was executed and follows the dbname.rpt naming convention. Ensure that there are no errors by examining the db2dart report.
5. Take an offline backup of the Data Links File Manager database (dlfm_db). In our example, we take an offline backup to disk:
   db2 backup db dlfm_db to /backups
6. Stop the Data Links File Manager with the dlfm_shutdown command.
7. Change to the /usr/lpp/db2_07_01/bin directory. Execute the db2ckmig utility to verify that the database can be migrated. Do not run the utility as root. Usage notes for the utility can be found by running db2ckmig with no arguments. We run db2ckmig on the dlfm_db database (Figure 13-6).

Figure 13-6 Verifying that the database can be migrated with the db2ckmig utility

8. Issue a dlfm_see command to ensure that the Data Links File Manager is stopped.
9. Log on as a user with root authority. Unmount the dlfs file system that is under the control of the Data Links File System Filter. In our example, we issue the umount /v5data command.
10. As a user with root authority, execute the db2imigr utility to migrate the Data Links File Manager instance to Version 7.2 (Figure 13-7). The db2imigr utility can be found in the /usr/lpp/db2_07_01/instance directory.

Figure 13-7 Instance migration using the db2imigr utility

11. Log on as the Data Links File Manager Administrator and migrate the dlfm_db database:
   db2 migrate database dlfm_db
12. Execute the db2dlmmg utility from the /usr/lpp/db2_07_01/adm/ directory to start the DLFM migration (Figure 13-8). The db2dlmmg utility:
   – Binds the migration package.
   – Backs up the DLFM_DB database.

Figure 13-8 Successful migration of the DLFM instance

13. Issue the db2set command to determine whether the environment variables from Version 5.x were converted to registry variables in Version 7.2. In our example, we received the output shown in Figure 13-9.

Figure 13-9 Output of the db2set command

14. In our example, we set two additional registry variables that were not set by the migration utility:
   db2set DLFM_BACKUP_TARGET=LOCAL
   db2set FS_ENVIRONMENT=NATIVE
   You may need to set these registry variables as well.
15. Log on as a user with root authority, and ensure that the Data Links File System Filter is loaded. In our example, we queried the driver by executing the strload command:
   strload -q -f /usr/lpp/db2_07_01/cfg/dlfs_cfg
   If the Data Links File System Filter is not loaded, load it by running strload -f with the same configuration file. (On AIX, the -u flag of strload unloads a driver rather than loading it.)
16. As root, mount the file system that is to be under the control of the Data Links File System Filter. In our example, we issued:
   mount -v dlfs /v5data

17. Log on as the Data Links File Manager Administrator and issue the command:
   dlfm start
18. Verify that the Data Links File Manager is running with the dlfm see command.

The migration of the Data Links File Manager is now complete.
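The registry check described for the db2set command can be partly automated. The following is a minimal sketch, assuming that db2set prints one VARIABLE=value pair per line; the function name missing_registry_vars and the sample capture are ours and purely illustrative, so compare against the real output on your system.

```shell
# missing_registry_vars OUTPUT VAR...
# Prints each named variable that does not appear as VAR=... in OUTPUT,
# where OUTPUT is text captured from the db2set command.
missing_registry_vars() {
  output=$1
  shift
  for var in "$@"; do
    # A variable counts as set only if a line starts with "VAR=".
    printf '%s\n' "$output" | grep -q "^$var=" || printf '%s\n' "$var"
  done
}

# Hypothetical capture; on a real system use: captured=$(db2set)
captured="DB2COMM=tcpip"
missing_registry_vars "$captured" DLFM_BACKUP_TARGET FS_ENVIRONMENT
```

Any name printed by the function is a candidate for a db2set VARIABLE=value command such as the two shown for DLFM_BACKUP_TARGET and FS_ENVIRONMENT.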

13.1.2 Migrating the DB2 UDB V6.x database server

In our example, we migrate the DB2 UDB database server and the Data Links File Manager from Version 6.1 to Version 7.2. On the Windows platform, Data Links File Manager became generally available (GA) in Version 6.1.

Note: Version levels of DB2 Data Links and DB2 Universal Database can be any combination of Version 6.1 and Version 7.x. For example, DB2 Universal Database can be at Version 6.1 and Data Links Manager can be at Version 7.2.

While this configuration is supported, we recommend that you keep both the Data Links File Manager and the DB2 Universal Database at the same release level and fixpack level. The advantages of being on the same release and fixpack level are:
– Products at the same release provide the same functionality.
– Fixpacks are release dependent, and having the DB2 Universal Database and Data Links Manager at the same release makes upgrades easier.
– Support for an earlier release may be discontinued.
– Troubleshooting problems becomes easier if the DB2 Universal Database and the Data Links Manager are at the same release and fixpack level.

Complete these steps:
1. Log on to the DB2 UDB database server as the db2 instance owner.
2. To ensure that you are attached to the instance that contains the database that has tables with data link columns, issue the command:
   db2 get instance
   In our example, the db2 get instance command returns db2inst2.

Note: You can issue the db2ilist command to list the names of all the instances on the DB2 UDB database server. In our example, there are two instances, namely db2inst2 and target. We do not have to migrate the target instance at this time.

3. To ensure that there are no applications connected to the database that we want to migrate, issue the following command:
   db2 list applications
   If all applications are disconnected from the database, the following warning should be returned:
   SQL1611W No data was returned by the Database System Monitor SQLSTATE=00000
4. Issue the db2stop command and run the db2dart utility against the database to be migrated in inspection mode. On our system, we invoked the utility as:
   db2dart sample /db
   Here sample is the name of the database, and /db is an argument to inspect the entire database. The db2dart utility can be found in drive:\path\sqllib\bin. The db2dart utility generates a report file and an error file. The report file is generated in the path in which the db2dart utility was executed and follows the dbname.rpt naming convention.

It is important that you run the db2dart utility against the database while there are no connections to the database.

Figure 13-10 DB2DART utility output reporting no errors

5. Start the database manager instance with the db2start command.
6. Take an offline backup of the database that will be migrated. In our example, we take offline database backups to disk:
   db2 backup database sample to c:\dbbackups
   The migration process renames the current active logs with the *.MIG extension. For example, SQL00001.LOG is renamed to SQL00001.MIG.
7. Stop the DB2 database manager by issuing the command:
   db2stop
8. End the DB2 license daemon by entering the command:
   db2licd -end
9. Stop the administration server, if installed, by entering the command:
   db2admin stop
10. Stop DB2 Services (Figure 13-11).

Figure 13-11 Stopping DB2 Services on Windows NT

11. Execute the db2ckmig utility from the Version 7.x CD-ROM to verify that the database can be migrated. The utility can be found in the drive:\db2\common directory. You can find the usage notes for the utility by running db2ckmig with no arguments. We run db2ckmig on the sample database (Figure 13-12).

Figure 13-12 Verifying that the database can be migrated with the db2ckmig utility

The migrate.log file in our example is empty since the db2ckmig utility completed without any errors.
12. Install the DB2 Universal Database EE Version 7.x software. In our example, we installed DB2 Universal Database Version 7.2. For details on installing on the Windows platform, refer to IBM DB2 UDB for Windows Quick Beginnings, GC09-2971.

Note: Windows allows only one version of DB2 to be installed on a machine. For example, if you have DB2 Version 6.x and install Version 7.x, Version 6 will be deleted during the installation.

13. Once the installation is complete, log in as a user that has SYSADM authority.
14. To verify that the database that will be migrated is cataloged, issue the command:
   db2 list db directory
15. Migrate the database using the db2 migrate database command. In our example, we migrate the sample database:
   db2 migrate database sample

Migrating the V6.x Data Links File Manager (Windows NT)

This section explains the process for migrating the V6.x Data Links File Manager on Windows NT:
1. Log on to the Data Links File Manager server as a user with root authority. Install the DB2 Data Links File Manager Version 7.x software. In our example, we install DB2 Universal Database Version 7.2. For details on installing on the Windows NT platform, refer to DB2 Data Links Manager Quick Beginnings, GC09-2966.
2. Log on as the Data Links Administrator on the Version 6.x Data Links File Manager instance.
3. To ensure that there are no applications connected to the Data Links File Manager database, issue the command:
   db2 list applications
   If all applications are disconnected from the database, the following warning should be returned:
   SQL1611W No data was returned by the Database System Monitor SQLSTATE=00000
4. Take an offline backup of the Data Links File Manager database (dlfm_db). In our example, we take an offline backup to disk:
   db2 backup db dlfm_db to /backups
5. Stop the Data Links File Manager with the dlfm_shutdown command.
6. Issue a dlfm_see command to ensure that the Data Links File Manager is stopped.
7. Log on as a user with root authority and execute the db2dlmmg utility from the /usr/lpp/db2_07_01/adm/ directory to start the DLFM migration. The db2dlmmg utility:
   – Binds the migration package.
   – Backs up the DLFM_DB database.
   – Migrates the instance or database.

13.1.3 Migrating databases using an offline backup

The second method of migrating the DB2 UDB database server and the Data Links File Manager server is by means of a database backup. DB2 Universal Database supports restoring a backup taken up to two releases before the current release. For example, a Version 5.x or Version 6.x database backup can be restored into a Version 7.x instance. The steps described in this section apply to both the AIX and Windows platforms.

In our example, we restore a backup taken at DB2 Universal Database V5.2 into a DB2 Universal Database Version 7.2 instance. This functionality is made possible by the fact that the database engine on the instance that the database is being restored into migrates the database. The database has to be migrated during the restore before it can be used. The migration is done automatically when the database engine determines that the backup image being used to restore is from an earlier release of DB2 Universal Database.

While it is possible to restore a backup into a more current release of DB2 Universal Database, the converse is not possible. For example, you can restore a Version 5.x database into a Version 7.x instance but cannot restore a Version 7.x database into a Version 5.x or Version 6.x instance. One of the reasons for this limitation is the set of changes that take place in the system catalog tables during the migration to a later release.

The following steps outline the process of migrating the DB2 UDB database server and the Data Links File Manager database using backups taken at an earlier release. These steps assume that the DB2 UDB database server and the Data Links File Manager instances were already migrated using the db2imigr and the db2dlmmg utilities.

On the Data Links File Manager Server

Complete these steps:
1. Log on to the DB2 Data Links File Manager server as the Data Links administrator.
2. To determine if there are any applications connected to the database that will be migrated, issue the command:
   db2 list applications
   Terminate any applications normally.
3. Take an offline backup of the database (you may need to issue a dlfm stop command). In our example, we make database backups to disk. You may back up to ADSM/TSM or a vendor device.
   db2 backup database dlfm_db to /datalink/dlfm/dlfm_backup

Note: The migration of databases using database backups requires the database backups to be offline. Online backups require the database transaction logs to be rolled forward at the completion of the restore. Database transaction logs are not migrated and, as a result, cannot be used to roll forward after the restore in the new database instance.

4. Verify that the backup has completed successfully by examining the database history file. This step is important since the backup image will be used to create the database on Version 7.x. The database history file can be examined by issuing:
   db2 list history all for <database-alias>
   The output should be similar to the example shown in Figure 13-13.

Figure 13-13 Extract of a recovery history file

5. Install the DB2 Universal Database Version 7.x code and create an instance.
6. Log on as the new DB2 Universal Database Version 7.x instance owner.
7. Restore the backup taken in step 3. In our example, we issued:
   db2 restore db <database-alias> taken at 20010511100146 from /datalink/dlfm/dlfm_backup
8. The command returns an SQL2539W warning message since dlfm_db from Version 5.2 still exists on the system. We choose to overwrite the existing dlfm_db database (Figure 13-14).

Figure 13-14 Restoring into an existing database

Note: If the LOGRETAIN configuration parameter in the database configuration file was set to Yes/No when the offline database backup was taken, you must issue an additional command:
   db2 rollforward db <database-alias> stop

This is needed to remove the database from rollforward pending state.

9. Since we had LOGRETAIN in the database configuration file turned on when the backup was taken, in our example, we issued:
   db2 rollforward db dlfm_db stop

10. Once the restore is completed, issue the command:
   db2 list database directory
   Notice the release level.
11. Start the DLFM server with the dlfm start command.
12. Verify that the DB2 UDB database is registered on the DLFM server:
   dlfm list registered databases
13. Issue the command:
   db2 connect to <database-alias>
14. Issue the command:
   db2 list tables

At this point, the migration is complete on the DLFM server side.

On the DB2 UDB database server

Complete the following steps on the DB2 UDB database server:
1. Log on to the DB2 UDB database server as the instance owner.
2. To determine if there are any applications connected to the database that will be migrated, issue the command:
   db2 list applications
   Terminate any applications normally.
3. Take an offline backup of the database. In our example, we make database backups to disk. You may back up to ADSM/TSM or a vendor device.
   db2 backup database dltest to /dbbackup
   Verify that the backup has completed successfully by examining the database history file. This step is important since the backup image will be used to create the database on Version 7.x. The database history file can be examined by issuing:
   db2 list history all for <database-alias>
4. Install the DB2 UDB Version 7.x code and create an instance.
5. Log on as the new DB2 UDB Version 7.x instance owner.
6. Verify that the DLFM server is registered:
   db2 list datalinks managers for database dltest
7. Restore the backup taken in step 3. In our example, we issued:
   db2 restore db dltest taken at 20010511100146 from /datalink/dlfm/dlfm_backup without datalink

   The command returns an SQL2539W warning message since the dltest database from Version 5.2 still exists on the system. We choose to overwrite the existing dltest database.
8. Once the restore is completed, issue the following command:
   db2 list database directory
   Notice the release level. Since we had LOGRETAIN in the database configuration file turned on when the backup was taken, in our example, we issued:
   db2 rollforward db dltest stop
   See Figure 13-15.

Figure 13-15 Rollforward completing with a warning

9. Issue the command:
   db2 connect to <database-alias>
10. Issue the command:
   db2 list tables
11. Execute the db2_recon_aid utility with the check option to determine which tables may need to be reconciled. In our example, we issued:
   db2_recon_aid -db dltest -check
12. Run the Reconcile utility. For more information on the Reconcile utility, see 10.1, “Overview” on page 192.

At this point, the migration is complete.


Chapter 14. Moving a Data Links file system to a new disk

On rare occasions, you may need to migrate files under Data Links control from one storage disk to another. This chapter describes the steps involved and offers performance tips that can save considerable time.

The following two scenarios are considered (on AIX and Solaris):
– Moving entire file systems to a different disk in the same machine. The current disk remains connected to the machine.
– Replacing the existing disk with a new one and, therefore, moving all the DLFS-enabled file systems to the new disk.

Let us assume that you need to migrate a DLFS-enabled file system named /dlfsfs to a new disk configured on the machine, and that the logical volume containing the /dlfsfs file system is /dev/dlfslv. The following steps describe the entire process:
1. Stop the Data Links File Manager:
   dlfm stop
2. Switch to the super user ID (root on AIX, Solaris).
3. Get the File System IDentifier (FSID) of the file system to be migrated (/dlfsfs). For example, get the major and minor number of the device that is mounted on /dlfsfs:
   ls -l /dev/dlfslv | awk '{print $5,$6}'
   Let us assume that /dev/dlfslv has:
   – Major number = 10 (000a in hex)
   – Minor number = 9 (0009 in hex)
   The above command would then produce the following output:
   10, 9
   The FSID value is 000a0009 (the major and minor numbers appended together), or 655369 in decimal.
4. Unmount the file system to be migrated (/dlfsfs):
   umount /dlfsfs
5. Use the dd command to copy the contents from the old logical volume to the new logical volume. This command preserves the inode values of the files and, therefore, minimizes the time required for a final reconcile on the DB2 UDB table that has the DATALINK column. The following two cases are possible:
   – If you are going to replace the old disk with a new one:
     i. Copy the old logical volume to a tape:
        dd if=/dev/dlfslv of=/dev/rmt0 bs=512b
     ii. Repeat the same procedure if you want to migrate more than one DLFS-enabled file system.
     iii. Replace the old disk with the new one. Configure it for its standard configuration. Create a new logical volume (/dev/newdlfslv) on the new disk. Note that the new logical volume must be at least as large as the old logical volume that was mounted on /dlfsfs.

Tip: We recommend that you keep the FSID of the new logical volume the same as that of the current logical volume, because doing so reduces the time taken by the Reconcile utility.

Keeping the FSID same: In AIX, the FSID is an integer whose first 16 bits represent the major number of the volume group and the last 16 bits represent the minor number of the logical volume. To keep the FSID of the new logical volume same as that of the old logical volume, the new disk should be the part of the same volume group, and the minor number of the new logical volume should be maintained too. There is no way of explicitly specifying the minor number of the logical volume on AIX. The system assigns the lowest number available to the new logical volume, under that major number (for example, the volume group). Therefore, to keep the minor number same, after the physical volume representing the new disk is added to the same volume group as of the old disk, delete the old logical volume. This results in freeing the minor number that corresponds to the old logical volume (from which the Data Links files have to be copied). Now create the new logical volume (under the new physical volume but the old volume group). This results in the system assigning the smallest free minor number, which should be the minor number of the old logical volume just freed.

Note: It is possible (although rare) that a smaller minor number is available under the same volume group. In that case, when the new logical volume is created, it is assigned this smaller minor number and not the minor number of the current logical volume. Such a hole can exist only if a previously defined logical volume was deleted.

     iv. Copy the contents of the tape to the new logical volume (/dev/newdlfslv):
        dd if=/dev/rmt0 of=/dev/newdlfslv bs=512b
   – If both the disks are connected to the machine:
     i. Create a new logical volume (/dev/newdlfslv) on the new disk. Note that the new logical volume must be at least as large as the old logical volume that was mounted on /dlfsfs.

Note: For better performance, the FSID of the logical volume should be maintained. See “Keeping the FSID same:” on page 261.

     ii. Directly copy the contents from the old logical volume to the new one:
        dd if=/dev/dlfslv of=/dev/newdlfslv
6. Change the file system entry for /dlfsfs in the file /etc/filesystems. Change the device name from the old logical volume name (/dev/dlfslv) to the new logical volume name (/dev/newdlfslv). This keeps the file system mount point the same, so you do not have to change the file system name in the dlfm tables.
7. Mount the file system as DLFS enabled. This time the new logical device (/dev/newdlfslv) is mounted on /dlfsfs:
   mount -v dlfs /dlfsfs
8. Get the FSID of the file system (/dlfsfs). It will have changed if the new logical volume has a different major or minor number than the old logical volume. Refer to step 3 to get the FSID of the file system.
9. If your new logical volume has a different major or minor number than the old one (refer to step 5), you need to change the FSID entry in the DFM_DIR table in the DLFM_DB database. Otherwise skip this step. The following commands serve this purpose:
   db2 connect to dlfm_db
   db2 "update dfm_dir set fsid=<new-FSID> where fsid=<old-FSID>"
   db2 commit
10. Start the Data Links File Manager (DLFM):
   dlfm start
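The FSID arithmetic from step 3 can be sketched as a small shell function. It places the major number in the upper 16 bits and the minor number in the lower 16 bits, as described for AIX in this chapter. The function name fsid_from_dev is ours, and which of the two representations the DLFM tables store is not stated here, so compare the result with the existing rows before updating anything.

```shell
# fsid_from_dev MAJOR MINOR
# Prints the FSID as eight hex digits followed by its decimal value.
fsid_from_dev() {
  major=$1
  minor=$2
  # Major number in the upper 16 bits, minor number in the lower 16 bits.
  printf '%04x%04x %d\n' "$major" "$minor" "$(( (major << 16) | minor ))"
}

fsid_from_dev 10 9   # prints: 000a0009 655369
```

Running the function before and after the copy, with the major and minor numbers reported by ls -l for the old and new devices, shows at a glance whether the FSID update in step 9 is needed.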

14.2 Migrating a DLFS-enabled file system (Solaris)

Let us assume that you are going to migrate a DLFS-enabled file system named /dlfsfs, residing on the disk slice /dev/dsk/c0t0d0s5, to a new disk configured on the machine. Here are the steps to do this:
1. Stop the Data Links File Manager:
   dlfm stop
2. Switch to the super user ID (root on AIX, Solaris).
3. Get the FSID of the file system to be migrated (/dlfsfs):
   df -g /dlfsfs | grep filesys | awk '{print $4}'

4. Unmount the file system to be migrated (/dlfsfs):
   umount /dlfsfs
5. Run the dd command to copy the contents from the old disk slice to the new disk slice:
   – If you are going to replace the old disk with a new one:
     i. Copy the contents of the old disk slice to a tape:
        dd if=/dev/dsk/c0t0d0s5 of=/dev/rmt0 bs=512b
     ii. Repeat the same procedure if you want to migrate more than one DLFS-enabled file system.
     iii. Replace the old disk with the new one. Configure it for its standard configuration. Configure a new disk slice (/dev/dsk/c0t8d0s5) on the new disk. Note that the new disk slice must be at least as large as the old disk slice that was mounted on /dlfsfs.

Note: We recommend that you keep the major number and minor number of the new disk slice the same as those of the old disk slice. This considerably reduces the migration time.

     iv. Copy the contents of the tape to the new disk slice (/dev/dsk/c0t8d0s5):
        dd if=/dev/rmt0 of=/dev/dsk/c0t8d0s5 bs=512b
   – If both the disks are connected to the machine, directly copy the contents from the old disk slice to the new one:
        dd if=/dev/dsk/c0t0d0s5 of=/dev/dsk/c0t8d0s5
6. Change the file system entry for /dlfsfs in the file /etc/vfstab. Change the device name from the old disk slice name (/dev/dsk/c0t0d0s5) to the new disk slice name (/dev/dsk/c0t8d0s5). This keeps the file system mount point the same, so you do not have to change the file system name in the dlfm tables.
7. Mount the file system as DLFS enabled. This time the new disk slice (/dev/dsk/c0t8d0s5) is mounted on /dlfsfs:
   mount /dlfsfs
8. Get the FSID of the file system (/dlfsfs). It will have changed if the new disk slice has a different major or minor number than the old one. Refer to step 3 to get the FSID of the file system.
9. If your new disk slice has a different major or minor number than the old one (refer to step 5), you need to change the FSID entries in the dlfm tables (DFM_DIR and DFM_FILE). Otherwise skip this step. The following commands serve this purpose:

   db2 connect to dlfm_db
   db2 "update dfm_dir set fsid=<new-FSID> where fsid=<old-FSID>"
   db2 "update dfm_file set fsid=<new-FSID> where fsid=<old-FSID>"
   db2 commit
10. Start the Data Links File Manager:
   dlfm start
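Because the same FSID substitution is applied to two tables, it can help to generate the commands from the old and new values rather than typing them by hand. The sketch below only prints the commands for review; the function name fsid_update_cmds is ours, the FSID values in the usage line are hypothetical, and whether the fsid column expects quoted values is an assumption to verify against the DLFM_DB schema.

```shell
# fsid_update_cmds OLD_FSID NEW_FSID
# Prints (does not run) the db2 commands that replace OLD_FSID with NEW_FSID
# in the DFM_DIR and DFM_FILE tables named in step 9.
fsid_update_cmds() {
  old=$1
  new=$2
  echo "db2 connect to dlfm_db"
  for table in dfm_dir dfm_file; do
    echo "db2 \"update $table set fsid=$new where fsid=$old\""
  done
  echo "db2 commit"
}

# Hypothetical old and new FSID values, for illustration only.
fsid_update_cmds 655369 655370
```

Reviewing the printed commands before running them guards against swapping the old and new values, which would silently match no rows.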


Chapter 15. Replacing or upgrading a machine

This chapter takes you through the steps that are required to replace or upgrade a machine that has DB2 Universal Database or Data Links File Manager installed.

Occasionally, for performance reasons or because of new usage requirements, customers need to upgrade their DB2 system. Usually, this is done by either:
– Replacing pieces of hardware (for example, CPU or memory), in which case both the hostname and the IP address remain unchanged
– Moving the DB2 server to another machine, in which case both the IP address and the hostname change

The first scenario is straightforward because it does not involve any data movement. The second scenario, which involves moving DB2 data from one machine to another, is more complicated. It is even more complicated if the DB2 server is connected to several Data Links File Managers (DLFMs), because each Data Links File Manager stores metadata about the location of the DB2 server. Currently there is no external command or tool that updates this metadata.

This section takes you through the procedure to replace the DB2 UDB database server that has connections to DLFMs.

15.1.1 Assumption

You are replacing or upgrading a DB2 UDB database server machine that has files linked to several DLFMs. The DLFMs remain untouched, but the IP address or hostname of the new DB2 machine is different. Assume that the hostname of the old DB2 machine is OLDHOST and the hostname of the new machine is NEWHOST.

15.1.2 Steps to perform

Perform the following steps:
1. Make sure that there is no database activity.
2. Take an offline backup of the original DB2 UDB database.
3. Copy the database backup files to the new machine.
4. Copy the datalink.cfg file as well as the datalink.cfg.BAK file from the database directory of the original DB2 database. You can locate the database directory by using the command:
   DB2 LIST DATABASE DIRECTORY
5. To drop the original database, issue the following command:
   DB2 DROP DATABASE <database-alias>

6. For each DLFM, change the DB2 UDB database hostname registration on the DLFM side. Currently there is no external command for this, so you have to modify the DLFM_DB database directly:
    db2 connect to DLFM_DB
    db2 "update dfm_dbid set hostname = 'NEWHOST' where hostname = 'OLDHOST'"
    db2 commit
    db2 terminate
7. Issue a db2start on the new DB2 Universal Database machine.
8. Initiate a DB2 RESTORE from the DB2 backup file (keep the instance and database names the same) and then issue:
    DB2 ROLLFORWARD STOP
9. Issue the following SQL statement for every table that has DATALINK columns:
    "SET CONSTRAINT FOR 'table' DATALINK RECONCILE PENDING IMMEDIATE UNCHECKED"
10. For each table that has a DATALINK column, issue the following command to return the table to its normal state:
    DB2 RECONCILE

15.2 Replacing or upgrading a DLFM machine

The IP address of the new machine remains the same. (This scenario is somewhat similar to the disk crash recovery.)

15.2.1 Steps to perform

You need to perform the following steps:
1. Make sure there is no activity for any databases connected to this DLFM.
2. Connect to the DLFM_DB database and make sure there is nothing in the dfm_xnstate table. If the table contains entries, wait until it is empty.
3. Make an offline backup of each DB2 database that is connected to this DLFM.
4. Make an offline backup of DLFM_DB, and copy the backup file to the new machine.
5. Back up all the Data Link files under the DLFS file system, and copy the backup archive to the new machine.

6. If you are not using ADSM to back up the Data Link files, copy all the Data Links file archives under the DLFM_BACKUP_DIR_NAME directory to the new machine.
7. Stop the original DLFM.
8. On the new DLFM machine, restore DLFM_DB from the backup archive created in step 4.
9. On the new DLFM machine, restore all the Data Link files from the backup archive created in step 5.
10. If you are using ADSM, set up DLFM to connect to the same ADSM.
11. Start the new DLFM.
12. Run reconcile for each table that has a DATALINK value pointing to this new DLFM.


Chapter 16. Problem determination

This chapter provides a detailed description of problem determination. First it describes a methodology for problem solving. Then it discusses some solutions to the common problems.

This section describes the steps required and the information to collect in order to solve problems in a Data Links File Manager (DLFM) environment. The steps that are outlined apply to both the DB2 UDB database server and the DLFM server unless otherwise noted.

16.1.1 Problem solving process

The problem solving process involves the following steps:
1. Understand what the problem really is. This is the first step in any problem determination process. Recognizing that a certain condition is a problem requires an understanding of the environment in which the condition occurred. It is important to try to differentiate between a product limitation and a product defect early in the process. The problem may be caused by an error in a user application or by a bug in the DB2 UDB or DLFM server code.
2. Write a “good” description of the problem. The quality of the description usually indicates how well the problem is understood. To determine what the problem is, you must fully describe the error conditions. The problem description should include:
   – All error codes and error conditions; include the reason code if applicable
   – The actions that preceded the error
   – A description of the problem
3. Determine whether the problem can be reproduced or whether it was a one-time occurrence. If the problem is reproducible, determine the steps that are required to reproduce it.
4. Identify the source or cause of the error.
   – Is it a user error?
   – Is the system working as designed? For example, a user may not have understood the behavior of the system, and the system is working as it was intended.
   – Is the system configuration supported? For example, the system may never have been intended to run with the hardware or software that was installed.
   – Is it a DB2 UDB or DLFM server bug?
5. Provide a fix for the problem.
   – If the problem is caused by any of the following reasons, an application or environment change may be required:
     • User error
     • The system is working as designed
     • It is an unsupported environment or configuration
   – If the problem is caused by a DB2 UDB or DLFM bug, a fix for the defect will be provided or a workaround developed.

16.1.2 Information needed to analyze a problem

This section describes the information you should gather based on the error conditions encountered. Different types of errors require different data to be collected. However, some data is collected for all error conditions.

Required information

The following information is required:
• The SQL error code that was returned, with the corresponding reason code (RC). For example, SQL0357N, RC = “03” is a possible error and reason code returned when the Reconcile utility was initiated. Provide a system error code if the error is not an SQL error.
• The approximate time of the error.
• Where the error was encountered, for example, on the DB2 UDB database server or the DLFM server.
• A “good” problem description.
• A description of the actions that preceded the error.
• The database manager configuration file for the DB2 UDB database server and the DLFM server. The following command can be used to collect this information:
    db2 get dbm cfg
• The database configuration file for the DB2 UDB databases and the Data Links database (DLFM_DB). You can use the following command to collect this information:
    db2 get db cfg for
• The DB2 UDB server and DLFM server code level. This information can be collected by issuing the db2level command on each server.
• The DB2 diagnostic logs (DB2DIAG.LOG) from both the DB2 UDB server and the DLFM server. By default, the db2diag.log is located in the instance home directory under the sqllib/db2dump directory. The DB2DIAG.LOG file is the most important debugging information and must be collected for all DB2 UDB and DLFM error conditions.

  – If the problem is reproducible, we recommend that you set the diaglevel to 4 and recapture the information. The diaglevel is a database manager configuration parameter.

Note: DIAGLEVEL can be set to:
    0   No logging
    1   Severe errors
    2   Severe and non-severe errors
    3   Severe and non-severe errors, and warnings
    4   Severe and non-severe errors, warnings, and informational messages

The default diaglevel is level 3.

  – On the DLFM server, the error level is set by the DLFM_LOG_LEVEL registry variable. The default value is ERROR. For problem determination purposes, DLFM_LOG_LEVEL should be set to DEBUG. To determine the current setting of DLFM_LOG_LEVEL, issue the db2set command on the DLFM server.
  – Collect any dump files mentioned in DB2DIAG.LOG. Dump files are named x.dmp, where x is the process ID that produced the dump.
  – Collect any trap files in the DIAGPATH. Trap files are named x.trp, where x is the process ID that produced the trap.
  – The DIAGPATH is a database manager configuration parameter that points to the directory in which diagnostic data is placed. We recommend that you collect all files in the DIAGPATH directory. To reduce the number of files that must be analyzed, you should clean up this directory on a regular basis.

Figure 16-1 shows a description of the type of information that is written to the db2diag.log file.
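The naming convention for dump and trap files lends itself to a small script. The following Python sketch is only an illustration: it sorts a list of file names from a DIAGPATH directory into dump files and trap files and extracts the process ID from each name. The function name classify_diag_files and the sample file names are assumptions made for this example, and the x.dmp and x.trp patterns are taken from the description in this section rather than verified against every DB2 level.

```python
import re

# Pattern for the file names described in this section:
# x.dmp is a dump file and x.trp is a trap file, where x is a process ID.
_DIAG_FILE = re.compile(r"^(\d+)\.(dmp|trp)$", re.IGNORECASE)


def classify_diag_files(file_names):
    """Group file names into dump and trap files, keyed by process ID.

    Hypothetical helper: names that do not match the pattern
    (for example db2diag.log itself) are ignored.
    """
    found = {"dump": [], "trap": []}
    for name in file_names:
        match = _DIAG_FILE.match(name)
        if match is None:
            continue
        pid = int(match.group(1))
        kind = "dump" if match.group(2).lower() == "dmp" else "trap"
        found[kind].append(pid)
    return found


if __name__ == "__main__":
    # Sample names are invented for illustration only.
    sample = ["db2diag.log", "12136.dmp", "19130.trp", "12136.trp"]
    print(classify_diag_files(sample))
```

A process ID that appears in both lists, or that matches a PID reported in db2diag.log, is a reasonable place to start reading.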

(1) 1998-03-23-14.59.01.303000   (2) Instance:DB2   (3) Node:000
(4) PID:147(db2syscs.exe) TID:203   (5) Appid:*LOCAL.DB2.980323195820
(6) buffer_pool_services   (7) sqlbStartPools   (8) Probe:0   (9) Database:SAMPLE

(10) Starting the database

There is no error code in this case because it is an informational message.

Figure 16-1 Extract of an entry written to the db2diag.log file

Figure 16-2 explains what each of the numbered components in Figure 16-1 represents.

1. Time/date information
2. Instance name
3. Partition number, even for non-EEE
4. Process ID and thread ID in Windows
5. Application ID
6. Component identifier
7. Function identifier
8. Unique error identifier (Probe ID)
9. Database name
10. Error description and/or error code

Figure 16-2 Information about each component in the db2diag.log file
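To make the component list more concrete, the following Python sketch splits one db2diag.log entry into the fields listed in Figure 16-2. It is a minimal sketch, assuming the entry layout shown in Figure 16-1 (three header lines, a blank line, then the message text). The regular expression, the function name parse_diag_entry, and the SAMPLE_ENTRY constant are assumptions made for this example, and entries written by other platforms or DB2 levels may not match.

```python
import re

# Entry text modeled on Figure 16-1; the line breaks are assumed.
SAMPLE_ENTRY = """\
1998-03-23-14.59.01.303000   Instance:DB2   Node:000
PID:147(db2syscs.exe)   TID:203   Appid:*LOCAL.DB2.980323195820
buffer_pool_services  sqlbStartPools   Probe:0   Database:SAMPLE

Starting the database
"""

# One named group per component in Figure 16-2. TID is optional because
# the thread ID is described as a Windows field.
_ENTRY = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d+)\s+"
    r"Instance:(?P<instance>\S+)\s+Node:(?P<node>\d+)\s+"
    r"PID:(?P<pid>\d+)\((?P<process>[^)]*)\)\s+"
    r"(?:TID:(?P<tid>\d+)\s+)?"
    r"Appid:(?P<appid>\S+)\s+"
    r"(?P<component>\S+)\s+(?P<function>\S+)\s+"
    r"Probe:(?P<probe>\d+)\s+Database:(?P<database>\S+)\s*"
    r"(?P<message>.*)",
    re.DOTALL,
)


def parse_diag_entry(entry):
    """Return a dict of entry fields, or None if the layout is not recognized."""
    match = _ENTRY.match(entry.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["message"] = fields["message"].strip()
    return fields


if __name__ == "__main__":
    for key, value in parse_diag_entry(SAMPLE_ENTRY).items():
        print(f"{key:10} {value}")
```

Once entries are split this way, it becomes easy to filter a large db2diag.log by process ID or function name before reading the messages.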

16.1.3 DB2 Universal Database or DLFM ‘hang’ situations

The debugging of “hang” situations is more complicated. An example of a hang is a CONNECT to the database that does not return. To collect information for hang situations, DB2 Universal Database provides the db2_call_stack tool on UNIX platforms and the db2bddbg.exe (DB2 Backdoor Debugger) tool on Windows platforms.

Hangs in the UNIX environment

On UNIX platforms, the db2_call_stack tool can be found in the sqllib/bin directory. The db2_call_stack tool should be initiated by the db2 or dlfm administrator. The db2_call_stack tool does not resolve the hang, but rather provides more information by issuing a signal -36 on AIX (signal -21 on Solaris) against the db2 or dlfm processes. The tool can be initiated on the DB2 UDB server or the DLFM server.

For example, let us assume that we suspect a hang on the DLFM server in a UNIX environment. You should perform these steps:
1. Log on to the DLFM server as the DLFM administrator.
2. Issue the db2_call_stack command. This generates trap files in the sqllib/db2dump directory. A trap file is generated for each DB2 process on the DLFM server.
3. Wait for 3 minutes and issue the db2_call_stack command again. You should repeat this step at least twice. You do this to determine whether there are any changes in the call chain on the stack that is dumped in the trap file. Figure 16-3 shows an example of what is dumped to a trap file. The trap files are required for problem determination purposes by the DB2 support team.

Figure 16-3 Extract of a trap file

4. The trap file shows the order in which functions were called. The most recent function call is at the top of the stack. The trap is analyzed by looking at changes at the top of the stack after a number of iterations of dumping the stack with the db2_call_stack command. We recommend that you dump the stack at least three times, two minutes apart.
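The comparison described in step 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the call stacks are assumed to have been extracted already from successive trap files into lists of function names, most recent call first. Parsing a real trap file is not shown because its format is not described here. The function name stack_top_unchanged and the depth parameter are hypothetical, and the function names in the sample data are borrowed from elsewhere in this chapter purely as placeholders.

```python
def stack_top_unchanged(stack_dumps, depth=3):
    """Return True if the top `depth` frames are identical in every dump.

    stack_dumps is a list of call stacks, one per db2_call_stack iteration.
    Each call stack is a list of function names, most recent call first.
    """
    if len(stack_dumps) < 2:
        raise ValueError("need at least two dumps to compare")
    tops = [tuple(stack[:depth]) for stack in stack_dumps]
    return all(top == tops[0] for top in tops[1:])


if __name__ == "__main__":
    # Placeholder stacks for one process, taken a few minutes apart.
    dumps = [
        ["sqloopenp", "sqlpgolf", "sqlpgint", "sqlpinit"],
        ["sqloopenp", "sqlpgolf", "sqlpgint", "sqlpinit"],
        ["sqloopenp", "sqlpgolf", "sqlpgint", "sqlpinit"],
    ]
    if stack_top_unchanged(dumps):
        print("Top of stack did not change between dumps")
    else:
        print("Top of stack changed between dumps")
```

An unchanged top of stack is only a hint that a process is not making progress. The trap files themselves should still be sent to the DB2 support team for analysis.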

Hangs in the Windows environment

On Windows platforms, the db2bddbg tool can be found in the sqllib/bin directory. The db2bddbg tool should be initiated by the db2 or dlfm administrator. The db2bddbg tool does not resolve the hang, but rather provides more information. The tool can be initiated on the DB2 UDB server or the DLFM server.

For example, let us assume that we suspect a hang on the DLFM server in a Windows environment. You should perform the following steps:
1. Issue the following command:
    db2set DB2_BDINFO
   This sets up the debugger registry variable. The command displays three numbers. For example, the output should resemble the following format:
    1904 1812223120 2011788013
2. Run the following command:
    db2bddbg 1904 1812223120 db2bd db2ntDumpTid E:\work -1 stack.dmp
   The first two arguments to the db2bddbg tool are the first two numbers from the output of the db2set DB2_BDINFO command. The next two arguments are the internal debug DLL and function names. E:\work is the directory for the output file, and stack.dmp is the name of the stack traceback file.
3. Run the following command again after two minutes:
    db2bddbg 1904 1812223120 db2bd db2ntDumpTid E:\work -1 stack2.dmp
4. Send the stack.dmp and stack2.dmp files to DB2 support for analysis.
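Because the db2bddbg arguments are copied by hand from the db2set DB2_BDINFO output, the step is easy to get wrong. The following Python sketch shows one way to assemble the argument list from that output. It is an illustration only: the function name build_db2bddbg_command is hypothetical, and the argument order simply mirrors the command shown in step 2.

```python
def build_db2bddbg_command(bdinfo_output, output_dir, output_file):
    """Build the db2bddbg argument list from the DB2_BDINFO output.

    Only the first two numbers of the output are used, as described
    in step 2. The remaining arguments are copied from the example
    command in this section.
    """
    numbers = bdinfo_output.split()
    if len(numbers) < 2:
        raise ValueError("expected at least two numbers from DB2_BDINFO")
    return [
        "db2bddbg", numbers[0], numbers[1],
        "db2bd", "db2ntDumpTid",      # internal debug DLL and function
        output_dir, "-1", output_file,
    ]


if __name__ == "__main__":
    args = build_db2bddbg_command(
        "1904 1812223120 2011788013", r"E:\work", "stack.dmp"
    )
    print(" ".join(args))
```

Returning a list rather than a single string keeps the pieces separate, which is convenient if the command is later passed to a process-launching function.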

16.1.4 DB2 Universal Database or DLFM crash

A crash is a severe error or condition that causes the DB2 UDB Database Manager or DLFM to terminate abnormally. An example of a crash is a power failure. In the event of a crash of the DB2 UDB server or the DLFM server, a number of trap files are generated in the sqllib/db2dump directory or the directory specified by the DIAGPATH database manager configuration parameter. A trap file is generated for each DB2 process. The contents of the sqllib/db2dump directory should be sent to the DB2 support team for analysis of the crash.

The minimum amount of information to collect for problem determination purposes includes:
• The SQL code and any reason code or system error code
• A useful description of the problem
• A description of the actions preceding the error
• The database code level
• The database manager and database configuration parameters
• The DB2DIAG.LOG file
• The time of the error
• Any dump file listed in the DB2DIAG.LOG file
• Any trap file in the DIAGPATH

Collect as much information as you can, since it can reduce the time taken to resolve complex problems.

Additional information

In many cases, the DB2DIAG.LOG and its associated trap and dump files provide enough information to solve the problem. However, in some cases, additional data may be required:
• A DB2 Trace can be taken on the DB2 UDB and DLFM server. A trace is useful in determining the internal code path taken by DB2 UDB and the DLFM server that leads to the problem.
• A SYSLOG in the UNIX environment or an Event Log in the Windows environment can be collected.

Note: Refer to Appendix D, “Logging priorities for DLFF and DLFSCM” on page 331, to learn how to change the logging level of DLFF (or DLFSCM in a DCE-DFS environment) to the required level.

What else can be done

In the event that the previous information does not reveal the source of the problem, what else can be done?
• Additional debug code can be added that provides more information in the DB2DIAG.LOG file when the error occurs again.
• Additional trace points can be added that provide additional information about the function where the error is occurring and the data it is manipulating.

16.1.5 The DB2 Trace

This section describes capturing and analyzing a DB2 Trace. There are other forms of traces, such as Operating System Traces and Application Traces, which we do not discuss. The DB2 Trace can be used on both the DB2 UDB database server and the DLFM server. It may sometimes be necessary to take a trace concurrently on the DB2 UDB database server and the DLFM server. An example of such a situation is a communication problem between the DB2 UDB database and the DLFM server.

Taking a DB2 Trace

A DB2 Trace may be required if the diagnostic data already collected does not give enough information about a problem. A DB2 Trace is most useful if the problem being encountered is reproducible. Because the trace logs all actions being performed, along with parameter values at various steps in the process, keep the following points in mind:
• DB2 Trace does impact performance.
• The trace should be taken when there is minimum activity on the machine to prevent the capture of unnecessary information.
• DB2 Trace also initializes some variables. This sometimes prevents traps or segmentation violations from occurring while tracing.

DB2 Trace in memory

To perform a DB2 Trace in memory, follow these steps:
1. Turn on the trace with the command:
    db2trc on -l 8000000 -e -1
   The 8000000 represents an 8 MB memory buffer. This may be increased if the buffer is too small to capture the error in the trace. The -e -1 indicates that the trace should continue after system errors.
2. Reproduce the error.
3. Dump the trace with the command:
    db2trc dmp tracefile.name
   The tracefile.name can be any file name. The command writes the trace output from memory to this file on disk.
4. Turn off the trace with the command:
    db2trc off
5. Since the trace is dumped in a binary format, it must be converted to ASCII before it can be analyzed. Use the following commands:
    db2trc fmt tracefile.name tracefile.fmt
    db2trc flw tracefile.name tracefile.flw
   tracefile.name is the file produced by the db2trc dmp command and is used as the input file. tracefile.fmt and tracefile.flw are the ASCII output files of the trace FORMAT and the trace FLOW, respectively. These are the two files that are analyzed to determine what caused the error.
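The in-memory trace procedure is a fixed sequence of commands around the step that reproduces the error, so it can be captured as a small helper. The following Python sketch only builds the command strings; it does not run them. The function name memory_trace_commands and its parameters are assumptions made for this example, and only the db2trc options shown in the steps above are used.

```python
def memory_trace_commands(trace_file="tracefile.name", buffer_bytes=8000000):
    """Return the db2trc commands to run before and after reproducing an error.

    The output file names are derived from the dump file name, following
    the tracefile.name / tracefile.fmt / tracefile.flw example in the text.
    """
    stem = trace_file.rsplit(".", 1)[0]
    before = [f"db2trc on -l {buffer_bytes} -e -1"]
    after = [
        f"db2trc dmp {trace_file}",              # write the memory buffer to disk
        "db2trc off",                            # stop tracing
        f"db2trc fmt {trace_file} {stem}.fmt",   # trace FORMAT output
        f"db2trc flw {trace_file} {stem}.flw",   # trace FLOW output
    ]
    return before, after


if __name__ == "__main__":
    before, after = memory_trace_commands()
    for command in before:
        print(command)
    print("# reproduce the error here")
    for command in after:
        print(command)
```

Keeping the sequence in one place makes it less likely that the trace is left running, which matters because tracing affects performance.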

DB2 Trace to a file

The following steps outline the process of taking a DB2 Trace directly to a file instead of to memory. This method is recommended in system hang situations when you cannot manually dump the trace. A trace to a file causes a larger performance decrease than a trace to memory.
1. To turn on the trace, enter:
    db2trc on -l 8000000 -e -1 -f tracefile.name
   The 8000000 represents an 8 MB memory buffer. This may be increased if the buffer is too small to capture the error in the trace. The -e -1 indicates that the trace should continue after system errors. The -f tracefile.name indicates that the trace is written directly to the specified file instead of to memory.
2. Reproduce the error.
3. There is no need to dump the trace file since it is already on disk.
4. Turn the trace off with the command:
    db2trc off

Information provided by the trace

The trace gives the following information:
• The trace records all functions called, in the order in which they were called.
• The trace captures trace points. The types of trace points include:
  – Function entry trace points
  – Data trace points to record variable values at points within the function
  – Exit trace points to record the function return codes
  – Error trace points to record additional data in the event of an error

Figure 16-4 shows an illustration of a trace entry that is dumped to the trace format file that is produced by using the db2trc fmt command.

(1) 1   (2) DB2   (3) cei_entry   (4) oss 2   (5) sqlo_init_GMT_timer_services   (6) (1.20.74.152)
(7) pid 198; tid 197;   (8) cpid 0;   (9) time 1512414;   (10) trace_point 0
(11) called_from 10047AE9

Figure 16-4 Extract of a trace entry in the formatted trace file

Figure 16-5 describes each of the components that are dumped to a formatted trace file for each trace point.

1. Sequence number
2. Instance name
3. Entry type, for example: function entry/exit/data
4. Component name
5. Function name
6. Internal ID
7. Process and thread information
8. Companion process ID
9. Time information
10. Unique trace point identifier
11. Address where the function was called

Figure 16-5 Information about each component in a formatted trace file

The most recent trace points are at the bottom of the trace. When analyzing a trace, start at the bottom and work upward to find the source of the problem.
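The bottom-up reading order can be illustrated with a short Python sketch that walks a trace flow file backwards and lists the trace points that carry an error code. It is a sketch under assumptions: the line layout (sequence number, vertical bars, function name, an entry type ending in _errcode, and a hexadecimal code) is modeled on the trace flow extracts shown later in this chapter, and the function name errors_from_bottom is hypothetical.

```python
import re

# Assumed flow-line layout, for example:
#   2669 |||||||||||||sqlpgint fnc_errcode 0xffffe60a = -6646
_FLOW_ERROR = re.compile(
    r"^(\d+)\s+[|\s]*(\w+?)\s*(?:cei|fnc)_errcode\s*(0x[0-9a-fA-F]+)"
)


def errors_from_bottom(flow_lines):
    """Return (sequence, function, hex_code) tuples, most recent first."""
    hits = []
    for line in reversed(flow_lines):
        match = _FLOW_ERROR.match(line.strip())
        if match:
            hits.append((int(match.group(1)), match.group(2), match.group(3)))
    return hits


if __name__ == "__main__":
    sample = [
        "2668 ||||||||||||||sqltfast2 cei_retcode 0",
        "2669 |||||||||||||sqlpgint fnc_errcode 0xffffe60a = -6646",
        "2670 ||||||||||||sqlpinit cei_errcode 0xffffe60a = -6646",
        "2671 |||||||||||sqledint fnc_errcode 0xfffffbf4 = -1036",
    ]
    for sequence, function, code in errors_from_bottom(sample):
        print(sequence, function, code)
```

The list that comes back starts with the most recent error and moves toward earlier ones, which matches the way an error propagates from the failing function back through its callers.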

You need to format or flow the trace prior to examination:
• To format the trace file, issue the following command:
    db2trc fmt trace.file trace.fmt
• To flow the trace file, issue the following command:
    db2trc flw trace.file trace.flw

Analyzing a DB2 Trace to resolve problems

This section demonstrates, through examples, how to use the DB2 Trace utility to resolve problems. In the first example, we took a trace on the DB2 UDB server.

The db2diag.log in this example provides enough information to resolve the problem, but for learning purposes, we use a trace to confirm the error. In the second example, we took a trace on the DLFM server. This helps demonstrate that the principles of problem determination are very similar on both servers.

Example

In this example, the scenario is as follows. A user is running an application in a DLFM environment and receives an SQL1036 error message when trying to use the application. The user approaches you, the Database Administrator (DBA), to resolve the problem.

You should use the following approach to resolve the problem:
1. Log on to the DB2 UDB server as the database instance owner. As the DBA, you know that the application communicates with the DB2 UDB database, which in turn passes the user request to the DLFM server.
2. Determine whether the problem lies with the DB2 UDB database or the DLFM server, rather than with the application, by using the DB2 CLP (Command Line Processor) instead of the application. Open a CLP window and try to connect to the database that the user is trying to use. You notice the error message, shown in Figure 16-6, on the connect.

Figure 16-6 An SQL1036 error message when connecting to the database

3. Try to connect to the Data Links File Manager database (dlfm_db) to verify that it is reachable. If this connection succeeds, you can conclude that the problem is on the DB2 UDB database server. It is important to narrow down the problem source as much as possible.
4. At the CLP, issue the following command to gather more information about the error message:
    db2 ? sql1036
   You notice that there are many possible reasons for this error message to be returned.
5. Examine the db2diag.log (Figure 16-7). For problem determination purposes, you should have diaglevel in the database manager configuration file set to 4.

Figure 16-7 Extract of the DB2DIAG.LOG with the SQL1036 error message

6. The db2diag.log shows the first SQL1036 error occurring at 2001-05-30-15.29.57 on PID:12136 (Figure 16-7). The function sqlpgint reports the SQL1036 error. The first function to report an error is the sqlpgilt on PID:20914. From db2diag.log, you can tell that we are missing the SQL000001.LOG file, which is the first active log file.
7. Notice the ZRC=FFFFE60A error being dumped in the db2diag.log on PID 12136. This is an internal error message, on which you can find more information in Appendix A, “DB2 Internal Return Codes,” in DB2 UDB Troubleshooting Guide, GC09-2850. The E60A error maps to a “File Does Not Exist” message.
8. Since this problem is reproducible, we take a trace of the problem to confirm that the missing log file is indeed causing the I/O error.
9. Take a DB2 Trace to memory on the DB2 UDB database server as follows:
   a. Turn the trace on with the command:
       db2trc on -l 8000000 -e -1
      Then reproduce the SQL1036N error:
       db2 connect to
   b. Dump the trace with the command:
       db2trc dmp tracefile.name
   c. Turn the trace off with the command:
       db2trc off
   d. Since the trace is dumped in a binary format, it must be converted to ASCII before it can be analyzed. Use the following commands:
       db2trc fmt tracefile.name tracefile.fmt
       db2trc flw tracefile.name tracefile.flw

Note: When the trace is being formatted, look at the output to determine whether the trace has wrapped. You should try to capture a trace that has not wrapped. A trace that has wrapped usually indicates that the error has not been captured (Figure 16-8).

Figure 16-8 Output of the DB2 Trace format command

10. Open the tracefile.flw, tracefile.fmt, and db2diag.log files in three different windows. The trace flow file shows the flow of control of processing by DB2 UDB. The vertical lines in the flow file help to match the start and finish of each function (Figure 16-9). The trace flow is a diagram of the execution path of the source code. Each function has an entry, data, and exit point. For the purposes of our analysis, we focus on the entry and the exit points. The function exit points dump a return code. A return code of zero (rc=0) means that the function completed without any error. A negative return code indicates an error condition. A positive return code is usually a warning. When analyzing a trace flow, it is important to understand that error conditions are propagated to the calling functions, so a function can return an error only because an error was encountered by another function that it called.

Source Code                        Trace Flow

begin function OpenFile()          begin function OpenFile()
   .....                           | ......
   begin function FindFile()       | begin function FindFile()      <- Entry
      filefound=0;                 | filefound=0;                   <- Data
   end function FindFile()         | end function FindFile()        <- Exit
   .....                           | ......
end function OpenFile()            end function OpenFile()

Figure 16-9 Function flow structure

11.Go to the bottom of the tracefile.flw and search for -1036. The trace flow shows the sequence in which functions were called and the corresponding return codes. If the error is found and the trace is not wrapped, this indicates that the trace has captured the error and can be analyzed to find the source of the problem.

Tip: Most recent trace points are at the bottom of the trace. Start at the bottom and work upwards to find the source of the problem. Not all errors in the trace are real problems.

12. Search for the PID = 12136 string in the trace flow. This pid is the process ID of the sqlpgint function that failed with a “File not found” error message in the db2diag.log (Figure 16-7). We recommend starting the trace analysis this way because it focuses on the functions that returned the error (Figure 16-10).
13. When you reach sequence number 1701, as in our example of the flow, you will notice that there were no serious errors. Continue following the rest of the functions for the identified pid and analyze any errors.
14. Analyze all the error codes that are being returned on PID = 12136. Some functions return errors that are not really serious. In general, an error is considered serious when the error code is propagated to all functions on a particular pid (Figure 16-10).

pid = 12136; tid = 1; node = 0;                            <- Corresponds to PID in db2diag.log

1161 sqloset cei_entry
1162 sqloset cei_data ...
1163 sqloset cei_data ...
1164 sqloset cei_retcode 0
1165 sqleWaitUntilReactivated fnc_data ...
1166 sqleWaitUntilReactivated fnc_retcode 0
1167 sqleAgentActivationInit fnc_entry
1168 |sqloInstallEDUSignalHandler cei_entry
1169 |sqloInstallEDUSignalHandler cei_retcode 0
1170 |sqloInstallEDUSignalHandler cei_entry
1171 |sqloInstallEDUSignalHandler cei_retcode 0
......
1692 ||||||||||sqlubrfg cei_entry                          <- Entry to the function sqlubrfg, which calls function sqloppth
1693 |||||||||||sqloppth cei_entry
1694 |||||||||||sqloppth cei_retcode 0
1695 |||||||||||sqloppth cei_entry
1696 |||||||||||sqloppth cei_retcode 0
1697 |||||||||||sqloopenp cei_entry
1698 |||||||||||sqloopenp cei_data ...
1699 |||||||||||sqloopenp cei_data ...
1700 |||||||||||sqloopenp cei_errcode 0xffffe60a = -6646   <- Not a serious error since function sqlubrfg below returns no error
1701 ||||||||||sqlubrfg cei_retcode 0                      <- Function sqlubrfg returns with no errors
......

Figure 16-10 Extract of the trace flow file

15. Follow the sequence numbers in ascending numerical order. If a sequence number suddenly jumps to another pid, follow it on the new pid. In our example, the sequence number 2296 is on another pid (Figure 16-11).
16. The sqloopenp function returns an FFFFE60A error. This is the first sign of a function returning an error. Notice that this error is propagated to other functions, which eventually leads to the SQL1036 error (I/O error).
17. Now that you have found the source of the FFFFE60A error, analyze the trace format to gather more details.

2158 ||||||||||||sqlpinit cei_entry                        <- Function entry for sqlpinit
2159 |||||||||||||sqlogmblk cei_entry
2160 |||||||||||||sqlogmblk cei_data ...
2161 ||||||||||||||MemHoldSeg cei_entry
2162 ||||||||||||||MemHoldSeg cei_retcode 0
2163 |||||||||||||sqlogmblk cei_data ...
2164 |||||||||||||sqlogmblk cei_retcode 0
2165 |||||||||||||sqloinca cei_entry
2166 |||||||||||||sqloinca cei_retcode 0
2167 |||||||||||||sqlpgint fnc_entry                       <- Function entry for sqlpgint
2168 ||||||||||||||sqlpgolf fnc_entry
2169 |||||||||||||||sqloopenp cei_entry
2170 |||||||||||||||sqloopenp cei_data ...
2171 |||||||||||||||sqloopenp cei_data ...
2172 |||||||||||||||sqloopenp cei_retcode 0
......
pid = 19130; tid = 1; node = 0
......
2296 | | | |sqloopenp cei_errcode 0xffffe60a = -6646       <- Function exit for sqloopenp with error on PID 19130
......
2662 |||||||||||||||sqlocloselog cei_entry
2663 ||||||||||||||||get_libc_reen_buffer cei_entry
2664 ||||||||||||||||get_libc_reen_buffer cei_data ...
2665 ||||||||||||||||get_libc_reen_buffer cei_data ...
2666 ||||||||||||||||get_libc_reen_buffer cei_retcode 0
2667 |||||||||||||||sqlocloselog cei_retcode 0
2668 ||||||||||||||sqltfast2 cei_retcode 0
2669 |||||||||||||sqlpgint fnc_errcode 0xffffe60a = -6646  <- Function exit for sqlpgint with error
2670 ||||||||||||sqlpinit cei_errcode 0xffffe60a = -6646   <- Error propagated to calling functions
2671 |||||||||||sqledint fnc_errcode 0xfffffbf4 = -1036    <- SQL1036

Figure 16-11 Extract of trace flow showing the SQL1036 error

18. The trace format file provides more details for each function. We know from the trace flow file that the function sqloopenp returned an error at sequence number 2296.
19. Open the trace format file if it is not already open and search for sequence number 2296 (Figure 16-12).
20. Notice the rc = 0xffffe60a on the exit of the sqloopenp function. Find the entry point to the sqloopenp function so that you can analyze the function. In our example, the entry point to sqloopenp is at sequence number 2293.

Figure 16-12 Trace format file

21. Sequence number 2294 gives valuable information about the problem. It is the data portion of the sqloopenp function and reports that the file SQL000001.LOG cannot be found in the directory path /datalink/db2inst1/NODE0000/SQL00001/SQLOGDIR/SQL000001. This information is useful, since we get the name of the file that DB2 is trying to open as well as the path in which it is looking. The path to the file is not dumped in db2diag.log.
22. Log on to the DB2 server and go to the path that was dumped. Notice that the SQL000001.LOG file is missing, which explains the I/O error.
23. Since the missing log file is an active log file, any connection attempt to the database fails. To resolve this problem, you need to restore from the latest backup and roll forward to a point in time before the missing log files.
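The trace flow extracts in this example show each error code twice, once in hexadecimal and once in decimal (for example, 0xffffe60a = -6646). The two forms are related by 32-bit two's complement arithmetic, which the following Python sketch makes explicit. The function name zrc_to_signed is an assumption made for this example; it is not a DB2 utility.

```python
def zrc_to_signed(hex_code):
    """Convert a four-byte hexadecimal return code to a signed integer.

    Values with the high bit set are negative in 32-bit two's complement,
    so 2**32 is subtracted from them.
    """
    value = int(hex_code, 16) & 0xFFFFFFFF
    if value & 0x80000000:
        return value - 0x100000000
    return value


if __name__ == "__main__":
    for code in ("0xffffe60a", "FFFFFBF4", "0x0"):
        print(code, "=", zrc_to_signed(code))
```

Applied to the codes in this example, 0xffffe60a converts to -6646 and 0xfffffbf4 converts to -1036, the SQL code reported to the user. This is useful when one source (db2diag.log) shows only the hexadecimal form and you are searching another (the trace flow) for the decimal form.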

This concludes our example of analyzing a trace taken on the DB2 UDB server. The methodology used here applies to analyzing traces taken on the DB2 UDB database server and the DLFM server.

16.2 Solutions to common problems

Many of the problems encountered when using DLFM tend to fall into one of three categories. First, there could be a problem with the DLFM file server. Maybe it won't start, or it cannot talk to the DB2 server. Second, there could be a problem with the DB2 server that is attempting to use Data Links. Third, there may be a problem with a file system being managed by DLFM. Maybe the file system cannot be mounted, or client workstations may not be able to access linked files. The following sections discuss each of these types of problems, what can cause them, and what can be done to fix them.

Before discussing the three problem categories, let’s briefly look at the resources that are available to aid in problem determination.

16.2.1 Available resources

A good explanation for many of the error conditions encountered can be found in one of the DB2 manuals. The question is always which one. Here is a helpful guide:
• DB2 UDB Message Reference, GC09-2978: For error messages beginning with the characters DB2 or SQL
• DB2 Data Links Manager Quick Beginnings, GC09-2966, Appendix A, “DB2 Data Links Manager Errors and User Responses”: For messages beginning with DLFM
• DB2 UDB Troubleshooting Guide, GC09-2850, Appendix A, “DB2 Internal Return Codes”: For interpretations of the four-byte hexadecimal error codes written to the db2diag.log file; another chapter of this guide covers the Data Links Manager

Important: One of the most useful resources can be the db2diag.log file. This is probably the first place to look for error messages. If no useful messages can be found in db2diag.log and the problem is reproducible, try increasing the database manager configuration parameter DIAGLEVEL to its maximum value of 4, issue a db2stop and db2start, recreate the problem, and check db2diag.log again. When you have finished, be sure to reset the DIAGLEVEL parameter to its previous value.

16.2.2 DLFM server problems

This section outlines common problems with the DLFM server. In some cases, it also outlines the symptoms, and, in all cases, offers solutions to correct the problem.

Problem: DLFM will not start

If any of the critical resources needed by DLFM are unavailable, startup may fail. A DLFM101E error message may be written to the db2diag.log file. This problem most commonly occurs when trying to start DLFM shortly after it is stopped. The dlfm stop command takes some time to clean up all Inter Process Communication resources (IPCs), and any attempt to run dlfm start before the cleanup is complete will fail. This problem can also occur if the database manager has a problem starting, if communication services cannot be started, or if the dlfsdrv device driver is not loaded.

Symptoms

After running the dlfm start command on the Data Links server, one or more of the following conditions exist:

– The dlfm see command shows no processes running.
– An application program receives an SQL0357N error with reason code 03 when attempting to SELECT, INSERT, or UPDATE a DATALINK value.
– The db2diag.log file on the DB2 server contains a message indicating that DLFM is unreachable.
– The db2diag.log file on the DB2 server contains a message indicating that restart recovery is pending or in progress for DLFM.

Solution

1. Log on as dlfm, issue the dlfm shutdown command, and retry the dlfm start command.
2. Check the DB2 registry variables using db2set -all, and validate that they are correct. Be sure that the port number specified in the DLFM_PORT registry variable is not being used by another process.
3. Validate that the dlfm instance can be started by logging on as dlfm and issuing the db2start command.
4. Verify that DLFM_DB is usable by connecting to it.
5. Check to see if the dlfsdrv device driver is loaded. Log on as root and run:
   strload -qf /usr/lpp/db2_07_01/cfg/dlfs_cfg
   This command should return:
   /usr/lpp/db2_07_01/bin/dlfsdrv: yes
   If no is returned instead of yes, the dlfs device driver needs to be loaded. The driver can be loaded by root by running:
   strload -f /usr/lpp/db2_07_01/cfg/dlfs_cfg
6. Examine the db2diag.log file for additional messages.

Problem: DLFM does not automatically start after reboot

Solution

Follow these steps to correct the problem:

1. Validate that /etc/inittab has an entry to start db2 by running /etc/rc.db2.
2. Check to see that /etc/rc.db2 issues a dlfmstrt command to start dlfm.
3. Verify that all dlfs file systems are being mounted during startup. The mount option in /etc/filesystems for the dlfs file systems should be set to false. The file systems should be mounted by issuing the mount -v dlfs command. See the note box below.
4. Validate that /etc/rc.db2 or a script called by /etc/rc.db2 loads the dlfsdrv device driver.

Note: One common practice is to create an executable file called /etc/rc.dlfs and to call it from /etc/rc.db2 after the dlfmstrt command is executed. The file should contain the command to load the dlfsdrv device driver, mount all dlfs file systems, and export them.

The sample content of /etc/rc.dlfs is:

strload -f /usr/lpp/db2_07_01/cfg/dlfs_cfg
mount -v dlfs /datalinks_fs1
mount -v dlfs /datalinks_fs2
exportfs -a

(The exportfs -a command assumes that the dlfs file systems are listed in /etc/exports.)

Problem: Files can be written to a DLFS but not read

Symptoms

The db2diag.log on the DB2 UDB server has no errors recorded.

The db2diag.log on the DLFM server has messages such as “Dest not valid for upcall” and “Expired or invalid token errors”.

The application errors on reading a file persist even though the DL_EXPINT database configuration parameter on the DB2 UDB server is set to 600 seconds. The read errors occur within the 600 seconds.

Solution

Check the system times on the DLFM server and the DB2 UDB database server and ensure that they are synchronized. For example, a one-hour difference between the clocks of the DLFM server and the DB2 UDB server causes a token to expire immediately, even if the DL_EXPINT database configuration parameter is set to 5 minutes (300 seconds).
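The effect of clock skew on token validity can be illustrated with a small model. This is only a sketch: it assumes that a token carries an expiry time computed on the DB2 server's clock (issue time plus DL_EXPINT) and that the DLFM server compares that expiry time against its own clock. The real token format and validation logic are internal to Data Links, and the function names here are hypothetical.

```python
# Illustrative model only: the real token format and checks are internal to Data Links.

def token_expiry(db2_clock: float, dl_expint: int) -> float:
    """Expiry time stamped by the DB2 server: its own clock plus DL_EXPINT seconds."""
    return db2_clock + dl_expint


def dlfm_accepts(expiry: float, dlfm_clock: float) -> bool:
    """The DLFM server compares the expiry time against its own clock."""
    return dlfm_clock <= expiry


DL_EXPINT = 300          # 5 minutes, as in the example in the text
db2_now = 1_000_000.0    # arbitrary reference time on the DB2 server (seconds)

expiry = token_expiry(db2_now, DL_EXPINT)

# Clocks synchronized: a read 10 seconds after the token is issued is accepted.
in_sync = dlfm_accepts(expiry, dlfm_clock=db2_now + 10)

# DLFM clock one hour ahead: the same read is rejected, well inside DL_EXPINT.
skewed = dlfm_accepts(expiry, dlfm_clock=db2_now + 3600 + 10)

print(in_sync, skewed)
```

Under this model, raising DL_EXPINT only helps if it exceeds the clock difference, which is why synchronizing the clocks is the preferred fix.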

Problem: Error mounting a DLFS file system on Solaris

The following command is issued to try and mount a DLFS file system: dmins0/opt/IBMdb2/V7.1/instance$ ./dlfmfsmd /dlfs

The following error is returned:

dlfs mount Error : Invalid argument
umount : warning: /dlfs not in mnttab
Explanation: An attempt to mount the specified file system has failed.
User Response: Verify that the file system is defined. Correct any errors from the mount command and try again.
DB1035E Failed to mount file system /dlfs

Solution

Ensure that the server is booted in 32-bit mode and not 64-bit mode. The Solaris command isainfo -v displays the mode in which the server was booted.

Note: If the server is booted in 64-bit kernel mode, the isainfo -v command would show both the architectures (32-bit sparc and 64-bit sparcv9).

16.2.3 DB2 server problems

This section outlines common problems with the DB2 server. In some cases, it also outlines the symptoms, and in all cases, offers solutions to correct the problem.

Problem: DB2 server cannot talk to DLFM

When a DB2 instance that uses Data Links is started, DB2 attempts to connect to the File Managers that are registered with it. If a File Manager is not running, an error is written to the db2diag.log file, but users of the database using Data Links do not see any errors until they try to access a DATALINK value. This problem results in a SQL0357N error.

Another communication-related error, SQL0368N, can be caused by the DB2 database not being registered with DLFM or being registered incorrectly. This can also be caused by the Database manager configuration parameter DATALINKS being set to NO. DLFM will refuse a connection from a DB2 server that is not on the exact same release level and fixpak level as the DLFM server.

Symptoms

After running the dlfm start command on the Data Links server, the dlfm see command indicates that DLFM is up and running, but one or both of the following conditions still exist:

– An application program receives an SQL0357N error with reason code 03 when attempting to SELECT, INSERT, or UPDATE a DATALINK value.
– The db2diag.log file on the DB2 server contains a message indicating that DLFM is unreachable.

Solution

Follow these steps:

1. Verify that the database, instance, and hostname are registered correctly with DLFM. Log on as dlfm and issue the command:
   dlfm list registered databases
2. Verify that the DLFM server is correctly registered with the DB2 database. Connect to the DB2 database and issue the command:
   db2 list datalinks managers for database
   Check the hostname and port number that is registered.
3. Make sure the DATALINKS database manager configuration parameter is set to YES. On AIX, run the command:
   db2 get dbm cfg | grep DATALINKS
4. Run the db2level command on the DB2 server and on the DLFM server to verify that both are running the same version and fixpak level of DB2.
5. Make sure the DB2COMM registry variable on the DB2 server includes the value “TCPIP”. On AIX, enter the command:
   db2set -all | grep DB2COMM
6. On AIX, issue the following commands:
   a. db2stop command on the DB2 server
   b. iptrace command
   c. db2start on the DB2 server
   d. kill the iptrace process
   e. ipreport command
   This shows what DB2 is sending to DLFM when it tries to connect. Here is an example of using iptrace (see the man pages for options and details):
   iptrace -b -d -s -P TCP
   kill -9
   ipreport -r -n -s >

16.2.4 File system problems

This section outlines common problems with the file system. Each problem is followed, in some cases, by its symptoms, and then by a solution.

Problem: Can’t mount the DLFS file system

The file system that contains the files that are managed by Data Links must be mounted as a dlfs file system. For the mount to succeed, the dlfsdrv device driver must be loaded and the file system must be defined as a dlfs file system in /etc/filesystems (on AIX).

Symptoms

The symptoms may include:

– The mount command does not show the file system as mounted.
– The mount command shows the file system as mounted, but not as a dlfs file system.
– The mount -v dlfs command fails with one of the following two messages:

  dlfs mount Error : Function not implemented
  dlfs mount helper: Mount Unsuccessful Unmount the base file system

  dlfs mount helper: Error in getting basefs type
  dlfs mount helper: No base file system specified

Solution

Follow these steps:

1. Check to see if the dlfsdrv device driver is loaded. As root, run:
   strload -qf /usr/lpp/db2_07_01/cfg/dlfs_cfg
   This command should return:
   /usr/lpp/db2_07_01/bin/dlfsdrv: yes
   If no is returned instead of yes, the dlfs device driver needs to be loaded:
   strload -f /usr/lpp/db2_07_01/cfg/dlfs_cfg
2. Check /etc/filesystems and verify the following settings for the file system:
   vfs = dlfs
   nodename = - (Make sure there are no trailing spaces after the dash)
   mount = false
   options = rw,Basefs=jfs
   If the file system is defined as a journaled file system (jfs on AIX), convert it to a dlfs file system by running the dlfmfsmd command as root:
   /usr/lpp/db2_07_01/instance/dlfmfsmd
   Here, the argument to dlfmfsmd is the name of the jfs file system to be converted to a dlfs file system.
3. Check /etc/vfs and verify that there is an entry that identifies the helper programs for a dlfs file system. This entry will look something like this:
   dlfs 12 /usr/lpp/db2_07_01/bin/dlfs_mnthlp /usr/lpp/db2_07_01/bin/dlfs_fshelper
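The /etc/filesystems check in step 2 is easy to get wrong by eye, particularly the trailing blank after the dash. The following is a minimal sketch of an automated check, assuming the usual AIX stanza layout (a stanza name ending in a colon, followed by indented attribute = value lines, with * starting a comment). The helper names, the sample stanza, and the device name /dev/lv01 are hypothetical; the expected values are the ones listed in step 2.

```python
# Settings that step 2 asks for in the file system's stanza.
EXPECTED = {"vfs": "dlfs", "nodename": "-", "mount": "false", "options": "rw,Basefs=jfs"}


def parse_stanzas(text):
    """Parse AIX-style stanzas into {stanza_name: {attribute: raw_value}}."""
    stanzas, current = {}, None
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("*"):
            continue  # skip blank lines and '*' comment lines
        if not line[0].isspace() and line.rstrip().endswith(":"):
            current = stanzas.setdefault(line.rstrip()[:-1], {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            # Drop only the blanks that follow '=' so trailing blanks stay visible.
            current[key.strip()] = value.lstrip(" \t")
    return stanzas


def check_dlfs(stanza):
    """Return a list of problems found in one file system stanza."""
    problems = []
    for key, want in EXPECTED.items():
        got = stanza.get(key)
        if got is None:
            problems.append(f"{key} is missing")
        elif got != got.rstrip():
            problems.append(f"{key} has trailing blanks")
        elif got != want:
            problems.append(f"{key} = {got!r}, expected {want!r}")
    return problems


# Hypothetical stanza with a trailing blank after the dash on the nodename line.
SAMPLE = (
    "/datalinks_fs1:\n"
    "\tdev = /dev/lv01\n"
    "\tvfs = dlfs\n"
    "\tnodename = - \n"
    "\tmount = false\n"
    "\toptions = rw,Basefs=jfs\n"
)

report = check_dlfs(parse_stanzas(SAMPLE)["/datalinks_fs1"])
print(report)
```

On a real system, the text would be read from /etc/filesystems rather than from a sample string, and an empty report would mean the stanza matches the settings in step 2.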

Problem: Clients cannot access files in the DLFS file system

To read files in a file system that is managed by Data Links, clients may need several things. First, the file system must be accessible from the network. In other words, the file system needs to be NFS mounted on the clients. A prerequisite for this is that the file system has been exported on the Data Links server. Next, the clients may need read/write permission on the file system. If the files are linked using the READ PERMISSION DB option, clients need to read files with a valid access token.

Solution

Follow these steps:

1. Make sure the dlfs file system is mounted on the client. If it is suspected that the mount is stale, unmount the file system on the client, validate that the file system is mounted on the server, export the file system on the server, and then remount the file system on the client. Stale mounts can occur if the file system is unmounted and then remounted on the server without exporting the file system.
2. Check the access permissions of the dlfs file system on the server and on the client.
3. Check the READ PERMISSION option on the DATALINK column. Section 3.4.5, “Querying DATALINK options” on page 74, discusses how to do this. If it is READ PERMISSION DB, make sure the client application is using the access token generated by DB2. It may be necessary to run a DB2 event monitor that traces the SQL statement activity of the application program to see this. DB2 UDB SQL Reference, SC09-2974, discusses how to create a DB2 event monitor. If the application uses the DLURLPATHONLY function to extract the pathname and filename from the DATALINK value, DB2 does not return an access token. Also, check the DL_EXPINT database configuration parameter. On AIX, this is:
   db2 get db cfg for | grep DL_EXPINT
   This determines the length of time (in seconds) for which the generated access token will be valid. If the application does not use the token within that period of time, the token will be rejected by DLFM, and the application will not be allowed to read the file.

16.2.5 Frequently Asked Questions (FAQs)

This section answers some of the common questions asked about the DB2 UDB database and Data Links File Manager (DLFM) environment.

What file backup products can I use to back up Data Linked files?
Data Links File Manager currently supports:
– Disk
– Tivoli Storage Manager
– Net Backup
– Legato
– Any Backup Services API (XBSA) compliant applications

Can I integrate disk backup systems with Data Linked files?
No. File system backups of Data Linked files can be made and used to recover in the event of a disk crash, but they should not be used for any other form of recovery. The DB2 UDB Backup and Restore utilities should be used for Data Links recovery purposes.

Can I use HSM functionality with Data Linked files?
Yes, but only on the AIX platform.

Can I mix and match DB2 versions between DLFMs and host databases?
No, a DB2 server and any DLFM servers that are registered with it must be on the exact same release level and fixpak level of DB2.

Can a DATALINK column in a single table reference file systems on different operating system platforms?
Yes, any combination of the operating system platforms that support Data Links can be registered as a DLFM server. A maximum of 16 DLFM servers can be registered with a DB2 database, unless the DLFM server resides in a DCE/DFS environment, in which case the limit is one. A DATALINK column in a single table can reference files that are managed by any of the DLFM servers that are registered with the DB2 database.

What are the symptoms of temporary unavailability of the file system or the network in the DFS case while a workload is running?
– You will not be able to link files, because the two-phase commit processing will fail.
– You will not be able to change directories.

Is integrity compromised if DLFM is unavailable or if the file system is not currently mounted as DLFS?
If the DLFM is unavailable and read permission is set to DB, integrity will not be compromised. If the file system is not mounted as DLFS, then integrity will be compromised.

What are the restrictions in a DLFS environment?
The ability to rename directories is restricted.

What happens when the maximum number of backup copies maintained is exceeded?
Backups that are marked as expired will be garbage collected.

Is JDBC supported?
Yes, JDBC can be used in a Data Links environment.

Can I use capabilities of Data Linked files in an NFS environment? Are there any integrity issues?
Yes, Data Linked files can be accessed through NFS. There are no integrity issues. However, caching at the NFS client may result in a user being able to access READ PERMISSION DB files even after the token has expired.

What do I miss out on if I use LOBs instead of the DATALINK type? Are there functions if I still want to bring data into the database?
See 3.3, “Data Links versus LOBs” on page 67.

Would the Data Links control over a file system restrict normal file system activity for ordinary files, or for ordinary file system user accesses into the file system?
All requests to access files that reside in a dlfs file system are intercepted by the dlfs helper programs to determine if the request will be allowed. Access to any file that is linked using READ PERMISSION FS will be allowed or disallowed based only on the file access permissions on the file. Access to any file that is linked using READ PERMISSION DB will be allowed only if a valid access token is supplied as part of the file name. For details about using an access token to read a file linked with READ PERMISSION DB, see 3.5.3, “Reading a linked file” on page 77.

When does file backup of linked files occur, if recovery is set to yes?
File backups of files that are linked occur asynchronously. When a file is linked, DLFM records the attributes of the linked file (name, creator, access permissions, size, etc.) in the DLFM_DB database, and returns control to the program that issued the SQL INSERT or UPDATE statement. DLFM maintains a queue of files that need to be backed up, and performs the backup of these files as resources permit.

How can we tell that the copy daemon is busy or doing something before we decide to run a backup?
– Use operating system tools to see if there is any CPU activity with the copy daemon process (dlfm_copyd) on the DLFM server.
– Run the command: retrieve query

How can we tell that the Reconcile utility is not hung?
Use operating system tools to see if there is any CPU activity with the Reconcile utility agent on the DB2 UDB server.

Can we run more than one DLFM on a server?
No, the DLFS is implemented using a kernel extension (see 4.1.4, “Multiple DLFMs on a single host” on page 94) that can communicate with a single DLFM, on one server.

What kind of operations can continue if DLFM is down?
– Backup
– Restore
– Rollforward
– Reconcile

Can I create a directory in a DLFS type file system if DLFM is down?
No, the Data Links File System Filter needs to communicate with the DLFM.


Appendix A. BNF specifications for DATALINK

This appendix provides information on the Backus Naur Form (BNF) specifications for DATALINKs.

A DATALINK value is an encapsulated value that contains a logical reference from the database to a file stored outside the database. The data-location attribute of this encapsulated value is a logical reference to a file in the form of a Uniform Resource Locator (URL).

The following conventions are used in the BNF specification:

– | is used to designate alternatives.
– [] are used around optional or repeated elements.
– “” are used to quote literals.

Elements may be preceded with [n]* to designate n or more repetitions of the following element; if n is not specified, the default is 0.

The BNF specification for DATALINK is explained here:

URL

url httpurl | fileurl | uncurl | dfsurl | emptyurl

HTTP

httpurl “http://” hostport [“/” hpath ]

hpath hsegment *[ “/” hsegment ]

hsegment *[ uchar | “;” | “:” | “@” | “&” | “=” ]

Note that the search element from the original BNF in RFC1738 has been removed, because it is not an essential part of the file reference and does not make sense in DATALINK context.

FILE

fileurl “file://” host “/” fpath

fpath fsegment *[ “/” fsegment ]

fsegment *[ uchar | “?” | “:” | “@” | “&” | “=” ]

Note that host is not optional and the “localhost” string does not have any special meaning, in contrast with RFC1738. This avoids confusing interpretations of “localhost” in client/server and DB2 EEE configurations.

UNC

uncurl “unc:\\” hostname “\” sharename “\” uncpath

sharename *uchar

uncpath fsegment *[ “\” fsegment ]

Supports the commonly used UNC naming convention on Windows NT. This is not a standard scheme in RFC1738.

DFS

dfsurl “dfs://.../” cellname “/” fpath

cellname hostname

Supports the DFS naming scheme. This is not a standard scheme in RFC1738.

EMPTYURL

emptyurl “”

hostport host [ “:” port ]

host hostname | hostnumber

hostname *[ domainlabel “.” ] toplabel

domainlabel alphadigit | alphadigit *[ alphadigit | “-” ] alphadigit

toplabel alpha | alpha *[ alphadigit | “-” ] alphadigit

alphadigit alpha | digit

hostnumber digits “.” digits “.” digits “.” digits

port digits

Empty (zero-length) URLs are also supported for DATALINK values. These are useful to update DATALINK columns when reconcile exceptions are reported and non-nullable DATALINK columns are involved. A zero-length URL is used to update the column and cause the file to be unlinked.

Miscellaneous definitions

lowalpha “a” | “b” | “c” | “d” | “e” | “f” | “g” | “h” | “i” | “j” | “k” | “l” | “m” | “n” | “o” | “p” | “q” | “r” | “s” | “t” | “u” | “v” | “w” | “x” | “y” | “z”

hialpha “A” | “B” | “C” | “D” | “E” | “F” | “G” | “H” | “I” | “J” | “K” | “L” | “M” | “N” | “O” | “P” | “Q” | “R” | “S” | “T” | “U” | “V” | “W” | “X” | “Y” | “Z”

alpha lowalpha | hialpha

digit “0” | “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” | “9”

safe “$” | “-” | “_” | “.” | “+”

extra “!” | “*” | “'” | “(” | “)” | “,”

hex digit | “A” | “B” | “C” | “D” | “E” | “F” | “a” | “b” | “c” | “d” | “e” | “f”

escape “%” hex hex

unreserved alpha | digit | safe | extra

uchar unreserved | escape

digits 1*digit

Leading and trailing blank characters are trimmed by DB2 while parsing. Also, the scheme names ('HTTP', 'FILE', 'UNC', 'DFS') and host are case-insensitive. They are always stored in the database in uppercase.
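As an illustration of how the grammar reads in practice, the fileurl rule and the rules it depends on can be transcribed into a regular expression. The sketch below is not how DB2 parses DATALINK values; it is a hand transcription of the productions above, and the function name normalize_fileurl and the host name dlfs1.example.com are hypothetical. It also applies the two normalizations described in the text: trimming blanks and storing the scheme and host in uppercase.

```python
import re

# Direct transcription of the productions above.
ALNUM = r"[A-Za-z0-9]"                                   # alphadigit
UCHAR = r"(?:[A-Za-z0-9$\-_.+!*'(),]|%[0-9A-Fa-f]{2})"   # unreserved | escape
FSEGMENT = rf"(?:{UCHAR}|[?:@&=])*"
DOMAINLABEL = rf"{ALNUM}(?:(?:{ALNUM}|-)*{ALNUM})?"
TOPLABEL = rf"[A-Za-z](?:(?:{ALNUM}|-)*{ALNUM})?"
HOSTNAME = rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}"
HOSTNUMBER = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"
HOST = rf"(?:{HOSTNAME}|{HOSTNUMBER})"

# fileurl: "file://" host "/" fpath, with a case-insensitive scheme and host.
FILEURL = re.compile(
    rf"file://(?P<host>{HOST})/(?P<path>{FSEGMENT}(?:/{FSEGMENT})*)",
    re.IGNORECASE,
)


def normalize_fileurl(value):
    """Trim blanks, check the value against fileurl, and uppercase scheme and host.

    Returns the normalized URL, or None if the value does not match the rule.
    """
    match = FILEURL.fullmatch(value.strip(" "))
    if match is None:
        return None
    return f"FILE://{match.group('host').upper()}/{match.group('path')}"


good = normalize_fileurl("  file://dlfs1.example.com/datalinks_fs1/plans/a%20b.dwg ")
no_host = normalize_fileurl("file:///tmp/x")      # host is not optional
bad_char = normalize_fileurl("file://dlfs1.example.com/a b")  # blank is not a uchar

print(good, no_host, bad_char)
```

The second call shows the restriction noted under FILE: unlike RFC1738, a fileurl with an empty host does not match.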


Appendix B. Overview of DCE-DFS on AIX

This appendix introduces Transarc’s Distributed Computing Environment-Distributed File Service (DCE-DFS) on IBM AIX. It also includes high-level information on the DCE-DFS concepts on AIX.

Note: Refer to Administering IBM DCE and DFS Version 2.1 for AIX and OS/2 Clients, SG24-4714, to learn more about the administration of DCE-DFS.

The Distributed Computing Environment (DCE) is a cross-platform, comprehensive, integrated set of services that supports the development, use, and maintenance of distributed computing applications. The availability of a uniform set of distributed computing services gives applications an effective means to harness the power inherent in networks of computers that may otherwise be unused.

DCE has the following main services:

– Distributed File Service (DFS)
– Time Service
– Cell Directory Service
– Security Service
– Threads Service

Figure B-1 shows the layout of various services of DCE.

Figure B-1 DCE architecture (diagram: Distributed Applications run on top of DCE, which comprises the Distributed File Service, Time Service, Remote Procedure Call, Cell Directory Service, Security Service, and Threads Service, over Transport Services/Operating System)

DCE provides a communications environment that supports information flow from wherever it’s stored to wherever it’s needed, without exposing the network's complexity to the end-user, system administrator, or application developer.

DCE encompasses all of the facilities necessary for building distributed applications. It integrates all of these services into a single, logical structure that enables programmers and administrators to develop and manage distributed applications as easily as traditional, single-system programs.

In DCE, the cell is the basic unit of operation. A cell consists of from one to several thousand systems that share an administratively independent installation of server and client machines, a unified DCE Cell Directory Service (CDS) naming environment, and a common authentication server and database. Multiple cells can exist at one geographical location. It is also possible for DFS machines at geographically distant locations to belong to the same cell. However, a machine can belong to only one cell at one time.

DCE is based on many formal and de facto standards, including:

– Internet TCP/IP protocols
– POSIX 1003.4a draft threads and POSIX 1003.6 draft ACLs
– CCITT X.500/ISO 9594 Directory Service
– Internet DNS and Network Time Protocols (NTP) standards
– X/Open Directory Service (XDS) and X/Open Object Management (XOM) application programming interfaces
– Internet GSS API

Distributed File Service (DFS)

The Distributed File Service is a DCE application that provides global file sharing. Access to files located anywhere in the interconnected DCE cells is transparent to the user. To the user, it appears as if the files were located on a local drive. DFS servers and clients may be heterogeneous computers running different operating systems.

The DFS originates from the Transarc Corporation's implementation of the Andrew File System (AFS) from Carnegie-Mellon University.

The DFS is built onto and integrated with all of the other DCE services. It has the following main features:

– Location transparency
– Uniform naming
– Good performance
– Security
– High availability
– File consistency control
– NFS interoperability

The DFS distributed file system is a high-performance, scalable, secure method for sharing remote files. DFS appears to the user as a local file system, providing access to files from anywhere in the network for any user, with the same file name used by all (that is, uniform file access). DFS includes many advanced features not found in traditional file systems. It includes scalability and security over wide area networks, which greatly enhance DFS performance and, at the same time, simplify administration.

These services include:

– Distributed File System Client: The Distributed File System Client makes requests for file data from file servers and maintains caches of commonly requested information. Through sophisticated protocols, the client ensures that file updates made by multiple users are coordinated so that a single file image is seen by all users.
– Base File Service (BFS): The BFS distributed file system server provides file data from existing local file systems to DFS clients. Using BFS, an administrator can make existing data from a UNIX File System (UFS), JFS, Veritas, CD-ROMs, and other physical file systems available to DFS clients.
– Enhanced File Service (EFS): The EFS distributed file system server provides features that greatly increase the availability of information and further simplify the administration of DFS. The EFS delivers the ability to replicate, back up, and even move different parts of the DFS file system without interrupting service to the end user. Through the use of copy-on-write technology, EFS can maintain entire snapshots of backed-up file data for on-line access to previous versions of files. EFS enables the use of access control lists (ACLs) on files and directories stored in DFS for fine-grained control over access to data. The EFS also includes a high-performance, log-based physical file system for fast server restart.
– DCE NFS-to-DFS Secure Gateway: The NFS-to-DFS Secure Gateway provides uniform and secure access to DFS files from NFS clients. The gateway provides an easy migration path for the introduction of DFS into environments with widely installed NFS clients.
– DFS and the Web: DFS combines its replication, unique file names, security, and scalability to meet the demands of growing Web sites. DFS Web Secure is the ideal tool for protecting corporate security as you expand access to your enterprise via the Web.

DFS Name Space

In a distributed computing environment connecting many workstations, a user likely will have access to several different computers. For example, a user in New York might prepare a document for a meeting in Europe using an office computer, and later amend the document from a computer in Munich. For this reason, a distributed computing environment should support global file names. One mechanism that allows the name of a file to look the same on all computers is called a uniform name space. Without such a mechanism, users might have difficulty finding files as they move from computer to computer and might have to return to the workstation on which they created their files to make updates efficiently.

DCE-DFS solves this problem by providing an enforced uniform name space. It specifies a naming convention with which all installations must comply. DFS file access is consistent, regardless of which computer is being used or by whom. In addition, the DCE DFS naming system is designed to provide a global name space across all DFS installations. As a result, all DFS installations taken together appear as one worldwide file system.

Figure B-2 shows an example of Cell Directory Service (CDS) entry in Domain Name Service (DNS) format.

DNS format: /.../almaden.ibm.com/fs/usr/ricardoh/games/tictactoe.exe

Figure B-2 CDS entry in DNS format (diagram labels on the parts of this name: Cell root, File System, Directory, File name, and CDS Entry into DFS)

Note: The local cell can also be abbreviated to: /:/usr/ricardoh/games/tictactoe.exe

The /: abbreviation represents /.../local_cell/fs.
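The relationship between the abbreviated and the global form of a name can be sketched as follows. This is an illustration of the naming convention only, not of any DCE API: the function names expand_local and split_global are hypothetical, the local cell name is passed in explicitly rather than looked up, and cell names are assumed not to contain a slash.

```python
def expand_local(path, local_cell):
    """Expand the '/:' abbreviation to '/.../<local_cell>/fs'."""
    if path == "/:" or path.startswith("/:/"):
        return f"/.../{local_cell}/fs" + path[2:]
    return path  # already a global name, or not a DFS name at all


def split_global(path):
    """Split a global DFS name into (cell name, path below the fs entry)."""
    prefix = "/.../"
    if not path.startswith(prefix):
        raise ValueError("not a global DFS name")
    cell, sep, rest = path[len(prefix):].partition("/fs/")
    if not sep:
        raise ValueError("no fs entry after the cell name")
    return cell, "/" + rest


full = expand_local("/:/usr/ricardoh/games/tictactoe.exe", "almaden.ibm.com")
cell, local_path = split_global(full)

print(full)
print(cell, local_path)
```

With the cell name from Figure B-2, the abbreviated name from the note expands to the full name shown in the figure.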

Table B-1 summarizes some of the common terms used in the DCE-DFS environment.

Table B-1 Some commonly used terms in DCE-DFS environment

Term Explanation

DMAPI Data Management Application Programming Interface (DMAPI) is a user-level programming interface to logical extensions of the operating system. It supports data management applications that typically require intercepting file system operations in a manner that is transparent to file system applications.

DCE-Cell A DCE-Cell consists of a collection of machines that fall within a single DCE administration domain.

DFS SMT DFS Storage Management Toolkit (DFS SMT) is an implementation of DMAPI for the DFS Local File System (DCE LFS). It provides some extensions to DMAPI to handle certain DFS specific aspects.

Aggregate Aggregate is a logical unit of disk storage, similar to a disk partition. The DCE LFS aggregate is a logical volume that has been formatted as a DCE LFS physical file system by the DFS newaggr command. A DCE LFS aggregate can contain multiple DCE LFS filesets. A standard UFS exported into the DFS file space by a DFS File Server is referred to as an aggregate or a non-LFS aggregate, which can contain only one fileset.

Fileset Fileset is a hierarchical grouping of files, managed as a single unit; this is the basic unit of data administration in DFS. DCE LFS supports multiple filesets within a single aggregate. When UFS is used with DFS, the entire file system is considered one fileset.

DCE LFS It is the log-based high performance physical file system provided with DFS. The DCE LFS supports multiple filesets within a single aggregate, fileset replication, fast system restarts, and DCE access control lists.

Non-LFS Non-LFS refers to OS native file systems (JFS on AIX).

DM-enabled DM-enabled aggregates are aggregates that have been enabled with Data Management.

Events Events are the foundations of DMAPI. In this paradigm, the operating system informs a DM application running in the user space when a particular event occurs (pertaining to the file system). Events may be:
– Synchronous: A token identifies the event message, and a response to each event message is required to keep the system or calling applications from hanging.
– Asynchronous: No token is involved, and no response is required from the DM application; these are mostly used for logging purposes.


Appendix C. VPM and Data Links

This appendix demonstrates how IBM middleware (DB2 and Data Links) can provide solutions for data archiving and restoration across a large enterprise, specifically when working with IBM and Dassault Systèmes CATIA and VPM. This appendix provides details on how the systems work together and the different options that are available.

Data Links technology has been supported in VPM since the general availability (GA) of VPM 1.2. This technology support provides four primary capabilities:

– Logical data consistency: For example, an engineer cannot delete or rename a file that is referenced by its corresponding part description in the database.
– Transaction consistency: If a transaction is rolled back in the database, the link to the appropriate version of the file at this site is maintained.
– Security and access: Files controlled by Data Links can either be totally protected by the database, preventing unauthorized file system access, or opened to allow file system access.
– Synchronized backup and recovery: Using DB2 with Data Links ensures consistent backup and recovery of ENOVIAVPM metadata and the associated CATIA models. This makes the overall process more automatic and less database administrator (DBA)-intensive. In the past, administrative tasks were performed outside of the CATIA environment, requiring a separate backup strategy for external CATIA files, which introduced a large risk of inconsistencies between the database and related external files.

You need to follow these steps to assemble DB2, VPM, and Data Links:

1. Install the DB2 UDB server.
2. Install DB2 CAE for VPM Clients.
3. Create VPM DB.
4. Set up DB2 CAE Client communications.
5. Install VPM on a client and populate the UDB database.
6. Install DB2 Data Links.
7. Enable VPM for Data Links storage.

You can find instructions for installing and configuring DB2 Data Links in IBM DB2 Data Links Manager for AIX, Quick Beginnings, GC09-2837.

CATIA 422R1

This installation uses CATIA V4.22 R1. The installed PTFs will vary, so you should contact your local geography's CATIA level 1 support organization.

VPM 1.3

VPM 1.3 PTF1 is known by APAR HC64371 with PTF UB79557, UB79556, UB79561, UB79563, and UB79564 for AIX.

VPM 1.3 PTF2 is known by APAR HC67387 with PTF UB80888, UB80890, and UB80893 for AIX.

VPM 1.3 PTF3 is known by APAR HC69972, with PTF UB81606, UB81614, UB81599, UB81611, and UB81593 for AIX.

To create the VPM database on the UDB server, you must first create an empty database on the UDB server. Optionally, you can install VPM on the UDB server or simply mount the VPM code and administrator's file systems from an installed client workstation with NFS. After the empty database is created, the population of the VPM data structures can be performed on the server. It is not the intention of this document to describe how to install CATIA or VPM. We assume that we are starting with a working database and system.

A CATIA PTF is required to support more than one Data Links File Manager while VPM is in operation. By default, without APAR HC68395 or PTF UB81274 and UB81275, you can only work with one hardcoded (and declared) Data Links File Manager.

DB2 6.1 level used

For the purpose of this exercise, we used Fixpak 5 for DB2 6.1. This is delivered on PTF U472727 (AIX). Remember, after you install this PTF on a Universal Database server, you must update your database instance where your VPM database is located:

db2iupdt

Then, rebind your VPM database. On a Data Links File Manager (DLFM), you must update the dlfm instance:

dlmupdt

Then rebind it:

dlfm bind

We will start a DLFM installation from scratch. Also remember to install Fixpak 5 on the DB2 server, the Data Links File Manager nodes, and the clients.

Installing DB2 Data Links Manager 6.1 GA

Here we assume that the VPM client and the VPM database are on the same node, and the Data Links File Manager is on a separate physical node. For the purposes of DB2 and its Data Links File Managers, ports need to be defined in the /etc/services file. In our example, we use 50100, as suggested by DB2 Data Links Manager Quick Beginnings, GC09-2966. If you used the DB2 installer (db2setup) from the CD-ROM, these can be generated automatically.

Software levels, Fixpak 5

You need Fixpak 5 to run this installation. You can find Fixpak 5 for DB2 6.1 on the Web at:

ftp://ftp.software.ibm.com/ps/products/db2/fixes/english-us/db2aixv61/

On this site, you will find the files that are listed in Figure C-1.

Figure C-1 DB2 V6 Fixpak 5

The file named bnd.tar is a collection of the client bind files.

Preliminary installation steps

Before you install the Data Links Manager software, there are a few steps that you must perform. You need to create new journaled file systems (JFS) to be used by the DB2 code, the dlfm administrator, or the DLFF backup directory/file system. Depending on what backup rules there may be, at a minimum, the backup directory will be used to store database backups of the DLFM_DB database that is created during the installation of Data Links. You also need to create a group and user that will be the DLFM administrator (and DLFM instance owner). The steps are outlined here:

1. When creating your file systems, use this list as a reference (your choices may be different depending on your installation).

Table C-1 Creating your file systems

File system name         File system size   Description
/home/dlfm               105 MB             dlfm instance home
/home/dlfm/dlfmbackup    65 MB              database backups
/usr/lpp/db2_06_01       as needed          actual DB2 software

Note: Do not mount these file systems yet.

2. Do not create your Group/User for the DLFM Instance Owner if using RDIST. The DB2 Installer (db2setup) does this for you in V 6.1.
3. Modify the file system stub points to match the new dlfm user ID. Then mount the file systems from step 1.

Data Links post-installation

From now on, we assume that Data Links has been successfully installed and that the installation has been verified.

The DB2 installer should have placed the following list of variables in the db2profile or local profile (.profile)/dtprofile for the dlfm administrator:

DLFM_PORT=port_number
DLFM_LOG_LEVEL=LOG_ERR
DB2_RR_TO_RS=ON
DB2_HASH_JOIN=ON
DLFM_INSTALL_PATH=$HOME/sqllib/bin
DB2INSTANCE=dladmin_username
DLFM_BACKUP_DIR_NAME=$HOME/dlfmbackup

The following values were set for our example:

DLFM_PORT=50100
DLFM_LOG_LEVEL=LOG_ERR
DB2_HASH_JOIN=ON
DB2_RR_TO_RS=ON
DLFM_INSTALL_PATH=$HOME/sqllib/bin
DB2INSTANCE=dlfm
DLFM_BACKUP_DIR_NAME=$HOME/dlfmbackup
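If you want a quick check that none of these variables is missing from a profile, a small script can help. The following is only a minimal sketch: it assumes the profile contains plain NAME=value lines (optionally prefixed with export), and the function name missing_variables is our own, not part of DB2 or Data Links.

```python
# Variables that the text above expects in the dlfm administrator's profile.
REQUIRED = (
    "DLFM_PORT",
    "DLFM_LOG_LEVEL",
    "DB2_RR_TO_RS",
    "DB2_HASH_JOIN",
    "DLFM_INSTALL_PATH",
    "DB2INSTANCE",
    "DLFM_BACKUP_DIR_NAME",
)


def missing_variables(profile_text):
    """Return the required variable names that profile_text never assigns.

    Assumption: assignments look like NAME=value or export NAME=value.
    """
    defined = set()
    for raw in profile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        name, sep, _value = line.partition("=")
        if sep:
            defined.add(name.strip())
    return [name for name in REQUIRED if name not in defined]


sample = """\
DLFM_PORT=50100
DLFM_LOG_LEVEL=LOG_ERR
DB2_RR_TO_RS=ON
DLFM_INSTALL_PATH=$HOME/sqllib/bin
DB2INSTANCE=dlfm
export DLFM_BACKUP_DIR_NAME=$HOME/dlfmbackup
"""
print(missing_variables(sample))
```

In this sample, DB2_HASH_JOIN is deliberately left out so that the check has something to report.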

Time management: Data Links technology depends heavily on time synchronization between the Data Links File Manager and the UDB database for which it is configured.

If the time difference is greater than the expiry time of the token, you will be unable to access the files stored in the Data Links File Managers. Time synchronization is also important for point-in-time recovery. Now is a good time to synchronize your machine time and time zone information. There is also an AIX daemon called timed that can broadcast a network time, which makes synchronization much easier.

Refer to AIX Version 4.3 System Management Guide: Communications and Networks, SC23-4127. This book contains reference information on Advanced Interactive Executive (AIX) operating system commands. It also describes the tasks that each command performs, how commands can be modified, how they handle input and output, and who can run them. Plus it provides a master index for all six volumes.

Making Data Links work with VPM

You should complete the steps in the following sections to make Data Links work with VPM.

At the Data Links server (file server)

Starting on the Data Links server, follow these steps:

1. Create a JFS file system /test and register it with DLFS (using the dlfmfsmd script).
2. Register the /test file system with the DLFM by issuing the following command:
   dlfm add_prefix /test
3. The VPM database vpmdb1 (residing at the DB2 server) should be registered with DLFM. If this database resides in the db2adm instance on a machine called ibm3 (the DB2 server), issue the following command:
   dlfm add_db vpmdb1 db2adm ibm3
4. Start the DLFM by issuing the following command:
   dlfm start
5. Create the directory called pictures on the file system /test, by entering the following command:
   mkdir /test/pictures
6. Change the permissions of the pictures directory that you just created so that any user can create a file in that directory by entering the following command:
   chmod 777 /test/pictures
7. Create a file called paulz.bmp in the /test/pictures directory, to be managed by the Data Links File Manager, by entering the following command:
   echo "This is a picture of Paul Zikopoulos" > /test/pictures/paulz.bmp

At the DB2 server

Continue the process on the DB2 server by following these steps:

8. Log on to the system with a valid DB2 user ID that has System Administrative (SYSADM) authority on the DB2ADM instance that you created.

Note: By default, any user that belongs to the primary group of the instance owner has SYSADM authority on an instance.

9. Run the db2profile or db2cshrc script as follows:
   . INSTHOME/sqllib/db2profile (for Bash, Bourne or Korn shell)
   source INSTHOME/sqllib/db2cshrc (for C shell)
   Here, INSTHOME is the home directory of the instance owner (in this case, DB2ADM).
10. Start the DB2ADM instance by entering the command:
   db2start
11. Register the Data Links server that will control the files that are linked by a DATALINK data type by entering the following command:
   db2 "add datalinks manager for database vpmdb1 using node ibm3 port 50100"
12. Connect to the VPMDB1 database by entering the following command:
   db2 connect to vpmdb1
13. Create a table called EMPLOYEE in the VPMDB1 database, with a column defined with a DATALINK data type, by entering the following command:
   db2 "create table employee ( id int, fname varchar(30), lname varchar(30), picture datalink linktype url file link control integrity all read permission db write permission blocked recovery yes on unlink restore )"

Note: These options allow users to read the file (through an access token obtained from the database), but block them from writing. If and when a file is unlinked from this table, the file is restored or deleted from the file system. This includes operations in VPM as simple as a CATIA File->Save (overwrite) or a New Model Revision. In our example, we used restore. When the unlink option is set to delete, the previous model copy is deleted. This option (delete) is desired if you have implemented a backup strategy (Tivoli Storage Management, for example) that can manage the archived or backed up files as they are created. If you are not using Tivoli Storage Management, and you want to maintain backup versions (on a daily, weekly, or other basis), set the option to RESTORE.

14. Insert an entry into the EMPLOYEE table that you created by entering the following command:
   db2 "insert into employee values (001,'Paul','Zikopoulos', dlvalue('http://ibm4/test/pictures/paulz.bmp'))"

Back again at the Data Links server

Return to the Data Links server, and log on to the system as any user (except as a user with root authority, or as the DB2 Data Links Manager Administrator).

Verify that the paulz.bmp file is now controlled by the Data Links File Manager by entering the following command:

cat /test/pictures/paulz.bmp

If this file is being controlled by the Data Links File Manager, you receive the following error:

Cannot open /test/pictures/paulz.bmp.

VPM and Data Link tokens

This section explains how to force Data Links to generate tokens compatible with VPM and how to use them in conjunction with VPM.

Uppercase tokens

For VPM operations to be successful, you must change a database configuration parameter in the VPM database so that Data Links tokens are generated with all uppercase letters. This is enabled by setting the database configuration parameter DL_UPPER = YES.

You can use the following commands to enable this setting. Log in as the DB2 administrator for the VPM database (on the DB2 server):

db2 "connect to vpmdb1"
db2 "update db cfg for vpmdb1 using DL_UPPER 'YES'"

Using the Data Links access token to access a file

The access token provides an application with a way to secure the access of a file, giving the right only to the users that request this access. In the case of VPM, models stored in a DLFS can be opened so that users can access the files through the directories, or only through database authorization. Database authorization is granted via VPM.

The database configuration parameter DL_EXPINT controls the length of time (in seconds) for which a Data Links token is valid. The token is granted when a view of the table (an SQL select statement) is taken, and it is then valid for the length of time specified by DL_EXPINT. If another view of the table is taken, a new token is generated, which is valid for the same length of time. Each token expires when its own DL_EXPINT period has ended.
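The lifetime rule can be pictured with a small model. This is a sketch only: the AccessToken class and its fields are hypothetical names of ours, times are plain seconds, and we assume a token stops being valid exactly DL_EXPINT seconds after it is issued.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessToken:
    """Toy model of a token issued by one SELECT on the table."""
    issued_at: float  # time of the SELECT, in seconds
    dl_expint: int    # DL_EXPINT value in effect, in seconds

    def is_valid(self, now):
        # Assumption: valid from issue time up to, but not including,
        # issue time plus DL_EXPINT.
        return self.issued_at <= now < self.issued_at + self.dl_expint


# Two SELECTs, 45 seconds apart, with the default 60-second expiry.
first = AccessToken(issued_at=0, dl_expint=60)
second = AccessToken(issued_at=45, dl_expint=60)

for now in (50, 70, 110):
    print(now, first.is_valid(now), second.is_valid(now))
```

Each token runs on its own clock: at 50 seconds both are usable, at 70 seconds only the second one is, and at 110 seconds both have expired.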

In the case of a remote DLFS that is used to store documents in a wide area network (WAN), the token only needs to remain valid for as long as it takes to open the file. If a CATIA model must be transferred over a telecommunications line, the token would only need to be valid for the amount of time it takes to start opening the model, not the entire time for the model to be read into memory.

Changing the expiry token

By default, the access token that is returned is only valid for 60 seconds. This means that once you obtain a token, you only have 60 seconds to complete the remaining steps in this section (or edit any Data Links controlled file). You can change the default expiration time by changing the DL_EXPINT database configuration parameter.

To change the default expiration time for an access token to 10 minutes (the value is entered in seconds), enter the following commands on the database server:

db2 update db cfg for vpmdb1 using dl_expint 600
db2 terminate
db2 connect to vpmdb1

If you change a setting for any database configuration parameter, you must always reconnect to the database for the changes to take effect.

Obtaining an access token

Start the DB2ADM instance by entering the db2start command. Connect to the VPM database by entering the following command:

db2 connect to vpmdb1

Select the controlled file for update by issuing an SQL SELECT statement, such as: db2 "select dlurlpath(picture) from employee where lname = 'Zikopoulos'"

This command returns the full path name with an access token of the form:

<controlled_filepath>/<access_token>;<controlled_filename>

Note the following explanation:

- <controlled_filepath>: The fully qualified path of the controlled file.
- <access_token>: An encrypted key assigned by the database manager.
- <controlled_filename>: The name of the file that is under the control of a Data Links File System Filter.

In our example, the access token that you receive is similar to this example:

/test/pictures/HVJ5NXGC0WQ.I5KKB6;paulz.bmp

This key is used to read this file on the Data Links server.
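If an application needs the individual parts of such a path, it can split it at the last slash and at the semicolon. The following is a minimal sketch based only on the sample value shown above (directory, then the access token, a semicolon, and the file name); the function and type names are ours, and real tokens may contain characters this sketch does not anticipate.

```python
from typing import NamedTuple


class TokenizedPath(NamedTuple):
    directory: str
    token: str
    filename: str


def split_tokenized_path(path):
    """Split 'directory/token;filename' into its three parts."""
    directory, _, last = path.rpartition("/")
    token, sep, filename = last.partition(";")
    if not sep or not token or not filename:
        raise ValueError(f"no access token found in {path!r}")
    return TokenizedPath(directory, token, filename)


parts = split_tokenized_path("/test/pictures/HVJ5NXGC0WQ.I5KKB6;paulz.bmp")
print(parts.directory)
print(parts.token)
print(parts.filename)
```

A path without a token, such as /test/pictures/paulz.bmp, raises ValueError in this sketch, which mirrors the fact that the file cannot be opened without one.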

For a complete urlpath for the object, issue the following command:

db2 "select dlurlcomplete(picture) from employee where lname = 'Zikopoulos'"

The system responds with a URL:

HTTP://HOSTNAME/DIRECTORY_NAME/Token_key;FILENAME

Verify that you can access the file that is under the control of the Data Links File Manager. In our example, enter the following command:

cat "/test/pictures/<access_token>;paulz.bmp"

Here, <access_token> is the encrypted key that you recorded in the previous step.

You should receive the following output from this command:

This is a picture of Paul Zikopoulos

Adapting VPM to work with Data Links

This section explains how to alter the VPM tables to make them compatible with Data Links.

Altering the VPM table CDM.INFO_LF

How you define the DATALINK column in CDM.INFO_LF will affect security and recovery of the CATIA model files. The following example guarantees referential integrity. It prevents unauthorized access by external applications (like CATIA File/Open). Also, as files are unlinked (re-written), the older versions are deleted from the Data Links file systems:

db2 alter table CDM.INFO_LF ADD CUR_DATALINK DATALINK LINKTYPE URL FILE LINK CONTROL INTEGRITY ALL READ PERMISSION DB WRITE PERMISSION BLOCKED RECOVERY YES ON UNLINK DELETE

Read Permission DB specifies that in order to access the model file, authorization through VPM must be used. The option for UNLINK DELETE (or UNLINK RESTORE) alters the behavior of an unlinked file. With Data Links implemented, as a user unlinks (or deletes) a VPM model, the most recent previously written version in the DLFMBACKUP directory is restored to the Data Links file system. To allow for garbage collection of the unwanted versions of backup files, you should specify ON UNLINK DELETE.

DATALINK options for VPM

These are the DATALINK options that you should use when VPM is working with Data Links:

- INTEGRITY ALL: Any model referenced by a DATALINK column is under the control of the Database Manager and may not be deleted, renamed, or copied using standard file system commands.
- READ PERMISSION FS/DB: When set to DB, model files can be read only by VPM. When read permission DB is used, VPM must obtain an encrypted token from DB2 and use it to open the file. When set to FS, application access is granted to the Data Link file system, based on file system permissions.
- WRITE PERMISSION BLOCKED: When write permission blocked is used, DB2 does not allow linked files to be modified. To modify a file, VPM performs the following steps:
   a. Makes a copy of the linked file
   b. Makes changes to the copy
   c. Unlinks the original file
   d. Links the modified file
- RECOVERY YES: This option allows point in time recovery of VPM and model data. This means that you can restore a backup database image and then roll the logs forward to a point in time. You should use this option so that models can be recovered, if they were ever to be lost.
- ON UNLINK RESTORE/DELETE: For RESTORE, when VPM deletes a model, the (unlinked) file will be returned to its previous AIX owner and file permission set. For DELETE, the file is erased.

DATALINK column options in the database

If you want to see the current settings of the DATALINK column in the VPM database, you could use the following DB2 statements.

Log in as the DBA for the VPM database:

db2 "connect to VPMDB1"
db2 "select COLNAME,DL_FEATURES from SYSIBM.SYSCOLPROPERTIES"

You should receive a listing similar to this example:

COLNAME        DL_FEATURES
CUR_DATALINK   UFADBYD

Figure C-2 illustrates how the DL_FEATURES can be interpreted.

DL_FEATURES column from the SYSIBM.SYSCOLPROPERTIES table: UFADBYD

Position   Character   Meaning
1          U           Linktype: U=URL
2          F           Link Control: F=FILE
3          A           Integrity: A=ALL
4          D           Read Permission: D=DB
5          B           Write Permission: B=Blocked
6          Y           Recovery: Y=Yes
7          D           On Unlink: D=Delete

Figure C-2 Interpreting DL_FEATURES values
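The interpretation in Figure C-2 is positional, one character per option, so it is easy to automate. The sketch below decodes only the character codes that appear in Figure C-2; any other code is reported as unknown rather than guessed. The function name decode_dl_features is ours.

```python
# One entry per character position, in the order used by DL_FEATURES.
# Only the codes shown in Figure C-2 are listed here.
FIELDS = [
    ("Linktype", {"U": "URL"}),
    ("Link Control", {"F": "FILE"}),
    ("Integrity", {"A": "ALL"}),
    ("Read Permission", {"D": "DB"}),
    ("Write Permission", {"B": "Blocked"}),
    ("Recovery", {"Y": "Yes"}),
    ("On Unlink", {"D": "Delete"}),
]


def decode_dl_features(value):
    """Map each character of a DL_FEATURES string to its option name."""
    if len(value) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} characters, got {len(value)}")
    return {
        name: codes.get(ch, f"unknown code {ch!r}")
        for (name, codes), ch in zip(FIELDS, value)
    }


for option, setting in decode_dl_features("UFADBYD").items():
    print(f"{option}: {setting}")
```

Note that the same letter can mean different things in different positions: D is DB for Read Permission but Delete for On Unlink.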

CATIA and VPM declarations

The following definitions must be added to the CATIA declaration series. This must be completed for both the VPM and CATIA sides.

catcdm.DBLFCAT_AUTHORIZATIONS = 'rw-rw-rw-' ;
catcdm.DBLFAIX_ALGO = 'DELETE_RR' ;
catcdm.DBLFAIX_OLD_SUFFIX = '' ;
catcdm.DATALINK_SERVER : set of STRING ;
catcdm.DATALINK_SERVER = 'dlfm_machine,/dlff_filesystem1' ;
catcdm.DATALINK_SERVER = 'dlfm_machine,/dlff_filesystem2' ;
catcdm.DBLFCAT_NOSHOW_PATH = 'TRUE' ;

Where 'dlfm_machine' is the nodename of your DLFM, and '/dlff_filesystem1' is the physical name of the DLFF file system. The declaration catcdm.DBLFAIX_ALGO is set in the delivered MECCDM.dcls.

The declaration catcdm.DBLFAIX_OLD_SUFFIX is also set in the delivered MECCDM.dcls. The declaration catcdm.DBLFCAT_AUTHORIZATIONS is set in the delivered CATCDM.dcls. The declaration for catcdm.DBLFCAT_NOSHOW_PATH is currently set to FALSE in the DCLS file CATCDM.dcls. Feel free to modify these declarations that are already in your DCLS set. Then simply add the remaining catcdm.DATALINK_SERVER declarations to either CATCDM.dcls or MECCDM.dcls.

Mounting the DLFS file system on the VPM clients

Each VPM client that needs to write to the DLFS (/test in our example) needs to NFS mount the DLFS file systems.

Use the following mount characteristics. Log on as root and issue the following command:

mount -o noac dlfmsrv:/test /test

Here dlfmsrv is the DLFM nodename and /test is the NFS exported file system.

Seeing how the files are manipulated by Data Links

Once you have stored your model in the Data Links storage environment, it is important to understand that, whether you are using the VPM driver DBLFCAT or DBLFAIX, you must let VPM name the models for you. The reason for this is that every time a CATIA model is saved with File->Save, or re-stored, a new file is generated in the Data Links storage area. If your users allocate a model name, it is impossible to re-save your model.

You can use the following process to write and update models in VPM, so that Data Links has the controls.

Writing a model

If the user uses the VPM method for Model Create, they would see this screen (Figure C-3).

Figure C-3 Creating a model in VPM

1. Select the Create & Save icon (Figure C-4).

Figure C-4 Creating and saving a model

2. Fill in these attributes. Note that for the repository field, we selected the DBLFCAT driver, and for the directory, we specified our Data Links file system, which is mounted locally.
3. Clicking OK displays a confirmation warning (Figure C-5), for example, that this is a new part/model.

Figure C-5 Confirm Write

Then, the model is saved (Figure C-6).

Figure C-6 Saved model in VPM

4. Now, open a window on your VPM client, change the current working directory to the NFS mounted Data Links file system, and list its contents. You will see this file there, still available read-only to you and to other users logged into your client (Figure C-7).

Figure C-7 Read-Only file

This file stays in this format (not fully Data Linked), until either another model is written, or you re-save this same model.

5. Open your model in CATIA, by selecting the model we created a few steps back (Figure C-8).

Figure C-8 Opening a model in CATIA

6. Double-click the Model definition shown above. CATIA then opens and displays your model (Figure C-9).

Figure C-9 A model in CATIA

7. At this point, if you simply click File->Save, or press CTRL+S from your CATIA session, the model is re-filed.
8. You should then open an aixterm or any other type of window, and change directories to the Data Links file system. If you ask for a detailed list, you see output similar to the example in Figure C-10.

Figure C-10 File under Data Links control now

You can now see that DB2 Data Links has fully taken control of your model. You should also notice that the model file name is different. What is the reason for this? Data Links Version 6.1 does not support update-in-place of linked files. This means that to change a linked file, VPM must perform the following tasks:

1. Make a copy of the linked file.
2. Make changes to the copy.
3. Unlink the original file.
4. Link the modified file.

Whenever a file is linked, a backup copy of this file is written to the directory pointed to by the DLFM_BACKUP_DIR_NAME variable on the DLFM server. This means that every time a model is saved, a copy of the new version is written to the backup directory.
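To make the copy, change, unlink, and link sequence and the backup-on-link behavior concrete, here is a toy in-memory simulation. It is only a sketch under our own assumptions: the class DataLinksSim and the function save_new_version are hypothetical names, no DB2 or DLFM interface is called, and file ownership and permissions are not modeled.

```python
class DataLinksSim:
    """Toy in-memory model of linked files; not the DLFM implementation."""

    def __init__(self):
        self.files = {}       # path -> content
        self.linked = set()   # paths currently under link control
        self.backup_dir = []  # (path, content) copies written at link time

    def write(self, path, content):
        # WRITE PERMISSION BLOCKED: a linked file cannot be modified.
        if path in self.linked:
            raise PermissionError(f"{path} is linked; write is blocked")
        self.files[path] = content

    def link(self, path):
        # Whenever a file is linked, a backup copy is written.
        self.linked.add(path)
        self.backup_dir.append((path, self.files[path]))

    def unlink(self, path, on_unlink="RESTORE"):
        self.linked.discard(path)
        if on_unlink == "DELETE":
            del self.files[path]  # ON UNLINK DELETE erases the file


def save_new_version(sim, old_path, new_path, change, on_unlink="RESTORE"):
    """Apply the four tasks: copy, change the copy, unlink, link."""
    copy = sim.files[old_path]             # 1. copy the linked file
    sim.write(new_path, change(copy))      # 2. change the copy
    sim.unlink(old_path, on_unlink)        # 3. unlink the original
    sim.link(new_path)                     # 4. link the modified file


sim = DataLinksSim()
sim.write("/test/model_v1", b"rev A")
sim.link("/test/model_v1")
save_new_version(sim, "/test/model_v1", "/test/model_v2",
                 lambda data: data + b" + rev B", on_unlink="DELETE")
print(sorted(sim.linked))
print([path for path, _ in sim.backup_dir])
```

The simulation shows why the file name changes on every save and why the backup directory accumulates one copy per saved version.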

If you want to see your prior file, log into your DLFM as root or dlfm, and change your current working directory to the directory identified by the variable DLFM_BACKUP_DIR_NAME. In this directory, you see backed up copies of the DLFM_DB database, and a subdirectory that includes the DLFM’s file system, in our case test (see Figure C-11).

Figure C-11 Backup directory

You see all of the backup copies that have been written to the DLFM. The file you are looking for should be found in the display shown in Figure C-12.

Figure C-12 Files backed up under the Backup directory

Note: If you attempt to delete this model from VPM, you will encounter an Abend S0004. This defect has been reported to Dassault Systèmes. This abend was encountered using the Data Links UNLINK options of RESTORE and DELETE. The file should have been returned to the Data Links file manager file system, with the original user's permissions and ownership in the case of RESTORE, and deleted in the case of DELETE.

Model storage methods

VPM uses three different methods for writing models. These are:

- DBLFCAT: This is a file system based method that writes models using the insertion of a model read header so that the file can be opened from an AIX file system. This would allow a CATIA user to perform a File/Open operation from within CATIA.
- DBLFAIX: This is another file system based writing method that writes models without a Model header. They are unreadable when accessed via a file system read operation.
- DBLFCDM: This method writes long field blobs into the CDM relational database.

With the Data Links PTFs applied, any or all of these write methods will work. If you want to restrict the use of any of these methods, the normal method is to customize your environment profiles, and use a User Exit called DMUSLF, or CheckLFBeforeWrite.

Additional information

Throughout the deployment of CATIA or ENOVIA solutions, functional and technical architectures interact. They need to remain consistent as Digital Enterprise requirements evolve to require global teams, distributed databases, and ever increasing integration of function to support digital mock-up and virtual product, process, resource, and model management.

In April 1999, IBM and Dassault Systèmes announced the establishment of the IBM/Dassault Systèmes International Competency Center (IDSICC) located at Dassault Systèmes headquarters in Suresnes, France. The IDSICC is staffed by a team of highly skilled developers from both IBM and Dassault Systèmes. All have extensive hands-on experience with implementing IBM and Dassault Systèmes Digital Enterprise solutions for customers.

The center's mission is to provide worldwide technical experience and comprehensive Digital Enterprise system recommendations to customers as well as to IBMers and IBM business partners to address every phase of development from research to implementation and production.

IBM is the only technology provider in CATIA/ENOVIA solutions to develop a competency center with this mission of total solution optimization. So IBM provides you with the most powerful combination available in the industry through the integration of IBM key e-business technologies and services to implement the Digital Enterprise vision.

Skilled engineers from each of these IBM labs are part of the core staff within the center working with customers to provide integration support and total enterprise solution implementation.


Appendix D. Logging priorities for DLFF and DLFSCM

SYSLOG is a system log file on the UNIX platforms, in which all the messages are logged by the kernel. The path of this file can be found in the /etc/syslog.conf file. Only the root user can edit this file and alter the path for SYSLOG. The DLFF in AIX, Solaris, and DCE-DFS environments logs messages in this file. There are two tunable priority levels for logging in DLFF:

- The message logging priority
- The module logging priority

On Windows, DLFF uses an event log mechanism to log the messages. Instead of having two logging levels, it has a single dynamically tunable parameter.

This appendix discusses how to modify the logging priorities (or levels) for DLFF in AIX, Solaris, and Windows environments, and for the DLFSCM in a DCE-DFS environment.

On AIX, you can modify the logging level for the Data Links File System Filter (DLFF) by changing the dlfs_cfg file. The dlfs_cfg file is passed to the strload routine to load the driver and configuration parameters. The file is located in the /usr/lpp/db2_07_01/cfg directory. Through a symbolic link, the file can also be found in the /etc directory. The dlfs_cfg file has the following format:

d <driver_name> <vfs_number>,<dlfm_user_id>,<global_message_priority>,<global_module_priority> - 0 1

Here:

- d: This parameter specifies that the driver is to be loaded.
- <driver_name>: The driver name is the full path of the driver to be loaded. For instance, the full path for DB2 Version 7 is /usr/lpp/db2_07_01/bin/dlfsdrv. The name of the driver is dlfsdrv.
- <vfs_number>: This is the vfs entry for DLFS in /etc/vfs.
- <dlfm_user_id>: This is the user ID of the owner of the READ PERMISSION DB files.
- <global_message_priority>: This is the global message priority.
- <global_module_priority>: This is the global module priority.
- 0 1: These are the minor numbers for creating non-clone nodes for this driver. The node names are created by appending the minor number to the cloned driver node name. No more than five minor numbers can be given (0 to 4).

A real-world example might look like this:

d /usr/lpp/db2_07_01/bin/dlfsdrv 14,208,255,-1 - 0 1
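When checking a dlfs_cfg file by script, it can help to split such a line into its fields. The sketch below is an illustration only: the function name and dictionary keys are ours, and the order of the four comma-separated values (vfs entry, user ID, message priority, module priority) is inferred from the field descriptions and the sample line above.

```python
def parse_dlfs_cfg(line):
    """Split one dlfs_cfg driver line into named fields.

    Assumed layout: d, driver path, four comma-separated integers,
    a dash, then the minor numbers.
    """
    fields = line.split()
    if len(fields) < 4 or fields[0] != "d" or fields[3] != "-":
        raise ValueError(f"unexpected dlfs_cfg line: {line!r}")
    vfs_number, user_id, message_priority, module_priority = (
        int(part) for part in fields[2].split(",")
    )
    return {
        "driver": fields[1],
        "vfs_number": vfs_number,
        "user_id": user_id,
        "message_priority": message_priority,
        "module_priority": module_priority,
        "minor_numbers": [int(minor) for minor in fields[4:]],
    }


cfg = parse_dlfs_cfg("d /usr/lpp/db2_07_01/bin/dlfsdrv 14,208,255,-1 - 0 1")
for key, value in cfg.items():
    print(f"{key}: {value}")
```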

The messages that are logged depend on the settings for the global message priority and global module priority. To tune DLFF logging, you can change the value for these global priorities.

There are four message priority values you can use:

#define LOG_EMERGENCY 0x01

#define LOG_TRACING 0x02

#define LOG_ERROR 0x04

#define LOG_TROUBLESHOOT 0x08

These values can be added together, depending on the level of logging you want. You do not need to worry much about the module logging priorities. The DLFF calculates the final priority value by using a bit-wise AND operation on these two logging priority values. However, if you want to log any particular DLFF operation (generally not used), you can modify the module priority using one or more (by adding them together) of the following values:

#define LOG_LOOKUP_NORMAL 0x01

#define LOG_LOOKUP_TOKEN 0x02

#define LOG_OPEN 0x04

#define LOG_CLOSE 0x08

#define LOG_RENAME 0x10

#define LOG_REMOVE 0x20

#define LOG_MKDIR 0x40

#define LOG_GETATTRIBUTE 0x80

#define LOG_SETATTRIBUTE 0x100

#define LOG_CREATE 0x200

#define LOG_UPCALLMESSAGES 0x400

#define LOG_LOADUNLOAD 0x800

#define LOG_MOUNT 0x1000

#define LOG_UNMOUNT 0x2000

#define LOG_VFSROOT 0x4000

#define LOG_VFSSTAT 0x8000

#define LOG_VFSVGET 0x10000

#define LOG_IOCTL 0x20000

#define LOG_SUBROUTINE 0x40000

#define LOG_OTHERS 0x80000

#define LOG_INITIALIZE 0x100000

#define LOG_GETACL 0x200000

#define LOG_SETACL 0x400000

#define LOG_ACCESS 0x800000

#define LOG_INACTIVE 0x1000000

#define LOG_SYMLINK 0x2000000

#define LOG_HARDLINK 0x4000000

Consider this example. Suppose you want to log only the error and emergency messages for the open, mkdir, and remove DLFF operations. The values for the two logging priorities can be calculated as shown here:

Message logging priority = LOG_ERROR + LOG_EMERGENCY = 0x04 + 0x01 = 0x05 = 5

Module logging priority = LOG_OPEN + LOG_MKDIR + LOG_REMOVE = 0x04 + 0x40 + 0x20 = 0x64 = 100

The dlfs_cfg file for these logging priorities would look like this example: d /usr/lpp/db2_07_01/bin/dlfsdrv 14,208,5,100 - 0 1
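The same arithmetic can be reproduced in a few lines. This sketch uses only the flag values listed above and the sample vfs entry (14) and user ID (208) from the earlier example line; the helper name combine is ours. It uses bit-wise OR, which gives the same result as addition as long as each flag is included only once.

```python
# Message priority flags (from the list above)
LOG_EMERGENCY = 0x01
LOG_ERROR = 0x04

# Module priority flags (from the list above)
LOG_OPEN = 0x04
LOG_REMOVE = 0x20
LOG_MKDIR = 0x40


def combine(*flags):
    """OR the given flags together into one priority value."""
    value = 0
    for flag in flags:
        value |= flag
    return value


message_priority = combine(LOG_ERROR, LOG_EMERGENCY)
module_priority = combine(LOG_OPEN, LOG_MKDIR, LOG_REMOVE)

cfg_line = (
    "d /usr/lpp/db2_07_01/bin/dlfsdrv "
    f"14,208,{message_priority},{module_priority} - 0 1"
)
print(hex(message_priority), message_priority)
print(hex(module_priority), module_priority)
print(cfg_line)
```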

If you want to log each and every operation, the value for module logging priority should be "-1" (all bits set, that is, 0xFFFFFFFF as a 32-bit value).

Modifying the DLFSCM logging priorities in DCE-DFS (on AIX)

Similar to DLFF on AIX, there is a configuration file dlfscm_cfg under the /usr/lpp/db2_07_01/cfg directory. Through a symbolic link, the file can also be found in the /etc directory. The dlfscm_cfg file has the following format:

d <driver_name> <vfs_number>,<global_message_priority>,<global_module_priority> - 0 1

Here:

- d: The d parameter specifies that the driver is to be loaded.
- <driver_name>: The driver name is the full path of the driver to be loaded. For instance, the full path for DB2 Version 7 is /usr/lpp/db2_07_01/bin/dlfscmdrv. The name of the driver is dlfscmdrv.
- <vfs_number>: This is the vfs entry of DLFSCM in /etc/vfs.
- <global_message_priority>: This is the global message priority.
- <global_module_priority>: This is the global module priority.
- 0 1: These are the minor numbers for creating non-clone nodes for this driver. The node names are created by appending the minor number to the cloned driver node name. No more than five minor numbers can be given (0 to 4).

A real-world example might look like this:

d /usr/lpp/db2_07_01/bin/dlfscmdrv 15,255,-1 - 0 1

There are four message priority values you can use:

#define LOG_EMERGENCY 0x01

#define LOG_TRACING 0x02

#define LOG_ERROR 0x04

#define LOG_TROUBLESHOOT 0x08

These values can be added together, depending on the level of logging you want. You do not need to worry much about the module logging priorities. The DLFF calculates the final priority value by using a bit-wise AND operation on these two logging priority values. However, if you want to log any particular DLFF operation (generally not used), you can modify the module priority using one or more (by adding them together) of the following values:

#define LOG_LOOKUP_NORMAL 0x01

#define LOG_LOOKUP_WITHTOKEN 0x02

#define LOG_OPEN 0x04

#define LOG_CLOSE 0x08

#define LOG_RENAME 0x10

#define LOG_LOCAL 0x20

#define LOG_MKDIR 0x40

#define LOG_GETATTRIBUTE 0x80

#define LOG_SETATTRIBUTE 0x100

#define LOG_CREATE 0x200

#define LOG_UPCALLMESSAGES 0x400

#define LOG_LOADUNLOAD 0x800

#define LOG_MOUNT 0x1000

#define LOG_UNMOUNT 0x2000

#define LOG_VFSROOT 0x4000

#define LOG_VFSSTAT 0x8000

#define LOG_VFSVGET 0x10000

#define LOG_IOCTL 0x20000

#define LOG_SUBROUTINE 0x40000

#define LOG_OTHERS 0x80000

#define LOG_INITIALIZE 0x100000

#define LOG_GETACL 0x200000

#define LOG_SETACL 0x400000

#define LOG_ACCESS 0x800000

#define LOG_INACTIVE 0x1000000

#define LOG_SYMLINK 0x2000000

#define LOG_RDWR 0x4000000

#define LOG_CHANGEVNOPS 0x8000000

#define LOG_IMPERSONATION 0x10000000

The values of these two logging priorities can be calculated as shown in the example on page 334.
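Because each priority value occupies a distinct bit, "adding values together" is the same as combining them with a bit-wise OR. The following sketch illustrates this with the message priority values and a few of the module priority values listed above. The helper names combine and decode are hypothetical, and the dictionaries simply restate the #define values. The sketch does not model how the driver itself applies the bit-wise AND.

```python
# Message priority bits, copied from the list in this appendix.
MESSAGE_FLAGS = {
    "LOG_EMERGENCY": 0x01,
    "LOG_TRACING": 0x02,
    "LOG_ERROR": 0x04,
    "LOG_TROUBLESHOOT": 0x08,
}

# A few of the module priority bits listed above (not the full set).
MODULE_FLAGS = {
    "LOG_OPEN": 0x04,
    "LOG_CLOSE": 0x08,
    "LOG_LOADUNLOAD": 0x800,
    "LOG_INITIALIZE": 0x100000,
}


def combine(flags, names):
    """Return the priority value with the named bits set."""
    value = 0
    for name in names:
        value |= flags[name]  # OR equals addition when bits are distinct
    return value


def decode(flags, value):
    """Return the names of all known bits that are set in value."""
    return [name for name, bit in flags.items() if value & bit]
```

For example, combine(MESSAGE_FLAGS, ["LOG_EMERGENCY", "LOG_ERROR"]) gives 0x05, and decode(MESSAGE_FLAGS, 255) returns all four message priority names. A value of -1 has every bit set, so decode reports every known flag for it.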

Modifying the DLFF logging priorities on Solaris

In Solaris, the logging priorities are defined in the /etc/system file. The entries in /etc/system that correspond to these two logging priorities look like this:

set dlfsdrv:glob_mod_pri=0x100800
set dlfsdrv:glob_mesg_pri=0xff

Here, the format is:

set <module>:<variable>=<value>

To change the values for the two logging priorities, you should edit the /etc/system file. You may have to reboot the machine for the new logging values to take effect.
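As an illustration of the edit described above, the following sketch reads and rewrites the two entries in a copy of the file's text. The function names read_priorities and set_priority are hypothetical, the sample text is made up for the example, and the sketch works on a string rather than on /etc/system itself. Any real change should be made with care and with a backup of the file.

```python
import re

# Matches lines of the form: set module:variable=value
SET_LINE = re.compile(r"^\s*set\s+(\w+):(\w+)\s*=\s*(\S+)\s*$")


def read_priorities(system_text, module="dlfsdrv"):
    """Return {variable: integer value} for the given module's set lines."""
    result = {}
    for line in system_text.splitlines():
        match = SET_LINE.match(line)
        if match and match.group(1) == module:
            result[match.group(2)] = int(match.group(3), 0)
    return result


def set_priority(system_text, variable, value, module="dlfsdrv"):
    """Return new text with the named variable set to value (in hex)."""
    out = []
    for line in system_text.splitlines():
        match = SET_LINE.match(line)
        if match and match.group(1) == module and match.group(2) == variable:
            line = f"set {module}:{variable}={value:#x}"
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative file content only; the two set lines come from the text above.
SAMPLE = (
    "* sample content for illustration\n"
    "set dlfsdrv:glob_mod_pri=0x100800\n"
    "set dlfsdrv:glob_mesg_pri=0xff\n"
)
```

With this sample, read_priorities(SAMPLE) returns the two values 0x100800 and 0xff, and set_priority(SAMPLE, "glob_mesg_pri", 0x05) returns the same text with only the message priority line changed.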

Modifying the DLFF logging level on Windows

Unlike on AIX and Solaris, the logging level (or priority) can be changed dynamically on Windows. Instead of two different variables (message and module priorities), there is only one variable on Windows. It has the following possible values:

0 Logs all messages (success messages also)
1 Logs basic information, warning and error messages
2 Logs warning and error messages
3 Logs error messages only

The following two commands should be issued to modify the log level for DLFF:

dlff set loglevel
dlff refreshtrace

For example, to modify the log level of DLFF to log only warning and error messages (level 2), enter the following commands:

dlff set loglevel
dlff refreshtrace

Related publications

The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook.

IBM Redbooks

For information on ordering these publications, see “How to get IBM Redbooks” on page 341.

HACMP/ES Customization Examples, SG24-4498
Administering IBM DCE and DFS Version 2.1 for AIX and OS/2 Clients, SG24-4714
Tivoli Storage Management Concepts, SG24-4877
DB2 UDB for AS/400 Object Relational Support, SG24-5409
Tivoli Storage Manager Version 3.7.3 & 4.1: Technical Guide, SG24-6110

Other resources

These publications are also relevant as further information sources:

DB2 UDB Troubleshooting Guide, GC09-2850
DB2 Data Links Manager Quick Beginnings, GC09-2966
Quick Beginnings manuals for UNIX (GC09-2970), Windows (GC09-2971), OS/2 (GC09-2968), and Linux (GC09-2972) (from Chapter 3)
TSM for AIX Administration Guide, GC35-0403
DB2 Application Building Guide, SC09-2948
DB2 Application Development Guide, SC09-2949
DB2 UDB Call Level Interface Guide and Reference, SC09-2950
DB2 Command Reference Guide, SC09-2951
Data Movement Utilities Guide, SC09-2955
AIX Version 4.3 System Management Guide: Communications and Networks, SC23-4127
Replication Guide and Reference, SC26-9920

Davis, Judy R. Data Links white paper “Data Links: Managing External Data with DB2 Universal Database”. Prepared for the IBM Corporation by Database Associates International, February 1999. http://www.software.ibm.com/data/pubs/papers/
Biggs, Maggie. “IBM's DB2 6.1 Strengthens Web Appeal”. Infoworld, Infoworld Test Center, June 28 1999.
Alur, Nagraj and Davis, Judy R. “How to Improve RDBMSes -- Seven long-term requirements for managing complex data”. Byte Magazine, April 1997.
“IBM emerges as early leader in burgeoning content management market”, ContentWatch, September 1997.
Narang, Inderpal and Rees, Robert. “Data Links - Linkage of database and filesystems”. Proceedings of the Sixth High Performance Transaction Systems (HPTS), September 1995.
Stodder, David. “An Interview with Don Haderle”. DB2 Magazine, Summer 1998.
Saracco, Cindy. Universal Database Management: A Guide to Object/Relational Technology, Chapter 9. Morgan Kaufmann Publishers, Inc, California, 1998.
Gwynne, Peter. “Reaching Beyond the Database”. IBM Research Magazine, Number 3, 1998, http://www.research.ibm.com/resources/magazine
Papiani, Mark et al. “A distributed Scientific Data Archive Using the Web, XML and SQL/MED”. ACM SIGMOD Record, Vol 28, Number 3, September 1999.
Hsiao, H. and Narang, I. “DLFM: A Transactional Resource Manager”. In ACM SIGMOD/PODS 2000.
Alur, Nagraj and Routray, Ramani Ranjan. “Link Integrity+: A Web Asset Integrity Solution”. IBM Almaden Research Center paper.
Baker, Brian and Roushdi, Amr. “Installing and Configuring VPM with DB2 Datalinks”. V1.0, IBM Dassault Systèmes International Competency Center (IDSICC), 20 November 2000.

Referenced Web sites

These Web sites are also relevant as further information sources:

Data Links home page: http://www.ibm.com/software/data/db2/datalinks
DB2 Product Family home page: http://www.software.ibm.com/data/db2
Data Management home page: http://www.software.ibm.com/data
Data Links white paper: http://www-4.ibm.com/software/data/pubs/papers/#datalink
DB2 technical library: http://www.ibm.com/software/data/db2/library
DB2 Administration Guide: http://www.ibm.com/cgi-bin/db2www/data/db2/udb/winos2unix/support/v7pubs.d2w/en_main
DB2 related software downloads: http://www.software.ibm.com/data/db2/udb
DB2 product and service technical library: http://www-4.ibm.com/software/data/db2/library/db2udb
TSM Administration guide: http://www.tivoli.com/support/public/Prodman/public_manuals/td/TD_PROD_LIST.html
Information on Dassault Systems: http://www.developer.ibm.com/welcome/icc/dassault.html

How to get IBM Redbooks

Search for additional Redbooks or redpieces, view, download, or order hardcopy from the Redbooks Web site: ibm.com/redbooks

Also download additional materials (code samples or diskette/CD-ROM images) from this Redbooks site.

Redpieces are Redbooks in progress; not all Redbooks become redpieces and sometimes just a few chapters will be published this way. The intent is to get the information out much quicker than the formal publishing process allows.

IBM Redbooks collections

Redbooks are also available on CD-ROMs. Click the CD-ROMs button on the Redbooks Web site for information about all the CD-ROMs offered, as well as updates and formats.

Special notices

References in this publication to IBM products, programs or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program, or service is not intended to state or imply that only IBM's product, program, or service may be used. Any functionally equivalent program that does not infringe any of IBM's intellectual property rights may be used instead of the IBM product, program or service.

Information in this book was developed in conjunction with use of the equipment specified, and is limited in application to those specific hardware and software products and levels.

IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to the IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact IBM Corporation, Dept. 600A, Mail Drop 1329, Somers, NY 10589 USA.

Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.

Any pointers in this publication to external Web sites are provided for convenience only and do not in any manner serve as an endorsement of these Web sites.

Tivoli, Manage. Anything. Anywhere., The Power To Manage., Anything. Anywhere., TME, NetView, Cross-Site, Tivoli Ready, Tivoli Certified, Planet Tivoli, and Tivoli Enterprise are trademarks or registered trademarks of Tivoli Systems Inc., an IBM company, in the United States, other countries, or both. In Denmark, Tivoli is a trademark licensed from Kjøbenhavns Sommer - Tivoli A/S.

C-bus is a trademark of Corollary, Inc. in the United States and/or other countries.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and/or other countries.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States and/or other countries.

PC Direct is a trademark of Ziff Communications Company in the United States and/or other countries and is used by IBM Corporation under license.

ActionMedia, LANDesk, MMX, Pentium and ProShare are trademarks of Intel Corporation in the United States and/or other countries.

UNIX is a registered trademark in the United States and other countries licensed exclusively through The Open Group.

SET, SET Secure Electronic Transaction, and the SET Logo are trademarks owned by SET Secure Electronic Transaction LLC.

Other company, product, and service names may be trademarks or service marks of others.


Back cover

The amount of data that is stored digitally is growing rapidly. The file paradigm is very common for such data types as video, image, text, graphics, and engineering drawings because capture, edit, and delivery tools use the file paradigm for these data types. A large number of applications store, retrieve, and manipulate data in files.

Data Links, a new feature of DB2 Universal Database, extends the management umbrella of the relational database management system (RDBMS) to data stored in external operating system files as if the data was stored directly in the database. Data Links provides several levels of control over external data such as referential integrity, access control, coordinated backup and recovery, and transaction consistency.

This IBM Redbook explains how to effectively deploy Data Links in a complex environment. First it describes the technical architecture of Data Links, developing applications in a Data Links environment, and planning a deployment of Data Links. Then, it covers administering a Data Links environment, setting up Tivoli Storage Manager as a backup server with Data Links, and implementing high-availability cluster multiprocessing (HACMP) with Data Links. It includes a full chapter on data replication and the replication of Data Linked files. It then describes the Reconcile utility and how the DB2 backup and recovery mechanism supports Data Links. This redbook concludes by providing some hints and tips for problem determination in a Data Links environment.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks

SG24-6280-00 ISBN 0738423106