
UniData

Administering the Recoverable File System on UNIX

UDT-720-RFSU-1


Notices

Edition
Publication date: June 2008
Book number: UDT-720-RFSU-1
Product version: UniData 7.2

Copyright © Rocket Software, Inc. 1988-2008. All Rights Reserved.

Trademarks The following trademarks appear in this publication:

Trademark Trademark Owner

Rocket Software™ Rocket Software, Inc.

Dynamic Connect® Rocket Software, Inc.

RedBack® Rocket Software, Inc.

SystemBuilder™ Rocket Software, Inc.

UniData® Rocket Software, Inc.

UniVerse™ Rocket Software, Inc.

U2™ Rocket Software, Inc.

U2.NET™ Rocket Software, Inc.

U2 Web Development Environment™ Rocket Software, Inc.

wIntegrate® Rocket Software, Inc.

Microsoft® .NET Microsoft Corporation

Microsoft® Office Excel®, Outlook®, Word Microsoft Corporation

Windows® Microsoft Corporation

Windows® 7 Microsoft Corporation

Windows Vista® Microsoft Corporation

Java™ and all Java-based trademarks and logos Sun Microsystems, Inc.

UNIX® X/Open Company Limited

The above trademarks are property of the specified companies in the United States, other countries, or both. All other products or services mentioned in this document may be covered by the trademarks, service marks, or product names as designated by the companies who own or market them.

License agreement This software and the associated documentation are proprietary and confidential to Rocket Software, Inc., are furnished under license, and may be used and copied only in accordance with the terms of such license and with the inclusion of the copyright notice. This software and any copies thereof may not be provided or otherwise made available to any other person. No title to or ownership of the software and associated documentation is hereby transferred. Any unauthorized use or reproduction of this software or documentation may be subject to civil or criminal liability. The information in the software and documentation is subject to change and should not be construed as a commitment by Rocket Software, Inc. Restricted rights notice for license to the U.S. Government: Use, reproduction, or disclosure is subject to restrictions as stated in the “Rights in Technical Data- General” clause (alternate III), in FAR section 52.222-14. All title and ownership in this computer software remain with Rocket Software, Inc.

Note This product may contain encryption technology. Many countries prohibit or restrict the use, import, or export of encryption technologies, and current use, import, and export regulations should be followed when exporting this product. Please be aware: Any images or indications reflecting ownership or branding of the product(s) documented herein may or may not reflect the current legal ownership of the intellectual property rights associated with such product(s). All right and title to the product(s) documented herein belong solely to Rocket Software, Inc. and its subsidiaries, notwithstanding any notices (including screen captures) or any other indications to the contrary.

Contact information Rocket Software 275 Grove Street Suite 3-410 Newton, MA 02466-2272 USA Tel: (617) 614-4321 Fax: (617) 630-7100 Web Site: www.rocketsoftware.com

Table of Contents

Chapter 1 Introduction to the Recoverable File System (RFS)
  RFS System Requirements
  Disk Space
  Memory
  Recommended Knowledge Base
  Overview of RFS
  ACID Qualities
  System Failures
  Media Failures
  Enabling RFS
  RFS Architecture
  RFS Components
  Logging
  Log Files
  Log File Overflow
  Archiving
  Archive Files
  Automatic Backup of Archive Files
  Synchronizing Backups with Archive Files
  Archive Configuration Table
  Creating and Converting Recoverable Files
  Creating a Recoverable File
  Converting to a Recoverable File
  Crash Recovery
  Recovering from a Media Crash
  Monitoring and Tuning RFS

Chapter 2 RFS Commands and Daemons
  cntl_install Command
  Creating a Recoverable File
  Disabling a Recoverable File
  forcecp Command
  mediarec Command
  startud Command
  Example
  udfile Command
  Daemons for the Recoverable File System
  sm
  tm
  bimglog
  aimglog
  archive
  ar_backupd
  sync

Chapter 3 Configuration Steps for Logging
  How Logging Works
  About Log Files
  About Checkpoints
  How to Turn On and Configure Logging

Chapter 4 Configuration Steps for Archiving
  How Archiving Works
  How to Turn On and Configure Archiving
  Managing Archive Backup
  Backing Up Archives Manually
  Backing Up Archives Automatically

Chapter 5 Creating and Configuring Recoverable Files
  Converting Nonrecoverable Files to Recoverable Files
  Creating New Recoverable Files
  Creating a List of Recoverable Files
  Special Considerations for Recoverable Files

Chapter 6 System Crash Recovery
  System Crash Recovery

Chapter 7 Media Crash Recovery
  Media Crash Recovery
  Data Lost, Logs, and Archives Unaffected
  Data and Archive Files Unaffected, Logs Lost
  Data and Log Files Unaffected, Archives Lost
  Data and Logs Lost, Archives Unaffected
  Data and Archives Lost, Logs Unaffected
  Logs and Archives Lost, Data Unaffected
  Disk Containing /usr/ud72/include Lost

Chapter 8 RFS Configuration Parameters
  UniData Configuration Parameters
  Modifying udtconfig Parameters
  RFS Configuration Parameters

Chapter 9 Monitoring and Tuning
  The sysmon Utility
  sysmon Fields and Values
  Performance Tips
  Tuning N_PUT and N_BIG
  Adjusting the Log Files
  Adjusting the Archive Files
  Tuning CM_SLEEP
  RFS File Open Performance
  How RFS Tracks Open Files
  Tuning RFS Open File Performance

Chapter 10 Troubleshooting RFS
  Possible Errors
  Failure of UniData to Start
  File Log Size Too Small
  Inadequate Number of Message Queues Defined
  Value of SYS_PV Changed in udtconfig
  Process Errors
  Values of N_TMQ and N_PGQ Are Zero
  UniData Daemon Killed
  Errors During Processing
  Archive Files Are Full
  aimglog and bimglog Have Been Removed
  Parameter Limits Exceeded
  MAX_OPEN_FILE
  N_AFT
  BPF_NFILES
  Files Are Not Being Treated as Recoverable
  File is Not Defined as Recoverable
  SB_FLAG Turned Off
  Recoverable File System Not Licensed
  Warning Messages
  Log Files Are Too Small

Chapter 1 Introduction to the Recoverable File System (RFS)

This document describes the Recoverable File System (RFS). RFS functions and utilities are designed to protect UniData files against system or media failures. RFS also supports UniData Transaction Processing semantics to provide the ACID properties (atomicity, consistency, isolation, and durability). This document does not cover the use of Transaction Processing semantics. For information on Transaction Processing, see Developing UniBasic Applications. For step-by-step information on configuration and use of RFS, see the chapters following this introduction.

The purpose of this chapter is to describe the concepts associated with RFS. This chapter contains the following:

• RFS System Requirements
• Overview of RFS
• RFS Architecture (graphic)
• RFS Components


RFS System Requirements

If you plan to run the Recoverable File System, you need additional space and memory. The amount of additional space and memory depends on the type of platform you are running. UniData can only provide initial recommendations. The initial recommendations are based on baseline tests on a variety of platforms, with 20 users, record sizes ranging from 70 to 2,500 bytes, modulo 200, block size multiplier 3, and type 0. The recommendations discussed in the following sections are based on the results.

Check your installation media and make sure it matches your UNIX version.

Refer to Installing and Licensing UniData Products for detailed information about installing UniData.

Disk Space

On UNIX systems, you must have at least 125 MB of free disk space under one mount point for the UniData installation to run. Depending on the type of installation, some space (as much as 10 MB) may be released after installation.

If you decide to install UniData on a partition other than the default /usr partition, you still need to have approximately 2 MB available on /usr.

Use the following formula to determine the minimum disk space you need for logging:

(8MB + 4,096) * (NUSERS+1)

These requirements are based on the default of two before image and two after image logs per set.

Each archive should be a minimum of 8 MB to contain two sets of after image logs.
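As a rough worked example of the logging formula above, the following Python sketch evaluates it for a given number of users. It assumes that "8MB" means 8 × 1,024 × 1,024 bytes and that 4,096 is a byte count; the text does not state the units explicitly, and the function name is illustrative only.

```python
MB = 1024 * 1024  # assumption: the manual's "MB" is 2**20 bytes


def min_logging_bytes(nusers: int) -> int:
    """Evaluate (8MB + 4,096) * (NUSERS + 1) from the text, in bytes."""
    if nusers < 0:
        raise ValueError("nusers must be non-negative")
    return (8 * MB + 4096) * (nusers + 1)


if __name__ == "__main__":
    needed = min_logging_bytes(20)
    print(f"20 users: {needed} bytes (about {needed / MB:.1f} MB)")
```

Treat the result as a floor for planning purposes; archive files and their backups need space beyond this figure.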

Note: If you turn on archiving, plan for additional resources (either disk or tape) to handle regular backup of archive files.

To determine the space available on your system, use the UNIX df or du commands at the shell prompt. Refer to your UNIX documentation for help with these commands.

Note: Most UNIX systems provide an online manual, also known as the “man pages.” At the UNIX prompt, enter the man man command to see if your system provides an online manual, and if so, refer to the man pages for the specific syntax of the df or du command.
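Where a scripted check is more convenient than reading df output, a sketch along the following lines may help. It uses Python's standard shutil.disk_usage call; the helper names are hypothetical, and the 125 MB threshold is the installation minimum quoted earlier in this section.

```python
import shutil

REQUIRED_MB = 125  # installation minimum quoted in the text


def free_megabytes(path: str) -> float:
    """Return free space on the file system holding `path`, in MB (2**20 bytes)."""
    return shutil.disk_usage(path).free / (1024 * 1024)


def has_room_for_install(path: str, required_mb: float = REQUIRED_MB) -> bool:
    """True if the file system containing `path` has at least `required_mb` free."""
    return free_megabytes(path) >= required_mb


if __name__ == "__main__":
    target = "."  # replace with the mount point you plan to install under
    print(f"{target}: {free_megabytes(target):.0f} MB free, "
          f"enough for install: {has_room_for_install(target)}")
```

Pass the mount point you intend to use, for example /usr for a default installation.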

Memory

The exact memory needs for a UniData installation are highly platform- and application-specific. Optimal assignment of memory resources usually comes after some careful testing, and memory needs can change over time. Aside from the memory required by other applications on your system, if any, you can use the following guidelines to estimate the memory required for your UniData installation:

• Approximately 10-20 MB of free memory for the operating system. Consult your operating system documentation for the amount of memory needed for your operating system.
• 2 MB of free memory per UniData session if you do not use RFS, just to run UniData.
• 3-5 MB of free memory per UniData session if you do use RFS, just to run UniData. More memory is required if you use Transaction Processing, if you have a large number of writes in one transaction, and if you are using archiving.
• Any additional memory required by your application.
• Compute your memory needs, then add 10 percent more memory for the UNIX file system buffer.

The following example illustrates how to calculate the amount of memory necessary for 50 users running an application that requires 1 MB of memory per session. In this example, RFS is not considered:

Memory needed for OS                                          10 MB
Memory needed for UniData (2 MB * 50 users)                  100 MB
Additional memory needed for application (1 MB * 50 users)    50 MB
Subtotal                                                     160 MB
10 percent for file system buffer                             16 MB
Total memory required                                        176 MB


In the next example, memory is calculated for the same scenario running RFS:

Memory needed for OS                                          10 MB
Memory needed for UniData (4 MB * 50 users)                  200 MB
Additional memory needed for application (1 MB * 50 users)    50 MB
Subtotal                                                     260 MB
10 percent for file system buffer                             26 MB
Total memory required                                        286 MB
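The two calculations above follow the same pattern, so they can be captured in a small helper. This is only a sketch of the arithmetic in this section; the function and parameter names are illustrative, and the 10 MB operating system figure is the one used in the examples.

```python
def estimate_memory_mb(users: int, per_session_mb: float,
                       app_per_session_mb: float, os_mb: float = 10.0) -> float:
    """Total memory in MB: OS + UniData sessions + application, plus 10 percent."""
    subtotal = os_mb + users * per_session_mb + users * app_per_session_mb
    return subtotal + subtotal / 10  # 10 percent extra for the file system buffer


if __name__ == "__main__":
    print(estimate_memory_mb(50, 2, 1))  # 50 users without RFS, 2 MB per session
    print(estimate_memory_mb(50, 4, 1))  # 50 users with RFS, 4 MB per session
```

With RFS, the per-session figure can fall anywhere in the 3-5 MB range given above, so it is worth evaluating both ends of the range for your user count.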

Recommended Knowledge Base

RFS is a system that takes advantage of the complexities of the UNIX operating system. Rocket Software recommends that you have the following knowledge before using RFS:

• Expertise with the UNIX operating system.
• Experience with UniData in general, and UniData daemons in particular.
• Experience with administration concepts such as backup and restore.
• Knowledge of how Transaction Processing and RFS work together.


Overview of RFS

Hardware and software failures, power loss, fire, or natural disaster can disrupt processing by causing loss of data consistency or loss of entire data files. The Recoverable File System (RFS) includes before image logging, after image logging, archiving, and failure recovery. You can enable these functions to protect recoverable files from loss of data due to system crashes and media failures. UniData allows you to create files as recoverable, or convert existing nonrecoverable files to recoverable files.

ACID Qualities

UniData provides the ACID properties (atomicity, consistency, isolation, and durability). The features of UniData also enable you to develop applications that provide the ACID properties in full. You will get the best results by implementing RFS and using Transaction Processing as well.

Note: If you are using Transaction Processing, a transactionally consistent database is one that reflects the most recent committed transaction. If you are not using Transaction Processing, a transactionally consistent database reflects the last update.

The ACID qualities are:

• Atomicity
• Consistency
• Isolation
• Durability

Atomicity

Logical operations grouped by transaction semantics will be treated as one unit. They will all succeed or all fail.

Consistency

The components of a logical operation will all succeed or all fail. If a database is in a consistent state before you apply a logical operation, it will be in a consistent state afterwards. For example, writing a new record will guarantee that the record and the indexes are both written. The record will not be written without its associated indexes.

Isolation

Isolation means that operations are based on a consistent database, rather than on intermediate results from other operations. Isolation is controlled within UniBasic applications. If locks are properly set and checked, all transactions will be properly isolated. In UniData SQL, a desired isolation level is set by specifying the isolation level with the SET TRANSACTION command. If no isolation level is specified, a default isolation level is handled by the database engine.

Durability

Durability means that completed transactions are preserved in the database despite failures. If the system stops running, on restart the database recovers to the last committed transaction. Recovery techniques are available to provide durability in the case of media failure.

System Failures

Some failures interrupt processing between the time a piece of data is entered into a system and the time it is recorded in the database. For these failures (called system crashes in this document), before image and after image logging protects data by recording information about changes to your recoverable files. If your system crashes, UniData uses the before image logs to restore your recoverable files to the state they were in at the last completed checkpoint. Then UniData applies the after image logs to restore up to the last completed update.

Note: If you enable Transaction Processing, UniData applies after image logs through the last complete transaction. UniData will not apply a partial transaction to your database.


Media Failures

Other failures cause a file or files to become unreadable unless restored from a backup. For these failures (called media crashes in this document), the archiving feature protects recoverable files by maintaining a complete record of updates since your last backup. You can restore your recoverable files from backup and then use crash recovery and the mediarec utility to apply archives, bringing your database to the state it was in when the last archive was written.

Note: You can enable before and after image logging without enabling archiving. However, Rocket Software recommends you enable both to provide the best protection for your recoverable data files.

Enabling RFS

You enable the logging function of RFS by installing UniData including RFS and then creating recoverable files. You enable archiving by creating archive files and an archive configuration table, and by setting parameters in the udtconfig file.

RFS Architecture

The following illustration shows the architecture of RFS:

[Figure: UniData Process Architecture. Labeled elements: SM, cleanupd, SBCS, SMM, CM, TM, udt, BIMp, AIMp, ARCH; System Buffer, local Buffer, BIM Buffer, AIM Buffer; RFS Files, non-RFS Files, BIM Log, AIM Log, Archive Logs. Legend: RAM, Process, Disk Files.]

The remainder of this document explains RFS, including these UniData processes:

• SM: system monitor
• CM: checkpoint manager
• TM: transaction manager
• BIMp: before image log process
• AIMp: after image log process


RFS Components

There are five basic elements of RFS:

• Logging
• Archiving
• Creating and Converting Files
• Failure Recovery
• Monitoring and Tuning

These components are described in detail in the following chapters. The remaining sections of this chapter provide an overview of the components.

Logging

Before and after image logging protects data by storing information about updates to your recoverable files. If your system crashes, UniData uses before image logs to restore your recoverable files to the state they were in at the last completed checkpoint, and applies after image logs to restore through the last completed update before the crash.

Log Files

The types of log files are:

• Before image log files
• After image log files
• File-level log file

Before Image Log Files

When you update database files that you define as recoverable, UniData first writes a copy of the unaltered file blocks to a before image log file. If your system crashes, UniData can restore the database by first reading the before image log files and applying them to the recoverable files. Then the Recoverable File System updates the files with information from the after image log files.

After Image Log Files

When you update database files that you define as recoverable, UniData does not write the changes directly to your database files. Instead, UniData records the changes in an after image log file and to the system buffer. Periodically, UniData flushes the system buffer pages to update your database. If your system crashes, UniData can recover your files to a state that existed before the crash by first reading the before image log files and applying them to the recoverable files, then reading the after image log files and writing the changes recorded in them back to the recoverable files.
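To make the interplay of the two log types concrete, here is a deliberately simplified Python model. It is a conceptual sketch only: the class, its in-memory dictionaries, and the block identifiers are all hypothetical and do not reflect UniData's actual log formats or daemons.

```python
class ToyRecoverableStore:
    """Toy model of before/after image logging; not UniData's actual design."""

    def __init__(self, blocks):
        self.disk = dict(blocks)      # blocks as stored in the database file
        self.buffer = {}              # updated blocks not yet flushed to disk
        self.before_log = {}          # block id -> contents at the last checkpoint
        self.after_log = []           # (block id, new contents), in update order

    def update(self, block_id, value):
        # Save the unaltered block once per checkpoint interval, then log the change.
        self.before_log.setdefault(block_id, self.disk.get(block_id))
        self.after_log.append((block_id, value))
        self.buffer[block_id] = value

    def flush(self, block_ids=None):
        # Write some or all buffered blocks to disk; a partial flush can precede a crash.
        for block_id in list(self.buffer if block_ids is None else block_ids):
            self.disk[block_id] = self.buffer.pop(block_id)

    def checkpoint(self):
        # After a full flush the disk is consistent, so both logs start over.
        self.flush()
        self.before_log.clear()
        self.after_log.clear()

    def crash(self):
        self.buffer.clear()           # memory contents are lost

    def recover(self):
        # Step 1: before images return the disk to the last checkpoint state.
        for block_id, old in self.before_log.items():
            if old is None:
                self.disk.pop(block_id, None)
            else:
                self.disk[block_id] = old
        # Step 2: after images redo every logged update, in order.
        for block_id, new in self.after_log:
            self.disk[block_id] = new
```

In this model, a crash after only some buffered blocks have reached disk leaves the file in a mixed state; recover() first rolls every touched block back to its checkpoint contents and then replays the logged updates, which is the two-phase sequence the text describes.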


File-Level Log File

A file-level log file stores a record of operations that affect an entire file rather than affecting the contents of a file. Commands that produce entries in a file-level log file include CREATE.FILE, DELETE.FILE, CNAME, CREATE.INDEX, DELETE.INDEX, BUILD.INDEX, ENABLE.INDEX, and DISABLE.INDEX. During crash recovery, UniData uses the file-level log file to recover certain of these actions and to prompt you to redo the ones it cannot restore.

Log Configuration Table

The log configuration table acts as an index to the log files associated with the Recoverable File System. UniData uses the information from the configuration table to create the log files.

Note: If you include RFS when you install UniData, UniData automatically creates log files using a default log configuration table. The purpose of this default table is to get your system up and running. You need to change this table to match the complexities of your system. When you modify the log configuration table, you may need to change the values of N_BIMG and N_AIMG in /usr/ud72/include/udtconfig, depending upon the number of log files you have defined. Use a UNIX file-viewing command to view the values of N_BIMG and N_AIMG. You also need to execute the cntl_install command to reinitialize values and create log files. For more information about RFS configuration parameters, see Chapter 8, “RFS Configuration Parameters.”
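If you want to inspect these parameters from a script, a minimal parser might look like the sketch below. It assumes that udtconfig holds one NAME=value pair per line with '#' comment lines, which this chapter does not spell out; the sample values are illustrative and are not defaults taken from the manual.

```python
def read_udtconfig(text: str) -> dict:
    """Parse NAME=value lines; blank lines and '#' comments are skipped."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


SAMPLE = """\
# illustrative values only, not defaults from the manual
N_BIMG=2
N_AIMG=2
LOG_OVRFLO=/example/log/log_overflow_dir
"""

if __name__ == "__main__":
    config = read_udtconfig(SAMPLE)
    print(config["N_BIMG"], config["N_AIMG"])
```

To read a live system, pass the contents of /usr/ud72/include/udtconfig to read_udtconfig instead of the sample string.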

Log File Overflow

UniData allows log files to overflow to a path defined by the LOG_OVRFLO parameter in /usr/ud72/include/udtconfig. This parameter lets you define a path for overflow from log files. Be sure you specify a path that has disk space available. For best results, specify a path that is on a different physical device from your recoverable files. If you install UniData with the Recoverable File System option, the LOG_OVRFLO is automatically set in /usr/ud72/include/udtconfig. The default path is udthome/log/log_overflow_dir.

Warning: If you have not defined LOG_OVRFLO or the path defined does not exist and your log files overflow unexpectedly, UniData shuts down to protect its integrity. A correlation exists between the size of your transaction and the overflow behavior of log files. Because it is virtually impossible to determine when log files might overflow, you should allow for the possibility.

For step-by-step information on logging, see Chapter 3, “Configuration Steps for Logging.”


Archiving

With archiving activated, UniData writes your recoverable database updates to archive files as well as to after image log files. When you turn on archiving, you define a group of archive files. UniData copies after image log sets to the first archive file you define. When that file fills, UniData moves to the second file, and so on. As each file fills, UniData assigns it a logical sequence number (LSN). The mediarec utility depends on the sequence numbers when identifying archive files for recovery.
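The fill-and-advance behavior described above can be pictured with a small toy model. Everything in this sketch is an assumption made for illustration: the class name, the equal capacity per archive file, and the LSN numbering starting at 1 are not taken from UniData.

```python
class ToyArchiveSet:
    """Toy model: fill archive files in order and stamp each full one with an LSN."""

    def __init__(self, names, capacity, first_lsn=1):
        self.names = list(names)      # archive files in the order they were defined
        self.capacity = capacity      # log sets each archive file holds (assumed equal)
        self.current = 0              # index of the archive file being written
        self.used = 0                 # log sets already copied into the current file
        self.next_lsn = first_lsn     # assumed numbering scheme
        self.full = []                # (lsn, name) for archives awaiting backup

    def copy_log_set(self):
        if self.current >= len(self.names):
            raise RuntimeError("all archive files are full; back them up first")
        self.used += 1
        if self.used == self.capacity:
            # The file just filled: record its LSN and move to the next file.
            self.full.append((self.next_lsn, self.names[self.current]))
            self.next_lsn += 1
            self.current += 1
            self.used = 0
```

The error raised when every file is full stands in for the real system's behavior of waiting until the archives have been copied to reliable storage.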

Archive Files

Archive files store a chronological record of changes to your recoverable files. UniData copies each after image log set to an archive file before overwriting the log set. After image log files help you recover from a system crash, and archive files help you recover from a media crash.

If you have a media problem—a disk crash or a bad tape, for instance—you can restore the database files that you define as recoverable to a state that existed before the crash by performing crash recovery through startud, restoring the files from backup, then applying the updates recorded in the archive file.

Because the purpose and function of archive files are different from those of log files, archive files need to be much larger than log files. Log files are normally overwritten several times in an hour, whereas archive files need to hold a large number of changes.

Archive files periodically fill up. When this happens, UniData waits until the archive files are copied to reliable storage before allowing any further processing. You can copy the archives manually, when the system prompts you, or you can turn on automatic backup of archives. The archiving system alerts you to move archive files to storage; it also tells you how to label the files.

Each archive file must be at least as big as one full set of logs.

Automatic Backup of Archive Files

UniData provides an optional backup utility that can be automatically invoked when an archive file on disk is full. This backup utility allows users to back up the files to an offline storage on any system of their choice.

Synchronizing Backups with Archive Files

It is important to keep your archives synchronized with your backups, so your current archive set starts as of the most recent backup.

If you invoke dbpause prior to performing your backup, the dbpause process forces the after image logs to be written to the archive files. The time at which the after image logs were flushed, along with the next logical sequence number (LSN) available, are written to the sm.log and displayed on the screen. This information is necessary if you need to perform mediarec using the archive files created after the backup commenced. You do not need to execute cntl_install if you use dbpause prior to starting your backup.

If you stop UniData before you perform backups, execute the cntl_install command after you have completed and verified your backup. cntl_install “clears out” the archive files and reinitializes counters so that when you start UniData you will begin writing to your first archive file.

Archive Configuration Table

The archive configuration table acts as an index to the archive files that UniData uses to recover transaction files after a media crash. (Archive files help recover from media crashes; log files help recover from system crashes.)

Note: UniData does not create archive files or an archive configuration table when you install UniData. If you want to enable archiving, you need to create an archive configuration table and change the value of ARCH_FLAG in the udtconfig file. You may also need to change the value of N_ARCH, depending upon the number of archive files you specify in the archconfig file.

For step-by-step information on archiving, see Chapter 4, “Configuration Steps for Archiving.”


Creating and Converting Recoverable Files

The protections UniData provides with the Recoverable File System work only on files you define as recoverable. You can create new files as recoverable files with the ECL CREATE.FILE command, or convert nonrecoverable files to recoverable with the UniData UNIX udfile command.

Warning: If you enable Transaction Processing, do not mix writes to recoverable files and nonrecoverable files within transactions. If you mix them and your system crashes, you will not be able to restore your data to a consistent state.

Creating a Recoverable File

The ECL CREATE.FILE command with the RECOVERABLE keyword creates a file for use with the Recoverable File System.

Syntax:

CREATE.FILE [DICT | DATA] [DIR | MULTIFILE | MULTIDIR] filename [,subfile] [modulo [,block.size.multiplier]] [TYPE hashtype] [DYNAMIC [KEYONLY | KEYDATA] [PARTTBL part_tbl]] [RECOVERABLE]

You can define the following types of files as recoverable:

• DATA
• DICT
• MULTIFILE
• DYNAMIC

Note: The Recoverable File System does not protect DIR and MULTIDIR type files, or sequentially hashed files. You cannot specify the DIR or MULTIDIR options with the RECOVERABLE option. For more information on the ECL CREATE.FILE command, see the UniData Commands Reference.
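For sites that generate ECL commands from scripts, the restriction above can be enforced before a command is ever issued. The helper below is a hypothetical sketch that covers only a few of the options in the syntax line (file kind, modulo, DYNAMIC, RECOVERABLE); the file name used in the example is made up.

```python
def create_file_command(filename, modulo=None, file_kind=None,
                        dynamic=False, recoverable=True):
    """Compose an ECL CREATE.FILE command line from the syntax shown above."""
    if file_kind not in (None, "DIR", "MULTIFILE", "MULTIDIR"):
        raise ValueError(f"unknown file kind: {file_kind}")
    if recoverable and file_kind in ("DIR", "MULTIDIR"):
        raise ValueError("DIR and MULTIDIR files cannot be RECOVERABLE")
    parts = ["CREATE.FILE"]
    if file_kind:
        parts.append(file_kind)
    parts.append(filename)
    if modulo is not None:
        parts.append(str(modulo))
    if dynamic:
        parts.append("DYNAMIC")
    if recoverable:
        parts.append("RECOVERABLE")
    return " ".join(parts)


if __name__ == "__main__":
    print(create_file_command("ORDERS", modulo=101))
```

The helper only builds a string; it does not run ECL, and it does not validate the modulo or any option it leaves out.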

Converting to a Recoverable File

To convert an existing file to a recoverable file, use the udfile command.

Syntax:

udfile [-r | -s] filename

The UniData system-level udfile command converts a nonrecoverable file to a recoverable file, a recoverable file to a nonrecoverable file, or displays whether the file is recoverable or nonrecoverable. You must have root privileges to change a file type by using this command. If you do not specify an option, UniData returns the type of file (recoverable or nonrecoverable UniData file). You do not need root privileges if you only want to display the type of file.

Note: The udfile command will not convert files that were created in one-half kilobyte (K) blocks. If you attempt to do so, UniData generates an error message indicating that the file cannot be converted to a recoverable file. You must resize the file to at least a 1K block size using the ECL RESIZE command or the UniData system-level memresize command. If the file is a dynamic file, use the memresize command to resize the file using a block size of at least 1K. For information on ECL commands, see the UniData Commands Reference.

The following table describes the valid udfile parameters.

Parameter   Description
[-r]        Converts a nonrecoverable file to a recoverable file.
[-s]        Converts a recoverable file to a nonrecoverable file.

udfile Options
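When conversions are scripted, it can help to build the udfile argument list in one place. The following sketch uses only the options documented above; the function name and the action labels are hypothetical.

```python
def udfile_argv(filename, action="show"):
    """Build the argument list for udfile: -r, -s, or no option to display the type."""
    options = {"to_recoverable": ["-r"], "to_nonrecoverable": ["-s"], "show": []}
    if action not in options:
        raise ValueError(f"unknown action: {action}")
    return ["udfile", *options[action], filename]


if __name__ == "__main__":
    print(udfile_argv("ORDERS", "to_recoverable"))
```

The resulting list could be handed to subprocess.run by a root user; this sketch deliberately stops short of executing anything.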

Warning: You cannot convert files with this command while the Recoverable File System is running. For best results, select the files you want to convert. Next, stop UniData and back up the files for safekeeping. Execute udfile on each file you are converting. Then perform a full backup, run cntl_install to initialize your logs and archives, and start UniData. Whenever possible, use UniData commands to operate against recoverable files. Do not use host UNIX commands on recoverable files while UniData is running. If you use host UNIX commands while UniData is running, your system may crash.

For step-by-step information about creating and converting recoverable files, see Chapter 5, “Creating and Configuring Recoverable Files.”


Crash Recovery

When your system crashes, some of your database files may not contain a complete record of recent updates. This happens if processing is interrupted between the time information is entered into your system and the time updated records are written to disk. By using the logging component of the Recoverable File System, you can return files you have defined as recoverable to the last transactionally consistent state that existed before the crash.

Note: If you are using transaction processing, “last transactionally consistent state” means the last committed transaction. If you are not using transaction processing, “last transactionally consistent state” means the last update.

When you restart UniData after a system crash, the system monitor (sm) detects that a crash occurred. UniData performs the following steps automatically:

1. Identifies the log set that was active at the time of the crash.
2. Applies all before image blocks in the current before image log.
3. Reviews current after image log set and applies changes as appropriate. If you are using Transaction Processing semantics, UniData applies only the changes that are part of a completed transaction.
4. Reviews the file-level log, and performs those operations that can be recovered automatically. UniData writes a message to the sm.log directing you to the location of a file called FileInfo, which lists those file-level operations that cannot be recovered automatically.
5. Writes the after image log files to the archive files.
6. If you have corrupted files in your database that were not defined as recoverable, you will have to use other methods to restore and recover those files. See the UniData Commands Reference for references to the fixgroup and dumpgroup commands.
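Step 3 above, where only changes from completed transactions are applied, can be sketched as a simple filter over the after image log. The record layout (transaction id, block id, value) and the function name are hypothetical and chosen only to illustrate the idea.

```python
def replay_after_images(disk, after_log, committed):
    """Apply after images in order, skipping updates from uncommitted transactions.

    Each log entry is (txn_id, block_id, value); txn_id None marks an update
    made outside Transaction Processing, which is always applied.
    """
    for txn_id, block_id, value in after_log:
        if txn_id is None or txn_id in committed:
            disk[block_id] = value
    return disk


if __name__ == "__main__":
    log = [(1, "a", 10), (2, "b", 20), (None, "a", 11), (2, "a", 99)]
    print(replay_after_images({"a": 1, "b": 2}, log, committed={1}))
```

Here transaction 2 never committed, so neither of its updates reaches the recovered data, while the committed and non-transactional updates are applied in log order.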

Recovering from a Media Crash

Successful recovery from a media crash includes these steps:

1. Performing crash recovery and flushing the current after image logs to the archive files by executing startud.
2. Restoring your database files from your most recent backup, usually from tape to disk.
3. Modifying the restored database files by applying the archives created since the backup.

Tip: It is important to keep your archives synchronized with your backups, so that your current archive set starts as of the most recent backup. See “Synchronizing Backups with Archive Files” on page 1-16.

For step-by-step information about recovering from a media failure, see Chapter 6, “System Crash Recovery.”


Monitoring and Tuning RFS

The configuration and use of RFS depend on your hardware and software environments, and the type of software you are running. RFS includes a utility called sysmon to help you determine how RFS is working on your system.

For detailed information about sysmon, descriptions of the sysmon fields, and performance tips, see Chapter 9, “Monitoring and Tuning.”

The following is an example of sysmon working with RFS.

Chapter 2: RFS Commands and Daemons

cntl_install Command
Creating a Recoverable File
Disabling a Recoverable File
forcecp Command
mediarec Command
startud Command
    Example
udfile Command
Daemons for the Recoverable File System
    sm
    tm
    bimglog
    aimglog
    archive
    ar_backupd
    sync

This chapter contains an alphabetic list of commands associated with the Recoverable File System (RFS). This chapter also describes the daemons associated with RFS. The commands featured in this chapter are also discussed in detail in relevant sections of this manual.

cntl_install Command

This UniData system-level command initializes log files and archive files, and reinitializes the udt.control.file, the system.status file, the restart.fileend file, and the restart.newblk file, all located in /usr/ud72/include.

Syntax:

cntl_install [-forcerestart]

You must log on as root to execute cntl_install, and UniData cannot be running. You should run it only after you have stopped UniData and created and verified a full backup to keep your backups and logs synchronized. Do not run cntl_install if you used dbpause prior to creating your backup. If you choose to install UniData with the Recoverable File System, the installation process automatically runs cntl_install.

Do not use cntl_install immediately after a crash. cntl_install overwrites the log and archive files, which prevents you from recovering from the crash. Use it only after all required recovery is complete.

Note: Relative paths are not allowed in the logconfig or archconfig files. If cntl_install detects a relative path, it fails and does not initialize the logs and archives. UniData displays an error message indicating that relative paths are not allowed. If this occurs, you may simply edit the configuration files and reexecute cntl_install. For more information on logging, see Chapter 3, “Configuration Steps for Logging.” For more information on archiving, see Chapter 4, “Configuration Steps for Archiving.”
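Before rerunning cntl_install, it can help to scan the configuration file for relative paths yourself. The following Python sketch is illustrative only and is not part of UniData. It assumes that each line of logconfig begins with the log file path followed by whitespace-separated fields, as described in Chapter 3; the function name and sample text are hypothetical.

```python
import posixpath

def relative_log_paths(logconfig_text):
    """Return the file names in logconfig-style text that are not absolute paths.

    Assumption: the first whitespace-separated field on each line is the path.
    """
    offenders = []
    for line in logconfig_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if not posixpath.isabs(fields[0]):
            offenders.append(fields[0])
    return offenders

# Hypothetical two-line configuration; the second entry uses a relative path.
sample = "/ud6/log/a_0000\t021\t4096\t0\t1024\nlog/b_0000\t022\t4096\t0\t1024\n"
print(relative_log_paths(sample))  # ['log/b_0000']
```

An empty result suggests the paths are absolute, but cntl_install remains the authority on whether the file is acceptable.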


Parameter

The following table describes the parameter of the syntax:

Parameter Description

-forcerestart Prompts you to confirm that you want to continue restarting UniData, and attempts to open the system.status file ($UDTHOME\include\system.status on Windows platforms or /usr/udnn/system.status on UNIX platforms). If UniData cannot open this file, it tries to create a new one.

If the status in the system.status file reports that the system is already in system recovery mode, UniData returns a message similar to “System is already in crash recovery status (status). You might want to remove (/usr/ud72/include/system.status) and rerun cntl_install -forcerestart.”

If the status in the system.status file reports an unrecognized code, UniData returns a message similar to “System is in unknown status (status), will be forced to recovery mode.”

cntl_install Parameter

udt.control.file

The location and name of the control file is /usr/ud72/include/udt.control.file. When you install UniData, the system automatically creates this binary data file, which contains information including locations for log and (if archiving is turned on) archive files. The udt.control.file also tracks the logical sequence number UniData assigns to archive files as they fill. If you stop UniData to perform a full backup, you should execute cntl_install to reinitialize the udt.control.file. If you use dbpause prior to performing a full backup, you do not need to reinitialize the udt.control.file by executing cntl_install.

Warning: The udt.control.file is vital to successful operation of RFS. Do not delete this file or edit it.

system.status File

The system.status file is located in /usr/ud72/include. This file registers the current status of the Recoverable File System and cannot be removed. If this file is deleted, UniData continues to run, but you cannot start UniData if it stops for any reason until you either restore the system.status file from tape or execute cntl_install to re-create the file. In either case, you can only recover to the last completed archive.

restart.newblk File

The restart.newblk file is located in /usr/ud72/include. This file is used internally by UniData when performing crash recovery, and cannot be removed. If this file is deleted, crash recovery will fail when you start UniData.

restart.fileend File

The restart.fileend file is also located in /usr/ud72/include. This file is used internally by UniData when performing crash recovery, and cannot be removed. If this file is deleted, crash recovery fails when you start UniData.


Creating a Recoverable File

The ECL CREATE.FILE command with the RECOVERABLE keyword creates a file for use with the Recoverable File System.

You can define the following types of files as recoverable:

• DATA
• DICT
• MULTIFILE
• DYNAMIC

Syntax:

CREATE.FILE [DICT | DATA] [DIR | MULTIFILE | MULTIDIR] filename [,subfile] [modulo [,block.size.multiplier]] [TYPE hashtype] [DYNAMIC [KEYONLY | KEYDATA] [PARTTBL part_tbl]] [RECOVERABLE] [OVERFLOW]

Note: The Recoverable File System does not protect DIR and MULTIDIR type files, or sequentially hashed files. You cannot specify the DIR or MULTIDIR option with the RECOVERABLE option. For more information on the ECL CREATE.FILE command, see the UniData Commands Reference.

Warning: If you saved an account containing recoverable files using the ACCT_SAVE command, the acctrestore command, which restores accounts that have been saved with the ACCT_SAVE command, will not restore those files as recoverable. You must execute udfile against the files you want to be recoverable once they are restored.

In the following example, the CREATE.FILE command creates the recoverable file MASTER with a modulo of 4. Since 4 is not a prime number, UniData changes the modulo to the next highest prime number, 5.

    :CREATE.FILE MASTER 4 RECOVERABLE
    4 is not a prime number, modulo changed to 5.
    Create file D_MASTER, modulo/1,blocksize/1024
    Hash type = 0
    Create file MASTER, modulo/5,blocksize/1024
    Hash type = 0
    Added "@ID", the default record for UniData to DICT MASTER

Disabling a Recoverable File

Use the DISABLE.RFS.FILE command to turn off the RFS flag in a recoverable file while UniData is running.

To make the file recoverable again, you must issue the udfile command with UniData shut down. For more information, see “udfile Command” on page 14.

Warning: Any updates made to the file after executing this command will not be recovered should you experience a system crash.

Syntax

DISABLE.RFS.FILE [DICT | DATA] filename [,subfile] [FORCE]

Parameters

The following table describes each parameter of the syntax:

Parameter Description

DICT Specifies only the DICT portion of the file. If you do not specify DICT or DATA, UniData acts on both the dict and data portions of the file.

DATA Specifies only the DATA portion of the file. If you do not specify DICT or DATA, UniData acts on both the dict and data portions of the file.

filename [,subfile] The name of the file (and, optionally, the subfile) for which you want to turn off the RFS flag.

FORCE Forces UniData to turn off the RFS flag without prompting for confirmation.

DISABLE.RFS.FILE Parameters


forcecp Command

The system-level forcecp command forces a checkpoint, which registers the state of your system at a given point in time.

You can execute forcecp directly from the udtbin directory.

Syntax:

forcecp

In the following example, the forcecp command forces a checkpoint:

    # forcecp
    CheckPoint time before ForceCP: Tue Apr 29 10:00:39 2004
    .CheckPoint time after ForceCP: Tue Apr 29 10:58:09 2004
    . has been forced successfully.
    #

mediarec Command

The mediarec command restores changes to your recoverable files by applying archives since the last backup.

Syntax:

mediarec [-s [MM:DD:YY:]HH:MM[:SS]] [-e [MM:DD:YY:]HH:MM[:SS]] [-f path/filename] [-T start_LSN[,end_LSN]]

The following table describes each parameter of the syntax.

Parameter Description

[-s] Specifies the recovery start time. If you do not use the -s option, the whole archive set (from the last backup to current) is recovered.

[-e] Specifies the recovery end time. If you do not use the -e option, the whole archive set (from the last backup to current) is recovered.

[-f] Specifies a file that contains a list of files (one path and file name per line) to recover. If you do not use the -f option, mediarec recovers all files.

[-T] Specifies the starting LSN and the ending LSN for media recovery. If you only specify the starting LSN, mediarec will prompt for the next sequential LSN.

mediarec Parameters
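When you script a recovery window, the colon-separated time arguments are easy to mistype. The following Python sketch shows one way to build them from a date and time. It is not part of UniData, and it assumes that the optional date portion of both -s and -e is month, day, and two-digit year (MM:DD:YY), as shown for the -e option, and that zero-padded fields are accepted. The helper name mediarec_time is hypothetical.

```python
from datetime import datetime

def mediarec_time(moment):
    """Format a datetime as MM:DD:YY:HH:MM:SS (assumed mediarec time layout)."""
    return moment.strftime("%m:%d:%y:%H:%M:%S")

# Illustrative recovery window only.
start = datetime(2004, 4, 29, 10, 0, 39)
end = datetime(2004, 4, 29, 10, 58, 9)
print("mediarec -s", mediarec_time(start), "-e", mediarec_time(end))
# mediarec -s 04:29:04:10:00:39 -e 04:29:04:10:58:09
```

The sketch only prints the command line; review it before running mediarec as described in this chapter.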


In the following example, the mediarec command restores a database:

#mediarec

Using UDTBIN=/usr/ud72/bin

For media recovery, you would be required to have space for two temporary files, one to hold the largest archive file and another to hold the largest CP size. Please note the following , read documentation about media recovery procedure and re-start media recovery.

Max CP Size (in bytes): 54272 Max Arch File Size (in bytes): 4218880

Also, if you're planning to use the tape(s) created by archive process, please setup restore /usr/ud60/include/arch_restore properly (tape device) and load the first archive tape.

Do you want to continue?(y/n)[n]

All output and error logs have been saved to /usr/ud72/bin/saved_logs directory.

SMM is started. Starting media recovery... Please .

For media recovery, you'll be asked to upload archive files one by one by sequence number into the /usr/ARCH file.

de_arch: reading archive file on disk

The file TEST may have been deleted at OS level
If you choose to not re-create this file now, the Media Recovery will be aborted to keep the system transaction consistent.
Would you like it re-created? (y/n) [y]y
Deleting file D_TEST.
Deleting file TEST.
Create file D_TEST, modulo/1,blocksize/1024
Hash type = 0
Create dynamic file TEST, modulo/5,blocksize/1024
Hash type = 1
Added "@ID", the default record for UniData to DICT TEST.
....

Please check /usr/ud72/FileInfo for un-recovered file level operations.

*****!!! Media Recovery Finished!!!*****

SM stopped successfully.
SMM stopped successfully.
Media Recovery finished.

Please use /usr/ud72/bin/startud to start the system


startud Command

The UniData system-level command startud starts the UniData background daemons, including the sm daemon, and reinitializes udtbin/sm.log. If SB_FLAG is set to 1 in udtconfig, startud also starts the Recoverable File System and, if a system crash has occurred, automatically recovers the files you defined as recoverable.

Note: The last 20 sm.logs are appended in the sm.log file located in udtbin/saved_logs.

Syntax:

startud [-i] [-m]

The following table describes each parameter of the syntax.

Parameter Description

none Starts all the UniData processes in the correct order, checks to see if a system crash occurred, and automatically performs crash recovery if it is needed.

[-i] For this startup only, bypasses the automated crash recovery sequence.

[-m] Executes the ECL command mediarec to restore archived changes made since the last backup. See the mediarec command in this manual for more information on this command.

startud Options

Warning: IBM recommends that you not use the -i option with startud unless IBM Technical Support instructs you to do so.

Tip: You will not see any message on the screen if crash recovery is complete. Check udtbin/sm.log for information, especially if you notice anything unusual about the startup.

You should not have startud in your boot startup script because you will not be able to control crash recovery operations.

Example

In the following example, the startud command starts UniData and the Recoverable File System:

# startud

Using UDTBIN=/liz1/ud72/bin

All output and error logs have been saved to /liz1/ud72/bin/saved_logs directory.

SMM is started.
SBCS is started.
SM is started.
RM is started.
CLEANUPD is started.
Unirpcd has already been started

UniData R7.2 has been started.

#


udfile Command

The UniData system-level udfile command converts a nonrecoverable file to a recoverable file, or a recoverable file to a nonrecoverable file. If you do not specify an option, UniData returns the type of the file (recoverable or nonrecoverable UniData file). Any user can use this command to display the type of a file. If you specify an option for converting, you must log in as root.

Note: The udfile command does not convert files created in one-half kilobyte blocks. If you attempt to do so, UniData will generate an error message indicating that the file cannot be converted to a recoverable file. You must resize the file to at least a 1K block size using the UniData system-level memresize command. For information on ECL commands, see the UniData Commands Reference.

Warning: You cannot execute this command to convert a file while the Recoverable File System is running. Make sure you have a verified backup of the file before you convert it. For best results, back up files, convert them, then perform a full backup and run cntl_install before letting users log on to the system.

Syntax:

udfile [-r | -s] filename

The following table describes each parameter of the syntax.

Parameter Description

[-r] Converts a nonrecoverable file to a recoverable file.

[-s] Converts a recoverable file to a nonrecoverable file.

udfile Parameters

In the following example, the udfile command converts the INVENTORY file to a recoverable file:

    # udfile -r INVENTORY
    Non-recoverable file 'INVENTORY' is changed to recoverable file.
    #

Daemons for the Recoverable File System

sm

The startud command invokes the sm (system monitor) daemon. sm checks the integrity of your UniData installation—in other words, sm examines log files upon startup for evidence of a crash, then starts the appropriate processes as needed to restore integrity. sm also monitors processing for certain error conditions (for example, loss or unavailability of the disk containing before or after image logs) and stops UniData in as controlled a manner as possible if these conditions are detected. sm is the parent of the processes described in the following table.

Process Description

restart Crash recovery process. When you start UniData, sm checks for a system crash. If it detects one, sm starts this process to determine the current log set and apply before and after image logs to your recoverable files. mediarec also uses this process, with different parameters, to restore from archives.

cm Checkpoint manager. This process writes “dirty pages” (changed records) to disk, switches log sets between active and inactive status, and archives after image logs. It performs these steps according to a user-defined checkpoint interval, or as required by the system.

archive Archive process. This process writes after image log sets to archive files, marks log sets as archived and notifies the user if archives fill. You can save filled archives to tape automatically.

bimglog Before image log process. tm writes before images of blocks to the shared memory buffer for this process, and bimglog then writes the blocks to the before image log files on disk.

aimglog The after image log process. tm writes changes to records into the shared memory buffer for this process, and aimglog then writes the records to the after image log files on disk.

ar_backupd The archive backup daemon. Exists only if you turn on automated backup of archive files.

sync If you notice significant performance degradation during a checkpoint, you can start sync daemons, which periodically flush updated pages from the system buffer to the log files.

Processes Invoked by the sm Daemon


tm

Each udt session has an associated tm daemon, a transaction manager with root privileges. tm daemons access recoverable files through the system buffer, which is organized in 1K pages. tm daemons perform all user access to recoverable files and log processes. If you have enabled Transaction Processing, all transaction semantics (such as ABORT and COMMIT) are also handled by the tm. Each tm runs with root privileges, so users other than root cannot a tm. If a tm daemon dies, sm detects this and stops UniData to preserve the integrity of the database.

bimglog

The number of bimglog processes matches the number of before image log files per log set. This matches the value of the configuration parameter N_BIMG.

aimglog

The number of aimglog processes matches the number of after image log files per log set. This matches the value of the configuration parameter N_AIMG.

archive

This daemon only exists if you have archiving enabled. This process writes the after image log sets to the archive files, marks the after image log files as archived, and informs the user if the archives fill.

ar_backupd

This daemon only exists if you turn on automated backup for your archives. If you have automated backup turned on, startud starts ar_backupd. ar_backupd then runs continuously, invoking your archive backup script whenever an archive file fills.

sync

If you notice significant performance degradation during a checkpoint, you can start sync daemons by setting the udtconfig parameters N_SYNC and SYNC_TIME. Sync daemons periodically flush updated pages from the system buffer to the log files, reducing the amount of time it takes to complete a checkpoint.

N_SYNC determines the number of sync daemons UniData starts. SYNC_TIME defines, in seconds, the amount of time the sync daemons wait before scanning the system buffer for updated pages.

Chapter 3: Configuration Steps for Logging

How Logging Works
    About Log Files
    About Checkpoints
How to Turn On and Configure Logging

The first part of this chapter outlines how logging works and describes log files. The remainder of the chapter is a step-by-step procedure for you to use when setting up and configuring logging on your computer.


How Logging Works

Hardware and software failures, power loss, fire, or natural disaster can disrupt processing by causing loss of data files and making it difficult to restore a database to a consistent state. Some failures interrupt processing between the time a piece of data is entered into a system and the time it is recorded in the database. For these failures (called system crashes in this document), before and after image logging protects data by recording information about changes to your recoverable files. If your system crashes, UniData uses the before image logs to restore your recoverable files to the state they were in at the last completed checkpoint. Then UniData applies the after image logs to restore up to the last completed update.

Note: If you have enabled Transaction Processing, UniData applies after image logs through the last complete transaction. UniData will not write a partial transaction to your database.

You can enable before and after image logging without enabling archiving. However, IBM recommends that you enable both to provide the best protection for your recoverable data files.

About Log Files

The types of log files are:

• Before image log files
• After image log files
• File-level log files

Before Image Log Files

When you update database files that you define as recoverable, UniData first writes a copy of the unaltered file blocks to a before image log file. If your system crashes, UniData can restore the database by first reading the before image log files and writing them back into the recoverable files. The Recoverable File System then updates the files with information from the after image log files.

After Image Log Files

When you update database files that you define as recoverable, UniData does not write the changes directly to your database files. Instead, UniData records the changes in an after image log file and to the system buffer. Periodically, UniData flushes the system buffer pages to update your database. If your system crashes, UniData can recover your files to a state that existed before the crash by first reading the before image log files and writing them back into the recoverable files, then reading the after image log files and writing the changes recorded in them back into the recoverable files.

File-Level Log Files

A file-level log file stores a record of operations that affect an entire file rather than affecting the contents of a file. Commands that produce entries in a file-level log file include CREATE.FILE, DELETE.FILE, .FILE, CNAME, CREATE.INDEX, DELETE.INDEX, BUILD.INDEX, ENABLE.INDEX, and DISABLE.INDEX. During crash recovery, UniData uses the file-level log file to recover certain of these actions, and prompts you to redo the ones it cannot restore.

Crash recovery attempts to recover CLEAR.FILE and any completed file-level operations automatically, except for index operations. If a file-level operation is incomplete, UniData prints a message in the FileInfo file located in udthome. Media recovery attempts to recover all file-level operations except for index operations.

About Checkpoints

When a user-configurable checkpoint interval is reached (or when any log file reaches 80 percent full) the cm (checkpoint manager) process performs the following series of actions:

1. Checks the system buffer to see if any updates were performed since the last checkpoint. If so, continues with step 2. If not, waits for next checkpoint interval (or next time a log file reaches 80 percent full).
2. Blocks tm processes from initiating new transactions.
3. Sends messages to all aimglog and bimglog processes to flush all pages from their buffers to the log files on disk.
4. Switches log sets, defining the set just written to as inactive and activating the second set.
5. Marks all “dirty pages” in the system buffer. (A “dirty page” indicates that updates have been made but not yet copied to disk.)
6. Wakes up tm processes, allowing them to initiate new transactions.
7. If archiving is enabled, tells the archive process to save the after image logs.
8. Flushes “dirty pages” to disk, updating your database.

Note: Because tm processes are blocked from starting new transactions during a checkpoint, setting the checkpoint interval too short may impact system performance. You need to balance that impact against the fact that a longer checkpoint interval means more space required for log files.
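The condition that wakes the checkpoint manager can be expressed as a simple predicate. The following Python sketch models only the trigger described above (the interval is reached, or any log file reaches 80 percent full). It is an illustration, not UniData code, and the function and parameter names are hypothetical. It does not model the later check for updates in the system buffer or any of the steps that follow.

```python
def checkpoint_due(seconds_since_last, interval_seconds, log_fill_fractions):
    """Return True when a checkpoint would be triggered.

    log_fill_fractions holds one value per log file, from 0.0 (empty) to 1.0 (full).
    """
    interval_reached = seconds_since_last >= interval_seconds
    log_nearly_full = any(fraction >= 0.8 for fraction in log_fill_fractions)
    return interval_reached or log_nearly_full

print(checkpoint_due(30, 60, [0.10, 0.85]))  # True: one log is at least 80 percent full
print(checkpoint_due(30, 60, [0.10, 0.50]))  # False: neither condition is met
```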

How to Turn On and Configure Logging

Tip: If you need to improve system performance, consider locating your before image logs on a different disk than your after image logs. You may also want to create the log files as raw disk files, that is, files on a disk partition or device that is not mounted in the UNIX file system. The impact of using raw disk depends on your operating system; refer to your host operating system documentation for information about using raw disk.

1. Perform a Full Backup of Your System

Make sure your backup follows symbolic links for large dynamic files. Do not start UniData after the backup.

2. Decide How Many Log Files

You need two sets of log files and a file-level log. Each set should contain at least two before image logs and at least two after image logs.

Tip: Multiple before and after image logs in each set allow your system to perform simultaneous writes to logs for improved performance. If you expect RFS to be heavily loaded, consider having more than two before and/or after image logs per set.

3. Determine the Size of Log Files

There is not a set formula for calculating the size of your logs. As an initial estimate, multiply the number of records expected for update during one checkpoint interval by the largest record size. This will give you an approximation of the number of bytes needed for your logs. Divide this number by the block size you have chosen for your log files and you have an estimate of the number of blocks needed.
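As a worked example of this estimate, the following Python sketch multiplies the expected number of updated records by the largest record size and divides by the log block size, rounding up to a whole block. The workload figures (2000 records of at most 1500 bytes) are invented for illustration, and the function name is hypothetical. The result is only a starting point for tuning.

```python
def estimate_log_blocks(records_per_checkpoint, largest_record_bytes, block_size):
    """Estimate the number of log blocks needed for one checkpoint interval."""
    total_bytes = records_per_checkpoint * largest_record_bytes
    # Round up so that a partially filled block still counts as a whole block.
    return (total_bytes + block_size - 1) // block_size

# 2000 records * 1500 bytes = 3,000,000 bytes; with 4096-byte blocks:
print(estimate_log_blocks(2000, 1500, 4096))  # 733
```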

For optimal performance, IBM recommends that each of your before image and after image logs use the UNIX file system block size. To determine the UNIX file system block size, use the UNIX df command. If you cannot determine your UNIX file system block size, IBM recommends using 4096.

Note: IBM recommends you not use odd block sizes, such as 3K, 5K, 7K, or 9K in your logconfig file.


Warning: If the block size in your logconfig file exceeds 4096, you must also increase the AIMG_BUFSZ and BIMG_BUFSZ configuration parameters. These parameters must be a multiple of the block size defined in logconfig, and cannot exceed the log block size multiplied by the log length.
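The two constraints in this warning can be checked with simple arithmetic. The following Python sketch tests a proposed AIMG_BUFSZ or BIMG_BUFSZ value against a log block size and log length. It is illustrative only; the function name and the sample values are hypothetical, and it assumes that the buffer size and block size are both expressed in bytes.

```python
def buffer_size_ok(buffer_size, block_size, log_length):
    """Check a buffer size against the logconfig block size and log length.

    The buffer must be a positive multiple of the block size and must not
    exceed block_size * log_length.
    """
    is_multiple = buffer_size > 0 and buffer_size % block_size == 0
    within_limit = buffer_size <= block_size * log_length
    return is_multiple and within_limit

print(buffer_size_ok(16384, 8192, 1024))  # True: 2 blocks, well under the limit
print(buffer_size_ok(12000, 8192, 1024))  # False: not a multiple of 8192
```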

The size of the file-level log depends on the udtconfig parameter NUSERS. Check the udtconfig file to see the setting for this parameter, as shown in the following example:

    % /usr/ud72/include
    % NUSERS udtconfig
    NUSERS=38

Determine the size of the file-level log by adding one to the value of NUSERS. In the example shown, the file-level log size must be 39.

Remember to change the size of your file-level log and run cntl_install if you change the value of NUSERS.
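The following Python sketch derives the file-level log size from the text of a udtconfig file. It is illustrative only and assumes that parameters appear one per line in NAME=value form, as in the NUSERS=38 line shown above. The function name is hypothetical.

```python
def file_level_log_length(udtconfig_text):
    """Return NUSERS + 1, the required size of the file-level log."""
    for line in udtconfig_text.splitlines():
        name, separator, value = line.partition("=")
        if separator and name.strip() == "NUSERS":
            return int(value.strip()) + 1
    raise ValueError("NUSERS not found in udtconfig text")

print(file_level_log_length("NUSERS=38\n"))  # 39
```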

4. Determine Where to Locate Logs

Logs may be located on raw disk partitions or within UNIX file structures. You can put several logical log “files” into a single actual file. To do this, specify the same file name for each log, but specify different start blocks. Make sure that the logs do not overlap.

Note: You should place your logs on a different physical device from your recoverable data files. That way, a single disk failure will not affect both logs and data. To improve performance, you can also put the groups of before image and after image logs on different devices, but make sure you designate the same block size for all log files.
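When several logical logs share one actual file, you can verify that their block ranges do not overlap before you initialize the logs. The following Python sketch is one way to do that. It is not a UniData utility; the function name and the tuple layout (file name, start block, log length) are assumptions chosen for the example.

```python
def overlapping_logs(entries):
    """Find logical logs in the same file whose block ranges overlap.

    entries is a list of (filename, start_block, log_length) tuples. Each log
    occupies blocks start_block through start_block + log_length - 1.
    Returns (filename, first_start, second_start) for each clash between
    neighbouring ranges.
    """
    ranges_by_file = {}
    for filename, start, length in entries:
        ranges_by_file.setdefault(filename, []).append((start, start + length))
    clashes = []
    for filename, ranges in ranges_by_file.items():
        ranges.sort()
        for (start1, end1), (start2, _end2) in zip(ranges, ranges[1:]):
            if start2 < end1:
                clashes.append((filename, start1, start2))
    return clashes

good = [("/dev/rdsk/0s4", 1024, 1024), ("/dev/rdsk/0s4", 2048, 1024)]
bad = [("/dev/rdsk/0s4", 1024, 1024), ("/dev/rdsk/0s4", 1536, 1024)]
print(overlapping_logs(good))  # []
print(overlapping_logs(bad))   # [('/dev/rdsk/0s4', 1024, 1536)]
```

Because the ranges are sorted by start block, any overlap in a file shows up between at least one pair of neighbouring ranges, so an empty result means no overlap was found.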

5. Determine Where to Locate Log Overflow Files

The LOG_OVRFLO parameter, located in the udtconfig file, lets you define a directory (use the absolute path) for overflow from log files. For best results, pick a directory that:

• Has adequate disk space available.
• Is on a different physical device from your recoverable files.

UniData allows log files to overflow to a path defined by the LOG_OVRFLO parameter in /usr/ud72/include/udtconfig. This parameter lets you define a directory path for overflow from log files. Be sure you specify a path that has disk space available. For best results, specify a path that is on a different physical device from your recoverable files. If you install UniData with the Recoverable File System option, the LOG_OVRFLO parameter is automatically set in /usr/ud72/include/udtconfig. The default path is udthome/logs/log_overflow_dir.

Warning: If you have not defined LOG_OVRFLO or the path defined does not exist and your log files overflow unexpectedly, UniData shuts itself down to protect its integrity.

A correlation exists between the size of your transaction and the overflow behavior of log files. Since it is virtually impossible to determine when log files might overflow, you should allow for the possibility. IBM recommends that you locate the LOG_OVRFLO directory on a separate file system from your log files. Make sure that the path defined for LOG_OVRFLO is a directory and not a file. If the path is not a directory, UniData will shut down should your log files overflow.
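The conditions in this step (the path is defined, is absolute, exists, and is a directory rather than a file) can be checked before you start UniData. The following Python sketch is illustrative only and is not a UniData utility. The function name is hypothetical, and the temporary directory stands in for a real overflow directory.

```python
import os
import tempfile

def log_overflow_problem(path):
    """Return a description of the problem with a LOG_OVRFLO value, or None."""
    if not path:
        return "LOG_OVRFLO is not defined"
    if not os.path.isabs(path):
        return "path is not absolute"
    if not os.path.exists(path):
        return "path does not exist"
    if not os.path.isdir(path):
        return "path is a file, not a directory"
    return None

# Demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as overflow_dir:
    missing = os.path.join(overflow_dir, "missing")
    print(log_overflow_problem(overflow_dir))  # None
    print(log_overflow_problem(missing))       # path does not exist
```

The sketch does not check free disk space or whether the directory is on a separate device; those still need to be confirmed by hand.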

6. Create the Log Configuration Table

If you install UniData with the Recoverable File System option, UniData creates a logconfig file in the /usr/ud72/include directory. This logconfig file is a minimum configuration intended only to allow UniData to start with the Recoverable File System invoked. You must edit this file to define the proper path and size of your log files. If you did not install UniData with the Recoverable File System option, you must manually create the logconfig file before UniData will start with the Recoverable File System invoked.

Log in as root. Use vi or another UNIX text editor to create or edit the logconfig file. Each line represents a log file. The default path for this table is /usr/ud72/include/logconfig.


Each line in the logconfig file represents a log file. For each log file, you must define the five attributes described in the following table.

Field Description

Filename The full path and filename of the log file. You can use any file name.

Flag Select the appropriate flag:
    021 describes an after image log
    022 describes a before image log
    0120 describes a file-level log

Blocksize The block size of the log file. For a UNIX file, the block size should be a multiple of the file system block size. For a raw disk file, the block size should be a multiple of the disk sector size. IBM recommends a 4096-byte block size for both UNIX file system and raw disk logs. The block size cannot exceed 16384.

Start Block The start block offset in the log file.

Log Length The log file size as specified by the number of blocks in the log file.

Before and After Image Log File Fields

Use the following format for each line. Separate attributes with tabs:

filename flag block_size start_block log_length

Make entries in the log configuration table in this order:

1. First group of after image logs.
2. First group of before image logs.
3. Second group of after image logs.
4. Second group of before image logs.
5. File-level log file.
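The field layout and entry order described above can be checked mechanically. The following Python sketch parses logconfig-style text and confirms that the groups appear in the required order. It is an illustration rather than a UniData tool; the function names are hypothetical, and it splits on any whitespace so that it accepts tab-separated lines.

```python
FLAG_KINDS = {"021": "after", "022": "before", "0120": "file-level"}
EXPECTED_ORDER = ["after", "before", "after", "before", "file-level"]

def parse_logconfig(text):
    """Parse logconfig-style lines into dictionaries, one per log file."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        filename, flag, block_size, start_block, log_length = line.split()
        entries.append({
            "filename": filename,
            "kind": FLAG_KINDS[flag],
            "block_size": int(block_size),
            "start_block": int(start_block),
            "log_length": int(log_length),
        })
    return entries

def group_order(entries):
    """Collapse consecutive entries of the same kind into one label each."""
    order = []
    for entry in entries:
        if not order or order[-1] != entry["kind"]:
            order.append(entry["kind"])
    return order

sample = """\
/ud6/log/a_0000\t021\t4096\t0\t1024
/ud6/log/b_0000\t022\t4096\t0\t1024
/ud6/log/a_0002\t021\t4096\t0\t1024
/ud6/log/b_0002\t022\t4096\t0\t1024
/ud6/log/f_0000\t0120\t4096\t0\t39
"""
print(group_order(parse_logconfig(sample)) == EXPECTED_ORDER)  # True
```

The sample uses one log per group to stay short; a working configuration would normally have at least two before image and two after image logs in each set.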

Log Configuration Table Examples

The following examples illustrate log configuration tables using UNIX file system log files and raw disk files.

Log Configuration Table Using UNIX Files and Default Settings

The first example shows a log configuration table for log files that are on a UNIX file system.

    % more /usr/ud72/include/logconfig
    /ud6/log/a_0000 021 4096 0 1024
    /ud6/log/a_0001 021 4096 0 1024
    /ud6/log/b_0000 022 4096 0 1024
    /ud6/log/b_0001 022 4096 0 1024
    /ud6/log/a_0002 021 4096 0 1024
    /ud6/log/a_0003 021 4096 0 1024
    /ud6/log/b_0002 022 4096 0 1024
    /ud6/log/b_0003 022 4096 0 1024
    /ud6/log/f_0000 0120 4096 0 39

This log configuration table identifies two sets of log files, as indicated in the following table.

Row Contents

1 and 2 First log set; after image logs.

3 and 4 First log set; before image logs.

5 and 6 Second log set; after image logs.

7 and 8 Second log set; before image logs.

9 File-level log.

Log Configuration Table Description

Notice that the starting block offset is zero, because each log is a separate UNIX file.

Customized Log Configuration Table

The following example shows a log configuration table for log files that are raw disk files:

Warning: IBM recommends that the start block offset is not 0 for raw disk files.

% more /usr/ud72/include/logconfig
/dev/rdsk/0s4 021 4096 1024 1024
/dev/rdsk/0s4 021 4096 2048 1024
/dev/rdsk/0s4 021 4096 3072 1024
/dev/rdsk/0s4 022 4096 4096 1024
/dev/rdsk/0s4 022 4096 5120 1024
/dev/rdsk/0s4 022 4096 6144 1024
/dev/rdsk/0s4 022 4096 7168 1024
/dev/rdsk/0s4 021 4096 8192 1024
/dev/rdsk/0s4 021 4096 9216 1024
/dev/rdsk/0s4 021 4096 10240 1024
/dev/rdsk/0s4 022 4096 11264 1024
/dev/rdsk/0s4 022 4096 12288 1024
/dev/rdsk/0s4 022 4096 13312 1024
/dev/rdsk/0s4 022 4096 14336 1024
/dev/rdsk/0s4 0120 4096 15360 39

This log configuration table defines two sets of logs. Each set contains three after image logs and four before image logs, as shown in the following table.

Row Contents

1 through 3 First log set; after image logs.

4 through 7 First log set; before image logs.

8 through 10 Second log set; after image logs.

11 through 14 Second log set; before image logs.

15 File-level log.

Log Configuration Table Description

Notice that, because the logs are part of a raw disk partition rather than a file system, the starting block offset is required.

Note: The log sets must match each other. The number and size of before image logs in the first set must be the same as the number and size of before image logs in the second set, and the number and size of after image logs in the first set must be the same as the number and size of after image logs in the second set. Within each set, though, the number of before image logs can be different from the number of after image logs. The before image logs may not fill at the same rate as the after image logs. Before image logs contain blocks and after image logs contain records, and one before image block can correspond to a large number of after image records. During performance tuning, you may want to explore defining different numbers of before image and after image logs.

Note: If you create your log configuration table in a location other than the default, you need to set the LOGCONFIG environment variable to the absolute path of your log configuration table.

Syntax for C shell:

setenv LOGCONFIG /directory/logconfig

Syntax for Bourne or Korn shell:

LOGCONFIG=/directory/logconfig;export LOGCONFIG

Tip: If you are using the LOGCONFIG environment variable to identify the location of your log configuration table, that environment variable must be correctly set whenever you start UniData with the startud command. Consider setting the environment variable in a startup script.

7. Match udtconfig Parameters for Log Files

The number of before image logs and after image logs in each log set must match the values of the N_BIMG and N_AIMG udtconfig parameters. By default, those values are set at 2. To view the values of N_BIMG and N_AIMG on your system, use the UNIX pg or cat command.

You have to change N_BIMG and N_AIMG in the udtconfig file if you decide to use more than two before or after image logs. To change the values of N_BIMG and N_AIMG, use any text editor to edit the parameters in the udtconfig file.

Note: Refer to the examples previously shown. In the first example, each log set contains two before image logs and two after image logs. The default parameters match that example. In the second example, each log set contains three after image logs and four before image logs. For that example, N_AIMG should be reset to 3 and N_BIMG should be reset to 4.
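The matching rule can be checked with a short script. This is a sketch under stated assumptions, not a UniData tool: the helper names get_param, count_flag, and check_log_counts are hypothetical, parameters are assumed to appear in udtconfig as NAME=value at the start of a line, and a missing parameter is assumed to take the default of 2 described above. Because the table holds two log sets, the sketch expects twice N_AIMG lines with flag 021 and twice N_BIMG lines with flag 022.

```shell
#!/bin/sh
# get_param NAME FILE: print the value of the first NAME=value line in FILE.
get_param() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# count_flag FLAG FILE: count logconfig lines whose second field is FLAG.
# The "" concatenation forces a string comparison of the flag values.
count_flag() {
    awk -v flag="$1" '($2 "") == (flag "") { n++ } END { print n + 0 }' "$2"
}

# check_log_counts LOGCONFIG UDTCONFIG: return non-zero on a mismatch.
check_log_counts() {
    n_aimg=$(get_param N_AIMG "$2")
    n_bimg=$(get_param N_BIMG "$2")
    n_aimg=${n_aimg:-2}                  # documented default
    n_bimg=${n_bimg:-2}                  # documented default
    after=$(count_flag 021 "$1")
    before=$(count_flag 022 "$1")
    status=0
    if [ "$after" -ne $((2 * n_aimg)) ]; then
        echo "N_AIMG=$n_aimg, but the table lists $after after image logs in two sets"
        status=1
    fi
    if [ "$before" -ne $((2 * n_bimg)) ]; then
        echo "N_BIMG=$n_bimg, but the table lists $before before image logs in two sets"
        status=1
    fi
    return $status
}
```

The sketch only compares totals. It does not confirm that the two sets match each other log for log, which you still need to verify by reading the table.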

8. Turn on SB_FLAG

If you installed UniData with the Recoverable File System option, the SB_FLAG in udtconfig is automatically set to 1. If you did not install UniData with the Recoverable File System option, you must manually set the flag to turn on the Recoverable File System. Make sure it looks like the following example:

% cd /usr/ud72/include
% grep SB_FLAG udtconfig
# turned on by setting SB_FLAG to a positive value.
SB_FLAG=1

Note: If the SB_FLAG is set to zero (off), all files are handled as UniData nonrecoverable files. You can perform any function on them, but you will not have logging or crash recovery. If you attempt to create recoverable files with SB_FLAG off, UniData displays an error message, and ignores the RECOVERABLE keyword.

9. Verify CHKPNT_TIME and GRPCMT_TIME

The default setting for CHKPNT_TIME is 300 seconds, and the default setting for GRPCMT_TIME is 5 seconds. Display the current values of the parameters as shown in the following example:

% cd /usr/ud72/include
% grep _TIME udtconfig
CHKPNT_TIME=300
GRPCMT_TIME=5

Note: Make GRPCMT_TIME less than CHKPNT_TIME. Otherwise, you will not get any benefit from defining GRPCMT_TIME. If you want updates to write immediately to the log files without going through the local buffer, set GRPCMT_TIME to 0. Remember, however, that setting GRPCMT_TIME to 0 may negatively impact system performance. IBM recommends you keep your CHKPNT_TIME at 300 seconds. The lower the value of CHKPNT_TIME, the more often checkpoints occur, which could impact system performance.

If you are going to turn on archiving now, proceed to Chapter 4, “Configuration Steps for Archiving.” Otherwise, proceed to step 10.

10. Run the cntl_install Command

This command allocates space for your log files and initializes counters.

Note: Make sure UniData is not running and that you are logged in as root before running cntl_install.

The following example shows the output from the cntl_install command:

# cd $UDTBIN
# ./cntl_install

cntl_install utility resets Unidata System after a full database backup (Image Copy). This means, all log (and archive) files will also be initialized for re-use.

Do you want to continue?(y/n) [n] y

Installing Logs (and Archives) after cntl_install ......

If you want to create or convert data files at this time, proceed to Chapter 5, “Creating and Configuring Recoverable Files.” Otherwise, proceed to step 11.

11. Start UniData

Use startud with no options to implement your new udtconfig parameters. The following screen shows the startud command:

# startud

Using UDTBIN=/liz1/ud72/bin

All output and error logs have been saved to /liz1/ud72/bin/saved_logs directory.

SMM is started.
SBCS is started.
SM is started.
RM is started.
CLEANUPD is started.
Unirpcd has already been started

UniData R7.2 has been started.

#

Chapter 4 Configuration Steps for Archiving

How Archiving Works
How to Turn On and Configure Archiving
Managing Archive Backup
    Backing Up Archives Manually
    Backing Up Archives Automatically

This chapter provides a brief description of the archiving function, and then provides a step-by-step procedure you can use to turn on and configure archiving on your system. The chapter ends with a section explaining how to manage archive backup.

How Archiving Works

A variety of conditions can cause a file or files to become unreadable unless restored from a backup. For these failures (called media crashes in this document), the archiving feature protects recoverable files by maintaining a complete record of updates since your last backup. You can follow the procedures for restoring your recoverable files from backup, then use the mediarec utility to apply archives, bringing your database to the state it was in when the last archive was written. UniData updates the current archive file with the latest after image log files when UniData is started. You should execute startud with no options before executing mediarec.

Note: You can enable before and after image logging without enabling archiving. However, IBM recommends you enable both to provide the best protection for your recoverable data files.

How to Turn On and Configure Archiving

You turn on archiving by creating archive files and an archive configuration table, and by setting parameters in the udtconfig file. The archive configuration table acts as an index to the archive files, and udtconfig parameters tell UniData to use the archiving feature. You also need to decide how to save your archive files, and you need a media configuration file for use during recovery.

Tip: If you need to improve system performance, IBM recommends you create the archive files (not the archive configuration table) as raw disk files, that is, files on a disk partition or device that is not mounted in a UNIX file system. The impact of using raw disk depends on your hardware and software environment. Refer to your host operating system documentation for information about using raw disk.

To turn on archiving, complete the following steps:

1. Perform a Full Backup of Your System

If you just finished turning on logging, you can skip this step and go directly to step 2. Otherwise, create and verify a full backup. Make sure your backup follows symbolic links for large dynamic files. Do not start UniData at the end of the backup; UniData must be down for the remainder of the steps.

2. Determine How Many Archive Files

The default number of archive files is 2. IBM recommends you use at least two archive files. You may want to use more than two, especially if you know your Recoverable File System will be heavily loaded.

Note: Archiving is different from logging. There is only one archive process. It writes to your first archive file until it is full, then moves to the next, and so on. When the process runs out of space in the last file, it checks to see if the first archive file has been backed up to reliable storage. The archive process does not overwrite a file until the file is backed up. You cannot get simultaneous writes if you use multiple archive files, but you may get more time to back them up.

3. Determine How Large Archives Should Be

Because the purpose and function of archive files are different from those of log files, archive files need to be larger than log files. Log files are normally overwritten several times in an hour, whereas archive files need to hold a large number of changes.

Refer back to your log configuration table. Your smallest archive file must be larger than one full set of after image log files. If it is not, cntl_install will fail and display an error message.

Note: Unlike log files, archive files cannot overflow.

4. Determine Where to Locate Archive Files

Note: Because the purpose of the archiving system is to protect you from media failure, you will get the best results if you locate archive files on a separate physical disk from your data files and your before and after image logs. The optimum configuration places data, logs, and archives each on a separate physical device.

If you are using raw disk for your archives, you can put several logical archive “files” into a single actual file. To do this, specify the same filename for each archive file, but specify different start blocks. Make sure that the archive files do not overlap.

5. Create the Archive Configuration Table

Log in as root, and use vi or another UNIX text editor to create the archive configuration table. The default path for the archive configuration table is /usr/ud72/include/archconfig.

Each line in the archive table represents an archive file. There are four attributes for each archive file that must appear in the archconfig file.

Field Description

Filename The full path and file name of the archive file. You can use any file name.

Blocksize The block size of the archive file. For a UNIX file, the block size should be a multiple of the file system block size. For a raw disk file, it should be a multiple of the disk sector size. IBM recommends a 4096-byte block size for both UNIX file system and raw disk archives. The block size cannot exceed 16,384.

Start Block The start block offset in the archive file.

Log Length The archive file size as specified by the number of blocks in the archive file.

Archive File Fields

When entering the archconfig file, separate attributes with tabs. For example:

filename block_size start_block log_length

Archive Configuration Table Examples

The next example shows an archive configuration table for two archive files that are files in a UNIX file system:

% more /usr/ud72/include/archconfig
/ud6/archive/arch0 4096 0 8192
/ud6/archive/arch1 4096 0 8192

The following example shows an archive configuration table for four archive files that are raw disk files:

Warning: IBM recommends that the start block offset is not 0 for raw disk files.

% more /usr/ud72/include/archconfig
/dev/rdsk/0s5 4096 8192 8192
/dev/rdsk/0s5 4096 16384 8192
/dev/rdsk/0s5 4096 24576 8192
/dev/rdsk/0s5 4096 32768 8192

Note: If you are using raw disk partitions, and you put your archive files on the same raw disk partition as your log files, do not give the archive files the same name and start_block as the log files. Give your archive files start_blocks that will not conflict with each other, or with your log files.
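The non-overlap requirement for raw disk archives can be tested from the archive configuration table itself. The following sketch is illustrative only: check_arch_overlap is a hypothetical name, and the sketch assumes that all entries sharing a file name use the same block size, so that start blocks and lengths are in the same units. It treats each entry as the block range from start_block through start_block + log_length - 1 and reports any two ranges on the same file that intersect. It does not compare archive entries against log entries on the same partition.

```shell
#!/bin/sh
# check_arch_overlap ARCHCONFIG  (hypothetical helper)
# Returns non-zero if two entries on the same file share any block.
check_arch_overlap() {
    awk '
        NF == 4 {
            n++
            file[n]  = $1
            first[n] = $3 + 0            # first block of the entry
            last[n]  = $3 + $4 - 1       # last block of the entry
        }
        END {
            for (i = 1; i <= n; i++)
                for (j = i + 1; j <= n; j++)
                    if (file[i] == file[j] &&
                        first[i] <= last[j] && first[j] <= last[i]) {
                        printf "entries %d and %d overlap on %s\n", i, j, file[i]
                        bad = 1
                    }
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

With the raw disk table shown above, the four ranges are adjacent but do not intersect, so the function prints nothing and returns 0.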

If you create your archive configuration table somewhere other than the default location, set the environment variable ARCHCONFIG to the full path of your table.

Syntax (C shell):

setenv ARCHCONFIG /directory/archconfig

Syntax (Bourne and Korn shell):

ARCHCONFIG=/directory/archconfig;export ARCHCONFIG

Note: If you are using the ARCHCONFIG environment variable to identify the location of your archive configuration table, that environment variable must be correctly defined whenever you start UniData with the startud command. Consider setting the environment variable in a startup script.

6. Match Configuration Parameter for Archives

The number of archive files you use must match the value of the parameter N_ARCH in the udtconfig file. Display the value of the parameter as shown in the following example:

% cd /usr/ud72/include
% showconf |grep N_ARCH
N_ARCH=2
%

Use vi or another UNIX text editor to modify N_ARCH in udtconfig if it is different from your number of archives. For more information, see Chapter 7, “Media Crash Recovery.”

7. Create the Media Configuration File

During recovery from a crash, UniData needs a media configuration file to determine where on your system to load archive files and after image logs before applying them to your database. When you execute the mediarec utility, you load archive files, one at a time, into an area defined by the media configuration parameter TMP_ARCH_SPACE. The utility then copies the contents of the archive, one checkpoint at a time, into a working area defined by the media configuration parameter TMP_CP_SPACE.

The default path for the media configuration file is /usr/ud72/include/mediaconfig. The file contains absolute paths for TMP_ARCH_SPACE and TMP_CP_SPACE.

Use vi or another UNIX editor to create the mediaconfig file. The following example shows a media configuration file where both paths are defined in /usr:

% more /usr/ud72/include/mediaconfig
TMP_ARCH_SPACE=/usr/ARCH
TMP_CP_SPACE=/usr/CP

Note: You need to define absolute paths including file names. You do not have to create the files. The mediarec utility creates the files during execution.

Locate TMP_ARCH_SPACE on a disk with enough space to hold the largest archive file. To calculate the amount of space needed (in bytes), refer to your archive configuration table, identify the largest file, and multiply the block size by the file length.

Locate TMP_CP_SPACE on a disk with enough space to hold the largest set of after image log files, with some room left over for log file overflow. To calculate the amount of space needed, complete the following steps:

• Refer to your log configuration table. Add together the log lengths of each after image file in your first log set.
• Multiply the sum of the lengths by the block size to calculate the space (in bytes) you need.

Make sure the location of TMP_CP_SPACE has additional space available to handle log overflows.
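Both space calculations can be worked from the configuration tables. The sketch below is one way to do the arithmetic, with hypothetical function names (largest_archive_bytes and first_aimg_set_bytes). It assumes that the first log set's after image logs are the leading run of flag 021 lines in the log configuration table, which follows the entry order the table requires. The results are minimums; they do not include the extra room TMP_CP_SPACE needs for log overflow.

```shell
#!/bin/sh
# largest_archive_bytes ARCHCONFIG
# Largest block_size * log_length over all archive entries (TMP_ARCH_SPACE).
largest_archive_bytes() {
    awk 'NF == 4 { b = $2 * $4; if (b > max) max = b }
         END { printf "%.0f\n", max }' "$1"
}

# first_aimg_set_bytes LOGCONFIG
# Sum of block_size * log_length over the first set of after image logs
# (the leading run of flag 021 lines), as a floor for TMP_CP_SPACE.
first_aimg_set_bytes() {
    awk '
        NF == 0 { next }                 # ignore blank lines
        ($2 "") != "021" { exit }        # first set ends at the first other flag
        { sum += $3 * $5 }
        END { printf "%.0f\n", sum }
    ' "$1"
}
```

For the UNIX file examples shown earlier, the largest archive is 4096 x 8192 = 33554432 bytes, and the first after image set is 2 x 1024 x 4096 = 8388608 bytes.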

Note: The default path for the media configuration file is /usr/ud72/include/mediaconfig. You can specify a path other than the default by defining the environment variable MEDIACONF.

Syntax (C shell):

setenv MEDIACONF /directory/mediaconfig

Syntax (Bourne or Korn shell):

MEDIACONF=/directory/mediaconfig;export MEDIACONF

Tip: If you are using the MEDIACONF environment variable to identify the location of your media configuration file, that environment variable must be set correctly whenever you run mediarec. Consider setting the environment variable in a startup script.

8. Determine a Backup Method for Archives

You need to back up archive files to reliable storage so they will be available for recovery. UniData includes an optional automated archive backup utility that minimizes the need for user intervention.

If the automated backup utility is turned on, a daemon called ar_backupd invokes a user-customized backup script in the background whenever an archive file fills. No user input is required unless there is a problem (your script fails for some reason, or an archive takes too long to back up). ar_backupd writes certain messages to udtbin/sm.log. UniData also writes messages that require intervention to the window where you executed startud (if it is available) and to the system console.

The automated backup and restore scripts write messages to files called arch_backup.out and arch_restore.out respectively, in the same directory as the scripts. Use these to monitor and verify your backups and restores.

If you do not want to use the automated utility, you can back up the files as they fill or back up the full set when it fills. The archive process assigns logical sequence numbers to the archive files as they fill. These are used by mediarec to apply the archives in order when recovering from media failure.

If you want to turn on the automated utility, go to step 9a. If you want to back the files up manually, go to step 9b.

9a. Turn On Automated Archive Backup

Change the value of the parameter ARCHIVE_TO_TAPE in the udtconfig file to 1. Use vi or another UNIX editor to modify the parameter.

The following example shows a sample of RFS parameters in the udtconfig file:

# Section 3 RFS related parameters
# These parameters are only used for RFS which is turned by
# setting SB_FLAG to a positive value.
#
# 3.1 RFS flag
SB_FLAG=1

# 3.2 File related parameters
BPF_NFILES=80
N_PARTFILE=500

# 3.3 AFT related parameters
N_AFT=200
N_AFT_SECTION=1
N_AFT_BUCKET=101
N_AFT_MLF_BUCKET=23
N_TMAFT_BUCKET=19

# 3.4 Archive related parameters
ARCH_FLAG=1
N_ARCH=3
ARCHIVE_TO_TAPE=1
ARCH_WRITE_SZ=0

# 3.5 System buffer parameters
N_BIG=233
N_PUT=8192

# 3.6 TM message queue related parameters
N_PGQ=10
N_TMQ=10

# 3.7 After/before image related parameters
N_AIMG=2
N_BIMG=2
AIMG_BUFSZ=102400
BIMG_BUFSZ=102400
AIMG_MIN_BLKS=10
BIMG_MIN_BLKS=10
AIMG_FLUSH_BLKS=2
BIMG_FLUSH_BLKS=2
LOG_OVRFLO=/disk1/ud52/log/log_overflow_dir

# 3.8 Flushing interval related parameters
CHKPNT_TIME=300
GRPCMT_TIME=5

# 3.9 Sync Daemon related parameters
N_SYNC=0
SYNC_TIME=0

10a. Set Up Backup and Restore Scripts

Set up backup and restore scripts. UniData provides default scripts, which use the UNIX dd command to back your archive files up to tape. The default paths for these are:

• /usr/ud72/include/arch_backup
• /usr/ud72/include/arch_restore

If you use these default scripts, modify them to set the archive backup device ($TAPEDEV in each script) to the actual UNIX device name.

You can create your own backup and restore scripts if you prefer. Be sure the scripts are compatible, so that archive backups can be read by the restore script. Use the standard exit code scheme: code 0 indicates a successful exit, and a nonzero code indicates a failure.

You may decide to use the automated backup utility, but back your archives up to disk instead of tape. Use the following sample script as a template to create your own. Again, make sure that exit code 0 always means successful completion.

#!/bin/sh
#
# This script is used to back up archive files to disk when they fill.
# If set up properly, this will be invoked automatically. It only works
# if each archive is a separate actual file.
#
# The script makes a copy of the archive file, appending the LSN, in the
# directory specified by ARCH_SAVE.
#
# Arguments, as used by the dd command below:
# $1 archive file, $2 start block, $3 length in blocks,
# $4 block size, $5 logical sequence number (LSN)
#
ARCH_SAVE=/tmp/archive_save

dd if=$1 of=$ARCH_SAVE/arch_$5 skip=$2 count=$3 ibs=$4 obs=$4 conv=sync
STAT=$?
if [ $STAT != 0 ]
then
echo "arch_backup of lsn ${5} failed"
exit $STAT
fi

echo "Archive file $1 (lsn $5) successfully backed up."
exit 0

Use a similar approach to develop a script for restoring the archives you save to disk. Your backup and restore scripts should be compatible. An example restore script follows:

ARCH_SAVE=/tmp/archive_save

cp $ARCH_SAVE/arch_$2 $1
STAT=$?
exit $STAT

If you decide to back archives up to disk, use vi or another UNIX text editor to create your shell scripts. If you use the default path and file names, rename the default scripts for safekeeping before you create yours.

Note: If the path and file names for your archive and restore scripts are different from the default, identify them with environment variables or add them as parameters in the udtconfig file.

Syntax (C shell):

setenv ARCH_BACKUP /directory/arch_backup
setenv ARCH_RESTORE /directory/arch_restore

Syntax (Bourne or Korn shell):

ARCH_BACKUP=/directory/arch_backup;export ARCH_BACKUP
ARCH_RESTORE=/directory/arch_restore;export ARCH_RESTORE

The automated backup and restore scripts create files called arch_backup.out and arch_restore.out respectively, in the same directory as the scripts. Use these to monitor and verify your backups and restores.

Tip: If you are using environment variables to identify the location of your arch_backup and arch_restore scripts, those environment variables must be correctly defined whenever you run mediarec. Consider setting the environment variables in a startup script.

11a. Set Up the Archive Tape Device

If you are backing up to tape, IBM recommends that you make a dedicated device available for fast and dependable backups. The sample scripts do not open the device exclusively, so if another application uses the device, archive files may be overwritten.

Note: Messages for automated backup display in sm.log, in the window from which you executed startud (if available), or at the console (if WRITE_TO_CONSOLE is set to 1 in the udtconfig file). You should check them from time to time; if your backup script fails for any reason, you need to intervene. All messages are displayed in sm.log. Messages that require a response also go to the startud window or the console.

Tip: Use the UNIX tail -f command if you are monitoring the sm.log. This command allows you to see messages as they are written to the log.

If you are backing up to disk, you need to save the backed-up files from disk to tape periodically.

You can now skip to step 12.

9b. Backing Up Archives Manually

If you do not turn on automated backup, you need to back up your archives whenever the set fills. The system displays messages in one or more of the following:

• To the console (if you have set the WRITE_TO_CONSOLE parameter to 1 in the udtconfig file).
• In the udtbin/sm.log file.
• To the terminal (or window) from which you executed startud, if that terminal or window remains available.

All messages are written to sm.log. Messages requiring intervention also go to the startud window and the console.

During initial setup, you should determine how you want to set up your system to be sure of seeing the messages. If you do not start UniData from a terminal (for example, if UniData starts automatically when you boot your system) and your configuration does not include a console, you have to monitor udtbin/sm.log on a regular basis to receive the messages. If all the archives fill, and you do not back them up, eventually UniData processing will stop.

Tip: If you are monitoring sm.log, use the UNIX tail -f command. This allows you to see messages as they are written to the log.

12. Turn On ARCH_FLAG

Set the udtconfig parameter ARCH_FLAG to 1 to turn on archiving. Make sure it looks like the following example:

% cd /usr/ud72/include
% grep ARCH_FLAG udtconfig
ARCH_FLAG=1
%

13. Run cntl_install

This command allocates space for your archive files and initializes the logical sequence numbers in the udt.control.file. The cntl_install command initializes both logging and archiving. The following example shows the cntl_install command:

# cd $UDTBIN
# ./cntl_install

cntl_install utility resets Unidata System after a full database backup (Image Copy). This means, all log (and archive) files will also be initialized for re-use.

Do you want to continue?(y/n) [n] y

Installing Logs (and Archives) after cntl_install

You must log in as root and stop UniData with stopud before you run cntl_install. If you plan to convert your data files at this time, proceed to Chapter 5, “Creating and Configuring Recoverable Files.” Otherwise, go to step 14.

14. Start UniData

Use startud with no options to implement your new udtconfig parameters. The following screen shows the startud command:

# startud

Using UDTBIN=/liz1/ud72/bin

All output and error logs have been saved to /liz1/ud72/bin/saved_logs directory.

SMM is started.
SBCS is started.
SM is started.
RM is started.
CLEANUPD is started.
Unirpcd has already been started

UniData R7.2 has been started.

#

Managing Archive Backup

This section describes messages that pertain to backing up archives, and indicates the actions required.

Backing Up Archives Manually

If you do not turn on the automated backup utility, you must back up your archive files whenever the set fills.

As each file fills, UniData writes a message to sm.log. The message looks like the following example:

The archive file /ud6/archive/a_0000 is full. The Logical Sequence Number (LSN) of this archive is -- 0

If you are monitoring sm.log, you can back up the file as soon as you see the message, or wait until all the files fill. When all the files fill, UniData writes a message to sm.log, the console, and the startud window. This message looks like the following example:

The current set of archive files is FULL.

You may want to make sure that these files have been saved.

Also, label them from Logical Sequence Number (LSN) 0 thru 3 in the order they appear in /usr/ud72/include/archconfig file.

Please create /usr/ud72/include/DONE file when done...

When you see this message, you need to back the files up promptly, because processing stops when all archives are full. Once you complete the backup and create the DONE file, processing resumes.

Note: The message is the same regardless of whether you backed up the files as they filled. Whether you back them up one at a time or after the set fills, you need to create the DONE file before UniData can begin reusing the archives.

The following example shows how to create the DONE file:

% touch /usr/ud72/include/DONE

You can back up the files to tape, disk, or any reliable storage. Because writing to tape can be a slow process, you may want to consider copying your archive files to another location on disk and creating the DONE file, then off-loading the files to tape at your convenience.
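The copy-to-disk approach can be scripted. The following is a sketch only, not a supplied utility: backup_archive_set is a hypothetical name, the save directory is whatever location you choose, and the LSN of the first archive must be taken from the message UniData displays. It assumes the archives are labeled with consecutive LSNs in the order they appear in the archive configuration table, as the message describes. Each archive is copied with dd using the block size, start block, and length from its table entry, and the DONE file is created only if every copy succeeds.

```shell
#!/bin/sh
# backup_archive_set ARCHCONFIG SAVE_DIR FIRST_LSN DONE_FILE  (hypothetical)
# Copies each archive to SAVE_DIR/arch_<lsn>, then creates DONE_FILE.
backup_archive_set() {
    archconfig=$1
    save_dir=$2
    lsn=$3
    done_file=$4
    while read -r file bsize start length; do
        [ -n "$file" ] || continue       # ignore blank lines
        dd if="$file" of="$save_dir/arch_$lsn" bs="$bsize" \
            skip="$start" count="$length" 2>/dev/null || return 1
        lsn=$((lsn + 1))
    done < "$archconfig"
    # Signal UniData only after every archive has been copied.
    touch "$done_file"
}

# Example call (paths other than the DONE file are illustrative):
# backup_archive_set /usr/ud72/include/archconfig /backup/archives 0 \
#     /usr/ud72/include/DONE
```

Verify the copies before you rely on them, and off-load the saved files to tape later as described above.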

Backing Up Archives Automatically

With automated backup, the need for intervention should be rare. You need to intervene if an archive backup takes longer than 10 minutes, or if your script fails.

Tip: If you are using automatic archive backup, remember that writing to tape can be a slow process. Consider using automatic backup to copy your archives to disk files rather than tape. You can off-load the disk files to tape at your convenience.

The system displays messages about archive backup in udtbin/sm.log, in the window where you ran startud (if it is available) and at the system console. All messages go to sm.log. Messages that require you to take action also go to the startud window and the console.

When the automated backup is running without problem or delay, UniData only writes messages to the sm.log.

As each file fills, a message is written to the sm.log as shown in the following example:

The archive file /ud6/archive/a_0000 is full. The Logical Sequence Number (LSN) of this archive is -- 0

When the ar_backupd process begins to off-load the file, UniData writes the following message to the sm.log:

ar_backupd: starting to offload archive /ud6/archive/a_0000 (LSN = 0)

At this point, your archive backup script is running, while UniData writes to the next archive file in your archive configuration table. The system monitors the backup process, and writes messages like the following example to the sm.log at intervals until the process completes:

ar_backupd: waiting for the offloading of file (LSN = 0) to complete.

When the first backup completes, ar_backupd writes a message to sm.log, then checks to see if the next file is ready to off-load. When the next file is ready, the backup begins, as shown in the following example:

ar_backupd: file (LSN = 0) off-loaded.

The archive file /ud6/archive/a_0001 is full. The Logical Sequence Number (LSN) of this archive is -- 1

ar_backupd: starting to offload archive /ud6/archive/a_0001 (LSN = 1)
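If you review sm.log after the fact instead of following it live, the "is full" messages can be summarized with a small filter. This is a sketch under assumptions: full_archives is a hypothetical name, and the message wording is taken from the examples above. Because the exact line layout of the message in sm.log is not certain, the sketch handles the file name and the LSN whether they appear on one line or on two consecutive lines.

```shell
#!/bin/sh
# full_archives SMLOG  (hypothetical helper)
# Prints "LSN archive_file" for each archive reported full in the log.
full_archives() {
    awk '
        /The archive file .* is full/ {
            # The archive path is the word that follows "file".
            for (i = 1; i < NF; i++)
                if ($i == "file") { name = $(i + 1); break }
        }
        /of this archive is -- [0-9]+/ { print $NF, name }
    ' "$1"
}
```

Comparing this list with the "off-loaded" messages from ar_backupd shows which archives have filled but have not yet been backed up.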

Slow Backups

If backing up an archive file takes more than 10 minutes, UniData writes a message to the sm.log, to the startud window, and to the console. The message looks like the following example:

ar_backupd: The /usr/ud72/include/arch_backup script (pid 75733) off-loading the file /ud6/archive/a_0001 (LSN 1) has taken more than 10 minutes without completing. Check the status of the script and take appropriate corrective action. The output of the script can be found in the file /usr/ud72/include/arch_backup.out.
The archive file /ud6//a_0001 can not be off loaded. You may want to make sure that this file has been saved.
Also, label it Logical Sequence Number (LSN) -- 1
Please create /usr/ud72/include/DONE file (as root) when done...

At this point, complete the following three steps:

1. As root, use the UNIX kill command to kill the archive backup script process. (Its process ID was displayed.) Kill any children of that process.
2. Back up the archive file manually, and label it with the correct LSN.
3. Create a DONE file. The following example shows how to create the DONE file:

% touch /usr/ud72/include/DONE

After you back the file up manually and create the DONE file, ar_backupd begins backing up the next archive file that is full.

If your archive backup script off-loads files slower than they are written to, you may encounter the situation where UniData needs to write to an archive that has not yet been backed up. UniData generates messages similar to the following examples:

ar_backupd: off-loading of archive /ud6/archive/a_0000 (lsn 0) must be completed before system can progress.

This first message displays at the startud window, the console, and in the sm.log. The following message appears in sm.log:

ARCH: waiting for the file (/ud6/archive/a_0000) to be off- loaded... At the point where these messages display, UniData processing will wait until the file is off-loaded. If the backup completes, processing will automatically resume.

Note: While UniData processing is waiting, many UniData commands (even including stopud) are blocked. It will appear as if the system is hung.

If you see these messages, and the off-load does not complete in a few minutes, perform the following steps:

1. Check the status of the backup script process. If the process is running normally, proceed to step 2. Otherwise, go to step 3.
2. Check the output file arch_backup.out. If it indicates the backup has failed, proceed to step 4. If it does not indicate a problem, you can either wait longer for the backup to complete or go to step 4.
3. Identify and resolve external problems. If it is possible to unblock the process, do so. UniData should resume normal processing. If you cannot unblock the process, go to step 4.
4. Log on as root and use the UNIX kill command to kill the backup script process. Kill any child processes of the backup script process. Make sure no copies of the backup script are running. Refer to your host operating system documentation for information about checking process status, unblocking a process, and killing a process.

   If you kill the backup script process, UniData displays additional instructions to the sm.log, the startud window and the console. The instructions look like the following example:

   The archive file /ud6/archive/a_0000 can not be off loaded. You may want to make sure that this file has been saved.

   Also, label it Logical Sequence Number (LSN) -- 0 Please create /usr/ud72/include/DONE file (as root) when done...
5. Copy or back up the file.
6. Create the DONE file, as shown in the following example:

   % touch /usr/ud72/include/DONE

Once you save the archive file and create the DONE file, UniData processing should resume normally.

If the backup the system is waiting for takes more than ten minutes to complete, UniData will display the messages described earlier for slow backup, prompting you to off-load the file and create a DONE file.

Failed Backup

If your archive backup script fails to complete, you will see messages similar to the following example:

The archive file /ud6/archive/a_0000 can not be off loaded. You may want to make sure that this file has been saved.

Also, label it Logical Sequence Number (LSN) -- 0 Please create /usr/ud72/include/DONE file (as root) when done...

These messages appear in the sm.log, at the startud window, and at the console. Complete the following steps to resolve the problem:

1. Check the output file from the script, arch_backup.out. Identify and resolve the problems that caused the script to fail.
2. Manually back up the file that could not be off-loaded, and label it with the correct LSN.
3. Create the DONE file.

Once you have corrected the problem, saved the archive file, and created the DONE file, UniData processing should continue normally.

Note: If a backup script process fails or you kill it, there is a possibility that the process partly backed up the archive file before failing. If you need to restore that archive file during a media recovery, be sure to use the copy you backed up manually.

Starting and Stopping UniData

Before you stop UniData with stopud, make sure all archive-to-tape copy operations are complete. This ensures that all filled archives have been backed up before you stop UniData, which preserves the ordering of your archives and ensures a smooth start when you execute startud. Use a UNIX process-status command such as ps to make sure there are no copies of your backup script running, and check sm.log to be sure all filled archives have been off-loaded.

Note: If you execute stopud while your archive backup script is still running, the system will not identify the file being backed up as “off-loaded.” The next time UniData needs to write to that file, you may see some puzzling messages, and you will have to manually back up the file.

Warning: If your system crashes with some or all archives full (but not off-loaded), you may experience delays or system hangs when you start UniData. You should still be able to recover, but you must perform additional manual steps. Keep careful note of each step you perform to make sure you have preserved all your archives. Contact your VAR or IBM Technical Support if you need assistance.

Issuing the dbpause Command

When you issue dbpause to block updates to your system, generally to perform a backup, UniData forces a checkpoint, flushes the after image logs to the archive files, and marks the next available logical sequence number (LSN) in the archive file for use after the backup. UniData displays this information on the screen where you issue dbpause, and writes it to udtbin/sm.log.

After you perform a system backup, the archives created prior to the backup are no longer needed. Should you need to perform mediarec after the system backup, it is important to know the time of the checkpoint after you execute dbpause, and which LSN UniData will use when you execute dbresume. The following example shows the output from the dbpause command:

# dbpause
CheckPoint time before ForceCP: Fri Apr 25 10:49:14 2002
.CheckPoint time after ForceCP: Fri Apr 25 10:49:42 2002
.CP has been forced successfully.
Forcearch completed, the next LSN is 2.
DBpause successful.
#

If you use dbpause in conjunction with your system backups, you do not need to stop UniData and run cntl_install. For more information about the dbpause command, see Administering UniData and the UniData Commands Reference.
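Because dbpause also writes this information to udtbin/sm.log, the next LSN can be pulled back out of the log when you label a backup. The following is a minimal sketch that assumes the exact wording shown in the example above ("Forcearch completed, the next LSN is N."); next_lsn is a hypothetical helper name, not a UniData command.

```shell
#!/bin/sh
# Sketch: print the most recent "next LSN" that dbpause recorded in a log.
# Assumes the wording shown in the dbpause example:
#   "Forcearch completed, the next LSN is 2."
# next_lsn is a hypothetical helper name.
next_lsn() {
    log=$1
    # Keep only the number from each matching line, then take the last one.
    sed -n 's/.*Forcearch completed, the next LSN is \([0-9][0-9]*\)\..*/\1/p' "$log" | tail -n 1
}
```

For example, `next_lsn "$UDTBIN/sm.log"` prints the value from the latest dbpause entry in that log, or nothing if the log has no such entry. Treat it as a cross-check on the value you noted from the screen rather than a replacement for recording it.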

Chapter 5: Creating and Configuring Recoverable Files

Converting Nonrecoverable Files to Recoverable Files
Creating New Recoverable Files
Creating a List of Recoverable Files
Special Considerations for Recoverable Files

The features of the Recoverable File System work only on files you define as recoverable. You can create new files as recoverable files or convert existing UniData hashed files to recoverable files. This chapter describes tasks for setting up your database for RFS.


Converting Nonrecoverable Files to Recoverable Files

The udfile command converts nonrecoverable files to recoverable files. Complete the following steps to make existing files recoverable.

1. Analyze Your Database

You can choose to convert all your UniData files to recoverable files. If you want to convert only some files, consider your application and consider which files should be logically grouped. If you are using Transaction Processing, any files you update within a transaction should be recoverable. Even if you are not using Transaction Processing, you may want to make the following types of files recoverable:

- Files that are frequently updated. The ability to recover from failure is particularly useful for these.
- Files that must remain synchronized with one another.
- The UniData VOC file. You must make this file recoverable if you want to recover file-level operations.

You may want to leave files used for query purposes as nonrecoverable files, if they are rarely updated.

Note: Making all of your files recoverable may degrade system performance. For performance reasons, IBM recommends that you carefully analyze your files and make recoverable only those that are frequently updated, leaving files that are not updated nonrecoverable.

2. Stop UniData

Log on as root, make sure all users have logged out, and execute stopud.

3. Perform a Full Backup of Your System

If you have just turned on logging and archiving, you do not need to back up your system at this point. You should back up if you have run UniData since the last backup, and particularly if you have updated any of the files you have decided to convert.

Make certain your backup follows symbolic links for large dynamic files. Verify your backup.

4. Convert the Files

For each file you have selected, execute the udfile command with the -r option. The following example shows the udfile command:

# $UDTBIN/udfile -r INVENTORY
Non-recoverable file 'INVENTORY' is changed to recoverable file.

Warning: Remember that you must log on as root and stop UniData before converting files with udfile. You need to run udfile once for each file you are converting.
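Since udfile must be run once per file, a loop over a list of file names can save typing. The sketch below is illustrative only: convert_listed_files is a hypothetical helper, the list file (one file name per line, with blank lines and lines starting with # ignored) is an assumption, and the failure check assumes udfile returns a nonzero exit status on error, which this manual does not state.

```shell
#!/bin/sh
# Sketch: run "udfile -r" once per file name listed in a text file.
# convert_listed_files is a hypothetical helper; the list-file format
# is an assumption, not a UniData convention.
# Run it as root, with UniData stopped, from the account directory.
convert_listed_files() {
    list=$1
    failed=0
    while IFS= read -r name; do
        # Skip blank lines and comment lines in the list.
        case $name in ''|'#'*) continue ;; esac
        # Redirect stdin so udfile cannot consume the rest of the list.
        if ! "$UDTBIN/udfile" -r "$name" < /dev/null; then
            echo "udfile -r failed for $name" >&2
            failed=1
        fi
    done < "$list"
    return "$failed"
}
```

With UDTBIN set and a list such as files.txt containing INVENTORY on one line, `convert_listed_files files.txt` runs the same command shown in the example above for each entry. Checking each file afterward with udfile and no options, as described in the next step, is still worthwhile.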

5. Check Your Files

After you have converted your files, use the udfile command with no options to make sure they are all recoverable. The following example shows how to check a file with udfile:

% $UDTBIN/udfile INVENTORY
File 'INVENTORY' is recoverable file.

If you want to create new recoverable files at this point, go to step 8. Otherwise, go to step 6.

6. Perform Another Full Backup

This backup is critical, because it will serve as the beginning point for your Recoverable File System.

Make sure your backup follows symbolic links for large dynamic files. Verify the backup.


7. Run cntl_install

Run the cntl_install command to initialize logging and archiving.

Note: Remember that you must log on as root and stop UniData before you execute cntl_install.

The following screen shows the cntl_install command:

# cd $UDTBIN
# ./cntl_install

cntl_install utility resets Unidata System after a full database backup (Image Copy). This means, all log (and archive) files will also be initialized for re-use.

Do you want to continue?(y/n) [n] y

Installing Logs (and Archives) after cntl_install

......

8. Start UniData

Use the startud command with no options.

If you want to create new recoverable files at this point, proceed to “Creating New Recoverable Files” on page 5-6. Otherwise, you can allow users to access UniData.

Creating New Recoverable Files

Follow the steps described in this section to create new recoverable files.

1. Use CREATE.FILE

Use the ECL CREATE.FILE command to create new recoverable files. The following example shows how to create recoverable files:

:CREATE.FILE NEWTEST 5 DYNAMIC RECOVERABLE
Create file D_NEWTEST, modulo/1,blocksize/1024
Hash type = 0
Create dynamic file NEWTEST, modulo/5,blocksize/1024
Hash type = 0
Added "@ID", the default record for UniData to DICT NEWTEST.

Using the RECOVERABLE keyword creates your file as a recoverable file. You can create static or dynamic recoverable files.

2. Check Your Files

With UniData running, you can check the files with either the udfile command (using no options) or the FILE.STAT command. The following example shows the FILE.STAT command:

:FILE.STAT NEWTEST2
File name (Recoverable Static File) = NEWTEST2
Number of groups in file (modulo) = 13
Static hashing, hash type = 0
Block size = 1024
Number of records = 0
Total number of bytes = 0

Average number of records per group = 0.0
Standard deviation from average = 0.0
Average number of bytes per group = 0.0
Standard deviation from average = 0.0

Average number of bytes in a record = 0.0
Minimum number of bytes in a record = 0
Maximum number of bytes in a record = 0

Minimum number of fields in a record = 0
Maximum number of fields in a record = 0
Average number of fields per record = 0.0
The actual file size in bytes = 14336


If you converted your existing UniData files to recoverable files just before creating new files, proceed to step 3. Otherwise, you can continue UniData processing.

3. Stop UniData

Log on as root, ask users to log out of UniData, and execute stopud.

4. Perform a Full Backup of Your System

This backup is critical, because it will serve as the starting point for your Recoverable File System.

Make sure your backup follows symbolic links for large dynamic files. Verify the backup.

5. Run cntl_install

Execute the cntl_install command to initialize logging and archiving.

Note: Remember that you must log in as root and stop UniData before you execute cntl_install.

The following screen shows the cntl_install command:

# cd $UDTBIN
# ./cntl_install

cntl_install utility resets Unidata System after a full database backup (Image Copy). This means, all log (and archive) files will also be initialized for re-use.

Do you want to continue?(y/n) [n] y

Installing Logs (and Archives) after cntl_install

......

6. Start UniData

Start UniData using the startud command with no options. Users can now access your recoverable files.

Creating a List of Recoverable Files

You can create a UNIX shell script that generates a list of recoverable files in a UniData account. Use the following example as a template:

#!/bin/sh
##set -x

# Use the directory given as the first argument, or the current directory.
directory=$1
if [ -z "$directory" ]; then
    directory="."
fi
cd "$directory" || exit 1

echo Making list of files to process ...
echo

rm -f /tmp/list.rfs /tmp/12 /tmp/21 >/dev/null 2>&1

# List the entries in the account directory.
/bin/ls > /tmp/12 2>/dev/null

# Build a helper script that runs udfile against each entry.
echo "UDTBIN=$UDTBIN"
echo "#!/bin/sh" > /tmp/21
echo "UDTBIN=$UDTBIN" >> /tmp/21
echo "export UDTBIN" >> /tmp/21
awk ' {print "$UDTBIN/udfile " $1 " 2>/dev/null"} ' /tmp/12 >> /tmp/21
chmod +x /tmp/21

# Run the helper and keep only the files that udfile reports as recoverable.
/tmp/21 > /tmp/list.rfs 2>&1
grep " recoverable" /tmp/list.rfs > list.rfs 2>&1

echo Checking each file...
echo

echo List of Recoverable Files
echo ======

cat list.rfs


Special Considerations for Recoverable Files

For best results, IBM recommends you adopt the following conventions when dealing with your recoverable files.

1. Use UniData Commands

Never use “generic” UNIX commands (like rm or mv) to manipulate recoverable files. Use UniData ECL commands like DELETE.FILE or CNAME instead. Using UNIX equivalents produces unpredictable results with recoverable files, and may cause your system to crash. The UniData commands are designed to work with recoverable files.

2. Back Up Your System Regularly

Make sure you verify your backups.

Make sure your backup follows symbolic links.

If you stop UniData to process your backups, execute cntl_install after you complete and verify your backups. Do not start UniData after a backup until you execute cntl_install. This initializes logging and archiving, making recovery easier.

If you use dbpause to process your backups, record the checkpoint time and the next available LSN that dbpause displays when you issue the command. There is no need to stop UniData and execute cntl_install if you use the dbpause/dbresume commands. For more information about dbpause, see the UniData Commands Reference and Administering UniData.

Note: If you need to perform UNIX handling of recoverable files (moving, deleting), make sure you stop UniData, back up the files before you continue, and perform a full backup before you restart UniData.

Chapter 6: System Crash Recovery

System Crash Recovery

The Recoverable File System enables you to return recoverable files to a consistent state after a system crash or a media failure. This chapter shows how you can recover from a system crash.

System Crash Recovery

The term “system crash” means a failure that interrupted processing without actually damaging files. In this case, UniData stopped while the image of your data in shared memory did not match your database. UniData uses logging to recover from system crashes. Follow these steps to recover.

1. Preserve sm.log

While UniData is down, make a copy of udtbin/sm.log. Look at the log for information related to the crash.

2. Start UniData

Use startud with no options. The following example shows the startud command.

# startud

Using UDTBIN=/liz1/ud72/bin

All output and error logs have been saved to /liz1/ud72/bin/saved_logs directory.

SMM is started.
SBCS is started.
SM is started.
RM is started.
CLEANUPD is started.
Unirpcd has already been started

UniData R7.2 has been started.

#

Note: If your log files are very large or have overflowed, or if you have automatic archive backup turned on, crash recovery may take several minutes.


3. Verify the Recovery

After you start UniData, check the current udtbin/sm.log. The following example shows what the sm.log looks like when you do not need to handle file-level operations:

% cd $UDTBIN
% more sm.log
The file system needs to be recovered.
Please wait: the recovery is in progress
Starting to restart......
restart: Report of Log-file Status.

#  Type          Checkpoint-Number  Status
0  after-image   0                  0
1  after-image   0                  4
2  before-image  0                  4
3  before-image  0                  4
4  after-image   0                  4
5  after-image   0                  4
6  before-image  0                  4
7  before-image  0                  4

Step1: Undo the logfiles.
Undo logfile[2].....
The logfile[2] has been scanned.
Undo logfile[2] has been undone.
Undo logfile[3].....
The logfile[3] has been scanned.
Undo logfile[3] has been undone.
Undo logfiles finished,

Step2: Redo the records in the logfiles.

Redo logfiles finished.

Step3: Check the file level commands...

File level commands checking finished.

Restart is successful!!!

System clean up......

*****!!! Restart Finished !!!! *****
Checking log files.....
----- SM (26668) is started at Apr 17 2005 10:09:39 -----

If your sm.log looks like this first example, go to step 5.

The next example shows what sm.log looks like if you need to repeat file-level operations.

% cd $UDTBIN
% more sm.log
The file system needs to be recovered.
Please wait: the recovery is in progress
Starting to restart......
restart: Report of Log-file Status.

#  Type          Checkpoint-Number  Status
0  after-image   0                  0
1  after-image   0                  4
2  before-image  0                  4
3  before-image  0                  4
4  after-image   0                  4
5  after-image   0                  4
6  before-image  0                  4
7  before-image  0                  4

Step1: Undo the logfiles.
Undo logfile[2].....
The logfile[2] has been scanned.
Undo logfile[2] has been undone.
Undo logfile[3].....
The logfile[3] has been scanned.
Undo logfile[3] has been undone.
Undo logfiles finished,

Step2: Redo the records in the logfiles.

Step3: Check the file level commands...
Note: Please check the file '/usr/ud60/FileInfo' to see if there are failed file level operations.

File level commands checking finished.

Restart is successful!!!

System clean up......

*****!!! Restart Finished !!!! *****
Checking log files.....
----- SM (77872) is started at Apr 17 2005 15:54:03 -----

If your sm.log looks like this, you need to complete step 4.

Note: A before image log is sometimes called an undo log, and an after image log is sometimes called a redo log. You will notice that convention in these examples, and in other messages in the sm.log.
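The check in this step can be approximated with grep. The sketch below assumes the message wording shown in the two sm.log examples ("Restart is successful" and "Please check the file ... to see if there are failed file level operations"); recovery_status is a hypothetical helper name, and a real log may contain other messages that still need a human reader.

```shell
#!/bin/sh
# Sketch: summarize a crash-recovery sm.log.
# Assumes the message wording shown in the two examples;
# recovery_status is a hypothetical helper name.
# Prints one of: not-confirmed, needs-file-level-check, ok.
recovery_status() {
    log=$1
    if ! grep -q 'Restart is successful' "$log"; then
        # No success message found: read the log by hand.
        echo not-confirmed
    elif grep -q 'Please check the file .* to see if there are failed file level operations' "$log"; then
        # Recovery finished, but step 4 (file-level operations) applies.
        echo needs-file-level-check
    else
        echo ok
    fi
}
```

For example, `recovery_status "$UDTBIN/sm.log"` printing needs-file-level-check corresponds to the second example, where you continue with step 4; ok corresponds to the first, where you go to step 5.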


4. Repeat File-Level Operations

If your sm.log directs you to check a FileInfo file, you may need to perform additional steps before letting your users access UniData. Crash recovery attempts automatic recovery of the CLEAR.FILE operation and any other completed file-level operations. For example, if a CREATE.FILE was occurring at the time of the crash and was incomplete, that operation will not be automatically recovered. You need to verify, and possibly repeat, any file-level operations that took place just before the crash. Locate the FileInfo file, and display or print it. The following example shows a FileInfo file from a crash recovery.

Command is: CREATE.INDEX.
real data file name:/usr/ud72/demo/CUSTOMER.
real index file name:/usr/ud72/demo/X_CUSTOMER.
Execution State: The index file may be created, and data file not changed
You may need to delete the index file /usr/ud72/demo/X_CUSTOMER.

Repeat the operations listed in FileInfo, in the order they appear in the file. In the example, the message is a warning, letting you know that an index may be unusable. You can delete and re-create the index before letting users access UniData, or you can test first to see if the index is usable.

Note: When you do each file-level operation, remember that you need to be in the UniData account that contains the VOC entry for the affected file. When you are done with file-level recovery, check file permissions in your UniData accounts to be sure users have correct access to the data.

5. Resume Normal Processing

Recovery should be complete.

Chapter 7: Media Crash Recovery

Media Crash Recovery
Data Lost, Logs and Archives Unaffected
Data and Archive Files Unaffected, Logs Lost
Data and Log Files Unaffected, Archives Lost
Data and Logs Lost, Archives Unaffected
Data and Archives Lost, Logs Unaffected
Logs and Archives Lost, Data Unaffected
Disk Containing /usr/ud72/include Lost

The Recoverable File System enables you to return recoverable files to a consistent state after a system crash or a media failure. This chapter outlines how to recover from a media failure affecting your data files, your log files, your archive files, and combinations of each. While it is impossible to guarantee complete recovery in the case of a multiple-disk failure, this chapter describes options for combinations of failures.

Media Crash Recovery

UniData uses archive files and the mediarec utility to recover from a media failure. In order to use the mediarec function, your system must meet the following conditions:

- You must have a full backup of your system.
- You must have had archiving turned on since that backup.
- You must have saved your archive sets when they filled.
- The /usr/ud72/include directory must exist.

Note: You get the best data protection from RFS if your data, your logs, and your archives are on separate physical devices.

Before you perform mediarec, it is important to understand the type of failure you had, and how that failure has affected your data, logs, and archives. While it is impossible to guarantee complete recovery in the case of a multiple-disk failure, the following scenarios describe the actions you should take to bring your data to as consistent a state as possible.


Data Lost, Logs and Archives Unaffected

The following steps assume that you have lost your data, but you have not lost /usr/ud72/include, logs or archives. You should be able to recover to the last completed transaction.

1. Check and Correct External Problems

Identify and resolve hardware problems and software problems external to UniData.

2. Check sm.log

Note any unusual conditions that may have contributed to the crash. Make a copy of sm.log in case you need to refer back to it.

Warning: If you attempt to save all of the error logs located in udtbin after a media failure, do not save the logs using the UNIX mv *log command. This command will cause the aimglog and bimglog executables to be moved as well as the error logs, and UniData will not be able to start.

3. Preserve udt.control.file

UniData saves all the archive sequence numbers in the udt.control.file located in /usr/ud72/include. Assuming the file is intact, make a copy of it to protect against accidentally overwriting it during your restore.

You should have the current udt.control.file to restore your database through automatic recovery. If the udt.control.file is damaged or destroyed, you can still recover your database, but you will need to know the logical sequence numbers since the last backup and perform mediarec manually.
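Steps 2 and 3 both come down to copying specific files to a safe place by name. The sketch below is one way to do that without the "mv *log" pattern the warning describes. preserve_rfs_files is a hypothetical helper, and the destination directory is an assumption; choose one on a device your restore will not overwrite.

```shell
#!/bin/sh
# Sketch: copy sm.log and udt.control.file to a safe directory by name.
# Naming each file avoids the "mv *log" pattern, which would also move
# the aimglog and bimglog executables. preserve_rfs_files is a
# hypothetical helper; the destination directory is an assumption.
#   $1 udtbin directory (holds sm.log)
#   $2 include directory (holds udt.control.file), for example /usr/ud72/include
#   $3 destination directory
preserve_rfs_files() {
    udtbin=$1 include=$2 dest=$3
    mkdir -p "$dest" || return 1
    # cp leaves the originals in place; -p keeps timestamps and modes.
    cp -p "$udtbin/sm.log" "$dest/sm.log" || return 1
    cp -p "$include/udt.control.file" "$dest/udt.control.file" || return 1
}
```

A call such as `preserve_rfs_files "$UDTBIN" /usr/ud72/include /safe/rfs_copy` (where /safe/rfs_copy is a placeholder) leaves both originals untouched, so UniData can still start in the next step.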

4. Start UniData

Start UniData by executing the startud command with no options.

Make sure UniData started successfully. Depending on the failure, startud may have performed automatic crash recovery. By performing crash recovery, you will update the current set of archive files on disk with the latest changes to your database. This will allow mediarec to recover to the last completed transaction.

5. Stop UniData

Stop UniData by executing the stopud command so you can proceed with restoring the system.

6. Restore the System from the Last Full Backup

Restoring your system re-creates the state your database was in when you created the backup, putting the data in a consistent state. Check to make sure you have the correct udt.control.file. You want the one you saved in step 3.

UniData uses absolute paths in recovery. You need to restore the file system exactly as it was at backup. If you are using a different physical device, make sure you configure the file system so the absolute paths remain the same.

Note: Your ability to recover from a media failure depends on complete, verified full backups.

7. Check the Media Configuration File

The mediaconfig file, located in /usr/ud72/include, has pointers to the areas where mediarec creates working files. Check those areas and clear disk space. Make a note of how much space is available.

8. Invoke the mediarec Command

To execute mediarec, you must log in as root, and UniData must not be running. You must be at a UNIX prompt.

If you ran cntl_install after your last full backup, mediarec displays the number of the archive file to upload.


If you executed dbpause prior to performing your backup, you must use mediarec with the -T option to provide the logical sequence number after the backup. You may also use the -s option to provide the checkpoint time after the forced checkpoint. IBM recommends using the -T option. For more information about mediarec, see Chapter 2, “RFS Commands and Daemons.”

When you execute dbpause, UniData displays the following information on the terminal screen and writes it to the sm.log. The following example shows the output from the dbpause command:

#dbpause
CheckPoint time before ForceCP: Thu May 1 10:54:59 1999
.CheckPoint time after ForceCP: Thu May 1 10:58:31 1999
.CP has been forced successfully.
Forcearch completed, the next LSN is 1.
DBpause successful.
#

Note: UniData saves the last 20 sm.logs in the udtbin/saved_logs directory. If the dbpause output is not in the sm.log located in udtbin, check the saved sm.log.

The following example shows the first mediarec response:

#mediarec

Using UDTBIN=/usr/ud72/bin

For media recovery, you would be required to have space for two temporary files, one to hold the largest archive file and another to hold the largest CP size. Please note the following info, read documentation about media recovery procedure and re-start media recovery.

Max CP Size (in bytes): 17408
Max Arch File Size (in bytes): 4218880

Also, if you’re planning to use the tape(s) created by archive process, please setup restore script /usr/ud72/include/arch_restore properly (tape device) and load the first archive tape.

Do you want to continue?(y/n)[n]

If you do not have enough space, answer n. mediarec exits. Resolve the space problem and reenter the command.
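Before answering the prompt, you can compare the two sizes mediarec prints against the free space in the working area. The sketch below is a rough check under stated assumptions: has_room_for_mediarec is a hypothetical helper, it assumes both temporary files land in the same directory (your mediaconfig file may point them to different areas), and it relies on the portable output layout of df -P.

```shell
#!/bin/sh
# Sketch: check that a working directory has room for the two temporary
# files mediarec describes (largest checkpoint plus largest archive file).
# has_room_for_mediarec is a hypothetical helper. It assumes both files
# land in the same directory; your mediaconfig may point them elsewhere.
#   $1 working directory
#   $2 Max CP Size in bytes, as printed by mediarec
#   $3 Max Arch File Size in bytes, as printed by mediarec
has_room_for_mediarec() {
    dir=$1 max_cp=$2 max_arch=$3
    # Round the requirement up to whole kilobytes.
    need_kb=$(( ($max_cp + $max_arch + 1023) / 1024 ))
    # df -P prints a portable layout; column 4 is available space in KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$need_kb" ]
}
```

With the figures from the example above, `has_room_for_mediarec /ud5/log 17408 4218880 && echo enough` prints enough only when the file system holding /ud5/log has at least that much free space. The directory here is taken from a later example in this section; use the areas named in your own mediaconfig file.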

The next example illustrates using mediarec with the -T option:

#mediarec -T 1

Using UDTBIN=/disk1/ud72/bin

For media recovery, you would be required to have space for two temporary files, one to hold the largest archive file and another to hold the largest CP size. Please note the following info, read documentation about media recovery procedure and re-start media recovery.

Max CP Size (in bytes): 17408
Max Arch File Size (in bytes): 33554432

Also, if you’re planning to use the tape(s) created by archive process, please setup restore script /usr/ud72/include/arch_restore properly (tape device) and load the first archive tape.

Do you want to continue?(y/n)[n]

The mediarec utility prompts you to load archive files, by logical sequence number. The following example shows the screen display:

%SMM is started.
For media recovery, you’ll be asked to upload archive files one by one by sequence number into the /ud5/log/ARCH file.

Please upload archive file sequence number = 0

Hit When Done (or ! for shell)..

The utility makes sure you load the files in the correct sequence. If you do not, mediarec reports an error and lets you load another file, as shown below:

Archive file sequence number 5 doesn’t match the sequence number requested.

Would you like to retry? (y/n) [n]:

If you used the automated archiving feature to save your archive files, use the arch_restore script to restore them. If the script is not in /usr/ud72/include, check for an environment variable or udtconfig parameter called ARCH_RESTORE. The script will upload the files as they are needed. If that script fails, the screen will prompt you to load the archive files by logical sequence number, as shown in the earlier examples.


If you have automated archive backup turned on, and you had to manually back up one or more archive files, consider doing all the restores manually to be absolutely sure you do not restore a partial archive. UniData detects if a partial archive is restored, and mediarec may fail.

The following example shows what the screen looks like when the recovery process is complete. In this example, no file-level recovery is needed:

*****!!! Media Recovery Finished !!!*****
SM stopped successfully.
SMM stopped successfully.
Media Recovery finished.

Please use /usr/ud72/bin/startud to start the system

The next example shows what the screen looks like when file-level recovery is involved:

de_arch: reading archive file on disk
Create file D_TEST1, modulo/1,blocksize/1024
Hash type = 0
Create dynamic file TEST1, modulo/17,blocksize/1024
Hash type = 0
Added "@ID", the default record for UniData to DICT TEST1.
Create file D_TEST2, modulo/1,blocksize/1024
Hash type = 0
Create file TEST2, modulo/17,blocksize/1024
Hash type = 0
Added "@ID", the default record for UniData to DICT TEST2.
.

Please check /usr/ud72/FileInfo for un-recovered file level operations.

*****!!! Media Recovery Finished !!!*****
SM stopped successfully.
SMM stopped successfully.
Media Recovery finished.

Please use /usr/ud72/bin/startud to start the system

If your screen looks like the last example, copy or print the FileInfo file before proceeding. You need this file for step 11. mediarec attempts automatic recovery of all file-level operations except index operations.

After mediarec applies all the archives you restore from backup, it applies the current archive set on disk. Some time can elapse between the last prompt for you to restore a file and the successful completion of mediarec. This is normal.

9. Delete Files in Working Space

Change to the directories noted in your media configuration file, and remove the last set of working files.

10. Start UniData

Start UniData using the startud command with no options.

11. Manually Complete File-Level Recovery

Use the FileInfo file you copied or printed in step 8 to complete file-level operations from mediarec. Complete the steps in order. The following example shows a FileInfo file from a media recovery:

CREATE.INDEX /usr/ud72/demo/INVENTORY F2

BUILD.INDEX /usr/ud72/demo/INVENTORY F2

CREATE.INDEX /usr/ud72/demo/INVENTORY F3

BUILD.INDEX /usr/ud72/demo/INVENTORY F3

CREATE.INDEX /usr/ud72/demo/INVENTORY F6

BUILD.INDEX /usr/ud72/demo/INVENTORY F6

To complete each file-level operation, you need to be in the UniData account where the VOC entry for the affected file resides.

Depending on how your backup utility works and how your restore was done, you may need to reset file permissions in your UniData accounts so that users have proper access to your data.
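To keep the operations in order while you work through them, the FileInfo file can be turned into a numbered checklist. The sketch below assumes every entry has the form shown in the example above (a command, a full path, then optional arguments) and that the directory part of the path is the account to work in, which is not guaranteed if the VOC entry lives in a different account. fileinfo_checklist is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: turn a media-recovery FileInfo into an ordered checklist that
# names a directory for each operation.
# Assumes every entry has the form shown in the example:
#   COMMAND /path/to/account/FILE [arguments]
# fileinfo_checklist is a hypothetical helper name.
fileinfo_checklist() {
    awk 'NF >= 2 {
        account = $2
        sub("/[^/]*$", "", account)    # strip the file name, keep the directory
        file = $2
        sub("^.*/", "", file)          # keep only the file name
        args = ""
        for (i = 3; i <= NF; i++) args = args " " $i
        printf "%d. in %s: %s %s%s\n", ++n, account, $1, file, args
    }' "$1"
}
```

For the first entry in the example, the function prints a line of the form "1. in /usr/ud72/demo: CREATE.INDEX INVENTORY F2". The checklist only restates FileInfo; you still run each ECL command yourself from the appropriate account.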

12. Stop UniData

Stop UniData with the stopud command.


13. Perform a Full Backup

This step is highly recommended. Before you start UniData, perform a full system backup. Make sure the backup follows symbolic links for large dynamic files.

14. Run cntl_install

Execute the cntl_install command to reinitialize logging and archiving.

15. Mount a New Archive Tape

If you have automated archive backup turned on, and you are backing up to tape, mount a new tape on your archive backup device.

16. Start UniData

Use startud with no options. Users should be able to log on and access the data.

Data and Archive Files Unaffected, Logs Lost

You should be able to recover to the last archived checkpoint; you may be able to recover to the last committed transaction.

The graceful shutdown design detects media failures and other error conditions involving the log files. UniData flushes changed records in the system buffer to your database and then shuts down. Complete the following steps for recovery.

1. Check and Correct External Problems

Identify and resolve any hardware problems related to the failure.

2. Check sm.log

Note any unusual conditions that may have contributed to the crash. Make a copy of the sm.log in case you need to refer back to it.

Warning: If you attempt to save all of the error logs located in udtbin after a media failure, do not save the logs using the UNIX mv *log command. This command causes the error logs and the aimglog and bimglog executables to be moved, and UniData cannot start.
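The reason for the warning is that the pattern *log matches any name that ends in "log", not only names that end in ".log". The following Python sketch illustrates this with the standard fnmatch module, which follows shell-style matching rules. The directory listing is hypothetical apart from aimglog and bimglog, which the warning names.

```python
import fnmatch

# Hypothetical listing of udtbin; only aimglog and bimglog come from the text.
names = ["sm.log", "smm.log", "aimglog", "bimglog"]

def matches(pattern, listing):
    """Return the names a shell-style pattern would select, in listing order."""
    return fnmatch.filter(listing, pattern)

# "*log" also selects the two executables.
print(matches("*log", names))

# "*.log" selects only names that end in ".log".
print(matches("*.log", names))
```

Copying the logs with a narrower pattern, rather than moving them, leaves the executables in place.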

Make sure the messages in the sm.log indicate that graceful shutdown completed normally. The following example shows the output of the sm.log after graceful shutdown:

# pg $UDTBIN/sm.log
Checking log files .....
----- SM (28135) is started at May 01 2002 15:49:13 -----
Unidata Environment : /usr/ud72/include
SM: Restart_Flag = 0.
chk_log_ovrflo(): open() failed.: No such file or directory
Log file:/disk1/ud72/log/b_0001 OVERFLOW.
log:log_loop Errno=1023 errno=2
log:log_loop Errno=1023 errno=2
chk_log_ovrflo(): open() failed.: No such file or directory
Log file:/disk1/ud72/log/b_0000 OVERFLOW.
log:log_loop Errno=1023 errno=2
log:log_loop Errno=1023 errno=2
***** Graceful System Shutdown at May 01 1999 15:55:56 *****
ARCH: Arar_backupd: msg Q removed!
No tape backup will be available.
ch msg Q removed!
SBCS stopped successfully.
CLEANUPD stopped successfully!!
SMM stopped successfully.
SM stopped successfully.
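If you review sm.log often, the check for a completed graceful shutdown can be scripted. The following Python sketch is a minimal illustration, assuming the banner and the "SM stopped successfully." message appear exactly as in the example above. The function name is hypothetical, and the sketch does not replace reading the log for other error messages.

```python
import re

# Banner text as it appears in the sm.log example; the timestamp varies.
SHUTDOWN_BANNER = re.compile(r"\*{5} Graceful System Shutdown at .+? \*{5}")

def graceful_shutdown_completed(log_text):
    """Return True if SM reports stopping after the last shutdown banner."""
    last = None
    for last in SHUTDOWN_BANNER.finditer(log_text):
        pass  # keep only the final banner in the log
    if last is None:
        return False
    tail = log_text[last.end():]
    # \b keeps "SMM stopped successfully." from counting as the SM message.
    return re.search(r"\bSM stopped successfully\.", tail) is not None

sample = """***** Graceful System Shutdown at May 01 1999 15:55:56 *****
SMM stopped successfully.
SM stopped successfully.
"""
print(graceful_shutdown_completed(sample))
```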

3. Run cntl_install

Execute the cntl_install command to reinitialize logging and archiving.

4. Start UniData

Start UniData using the startud command with no options.

Data and Log Files Unaffected, Archives Lost

You should be able to recover to the last completed checkpoint.

The graceful shutdown design detects media failures and other error conditions involving the archive files. UniData flushes changed records in the system buffer to your database and then shuts down. Complete the following steps for recovery.

1. Check and Correct External Problems

Identify and resolve any hardware problems related to the failure.

2. Check sm.log

Note any unusual conditions that may have contributed to the crash. Make a copy of the sm.log in case you need to refer back to it.

Warning: If you attempt to save all of the error logs located in udtbin after a media failure, do not save the logs using the UNIX mv *log command. This command causes the error logs and the aimglog and bimglog executables to be moved, and UniData cannot start.

3. Start UniData

Start UniData using the startud command with no options. This applies current logs automatically.

4. Stop UniData

Stop UniData with the stopud command.

5. Perform a Full Backup

Create and verify a full backup of your system. Make sure your backup follows symbolic links for large dynamic files.

6. Run cntl_install

Execute cntl_install to reinitialize logging and archiving.

7. Mount a New Archive Tape

If you have automated archive backup turned on, and you are backing up to tape, mount a new tape on your archive backup device.

8. Start UniData

Start UniData using the startud command with no options.

Data and Logs Lost, Archives Unaffected

In this situation, you can only recover to the last successfully archived checkpoint.

1. Check and Correct External Problems

Identify and resolve hardware problems and software problems external to UniData.

2. Check sm.log

Note any unusual conditions that may have contributed to the crash. Make a copy of the sm.log in case you need to refer back to it.

Warning: If you attempt to save all of the error logs located in udtbin after a media failure, do not save the logs using the UNIX mv *log command. This command causes the error logs and the aimglog and bimglog executables to be moved, and UniData cannot start.

3. Preserve udt.control.file

UniData saves all the archive sequence numbers in the udt.control.file located in /usr/ud72/include. Assuming the file is intact, make a copy of it to protect against accidentally overwriting it during your restore.

You should have the current udt.control.file to restore your database through automatic recovery. If the udt.control.file is damaged or destroyed, you can still recover your database, but you will need to know the logical sequence numbers since the last backup and perform mediarec manually.

4. Restore the System from the Last Full Backup

Restoring your system re-creates the state your database was in when you created the backup, putting the data in a consistent state. Check to make sure you have the correct udt.control.file. You want the one you saved in step 3.

UniData uses absolute paths in recovery. You need to restore the file system exactly as it was at backup. If you are using a different physical device, make sure you configure the file system so the absolute paths remain the same.

Note: Your ability to recover from a media failure depends on complete, verified full backups.

5. Check the Media Configuration File

The mediaconfig file, located in /usr/ud72/include, has pointers to the areas where mediarec creates working files. Check those areas and clear disk space. Make a note of how much space is available.

6. Invoke the mediarec Command

To execute mediarec, you must log on as root, and UniData must not be running. You must be at a UNIX prompt.

If you ran cntl_install after your last full backup, mediarec displays the number of the archive file to upload.

If you executed dbpause before performing your backup, you must use mediarec with the -T option to provide the logical sequence number after the backup. You can also use the -s option to provide the checkpoint time after the forced checkpoint. The -T option is recommended. For more information about mediarec, see Chapter 2, “RFS Commands and Daemons.”

When you execute dbpause, UniData displays the following information on the terminal screen and writes it to the sm.log. The following example shows the output from the dbpause command:

#dbpause
CheckPoint time before ForceCP: Thu May 1 10:54:59 2005
.CheckPoint time after ForceCP: Thu May 1 10:58:31 2005
.CP has been forced successfully.
Forcearch completed, the next LSN is 1.
DBpause successful.
#

Note: UniData saves the last 20 sm.logs in the udtbin/saved_logs directory. If the dbpause output is not in the sm.log located in udtbin, check the saved sm.log.
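The two values that mediarec needs, the next LSN for -T and the checkpoint time for -s, can be read from the dbpause output. The following Python sketch shows one way to extract them, assuming the message wording matches the example above. The function name parse_dbpause is hypothetical.

```python
import re

def parse_dbpause(output):
    """Extract the next LSN and the post-checkpoint time from dbpause output."""
    lsn = re.search(r"the next LSN is (\d+)", output)
    # The timestamp ends with a four-digit year, as in the example output.
    after = re.search(r"CheckPoint time after ForceCP:\s*(.+?\d{4})", output)
    return {
        "next_lsn": int(lsn.group(1)) if lsn else None,
        "checkpoint_after": after.group(1) if after else None,
        "successful": "DBpause successful" in output,
    }

sample = """CheckPoint time before ForceCP: Thu May 1 10:54:59 2005
.CheckPoint time after ForceCP: Thu May 1 10:58:31 2005
.CP has been forced successfully.
Forcearch completed, the next LSN is 1.
DBpause successful.
"""
info = parse_dbpause(sample)
print(info["next_lsn"], info["checkpoint_after"])
```

Recording these values at backup time saves searching through saved logs during a recovery.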

The following example shows the first mediarec response:

#mediarec

Using UDTBIN=/usr/ud72/bin

For media recovery, you would be required to have space for two temporary files, one to hold the largest archive file and another to hold the largest CP size. Please note the following info, read documentation about media recovery procedure and re-start media recovery.

Max CP Size (in bytes): 17408
Max Arch File Size (in bytes): 4218880

Also, if you're planning to use the tape(s) created by archive process, please setup restore script /usr/ud72/include/arch_restore properly (tape device) and load the first archive tape.

Do you want to continue?(y/n)[n]

If you do not have enough space, answer n. mediarec exits. Resolve the space problem and reenter the command.
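The space check can be done before you answer the prompt. The following Python sketch adds the two sizes that mediarec reports and compares the total with the free space on a working directory. It assumes, for simplicity, that both temporary files go to the same file system; your mediaconfig file may point to separate areas. The function names are hypothetical.

```python
import re
import shutil

def required_bytes(mediarec_text):
    """Sum the largest CP size and the largest archive file size."""
    cp = re.search(r"Max CP Size \(in bytes\):\s*(\d+)", mediarec_text)
    arch = re.search(r"Max Arch File Size \(in bytes\):\s*(\d+)", mediarec_text)
    if not (cp and arch):
        raise ValueError("size lines not found in mediarec output")
    return int(cp.group(1)) + int(arch.group(1))

def has_room(work_dir, needed):
    """Return True if work_dir's file system has at least `needed` free bytes."""
    return shutil.disk_usage(work_dir).free >= needed

sample = """Max CP Size (in bytes): 17408
Max Arch File Size (in bytes): 4218880
"""
needed = required_bytes(sample)
print(needed, has_room(".", needed))
```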

The next example illustrates using mediarec with the -T option:

#mediarec -T 1

Using UDTBIN=/disk1/ud72/bin

For media recovery, you would be required to have space for two temporary files, one to hold the largest archive file and another to hold the largest CP size. Please note the following info, read documentation about media recovery procedure and re-start media recovery.

Max CP Size (in bytes): 17408
Max Arch File Size (in bytes): 33554432

Also, if you're planning to use the tape(s) created by archive process, please setup restore script /usr/ud72/include/arch_restore properly (tape device) and load the first archive tape.

Do you want to continue?(y/n)[n]

The mediarec utility prompts you to load archive files, by logical sequence number. The following example shows the screen display:

%SMM is started. For media recovery, you'll be asked to upload archive files one by one by sequence number into the /ud5/log/ARCH file.

Please upload archive file sequence number = 0

Hit When Done (or ! for shell)..

The utility makes sure you load the files in the correct sequence. If you do not, mediarec reports an error and lets you load another file, as shown in the following example:

Archive file sequence number 5 doesn't match the sequence number requested.

Would you like to retry? (y/n) [n]:

If you used the automated archiving feature to save your archive files, use the arch_restore script to restore them. If the script is not in /usr/ud72/include, check for an environment variable or udtconfig parameter called ARCH_RESTORE. The script will upload the files as they are needed. If that script fails, the screen will prompt you to load the archive files by logical sequence number, as shown in the earlier examples.

If you have automated archive backup turned on, and you had to manually back up one or more archive files, consider doing all the restores manually to be absolutely sure you do not restore a partial archive. UniData detects if a partial archive is restored, and mediarec may fail.
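When you restore archives manually, a gap in the sequence is easier to find before you start mediarec than in the middle of a recovery. The following Python sketch lists the sequence numbers that are missing from the archives you have on hand. It assumes archive sequence numbers are consecutive integers, which the prompts above suggest but this document does not state outright. The function name is hypothetical, and this is not how mediarec itself performs the check.

```python
def missing_archives(available, first, last):
    """List the sequence numbers in first..last that are not in `available`."""
    have = set(available)
    return [n for n in range(first, last + 1) if n not in have]

# Hypothetical inventory: archives 0, 1, 2, and 5 are on hand.
print(missing_archives([0, 1, 2, 5], first=0, last=5))
```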

The following example shows what the screen looks like when the recovery process is complete. In this example, no file-level recovery is needed:

*****!!! Media Recovery Finished !!!*****
SM stopped successfully.
SMM stopped successfully.
Media Recovery finished.

Please use /usr/ud72/bin/startud to start the system

The next example shows what the screen looks like when file-level recovery is involved:

de_arch: reading archive file on disk
Create file D_TEST1, modulo/1,blocksize/1024
Hash type = 0
Create dynamic file TEST1, modulo/17,blocksize/1024
Hash type = 0
Added "@ID", the default record for UniData to DICT TEST1.
Create file D_TEST2, modulo/1,blocksize/1024
Hash type = 0
Create file TEST2, modulo/17,blocksize/1024
Hash type = 0
Added "@ID", the default record for UniData to DICT TEST2.
.

Please check /usr/ud72/FileInfo for un-recovered file level operations.

*****!!! Media Recovery Finished !!!*****
SM stopped successfully.
SMM stopped successfully.
Media Recovery finished.

Please use /usr/ud72/bin/startud to start the system

If your screen looks like the last example, copy or print the FileInfo file before proceeding. You need this file for step 9. The mediarec utility attempts automatic recovery of all file-level operations except index operations.

After mediarec applies all the archives you restore from backup, it applies the current archive set on disk. Some time can elapse between the last prompt for you to restore a file and the successful completion of mediarec. This is normal.

7. Delete Files in Working Space

Change to the directories noted in your media configuration file, and remove the last set of working files.

8. Start UniData

Start UniData using the startud command with no options.

9. Manually Complete File-Level Recovery

Use the FileInfo file you copied or printed in step 6 to complete file-level operations from mediarec. Complete the steps in order. The following example shows a FileInfo file from a media recovery:

CREATE.INDEX /usr/ud72/demo/INVENTORY F2

BUILD.INDEX /usr/ud72/demo/INVENTORY F2

CREATE.INDEX /usr/ud72/demo/INVENTORY F3

BUILD.INDEX /usr/ud72/demo/INVENTORY F3

CREATE.INDEX /usr/ud72/demo/INVENTORY F6

BUILD.INDEX /usr/ud72/demo/INVENTORY F6

To complete each file-level operation, you need to be in the UniData account where the VOC entry for the affected file resides.

Depending on how your backup utility works and how your restore was done, you may need to reset file permissions in your UniData accounts so that users have proper access to your data.

10. Stop UniData

Stop UniData with the stopud command.

11. Perform a Full Backup

This step is highly recommended. Before you start UniData, perform a full system backup. Make sure the backup follows symbolic links for large dynamic files.

12. Run cntl_install

Execute the cntl_install command to reinitialize logging and archiving.

13. Mount a New Archive Tape

If you have automated archive backup turned on, and you are backing up to tape, mount a new tape on your archive backup device.

14. Start UniData

Use startud with no options. Users should be able to log on and access the data.

Data and Archives Lost, Logs Unaffected

Because the current set of log files is of no use, you can recover only to the last complete checkpoint on the last archive file you backed up.

1. Check and Correct External Problems

Identify and resolve hardware problems and software problems external to UniData.

2. Check sm.log

Note any unusual conditions that may have contributed to the crash. Make a copy of sm.log in case you need to refer back to it.

Warning: If you attempt to save all of the error logs located in udtbin after a media failure, do not save the logs using the UNIX mv *log command. This command causes the aimglog and bimglog executables to be moved as well as the error logs, and UniData cannot start.

3. Preserve udt.control.file

UniData saves all the archive sequence numbers in the udt.control.file located in /usr/ud72/include. Assuming the file is intact, make a copy of it to protect against accidentally overwriting it during your restore.

You should have the current udt.control.file to restore your database through automatic recovery. If the udt.control.file is damaged or destroyed, you can still recover your database, but you need to know the logical sequence numbers since the last backup and perform mediarec manually.

4. Restore the System from the Last Full Backup

Restoring your system re-creates the state your database was in when you created the backup. The data is in a consistent state. Check to make sure you have the correct udt.control.file. You need the one you saved in step 3.

UniData uses absolute paths in recovery. You need to restore the file system exactly as it was at backup. If you are using a different physical device, make sure you configure the file system so the absolute paths remain the same.

Note: Your ability to recover from a media failure depends on complete, verified full backups.

5. Check the Media Configuration File

The mediaconfig file, located in /usr/ud72/include, has pointers to the areas where mediarec creates working files. Check those areas and clear disk space. Make a note of how much space is available.

6. Invoke the mediarec Command

To execute mediarec, you must log on as root, and UniData must not be running. You must be at a UNIX prompt.

If you ran cntl_install after your last full backup, mediarec displays the number of the archive file to upload.

If you executed dbpause before performing your backup, you must use mediarec with the -T option to provide the logical sequence number after the backup. You can also use the -s option to provide the checkpoint time after the forced checkpoint. The -T option is recommended. For more information about mediarec, see Chapter 2, “RFS Commands and Daemons.”

When you execute dbpause, UniData displays the following on the terminal screen and writes it to the sm.log. The following example shows the output from the dbpause command:

#dbpause
CheckPoint time before ForceCP: Thu May 1 10:54:59 2005
.CheckPoint time after ForceCP: Thu May 1 10:58:31 2005.
Forcearch completed, the next LSN is 1.
DBpause successful.
#

Note: UniData saves the last 20 sm.logs in the udtbin/saved_logs directory. If the dbpause output is not in the sm.log located in udtbin, check the saved sm.log.
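Searching the saved logs by hand can be slow when up to 20 of them are kept. The following Python sketch returns the most recently modified file in a directory that contains the dbpause completion message. The marker text is taken from the example output. The function name is hypothetical, and no assumption is made about how the saved logs are named.

```python
import os

def find_dbpause_log(directory, marker="DBpause successful"):
    """Return the newest file in `directory` that contains `marker`, or None."""
    paths = [os.path.join(directory, name) for name in os.listdir(directory)]
    files = [p for p in paths if os.path.isfile(p)]
    # Check the most recently modified files first.
    for path in sorted(files, key=os.path.getmtime, reverse=True):
        with open(path, errors="replace") as handle:
            if marker in handle.read():
                return path
    return None
```

For example, find_dbpause_log("/usr/ud72/bin/saved_logs") would search the saved logs if udtbin is /usr/ud72/bin, as in the examples in this chapter.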

The following example shows the first mediarec response:

#mediarec

Using UDTBIN=/usr/ud72/bin

For media recovery, you would be required to have space for two temporary files, one to hold the largest archive file and another to hold the largest CP size. Please note the following info, read documentation about media recovery procedure and re-start media recovery.

Max CP Size (in bytes): 17408
Max Arch File Size (in bytes): 4218880

Also, if you're planning to use the tape(s) created by archive process, please setup restore script /usr/ud72/include/arch_restore properly (tape device) and load the first archive tape.

Do you want to continue?(y/n)[n]

If you do not have enough space, answer n. mediarec exits. Resolve the space problem and reenter the command.

The next example illustrates using mediarec with the -T option:

#mediarec -T 1

Using UDTBIN=/disk1/ud72/bin

For media recovery, you would be required to have space for two temporary files, one to hold the largest archive file and another to hold the largest CP size. Please note the following info, read documentation about media recovery procedure and re-start media recovery.

Max CP Size (in bytes): 17408
Max Arch File Size (in bytes): 33554432

Also, if you’re planning to use the tape(s) created by archive process, please setup restore script /usr/ud72/include/arch_restore properly (tape device) and load the first archive tape.

Do you want to continue?(y/n)[n]

The mediarec utility prompts you to load archive files, by logical sequence number. The following example shows the screen display:

%SMM is started. For media recovery, you'll be asked to upload archive files one by one by sequence number into the /ud5/log/ARCH file.

Please upload archive file sequence number = 0

Hit When Done (or ! for shell)..

The utility makes sure you load the files in the correct sequence. If you do not, mediarec reports an error and lets you load another file, as shown in the following example:

Archive file sequence number 5 doesn’t match the sequence number requested.

Would you like to retry? (y/n) [n]:

If you used the automated archiving feature to save your archive files, use the arch_restore script to restore them. If the script is not in /usr/ud72/include, check for an environment variable or udtconfig parameter called ARCH_RESTORE. The script will upload the files as they are needed. If that script fails, the screen will prompt you to load the archive files by logical sequence number, as shown in the earlier examples.

If you have automated archive backup turned on, and you had to manually back up one or more archive files, consider doing all the restores manually to be absolutely sure you do not restore a partial archive. UniData detects if a partial archive is restored, and mediarec may fail.

The following example shows what the screen looks like when the recovery process is complete. In this example, no file-level recovery is needed:

*****!!! Media Recovery Finished !!!*****
SM stopped successfully.
SMM stopped successfully.
Media Recovery finished.

Please use /usr/ud72/bin/startud to start the system

The next example shows what the screen looks like when file-level recovery is involved:

de_arch: reading archive file on disk
Create file D_TEST1, modulo/1,blocksize/1024
Hash type = 0
Create dynamic file TEST1, modulo/17,blocksize/1024
Hash type = 0
Added "@ID", the default record for UniData to DICT TEST1.
Create file D_TEST2, modulo/1,blocksize/1024
Hash type = 0
Create file TEST2, modulo/17,blocksize/1024
Hash type = 0
Added "@ID", the default record for UniData to DICT TEST2.
.

Please check /usr/ud72/FileInfo for un-recovered file level operations.

*****!!! Media Recovery Finished !!!*****
SM stopped successfully.
SMM stopped successfully.
Media Recovery finished.

Please use /usr/ud72/bin/startud to start the system

If your screen looks like the last example, copy or print the FileInfo file before proceeding. You need this file for step 9. The mediarec utility attempts automatic recovery of all file-level operations except index operations.

After mediarec applies all the archives you restore from backup, it applies the current archive set on disk. Some time can elapse between the last prompt for you to restore a file and the successful completion of mediarec. This is normal.

7. Delete Files in Working Space

Change to the directories noted in your media configuration file, and remove the last set of working files.

8. Start UniData

Start UniData using the startud command with no options.

9. Manually Complete File-Level Recovery

Use the FileInfo file you copied or printed in step 6 to complete file-level operations from mediarec. Complete the steps in order. The following example shows a FileInfo file from a media recovery:

CREATE.INDEX /usr/ud72/demo/INVENTORY F2

BUILD.INDEX /usr/ud72/demo/INVENTORY F2

CREATE.INDEX /usr/ud72/demo/INVENTORY F3

BUILD.INDEX /usr/ud72/demo/INVENTORY F3

CREATE.INDEX /usr/ud72/demo/INVENTORY F6

BUILD.INDEX /usr/ud72/demo/INVENTORY F6

To complete each file-level operation, you need to be in the UniData account where the VOC entry for the affected file resides.

Depending on how your backup utility works and how your restore was done, you may need to reset file permissions in your UniData accounts so that users have proper access to your data.

10. Stop UniData

Stop UniData with the stopud command.

11. Perform a Full Backup

This step is required in this situation. Before you start UniData, perform a full system backup. Make sure the backup follows symbolic links for large dynamic files.

12. Run cntl_install

Execute the cntl_install command to reinitialize logging and archiving.

13. Mount a New Archive Tape

If you have automated archive backup turned on, and you are backing up to tape, mount a new tape on your archive backup device.

14. Start UniData

Use startud with no options. Users should be able to log on and access the data.

Logs and Archives Lost, Data Unaffected

The graceful shutdown design detects media failures and other error conditions involving the log files and archive files. UniData flushes changed records in the system buffer to your database and then shuts down. Complete the following steps for recovery.

1. Run cntl_install

Execute the cntl_install command to reinitialize the log files and the archive files.

2. Run guide

Run the guide utility in each data directory to ensure that your data files are not corrupted.

3. Repair Files, if Necessary

If guide detects errors, execute the fixfile command to repair the affected files. For information about guide and fixfile, see Administering UniData.

4. Perform a Full Backup

This step is required in this situation. Before you start UniData, perform a full system backup. Make sure the backup follows symbolic links for large dynamic files.

5. Mount a New Archive Tape

If you have automated archive backup turned on, and you are backing up to tape, mount a new tape on your archive backup device.

6. Start UniData

Use startud with no options. Users should be able to log on and access the data.

Disk Containing /usr/ud72/include Lost

You should be able to recover to the last archived checkpoint.

1. Restore /usr/ud72/include from the Last Backup

Restore the /usr/ud72/include directory from the last backup tape.

2. Stop UniData

Ask users to exit UniData, then stop UniData with the stopud command.

3. Run cntl_install

Execute the cntl_install command to re-create the system.status, restart.fileend, and restart.newblk files and to reinitialize the log and archive files.

4. Restore the System from the Last Full Backup

Restoring your system re-creates the state your database was in when you created the backup. The data is in a consistent state. Check to make sure you have the correct udt.control.file.

UniData uses absolute paths in recovery. You need to restore the file system exactly as it was at backup. If you are using a different physical device, make sure you configure the file system so the absolute paths remain the same.
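One way to confirm that a restore preserved the absolute paths is to compare the restored file system against a list of paths recorded at backup time. The following Python sketch reports any expected path that is missing. Keeping such a list is an assumption of this example, not a UniData feature, and the function name is hypothetical.

```python
import os

def missing_paths(expected_paths):
    """Return the expected absolute paths that do not exist after a restore."""
    problems = []
    for path in expected_paths:
        if not os.path.isabs(path):
            raise ValueError(f"not an absolute path: {path}")
        if not os.path.exists(path):
            problems.append(path)
    return problems

# Example call using the current directory, which always exists.
print(missing_paths([os.path.abspath(".")]))
```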

Note: Your ability to recover from a media failure depends on complete, verified full backups.

5. Check the Media Configuration File

The mediaconfig file, located in /usr/ud72/include, has pointers to the areas where mediarec creates working files. Check those areas and clear disk space. Make a note of how much space is available.

6. Invoke the mediarec Command

To execute mediarec, you must log on as root, and UniData must not be running. You must be at a UNIX prompt.

If you ran cntl_install after your last full backup, mediarec displays the number of the archive file to upload.

If you executed dbpause before performing your backup, you must use mediarec with the -T option to provide the logical sequence number after the backup. You can also use the -s option to provide the checkpoint time after the forced checkpoint. The -T option is recommended.

When you execute dbpause, this information is displayed on the terminal screen and written to the sm.log. The following example shows the output from the dbpause command:

#dbpause
CheckPoint time before ForceCP: Thu May 1 10:54:59 2004
.CheckPoint time after ForceCP: Thu May 1 10:58:31 2004
.CP has been forced successfully.
Forcearch completed, the next LSN is 1.
DBpause successful.
#

Note: UniData saves the last 20 sm.logs in the udtbin/saved_logs directory. If the dbpause output is not in the sm.log located in udtbin, check the saved sm.log.

The following example shows the first mediarec response:

#mediarec

Using UDTBIN=/usr/ud72/bin

For media recovery, you would be required to have space for two temporary files, one to hold the largest archive file and another to hold the largest CP size. Please note the following info, read documentation about media recovery procedure and re-start media recovery.

Max CP Size (in bytes): 17408
Max Arch File Size (in bytes): 4218880

Also, if you're planning to use the tape(s) created by archive process, please setup restore script /usr/ud72/include/arch_restore properly (tape device) and load the first archive tape.

Do you want to continue?(y/n)[n]

If you do not have enough space, answer n. mediarec exits. Resolve the space problem and reenter the command.

The next example illustrates using mediarec with the -T option:

#mediarec -T 1

Using UDTBIN=/disk1/ud72/bin

For media recovery, you would be required to have space for two temporary files, one to hold the largest archive file and another to hold the largest CP size. Please note the following info, read documentation about media recovery procedure and re-start media recovery.

Max CP Size (in bytes): 17408
Max Arch File Size (in bytes): 33554432

Also, if you're planning to use the tape(s) created by archive process, please setup restore script /usr/ud72/include/arch_restore properly (tape device) and load the first archive tape.

Do you want to continue?(y/n)[n]

The mediarec utility prompts you to load archive files by logical sequence number. The following example shows the screen display:

%SMM is started. For media recovery, you’ll be asked to upload archive files one by one by sequence number into the /ud5/log/ARCH file.

Please upload archive file sequence number = 0

Hit When Done (or ! for shell)..

The utility makes sure you load the files in the correct sequence. If you do not, mediarec reports an error and lets you load another file, as shown in the following example:

Archive file sequence number 5 doesn't match the sequence number requested.

Would you like to retry? (y/n) [n]:

If you used the automated archiving feature to save your archive files, use the arch_restore script to restore them. If the script is not in /usr/ud72/include, check for an ARCH_RESTORE environment variable or udtconfig parameter. The script uploads the files as they are needed. If that script fails, UniData prompts you to load the archive files by logical sequence number, as shown in the earlier examples.

If you have automated archive backup turned on and you had to manually back up one or more archive files, consider doing all the restores manually to be absolutely sure you do not restore a partial archive. UniData detects if a partial archive is restored, and mediarec may fail.

The following example shows what the screen looks like when the recovery process is complete. In this example, no file-level recovery is needed:

*****!!! Media Recovery Finished !!!*****
SM stopped successfully.
SMM stopped successfully.
Media Recovery finished.

Please use /usr/ud72/bin/startud to start the system

The next example shows what the screen looks like when file-level recovery is involved:

de_arch: reading archive file on disk
Create file D_TEST1, modulo/1,blocksize/1024
Hash type = 0
Create dynamic file TEST1, modulo/17,blocksize/1024
Hash type = 0
Added "@ID", the default record for UniData to DICT TEST1.
Create file D_TEST2, modulo/1,blocksize/1024
Hash type = 0
Create file TEST2, modulo/17,blocksize/1024
Hash type = 0
Added "@ID", the default record for UniData to DICT TEST2.

Please check /usr/ud72/FileInfo for un-recovered file level operations.

*****!!! Media Recovery Finished !!!*****
SM stopped successfully.
SMM stopped successfully.
Media Recovery finished.

Please use /usr/ud72/bin/startud to start the system

If your screen looks like the last example, copy or print the FileInfo file before proceeding. You need this file for step 9. The mediarec utility attempts automatic recovery of all file-level operations except index operations.

After mediarec applies all the archives you restore from backup, it applies the current archive set on disk. Some time can elapse between the last prompt for you to restore a file and the successful completion of mediarec. This is normal.

7. Delete Files in Working Space

Change to the directories noted in your media configuration file, and remove the last set of working files.

8. Start UniData

Start UniData using the startud command with no options.

9. Manually Complete File-Level Recovery

Use the FileInfo file you copied or printed in step 6 to complete file-level operations from mediarec. Complete the steps in order. The following example shows a FileInfo file from a media recovery:

CREATE.INDEX /usr/ud72/demo/INVENTORY F2

BUILD.INDEX /usr/ud72/demo/INVENTORY F2

CREATE.INDEX /usr/ud72/demo/INVENTORY F3

BUILD.INDEX /usr/ud72/demo/INVENTORY F3

CREATE.INDEX /usr/ud72/demo/INVENTORY F6

BUILD.INDEX /usr/ud72/demo/INVENTORY F6

To complete each file-level operation, you need to be in the UniData account where the VOC entry for the affected file resides.

Depending on how your backup utility works and how your restore was done, you may need to reset file permissions in your UniData accounts so users have proper access to your data.

10. Stop UniData

Stop UniData with the stopud command.

11. Perform a Full Backup

This step is highly recommended. Before you start UniData, perform a full system backup. Make sure the backup follows symbolic links for large dynamic files.

12. Run cntl_install

Execute the cntl_install command to reinitialize logging and archiving.

13. Mount a New Archive Tape

If you have automated archive backup turned on, and you are backing up to tape, mount a new tape on your archive backup device.

14. Start UniData

Start UniData using the startud command with no options. Users should be able to log on and access the data.

Chapter 8: RFS Configuration Parameters

UniData Configuration Parameters
Modifying udtconfig Parameters
RFS Configuration Parameters

This chapter describes configuration parameters used by the Recoverable File System.

Many of the configuration parameters have default settings that are likely to be appropriate for your system, but that can be changed by modifying them in the udtconfig file as necessary. Some parameters have default settings that you should not change.


UniData Configuration Parameters

UniData configuration parameters are stored in a file called /usr/ud72/include/udtconfig. This file contains an entry for each parameter, including the parameter name and its value. To view the file, use the UNIX cat or pg command. To modify a parameter, use a UNIX text editor.

Whenever a system administrator starts UniData with the startud command, UniData reads the contents of the udtconfig file. The parameters direct UniData whether to start with RFS running, and determine a number of system-wide configuration settings (for instance, the size of tables in memory that smm uses). Every time a user logs in, the user’s process reads the same udtconfig file.

The following example shows the /usr/ud72/include/udtconfig:

# pg udtconfig
#
# Unidata Configuration Parameters
#
# Section 1 Neutral parameters
# These parameters are required by all Unidata installations.
#
# 1.1 System dependent parameters, they should not be changed.
LOCKFIFO=1
SYS_PV=3

# 1.2 Changable parameters
NFILES=55
NUSERS=5
WRITE_TO_CONSOLE=0
TMP=/tmp/
NVLMARK=
FCNTL_ON=0
TOGGLE_NAP_TIME=91
NULL_FLAG=0
N_FILESYS=200
N_GLM_GLOBAL_BUCKET=101
N_GLM_SELF_BUCKET=23
GLM_MEM_SEGSZ=4194304
USE_DF=0

# 1.3 I18N related parameter
UDT_LANGGRP=255/192/129
ZERO_CHAR=131

#
# Section 2 Non-RFS related parameters
#
# 2.1 Shared memory related parameters
SBCS_SHM_SIZE=1048576
SHM_MAX_SIZE=67108864
SHM_ATT_ADD=0
SHM_LBA=4096
SHM_MIN_NATT=4
SHM_GNTBLS=20
SHM_GNPAGES=32
SHM_GPAGESZ=256
SHM_LPINENTS=10
SHM_LMINENTS=32
SHM_LCINENTS=100
SHM_LPAGESZ=8
SHM_FREEPCT=25
SHM_NFREES=1

# 2.2 Size limitation parameters
AVG_TUPLE_LEN=4
EXPBLKSIZE=16
MAX_OBJ_SIZE=307200
MIN_MEMORY_TEMP=64

# 2.3 Dynamic file related parameters
GRP_FREE_BLK=5
SHM_FIL_CNT=2048
SPLIT_LOAD=60
MERGE_LOAD=40
KEYDATA_SPLIT_LOAD=95
KEYDATA_MERGE_LOAD=40
MAX_FLENGTH=1073741824
PART_TBL=/liz1/ud61/parttbl

# 2.4 NFA/Telnet service related parameter
EFS_LCKTIME=0
TSTIMEOUT=60
NFA_CONVERT_CHAR=0

# 2.5 Journal related parameters
JRNL_MAX_PROCS=1
JRNL_MAX_FILES=400

# 2.6 UniBasic file related parameters
MAX_OPEN_FILE=500
MAX_OPEN_SEQF=150
MAX_OPEN_OSF=100
MAX_DSFILES=1000

#2.7 UniBasic related parameters
MAX_CAPT_LEVEL=2
MAX_RETN_LEVEL=2
COMPACTOR_POLICY=1
VARMEM_PCT=50

# 2.8 Number of semaphores per semaphore set
NSEM_PSET=8

# 2.9 Index related parameters
SETINDEX_BUFFER_KEYS=0
SETINDEX_VALIDATE_KEY=0

# 2.10 UPL/MGLM parameter
MGLM_BUCKET_SIZE=50
UPL_LOGGING=0

# 2.11 Printer _HOLD_ file related parameters
MAX_NEXT_HOLD_DIGITS=4
CHECK_HOLD_EXIST=0

#
# Section 3 RFS related parameters
# These parameters are only used for RFS which is turned by
# setting SB_FLAG to a positive value.
#
# 3.1 RFS flag
SB_FLAG=1

# 3.2 File related parameters
BPF_NFILES=80
N_PARTFILE=500

# 3.3 AFT related parameters
N_AFT=200
N_AFT_SECTION=1
N_AFT_BUCKET=101
N_AFT_MLF_BUCKET=23
N_TMAFT_BUCKET=19

# 3.4 Archive related parameters
ARCH_FLAG=0
N_ARCH=2
ARCHIVE_TO_TAPE=0
ARCH_WRITE_SZ=0

# 3.5 System buffer parameters
N_BIG=233
N_PUT=8192

# 3.6 TM message queue related parameters
N_PGQ=5
N_TMQ=5

# 3.7 After/before image related parameters
N_AIMG=2
N_BIMG=2
AIMG_BUFSZ=102400
BIMG_BUFSZ=102400
AIMG_MIN_BLKS=10
BIMG_MIN_BLKS=10
AIMG_FLUSH_BLKS=2
BIMG_FLUSH_BLKS=2
RFS_DUMP_DIR=C:\IBM\ud72
RFS_DUMP_HISTORY=10

# 3.8 Flushing interval related parameters
CHKPNT_TIME=300
GRPCMT_TIME=5

# 3.9 Sync Daemon related parameters
N_SYNC=0
SYNC_TIME=0

#
# Section 6 Century Pivot Date
#
CENTURY_PIVOT=1930

#
# Section 7 Repliation parameters
#
REP_FLAG=1
TCA_SIZE=128
MAX_LRF_FILESIZE=134217728
N_REP_OPEN_FILE=8
MAX_REP_DISTRIB=1
MAX_REP_SHMSZ=67108864
UDR_CONVERT_CHAR=1

#
# Euro data handling symbols
#
CONVERT_EURO=0
SYSTEM_EURO=164
TERM_EURO=164
LOG_OVRFLO=/liz1/ud61/log/log_overflow_dir
REP_LOG_PATH=/liz1/ud61/log/replog
#

Parameters pertaining to the Recoverable File System are in section 3 of the udtconfig file. If you need to modify a parameter, proceed with the steps in the following section.

Modifying udtconfig Parameters

You can change the value for any configuration parameter. Complete the following steps:

Warning: You should never override the defaults for LOCKFIFO or SYS_PV.

1. Log on to Your System as root

2. Get a Listing of the Current Settings

Print a copy of the current udtconfig file to obtain a list of the current settings.

3. Copy Your udtconfig File

Also, make a copy of the udtconfig file. If you run into any problems, you can easily revert to this copy.

4. Determine Necessary Changes

Review the current settings. Determine which parameters should be changed.

5. Edit the udtconfig File

You may want to change the values of parameters in the udtconfig file. Use vi or any UNIX text editor to edit the values in the file. Each line must have the format:

NAME=value

where NAME is the parameter name and value is the value you want to use.
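Before and after editing, it can be useful to load the file into a dictionary so that settings can be listed or compared. The following Python sketch assumes only the NAME=value format described above, plus the # comment lines seen in the udtconfig example. The function name parse_udtconfig is a hypothetical helper, not a UniData utility.

```python
def parse_udtconfig(text: str) -> dict:
    """Parse NAME=value lines, skipping blank lines and # comment lines."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            # Every non-comment line is expected to be NAME=value.
            raise ValueError(f"not in NAME=value format: {raw!r}")
        params[name.strip()] = value.strip()
    return params


# A short excerpt in the same layout as the udtconfig example.
sample = """\
# 3.1 RFS flag
SB_FLAG=1
# 3.8 Flushing interval related parameters
CHKPNT_TIME=300
NVLMARK=
"""

config = parse_udtconfig(sample)
print(config)
```

Values are kept as strings because some parameters (NVLMARK, for example) can be empty and others are paths. To inspect a real installation, pass the contents of /usr/ud72/include/udtconfig to the function.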

6. Stop UniData

Make sure all users have logged off. Stop UniData with the stopud command.

7. Run cntl_install

If you changed SB_FLAG, ARCH_FLAG, N_AIMG, N_BIMG, or N_ARCH in udtconfig, you need to run cntl_install to reinitialize archiving and logging. If you did not change these parameters, you do not need to run cntl_install. When you run cntl_install at this point, you ensure that your logs, archives, and your configuration parameters are synchronized. The following example shows the cntl_install command:

# cd $UDTBIN
# ./cntl_install

cntl_install utility resets Unidata System after a full database backup (Image Copy). This means, all log (and archive) files will also be initialized for re-use.

Do you want to continue?(y/n) [n] y

Installing Logs (and Archives) after cntl_install

......
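Whether cntl_install is needed depends only on which parameters changed. As a rough aid, the following Python sketch compares two sets of settings (for example, the saved copy of udtconfig and the edited file, each loaded into a dictionary) and reports whether any of the five parameters named in this step differ. The function names are hypothetical and the sample values are illustrative.

```python
# Parameters that require cntl_install when changed, as listed in this step.
NEEDS_CNTL_INSTALL = {"SB_FLAG", "ARCH_FLAG", "N_AIMG", "N_BIMG", "N_ARCH"}


def changed_params(old: dict, new: dict) -> set:
    """Return the names whose values differ, including added or removed names."""
    return {name for name in old.keys() | new.keys()
            if old.get(name) != new.get(name)}


def needs_cntl_install(old: dict, new: dict) -> bool:
    """True if any changed parameter is one that requires cntl_install."""
    return bool(changed_params(old, new) & NEEDS_CNTL_INSTALL)


# Illustrative settings before and after an edit.
before = {"SB_FLAG": "1", "N_AIMG": "2", "CHKPNT_TIME": "300"}
after = {"SB_FLAG": "1", "N_AIMG": "4", "CHKPNT_TIME": "240"}

print(sorted(changed_params(before, after)))
print(needs_cntl_install(before, after))
```

Here N_AIMG changed, so the check reports that cntl_install is required. A change to CHKPNT_TIME alone would not require it.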


8. Start UniData

Start UniData using the startud command with no options. The parameters you modified in udtconfig take effect when UniData starts.

Note: For best results, you should plan to implement udtconfig changes after a full backup and after running cntl_install. If you are turning SB_FLAG or ARCH_FLAG on or off, or changing the number of log or archive files, and you do not back up your system, you may not be able to recover from a failure.

Warning: If your system has crashed, do not implement any udtconfig changes until you have completed recovery from the crash. If you start UniData with new parameters before recovery is complete, you could prevent recovery for one or more files.

RFS Configuration Parameters

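This section describes the configuration parameters that relate to RFS. Most of them appear in section 3 of the udtconfig file; SYS_PV, WRITE_TO_CONSOLE, NSEM_PSET, and LOG_OVRFLO appear in other sections of the file.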

• SB_FLAG
• AIMG_BUFSZ
• AIMG_MIN_BLKS
• AIMG_FLUSH_BLKS
• ARCH_FLAG
• ARCHIVE_TO_TAPE
• ARCH_WRITE_SZ
• BIMG_BUFSZ
• BIMG_MIN_BLKS
• BIMG_FLUSH_BLKS
• BPF_NFILES
• CHKPNT_TIME
• GRPCMT_TIME
• LOG_OVRFLO
• N_AFT
• N_AFT_SECTION
• N_AFT_BUCKET
• N_AFT_MLF_BUCKET
• N_AIMG
• N_ARCH
• N_BIG
• N_BIMG
• N_PARTFILE
• N_PGQ
• N_PUT
• N_SYNC
• N_TMAFT_BUCKET
• N_TMQ
• NSEM_PSET
• RFS_DUMP_DIR
• RFS_DUMP_HISTORY
• SYNC_TIME
• SYS_PV
• WRITE_TO_CONSOLE

SB_FLAG (System Buffer Flag)

The SB_FLAG parameter turns the system buffer on or off. If you install UniData with the Recoverable File System option, UniData sets the SB_FLAG to 1. The following table describes the SB_FLAG options.

Option                  Description
0 (zero)                System buffer is off.
Any positive integer    System buffer is on.

SB_FLAG Options

When the system buffer is off, the Recoverable File System does not work. You cannot use logging or archiving.

AIMG_BUFSZ (After Image Buffer Size)

The AIMG_BUFSZ parameter describes the size of the after image buffer, in bytes. The default is 102,400 bytes. If you are using raw disk for your log files and you need to change the AIMG_BUFSZ parameter, change the value in increments of 4,096 bytes. If your log files are regular UNIX files, make your changes in increments equal to the block size (in bytes) that your file system uses. AIMG_BUFSZ cannot exceed the log block size multiplied by the log length. IBM recommends setting AIMG_BUFSZ to the after image block size multiplied by 100.

Warning: If the block size in your logconfig file exceeds 4096, you must also increase the AIMG_BUFSZ and BIMG_BUFSZ configuration parameters. These parameters must be a multiple of the block size defined in logconfig.
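The sizing rules above reduce to a little arithmetic. The following Python sketch computes the recommended value (block size multiplied by 100) and checks a candidate value against the two constraints: it must be a multiple of the logconfig block size, and it cannot exceed the block size multiplied by the log length. The sketch assumes the log length is expressed in blocks; the function names and the sample logconfig values are hypothetical. The same checks apply to BIMG_BUFSZ.

```python
def recommended_bufsz(block_size: int) -> int:
    """Recommended buffer size: log block size multiplied by 100."""
    return block_size * 100


def check_bufsz(bufsz: int, block_size: int, log_length: int) -> list:
    """Return a list of rule violations for a candidate buffer size.

    log_length is assumed to be the log size in blocks.
    """
    problems = []
    if bufsz % block_size != 0:
        problems.append("not a multiple of the log block size")
    if bufsz > block_size * log_length:
        problems.append("exceeds log block size multiplied by log length")
    return problems


# Hypothetical logconfig values: 8192-byte blocks, logs of 2000 blocks.
block_size, log_length = 8192, 2000

print(recommended_bufsz(block_size))
print(check_bufsz(102400, block_size, log_length))  # the default buffer size
print(check_bufsz(recommended_bufsz(block_size), block_size, log_length))
```

With these sample values the default of 102,400 bytes is reported as invalid because it is not a multiple of 8192, which is the situation the warning describes.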

AIMG_FLUSH_BLKS (Number of After Image Blocks Flushed)

The AIMG_FLUSH_BLKS parameter determines the number of blocks in the after image buffer that UniData flushes to the after image log files at one time. The default setting is 2 blocks.

AIMG_MIN_BLKS (Minimum Number of After Image Blocks)

The AIMG_MIN_BLKS parameter describes the minimum number of blocks required in the after image buffer before UniData flushes the blocks to the after image log. The default setting is 10 blocks. The size of blocks is defined in the log configuration table.

ARCH_FLAG (Archiving Flag)

The ARCH_FLAG parameter turns the archiving system on or off. (For the archiving process to function, you must also complete the steps in Chapter 4, “Configuration Steps for Archiving.”) The default setting is zero (off). The following table describes the ARCH_FLAG settings.

Setting                 Description
0 (zero)                The archiving system is off.
any positive integer    The archiving system is on.

Archive System Flag


ARCHIVE_TO_TAPE

The ARCHIVE_TO_TAPE parameter turns automatic backup of archive files on or off. If this parameter is set to 1, UniData executes the arch_backup script located in /usr/ud72/include. The default setting is zero (off). The following table describes the ARCHIVE_TO_TAPE settings.

Setting                 Description
0 (zero)                Automatic backup of archive files is turned off.
any positive integer    Automatic backup of archive files is turned on.

Archive to Tape Flag

Tip: If the ARCHIVE_TO_TAPE parameter is on and you want to automatically back up your archive files, make sure the arch_backup and arch_restore scripts copy the files to the appropriate place and that the scripts are compatible.

Note: If you are using the archiving system and the ARCHIVE_TO_TAPE parameter is turned off, you must off-load the archive files manually when they fill up.

ARCH_WRITE_SZ (Size of Archive File Writes)

The ARCH_WRITE_SZ parameter describes the size, in bytes, of blocks for the archive process to write from log files to archive files. The default setting is 0 (zero), meaning that after image log files are written to archive files one block at a time. If this parameter is set to a nonzero value, it must be a multiple of the log/archive block size.

Note: Setting this parameter may improve performance. The performance improvement is platform specific. Because writing log files to archive files uses memory, the larger you set this parameter, the more memory your system will use.

BIMG_BUFSZ (Before Image Buffer Size)

The BIMG_BUFSZ parameter describes the size of the before image buffer, in bytes. The default is 102,400 bytes. If you are using raw disk for your log files and you need to change the BIMG_BUFSZ parameter, change the value in increments of 4,096 bytes. If your log files are regular UNIX files, make your changes in increments equal to the block size (in bytes) of your file system. BIMG_BUFSZ cannot exceed the log block size multiplied by the log length. IBM recommends setting BIMG_BUFSZ to the before image block size multiplied by 100.

Warning: If the block size in your logconfig file exceeds 4096, you must also increase the AIMG_BUFSZ and BIMG_BUFSZ configuration parameters. These parameters must be a multiple of the block size defined in logconfig.

BIMG_FLUSH_BLKS (Number of Before Image Blocks Flushed)

The BIMG_FLUSH_BLKS parameter determines the number of blocks in the before image buffer that are flushed to the before image log files at one time. The default setting is 2 blocks.

BIMG_MIN_BLKS (Minimum Number of Before Image Blocks)

The BIMG_MIN_BLKS parameter describes the minimum number of blocks required in the before image buffer before the system will flush the blocks to the before image log. The default setting is 10 blocks. The size of blocks is defined in the log configuration table.

BPF_NFILES (Maximum Number of Open RFS Files per Process)

The BPF_NFILES parameter specifies the maximum number of recoverable files that one transaction manager (tm) process can have open at one time in UniBasic. The optimum setting is highly dependent on your application. A tm process can open files in excess of the limit, but system performance suffers: after the limit is reached, files must be physically closed and physically reopened as they are accessed later. The default value of BPF_NFILES is 80.

CHKPNT_TIME (Checkpoint Interval)

RFS retains recently requested and recently written file blocks in the system buffer. UniData flushes changes in the system buffer to disk each time the user-configurable CHKPNT_TIME elapses. (UniData also flushes the buffer if the buffer fills, regardless of timing.) The default value for CHKPNT_TIME is 300 seconds.


Tip: Check udtbin/sm.log frequently when you first turn on logging. If the log files overflow, you will see messages in sm.log. In that case, increase the log file size or decrease the checkpoint interval. If you need to change this value, try changing it in 60-second increments. Do not set CHKPNT_TIME to less than 60 seconds.

GRPCMT_TIME (Group Commit Interval)

UniData records each update operation in the system buffer and in an after image log file. To limit the performance cost of this logging, UniData writes every after image entry to an after image log buffer that is allocated in shared memory, and flushes that buffer to the after image log file once per group commit interval. If the after image log buffer fills before the group commit interval expires, UniData flushes the buffer to the after image log file at that point. The default setting is 5 seconds.

Tip: The typical range of settings is from 1 to 5 seconds. If you set the GRPCMT_TIME parameter too low, you may hurt system performance; if system performance is poor, increase the setting. Increasing this parameter increases the risk of loss in the event of a system crash. If you want immediate writes to the after image log file, set this parameter to zero (a constant-write setting), keeping in mind that system performance may be affected.

LOG_OVRFLO

The LOG_OVRFLO parameter lets you define a directory (use the absolute path) for overflow from log files. Specify the path for the overflow files, which you may want to direct to a separate file system for best system performance. If you install UniData with the Recoverable File System option, UniData prompts you for the location where you want to put your log files. UniData then creates a directory called log_overflow_dir under that log directory, and sets the LOG_OVRFLO parameter accordingly. If you install UniData and you do not install RFS, UniData does not set a value for LOG_OVRFLO. You must add the parameter to the udtconfig file if you turn on RFS.

Warning: IBM recommends using this parameter. If you do not define LOG_OVRFLO correctly for your environment and your log files overflow unexpectedly, UniData brings down your system to protect its integrity. A correlation exists between the size of your transaction and the overflow behavior of log files. Remember that large transactions that exceed the capacity of your log files will cause overflow, but no formula yet exists for predicting such occurrences.

N_AFT (Number of Entries in Active File Table)

The N_AFT parameter describes the maximum number of unique recoverable files that can be open at one time, systemwide. An alternate index occupies one entry in the AFT table, so if you open a file with an associated index, two entries will be used in the table. The default setting is 200 files.

Tip: If a UniBasic program frequently executes the ELSE clause, or a “cannot open file” error message displays, when trying to open recoverable files, try increasing the value of this parameter.

N_AFT_SECTION (Number of Sections in Active File Table)

The N_AFT_SECTION parameter describes the number of sections in the Active File Table (AFT). The default setting is 1.

N_AFT_BUCKET (Number of Hash Buckets in Active File Table)

The N_AFT_BUCKET parameter describes the number of hash buckets in the Active File Table (AFT). UniData calculates a default based on N_AFT and N_AFT_SECTION.

N_AFT_MLF_BUCKET (Number of Hash Buckets for Multilevel Files in Active File Table)

The N_AFT_MLF_BUCKET parameter describes the number of hash buckets in the Active File Table (AFT) for tracking multilevel files. UniData calculates a default based on N_AFT_SECTION.


N_AIMG (Number of After Image Log Files)

The N_AIMG parameter describes the number of after image log files in each group of such files described in the log configuration table. The default setting is 2.

Tip: When you change this parameter, you must change the log configuration table and run cntl_install. For information on the log configuration table, see Chapter 3, “Configuration Steps for Logging.” You can use the sysmon utility to determine when to change N_AIMG: If the wait rate (WaitRate) is higher than 5 percent or the polling rate (PollRate) is higher than 2 percent, consider increasing the value of N_AIMG.

N_ARCH (Number of Archive Files)

The N_ARCH parameter describes the number of archive files defined in the archive configuration table. If you change this parameter in the udtconfig file, you must run cntl_install. The default setting is 2.

Tip: When you change this parameter, you must change the archive configuration table. For information on changing the archive configuration table, see Chapter 4, “Configuration Steps for Archiving.”

N_BIG (Number of Block Index Groups)

The N_BIG parameter describes the number of block index groups (BIGs). A block index group acts as an index to the pages in the system buffer, defined by N_PUT. If N_BIG is too large, the number of semaphore operations will increase since each BIG has a semaphore control, which may increase page swapping. If N_BIG is too small, there will be a lot of contention between different processes, which will negatively impact system performance. The default is 233.

Tip: The value of N_BIG must be smaller than N_PUT. The optimum value for N_BIG is highly application-dependent. As a starting point, you may set N_BIG to the prime number nearest NUSERS * 5. N_BIG must always be a prime number.
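The starting point suggested in the tip (the prime number nearest NUSERS * 5, and smaller than N_PUT) can be computed as follows. This Python sketch is only an illustration: the function names are hypothetical, and when two primes are equally near, choosing the smaller one is an assumption.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for values in the range of N_BIG."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def nearest_prime(target: int) -> int:
    """Return the prime closest to target (the smaller one on a tie)."""
    offset = 0
    while True:
        if is_prime(target - offset):
            return target - offset
        if is_prime(target + offset):
            return target + offset
        offset += 1


def starting_n_big(nusers: int, n_put: int) -> int:
    """Suggested starting value: prime nearest NUSERS * 5, below N_PUT."""
    candidate = nearest_prime(nusers * 5)
    if candidate >= n_put:
        raise ValueError("N_BIG must be smaller than N_PUT")
    return candidate


# Example: 100 users with the default N_PUT of 8192 pages.
print(starting_n_big(100, 8192))
```

For 100 users the target is 500, and the nearest prime is 499. Treat the result as a starting point only, since the optimum value is application-dependent.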

N_BIMG (Number of Before Image Log Files)

The N_BIMG parameter describes the number of before image log files in each group of such files described in the log configuration table. The default setting is 2.

Tip: When you change this parameter, you must change the log configuration file and run cntl_install. For information on the log configuration file, see Chapter 3, “Configuration Steps for Logging.” You can use the sysmon utility to determine when to change N_BIMG. If the sysmon wait rate (WaitRate) is higher than 5 percent or the polling rate (PollRate) is higher than 2 percent, consider increasing the value of N_BIMG.

N_PARTFILE (Number of Part Files)

The N_PARTFILE parameter describes the maximum number of unique recoverable dynamic part files that can be open at one time, systemwide. The default is 500.

Tip: The N_PARTFILE limit includes files opened by ECL and UniBasic. Each dynamic file has at least two part files, so opening a dynamic file means opening at least two part files.

N_PGQ (Number of Process Group Queues)

The N_PGQ parameter describes the number of message queues available for transaction managers (tm processes) to send messages to udt processes. If you have many users, there may be message queue congestion. When you install UniData, the system calculates a default setting based on your number of user licenses. The default is one queue for every four users.

Tip: Make sure that the value of the kernel parameter defining the number of message queues on your system (usually msgmni) is large enough to accommodate the number of queues needed for N_PGQ + N_TMQ + UniData daemons. The number of queues needed for UniData daemons can be computed as three queues for sbcs, one queue for cm, and one queue for sm.
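The queue count in the tip is a simple sum. The following Python sketch adds N_PGQ, N_TMQ, and the five daemon queues (three for sbcs, one for cm, one for sm) and compares the total with the kernel limit. The function names are hypothetical and the msgmni value is illustrative; obtain the real limit from your own system, and remember that other software may also use message queues.

```python
# Queues used by UniData daemons, as listed in the tip above.
DAEMON_QUEUES = {"sbcs": 3, "cm": 1, "sm": 1}


def required_queues(n_pgq: int, n_tmq: int) -> int:
    """Message queues UniData needs: N_PGQ + N_TMQ + daemon queues."""
    return n_pgq + n_tmq + sum(DAEMON_QUEUES.values())


def msgmni_is_sufficient(msgmni: int, n_pgq: int, n_tmq: int) -> bool:
    """True if the kernel limit covers UniData's queues."""
    return msgmni >= required_queues(n_pgq, n_tmq)


# N_PGQ=5 and N_TMQ=5 as in the udtconfig example; msgmni is illustrative.
needed = required_queues(5, 5)
print(needed)
print(msgmni_is_sufficient(50, 5, 5))
```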

N_PUT (Number of Pages in System Buffer)

When accessing recoverable files, UniData first looks for each data file block in the system buffer. If the block is not there, UniData reads the block from disk, and then puts it into the system buffer, swapping other pages to disk if the system buffer is full.

The N_PUT parameter specifies the system buffer size in pages. Each page is 1024 bytes. If the system buffer size is too small, many files may be swapped in and out of the buffer. Increasing the system buffer size may improve performance; the optimum setting is highly application-dependent. The default setting is 8192 pages.


Tip: If you change N_PUT to a larger number, you should also increase N_BIG.
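Because each page is 1024 bytes, the memory taken by the system buffer pages follows directly from N_PUT. The following Python sketch shows the arithmetic and repeats the check that N_BIG stays smaller than N_PUT. The function names are hypothetical, and the figure covers only the pages themselves, not any other shared memory that UniData allocates.

```python
PAGE_SIZE = 1024  # bytes per system buffer page, as stated above


def system_buffer_bytes(n_put: int) -> int:
    """Size of the system buffer pages in bytes."""
    return n_put * PAGE_SIZE


def n_big_is_valid(n_big: int, n_put: int) -> bool:
    """N_BIG must be smaller than N_PUT."""
    return n_big < n_put


# Default settings: N_PUT=8192 and N_BIG=233.
size = system_buffer_bytes(8192)
print(size, "bytes =", size // (1024 * 1024), "MB")
print(n_big_is_valid(233, 8192))
```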

N_SYNC

If you notice significant performance degradation during a checkpoint, you can start sync daemons by setting this parameter to the number of sync daemons you want running on your system. Sync daemons periodically flush updated pages from the system buffer to the log files, reducing the amount of time it takes to complete a checkpoint.

N_TMAFT_BUCKET (Number of Hash Buckets in TM Active File Table)

The N_TMAFT_BUCKET parameter describes the number of hash buckets in the active file table (TMAFT) for each tm process. UniData calculates the default based on the value of BPF_NFILES.

N_TMQ (Number of Message Queues)

The N_TMQ parameter describes the number of message queues available for udt processes to send messages to transaction managers (tm processes). If you have many users, there may be message queue congestion. When you install UniData, UniData calculates a default setting based on your number of user licenses. The default is one queue for every four users.

Tip: Make sure that the value of the kernel parameter defining the number of message queues on your system (usually msgmni) is large enough to accommodate the number of queues needed for N_PGQ + N_TMQ.

NSEM_PSET (Number of Semaphores per Set)

The NSEM_PSET parameter describes the number of semaphores per semaphore set. You should not need to change this parameter. See the description of the SYS_PV parameter in this section. The default setting is 8.

RFS_DUMP_DIR (Location of rfs.dump File)

The RFS_DUMP_DIR parameter defines where UniData stores the rfs.dump file when the s_stat -s command is executed. The default value is an empty string, with UniData storing the rfs.dump file in the $UDTBIN directory. If UniData determines the defined path is invalid when it starts, UniData writes the rfs.dump file to the $UDTBIN directory, and prints a message to the sm.log file.

RFS_DUMP_HISTORY (Number of rfs.dump files to preserve)

The RFS_DUMP_HISTORY parameter specifies how many rfs.dump files to preserve when you execute the s_stat -s command.

The default value of this parameter is 1. With this value, UniData creates the rfs.dump file in the directory you specify with the RFS_DUMP_DIR parameter.

If this value is set to a positive integer, for example 4, UniData names the rfs.dump files rfs.dump1, rfs.dump2, rfs.dump3, rfs.dump4. The s_stat -s command uses the first available rfs.dump file. If all the rfs.dump files are full, the s_stat -s command reuses the oldest rfs.dump file.

If this value is set to 0, UniData preserves all rfs.dump files and names them rfs.dump1, rfs.dump2, and so forth.

SYNC_TIME

The SYNC_TIME parameter defines, in seconds, the amount of time the sync daemons wait before scanning the system buffer for updated pages.


SYS_PV (P/V Operations)

The SYS_PV parameter describes the type of P/V (lock/unlock) operations. These parameters, which UniData predetermined as appropriate for controlling access to critical resources on your system, are shown here only for your information. Do not change them.

Value    Description
1        System semaphore
2        C language
3        Assembly language

SYS_PV Parameters

Warning: Never change the value of SYS_PV unless instructed to do so by IBM Technical Support, or you will encounter unpredictable results.

WRITE_TO_CONSOLE

This parameter turns messaging to your console off and on. If WRITE_TO_CONSOLE is on, system messages (archive file full messages, for example) display on the system console. The default value is 0 (off). Set WRITE_TO_CONSOLE equal to 1 if you want to display messages at the console.

Chapter 9: Monitoring and Tuning

The sysmon Utility
sysmon Fields and Values
Performance Tips
Tuning N_PUT and N_BIG
Adjusting the Log Files
Adjusting the Archive Files
Tuning CM_SLEEP
RFS File Open Performance
How RFS Tracks Open Files
Tuning RFS Open File Performance

This chapter describes how to monitor and tune the Recoverable File System (RFS). RFS has a utility called sysmon that enables you to monitor the activity of the RFS system and determine how your system could be tuned effectively for performance. This chapter covers:

• The sysmon utility.
• A description of the sysmon fields.
• Performance tips.

The sysmon Utility

The sysmon utility monitors the performance of the Recoverable File System. Although sysmon makes no direct recommendations about tuning your system, you can use the sysmon display to help you make decisions about UniData configuration parameters.

To use sysmon, enter the command at the UNIX prompt.

Syntax:

sysmon [-b |-m] [-o filename] [-t nn] [-s screens]

The following table describes each parameter of the syntax.

Parameter      Action
-b             Displays detailed information about the Block Index table (BIG) in shared memory. You cannot use the -m option with the -b option.
-m             Displays detailed information about user requests. You cannot use the -b option with the -m option.
-o filename    Directs sysmon output to filename.
-t nn          Redisplays the data every nn seconds.
-s screens     Specifies how many screens to display before exiting.

sysmon Options

By default, sysmon redisplays every three seconds. You can specify a different sampling interval with the -t option.

The following example shows the sysmon display. The fields are defined below, by section.


sysmon Fields and Values

The following tables describe the fields on the sysmon display. In many cases, you can use these fields to help you determine the best settings for tunable parameters in the udtconfig file. Chapter 8, “RFS Configuration Parameters,” describes parameters that are specific to RFS; for a full list of parameters, see Administering UniData.

Block Index Group (BIG) Statistics Section

This table describes the fields in the Block Index Group (BIG) Statistics section of the sysmon screen.

Field Name     Description
PinRead        Number of UniData blocks locked for reading during the sampling interval.
PinWrite       Number of UniData blocks locked for writing during the sampling interval.
PinWaitQ       Number of blocks waiting to be locked for writing or reading during the sampling interval.
PinWaitRate    A calculation: PinWaitQ / (PinWrite + PinRead).
TmRead         Number of blocks read from disk to system buffer by all TM daemons during the sampling interval.
TmWrite        Number of blocks written to disk by all TM daemons during the sampling interval.
CmRead         Number of blocks read from disk into system buffer by the CM daemon during the sampling interval.
CmWrite        Number of blocks written to disk by the CM daemon during the sampling interval.
Dirty          Blocks of data that have been written to in the system buffer, but that have yet to be written to disk.
Neat           Unchanged blocks of data in the system buffer.
Total          A calculation: Dirty + Neat.
Hits           Number of blocks found in the system buffer (or cache) when read and/or written during the sampling interval.
HitRate        A calculation: Hits / (PinWrite + PinRead).

Block Index Group (BIG) Statistics Section
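The two calculated fields can be reproduced from the raw counters, which is handy when post-processing sysmon output saved with the -o option. The following Python sketch applies the formulas from the table. The function names and the sample counter values are hypothetical, and returning 0.0 when no blocks were pinned in the interval is an assumption made to avoid dividing by zero.

```python
def pin_wait_rate(pin_wait_q: int, pin_read: int, pin_write: int) -> float:
    """PinWaitRate = PinWaitQ / (PinWrite + PinRead)."""
    pins = pin_write + pin_read
    return pin_wait_q / pins if pins else 0.0


def hit_rate(hits: int, pin_read: int, pin_write: int) -> float:
    """HitRate = Hits / (PinWrite + PinRead)."""
    pins = pin_write + pin_read
    return hits / pins if pins else 0.0


# Illustrative counters for one sampling interval.
pin_read, pin_write, pin_wait_q, hits = 900, 100, 20, 950

print(f"PinWaitRate: {pin_wait_rate(pin_wait_q, pin_read, pin_write):.1%}")
print(f"HitRate:     {hit_rate(hits, pin_read, pin_write):.1%}")
```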


Latching Statistics Section

This table describes the fields in the Latching Statistics section of the sysmon screen. For each of the types shown in the table, the Latching Statistics section shows the wait queue (WaitQ), the number of latches (Latches), the waiting rate (WaitRate), the number of poll calls (PollCall), and the polling rate (PollRate).

Field Name Description

Big How many times the system accesses the block index group (BIG) in the system buffer. If the WaitRate is higher than 5 percent or the PollRate is higher than 2 percent, increase the number of block index groups by altering the N_BIG parameter.

Aft How many times the active file table in the system buffer is locked or unlocked. This should remain at zero approximately 90 percent of the time. If it does not, increase N_AFT_SECTION.

Aimg How many times UniData locks or unlocks the after image log buffer. If the WaitRate is higher than 5 percent or the PollRate is higher than 2 percent, increase the number of after image log files in the log configuration table. You can also adjust the minimum number of blocks needed to flush the after image buffer to the after image log file by changing the AIMG_MIN_BLKS parameter. If the WaitRate or PollRate for Aimg is low, reduce the number of after image logs, or add a disk and distribute the log files between disks to improve system performance. If the GRPCMT_TIME parameter is zero, there is no group commit in effect. Each write operation waits until the corresponding after image record is written to disk, which hampers system performance.

Bimg How many times the system locks or unlocks the before image log buffer. If the WaitRate is higher than 5 percent or the PollRate is higher than 2 percent, increase the number of before image log files in the log configuration table. You can also adjust the minimum number of blocks needed to flush the before image buffer to the before image log file by changing the BIMG_MIN_BLKS parameter. Tip: If the WaitRate or PollRate for Bimg is low, reduce the number of before image logs, or add a disk and distribute the log files between disks to improve system performance.

TM Status Section

This table describes the fields in the TM Status section of the sysmon screen.

Field Name Description

Tm# Number of TM (transaction manager) daemons present in the system.

Req# Number of requests UniData sent to the TM daemons during the sampling interval.

ActTm Number of TM daemons active in the system during the sampling interval.

SHM Info Section

This table describes the fields in the SHM Info section of the sysmon screen.

Field Name Description

ShmPV Number of system semaphore locks requesting shared memory during the sampling interval.

Total Number of system semaphore locks requesting shared memory accumulated since sysmon started.

Log File Statistics Section

This table describes the fields in the Log File Statistics section of the sysmon screen.

Field Name Description

TmBimgFlush The number of times that TM daemons flush to before image logs during the sampling interval.

TmAimgFlush The number of times that TM daemons flush to after image logs during the sampling interval.

CmBimgFlush The number of times that CM daemons flush to before image logs during the sampling interval.

CmAimgFlush The number of times that CM daemons flush to after image logs during the sampling interval.

WaitQ0 Number of TM daemons waiting in the queue during the sampling interval.

WaitQ1 Number of TM daemons waiting in the queue during the sampling interval.

WaitQ2 Number of TM daemons waiting in the queue during the sampling interval.

WaitQ3 Number of TM daemons waiting in the queue during the sampling interval.

LogCkSuccess Number of log files that passed checking during the sampling interval.

LogCkFail Number of log files that failed checking during the sampling interval.

LogOvrflos Number of log file overflow events accumulated since system started. A log overflow event means a log file reached 80% full.

LogSwitchd Number of log file switching events accumulated since system started (number of checkpoints).

BimgRawBlks The number of blocks written by the before image log daemon during the sampling interval.

AimgRawBlks The number of blocks written by the aimglog daemon during the sampling interval.

TotRaw Sum of AimgRawBlks and BimgRawBlks values.

Lower Portion of Log File Statistics Section

The lower portion of the Log File Statistics section, at the lower left corner of the sysmon screen, provides information about individual log files. sysmon displays these in groups of four from the current active log set. So, if you have eight log files, the system displays them in groups of four starting with the first log file (0, 1, 2, 3). Then, depending on when you have set your checkpoint, the system switches to the second set of four log files (4, 5, 6, 7).

Note: This scenario is based on a simplistic system. In a more complex case, if you have two sets, one with four after image logs and three before image logs, you will see only the first four in the current set. This is due to the way the log configuration reads the files. Therefore, you may view the four after image log files for the first set and then the four after image log files in the second set. You will not be able to view the before image log files in this case.

These log files have characteristics described in the following columns.

Column Heading Description

LogID The identifying label for the log file.

Total The number of blocks written to the log file since the last checkpoint interval. You set the checkpoint interval with the CHKPNT_TIME parameter. See Chapter 8, “RFS Configuration Parameters.”

Length The number of blocks written to the log file during the sampling interval.

Record Info Section

This table describes the fields in the Record Info section of the sysmon screen.

Field Name Description

RecRead Number of records read during sampling interval.

RecWrite Number of records written during sampling interval.

RecDelete Number of records deleted during sampling interval.

AvgRead Average number of bytes per record read during sampling interval.

AvgWrite Average number of bytes per record written during sampling interval.

Trans Info Section

This table describes the fields in the Trans (transactions) Info section of the sysmon screen.

Field Name Description

Committed Number of transactions committed to disk during sampling interval.

Aborted Number of transactions aborted during sampling interval.

Performance Tips

The following steps can help you tune your system for performance. Remember that the more frequently you write to disk, the slower the performance. You will need to tailor RFS for the type of business you have. For example, accounting applications may need more frequent writes than real estate applications, or vice versa. View your system using sysmon, as described in the previous section, then use the following steps as a guideline.

• If possible, use raw disk files as the after image and before image log files. IBM recommends that you place these files on a separate physical device from your data. Ideally, each individual after or before image log should be placed on its own disk to reduce overhead.

• Look at the “HitRate” of the system buffer. If it is less than 90 percent, increase the number of system buffer pages by increasing the N_PUT parameter in the udtconfig file.

• Look at the latching-related information, which shows contention in the various parts of the latching operation.

• If the “WaitRate” on “Big” is higher than 5 percent, or the “PollRate” on “Big” is higher than 2 percent, increase the number of block index groups by increasing the RFS tunable parameter N_BIG. Make sure that N_BIG is a prime number.

• If the “WaitQ” and the “WaitRate” on “Aft” are not zero, increase the N_AFT_SECTION parameter to reduce contention among processes trying to access the same AFT section simultaneously.

• If the “WaitRate” on “Aimg” is higher than 5 percent or the “PollRate” on “Aimg” is higher than 2 percent, increase the number of after image log files. Then change the parameters N_AIMG and AIMG_MIN_BLKS to a larger corresponding number.

• If the “WaitRate” on “Bimg” is higher than 5 percent or the “PollRate” on “Bimg” is higher than 2 percent, increase the number of before image log files. Then change the parameters N_BIMG and BIMG_MIN_BLKS to a larger corresponding number.

• Look at the “=== SHM INFO ===” field to see whether shared memory allocation and release are in a normal state. If the “ShmPV” is frequently nonzero, tune the parameters AVG_TUPLE_LEN and MIN_MEMORY_TEMP to larger values.

• Look at the log file information to check the overflow state of the log files and verify that the space for the before image/after image log buffer is adequate.

• If the “LogOvrflos” is not zero, one or more of the log files is in overflow. Check the “LogID” column to see which log file is currently in an overflow state. Enlarge the overflowed log file (or reduce the checkpoint time if the log file space is not enough). Also increase the size of the after image/before image log files and modify the logconfig table accordingly.

• Increase the priority of the cm and archive processes. If possible, set them to realtime priority.

• If most of the records in a file are less than 1000 bytes, make your file block size 1K.

• The “CHKPNT_TIME” should be 300 (seconds) or larger. If the checkpoint arrives and the “Total” in the log file information is significantly smaller than the size of the log file, reduce the size of the log file.

• GRPCMT_TIME allows you to decrease the I/O to the after image log files by having them written at intervals, rather than with each record. If GRPCMT_TIME is 0, I/O to the after image logs is constant, which degrades system performance. Any integer greater than zero increases the number of records per physical write.

• If you want to run a large number of users, and you experience message queue congestion, you can set the parameters “N_PGQ” and “N_TMQ” to a larger number. The default values for these are calculated, allowing one queue for each four users.
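The thresholds in the tips above can be collected into a small checklist. The following Python sketch is illustrative only: the function name and the dictionary layout are hypothetical, the values are assumed to be percentages read manually from the sysmon screen, and nothing here parses sysmon output or changes udtconfig.

```python
# Illustrative only: the input layout is hypothetical, not a sysmon output format.
def latching_advice(hit_rate, latches):
    """Return tuning hints from HitRate and per-latch WaitRate/PollRate (percent)."""
    hints = []
    if hit_rate < 90:
        hints.append("HitRate below 90 percent: consider increasing N_PUT")
    # Parameters named in the tips for each latch type that uses
    # the 5 percent WaitRate / 2 percent PollRate thresholds.
    params = {
        "Big": "N_BIG",
        "Aimg": "N_AIMG and AIMG_MIN_BLKS",
        "Bimg": "N_BIMG and BIMG_MIN_BLKS",
    }
    for name, param in params.items():
        stats = latches.get(name, {})
        if stats.get("WaitRate", 0) > 5 or stats.get("PollRate", 0) > 2:
            hints.append(f"{name} latch contention: consider increasing {param}")
    return hints


# Sample readings (made-up numbers for demonstration).
sample = {
    "Big": {"WaitRate": 6.0, "PollRate": 1.0},
    "Aimg": {"WaitRate": 1.0, "PollRate": 0.5},
}
for hint in latching_advice(88.0, sample):
    print(hint)
```

With the sample readings, the sketch reports the low HitRate and the Big latch contention, and stays silent about Aimg because both of its rates are under the thresholds.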

Tuning N_PUT and N_BIG

The N_PUT configuration parameter describes the total number of pages in the system buffer. Each page is 1,024 bytes. The N_BIG configuration parameter acts as an index to N_PUT. If you increase N_PUT, you should also increase N_BIG. If N_BIG is too large, the number of semaphore operations will increase since each BIG has a semaphore control. This may also increase page swapping. If N_BIG is too small, there may be a lot of contention between UniData processes.

The value of N_BIG should be the closest prime number to NUSERS * 5. It must be smaller than N_PUT.

Note: N_BIG must be a prime number. If N_BIG is not a prime number, you may experience poor performance.
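As a rough aid for choosing N_BIG, the following Python sketch finds the prime closest to NUSERS * 5 that is still smaller than N_PUT. The function names are hypothetical, and the tie-breaking rule (prefer the smaller prime when two primes are equally close) is an assumption; the guideline above does not specify one.

```python
def is_prime(n):
    """Trial-division primality test; adequate for configuration-sized values."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def suggest_n_big(nusers, n_put):
    """Closest prime to nusers * 5 that is also smaller than n_put."""
    target = nusers * 5
    # Search outward from the target. On a tie the smaller prime is
    # tried first (an assumption, not a documented rule).
    for offset in range(target + 1):
        for candidate in (target - offset, target + offset):
            if is_prime(candidate) and candidate < n_put:
                return candidate
    raise ValueError("no suitable prime found below N_PUT")


print(suggest_n_big(50, 8192))
```

For example, with 50 users the target is 250, and the nearest prime is 251.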

Adjusting the Log Files

If sysmon is reporting log switches more frequently than checkpoint intervals, you may want to increase the size of your log files. If the log files switch before a checkpoint occurs, the log files have reached the 80 percent full mark before a checkpoint has taken place.

It is better to increase the size of the log files than it is to have a large number of logs. Although it is not possible to provide an exact formula for determining the size of the log files, a starting place may be to multiply the number of records expected for update during one checkpoint interval by the largest record size. Divide this number by the block size you have chosen in the log configuration table, and you have an estimation for the number of blocks needed for your log files. Increase the log length parameter in the logconfig file and run cntl_install. See Chapter 3, “Configuration Steps for Logging.”

Note: When changing the size of your log files, change the log length parameter and not the block size in the logconfig file located in /usr/ud72/include. You should always use the UNIX file system block size. If you cannot determine the UNIX file system block size on your system, use 4096. The block size cannot exceed 16,384.
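The starting estimate described above is simple arithmetic, sketched here in Python. The function name and the sample numbers are hypothetical, and rounding up to a whole block is an assumption; treat the result as a first guess to refine with sysmon, not a formula from UniData.

```python
def estimate_log_blocks(records_per_checkpoint, largest_record_bytes, block_size=4096):
    """Starting estimate of log length in blocks for one checkpoint interval."""
    if block_size > 16384:
        raise ValueError("block size cannot exceed 16,384")
    total_bytes = records_per_checkpoint * largest_record_bytes
    # Integer ceiling division: a partly filled block still needs a whole block.
    return (total_bytes + block_size - 1) // block_size


# Made-up workload: 20,000 updates per checkpoint, largest record 2,000 bytes.
print(estimate_log_blocks(20000, 2000))
```

The default block size of 4096 matches the fallback suggested in the note above for systems where the UNIX file system block size cannot be determined.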

Adjusting the Archive Files

One archive file should be at least as large as one full set of log files. This is to ensure that one checkpoint does not span multiple archive files. If the archive files are too small, you will have to off-load them frequently as they fill. If the archive files are too large, the time to off-load the files may be unacceptable. Although the default for the number of archive files is 2, you may want to consider having more archive files.

Note: When all of the archive files are full, UniData processing will pause until the archive files have been off-loaded.

Tuning CM_SLEEP

In rare circumstances, you may encounter slow response from UniData during checkpoint switching. This slow response only occurs when you are running a large number of UniData processes that have one or more large data files being updated almost continuously. If this situation exists, a processing bottleneck may occur at the point when the updates are actually written to disk.

The cm daemon first flushes “dirty” pages from the system buffer to a second buffer at the operating system (OS) level. After cm flushes the dirty pages, cm issues a UNIX fsync() call for each data file in order to flush all the dirty pages in the OS buffer to disk at one time. During the fsync, no process can access a file, even to read it. For a large data file with a large number of updates, the fsync process can take longer than 30 seconds. A user attempting to access the data file during its fsync appears to hang.

You can define the number of seconds the cm process sleeps between the flush of dirty pages to the OS buffer and the fsync process by setting the environment variable CM_SLEEP. Since the operating system itself writes pages from the OS buffer to disk over time, the OS continues writing while cm is sleeping. When cm wakes, the number of dirty pages for fsync to write is much smaller, decreasing the amount of time required to fsync each file.

The default value for CM_SLEEP is 30 seconds. If you wish to change the default, set the CM_SLEEP environment variable to any positive numeric value before you start UniData, as shown in the following example:

From the C shell:

% setenv CM_SLEEP num_seconds

From the Bourne or Korn shell:

# CM_SLEEP=num_seconds;export CM_SLEEP

Note: CM_SLEEP is an environment variable, not a configuration parameter. Do not add CM_SLEEP to the udtconfig file.

RFS File Open Performance

With the implementation of RFS, users at some large installations experienced slow performance of applications that opened many recoverable files. Benchmark tests indicated that overall RFS performance for large installations could be improved by speeding up the functions that open recoverable files.

How RFS Tracks Open Files

RFS file opens are handled by the tm (transaction manager) daemons. There is one tm daemon for every user (udt) process. UniData uses tables in memory to keep track of RFS files. The Active File Table (AFT) tracks the number of unique recoverable files open system-wide. The size of the AFT is controlled by the UniData configuration parameter N_AFT. The AFT contains an entry for every recoverable file that is open throughout the system. In addition, each tm daemon keeps a table called TMAFT in its private memory. The TMAFT tracks the recoverable files opened by the tm. The size of the TMAFT is controlled by the UniData configuration parameter BPF_NFILES. Whenever a tm gets a request to open a recoverable file, the tm searches both its own TMAFT and the system AFT to see if the file is already open.

Tuning RFS Open File Performance

At large installations with applications that perform many file opens, the AFT and TMAFT can be quite large. You can configure the sizes of these tables by adjusting the configuration parameters described in this section.

Note: Use the sysmon utility to provide input to your decisions about tuning. The output section called “LATCHING STATISTICS” includes a parameter called “WaitRate” for the AFT. IBM strongly recommends tuning if this parameter often approaches 100%. If you see a sudden increase in “WaitRate”, tuning can help improve performance even if the value of “WaitRate” never exceeds 10 percent.


AFT Sections

The systemwide AFT can be divided into sections, and AFT entries hashed across the sections (just as records in a UniData hashed file are hashed into groups). Each AFT section can still be searched by only one tm at a time, but dividing the AFT into more than one section can reduce wait times because it is possible for different tm daemons to search in different sections at the same time, and because each tm has to search fewer entries. By default, the number of AFT sections is 1. To increase the number of sections, you can increase the value of the N_AFT_SECTION parameter in the udtconfig file. This number must be greater than 0 and less than (N_AFT/2). It should be a prime number. If you are going to tune N_AFT_SECTION, increase the value by small increments.
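Before editing udtconfig, the constraints on N_AFT_SECTION can be checked with a few lines of Python. This is a minimal sketch with hypothetical function names. It assumes that the default value of 1 is acceptable even though 1 is not a prime number, since the text gives 1 as the default.

```python
def is_prime(n):
    """Trial-division primality test; adequate for configuration-sized values."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def check_n_aft_section(n_aft_section, n_aft):
    """Return a list of problems with a proposed N_AFT_SECTION value."""
    problems = []
    if not (0 < n_aft_section < n_aft / 2):
        problems.append("must be greater than 0 and less than N_AFT/2")
    # Assumption: the default of 1 is allowed although it is not prime.
    if n_aft_section != 1 and not is_prime(n_aft_section):
        problems.append("should be a prime number")
    return problems


print(check_n_aft_section(7, 1000))    # no problems expected
print(check_n_aft_section(600, 1000))  # too large and not prime
```

An empty list means the proposed value satisfies both stated rules. The check does not say whether the value is a good choice for your workload, which is why the text recommends increasing it in small increments.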

AFT Hash Buckets

AFT entries are hashed into hash buckets within sections. Hashing makes the search process faster, because a tm never has to read all the entries in the table (or section) to find a particular entry.

UniData sets the number of hash buckets to a number greater than or equal to N_AFT/2 that is a prime multiple of N_AFT_SECTION. Even if N_AFT_SECTION remains at 1, hashing the AFT entries into hash buckets still improves performance because each tm only needs to search the contents of a single hash bucket rather than searching the whole AFT. The tm processes may still have to wait, but the wait times are greatly decreased. You can control the number of hash buckets by adding the N_AFT_BUCKET parameter to the udtconfig file.

Note: N_AFT_BUCKET should be large compared to N_AFT_SECTION.

Hash Buckets for Multilevel Files

The AFT has hash buckets for tracking static or dynamic multilevel subfiles. Special treatment is needed for multilevel files because UniData needs to record opens and closes of the subfiles as well as the multilevel file itself.

By default, UniData sets the number of buckets for multilevel files to the first prime number greater than or equal to the value of N_AFT_SECTION/10. You can change the number of hash buckets for multilevel files by adding the N_AFT_MLF_BUCKET parameter to the udtconfig file. This number must be a prime number greater than 0.

TMAFT Hash Buckets

Each TMAFT is divided into hash buckets, just as the AFT is. By default, UniData sets this parameter to the first prime number greater than or equal to (BPF_NFILES + 10) /5. You can change the number of TMAFT hash buckets by increasing the value of the N_TMAFT_BUCKET parameter in the udtconfig file. This number must be greater than 0 and should be a prime number.
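To see what the default works out to for a given BPF_NFILES, the rule above can be sketched in Python. The function names are hypothetical, and the sketch assumes ordinary (non-truncating) division, so a fractional quotient is rounded up before the prime search; UniData's internal arithmetic may differ.

```python
import math


def is_prime(n):
    """Trial-division primality test; adequate for configuration-sized values."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def default_tmaft_buckets(bpf_nfiles):
    """First prime greater than or equal to (BPF_NFILES + 10) / 5."""
    # Assumption: non-truncating division, rounded up to a whole number.
    candidate = max(2, math.ceil((bpf_nfiles + 10) / 5))
    while not is_prime(candidate):
        candidate += 1
    return candidate


print(default_tmaft_buckets(80))
```

For a hypothetical BPF_NFILES of 80, the quotient is 18 and the next prime at or above it is 19.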

Note: To view the values of UniData configuration parameters, use the UNIX pg or cat command. For information on modifying parameters in the udtconfig file, see Chapter 8, “RFS Configuration Parameters.”

Sync Daemons

If you notice significant performance degradation during a checkpoint, you can start sync daemons by setting the udtconfig parameters N_SYNC and SYNC_TIME. Sync daemons periodically flush updated pages from the system buffer to the log files, reducing the amount of time it takes to complete a checkpoint.

N_SYNC determines the number of sync daemons UniData starts. SYNC_TIME defines, in seconds, the amount of time the sync daemons wait before scanning the system buffer for updated pages.

Chapter 10 Troubleshooting RFS

Possible Errors
    Failure of UniData to Start
    File Log Size Too Small
    Inadequate Number of Message Queues Defined
    Value of SYS_PV Changed in udtconfig
Process Errors
    Values of N_TMQ and N_PGQ Are Zero
    UniData Daemon Killed
Errors During Processing
    Archive Files Are Full
    aimglog and bimglog Executables Have Been Removed
Parameter Limits Exceeded
    MAX_OPEN_FILE
    N_AFT
    BPF_NFILES
Files Are Not Being Treated as Recoverable
    File is Not Defined as Recoverable
    SB_FLAG Turned Off
    Recoverable File System Not Licensed
Warning Messages
    Log Files Are Too Small


This chapter contains sample error messages and possible reasons for the error messages you may encounter when running the Recoverable File System.

Possible Errors

Failure of UniData to Start

A variety of circumstances may prevent UniData from starting. When you invoke the startud command and UniData fails to start, you may see a message on your screen similar to the following example:

# startud
Using UDTBIN=/usr/ud72/bin
All output and error logs have been saved to /usr/ud72/bin/saved_logs directory.
SMM is started.
SBCS is started.
Recovery of UniData system failed.
Stopping the system...
SMM stopped successfully.
UniData system has been shutdown.

File Log Size Too Small

The size of the file-level log must be at least NUSERS + 1. If this is not the case, an error similar to the following appears in udtbin/sm.log:

ERROR in function U_s_setupsca(), file s_setsca.c:
***file log too small. --- called by process 29539
File log size should be at least 51 block.
Please change the file log size and run cntl_install,then startup system.

Increase the value of the file-level log size in the logconfig file, run cntl_install, and start UniData.

Note: UniData overwrites the sm.log located in udtbin every time you execute startud. If you execute startud more than once, check the sm.log located in the udtbin/saved_logs directory, where the last 20 sm.logs are saved.


Inadequate Number of Message Queues Defined

The kernel parameter which determines the maximum number of message queues allowed systemwide (usually msgmni) must be large enough to accommodate the number of message queues required by UniData when running the Recoverable File System.

When you start UniData, UniData displays a message similar to the following example:

Using UDTBIN=/usr/ud72/bin

All output and error logs have been saved to ./saved_logs directory.

Couldn't start SMM. Please check /usr/ud61/bin/smm.errlog
UniData Not Started

udtbin/smm.errlog will contain a message similar to the following example:

Error when creating TM message Q (errno=28).

The minimum number of message queues required is defined by the following formula:

1 queue/4 users for TMQ + 1 queue/4 users for PGQ + 10 queues for UniData daemons

For example, a 100-user license needs a minimum of 60 message queues for UniData. If you have other applications on your system which require additional message queues, make sure the value of the kernel parameter defining the maximum number of message queues is large enough to allow for all queues required for all applications on your system.
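The formula can be evaluated for any license size with a short Python sketch. The function name is hypothetical, and rounding each per-user term up to a whole queue is an assumption for user counts that are not a multiple of four.

```python
import math


def min_message_queues(users, daemon_queues=10):
    """Minimum message queues: 1 per 4 users for TMQ, 1 per 4 users for PGQ,
    plus a fixed number for the UniData daemons."""
    # Assumption: a partial group of four users still needs a whole queue.
    per_type = math.ceil(users / 4)
    return per_type + per_type + daemon_queues


print(min_message_queues(100))
```

For a 100-user license this gives 25 + 25 + 10 = 60, matching the example above. Compare the result, plus the needs of any other applications, with the kernel limit on message queues.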

Increase the value of the appropriate kernel parameter, or alternatively, decrease the value of N_TMQ and N_PGQ in /usr/ud72/include/udtconfig temporarily until the kernel can be rebuilt.

Note: To view the number of message queues defined for your system, use the UniData system-level kp command. The number of message queues is the value of msgmni on most UNIX systems. If the value of msgmni is not a positive integer, contact your system administrator for the proper kernel value.

Value of SYS_PV Changed in udtconfig

You should not change the value assigned to SYS_PV unless you are explicitly asked to do so by IBM Technical Support. SYS_PV describes the type of semaphore operations for the Recoverable File System. The value of SYS_PV is platform-dependent and is determined by UniData during the installation process. If you change the value, unpredictable results will occur.

Process Errors

Values of N_TMQ and N_PGQ Are Zero

If the values of N_TMQ and N_PGQ in /usr/ud72/include/udtconfig are 0 and the value of the SB_FLAG is 1, UniData will not start.

The udtbin/smm.errlog will contain a message similar to the following example:

# pg $UDTBIN/smm.errlog
SB_FLAG is turned on, N_PGQ and N_TMQ should not be 0.
udtconfig parameter(s) holds invalid value(s). Please use 'udtconf' to check it.
Exit: Some parameters are missed or wrongly set. Please check the udtconfig file.

Make sure all of the UniData daemons are stopped and no message queues associated with UniData remain. Set the value of N_TMQ and N_PGQ in /usr/ud72/include/udtconfig to the proper value using any UNIX text editor. Then start UniData.

Note: UniData should have 1 queue per 4 users for both N_TMQ and N_PGQ.

UniData Daemon Killed

If one of the daemons is killed during processing, UniData will shut down. UniData writes a message similar to the following example to the udtbin/sm.log:

# /disk1/ud72/bin
# pg sm.log
Checking log files .....
----- SM (16212) is started at Apr 28 1999 14:41:14 -----

Unidata Environment : /usr/ud61/include

SM: Restart_Flag = 0.
SM checked: cm (pid = 16213): Stopped because of Kill
----- System Crashed at Apr 28 1999 14:46:59 -----
All possible CM & TMs & AIMGLOGs & BIMGLOGs killed
Dumping the system buffer to "/disk1/ud72/bin/rfs.dump"...... Done.
journal stopped successfully.
SBCS stopped successfully.
CLEANUPD stopped successfully!!
SMM stopped successfully.
All user processes killed.
SM exited with code 5.
(EOF):

Execute the showud command to make sure all the remaining daemons are stopped. If they are not, stop UniData with the stopud -f command to force the remaining daemons to shut down. Check the error logs in udtbin to see if UniData detected any circumstances which may have caused the process to be killed, and resolve any errors. Then restart UniData.

Errors During Processing

Archive Files Are Full

If you are running archiving as part of the Recoverable File System and UniData appears to hang, it may be that the archive files are full and UniData is waiting for the full archive to be off-loaded. A message appears on the system console, in the window where UniData was started, and in the udtbin/sm.log, indicating that the archive file is full and must be off-loaded. Contact your system administrator to copy the archive to the appropriate place.

Note: IBM recommends that you use the arch_backup script to automatically off-load full archive files to tape or to disk to ensure that the system does not appear to hang during normal processing. One archive file must be at least as large as one full set of log files so that one checkpoint does not cross multiple archive files.

aimglog and bimglog Executables Have Been Removed

If you experience a system crash and attempt to save all of the error logs located in udtbin to another directory, do not save the logs using the UNIX mv *log command. This command will cause the aimglog and bimglog executables to be moved as well as the error logs, and UniData will appear to hang since these executables have been removed.

Stop UniData and move the aimglog and bimglog executables back to udtbin. Then start UniData.

Parameter Limits Exceeded

A variety of conditions can cause a UniBasic program to report a runtime error or execute an ELSE clause in an open file statement. If the UniBasic program is performing an operation against a recoverable file, you may have exceeded a parameter defined in /usr/ud72/include/udtconfig.

MAX_OPEN_FILE

MAX_OPEN_FILE is used by UniBasic as the maximum number of open files allowed per process for all types of hashed files, including recoverable and nonrecoverable dynamic and static files. If this limit is exceeded, UniData displays a runtime error message (too many open files at line nn).

N_AFT

N_AFT defines the maximum number of unique recoverable files that can be open at one time, systemwide. Remember that a secondary index occupies one entry in this table. If you exceed this limit, a UniBasic program will execute the ELSE clause in an open file statement.

BPF_NFILES

BPF_NFILES is the logical limit of the total number of recoverable files one process can have open in UniBasic. If this limit is reached, UniData will swap in a new file by physically closing a file and physically reopening a file as it is accessed later. No error message will be generated, but system performance will be negatively impacted if this parameter is set too low.


Files Are Not Being Treated as Recoverable

A number of factors may cause a file to be treated as nonrecoverable, even though you have chosen to run UniData with the Recoverable File System.

File is Not Defined as Recoverable

You may not have converted a file to recoverable or created a file as recoverable. To view a file’s type, execute the udfile command. If the file is nonrecoverable, stop UniData, convert the file to recoverable by executing the udfile -r command, and restart UniData. For more information on the udfile command, see Chapter 5, “Creating and Configuring Recoverable Files.”

SB_FLAG Turned Off

Make sure the SB_FLAG in /usr/ud72/include/udtconfig is set to a positive integer (normally 1). If this flag is 0, or was 0 the last time UniData was started, the Recoverable File System is turned off. To ensure that the Recoverable File System is running, execute the UniData UNIX showud command. You should see a display similar to the following:

# showud
UID PID TIME COMMAND
root 17413 0:00 /liz1/ud72/bin/aimglog 0 28315
root 17414 0:00 /liz1/ud72/bin/aimglog 1 28315
root 17415 0:00 /liz1/ud72/bin/bimglog 2 28315
root 17416 0:00 /liz1/ud72/bin/bimglog 3 28315
root 17426 0:02 /liz1/ud72/bin/cleanupd -m 10 -t 20
root 17410 0:03 /liz1/ud72/bin/cm 28315
root 17422 0:00 /liz1/ud72/bin/repmanager
root 17404 0:00 /liz1/ud72/bin/sbcs -r
root 17409 0:00 /liz1/ud72/bin/sm 60 11477
root 17397 0:00 /liz1/ud72/bin/smm -t 60
root 22934 0:00 /liz1/unishared/unirpc/unirpcd

If you do not see aimglog, bimglog, cm, or sm daemons, the Recoverable File System is turned off. Stop UniData and change the SB_FLAG to 1 in /usr/ud72/include/udtconfig using any UNIX text editor. Then start UniData.

Recoverable File System Not Licensed

UniData may not have been licensed for the Recoverable File System. To see if UniData is licensed for the Recoverable File System, execute the UniData UNIX confprod command. If the value of the Users/Licensed column is next to Recoverable File System, your system is not licensed to run the Recoverable File System.

You must be logged in as root to stop and start UniData, or to convert a file using the udfile command.


Warning Messages

Log Files Are Too Small

If the log files defined in /usr/ud72/include are not large enough for your application and load, a checkpoint will occur when the log files are 80 percent full, regardless of the checkpoint time defined in udtconfig. A message will appear in udtbin/sm.log similar to the following:

*****!!! Restart Finished !!!*****
Checking log files .....
----- SM (14409) is started at Apr 28 2004 11:58:30 -----

Unidata Environment : /usr/ud72/include

WARNING: The log file is too small to contain the log records. Please enlarge the log files.

UniData should not encounter any error conditions if the log files are too small, but checkpoints will occur frequently, causing system performance to degrade.

Stop UniData and increase the size of the log files in the logconfig file. Run cntl_install and start UniData using the startud command.

Note: IBM recommends that you increase the size of the log files by increasing the log length parameter in the logconfig file, rather than increasing the block size. Make sure the block size corresponds to the UNIX file system block size, or 4096 if you do not know the UNIX file system block size.
