IBM System i Parallel Save and Restore
By Nancy Roper, May 18, 2006

A shortened version of this article was published in the June 2006 issue of the COMMON Connect magazine, volume 3, number 3, on pages 20-22.
About the Author: Nancy Roper is a Consulting IT Specialist. She currently works in the IBM Americas Advanced Technical Support group, assisting the largest System i customers with their availability strategies. Nancy is a seasoned technical expert on System i tape, SAN, and BRMS, and is co-author of the redbook “iSeries in a Storage Area Network” (SG24-6220).
Many customers are interested in spreading their backups across multiple drives simultaneously in order to shorten their backup time. The IBM® System i™ platform offers several techniques to do this, some of which have special considerations. This article will begin by outlining the various multi-streamed backup options. It will then discuss a situation with referential database constraints where care is required with multi-streamed saves. Finally, the article will delve into each of the three multi-streamed backup options, explaining how to use the save technique, and the best way to restore that type of save.
Overview of Techniques for Running Multi-Streamed Backups on the System i Platform

There are three different techniques for running multi-streamed backups on the System i platform:

- Concurrent Saves
- Parallel-Parallel Saves
- Parallel-Serial Saves
Concurrent saves have been available since the early days of the AS/400™. Multiple save jobs are run simultaneously, each using one drive. Concurrent saves are being used very successfully by many customers around the world, with very few considerations.
Parallel-parallel saves were introduced at V4R4 and were intended for customers who needed to reduce the time to save a single large object or library by splitting it across multiple drives. To use this type of save successfully, use Backup, Recovery and Media Services (BRMS) for the save, recover to a system that has access to the BRMS information about the save, and use the same number of drives for the restore as you used for the save.
Parallel-serial saves were introduced at V5R1 to allow parallel saves to run against multiple libraries. They basically run a set of concurrent saves, but BRMS and i5/OS® divide the libraries into the save streams rather than the user. To use parallel-serial saves successfully, use BRMS for the save, and use a special recovery technique to get multiple drives working on the restore simultaneously.
Referential Constraints

When a database has referential constraints, there are some scenarios where multi-streamed restores can fail due to seize issues with the various objects related to the constraint. As one object is being restored, the system holds a seize on it. If a related object is restored at the same time, then the second restore will be unable to get the necessary seize on the first object, and the restore will fail.
For customers who have referential constraints in their databases, there are three ways to work around this issue:
(1) Use a single-stream recovery rather than a multi-streamed recovery for the libraries that have files involved in the referential constraint.
(2) If the related objects are all in a single library, and if that library is saved using a save-while-active command, then i5/OS is able to handle this situation (see the sketch after this list).
(3) If the save is a parallel-parallel save, and the constraints are among objects in a single library, and the recovery is done with the same number of drives as the save, then i5/OS is able to handle the situation.
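A minimal sketch of option (2), assuming the related files all live in a single, hypothetical library named APPLIB and a tape device named TAP01:

   SAVLIB LIB(APPLIB) DEV(TAP01) SAVACT(*SYNCLIB)

SAVACT(*SYNCLIB) brings all objects in the library to a single common checkpoint, which is what lets i5/OS keep the constraint-related objects consistent with one another.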
These considerations for customers with referential constraints are in addition to the general considerations described in the sections below.
Concurrent Saves

For this technique, the user decides how to allocate the objects among the tape drives, then issues one command or job for each drive. The various drives run independently from one another, e.g.:

Drive 1: SAVLIB LIB(A* B* C* ... H*)
Drive 2: SAVLIB LIB(I* J* K* ... R*)
Drive 3: SAVLIB LIB(S* T* U* ... Z*)
etc.
Some tinkering will be required to make the save streams approximately the same size so they will end at approximately the same time. Care must be taken to ensure that all desired objects are included in the save, including libraries that start with non-alphabetic characters, the Document Library Objects (DLO), the Integrated File System (IFS), spooled files, etc.
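For illustration, each stream can be submitted as its own batch job. This is a hedged sketch, assuming four tape devices named TAP01 through TAP04 (the device names, library ranges, and job names are hypothetical):

Drive 1: SBMJOB CMD(SAVLIB LIB(A* B* C*) DEV(TAP01)) JOB(SAVSTRM1)
Drive 2: SBMJOB CMD(SAVLIB LIB(I* J* K*) DEV(TAP02)) JOB(SAVSTRM2)
Drive 3: SBMJOB CMD(SAVDLO DLO(*ALL) FLR(*ANY) DEV(TAP03)) JOB(SAVSTRM3)
Drive 4: SBMJOB CMD(SAV DEV('/QSYS.LIB/TAP04.DEVD') OBJ(('/*') ('/QSYS.LIB' *OMIT) ('/QDLS' *OMIT))) JOB(SAVSTRM4)

The last two streams pick up the DLO and IFS data mentioned above; the IFS stream omits /QSYS.LIB and /QDLS since those objects are already covered by the SAVLIB and SAVDLO streams.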
Recovery is fairly simple. The sets of tapes can be restored using the same number of drives or fewer drives, by mounting the tapes and issuing the corresponding restore command. The key consideration on the restore is for customers who have their physical/logical files or their journals/receivers in different libraries: they need to ensure that the files are restored in the proper order, otherwise an error message will occur, and the missed files will need to be restored separately later. Careful planning of the save streams can usually allow the files to be restored in a single pass, even for customers with physicals/logicals and journals/receivers in different libraries.
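As a minimal sketch of one such restore stream, assuming libraries ALIB, BLIB, and CLIB (hypothetical names) were saved in that order on the tape in drive TAP01, a small CL program can restore them in sequence while leaving the tape positioned between restores:

   PGM
      RSTLIB SAVLIB(ALIB) DEV(TAP01) ENDOPT(*LEAVE)  /* stay positioned on the volume */
      RSTLIB SAVLIB(BLIB) DEV(TAP01) ENDOPT(*LEAVE)
      RSTLIB SAVLIB(CLIB) DEV(TAP01) ENDOPT(*REWIND) /* rewind after the last library */
   ENDPGM

Submitting one such job per drive keeps all the drives restoring simultaneously.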
Parallel-Parallel Saves

Some customers who needed to shorten their backups could not do so with concurrent saves, because their data included a single large object that made up the bulk of the save. Prior to V4R4, there was no way to split this object across drives, so that one stream became the limiting factor in shortening the backup window. Adding more drives to the save was no help.
To assist these customers, IBM® introduced parallel-parallel saves. For a parallel-parallel save, the user issues a single save command, but asks i5/OS to run it across multiple drives, e.g.,
SAVLIB LIB(biglib) DEV(*MEDDFN) MEDDFN(biglibdfn)
(where "biglib" is the library with the large object, and where the "biglibdfn" media definition indicates multiple drives)
Although it is possible to implement this save using i5/OS commands, it is fairly difficult since media definition objects need to be created using an application programming interface (API) for both the save and the restore. For example, the media definition for the restore tells i5/OS the details of the data on the tapes and the file sequence numbers where the data can be found. Rather than doing this manually, customers are encouraged to use BRMS since it creates the media definition files in the background using the save/restore APIs and the volume and file sequence information in the BRMS database. Customers simply tell BRMS how many drives they would like to use for the save or restore, and BRMS handles the rest.
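A hedged sketch of the BRMS route, assuming a library named BIGLIB and a media class with four available drives behind DEV(*MEDCLS) (the names and resource counts are illustrative; check the PRLRSC parameter on your BRMS release):

   SAVLIBBRM LIB(BIGLIB) DEV(*MEDCLS) PRLRSC(4 4)

PRLRSC(4 4) asks BRMS for a minimum and maximum of four parallel device resources; BRMS creates the media definition behind the scenes and records the volume and file sequence details in its database for the restore.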
There are three key considerations for restoring this type of save:
(1) In order to restore the data, you need access to the BRMS database so the media definition can be created. This is easy for a customer who is restoring an object to a system in the same BRMS network where it was saved, or a customer doing a full system recovery who will re-load his BRMS information as part of the restore. However, it is more difficult to take a parallel-parallel tape to a system that is not in the same BRMS network and try to restore it since the media definition will need to be created manually.
(2) When doing the recovery, i5/OS will restore all parts of the first library from each tape, then go back and restore all parts of the second library from each tape, and so on. If you have the same number of drives on the restore as you had on the save, then i5/OS mounts all the tapes, moves quickly among them, and pulls the data off each tape in order. The recovery typically takes 1-2 times as long as the save. However, if the restore is done with fewer drives than the save, then i5/OS needs to mount each tape once for each library that is saved on it. Due to all the tape mounts, searches, rewinds, etc., this can take a very long time and is likely not practical.
(3) As discussed above, using the same number of drives for the save and restore will avoid seize problems when there are referential database constraints among objects in a single library.

The net of this is that parallel-parallel saves can be used successfully so long as saves are done with BRMS, restored to a system that has access to the BRMS information, and restored using the same number of drives as the save. As with concurrent saves, customers who have their physical/logical files and journals/receivers in different libraries need to plan their strategies so these objects are restored in the proper order.
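A sketch of such a restore, under the assumption that your BRMS release offers the same PRLRSC (parallel device resources) parameter on the restore command as on the save (verify this on your system):

   RSTLIBBRM SAVLIB(BIGLIB) DEV(*MEDCLS) PRLRSC(4 4)

Matching PRLRSC on the restore to the value used on the save is what keeps all the tapes mounted with the data pulled off each one in order, as described in consideration (2).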
Parallel-Serial Saves

Starting at V5R1, it became possible to specify generics (e.g. ABC*) and special values like *ALLUSR and *IBM, and BRMS special values like *ALLPROD, *ALLTEST, and *ASP01-*ASP99, on a parallel save. This type of save became known as a parallel-serial save since it was a cross between a parallel save (multiple drives used concurrently in a single job) and a regular ("serial") save. It was considered parallel since the system decided how to spread the libraries among drives, unlike the concurrent save where the customer decided how to spread the libraries. However, individual libraries are not spread across multiple drives: instead, i5/OS saves each library in its entirety to the next available drive, working from A to Z.
When you ask for a parallel save, BRMS and i5/OS decide whether to do a parallel-parallel or parallel-serial save, depending on the type of objects to be saved as follows:
- Single object: parallel-parallel
- Single library: parallel-parallel
- List of libraries: BRMS decides which to do, depending on various factors
- Generic libraries (e.g. LIB*): parallel-serial
- Special values (e.g. *ALLUSR, *ALLPROD): parallel-serial
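For illustration, a parallel-serial save of all user libraries across three drives could be requested as follows (the DEV and PRLRSC values are assumptions for this sketch):

   SAVLIBBRM LIB(*ALLUSR) DEV(*MEDCLS) PRLRSC(3 3)

Because LIB(*ALLUSR) is a special value rather than a single object or library, BRMS runs this as a parallel-serial save per the rules above.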
The tapes that result from a parallel-serial save have the libraries jumbled on them alphabetically due to the round-robin algorithm. For example, with a 3-drive save, if all the libraries were approximately the same size, the libraries might be spread something like the following:

Tape 1: A-lib, D-lib, G-lib, etc.
Tape 2: B-lib, E-lib, H-lib, etc.
Tape 3: C-lib, F-lib, I-lib, etc.
Now, as an aside: as mentioned above, programming convention on the System i platform says that customers should try to put their physicals/logicals or their journals/receivers in the same library. In this case, i5/OS is able to restore these related files successfully. For customers who need these objects to be in separate libraries, programming convention says to name the libraries alphabetically so that i5/OS will restore the base files first and the dependent files second when it does a restore in alphabetical order.
When restoring a parallel-serial save, BRMS tries to follow the above convention by restoring the libraries in alphabetical order. People are often surprised to learn that, regardless of how many drives they ask BRMS to use, the default restore only reads from a single drive at a time. If a customer asks BRMS to use the same number of drives for the restore as for the save, then the first tape of each set is loaded into each drive. However, i5/OS then moves through the drives one by one, restoring the libraries in alphabetical order, such that only one drive is in use at any given time.
If customers ask BRMS to use a single drive for the restore, then i5/OS mounts the first tape and restores the first library. BRMS then rewinds/dismounts the first tape and mounts/loads the second tape and restores the second library. This continues, cycling through the tapes one by one until all libraries are restored. Not only is the restore being performed on a single drive, but there are also numerous mount/search/rewind/dismount cycles to wait for.
Since a single-drive restore is typically not practical for customers, a work-around was devised whereby multiple restore jobs are submitted to get multiple drives working simultaneously. At V5R3 and lower, customers need to create these streams manually by carefully selecting the items to be restored using either i5/OS or BRMS. At V5R4, new function was added to BRMS in the STRRCYBRM screens to make this simpler. Here is a description of these various techniques:
During a recovery, the BRMS STRRCYBRM command brings up a screen showing all the objects that are available for restore. This list is in alphabetical order. There are three ways to handle this screen:
(1) If you ask BRMS to go ahead and do the default restore, you will find yourself in one of the situations described above where only a single drive is active on the recovery at a given time.
(2) The workaround for V5R3 and earlier releases is to submit the recovery in multiple sections. This needs to be done very carefully. Start by figuring out all the tapes that are in the first set. Then open the STRRCYBRM screen and go down through the list and select all the items that are on that set of tapes, then submit the command for restore. Then go through the list again, and select the items that are on the second set of tapes, and submit them for restore. Continue until all items have been selected from the list.
(3) Starting at V5R4, the same technique can be used, but BRMS offers assistance in selecting the items to be restored in each set. On the STRRCYBRM ACTION(*RESTORE) screen that lists the items for restore, there are two selection fields in the top right corner that let you specify *VOLSET and a volume number prior to using the F16=Select key. All saved items that are on the same set of tapes as that volume are marked with a '1'. You then use F9 for recovery defaults and F4 to submit the restore, and repeat this for each different save stream until all restores have been submitted.
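For reference, the screen in question is reached with a command along these lines (the OPTION value here is illustrative; use whichever value matches what was saved):

   STRRCYBRM OPTION(*ALLUSR) ACTION(*RESTORE)

From the resulting list, the *VOLSET selection and F16 marking described above are repeated once per tape set until every save stream has been submitted.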
As with concurrent and parallel-parallel saves, customers who have their physicals/logicals and journals/receivers in different libraries may receive error messages indicating that files could not be restored since their base file was not yet on the system. Operators need to check for these messages and go back and restore the missing files later. Alternatively, it may be possible to plan the save strategy to minimize these concerns, using one of the following techniques:

- Design the save strategy so all traditional libraries are in a single stream, then use the other streams for other items like DLO, IFS, etc. That way the traditional library stream will be on a single drive and will restore in alphabetical order without dependencies between drives.
- Design the streams so one stream saves all the dependent files (e.g. logical files and journal receivers). During the recovery, restore that stream last, after all the other streams are already back on the system.
The net of this is that parallel-serial saves can be used successfully so long as the special recovery technique is used to get multiple drives running simultaneously during the recovery, and the restore order of physical/logical files and journals/receivers is handled.
Note that parallel-serial saves are very similar to concurrent saves, except that i5/OS carves the backup into multiple streams and attempts to have them all finish at the same time based on the round-robin algorithm. For customers with data that varies from day to day, such that it would be difficult to pre-plan concurrent streams, parallel-serial saves offer a benefit. However, customers who are able to design their own concurrent streams may find that concurrent saves are a better strategy due to the simpler recovery prior to V5R4 and less overhead on the system.
Conclusion

Concurrent and parallel saves offer customers options to shorten their backup window. If customers understand the advantages and considerations for each, they can plan a save/restore strategy that will be a good fit for their organization. As always, be sure to plan a recovery test to practice the restore procedures.
The following are trademarks of International Business Machines Corporation in the United States, other countries, or both: AS/400, i5/OS, IBM, System i.
Many thanks to Debbie Saugen, David Bhaskaran, Mervyn Venter, Leonard Koser, Ritchie Nyland, Dan Amundson, Don Halley, Paul Koeller, Donna Fry and Bob Gintowt for their assistance with this article.