The Recovery and Preservation of Critical Exploration Datasets for a Large Multinational Oil Company

Guy C. Holmes – BSc, MBA
Chief Executive Officer
SpectrumData
Suite 1, 14 Brodie Hall Drive, BENTLEY WA 6102


Introduction

In February 2002, a large multinational oil company requested a project to consolidate, and in many cases reconstruct, a large dataset of approximately 80,000 original magnetic tapes of various ages, formats, media types, and conditions. The collection contained data acquired during 30 years of oil and gas exploration in over 50 countries.

The project requirements were unique for a number of reasons. The most interesting and challenging was that this was the company's second attempt at the project, a first attempt by another party having failed. That failed attempt left portions of the data in jeopardy of being permanently lost, corrupted, or disassociated from their invaluable metadata.

The project involved reading the tapes, consolidating the data into logical data sets, converting the various data types to an industry standard format, and outputting the data in triplicate to a new set of high density data cartridges. The vast majority of the data in this collection was seismic survey data, seismic surveying being the principal methodology used in oil and gas exploration.

The tape collection consisted of the following tape types:

− 9 track reel to reel tape
− 3480 cartridge
− 3490E cartridge
− 8mm Helical Scan Cartridge
− 4mm DDS DAT Cartridges
− Digital Linear Tape (DLT)
− A variety of smaller, lesser known media types, including DC2120, DC6150, and 7 track magnetic tapes

The Consequences of Removing or Modifying Blocking Structures From Files

As this project had already been attempted once by another party, the first essential task was to establish exactly what had been done before our involvement. An initial review of the data found that most of the low density tapes that still needed to be read were severely damaged and deteriorated. In most cases the tapes that had not been converted in the previous failed project represented small portions of a larger dataset that had been successfully copied to higher density media. As an example, a data set previously recorded on 800 9 track tapes was now held on 10 DLT IV cartridges, with the exception of 40 of the original 9 track tapes that had not been read due to deterioration or damage.

The higher density DLT IV cartridges created in the previous project were not one to one identical copies of the original 9 track tapes. Instead, each DLT IV cartridge contained many individual 9 track tapes, written to DLT IV in an altered, de-blocked format, with only a file mark between the end of one original 9 track tape and the start of the next.

To fully appreciate the complexity of this restoration and migration project, one needs a basic understanding of the underlying structure of data stored on magnetic tape. Magnetic tape is a linear recording medium. Locating a specific record on a linear magnetic tape requires reading or passing over every record recorded before it, so a tape drive may have to read through almost the entire spool of tape before it can read the record requested by the user. As an example, to get to the fifth record on a tape the drive must pass over the first four records before it can read the fifth. To write data to tape, the tape drive writes sequentially, one record after another along the length of the tape. Data cannot be written to linear tape at an arbitrary location without the risk of overwriting existing data. For all pre-existing data on a tape to remain, new data must be written after the end of the existing data sets.

Tape drives write data to tape in blocks. Each block consists of a number of bytes, and typically the software controlling the tape drive determines how many bytes per block it will write. These blocks are separated by inter-record gaps (effectively blank tape). A group of blocks written on a tape and followed by a marker called a file mark constitutes a logical file on the tape. Tape drives use these file marks and inter-record gaps to seek to particular locations on the tape for specific data. More than one logical file can be written to a tape, and each may contain many physical files. Logical files on tape contain at least one block of data but typically contain many hundreds or thousands of blocks. In most cases, software used to read data from tape requires that the data match a defined file and blocking structure before it can be successfully read and interpreted.

To further appreciate the complexity of this project, it is important to understand how even the smallest modification to the blocking structure of a specified data format can directly affect the ability of software to interpret the data. As this project required the conversion of a vast amount of seismic data, I have chosen a highly specified seismic data format known as SEGB to demonstrate that a small change in blocking structure can have a very large impact on data integrity.
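The file and block structure described above can be modelled in a few lines of Python. This is an illustrative sketch only: the in-memory list standing in for a tape, the `FILE_MARK` sentinel, and the `read_logical_files` and `block_lengths` helpers are assumptions made for this example and do not correspond to any real tape drive interface.

```python
from typing import Iterable, List, Optional

# Hypothetical sentinel standing in for a file mark on tape.
FILE_MARK = None


def read_logical_files(tape: Iterable[Optional[bytes]]) -> List[List[bytes]]:
    """Walk the tape sequentially and group blocks into logical files."""
    files: List[List[bytes]] = []
    current: List[bytes] = []
    for item in tape:
        if item is FILE_MARK:
            # A file mark closes the logical file built so far.
            files.append(current)
            current = []
        else:
            # Anything else is one block, exactly as it was written.
            current.append(item)
    if current:
        # Blocks after the last file mark form an unterminated file.
        files.append(current)
    return files


def block_lengths(files: List[List[bytes]]) -> List[List[int]]:
    """Reduce each logical file to the list of its block lengths."""
    return [[len(block) for block in one_file] for one_file in files]


# Two logical files, each a short block followed by a longer block.
tape = [b"H" * 240, b"D" * 1000, FILE_MARK,
        b"H" * 240, b"D" * 800, FILE_MARK]
files = read_logical_files(tape)
layout = block_lengths(files)
print(layout)
```

The list of block lengths per file is the structure that reading software depends on; the bytes alone do not carry it.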

Field Seismic Recording

Exploration companies use the seismic method as their primary means of geo-scientific investigation when exploring for oil. A seismic survey essentially consists of a seismograph, an array of seismic receivers known as geophones, and a synthetic source of seismic energy. This synthetic seismic energy, when released, travels through the different layers of the earth and is eventually reflected back to the surface. The time it takes for the energy to reach the surface and the wavelength of the returning seismic energy are measured by the geophones. For each burst of seismic energy a seismic shot record is created and written to tape as a single file. This seismic shot file is typically a multiplexed file and is generally written to tape as either one or two blocks of data per logical file.

As discussed in the introduction of this paper, many of the tapes received for this project were duplicates in which the data had been copied from many original 9 track tapes to a single new DLT IV cartridge. Because the capacity of a DLT IV cartridge is much greater than that of the original 9 track tapes, it was not uncommon to find that several hundred original 9 track tapes had been copied onto a single DLT IV cartridge. One of the critical issues created by this transfer during the first failed attempt at the project is that none of the original file and blocking structure stored on the 9 track tapes was transferred to the new DLT IV cartridges. Essentially, data from a single 9 track tape consisting of many files, where each file contained many blocks, was transferred into a single file on a new tape with a different block structure. The removal of the original blocking and file structure created some interesting and challenging technical issues.

Firstly, true preservation of the data required that it first be returned to its original recording format, including all vital file and blocking structures. This would then allow each seismic shot to be identified, validated, and preserved before any conversion or migration process was applied. For most SEGB seismic field data, the first block of a SEGB shot file is referred to as the “header” block and the second block as the “data” block. Most software applications that read field seismic data require that the header block be correctly formatted and a specific number of bytes in length. The header block often contains vital information about the data block that follows it on the linear tape, and in most cases a data block in isolation (without a header block) cannot be interpreted by software. The length in bytes of a SEGB header or data block may vary from one shot file to another.

As the data was binary and had lost its original blocking structure during copying, the resulting file was a stream of bytes that no longer contained the vital blocking structures needed to delineate one shot from another, or one header from another. Instead of 100 seismic shot files, each 960,240 bytes long (a 240 byte header block followed by a 960,000 byte data block), a new file of 96,024,000 bytes (100 x 960,240 byte original files concatenated together) had been created on tape. This new file was also written to tape with a block length of 10,240 bytes. The original blocking structure of the data was now lost, and what had once been only two blocks per file had become a single logical file of over 9,000 blocks. See Figure 1.

Figure 1 – Blocking Structure Changes Through Migration Process
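The figures in the example above can be checked directly. The sketch below first reproduces the byte and block counts from the text, then uses a scaled-down synthetic data set to show how re-blocking discards shot boundaries. The helper names (`blocks_needed`, `reblock`, `split_shots`) and the synthetic fill bytes are hypothetical. The final splitting step assumes the header and data lengths are already known and constant, which, as noted above, is not true of SEGB in general.

```python
HEADER_LEN = 240        # bytes in each header block (from the example)
DATA_LEN = 960_000      # bytes in each data block (from the example)
SHOTS = 100             # shot files on the original tape
NEW_BLOCK_LEN = 10_240  # block length used on the migrated tape


def blocks_needed(total_bytes: int, block_len: int) -> int:
    """Number of fixed-length blocks needed; the last one may be short."""
    return -(-total_bytes // block_len)  # ceiling division


def reblock(stream: bytes, block_len: int) -> list:
    """Cut a byte stream into fixed-length blocks, ignoring shot boundaries."""
    return [stream[i:i + block_len] for i in range(0, len(stream), block_len)]


def split_shots(stream: bytes, header_len: int, data_len: int) -> list:
    """Re-split a stream into (header, data) pairs of known, fixed lengths."""
    shot_len = header_len + data_len
    if len(stream) % shot_len:
        raise ValueError("stream length is not a whole number of shots")
    shots = []
    for start in range(0, len(stream), shot_len):
        header = stream[start:start + header_len]
        data = stream[start + header_len:start + shot_len]
        shots.append((header, data))
    return shots


# Counts for the example in the text.
total_bytes = SHOTS * (HEADER_LEN + DATA_LEN)
new_blocks = blocks_needed(total_bytes, NEW_BLOCK_LEN)
print(total_bytes, new_blocks)

# Scaled-down demonstration: 3 shots, 4 byte headers, 10 byte data blocks.
small_shots = [(bytes([n]) * 4, bytes([n + 100]) * 10) for n in range(3)]
stream = b"".join(header + data for header, data in small_shots)
deblocked = reblock(stream, 8)  # block edges no longer match shot edges
recovered = split_shots(b"".join(deblocked), 4, 10)
```

In the scaled-down case the shots can be recovered only because the two lengths are supplied from outside the data. The re-blocked stream itself carries no record of them.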

To conventional seismic software, the resulting new data structure would have been completely uninterpretable, as the interpretation of data from tape by software depends heavily on the blocking structure of the data itself. SpectrumData was able to develop complex software routines that navigated the new blocking and file structure of the data and converted it back to its original format. The conversion was done to hard disk as a virtual tape set which could be interpreted by complex seismic conversion software applications.

As described in the introduction, there were tapes that were never copied to DLT IV due to their advanced state of deterioration. Because these tapes had not been copied, their file and blocking structure had been retained. Ironically, the most valuable source of information about the file and blocking structure needed to salvage the bulk of the data from DLT IV came from the most seriously deteriorated and damaged tapes in the collection. In essence, had the previous attempt at this project been successful, little to no information would have been available to reconstruct the data. It was only because the project failed the first time that it could succeed the second.

Upon completion of the re-blocking and reconstruction of these files, SpectrumData then had to turn its attention to the damaged and deteriorated originals. Below is a description of stiction and how it affected these tapes.

Stiction

Stiction refers to a condition whereby a magnetic tape sticks to the tape drive head as it passes over the head and through the tape transport. The term is a combination of the words sticky and friction (sticky friction), coined in the 1970s when the condition was first seen. Since then, the meaning of the term has broadened to cover similar conditions on both tapes and hard disks, and it is now usually read as a combination of the words static (rather than sticky) and friction. For hard disks, stiction refers to a condition where the read/write heads of a hard drive become stuck to the disk platters, preventing the platters from spinning. When used in reference to magnetic tapes today, stiction means the same as it did in the 1970s, the only change being that its cause is better understood.

Stiction on tape is related to the degradation of the binder layer of the tape itself. This degradation causes chemical changes that result in the layers of tape wound on a spool sticking together. In addition, the tape will stick to the tape transport mechanisms and the read/write head of the tape drive. Magnetic tape is generally a polyester backing tape with a magnetic oxide coating held in place by a binding agent. The binding agent not only holds the oxide to the polyester backing but also provides lubricants to the surface of the tape. This lubricant prevents the tape from sticking to the tape heads and to the adjacent layers of tape on the spool.

Over time, the binding agent becomes hygroscopic and absorbs moisture. This moisture absorption causes the oxide layer to lift from the tape or to become sticky. Virtually all binding agents absorb moisture over time, but some appear to do so more rapidly and to a greater degree than others. As this absorption occurs, chemical changes take place which tend to soften the binder. In addition to becoming softer, the binder also expands and becomes sticky. When this expansion occurs across the many thousands of layers of tape wrapped around a spool, whether on a reel to reel tape or in a cartridge, the layers can succumb to the pressure and begin to stick to each other. As pressures build up, the softened components within the binder act as an adhesive instead of a lubricant and tend to glue consecutive layers of the tape together.

When a stiction affected tape is read on a tape drive, damage can occur in several ways:

1. Peeling as tape is removed from the reel, where portions of the binder oxide layer are removed from one layer of tape and redeposited on the next.
2. Gouging of the softened binder by the tape head, cleaning blades, and/or other transport components.
3. Destruction of the tape caused by jamming.
4. Head adhesions (the binder melts and adheres the tape to the tape head).

Two examples of tape damage encountered on 1986 seismic field tapes recorded on Memorex MRX V media.

The symptoms associated with reading a stiction affected tape are:

− Increase in audible noise (often a grinding sound or high pitched squeal).
− 'Peeling' of tape from the input reel instead of a smooth release.
− Erratic movement through the tape path and across the tape head.
− Head adhesion when the tape is stopped in contact with the tape head.

When a stiction affected tape is read on a high speed drive, cleaned, or exercised, damage to the tape will almost invariably ensue. In many cases this damage can be significant, with large sections of tape destroyed and the data from those sections lost forever. Tapes affected by stiction can be preserved, but only by migrating their data to new media.

Stiction can often be reversed to a sufficient degree that data can be recovered from a tape, but this reversal is not permanent and tends to last only about 14 days on average. SpectrumData uses proprietary low humidity tape ovens to perform this reversal, which involves heating the tapes for 24 to 48 hours at approximately 55 °C. This process hardens the binder and, in many cases, provides a window of opportunity during which data recovery can be performed. Magnetic tapes that have been treated for stiction generally need to be read within 3 days of treatment, or the tape will revert to its former condition by absorbing moisture from the atmosphere. Certain brands of tape respond very well to this treatment; others are largely unaffected by it and remain delicate and difficult to recover data from without specialised and time consuming handling.

By employing this recovery process, SpectrumData was able to recover 98% of the original data from the highly deteriorated and damaged media. Once recovered and preserved, the data had to be merged back with the DLT IV data described above.

Lessons Learned

Firstly and most significantly, the migration of data from one tape to another requires careful consideration and planning. A change of even one byte in the length of a single block on tape can significantly affect one's ability to access the data in the future.

Secondly, as a general rule, legacy magnetic media can be highly susceptible to deterioration. Experience over the last 20 years of tape data recovery has shown that poor storage conditions, together with the manufacturer and brand of tape, are the most significant factors in determining a tape's lifespan. As a general rule, use tapes with back coating technology such as Imation or 3M Blackwatch. These tapes tend not to be affected by the absorption of moisture to the same extent as tapes without back coating.

Lastly, metadata such as original blocking structures and format specifications are as important in a recovery project as the data itself. The migration of bytes of data to new media does not in itself constitute preservation, and in many cases can render invaluable data more vulnerable.
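One way to act on the first and last of these lessons is to record the block layout of each tape before migration and compare it with the copy afterwards. The following is a minimal sketch, assuming each tape has already been read into a list of logical files, each a list of blocks. The function names and the use of SHA-256 digests are choices made for this example and are not taken from the project described here.

```python
import hashlib


def fingerprint(files):
    """Per logical file, list each block's length and SHA-256 digest."""
    return [[(len(block), hashlib.sha256(block).hexdigest())
             for block in one_file]
            for one_file in files]


def flatten(files):
    """Concatenate every block of every file into one byte stream."""
    return b"".join(block for one_file in files for block in one_file)


def content_preserved(original, migrated):
    """True if the raw bytes match, regardless of blocking."""
    return flatten(original) == flatten(migrated)


def structure_preserved(original, migrated):
    """True only if files, block lengths and block contents all match."""
    return fingerprint(original) == fingerprint(migrated)


# Original: two logical files, each a header block and a data block.
original = [[b"H" * 240, b"D" * 960], [b"H" * 240, b"D" * 960]]

# De-blocked copy: one logical file cut into fixed 512 byte blocks.
stream = flatten(original)
migrated = [[stream[i:i + 512] for i in range(0, len(stream), 512)]]

bytes_ok = content_preserved(original, migrated)
layout_ok = structure_preserved(original, migrated)
print(bytes_ok, layout_ok)
```

A copy like the one above passes a byte-for-byte comparison while failing the structural one, which is the situation the de-blocked DLT IV cartridges were in.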

Acknowledgements

SpectrumData was previously a division of Encom Technology, based in Sydney, Australia. I would like to acknowledge Dave Pratt and, in particular, Ian Grierson for their contribution over the years to pioneering and developing our understanding of tape deterioration and the recovery process, especially in the areas of stiction and stiction reversal treatment.

References

1. Northwood, E. J., Weisinger, R. C., and Bradley, J. J., 1967, Recommended standards for digital tape formats: Geophysics, v. 32, p. 1073-1084.
2. Grierson, I., and Holmes, G. C., 1999, Tape and Data Recovery – Protecting Your Investment, October 1999.