Beginners Guide to Data Corruption and How to Avoid It

© 2019 Veeam Software. Confidential information. All rights reserved. All trademarks are the property of their respective owners.

Contents

“But the backup was successful!”
Types of corruption
  Failed to decompress LZ4 block (and similar)
  All instances of storage metadata are corrupted
  Internal VM issues
  Misconfigurations
Tools and tips
  3-2-1 backup strategy
  SureBackup
  Health Check
  Veeam Validator
  Recommended job settings
What about the Agents?
Conclusion
About Veeam Software

When choosing a backup solution, one of the most important decision factors is reliability. Understandably, when it comes to restores, backup administrators expect not only flexibility from the software, but also a guarantee that data can be restored.
Given how multifaceted and nuanced the topic of data corruption is, it is safe to say that a vendor who guarantees 100% reliability of backups is not telling the whole truth. Still, many backup administrators simply assume that backups are fail-safe. As a result, a situation where data cannot be restored from a backup can come as a huge shock and is often seen as the backup vendor failing to provide what was promised.

In reality, there are different kinds of data corruption with different causes, and it is a misconception to put the blame purely on backup software. As you will see in this white paper, Veeam® Backup & Replication™ cannot be blamed for any of the corruption types described; if it could, the Veeam team would have fixed it a long time ago. On the bright side, many backup providers, including us here at Veeam, provide a number of tools that can help reduce the risk of running into an unrecoverable backup.

My hands-on experience in Veeam Support puts me at the center of some of these situations. In this white paper, I will examine different corruption types and provide advice on the countermeasures, based on experience from working with customers on various types of Veeam Backup & Replication infrastructures.

For administrators thinking of buying Veeam Backup & Replication and testing out our trial, I hope that this white paper achieves two goals:

1. Set the expectations straight on what Veeam offers and what we cannot promise, if we want to be honest.
2. Show that Veeam Backup & Replication has all the tools that, if used right, can make data loss a very unlikely event.

For existing customers, I encourage you to read the white paper to understand the potential risks and review your Veeam Backup & Replication setups to make sure you are using the product to its maximum potential.

Disclaimer

Much of the guidance in this content comes from first-hand experience working with support cases.
This white paper is not intended as a definitive guide, as it is not possible to cover every situation. New threats might also appear in the future. If you suspect a corruption issue, it is always advisable to open a Veeam support case so that the problem can be analyzed and resolved correctly.

“But the backup was successful!”

This is a phrase that support teams sometimes hear from clients when we must give the bad news. For us support engineers, this means one thing: a fundamental misunderstanding of the backup process and of what Veeam Backup & Replication offers as a product. It is a big mistake to shift responsibility for hardware, operating system and application health from proper monitoring tools to Veeam Backup & Replication.

Admittedly, sometimes Veeam Backup & Replication does seem to have such capabilities. It requires many components to work properly and uses many third-party APIs, so in my support practice I have seen countless cases where errors in Veeam Backup & Replication revealed underlying infrastructure issues that clients did not realize were there. However, this is no more than a positive side effect.

Before we discuss corruption types and the related countermeasures in more depth, it is important to highlight some fundamental principles that may already help to reveal potential risks of backup corruption. The main point is this: Veeam Backup & Replication does an image-level backup of a VM and saves this information to a backup file.

If the VM contains corrupted data (for example, one of its volumes became raw space), it will appear like that in the backup.

If the VM was configured incorrectly (for example, using an independent disk or a physical RDM), that will translate into data missing from the backup file.
If something happens to the backup file (due to storage problems, virus attacks or manual deletion), this backup will not be usable anymore.

All these examples may seem very obvious, but they describe some of the complaints that we get in our everyday support practice! So be your own counsellor, be wary of the quality of the data that you are backing up, and look for the core of a potential issue within your own setup first.

Types of corruption

In this section we will examine the most common situations that can make it impossible to restore data from backups.

Failed to decompress LZ4 block (and similar)

How to reveal corruption: SureBackup®, Health Check, Veeam Validator, attempt to decompress corrupted block.
KB on topic: https://www.veeam.com/kb1795

Data inside a backup file (.VBK, .VIB, .VRB) is stored in compressed blocks. A block can be saved incorrectly due to underlying issues with the storage. I will pass the mic to our Senior VP, Anton Gostev, who described it in the following manner:

In human language, the issues look like:

1) We ask storage to write “MOM,” but it writes “DAD” instead and returns success.
2) We ask storage to write “MOM,” and it writes “MOM” and returns success, but if you try to read the data block, you get “MAM.”
3) We ask storage to commit the write of “MOM” to disks, and it returns success, but does not actually write data to disks, keeping it in buffer for a short period of time for performance optimization purposes.

Answering your question, we can only judge on these reported successes, so we mark the job as successful. This is why it is very important to use SureBackup to verify that what was written into the backup file is what we asked, especially once you get this error at least once and your backup storage becomes a suspect.
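The failure modes above can be illustrated with a minimal sketch of read-back verification. It is not how Veeam Backup & Replication or SureBackup works internally; it only shows the general idea of recording a checksum at write time and comparing it later. Everything in it is an assumption made for illustration: the helper names (`write_block`, `verify_block`, `try_restore`, `flip_bit`), the use of zlib as a stand-in for LZ4 (which is not in the Python standard library), and the use of SHA-256 as the checksum.

```python
import hashlib
import os
import tempfile
import zlib


def write_block(path, payload):
    """Compress payload, write it to path, return the SHA-256 of the stored bytes."""
    stored = zlib.compress(payload)  # zlib stands in for LZ4 in this sketch
    with open(path, "wb") as f:
        f.write(stored)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to commit the write to the device
    return hashlib.sha256(stored).hexdigest()


def verify_block(path, expected_digest):
    """Re-read the stored block and compare it with the digest recorded at write time."""
    with open(path, "rb") as f:
        stored = f.read()
    return hashlib.sha256(stored).hexdigest() == expected_digest


def try_restore(path):
    """Attempt to decompress the stored block; return None if decompression fails."""
    with open(path, "rb") as f:
        stored = f.read()
    try:
        return zlib.decompress(stored)
    except zlib.error:
        # This is the analogue of a "failed to decompress block" error.
        return None


def flip_bit(path, offset):
    """Simulate storage-side corruption by flipping the lowest bit of one byte."""
    with open(path, "r+b") as f:
        f.seek(offset)
        original = f.read(1)
        f.seek(offset)
        f.write(bytes([original[0] ^ 0x01]))


def demo():
    """Write a block, verify it, corrupt it, verify again."""
    payload = b"MOM" * 1000
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "block.bin")
        digest = write_block(path, payload)
        ok_before = verify_block(path, digest)
        restored_before = try_restore(path)
        # The write "succeeded", but the stored bytes change afterwards.
        flip_bit(path, os.path.getsize(path) // 2)
        ok_after = verify_block(path, digest)
    return ok_before, restored_before == payload, ok_after


if __name__ == "__main__":
    print(demo())
```

Note the limits of such a check: a read issued right after a write may be served from a cache rather than from the disk, so an immediate read-back does not prove that the data reached stable storage. This is one reason why verification needs to be repeated on a schedule instead of being done once at write time.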
Even if data is saved correctly, there is still a risk that it can eventually be corrupted (an issue known as “bit rot”). No storage vendor can guarantee absolute data integrity; it is more a question of the number of errors per amount of data. Our only recommendation is to stay away from cheap low-end NAS devices that use dubious optimization techniques to show better performance and suffer every now and then from firmware bugs that can result in data corruption. Once again, Anton Gostev said it all years ago on the Veeam forum.

If a couple of 0s and 1s inside a compressed block get swapped, an attempt to decompress the block (typically during restore) will fail. There is both good and bad news here.

The good news is that the backup is still restorable. Veeam support can provide special modified agents that allow you to skip the corrupted blocks. So, if corruption was minimal, you have a very high chance of restoring your data.

The bad news, however, is that such corruption can be hard to discover. Most operations in Veeam Backup & Replication do not require block decompression. A corrupted block can travel from one backup file to another through merges, synthetic fulls or backup copy jobs without being discovered. The countermeasure here is regular backup verification and some tricks with job settings, both of which we’ll discuss later in this white paper.

Note that the error message can be different, depending on which part of the backup file suffered from corruption. It is impossible to describe every error here, so be sure to open a support ticket if you are experiencing issues, as Veeam support may be able to help you.

All instances of storage metadata are corrupted

How to reveal corruption:
