Veteran storage managers who have been successfully protecting data for years, if not decades, sometimes get tripped up by the peculiarities of virtual environments. According to Kroll Ontrack, 40% of enterprises lose data from virtual environments every year, and about two-thirds find that they cannot recover all their virtual data in the event of a disaster.
This goes against the grain of popular perception. Most people believe that storing data in virtual environments decreases the risk of data loss. Here are some tips on how to better protect, back up and recover data in virtual environments:
1) Understand the Difference in Environment
Those more used to physical data protection should first come to a firm understanding of how the virtual environment is different. Because of higher consolidation of servers in virtual environments, data loss due to a single server failure may be much greater.
“A hardware failure for a physical server would result in one server going down, while in virtual environments, hardware failure of a host would result in all virtual machines (VMs) going down and the data stored within them, on the host, being lost,” said Sergey Kandaurov, Director of Product Management at Acronis.
2) Protect the Hypervisor
Further, virtualization adds one more layer that needs to be protected – the hypervisor. Downtime in a virtualized environment may be caused not only by server failure, but also by problems with the virtualization infrastructure, such as hypervisor failure. Thus, the hypervisor has to be safeguarded, too.
“Products that are not specifically designed for the hypervisor you’re using won’t work efficiently, and might even do harm,” said Kandaurov. “Ensure that the backup solution you choose is capable of backing up a hypervisor with its configuration and allows easy and fast recovery.”
3) Don’t Rely on the Virtual Platform
Today’s virtual platforms come with all sorts of bells and whistles, including their own data protection and recovery mechanisms. But users would be wise not to rely on them alone.
“Don’t think that clustering and high availability mechanisms provided by virtualization platforms like vSphere and Hyper-V reliably protect you from data loss,” said Kandaurov. “They help to minimize downtime in some cases, but they’re not replacements for a proper backup strategy.”
4) Monitor the Backup Performance Hit
In the physical world, you usually have one application per server, and that server is likely not running anywhere near capacity. In the virtual world, a single host generally runs many more applications, and it operates much closer to capacity. Running backups on top of that load can seriously affect performance.
“That excess capacity can be used to run the backup without impacting the performance of the application that much, but if the excess capacity does not exist, then that's an issue,” said Eric Burgener, an analyst at IDC.
“Our data today shows that on average each host has about 10 Virtual Machines (VMs), and that is going to a little more than 12 by 2017. If you try to back all of them up at once, you're going to impact performance on probably all of them, so you have to come up with a way to still perform the backups in a timely manner without unduly impacting performance.”
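One way to avoid backing up every VM on a host at once is to cap the number of concurrent backup jobs per host and spread the rest across successive waves. The following is a minimal sketch of that idea only. The VM names, host names, the `plan_backup_waves` function and the limit of two concurrent jobs per host are all hypothetical, and real backup products schedule jobs in their own ways.

```python
from collections import defaultdict


def plan_backup_waves(vm_to_host, max_per_host=2):
    """Group VMs into waves so no host runs more than max_per_host backups at once."""
    if max_per_host < 1:
        raise ValueError("max_per_host must be at least 1")

    # Collect the VMs that live on each host.
    by_host = defaultdict(list)
    for vm, host in vm_to_host.items():
        by_host[host].append(vm)

    waves = []
    for host in sorted(by_host):
        vms = sorted(by_host[host])
        # Slice this host's VMs into chunks; chunk number i joins wave number i.
        for start in range(0, len(vms), max_per_host):
            wave_index = start // max_per_host
            while len(waves) <= wave_index:
                waves.append([])
            waves[wave_index].extend(vms[start:start + max_per_host])
    return waves


if __name__ == "__main__":
    # Hypothetical inventory: 10 VMs on one host, 2 on another.
    inventory = {f"vm{n:02d}": "host-a" for n in range(1, 11)}
    inventory.update({"vm11": "host-b", "vm12": "host-b"})
    for number, wave in enumerate(plan_backup_waves(inventory), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

With 10 VMs on one host and a limit of two, the backups run in five waves instead of all at once, trading a longer backup window for a smaller performance hit on each wave.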
5) Non-Sequential Backups
That’s why the industry is moving away from sequential, file-based backups in virtual environments: they demand too much CPU and network capacity during the backup window, and they take too long. Users are looking at other approaches that minimize the impact on production applications. A popular one is off-host snapshot backup, added Burgener. With this approach, you create a snapshot of each VM, mount each snapshot on a backup appliance or server, and back up the snapshot from there.
“While this can still have a bit of an impact, it is far less than backing up straight from a server,” said Burgener.
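The off-host flow described above can be illustrated with a toy model. This sketch uses plain in-memory Python objects as stand-ins: `VirtualMachine`, `BackupServer` and the file path are hypothetical, and no real hypervisor or backup API is called. It only shows why the backup copy reflects the moment of the snapshot while production keeps writing.

```python
class VirtualMachine:
    """Toy stand-in for a VM; the dict plays the role of its virtual disk."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = dict(disk)

    def snapshot(self):
        # Freeze a point-in-time copy; the VM can keep writing afterwards.
        return dict(self.disk)


class BackupServer:
    """Toy stand-in for the backup appliance or server that reads snapshots."""

    def __init__(self):
        self.catalog = {}

    def back_up(self, vm_name, mounted_snapshot):
        # Data is read from the snapshot, not from the live VM.
        self.catalog[vm_name] = dict(mounted_snapshot)


vm = VirtualMachine("app01", {"/data/orders.db": "v1"})
server = BackupServer()

snap = vm.snapshot()               # step 1: snapshot the VM
vm.disk["/data/orders.db"] = "v2"  # production keeps changing in the meantime
server.back_up(vm.name, snap)      # steps 2 and 3: mount and back up the snapshot

print(server.catalog["app01"])     # contents as of the snapshot
print(vm.disk)                     # live contents
```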
6) Find the Right API
Administrators can use Application Programming Interfaces (APIs) to create snapshot backups. For example, VMware offers VMware APIs for Data Protection (VADP), Microsoft offers the Windows Volume Shadow Copy Service (VSS), and Oracle offers Recovery Manager (RMAN). All major backup products (appliances and software) can use these APIs to create snapshots that they can then back up off-host.
7) Don’t Back up Everything, Every Time
The old-school way of conducting backups was to do a full backup on a regular basis: once a day, once a week or once a month. That resulted in storage rooms full of backup tapes that largely contained duplicate data.
Changed block tracking is a way to minimize the amount of data that has to be transferred to complete a backup. For each object you're backing up, the tracking mechanism records which blocks have changed since the last backup, and only those blocks are sent. On the target side, the changed blocks can be merged with the prior backup to create the "new" backup. This minimizes the time and bandwidth needed to complete backups. It is a similar idea to old-school incremental backups: just back up the data that changed since the last backup.
“Everybody should be doing this kind of backup; nobody should be backing up all the data every time,” said Burgener. “You've already got the data that didn't change, which could easily be 80 - 90% of the data since the last backup, so why back it up over and over again?”
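The idea of sending only changed blocks and merging them on the target can be sketched in a few lines. Note the substitution: real changed block tracking is maintained by the hypervisor or storage layer as writes happen, whereas this sketch detects changes by comparing hashes of fixed-size blocks. The block size and all function names here are hypothetical and chosen only for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # tiny, for illustration only; real systems use far larger blocks


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_digests(data):
    """Fingerprint each block so the next backup can tell what changed."""
    return [hashlib.sha256(block).digest() for block in split_blocks(data)]


def changed_blocks(previous_digests, current):
    """Return {block_index: block_bytes} for blocks that differ from the last backup."""
    delta = {}
    for index, block in enumerate(split_blocks(current)):
        digest = hashlib.sha256(block).digest()
        if index >= len(previous_digests) or previous_digests[index] != digest:
            delta[index] = block
    return delta


def apply_delta(previous_backup, delta, new_length):
    """Merge changed blocks into the prior backup to synthesize the new full backup."""
    blocks = split_blocks(previous_backup)
    for index, block in sorted(delta.items()):
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)
    # Truncate in case the source shrank since the previous backup.
    return b"".join(blocks)[:new_length]


monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAAbbbbCCCCDDDDEE"  # one block rewritten, one block appended

delta = changed_blocks(block_digests(monday), tuesday)
restored = apply_delta(monday, delta, len(tuesday))

print(f"blocks sent: {len(delta)} of {len(split_blocks(tuesday))}")
print(f"new backup matches source: {restored == tuesday}")
```

Only the rewritten and appended blocks cross the wire; the target rebuilds the full image from the previous backup plus the delta.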
8) Be Aware of Special Cases
With so much virtualization around, it’s easy to forget that at least a few physical servers are usually still involved. Even more likely, you have several VMs that cannot be backed up at the hypervisor level (such as VMs with raw device mapping, or RDM, disks).
“In a worst-case scenario, you’re forced to use another backup solution for such special cases,” said Kandaurov. “Choose a backup product that can also install an agent inside your machines.”
9) Remember Virtual Settings
Many users do quite well at protecting their data. But when a serious incident occurs, they can still be left with egg on their hard drives. Reason: they didn’t remember to set up some kind of mechanism to restore their vast array of virtualization settings.
“In addition to making sure your applications and data are protected, also protect your virtualization environment and settings,” said Greg Schulz, an analyst with StorageIO Group.
10) Don’t Forget the Basics
Certain basics hold true in both physical and virtual environments. All applications have RPO (recovery point objective) and RTO (recovery time objective) requirements. RPO defines how much data you're willing to lose; put another way, it determines how often you need to create a copy from which to recover. RTO defines how fast you have to recover. An application with an RTO of 5 minutes has to be back up and running 5 minutes after it failed, but its RPO could be different: an RTO of 5 minutes does not necessarily mean an RPO of 5 minutes.
“You have to make a decision about which technologies (file-based backup, continuous data protection, replication, off-host snapshot backup, etc.) are appropriate to your RPO/RTO,” said Burgener.
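As a worked example of matching a technology to RPO and RTO: with periodic copies, the worst-case data loss is roughly the interval between copies, so that interval has to fit within the RPO, while the estimated restore time has to fit within the RTO. The sketch below assumes hypothetical intervals and restore times for three approaches; the figures are illustrative and not measurements of any product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ProtectionPlan:
    name: str
    copy_interval: timedelta      # time between recoverable copies
    estimated_restore: timedelta  # time to get the application running again


def meets_objectives(plan, rpo, rto):
    """Worst-case data loss is about one copy interval, so compare it to the RPO."""
    return plan.copy_interval <= rpo and plan.estimated_restore <= rto


# Hypothetical figures, for illustration only.
plans = [
    ProtectionPlan("nightly file-based backup", timedelta(hours=24), timedelta(hours=4)),
    ProtectionPlan("hourly off-host snapshots", timedelta(hours=1), timedelta(minutes=30)),
    ProtectionPlan("replication with failover", timedelta(seconds=30), timedelta(minutes=3)),
]

# An application that tolerates an hour of lost data but only 5 minutes of downtime.
rpo = timedelta(hours=1)
rto = timedelta(minutes=5)

for plan in plans:
    verdict = "meets" if meets_objectives(plan, rpo, rto) else "misses"
    print(f"{plan.name}: {verdict} RPO={rpo}, RTO={rto}")
```

Here the hourly snapshots satisfy the one-hour RPO but not the 5-minute RTO, which shows why the two objectives have to be checked separately.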