Data replication plays a key part in many organizations' disaster recovery strategies. There are a number of ways of achieving it, including the following:
- Array-based replication
- Host-based replication
- Replication of a specific application — usually a database
Many larger enterprises use array-based replication, but its appeal to small and medium-sized businesses (SMBs) is limited because it tends to be very expensive. That's because of:
- High upfront hardware costs: array-based replication generally requires a homogeneous storage environment at both the primary and secondary storage sites (two identical and costly storage arrays).
- High array replication software licensing costs: it's certainly the case that license fees for array-based replication software have come down over time — and some vendors package it in the cost of the array so that it appears free — but it can still represent a significant extra cost. Vendors themselves aren't even consistent in their approach between storage lines. For example, IBM's XIV includes both synchronous and asynchronous replication in the base price of the hardware, but for other products you could end up paying thousands of dollars per terabyte.
- Extra hardware costs: arrays replicate over Fibre Channel, which means it may be necessary to buy costly extra hardware to replicate over an IP network to a remote site.
Host-Based Replication — a Lower-Cost Alternative for SMBs
Many SMBs use host-based replication because it is relatively inexpensive. The process involves installing a replication agent on the operating system of each server to be replicated. This agent processes and replicates I/O traffic on any storage systems (NAS, DAS, SAN and so on) to a secondary replication target system, which can use storage of any type, from any vendor.
It saves money compared to array-based replication because licensing host-based replication software is much less expensive than for most array-based replication systems. Also, there's no need to go to the expense of purchasing a second storage array that's identical to the primary one.
"If you can buy one nice expensive array at your primary site and get away with a much cheaper one at your secondary site, that really does make data replication a whole lot less expensive," says Rachel Dines, a senior analyst at Forrester. It is in effect a form of storage virtualization to the extent that secondary storage is just that: storage, which can be used irrespective of the manufacturer of the primary storage array.
In fact, host-based replication can allow SMBs to get away without buying a secondary array at all. That's due to this storage virtualization effect. Once you are free to replicate to any storage hardware using host-based replication, you can replicate to storage resources hosted by a cloud storage provider, as long as the provider offers the same host-based replication software at the cloud end. "Host-based replication is a big enabler of cloud-based disaster recovery," Dines adds.
One final benefit is that with host-based replication you can get continuous data protection (CDP) capabilities, says Dines. "If you have a corrupted file on the primary site, it will get replicated to the second. With CDP you can roll back to an earlier version, and with most array-based replication systems that's simply not possible," she says.
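The difference Dines describes can be illustrated with a toy model. The sketch below is not any vendor's implementation; the `WriteJournal` class, the file name and the timestamps are all hypothetical. It assumes CDP works by journaling every write rather than overwriting the previous copy, so that the state at any earlier moment can be rebuilt by replaying the journal up to that moment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WriteJournal:
    """Toy CDP journal (hypothetical): every write is kept, none is overwritten."""

    entries: List[Tuple[float, str, bytes]] = field(default_factory=list)

    def record_write(self, timestamp: float, path: str, content: bytes) -> None:
        """Append a write to the journal instead of replacing the old version."""
        self.entries.append((timestamp, path, content))

    def state_at(self, timestamp: float) -> Dict[str, bytes]:
        """Replay writes in time order, ignoring anything after `timestamp`."""
        state: Dict[str, bytes] = {}
        for ts, path, content in sorted(self.entries, key=lambda e: e[0]):
            if ts > timestamp:
                break
            state[path] = content
        return state


journal = WriteJournal()
journal.record_write(100.0, "orders.db", b"good data")
journal.record_write(200.0, "orders.db", b"corrupted")

latest = journal.state_at(200.0)       # what a plain replica would hold
rolled_back = journal.state_at(150.0)  # a point in time before the corruption
print(latest["orders.db"], rolled_back["orders.db"])
```

A replica that only mirrors the latest state has nothing equivalent to `state_at(150.0)`: once the corrupted write arrives, the good version is gone.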
Many SMBs use agent-based backup systems, and for that reason many SMB backup solutions offer host-based replication as an add-on, Dines says. "The future is almost certainly going to see host-based replication moving into backup suites. At the very least that's where it will be controlled and managed," she says.
The Drawbacks of Host-Based Replication
So why don't large enterprises use host-based replication more widely? There are two main reasons, according to Dines.
The first is simply that host-based replication is less scalable. A host-based replication agent runs on the operating system of each server, and this takes up considerable resources. Some vendors talk of an overhead of about 3 percent or so, but Dines says that 10 percent to 20 percent is a more realistic figure. Whatever the true figure, when it's multiplied by a large number of servers, there's potentially a huge amount of computing power being "wasted." That waste can be reclaimed by using array-based replication, which carries out the underlying processing in the array hardware itself, not on the host's processor.
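The arithmetic behind the scalability argument is easy to sketch. The example below multiplies a per-server overhead by the size of the server estate and expresses the result as "servers' worth" of CPU. The overhead percentages come from the figures quoted above; the fleet sizes (20 and 1,000 servers) and the function name are assumptions chosen purely for illustration.

```python
def agent_overhead_in_servers(server_count: int, overhead_fraction: float) -> float:
    """Return the CPU consumed by replication agents, in whole-server equivalents."""
    if server_count < 0:
        raise ValueError("server_count must not be negative")
    if not 0.0 <= overhead_fraction <= 1.0:
        raise ValueError("overhead_fraction must be between 0 and 1")
    return server_count * overhead_fraction


# Hypothetical fleet sizes: a small business and a large enterprise.
FLEET_SIZES = (20, 1000)

# Overhead figures quoted in the article.
OVERHEADS = {
    "vendor claim (3%)": 0.03,
    "Dines, low end (10%)": 0.10,
    "Dines, high end (20%)": 0.20,
}

for servers in FLEET_SIZES:
    for label, fraction in OVERHEADS.items():
        cost = agent_overhead_in_servers(servers, fraction)
        print(f"{servers:>5} servers, {label}: ~{cost:.1f} servers' worth of CPU")
```

Under these assumptions, a 20-server shop at 10 percent gives up roughly two servers' worth of processing, while a 1,000-server estate at 20 percent gives up roughly 200, which is why the same percentage matters far more at enterprise scale.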
Storage vendor tactics also play a role, Dines believes: array makers sometimes behave a little like drug dealers. "Storage vendors have a huge amount of power in the enterprise," she says. "Even if host-based replication were to become attractive to large enterprises, the array vendors would be sure to respond by cutting prices. Occasionally you hear of array vendors throwing in a secondary array for free to get customers hooked on the two-array model. Then when it comes to replacing them, they'll be sure to buy two more."
Smaller and medium-sized businesses are less likely to be offered these sorts of deals on arrays to get them to adopt array-based replication, but there are a couple of reasons why host-based replication may not be for them.
First, there's the overhead issue, which becomes particularly acute if the servers in question are already highly utilized.
Host-based replication is also not suited to transactional databases — native database replication tools provide a much better solution.
There's also the problem that host-based replication agents run on top of operating systems, and the majority of products are targeted at Windows environments. But some vendors do offer solutions for Linux environments as well — notably Vision Solutions' Double-Take and SIOS Technology's SteelEye DataKeeper.
These problems are real, but they don't apply to the majority of SMBs. That explains why host-based replication is so widely adopted in this market, Dines says.
There is one case where host-based replication appeals to larger enterprises as well as SMBs. That's where the server infrastructure is largely virtualized, and the replication agent sits in the hypervisor. (There's a good argument for saying that strictly speaking this is hypervisor-based replication, something distinct from host-based replication. But since the benefits — lower costs, the ability to use heterogeneous hardware and so on — are the same, we will treat it as a form of host-based replication for the purposes of this article.)
One reason is that hypervisor-based replication, unlike conventional host-based replication, is highly scalable because its resource overhead is much lower.
Hypervisor-based replication can also be carried out easily using "native" systems: VMware includes replication in the Essentials Plus, Standard, Enterprise and Enterprise Plus versions of vSphere, while Microsoft includes Hyper-V Replica in all versions of Windows that include Hyper-V. It can be managed from virtualization management suites like VMware's vCenter or Microsoft's System Center. Specialist vendor Zerto also provides hypervisor-based replication for VMware VMs, which can be managed through vCenter.
Since hypervisor-based replication is "VM-aware," it is possible to select the VMs that need to be replicated, while saving storage space at the secondary site by avoiding replicating the ones that don't.
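The storage saving can be put in rough numbers. The sketch below compares the secondary-site capacity needed when only selected VMs are replicated with the capacity needed when every VM sharing the same volume has to be copied. The VM names and sizes are invented for illustration, and the model is deliberately simplified: it treats a volume's size as the sum of its VMs and ignores free space, compression and deduplication.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class VirtualMachine:
    name: str
    size_gb: int
    replicate: bool  # True if the business needs a replica of this VM


def secondary_capacity_gb(vms: Iterable[VirtualMachine], vm_aware: bool) -> int:
    """Capacity needed at the secondary site for the VMs on one volume.

    vm_aware=True  -> only VMs flagged for replication are copied.
    vm_aware=False -> the whole volume is copied, wanted or not.
    """
    if vm_aware:
        return sum(vm.size_gb for vm in vms if vm.replicate)
    return sum(vm.size_gb for vm in vms)


# Hypothetical contents of a single volume.
volume = [
    VirtualMachine("erp-db", 500, True),
    VirtualMachine("erp-app", 200, True),
    VirtualMachine("test-box", 300, False),
    VirtualMachine("build-agent", 400, False),
]

selected = secondary_capacity_gb(volume, vm_aware=True)
whole_volume = secondary_capacity_gb(volume, vm_aware=False)
print(f"VM-aware: {selected} GB, whole volume: {whole_volume} GB")
```

With these made-up figures, selecting VMs individually halves the capacity required at the secondary site (700 GB instead of 1,400 GB).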
By contrast, array-based replication can generally only replicate an entire volume. That means that, desirable or not, it's necessary to replicate every VM on that volume. It also means that there has to be enough capacity at the remote site to accommodate that volume, even though replicas of some of those VMs may not be needed. It's possible to get around this by creating different volumes for different groups of VMs: those that need to be replicated, and those that don't. But that makes storage management considerably more complex.
"Hypervisor-based replication allows you to be much more granular in what you protect, and it also allows you to group VMs by defining protection groups," says Dines. A protection group may contain all the VMs that make up one application, and grouping them allows them to be protected and recovered together, providing complete consistency regardless of physical location on servers and storage. Hypervisor-based replication also makes recovery easier because it provides control over the order in which systems are recovered, Dines adds.