Data replication plays a key part in many organizations' disaster recovery strategies. There are a number of ways of achieving it, including the following:
- Array-based replication
- Host-based replication
- Replication of a specific application — usually a database
Many larger enterprises use array-based replication, but it has limited appeal for small and medium-sized businesses (SMBs) because it tends to be very expensive. The main costs are:
- High upfront hardware costs: array-based replication generally requires a homogeneous storage environment at both the primary and secondary sites (that is, two identical and costly storage arrays).
- High array replication software licensing costs: license fees for array-based replication software have certainly come down over time, and some vendors bundle the software into the price of the array so that it appears free, but it can still represent a significant extra cost. Vendors aren't even consistent across their own storage lines. For example, IBM's XIV includes both synchronous and asynchronous replication in the base price of the hardware, but for other products you could end up paying thousands of dollars per terabyte.
- Extra hardware costs: arrays typically replicate over Fibre Channel, so it may be necessary to buy costly extra hardware to replicate over an IP network to a remote site.
Host-Based Replication — a Lower-Cost Alternative for SMBs
Many SMBs use host-based replication because it is relatively inexpensive. The process involves installing a replication agent on the operating system of each server to be replicated. The agent captures and replicates I/O traffic from any type of storage system (NAS, DAS, SAN and so on) to a secondary replication target system, which can use storage of any type from any vendor.
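To make the agent model concrete, here is a minimal Python sketch, assuming a simple asynchronous design: the hypothetical `ReplicationAgent` completes each write on primary storage, queues it, and later ships the queued writes in order to a target. `BlockDevice`, `ReplicationAgent`, and `flush` are illustrative names, not any vendor's API; real agents sit in the operating system's I/O path and must handle ordering, failures, and bandwidth limits that this toy ignores. Because the target is just another `BlockDevice`, nothing in the agent cares which vendor made it.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class BlockDevice:
    """Toy block store standing in for any storage type or vendor
    (DAS, NAS, SAN, or a cloud provider's target)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data

    def read(self, block: int) -> bytes:
        return self.blocks.get(block, b"")


class ReplicationAgent:
    """Hypothetical host-side agent: writes land on primary storage first
    and are queued for asynchronous shipment to the replication target."""

    def __init__(self, primary: BlockDevice, target: BlockDevice) -> None:
        self.primary = primary
        self.target = target
        self.pending: Deque[Tuple[int, bytes]] = deque()

    def write(self, block: int, data: bytes) -> None:
        self.primary.write(block, data)     # application's write completes locally
        self.pending.append((block, data))  # and is captured for replication

    def flush(self) -> int:
        """Ship queued writes to the target in original order; return the count."""
        sent = 0
        while self.pending:
            block, data = self.pending.popleft()
            self.target.write(block, data)
            sent += 1
        return sent


primary = BlockDevice("expensive-primary-array")
secondary = BlockDevice("cheap-secondary-any-vendor")
agent = ReplicationAgent(primary, secondary)
agent.write(0, b"hello")
agent.write(1, b"world")
lagging = secondary.read(1)  # empty: not replicated yet
shipped = agent.flush()      # ships both queued writes
```

Until `flush` runs, the secondary copy lags the primary; in an asynchronous design, that gap is the data at risk if the primary site fails.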
Host-based replication saves money compared to array-based replication because its software licenses are much less expensive than those for most array-based replication systems. There is also no need to buy a second storage array identical to the primary one.
"If you can buy one nice expensive array at your primary site and get away with a much cheaper one at your secondary site, that really does make data replication a whole lot less expensive," says Rachel Dines, a senior analyst at Forrester. In effect, this is a form of storage virtualization: the secondary storage is simply storage, usable regardless of who manufactured the primary storage array.
In fact, host-based replication can let SMBs avoid buying a secondary array at all, thanks to the same storage virtualization effect. Once you are free to replicate to any storage hardware, you can replicate to storage resources hosted by a cloud storage provider, as long as the provider supports the same host-based replication software at its end. "Host-based replication is a big enabler of cloud-based disaster recovery," Dines adds.
One final benefit is that with host-based replication you can get continuous data protection (CDP) capabilities, says Dines. "If you have a corrupted file on the primary site, it will get replicated to the second. With CDP you can roll back to an earlier version, and with most array-based replication systems that's simply not possible," she says.
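The difference between plain replication and CDP can be sketched as a write journal. In this minimal Python sketch (the `CDPJournal` class and its methods are hypothetical, and a simple sequence number stands in for the timestamps a real product would more likely use), every replicated write is kept rather than overwritten, so the state as of any earlier recovery point can be rebuilt by replaying the journal up to that point. A plain mirror keeps only the latest state, corrupted file included.

```python
from typing import Dict, List, Tuple


class CDPJournal:
    """Hypothetical CDP journal: every replicated write is appended, never
    overwritten, so earlier states can be reconstructed."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, int, bytes]] = []  # (seq, block, data)
        self.seq = 0

    def record(self, block: int, data: bytes) -> int:
        """Append a write and return its sequence number (a recovery point)."""
        self.seq += 1
        self.entries.append((self.seq, block, data))
        return self.seq

    def state_at(self, seq: int) -> Dict[int, bytes]:
        """Rebuild block contents as of recovery point `seq` by replay."""
        state: Dict[int, bytes] = {}
        for entry_seq, block, data in self.entries:
            if entry_seq > seq:
                break  # entries are stored in sequence order, so stop here
            state[block] = data
        return state


journal = CDPJournal()
good_point = journal.record(7, b"quarterly report v1")
journal.record(7, b"\x00\x00 corrupted")  # the corruption is replicated too

latest = journal.state_at(journal.seq)   # what a plain mirror would hold
restored = journal.state_at(good_point)  # roll back to before the corruption
print(latest[7])
print(restored[7])
```

The trade-off is storage: the journal grows with every write, which is why CDP products generally limit how far back you can roll.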
Many SMBs use agent-based backup systems, and for that reason many SMB backup products offer host-based replication as an add-on, Dines says. "The future is almost certainly going to see host-based replication moving into backup suites. At the very least, that's where it will be controlled and managed," she says.
The Drawbacks of Host-Based Replication
So why don't large enterprises use host-based replication more widely? There are two main reasons, according to Dines.
The first is simply that host-based replication is less scalable. A host-based replication agent runs on the operating system of each server, and it consumes considerable resources. Some vendors cite an overhead of about 3 percent, but Dines says that 10 to 20 percent is a more realistic figure. Whatever the true figure, when it's multiplied across a large number of servers, a huge amount of computing power is potentially being "wasted." That capacity can be reclaimed by using array-based replication, which carries out the processing in the array hardware itself rather than on the host's processor.
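The scalability argument comes down to simple multiplication. The sketch below assumes a hypothetical fleet of 500 identical servers and treats the agent overhead as a fraction of each server's CPU capacity; the 3 percent figure is the vendor claim and 10 to 20 percent is Dines's estimate. The function name and fleet size are illustrative only.

```python
def wasted_server_equivalents(servers: int, overhead: float) -> float:
    """Total agent CPU overhead across a fleet, in whole servers' worth.

    Assumes identical servers and a per-server overhead given as a
    fraction of CPU capacity (0.10 means 10 percent).
    """
    if not 0.0 <= overhead <= 1.0:
        raise ValueError("overhead must be a fraction between 0 and 1")
    return servers * overhead


FLEET = 500  # hypothetical fleet size, for illustration only

scenarios = [
    ("vendor claim (3%)", 0.03),
    ("Dines, low end (10%)", 0.10),
    ("Dines, high end (20%)", 0.20),
]
for label, overhead in scenarios:
    lost = wasted_server_equivalents(FLEET, overhead)
    print(f"{label}: about {lost:.0f} servers' worth of CPU")
```

Under these assumptions the loop reports roughly 15, 50, and 100 servers' worth of CPU, capacity that array-based replication would leave free for applications.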
The second reason is storage vendor tactics: array makers, Dines believes, sometimes behave a little like drug dealers. "Storage vendors have a huge amount of power in the enterprise," she says. "Even if host-based replication were to become attractive to large enterprises, the array vendors would be sure to respond by cutting prices. Occasionally you hear of array vendors throwing in a secondary array for free to get customers hooked on the two-array model. Then when it comes to replacing them, they'll be sure to buy two more."