The two most common methods of data replication are host-based mirroring and RAID-based mirroring.
Each method has its advantages and disadvantages. Two of the big issues with any type of replication are whether it is synchronous or asynchronous, and how much latency is involved. I often quote John Mashey, a famous computer architect, who once said, "Money can buy you bandwidth, but latency is forever."
A number of software products allow you to mirror your data from the host. This method is often easy to configure and manage, but the downside is that it consumes host resources such as CPU, memory, and I/O bandwidth. Another drawback is that you must install the software on each host that is to be mirrored. What concerns me most about this method, though, is the latency to the mirror.
There are two types of mirroring, just as there are two types of I/O from applications:
- Synchronous — Control is not given back to the application until the data is either on the RAID controller or on the disk, so until a SCSI acknowledgement is received, you cannot issue the next I/O request.
- Asynchronous — Control is returned to the application as soon as the I/O is issued to the operating system.
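The difference between the two I/O types can be sketched at the operating-system level. This is a minimal POSIX example (the file names are arbitrary): `O_SYNC` forces the write to block until the device acknowledges it, analogous to synchronous mirroring, while a plain buffered write returns as soon as the OS accepts the data.

```python
import os
import tempfile

def write_sync(path, data):
    # O_SYNC: write() does not return until the data has been
    # acknowledged by the storage device -- the application blocks
    # for the full round trip, as with synchronous mirroring.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)

def write_async(path, data):
    # Plain buffered write: control returns as soon as the OS has
    # accepted the I/O into its page cache; the physical write (and
    # any mirror copy) happens later, as with asynchronous mirroring.
    with open(path, "wb") as f:
        return f.write(data)

payload = b"x" * 4096
d = tempfile.mkdtemp()
n_sync = write_sync(os.path.join(d, "sync.bin"), payload)
n_async = write_async(os.path.join(d, "async.bin"), payload)
```

Both calls move the same 4 KB; the difference is purely when control returns to the caller, which is exactly the latency question raised above.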
Even if you turn on asynchronous mirroring, if you are doing a great deal of I/O, you need to ensure that you have the bandwidth to the mirror, and remember that bandwidth is only part of the issue. If you are doing synchronous mirroring, you must calculate the expected latency and predetermine the expected slowdown of your application.
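That latency calculation can be done on the back of an envelope. The sketch below uses hypothetical numbers (a 100 km link, a 0.5 ms local write) and the rough rule that light in fiber travels about 200,000 km/s, or ~5 microseconds per kilometer one way; real links add switch and protocol overhead on top of this.

```python
# Back-of-the-envelope slowdown estimate for synchronous mirroring.
# Assumptions (illustrative, not from any vendor spec):
#   - ~5 us per km one way for light in fiber,
#   - every synchronous write waits for a full round trip to the mirror,
#   - the application issues its writes serially (no overlap).

def mirror_slowdown(distance_km, local_write_ms):
    """Return (added round-trip ms, slowdown factor) per sync write."""
    us_per_km_one_way = 5.0
    round_trip_ms = 2 * distance_km * us_per_km_one_way / 1000.0
    factor = (local_write_ms + round_trip_ms) / local_write_ms
    return round_trip_ms, factor

rt_ms, factor = mirror_slowdown(distance_km=100, local_write_ms=0.5)
# 100 km of fiber adds ~1 ms of propagation per write, so a 0.5 ms
# local write becomes ~1.5 ms -- a 3x slowdown before any overhead.
```

The point of the exercise is that distance alone sets a floor on per-write latency that no amount of extra bandwidth can buy back.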
Even if you are doing synchronous mirroring, there are two potential types of I/O:
- The host receives a reply back when the I/O gets to the other RAID.
- The host receives a reply back when the I/O gets to the disk itself.
The key to success with host-based mirroring is to understand how much data you are going to write from that host and over what period of time the data will be written. Taking statistics over a 24-hour period is a good way to understand aggregate bandwidth requirements. The concern is what happens if most of that data gets written when people get to work, just before lunch, and when they leave for home: you are going to have some major slowdowns and performance issues for your applications, because you cannot meet the service objectives. On the other hand, if the load is constant throughout the day, then aggregate numbers are reasonable for the analysis.
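The aggregate-versus-peak distinction is easy to see with a small sketch. The hourly samples below are hypothetical; in practice they would come from a tool such as sar or iostat, or from the array's own statistics.

```python
# Sketch: aggregate vs. peak write bandwidth from hourly samples.
# Hypothetical workload: a flat 2 GB/hour background load with
# bursts at 9 a.m., lunch, and 5 p.m.

samples_gb = [2] * 24                                  # GB written per hour
samples_gb[8] = samples_gb[12] = samples_gb[17] = 40   # the three bursts

total_gb = sum(samples_gb)
avg_mb_s = total_gb * 1024 / (24 * 3600)   # what the 24-hour average says
peak_mb_s = max(samples_gb) * 1024 / 3600  # what the worst hour demands

# Sizing the mirror link for avg_mb_s alone would miss the bursts;
# during those hours the link must sustain peak_mb_s or writes queue up.
```

Here the peak hour needs several times the average rate, which is precisely the case where a link sized from aggregate numbers fails to meet the service objectives.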
Applications from the major RAID hardware vendors (EMC, HDS, IBM, LSI and others) allow you to replicate data to an off-site disaster-recovery location. This location could be 10 km away or 10,000 km. Of course, you also have to buy the network hardware and bandwidth, which presents some challenges. The two biggest options are Fibre Channel over dark fiber, or a TCP/IP-based solution over SONET.
Whatever your choice, the issues with latency and John Mashey's words still apply. All RAID-based mirroring methods that I am aware of have synchronous and asynchronous options that allow the administrator to determine which type of I/O will be used.
In addition to those options, you generally have tunable options for cache usage, such as:
- How much cache is to be used for the mirror?
- How long is the data kept in cache before it is written to disk, in case you have a write and then an immediate rewrite?
- How is the write to the other cache acknowledged?
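The second tunable above, holding data in cache in case of an immediate rewrite, can be illustrated with a toy write-back cache. This is purely a sketch of the idea; real RAID cache policies are far more involved.

```python
# Toy write-back cache illustrating rewrite coalescing: blocks sit in
# cache until a flush, so a rewrite of the same block replaces the
# cached copy and only one write is destaged to disk (and the mirror).

class WriteBackCache:
    def __init__(self):
        self.cache = {}        # block number -> latest data
        self.disk_writes = 0   # writes that actually reached disk

    def write(self, block, data):
        # A rewrite arriving before the flush simply overwrites the
        # cached copy; no extra disk I/O is generated.
        self.cache[block] = data

    def flush(self):
        # Destage: one disk write per dirty block, no matter how many
        # times the host rewrote it while it sat in cache.
        self.disk_writes += len(self.cache)
        self.cache.clear()

c = WriteBackCache()
c.write(7, b"v1")
c.write(7, b"v2")   # immediate rewrite -- coalesced in cache
c.write(9, b"v1")
c.flush()           # only two physical writes for three host writes
```

The longer the hold time, the more rewrites can be coalesced, but the more data is at risk in cache; that trade-off is exactly what these tunables let the administrator set.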