If my primary data is inaccessible, I have two minutes to recover the data. That is pretty fast. I see two options for a DR solution: (1) copy the data from the secondary data center to the primary data center (if I can), or (2) switch over my business applications to use the data in a secondary data center. In the first case, I will need to copy 100TB in 120 seconds, which requires a sustained throughput of about 833 GB/s. Getting that much throughput between data centers is going to be really, really costly, if it's even possible. For the second case, I just need to have my customer-facing servers and my important business servers start using the storage in the secondary data center. This shouldn't be too difficult to do, but I need to have some sort of system in place to detect the failure of the primary data center and then automatically fail the servers over to the secondary storage. This isn't something I would call "cheap," but it's definitely something that most companies can easily implement.
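The throughput figure for the first option is simple arithmetic, and it is worth rerunning with your own numbers. Here is a minimal sketch in Python, assuming decimal units (1 TB = 1000 GB); the function name is just an illustration, not part of any tool.

```python
def required_throughput_gb_per_s(data_tb: float, recovery_seconds: float) -> float:
    """Sustained throughput (GB/s) needed to copy data_tb terabytes in recovery_seconds."""
    data_gb = data_tb * 1000  # assumption: decimal units, 1 TB = 1000 GB
    return data_gb / recovery_seconds


# The example from the text: 100TB that must be copied within two minutes.
throughput = required_throughput_gb_per_s(100, 120)
print(f"{throughput:.0f} GB/s")  # 833 GB/s
```

Changing the recovery window shows how quickly the requirement relaxes: the same 100TB with a much longer window needs only a small fraction of that bandwidth.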
By deciding how quickly I want to recover from my "disaster," I have started to shape the DR solution and its costs. But don't forget to define the upper bound of your DR solution. In my contrived example, let's assume my upper bound is the complete loss of my data center due to some disaster, with my secondary data center located 500 miles away. If the disaster extends beyond 500 miles from my data center, I'm going to assume that the company is completely destroyed (apologies for being so morbid). But it is important to set an upper limit on the disaster from which I want to recover.
I've now set some boundaries on a DR solution by defining how much data is important, how much data I need to cover, and how fast I want to recover. But before fleshing out your DR solution, there is one more important issue to address.
Async versus sync
One obvious DR solution is to simply make a copy of your important data to a secondary storage system, possibly in a secondary data center. The question we now face is how often or how quickly we need to copy the data. Does the secondary copy of the data have to match the primary data 100 percent? Or can the secondary data lag behind the primary data by some amount of time? Answering this question can have a big impact on the cost and performance of your storage system.
The first case, where the data in the secondary storage is identical to the primary data, is synchronous replication. Synchronous replication can have an impact on performance because both the primary and the secondary storage need to acknowledge that the data has been written before the write returns to the kernel. This can slow things down a bit, but in return the data is guaranteed to be acknowledged by both storage systems.
The second case is called asynchronous replication and involves a time delay between the primary and secondary storage. The primary storage always has the latest version of the data, while the version of the data on the secondary storage is behind the primary by some amount of time. This amount of time can range from something very small, perhaps milliseconds, to something very large, perhaps days or even longer. The size of the delay is up to you. Can you tolerate a difference between the data on the primary storage and the secondary?
Remember that you don't have to have synchronous replication for all types of data. You can have synchronous replication for some data and asynchronous for other data.
One nice result of asynchronous replication is that, generally, it is faster than synchronous replication and it is cheaper. But these are general rules of thumb and specific cases can differ.
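To make the difference between the two modes concrete, here is a toy sketch in Python. It is an in-memory model only, not how any real storage product implements replication; the class `ReplicatedStore` and its methods are hypothetical names invented for this illustration.

```python
from collections import deque


class ReplicatedStore:
    """Toy model of a primary store with one secondary replica (hypothetical)."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = {}
        self.secondary = {}
        self.pending = deque()  # writes not yet applied to the secondary

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # Synchronous: do not return until the secondary also has the write.
            self.secondary[key] = value
        else:
            # Asynchronous: return right away; the secondary catches up later.
            self.pending.append((key, value))

    def replicate_pending(self):
        """Apply queued writes to the secondary (in practice, a background task)."""
        while self.pending:
            key, value = self.pending.popleft()
            self.secondary[key] = value

    def lag(self) -> int:
        """Number of writes by which the secondary trails the primary."""
        return len(self.pending)


sync_store = ReplicatedStore(synchronous=True)
sync_store.write("order-1", "paid")
print(sync_store.secondary == sync_store.primary)  # True

async_store = ReplicatedStore(synchronous=False)
async_store.write("order-1", "paid")
print(async_store.lag())  # 1
async_store.replicate_pending()
print(async_store.secondary == async_store.primary)  # True
```

In the synchronous case every write pays for the trip to the secondary before it returns. In the asynchronous case the write returns immediately, and anything still sitting in `pending` when the primary is lost is data you would not get back.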
The phrase "disaster recovery" means exactly what it says: recovery from a disaster. What happens if you lose access to your primary storage or if you lose the data in your primary storage? Being able to recover from this scenario and keep your business functioning is a beautiful thing. But you can't just go out and buy a "DR solution." You actually have to think about and plan for what data you want to protect (hint: the answer is not "everything").
You have to plan for a range of disasters from which you want to recover. Having the office cleaning crew accidentally pull the plug on your laptop is not a good starting point for disaster recovery, regardless of the number of cat pictures on the drive. By the same token, surviving the end of the world is also probably not a good disaster from which to recover, since no one will be around to use the recovered data.
As you plan for DR, just remember that the broader the range of disasters from which you want to recover and the more data you want to protect, the more expensive the solution is likely to be. Notice that I haven't talked about specific technologies for DR so the actual expenses can vary, but the general trend is always, "more money for more DR."