Planning for Disaster Recovery - Page 2


  • The company's centralized storage is damaged due to some sort of accident (e.g. car hits the data center, tornado, disgruntled employee)
    • Scope: Potentially a large number of people
    • Time scale: Hours, Days, Weeks, Months (varies)
    • Impact: Minimal ($0) to very large ($$$,$$$)
    • Cost: Small ($) to very large ($$$,$$$) or even larger due to loss of revenue during downtime. The amount varies.
  • The company's centralized storage is irreparably damaged (i.e. has to be replaced)
    • Scope: Potentially a large number of people
    • Time scale: Weeks to Months or longer depending upon how long it takes to get new storage (assumes the data center is still functioning)
    • Impact: Very large impact to a catastrophic impact (loss of all data)
    • Cost: Very large ($$$,$$$) to huge ($$$,$$$,$$$)
  • The data center experiences a power loss for an extended period of time (hours or days)
    • Scope: Potentially a large number of people including the entire company
    • Time scale: Hours to days
    • Impact: Large to very large impact with possible loss of data in flight
    • Cost: Large ($$,$$$) to Huge ($$$,$$$,$$$)
  • The data center is damaged beyond repair (e.g. being blown up or hit by a meteor)
    • Scope: Potentially a large number of people including the entire company
    • Time scale: Months to years (varies). Have to rebuild data center and buy new hardware.
    • Impact: Very large to catastrophic
    • Cost: Very large ($$$,$$$) to huge ($$$,$$$,$$$)
  • The state where the data center is located is hit by a meteor or loses power
    • Scope: Potentially a large number of people including the entire company
    • Time scale: Months to years (varies). Have to build a new data center in a new state and buy new hardware.
    • Impact: Extremely large to catastrophic
    • Cost: Massive ($,$$$,$$$) to huge ($$$,$$$,$$$)
  • The country where the data center is located is hit by a meteor or otherwise destroyed
    • Scope: A very large number of people (massive scale)
    • Time scale: Extremely long to infinite (can't recover data)
    • Impact: Massive to Catastrophic
    • Cost: Company could die (other companies as well)
  • The planet gets blown up
    • Scope: Everyone
    • Time scale: Infinity (game over)
    • Impact: Catastrophic
    • Cost: Game over

I think I'm going to stop with the planet getting blown up. I'm not sure anyone has thought about what happens when all the people are gone, and at that point there is no reason to recover the data (unless the Martians want it for some reason).

Given this scale of disasters, running from something slight and annoying, such as the cleaning crew pulling the plug on your desktop, to the ultimate disaster of the planet being destroyed, you should decide at which point on the scale you need to create a disaster recovery plan. For example, you might want to start a DR plan based on the centralized storage being damaged beyond repair. This means that the data is not accessible and is either not recoverable from the surviving hardware or will be extremely expensive and time-consuming to recover. This type of failure can cause massive business interruption and could cost a company a huge amount of money. Having a plan where a copy of the company's data is maintained at an alternative site and your servers can access the data with a minimum of interruption is not an unreasonable starting point.

You can pick a different starting point for your DR plan based on your needs. Instead of starting with an accident that destroys your centralized storage and leaves the data unrecoverable, you may want to pick something a little further "up" the list. For example, you might want to start with losing access to the centralized data. This could be very important for business continuity.

Either starting point is fine. The point is that you need to pick a starting point and develop your DR plan to match it. But what many people fail to think about is the other end of the scale: at what point do you just give up hope of data recovery? For example, do you want to make sure you can recover from a meteor hitting your primary data center? Or is it enough to be able to recover from a power loss in the primary data center?

Picking the point at which the loss of data is beyond the resources of the company to recover creates an "upper bound" on the required resources (people and money). Without this step, DR planning becomes a black hole into which you pour money. If you don't set an upper limit, then you could end up thinking about putting backup data centers in other countries or even putting them on Mars (watch the radiation and the dust). This can become expensive very quickly. Some companies do this, but you need to carefully evaluate whether you can afford it and what implications it has for operations. I don't want to be overly dramatic in my examples, but I do want to make the point that you need to think about the point at which the company can't survive because of the data loss.
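
The idea of a starting point and an upper bound can be pictured as a slice of the ordered disaster scale. The following is a minimal Python sketch, not a planning tool: the list paraphrases the scenarios on this page, and the names DISASTER_SCALE and dr_plan_scope are hypothetical, introduced only for illustration.

```python
# Scenarios from this page, ordered from least to most severe.
# The wording is paraphrased; the ordering follows the list above.
DISASTER_SCALE = [
    "centralized storage damaged by an accident",
    "centralized storage irreparably damaged",
    "extended data center power loss",
    "data center damaged beyond repair",
    "state hit by a meteor or loses power",
    "country destroyed",
    "planet blown up",
]


def dr_plan_scope(scale, start, upper_bound):
    """Return the scenarios a DR plan covers, from start through upper_bound."""
    lo = scale.index(start)        # raises ValueError if the scenario is unknown
    hi = scale.index(upper_bound)
    if lo > hi:
        raise ValueError("the starting point must not be more severe than the upper bound")
    return scale[lo:hi + 1]


# Hypothetical choice: plan from irreparably damaged storage
# up to losing the whole data center, and nothing beyond that.
covered = dr_plan_scope(
    DISASTER_SCALE,
    start="centralized storage irreparably damaged",
    upper_bound="data center damaged beyond repair",
)
for scenario in covered:
    print(scenario)
```

Everything after the upper bound in the list is a disaster the company has explicitly decided not to spend money planning for.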

How quickly do you want to recover?

Let's assume you have an idea of how much data you need to protect and how quickly it grows and you know the level of disaster you want to withstand. A logical next step in planning for DR is to ask the question, "How quickly do I want to recover the data?" The answer to this question will allow you to start estimating the amount of hardware you need for your DR solution and/or what kind of DR solution you need. Let's try a simple scenario to illustrate how this might impact your solution.

Let's assume I'm a small regional company with about 2,000 employees. I have a data center that stores about 200TB of centralized data, of which I need to protect 100TB. For business continuity, let's assume I can't afford to lose access to the data for more than 2 minutes. This last statement sets a boundary on the data recovery speed.
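
To get a feel for what a 2-minute limit implies, here is a rough back-of-the-envelope calculation in Python. It assumes, purely for illustration, that recovering means copying the full 100TB of protected data within the window and that 1TB = 1,000GB. The function name is hypothetical.

```python
def required_throughput_gb_per_s(protected_tb, window_seconds):
    """Sustained rate in GB/s needed to copy protected_tb within window_seconds.

    Assumes decimal units (1 TB = 1,000 GB) and a full copy of the data.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return protected_tb * 1000 / window_seconds


# Figures from the scenario: 100TB to protect, 2 minutes of allowed downtime.
rate = required_throughput_gb_per_s(protected_tb=100, window_seconds=2 * 60)
print(f"{rate:.1f} GB/s")  # prints "833.3 GB/s"
```

Under those assumptions the sustained rate is roughly 833 GB/s, which suggests that a bulk copy after the disaster is unlikely to fit in the window. A limit this tight points toward a copy that already exists and is accessible at the alternative site.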
