Disaster Recovery Planning: Best Practices & Services


For businesses, having a disaster recovery plan in place is not optional – it’s critical. Indeed, the recent spike in natural disasters has many organizations thinking about their business continuity plans.

Events like hurricanes Harvey, Irma and Maria; tornadoes in the Midwest and South; fires and floods in California; and storms all across the nation affected thousands of businesses, causing some to go without power and Internet connectivity for days, weeks or even months.

According to the National Oceanic and Atmospheric Administration (NOAA), 2017 was the costliest year ever for the United States when it comes to natural disasters. The country experienced 16 different events that resulted in more than a billion dollars in damage each, with a total price tag of $306.2 billion.

And it isn’t just natural disasters that affect businesses; plenty of man-made events cause business slowdowns or shutdowns as well. Ransomware, civil disturbances, mass shootings, terrorism, and more mundane events like faulty components, accidentally deleted files, misconfigured hardware or mistakenly cut power cables can knock businesses offline.

In order to be ready for these inevitable situations, experts recommend that large enterprises and small businesses alike put together a disaster recovery/business continuity (DR/BC) plan. And because so much of today’s business takes place digitally, that means making a plan for how to get IT systems back online after an outage.


What Is Disaster Recovery?

Some people make the mistake of thinking that if they have backups, that’s enough. But true disaster recovery involves far more than just restoring files from backups.

In the case of a natural disaster, you’ll need a way to keep critical applications and services online during a power outage and/or loss of Internet connectivity. You’ll need a way for staff to communicate if normal phone lines, cell service and networks are down. And you’ll need a way to allow your knowledge employees to continue to work if their regular offices are damaged or destroyed. And while all this is taking place, you’ll need to make sure that you continue to meet your security and compliance obligations.

Depending on your industry, you might have other special needs. For example, healthcare facilities will need ways to get patients to safety. Educational institutions will need to provide a way for instructors to interact with students. Manufacturers may need access to alternate factory or warehouse locations, retailers might need different methods for getting goods to their stores, and so on. A complete disaster recovery plan will take all of these needs into account.

Best Practices for Disaster Recovery

  • Create a written plan. The biggest mistake companies can make when it comes to DR is not having a plan. If you don’t have a written plan, you’ll have to figure everything out in the middle of the emergency. And that practically guarantees that you’ll make mistakes, spend more money than you need to, and stay offline longer than you would like.
  • Follow the 3-2-1 rule. Experts often recommend following the 3-2-1 rule for backups: have three copies of data, use two different types of storage, and store at least one of those copies off-site. For example, enterprises could follow this rule by creating one local backup and one cloud-based backup. That gives them three copies of the data (primary, local backup and cloud backup), two different types of storage (local and cloud) and one copy off-site in the public cloud.
  • Test your plan. Disaster recovery plans are practically useless if they just sit in a file somewhere after they are written. In order to make sure that your plan will work, you need to test it under realistic conditions. That means creating conditions where you attempt to bring your systems online after a power and Internet outage. Obviously, you won’t want to interrupt your production applications, but you should simulate your environment as closely as possible.
  • Update your plan regularly. Your IT environment is changing all the time. You’re adding new applications, new hardware and new staff. That means your disaster recovery plan needs to evolve as well. It’s a good idea to schedule regular DR testing on a monthly, quarterly or annual basis and update your plan with what you learn during your tests.
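
The 3-2-1 rule above is easy to check mechanically. The following is a minimal Python sketch, not a definitive implementation: the `DataCopy` class, its field names and the storage-type labels are hypothetical and chosen only for illustration. It mirrors the example in the list, with a primary copy, a local backup and a cloud backup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    """One copy of the data (hypothetical model, for illustration only)."""
    name: str
    storage_type: str  # e.g. "local", "tape", "cloud"
    offsite: bool


def check_3_2_1(copies):
    """Return each 3-2-1 condition and whether the given copies satisfy it."""
    return {
        "three_copies": len(copies) >= 3,
        "two_storage_types": len({c.storage_type for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
    }


def satisfies_3_2_1(copies):
    """True only when all three conditions hold."""
    return all(check_3_2_1(copies).values())


# The example from the text: primary data, one local backup, one cloud backup.
copies = [
    DataCopy("primary", "local", offsite=False),
    DataCopy("local backup", "local", offsite=False),
    DataCopy("cloud backup", "cloud", offsite=True),
]

print(check_3_2_1(copies))
print("Follows 3-2-1:", satisfies_3_2_1(copies))
```

Returning the three conditions separately, rather than a single yes/no, makes it easier to see which part of the rule a given backup setup is missing.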


Types of Disaster Recovery Solutions

In order to recover from a disaster, you’re going to need a failover site, a place where you can store your backups and run your production workloads in the event that your primary data center goes offline. Organizations have several different options when it comes to selecting a DR site, and each has its own strengths and weaknesses. In general, the choice involves finding a balance between cost and the amount of control that the organization has over the process. The right option for you will depend on the size of your company, the skills you have in-house, the complexity of your environment, your security and compliance needs, and a variety of other factors.

  • In-house: Operating your own DR data center is generally the most expensive of the failover site options, but in some cases it makes sense for very large organizations with skilled staff. For example, sometimes global enterprises find themselves with extra data center space following a merger, acquisition or data center consolidation project. In some cases, it might be the most cost effective to repurpose this space for use as a disaster recovery site.

    The big benefit of this approach is that the organization has complete control over the backup and recovery process. But the biggest weakness is also that the organization has complete control over the backup and recovery process. Your internal staff may not have the specialized skills that DR vendors have, and that may be part of the reason why DR experts say that in-house DR is the most likely to fail in the event of an actual emergency.

  • Colocation: A slightly less expensive alternative to managing your own DR site is using a colocation facility as your DR site. With traditional colocation, you’ll have access to space, power, cooling and network connectivity in a shared data center facility. The vendor will provide physical security for the space, but it will be up to you to purchase, deploy and configure the hardware and data recovery software that will run in the facility.

    This option might reduce expenses and eliminate some of the burdens associated with managing your DR site in-house, but it is still going to require a lot of time, effort and skill, not to mention travel for staff to go to the physical location. It does keep most of the control in the customer’s hands, however, which might be necessary for some organizations with strict compliance requirements.

  • Managed Colocation: Also sometimes referred to as “hosting” or “managed hosting,” managed colocation unloads more of the burden for disaster recovery onto a managed services provider. In addition to the physical data center space and utilities, managed colocation providers also supply and deploy the IT infrastructure, as well as monitoring and maintenance software that will allow the customer to access the site remotely. Some vendors may also offer data recovery software, testing or disaster recovery services.

    This approach shifts more of the burden for disaster preparedness to the vendor, but it also takes some control out of the hands of the customer. Prices and available services may vary widely, so organizations will need to do a total cost of ownership (TCO) or return on investment (ROI) analysis to determine if this is the most cost-effective option for them.

  • Disaster Recovery as a Service (DRaaS): In recent years, several managed services providers (MSPs) and cloud computing vendors have begun offering DRaaS solutions. These solutions usually involve backing up and failing over to a cloud computing environment. This option puts nearly all of the control for handling backup and DR into the hands of the vendor. For small organizations that don’t have sizable IT staffs, DRaaS might be the only feasible and affordable option for disaster recovery.

    However, DRaaS might not meet all the compliance requirements faced by large organizations in certain industries. DRaaS offerings also generally leave less room for customization than the other DR site options.

Strengths and Weaknesses by Type of Disaster Recovery Solution

In-House
  Strengths:
  ● Enterprise retains control of data, applications and processes
  ● Completely customizable
  Weaknesses:
  ● Expensive
  ● Requires staff time and skills
  ● More likely to fail in a disaster situation

Colocation
  Strengths:
  ● May be less expensive than owning your own data center
  ● Requires less time and expertise than owning your own data center
  ● Enterprise retains most control of data, applications and processes
  Weaknesses:
  ● Requires some staff time and skills
  ● Staff must physically travel to colocation site to deploy hardware

Managed Colocation
  Strengths:
  ● Vendor handles IT infrastructure deployment
  ● Remote infrastructure management
  ● May be more cost effective than other options
  Weaknesses:
  ● Less customer control over physical infrastructure
  ● Less capability for customization

Disaster Recovery as a Service
  Strengths:
  ● Vendor handles every aspect of disaster recovery
  ● May be more cost effective than other options
  Weaknesses:
  ● Might not meet compliance requirements
  ● Fewer customization options
  ● Customer has little control over hardware and processes

Key Considerations for Selecting a Disaster Recovery Solution

Whether you set up your DR solution on your own or use a managed hosting or DRaaS vendor, you’ll need to make sure that it meets your needs and fits within your budget. The questions below can help guide you to the right disaster recovery solution for your situation:

  • What is your recovery point objective (RPO) and what is your recovery time objective (RTO)? Your RPO determines how frequently your data needs to be backed up. For example, if your RPO is 24 hours, you only need to have data backed up every 24 hours. If your RPO is 10 minutes, that means your business could not tolerate losing more than 10 minutes’ worth of data.

    Your RTO is the maximum time you can allow for getting your restored data and applications back up and running. For example, an RTO of five minutes means that in the event of an emergency you need to be able to fail over to your DR systems and have everyone back to work within five minutes.

    Many organizations have different RTO and RPO numbers for different applications. For example, you might have an RPO of six hours for your email systems, but an RTO of 10 seconds for your transaction-processing systems.

  • What are your compliance requirements? Depending on your industry and the geographic locations where you do business, regulations may require you to have a disaster recovery/business continuity plan, to back up your data at set intervals or to use a failover site that meets certain criteria. Your DR plan may also need to meet certain privacy and security standards in order to meet your compliance needs.
  • What level of availability do you need in your failover site? Essentially, you need to decide how much redundancy your backup systems themselves should have. The Uptime Institute classifies data centers into tiers depending on their level of redundancy. Colocation and cloud vendors that advertise a Tier 4 data center meet the highest requirements (and have the highest prices), and those that offer the least availability meet only Tier 1 standards.

    Data Center Tier | Redundancy Requirements | Availability | Downtime per Year
    Tier 1 | No redundancy | 99.671 percent | 28.8 hours
    Tier 2 | Partially redundant power and cooling | 99.741 percent | 22 hours
    Tier 3 | All components have at least one backup (N+1) | 99.982 percent | 1.6 hours
    Tier 4 | All components have backups, and the data center will keep running even if all primary systems fail at once (2N+1) | 99.995 percent | 26.3 minutes
  • How far should your disaster recovery site be from your primary site? Having a failover site nearby means less latency, and therefore, faster performance in a recovery situation. However, if your failover site is too close, you run the risk that the DR site will be affected by the same disaster that impacts your primary site. To answer this question, you’ll need to consider your geography, the risk of natural or man-made disasters, and your performance needs.

  • Is the disaster recovery site adequately prepared to deal with a major disaster situation? If the DR site is in an area that might be affected by hurricanes, tornadoes, fires, floods or other events, you’ll need to make sure the vendor has taken adequate steps to handle those situations as they arise.
  • What kind of testing capabilities are available with the disaster recovery solution? As previously mentioned, it’s extremely important that you test your DR plan on a regular basis. Make sure that any vendors you use will support your testing needs and that you can include those needs in your SLA.
  • Does the disaster recovery solution offer appropriate security? No matter what type of DR site you are using, you’ll need to make sure the failover site has good physical security, including controlled entrances and exits and surveillance systems. You’ll also need to make sure that your failover site has the same types of IT security in place as the rest of your network, including firewalls, encryption, identity and access management, intrusion prevention, etc.
  • Will the disaster recovery solution be able to handle increasing data volumes? Because your systems are storing an ever-increasing amount of data, you’ll need to make sure that your DR solution will also scale — without busting your budget.
  • How much does the disaster recovery solution cost? Different vendors charge for their software and disaster recovery services in different ways, so you’ll need to do TCO and ROI evaluations to make sure you are comparing the different options fairly.
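
Two of the questions above come down to simple arithmetic: whether a backup schedule and a measured recovery time fit within your RPO and RTO, and how much downtime per year an availability percentage implies. The Python sketch below illustrates both under stated assumptions. The function names are hypothetical, the RPO check assumes the worst case in which a failure strikes just before the next scheduled backup, and the year is taken as 365 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def downtime_hours_per_year(availability_percent):
    """Yearly downtime implied by an availability percentage."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


def meets_rpo(backup_interval_minutes, rpo_minutes):
    """Worst case, everything since the last backup is lost, so the
    backup interval must not exceed the RPO."""
    return backup_interval_minutes <= rpo_minutes


def meets_rto(measured_recovery_minutes, rto_minutes):
    """Compare a recovery time measured in a DR test against the RTO."""
    return measured_recovery_minutes <= rto_minutes


# Availability figures taken from the tier table in this article.
for tier, availability in [(1, 99.671), (2, 99.741), (3, 99.982), (4, 99.995)]:
    hours = downtime_hours_per_year(availability)
    print(f"Tier {tier}: {hours:.1f} hours ({hours * 60:.1f} minutes) per year")

# A nightly backup satisfies a 24-hour RPO but not a 10-minute RPO.
print(meets_rpo(backup_interval_minutes=24 * 60, rpo_minutes=24 * 60))
print(meets_rpo(backup_interval_minutes=24 * 60, rpo_minutes=10))
```

Computed this way, the Tier 2 figure comes out slightly above the rounded 22 hours quoted in the table, while the other three tiers match the table after rounding.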

Disaster Recovery Services

The list of companies that provide disaster recovery solutions is incredibly long. Those below are just a sampling of some of the better known DR providers, as well as a brief overview of the type of product and services each offers:

Cynthia Harvey
Cynthia Harvey is a freelance writer and editor based in the Detroit area. She has been covering the technology industry for more than fifteen years.
