An enterprise disaster recovery plan serves as a roadmap for protecting a business’s IT infrastructure and recovering it when full protection isn’t possible. More than just an offsite data center or a collection of data recovery tools, disaster recovery plans (DRPs) document the policies and procedures that an organization’s IT workers should know and follow.
When a business encounters a physical or digital disaster, IT personnel need a detailed plan that they can immediately implement. The fewer questions they need to ask and the more quickly they can take action, the sooner IT teams can bring systems back online or restore lost data.
Disasters can take many forms, and they don’t necessarily have to be an earthquake or a hurricane to be destructive to enterprise operations and reputations. A natural disaster like a tornado, a flood, or a wildfire certainly interrupts a company’s typical IT operation. But human error and security vulnerabilities do the same. Improper configurations, successful cyberattacks, out-of-date applications, and mishandled passwords are just a few examples of disasters that can cause servers and other IT equipment to fail.
When creating a disaster recovery plan, IT personnel must document its scope and objectives. While objectives may vary depending on the severity of a disaster and the priorities of the organization, the documentation should be extremely clear. Whether a DRP covers only a modest fleet of desktop systems or massive data storage archives, the steps described in the plan should help meet an organization’s data recovery objectives and other goals.
Disaster recovery vs. business continuity
Disaster recovery and business continuity are related terms, sometimes used interchangeably. However, they have slightly different focuses. Disaster recovery is a subset of business continuity, focused specifically on protecting and recovering IT infrastructure. Business continuity is a broader category that includes critical operations of the entire business. These operations can include:
- Physical premises (buildings, lights, security systems)
- Human resources and employment changes
- Financial management
- Third-party vendors and suppliers
- Manufacturing locations, construction zones, and industrial equipment
A business continuity plan includes policies, procedures and contingencies that show business stakeholders exactly how to continue conducting business after a disaster or interruption. Business continuity planning accounts for more than a company’s data and IT systems, reaching other areas that are typically outside an IT department’s purview. These include office spaces, suppliers, employees, and equipment.
Because IT plays a vital role in enterprise operations, disaster recovery is a critical component of a business continuity plan.
Your disaster recovery plan will be heavily influenced by the IT systems and services on which your business relies. Consider some of the top DR methods when selecting a business-wide approach to disaster recovery.
Virtualization disaster recovery
One of the benefits of virtualization is that it can eliminate the need to recreate a physical server when something goes wrong. Virtual machines are faster to create and provision than physical servers as well, so they’re ideal for businesses that need to rapidly spin up new servers in data centers or on-premises infrastructure. Placing a virtual server on reserve capacity or on the cloud can improve response times for recovery, making it easier to meet your organization’s recovery time objective (RTO).
Take stock of the virtualization platforms used in your environment, along with the backup and recovery tools used by each, and draw up a plan to get virtual workloads up and running again. Popular enterprise virtualization solutions include VMware, Microsoft Hyper-V, Oracle VM, Citrix XenServer, and Nutanix AHV.
Network disaster recovery
Network outages can lead to failures in business applications and services that depend on reliable network connectivity. A network disaster recovery plan often includes procedures for contacting the proper IT personnel, acquiring replacement networking equipment from vendors, and alerting customers and providing any alternate services for them.
Cloud-based disaster recovery
One of the most compelling reasons to include the cloud in your disaster recovery planning is the ability to use a cloud provider’s data center as a recovery site without investing in additional facilities, systems, and personnel. It also grants users access to cutting-edge IT capabilities, a consequence of a competitive cloud market with key players like AWS, Microsoft Azure, and Google Cloud.
There are several factors to consider before making the jump to disaster recovery as a service (DRaaS), including bandwidth, cloud storage costs, security and regulatory compliance. Third-party cloud providers are useful for organizations that don’t have the time or financial resources to develop their own backup and recovery infrastructure. They can also be risky, especially since public cloud providers don’t always have the stringent security controls of on-premises environments. However, cloud DR solutions are popular choices for enterprises of all sizes because of their flexibility and cost-efficiency.
Data center disaster recovery
A disaster recovery plan for data centers extends well beyond the IT systems housed in a computing facility. It includes the building itself, utility providers, backup power, and physical security. Fire protection and HVAC (heating, ventilation, and air conditioning) are particularly important for data center-based DR because they protect the physical equipment that holds the data. Access security protects the data against unauthorized access as well as physical harm.
Preparing for a data center outage, outright damage to a data center building, intruders and other risks is essential. It requires consistent input and effort from your company’s IT teams, facilities management personnel, and physical security experts.
To create a disaster recovery strategy, consider every segment of your enterprise to ensure that you cover as many risks and solutions as possible. Involve every company stakeholder needed to make the plan successful, and notify them of every step in the development and strategizing process.
Complete a risk assessment
A good first step is to create a risk assessment that details the likelihood of each type of disaster and the risk it poses to your organization. Rank each risk (minimal, likely, highly likely) and conduct a business impact analysis to determine how each risk will affect the business. This will help your team develop clear priorities. Although you may rarely face a hurricane or feel the earth tremble, there’s a good chance that you’ll experience a server failure or a cyberattack. Plan accordingly.
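As a rough illustration, the ranking could be combined with the business impact analysis into a simple priority score. This is only a sketch: the numeric weights, the 1-to-5 impact scale, and the names `Risk` and `prioritize` are assumptions made for the example, not part of any standard method.

```python
from dataclasses import dataclass

# Likelihood labels come from the ranking above; the numeric weights are assumed.
LIKELIHOOD = {"minimal": 1, "likely": 2, "highly likely": 3}


@dataclass
class Risk:
    name: str
    likelihood: str  # one of the LIKELIHOOD keys
    impact: int      # hypothetical 1-5 score from the business impact analysis

    @property
    def score(self) -> int:
        # Simple likelihood x impact product; other weightings are possible.
        return LIKELIHOOD[self.likelihood] * self.impact


def prioritize(risks):
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


risks = [
    Risk("hurricane", "minimal", 5),
    Risk("server failure", "highly likely", 3),
    Risk("cyberattack", "likely", 5),
]

for risk in prioritize(risks):
    print(f"{risk.name}: {risk.score}")
```

With these sample values, a rare but severe event such as a hurricane ranks below the more probable server failure and cyberattack, which matches the advice to plan for the disasters you are most likely to face.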
Collect data and organize a plan
Gather the information needed to create your disaster recovery plan. This may include an inventory of your servers and storage systems, network diagrams, data center blueprints and floorplans, key personnel, emergency contact numbers, backup and recovery procedures and workflows, third-party services, support information and more, depending on your specific setup.
Then comes the most crucial part: documenting it.
Assuming you haven’t hired a disaster recovery consultant and are using in-house recovery personnel to write a disaster recovery plan, it may be helpful to search out a disaster recovery template from an authoritative source. Even if you deviate from the template, it will help you take a methodical approach to problem solving, write for an intended skill level, avoid glaring omissions, and provide an actionable guide of your own. Ensure that you clearly document each detail of the plan, with clear and orderly steps that employees can follow.
Test your disaster recovery plan
When your organization has developed a thorough strategy, it’s time for disaster recovery testing. Like fire drills, where employees file out of an office building as if there were an actual fire, disaster recovery testing simulates an IT mishap. Pulling the plug on a critical server or cranking up the heat in a data center is generally ill-advised, but there are other ways to test whether your plan will work.
Disaster recovery testing can involve tabletop tests where recovery procedures are discussed and evaluated without physically taking the actions described in the document. Businesses can also conduct hands-on technical tests where participants are tasked with restoring a system, helping them gauge their preparedness.
Make routine updates
In some ways, a disaster recovery plan should be a living document. DR teams should routinely update it to account for changes to your infrastructure, technology updates, and any mergers or acquisitions.
Be sure to update your testing procedures after significant changes. The old ones may no longer be useful or accurate. For example, if you migrate your on-premises server data to a public cloud environment, your business may need to implement a new testing strategy based on cloud DR processes.
Keep stakeholders informed and equipped
Disaster recovery stakeholders should be apprised of changes within the plan and their specific responsibilities. Consistently update the people involved with DR planning, and don’t silo information. Identify the employees who will be put in charge when a crisis erupts, and match skillsets to the affected systems and technologies. Remember to keep your employee information current. The best-laid plans will fall apart if your workers are left scrambling to find someone who can help.
An RTO and RPO assessment
Leadership should establish the RTO and the recovery point objective (RPO), which define, respectively, the maximum downtime the business will tolerate and the maximum amount of data it can afford to lose. Determining acceptable data loss requires a company to be aware of all financial and regulatory ramifications. Any DR plan should meet the intended RTO and RPO.
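One way to sanity-check these targets is to compare them with how a system is actually backed up and restored. The sketch below assumes that worst-case data loss is roughly one full backup interval and that a restore-time estimate is available from testing. The function name `meets_objectives` and the sample figures are hypothetical.

```python
from datetime import timedelta


def meets_objectives(backup_interval, estimated_restore_time, rpo, rto):
    """Compare a backup schedule and restore estimate against RPO and RTO.

    Assumption: a failure just before the next backup loses about one full
    interval of data, so the interval must not exceed the RPO. The restore
    estimate must not exceed the RTO.
    """
    return {
        "rpo_met": backup_interval <= rpo,
        "rto_met": estimated_restore_time <= rto,
    }


# Hypothetical system: nightly backups, two-hour restore,
# with leadership targets of a 4-hour RPO and an 8-hour RTO.
result = meets_objectives(
    backup_interval=timedelta(hours=24),
    estimated_restore_time=timedelta(hours=2),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=8),
)
print(result)
```

In this example the restore time fits within the RTO, but nightly backups cannot satisfy a 4-hour RPO, so either the backup frequency or the objective would need to change.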
A communication and notification plan
A DR notification plan includes who needs to be notified and the methods used by the company to alert them. Teams can’t assume email or collaboration software is the best method, since the email system could be impacted by the disaster. They should determine multiple avenues of communication in case the usual ones are unavailable.
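The idea of multiple avenues of communication can be sketched as an ordered list of channels that are tried until one succeeds. Everything here is illustrative: `notify`, `send_email`, and `send_sms` are hypothetical stand-ins, and a real plan would call whatever mail, SMS, or paging services the company actually uses.

```python
from typing import Callable, List, Optional, Tuple

# A channel is a (name, send function) pair; the send function returns True on success.
Channel = Tuple[str, Callable[[str, str], bool]]


def notify(contact: str, message: str, channels: List[Channel]) -> Optional[str]:
    """Try each channel in order and return the name of the first that works."""
    for name, send in channels:
        try:
            if send(contact, message):
                return name
        except Exception:
            # Treat any failure as "channel unavailable" and fall through.
            continue
    return None  # nobody could be reached; escalate manually


def send_email(contact: str, message: str) -> bool:
    # Stand-in that simulates the mail system being hit by the disaster.
    raise ConnectionError("mail server is down")


def send_sms(contact: str, message: str) -> bool:
    # Stand-in that simulates a working SMS gateway.
    return True


used = notify(
    "on-call engineer",
    "DR plan activated",
    [("email", send_email), ("sms", send_sms)],
)
print(used)
```

Ordering the channels from most to least preferred keeps the everyday tools first while guaranteeing that an outage in one of them does not silence the alert.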
A roles and responsibilities plan
A responsibilities plan defines who is responsible for each task during an outage. Tasks can include contacting affected customers, launching a recovery process in a data center, or monitoring a business network. To reduce data loss, a DRP should provide explicit instructions for stakeholders so they can take immediate action in an emergency.
A critical systems inventory
An inventory of all business-critical systems associates each information technology system that needs to be protected with an individual DRP or a specific facet of the overall DRP. IT systems and platforms often have different protection and recovery methods. For example, a cloud storage platform that stores data in a public provider’s remote data center will require different recovery steps than an on-premises legacy application. Special attention should be paid to legacy systems for which it may be difficult to find additional hardware on short notice.
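A minimal sketch of such an inventory might record, for each system, which part of the DRP covers it and whether it is a legacy system. The `System` fields, the section identifiers such as `DRP-4.2`, and the `audit` helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class System:
    name: str
    platform: str
    recovery_procedure: Optional[str]  # hypothetical DRP section ID, None if missing
    legacy: bool = False


def audit(inventory):
    """Return (systems with no recovery procedure, legacy systems)."""
    uncovered = [s.name for s in inventory if not s.recovery_procedure]
    legacy = [s.name for s in inventory if s.legacy]
    return uncovered, legacy


inventory = [
    System("file-share", "public cloud storage", "DRP-4.2"),
    System("payroll", "on-premises legacy application", None, legacy=True),
    System("crm", "SaaS", "DRP-4.5"),
]

uncovered, legacy = audit(inventory)
print("No recovery procedure:", uncovered)
print("Legacy, check spare hardware:", legacy)
```

Running a check like this whenever the inventory changes makes gaps visible before an outage does.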
Security and compliance
Cybersecurity and information compliance should always be prioritized in a disaster recovery plan. A company should not have a reduced security posture while operating in disaster recovery mode, because a weakened posture could allow attackers to steal data or deploy ransomware. Including security practices as routine tasks in a DRP is a good way to ensure they’re completed regularly.
A one-and-done approach to testing is not sufficient. Personnel turnover and technology upgrades can make any plan obsolete within months. Plans become stale or irrelevant over time and need to be refreshed. Regular testing reveals weak points due to personnel, technological, and environmental changes, including:
- A key DR stakeholder leaving the company, requiring all of their tasks to be reassigned
- A new CRM solution for the entire revenue team, requiring a new approach to customer data protection
- A new company office on the coast of Florida, requiring hurricane DR planning that was previously unnecessary
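A simple way to keep testing from becoming one-and-done is to flag a plan for retesting whenever something has changed since the last test, or when the last test is older than an agreed limit. The 180-day limit and the `needs_retest` helper below are assumptions for illustration. Each organization would set its own interval.

```python
from datetime import date, timedelta

MAX_TEST_AGE = timedelta(days=180)  # assumed policy, not a standard


def needs_retest(last_tested: date, last_changed: date, today: date) -> bool:
    """Flag a plan whose last test predates a change or is simply too old."""
    changed_since_test = last_changed > last_tested
    test_too_old = today - last_tested > MAX_TEST_AGE
    return changed_since_test or test_too_old


today = date(2024, 6, 1)

# A stakeholder left the company after the last test.
print(needs_retest(date(2024, 3, 1), date(2024, 4, 15), today))

# Nothing has changed since a recent test.
print(needs_retest(date(2024, 3, 1), date(2024, 1, 10), today))
```

Here `last_changed` stands for the date of the most recent relevant change, such as a departure, a new CRM rollout, or a new office location.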
Disaster recovery planning is essential for businesses that want to maintain successful operations and provide quality customer experience. Although a well-crafted DRP won’t prevent every problem, it will mitigate the effects of technology outages and natural disasters when properly deployed. Successful DRPs require regular, consistent communication with all disaster recovery stakeholders and thorough testing.