New Orleans domain name and hosting provider Directnic.com adopted an interesting strategy during Hurricanes Katrina and Rita — dig in and grind it out. Situated in a high rise in the central business district, the company's IT staff and crisis manager battled through winds, floods, looting, army raids, power cuts, diesel shortages, water shut-offs and more to keep the data center running.
“We made sure that our critical infrastructure which supports 400,000 Internet clients around the world did not go down,” says Sig Solares, CEO of Intercosmos Media Group, the company that operates Directnic.com.
While the company made it despite facing hell and high water, it’s no surprise that management is now talking about implementing additional backup and disaster recovery options outside of New Orleans. But how far away should alternate sites be located to be safe?
Imagine a New Orleans company with a backup data center near Houston or Galveston. Recent events would have put such a business in severe jeopardy, with both sites in harm's way within weeks of each other.
“You have to be far enough away to be beyond the immediate threat you are planning for,” says Jim Grogan, vice president of consulting product development at SunGard Availability Services in Philadelphia. “At the same time, you have to be close enough for it to be practical to get to the remote facility rapidly, preferably by car.”
One Rule of Thumb
Take the case of a company whose biggest threat is tornadoes. Any recovery site would obviously be located outside of that weather pattern. Similarly, in California, you don't want your DR site sitting on the same fault zone as your primary.
“You have to be far enough apart to make sure that conditions in one place are not likely to be duplicated in the other,” says Mike Karp, an analyst with Enterprise Management Associates of Boulder, Colo. “A useful rule of thumb might be a minimum of about 50 km, the reach of a metropolitan area network (MAN), though the other side of the continent might be necessary to play it safe.”
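To put that rule of thumb in concrete terms, here is a minimal sketch (the coordinates and the haversine helper are illustrative, not from the article) that checks the great-circle separation between a hypothetical primary and DR site. It also illustrates Karp's caveat: New Orleans and Houston sit well past 50 km apart, yet both lie in Gulf hurricane territory, so raw distance alone is not a safety guarantee.

```python
import math

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in km (haversine formula)."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Illustrative coordinates: New Orleans vs. Houston.
d = great_circle_km(29.95, -90.07, 29.76, -95.37)
print(f"Separation: {d:.0f} km")  # roughly 500 km -- far beyond 50 km,
# yet both cities share the same hurricane threat.
```

A distance check like this is only the first filter; as the experts quoted here stress, the sites must also sit on different fault zones, power grids and weather patterns.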
He tells of one major corporate IT room in a Midwestern city whose “remote” site is literally two city blocks away from the company’s primary location. While they have little to fear from hurricanes, floods or earthquakes, the entire IT operation and much of the shareholder value might evaporate in case of an unforeseen local disaster.
And that’s the whole point about disasters — you never know what you are going to get. Certainly with catastrophic situations such as double hurricanes or 14-state power outages, you have to be able to think pretty big to anticipate a threat of that magnitude, but who knows what to expect when the extremes of Mother Nature and outrageous fortune collide?
“These kinds of watershed events change how DR planning is done,” says Grogan. “Most people think smaller than these types of events.”
A Post-9/11 Guideline
So how far away is far enough? An interagency white paper by the SEC, Federal Reserve and other agencies issued after 9/11 suggested a separation of more than 200 miles between primary and secondary facilities.
However, industry response was that this wasn’t practical since the technology didn’t exist to support synchronous updates of large-scale transactional databases and other applications without a large performance hit. As a result, the final draft called for a more lenient “geographical dispersal.” That means don’t be in the same weather pattern or fault line or serviced by the same power grid and telecommunications and utility providers.
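The performance objection has a simple physical basis: a synchronous write cannot be acknowledged until the remote copy confirms it, so every write pays at least one round trip over the fiber. The sketch below (assumed numbers: signal speed in fiber of roughly 200,000 km/s, about two-thirds the speed of light, and ignoring switching and storage overhead) shows how that floor grows with distance.

```python
# Light travels through optical fiber at roughly 200 km per millisecond.
FIBER_SPEED_KM_PER_MS = 200.0

def sync_write_rtt_ms(distance_km):
    """Minimum added round-trip latency per synchronous write, in ms."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_MS

MILES_TO_KM = 1.609
for miles in (50, 200, 1000):
    rtt = sync_write_rtt_ms(miles * MILES_TO_KM)
    print(f"{miles:4d} mi -> at least {rtt:.2f} ms per write")
```

At 200 miles the floor is already over 3 ms per write before any real-world overhead, which for a high-volume transactional database adds up to the "large performance hit" the industry objected to; hence the retreat to "geographical dispersal" rather than a fixed mileage.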
“It is vital to consider the risk events that the company is protecting itself against,” says Chip Nickolett, a DR consultant for Comprehensive Solutions of Brookfield, Wisc. “Take into account fault lines in some areas, and potential hurricane paths in others — i.e., how far a Category 5 hurricane could potentially come inland.”
Karp agrees. He recommends positioning a remote site in a geologically uninteresting place such as South Dakota or Montana rather than Silicon Valley.
Another important factor is ease of access. How easy, for example, would it be for the recovery teams to get to the site, and how feasible is it from that location to gain access to the necessary network and telecommunications infrastructure? A truly isolated site might look wonderful on paper. But if it takes 24 hours after a disaster is declared to staff it, and the telecom provider treats it as a low priority rural area for services, you might struggle to get systems online quickly enough to keep the business afloat.
If you decide to go with a hot site provider such as SunGard or IBM, the assigned recovery site is often determined by factors such as equipment requirements, capacity within the site, and the number of other customers from that region. As a result, there might not be much of a choice regarding how close or how far the recovery site is in these scenarios.
“In these cases, it is especially important that a company have definitions of what constitutes a disaster, and has clearly defined who has authority to declare a disaster and what that procedure is,” says Nickolett. “It may be very important to declare a disaster quickly in order to secure the necessary resources, as these companies may have to utilize a ‘first come, first served’ approach due to high demand.”
For Katrina and Rita, this did not prove problematic for SunGard. The company claims a 100% recovery rate for its affected customers, who made use of disaster recovery sites in Texas, Georgia, Philadelphia and other areas. Grogan reports 24 disaster declarations for Rita, of which 14 have now returned to normal service delivery at the main office. For Katrina, 30 disasters were declared, and 17 continue to use SunGard hot site services.
“Because of the nature of the outages, these customers are not in a position to tell how long their stay with us may be,” says Grogan. “The customers that have returned home are mainly those located on the edge of the worst affected zones.”
The motto for disasters is to be prepared. Directnic.com, for example, has a Gulf War veteran crisis manager on hand who secured the building and took care of basic survival elements such as drinking water and hygiene despite having no city water available. Between those safeguards and IT staff who knew what they were doing, the business hardly skipped a beat. That sort of result was no accident.
“To be ready for disaster, you have to plan for it and thoroughly test the plan with a variety of drills,” says Grogan.
Just how many firms in the New Orleans area were unprepared, hadn’t drawn up a realistic plan and hadn’t drilled it to the point where staff could function? For some of them, it is already too late. For the rest of us, though, there is an opportunity to learn from the experience and take action now.
“As you look at the news of the hurricane aftermath, imagine what you would do if you were in charge of disaster preparedness for a site in the area affected by Katrina,” says Karp. “If you don’t have a list of disaster instructions, including a hierarchy of whom to call when things get rough, this is a really good time for some self-examination. Ask yourself how long your company’s IT operations can remain offline.”
For more storage features, visit Enterprise Storage Forum Special Reports