Disaster Recovery: IT Pros Handle Hurricane Sandy
A storm as massive as Hurricane Sandy challenges pretty much everything: business disaster recovery plans; the preparedness of even top IT professionals; the integrity of seawalls; the head-in-the-sand, anti-science beliefs of global-warming deniers . . . and the list goes on and on.
In the last dozen years or so, we’ve seen a number of major disasters that highlight the vital importance of a workable, well-thought-out disaster recovery (DR) plan. Yet after each disaster, it seems that businesses must go back to the drawing board and relearn the lessons they claimed to have mastered the last time around.
If Hurricane Katrina or the Fukushima earthquake didn’t drive home the point that IT backup sites must be geographically separate from primary sites, nothing will. Yet after Hurricane Sandy hit, plenty of businesses learned the hard way what it costs to keep backup facilities less than an hour’s drive from the primary site.
Doing a post-mortem on Sandy feels like a textbook case of déjà vu all over again, but once you get past the lessons we should have learned by now, new and more subtle ones emerge. Here are some of the disaster recovery (DR) and business continuity (BC) stories we’ve gathered in the aftermath of Hurricane Sandy, as well as the key takeaways businesses have drawn from those experiences.
Huffington Post: Scrambling Before the Election
For the Huffington Post, the 2012 election was a peak event. Hordes of readers turn to HuffPost for political analysis and opinion, and Election Day promised to be one of its biggest days for traffic and page views.
And then along came Sandy, slamming into New York City eight days before Election Day.
Soon after the storm hit, HuffPost’s New York City-based data center near Battery Park flooded, bringing down the site. As HuffPost’s IT team worked frantically to switch over to their backup site in Newark, NJ, they had to cope with even more failures.
HuffPost was seemingly well protected. Between New York and Newark were three separate data circuits, for failover and redundancy – but all three went down.
“We keep learning the same lessons over and over again after these disasters, don’t we?” said John Pavley, CTO of the Huffington Post.
As they attempted to bring the site back online in a Washington, D.C. data center, Pavley and his team realized that replicating and re-synching data stores always takes longer than planned. “You look at specs of the machines, the network. You look at cables, and do a little testing, but inevitably you’re way off.”
What HuffPost thought would take a day took a week. “A lot of times we test business continuity plans under very favorable conditions. We test at night when traffic is low, or we test when the key IT people aren’t stressed out. Whether it's human or natural factors, these things add up and recovery takes significantly longer than we planned for,” he said.
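The gap Pavley describes between spec-sheet estimates and real recovery times can be made concrete with a back-of-the-envelope calculation. The Python sketch below is illustrative only: the function name, the data volume, the link speed, and the derating factors are all assumptions chosen for the example, not figures from HuffPost.

```python
def estimate_resync_hours(data_tb, link_gbps, efficiency=1.0, contention=1.0):
    """Rough hours needed to copy data_tb terabytes over a link_gbps link.

    efficiency: fraction of nameplate bandwidth actually achieved (0 < x <= 1).
    contention: how many ways the link is shared with other traffic (>= 1).
    Both knobs are hypothetical stand-ins for real-world friction.
    """
    if data_tb < 0 or link_gbps <= 0:
        raise ValueError("data_tb must be >= 0 and link_gbps must be > 0")
    if not 0 < efficiency <= 1 or contention < 1:
        raise ValueError("efficiency must be in (0, 1] and contention >= 1")

    total_bits = data_tb * 1e12 * 8                      # decimal terabytes to bits
    effective_bps = link_gbps * 1e9 * efficiency / contention
    return total_bits / effective_bps / 3600             # seconds to hours


# Made-up scenario: 10 TB of data stores over a 1 Gbps circuit.
on_paper = estimate_resync_hours(10, 1.0)
# Same job at half the nameplate throughput, sharing the link with live traffic.
under_stress = estimate_resync_hours(10, 1.0, efficiency=0.5, contention=2.0)

print(f"Spec-sheet estimate: {on_paper:.1f} hours")
print(f"Derated estimate:    {under_stress:.1f} hours")
```

With these invented inputs, the spec-sheet estimate is roughly 22 hours and the derated one is close to 89 hours. The arithmetic itself is trivial; the point is that a plan built on the first number, with no allowance for a shared link or stressed staff, can be off by a multiple.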
Then when another storm (a Nor’easter) was bearing down on New York shortly after Sandy, and with the election looming, the HuffPost IT team worked day and night to get a full site up and running in their Washington, D.C. data center. They pulled off this Herculean feat, barely, and on the highly trafficked day of the election, HuffPost experienced no problems.
Yet when the IT team had time to catch its breath, they realized that they had just barely dodged another bullet. Washington, D.C., after all, felt the effects of Sandy too, albeit to a much lesser extent. Of course, Pavley and his team can’t be blamed for thinking that Washington, D.C. is regionally distinct from New York. That may no longer be the case, but only because we’re entering a new era of extreme weather.
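One simple way to sanity-check how far apart primary and backup sites really are is to compute the great-circle distance between them. The sketch below uses the haversine formula in Python. The coordinates are approximate city centers, not the actual data center locations, and the minimum-separation thresholds are arbitrary illustrative values, since the article does not give a number.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Approximate city-center coordinates (latitude, longitude); assumptions for
# illustration, not the real facility addresses.
SITES = {
    "New York": (40.7128, -74.0060),
    "Newark": (40.7357, -74.1724),
    "Washington, D.C.": (38.9072, -77.0369),
}


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def pairs_closer_than(sites, min_km):
    """Return (site, site, distance_km) for every pair closer than min_km."""
    close = []
    for (name1, pos1), (name2, pos2) in combinations(sites.items(), 2):
        distance = haversine_km(pos1, pos2)
        if distance < min_km:
            close.append((name1, name2, round(distance)))
    return close


# Hypothetical policy: flag any two sites less than 100 km apart.
for name1, name2, km in pairs_closer_than(SITES, min_km=100):
    print(f"{name1} and {name2} are only about {km} km apart")
```

With a 100 km threshold, only the New York and Newark pair is flagged. Raising the threshold to 500 km flags all three pairs, Washington, D.C. included. Distance is only a first filter, though: it says nothing about shared storm tracks, shared power grids, or circuits that run through the same flooded conduit.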
Another lesson that Pavley and his team learned is that our communications infrastructure is not nearly as resilient as it should be.
“We as an industry have become a part of Maslow’s hierarchy of needs,” Pavley said. That is, the Web is now essential. The weather and traffic Web sites that users turned to during the storm (to see which evacuation routes were open, for instance) proved to be critical lifelines.
“This [online media] is how people communicated into and out of devastated areas and learned about what was going on. This isn’t just entertainment anymore. This is how people live their lives. This is a responsibility, which before now, we didn’t realize we had. Media sites have a responsibility, and carriers do too, and we all have to figure out how to ensure that we’ll do a better job next time,” Pavley said.
Key points: Geographic separation means something different in an era of extreme weather, and backup sites must be farther from primary sites than we all thought; data recovery will take longer than you expect; critical communications infrastructure is not nearly as resilient as it should be.