Making Sure Disaster Recovery Works
Utility companies are about as critical to the national infrastructure as it gets, so not surprisingly, they place a high premium on the reliability of their own IT systems.
With more than two million electricity and natural gas customers, Northeast Utilities is New England's biggest utility company. The organization relies heavily on IT to keep its operations running, and the company can't afford any doubts when it comes to disaster recovery.
The utility company's disaster recovery planning and testing is disciplined and rigorous. Yet despite a high degree of confidence in its DR plans, the company still had a nagging concern that some unforeseen glitch could hijack its ability to recover in the event of a catastrophe.
Truth be told, no amount of testing systems in isolation or running tabletop disaster recovery simulation exercises can substitute for regular, manual disaster recovery testing across the organization. Even then, companies with the best-laid plans struggle to keep up with moves, adds and changes across multiple complex IT infrastructures.
Ed Goldberg, business continuity and disaster recovery coordinator at Northeast Utilities, said information on DR testing isn't widely available. "No one really talks about it," he said.
So after a chance meeting with Continuity Software at an industry trade event, Goldberg, with nothing to lose and everything to gain, agreed to let the vendor run a trial scan of its RecoverGuard product (see Startup Makes Disaster Recovery Work).
RecoverGuard, according to Continuity Software, is an agentless enterprise monitoring software solution that consistently scans the IT infrastructure, including storage, database, servers and replication configurations, and detects data protection risks.
"For a small fee to run a trial scan over a couple of days, the vendor guarantees they'll discover underutilized storage to cover the fee," said Goldberg.
Northeast's IT infrastructure includes mainframes; legacy systems; some 600 to 800 Windows and Unix servers, some of which are virtualized; Oracle, Microsoft SQL Server and Sybase databases; and modern control systems for the company's power grid, among other components. The data centers support and protect business services across the organization, including ERP, supply chain and customer relationship management, as well as commercial and internally developed utility-specific applications. Storage volume is about half a petabyte and doubling every 18 months.
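To put that growth rate in perspective, the short sketch below projects storage volume forward from the two figures quoted above (about 0.5 petabytes, doubling every 18 months). It assumes smooth exponential growth, which is a simplification, and the function name and parameters are illustrative rather than anything used by Northeast Utilities.

```python
def projected_storage_pb(months: float,
                         start_pb: float = 0.5,
                         doubling_months: float = 18.0) -> float:
    """Project storage volume in petabytes after `months` months.

    Assumes steady exponential growth: the volume doubles once
    every `doubling_months` months, starting from `start_pb`.
    """
    return start_pb * 2 ** (months / doubling_months)


if __name__ == "__main__":
    # Print the projection at 18-month intervals over six years.
    for months in (0, 18, 36, 54, 72):
        print(f"{months:>2} months: {projected_storage_pb(months):.1f} PB")
```

At that pace, half a petabyte becomes one petabyte after 18 months and two petabytes after three years, which helps explain the interest in finding underutilized storage.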
RecoverGuard scanned some of Northeast's servers for several hours, at which point, Goldberg said, "We were told some things we weren't aware of."
Goldberg was sold.
"The value of the product is that it finds things we miss," he said.
RecoverGuard works by collecting data from key technology elements, building a detailed topology map of the disaster recovery environment, then monitoring the environment to detect vulnerabilities and threats, track changes and optimize the infrastructure by detecting suboptimal configurations and underutilized storage resources.
The software scans the company's open systems disaster recovery environment nightly to ensure that no inadvertent configuration changes occurred that would hurt its recoverability.
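The idea of a nightly scan that flags inadvertent configuration changes can be illustrated with a minimal sketch. This is not how RecoverGuard is implemented, which the article does not describe in detail; it simply compares two hypothetical configuration snapshots and reports what changed. The snapshot layout, host names and setting names are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical snapshot layout: host name -> {setting name: value}.
Snapshot = Dict[str, Dict[str, str]]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> List[str]:
    """Return human-readable findings for changes between two scans."""
    findings: List[str] = []
    for host in sorted(set(previous) | set(current)):
        if host not in current:
            findings.append(f"{host}: missing from latest scan")
            continue
        if host not in previous:
            findings.append(f"{host}: new host, no baseline")
            continue
        old, new = previous[host], current[host]
        # Compare every setting seen in either snapshot.
        for key in sorted(set(old) | set(new)):
            if old.get(key) != new.get(key):
                findings.append(
                    f"{host}: {key} changed from "
                    f"{old.get(key)!r} to {new.get(key)!r}"
                )
    return findings


# Illustrative data: a volume was added without a matching DR update.
last_night = {"db01": {"replication_target": "dr-array-1", "volume_count": "12"}}
tonight = {"db01": {"replication_target": "dr-array-1", "volume_count": "13"}}

findings = diff_snapshots(last_night, tonight)
for line in findings:
    print(line)
```

A real tool would also need to judge whether a given change actually threatens recoverability, for example a new volume that is not replicated to the recovery site. That judgment is outside the scope of this sketch.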
"For us, it's like having a third party validating what we're doing," said Goldberg, who added that the solution inspires confidence.
Ensuring that a disaster recovery site is always ready to resume all operations can be a daunting, if not impossible, task, according to Bob Laliberte, analyst at the Enterprise Strategy Group, which makes solutions like RecoverGuard valuable tools.
Rather than buy a software license to run the product internally, Northeast buys RecoverGuard and DR Assurance as an annual service. Continuity Software's DR Assurance combines the capabilities of RecoverGuard software with the expertise of its DR experts, who notify the customer about issues and their resolution.
The Continuity Software products currently scan 64 servers at Northeast Utilities, some of which are virtual. "All of our critical servers are covered," said Goldberg.
The service has no performance effect on Northeast's IT environment. "One day I'd love to scan our mainframe environment, but the product doesn't have that capability today," he said.