If you want to provide a secure, reliable storage architecture and design, you have to study everything individually, from your ability to meet recovery objectives to the quality of your staff and facilities. That’s the advice of Robert Burns, president and founder of EverGreen Data Continuity, Newbury, Massachusetts.
Each day, Burns and his team at EverGreen help organizations such as Fidelity Investments, Progress Software, and Rogers Communications begin peeling away and understanding each of the layers in their storage infrastructure. Rather than sell systems or plug everything in, EverGreen conducts formal assessments of an organization’s storage, everything from data centers to distributed systems. These assessments typically look at two things: how well the storage environment can meet objectives for business continuity and disaster recovery, and what the organization can do to improve its storage systems in these areas.
In the second part of our interview with Burns, he talks about options for disaster recovery preparedness, storage area networks (SANs), and the delicate art of recommending changes to a storage system.
Prior to doing the assessment, where are most of your customers with disaster recovery?
The large organizations have out-of-date disaster recovery plans. Many midsize organizations don’t have a plan. Organizations don’t test what plan they have. Financial organizations, on the other hand, are the exception. Federal law mandates that these organizations test their plans once a year. However, most of this testing concerns mainframe systems. Most organizations are still trying to figure out how best to handle disaster recovery for complex distributed system environments.
Are you doing anything to better prepare your customers for disaster recovery planning?
We have a major effort underway to provide customers with a knowledge-base product for a paperless disaster recovery process. At the front end, you’ll be able to anticipate an event that will impact the data flow. At the back end, you’ll be able to electronically capture asset information about the equipment you have. You’ll know how all of your systems are configured. So, if you had to recover from an event, you could rebuild your systems instantly based on the information we’ve collected. Another database will have all of your recovery procedures. If an event occurs and you haven’t been able to mitigate it, the CIO will be able to develop recovery tasks and teams dynamically by way of an automated workflow manager.
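The knowledge-base product itself isn’t spelled out in the interview, but the kind of asset capture Burns describes might look, in minimal sketch form, something like the following (the file name and fields are illustrative assumptions, not details of EverGreen’s product):

```python
import json
import platform
import socket
from datetime import datetime, timezone

def capture_asset_record():
    """Capture a minimal configuration snapshot for one host.

    Illustrative only: a real recovery knowledge base would also record
    storage layout, installed applications, and network configuration.
    """
    return {
        "hostname": socket.gethostname(),
        "os": platform.platform(),
        "architecture": platform.machine(),
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    # Append the snapshot to a local inventory file; a real system would
    # push it to a central recovery database instead.
    with open("asset_inventory.json", "a") as f:
        f.write(json.dumps(capture_asset_record()) + "\n")
```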
You do a lot of assessments for major companies. If a large company isn’t prepared for disaster recovery, then what preventative measures do you recommend for small to midsize companies?
Organizations with either no IT department or a small centralized one might want to consider outsourcing the management of their IT infrastructures to managed service providers, such as Exodus. These operations have facilities with excellent disaster recovery capabilities, such as redundant power supplies and redundant generators.
Another alternative would be to outsource the backups to a local service provider that specializes in disk-to-disk backups and disaster recovery. Before going this route, the organization needs to consider if a recovery can take place within an acceptable time. Does the organization have to wait for the service provider to ship a tape?
If a customer had a recovery goal of zero downtime, what would you recommend?
You probably should be mirroring your data to a local facility and an off-site facility. These should be high-availability applications only, and those applications should be sitting at each site. (A lot of companies overlook this.) You also need to have some sort of a heartbeat monitoring system tied between the two sites. If the second site has to kick in, the data has to be in the same format as the data in the primary system.
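As a rough illustration of the heartbeat idea, and not a description of any particular monitoring product, a minimal sketch might probe the primary site on a hypothetical endpoint and only trigger the failover procedure after several consecutive misses:

```python
import socket
import time

PRIMARY_SITE = ("primary.example.com", 7001)  # hypothetical heartbeat endpoint
CHECK_INTERVAL = 5    # seconds between heartbeat probes
MISSED_LIMIT = 3      # consecutive misses before declaring the site down

def site_is_alive(host, port, timeout=2.0):
    """Return True if a TCP connection to the heartbeat port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def monitor():
    missed = 0
    while True:
        if site_is_alive(*PRIMARY_SITE):
            missed = 0
        else:
            missed += 1
            if missed >= MISSED_LIMIT:
                # In a real deployment this would kick off the documented
                # failover procedure at the secondary site.
                print("Primary site unreachable -- initiate failover")
                break
        time.sleep(CHECK_INTERVAL)

if __name__ == "__main__":
    monitor()
```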
Apart from zero downtime, a lot of companies are considering mirroring to an off-site location, perhaps a third-party site. What are some of the recommendations you make to customers about mirroring?
Since a lot of our customers have more than one site, we tend to recommend they mirror or replicate data to one of their sites rather than go with a third-party. That’s if the customer is equipped to do it.
We recommend double mirroring. The first mirror creates operational data to use if your disks go down. The mirroring occurs on-site. The second mirror takes place at an off-site location owned by the organization. This mirror gets used if a disaster occurs. The two facilities should be 20 to 30 miles apart so you don’t have a regional power problem. The second site should have a data center that can be used for processing. The pecking order for application processing priority has to be clearly defined. If the local site goes down, the computers at the other site can be cut over to handle the high-priority applications. The low-priority applications are either shut down or reduced during the outage.
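To illustrate what a clearly defined pecking order might look like (the application names, priorities, and capacity figure below are hypothetical, not from the interview):

```python
# A hypothetical pecking order for application processing during a failover.
APPLICATIONS = [
    {"name": "order-entry", "priority": 1, "action_on_failover": "run"},
    {"name": "billing",     "priority": 2, "action_on_failover": "run"},
    {"name": "reporting",   "priority": 3, "action_on_failover": "reduce"},
    {"name": "data-mining", "priority": 4, "action_on_failover": "shut down"},
]

def failover_plan(capacity_slots):
    """Keep the highest-priority applications running within the secondary
    site's capacity; everything else is reduced or shut down."""
    plan = []
    for app in sorted(APPLICATIONS, key=lambda a: a["priority"]):
        if capacity_slots > 0 and app["action_on_failover"] == "run":
            plan.append((app["name"], "run at secondary site"))
            capacity_slots -= 1
        else:
            plan.append((app["name"], app["action_on_failover"] + " during outage"))
    return plan

for name, action in failover_plan(capacity_slots=2):
    print(f"{name}: {action}")
```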
Your assessment enables you to make recommendations for how a customer can change its storage environment. Can you drill down and elaborate on some of the recommendations you might make?
In one assessment, we estimated a three-year compounded growth rate for each of the classes of servers they had. Then we looked at how to divide the data. We also gave them a cost-savings estimate for centralized storage versus storage islands, where every system does its own thing.
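For readers wanting the arithmetic behind such a projection: a compounded growth estimate is simply current capacity multiplied by (1 + annual rate) raised to the number of years. The server classes and rates below are illustrative assumptions, not figures from the assessment:

```python
def projected_capacity(current_tb, annual_growth_rate, years=3):
    """Compound-growth projection: capacity * (1 + rate) ** years."""
    return current_tb * (1 + annual_growth_rate) ** years

# Hypothetical server classes: (terabytes today, annual growth rate).
server_classes = {"database": (4.0, 0.40), "file": (2.5, 0.25), "web": (0.5, 0.60)}

for name, (tb_now, rate) in server_classes.items():
    print(f"{name}: {tb_now:.1f} TB today -> "
          f"{projected_capacity(tb_now, rate):.1f} TB in 3 years")
```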
We’ve done recommendations on the size of tape libraries they’d need. We try to shy away from recommending specific vendors’ products. There are several strategies for determining the size of libraries based on how much data an organization has. We also look at performance and at design configurations based on the network infrastructure and the bandwidth it supports. The bandwidth determines how much data you can move across the network.
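To make the bandwidth point concrete, here is a rough back-of-the-envelope sketch; the data volume, sustained throughput, tape capacity, and retention figures are illustrative assumptions, not numbers from any EverGreen assessment:

```python
import math

def backup_window_hours(data_gb, sustained_mb_per_s):
    """Time to move a full backup across the network at a sustained rate."""
    seconds = (data_gb * 1024) / sustained_mb_per_s
    return seconds / 3600

def tape_slots_needed(data_gb, tape_capacity_gb, fulls_retained=4):
    """Rough slot count: cartridges per full backup times fulls retained."""
    per_full = math.ceil(data_gb / tape_capacity_gb)
    return per_full * fulls_retained

# Hypothetical figures; real sizing uses measured throughput and the
# organization's retention policy.
DATA_GB = 2000        # data to protect
NET_MB_PER_S = 25     # sustained network throughput available for backup
TAPE_GB = 100         # native capacity per cartridge

print(f"Backup window: {backup_window_hours(DATA_GB, NET_MB_PER_S):.1f} hours")
print(f"Tape slots for four retained fulls: {tape_slots_needed(DATA_GB, TAPE_GB)}")
```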
After you’ve done an assessment, the customer then can decide whether it wants you to recommend a new storage architecture. Can you discuss a storage architecture change you proposed for a customer?
We had a customer with 20 servers located in different facilities throughout the world. This customer had a major data center and several smaller ones. Based on the customer’s expectations, we had to look at the best way to manage all of the backups from one central point, and thereby reduce the need for an IT person in these locations. We recommended new backup software that would coordinate and back up all of the remote backup systems. We also recommended a large central StorageTek tape library to replace a bunch of Breece Hill tape backup devices.
One often reads about IT departments struggling to find more efficient, cost-saving solutions. Can you cite a case where a customer said, “thanks, but no thanks” to a change in the storage architecture that could’ve saved it a lot of money?
Yes. Our assessment of one customer’s storage environment showed up to 65 percent of the storage hadn’t been used in more than six months. By moving this data off to secondary storage – whether it was done manually or automatically through hierarchical storage management – the customer could save $50,000 a month on its storage cost. Everyone in the customer’s IT department said no to the idea. They wanted to maintain the status quo just in case someone needed the data.
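As a sketch of how such stale data might be identified (the mount point below is hypothetical, and a real hierarchical storage management product tracks access patterns continuously rather than walking the file system once):

```python
import os
import time

SIX_MONTHS_SECONDS = 182 * 24 * 3600

def stale_bytes(root):
    """Total bytes under `root` not accessed in roughly six months."""
    cutoff = time.time() - SIX_MONTHS_SECONDS
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that vanish or can't be read
            if st.st_atime < cutoff:
                total += st.st_size
    return total

if __name__ == "__main__":
    gb = stale_bytes("/data") / (1024 ** 3)  # "/data" is a hypothetical mount point
    print(f"{gb:.1f} GB untouched for six months -- candidates for secondary storage")
```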
We’ve also recommended job descriptions for storage administrators.
As a former manager of large data centers, you lived through both centralized and distributed control of storage. Where’s the best place to put storage?
For the past 12 years, I’ve been telling everyone this: Storage ought to be in the network. A SAN puts storage in the network. Shared storage in the network architecture provides the most direct route for getting at your data. Storage should be within the site that owns it. I’m not in favor of moving your primary storage into a co-located facility managed by someone else. I’d want control over my data.
Before deploying an enterprise SAN, what changes should an organization make to its storage infrastructure, especially in the data center?
Until you start consolidating your facilities, your SAN architecture will be all over the place. We recommend centralizing as much as possible to minimize bandwidth needs. Put as many applications as you can into those servers. Also, bring as many of those servers into one standard facility, rather than scattering them everywhere. Use the power of an Internet portal to access servers in a common site.
Although you folks don’t get into how a customer purchases storage, what recommendations would you suggest for the way an organization goes about it?
We like the idea of centralizing purchases, as long as it’s done intelligently. Ever wonder why a lot of data centers have one of everything? For example, applications engineers may recommend something that doesn’t fit with what the data center folks would use. Any type of a purchasing task force, however, has to have the knowledge and understanding of the problems they’re trying collectively to solve. Otherwise, it’s a failure.
Back to Part I of this Interview – How to uncover the holes in your storage infrastructure