Storage infrastructures have started to resemble the layers of a perpetually growing onion. The good stuff – data – resides at the core. However, before you can get to it, you have to unwrap each layer. You have storage devices, systems that support these devices, networks that support systems, applications that sit on the systems, and business units that need the systems. Robert Burns, president and founder of EverGreen Data Continuity, Inc., Newbury, Massachusetts, says, “You can’t improve your storage infrastructure unless you understand every one of the layers that make up your infrastructure.”
Each day, Burns and his team help organizations, such as Fidelity Investments, Progress Software, and Rogers Communications, begin their discovery and education process. Rather than sell systems or plug anything in, EverGreen conducts formal assessments of an organization’s current storage infrastructure (from data centers to distributed systems) in terms of business continuity. In some cases, EverGreen also recommends re-architecture changes to ensure better data protection. During the past three years, EverGreen has done more than 100 assessments, which evaluate everything from backup and recovery practices to disaster recovery preparedness.
Burns has the background to spearhead these assessments. For about 25 years, he oversaw operations of all the data centers at Bell Labs in Massachusetts. In the early 1990s, he got support from Bell Labs to develop a commercial product for centrally backing up distributed servers across a network. He worked with organizations outside of Bell Labs to market the product. In 1998, Burns, along with a team of storage professionals, started EverGreen. Burns recently provided a glimpse at how he conducts these assessments, what holes they’ve uncovered, and how he’d solve certain problems. His comments follow:
Where do organizations fall short when it comes to doing their own assessments?
Most companies don’t have the tools to evaluate their storage environments. Most IT professionals don’t understand all of the issues. They just look at how much data they have and guess at what they’re going to need. People just don’t manage their data. For example, our assessments usually show that less than half of the data on dedicated servers is active. Some customers have bought storage management tools and never used them. Management thought these tools would be a good idea, but the systems administrators thought otherwise.
Can you explain the methodology you use for an assessment?
To provide better data protection, you need to look at the storage problem. That’s what we do. We break down a customer’s storage infrastructure into modules, such as storage management, and, within each module, rate their practices against our 356 best practices for storage management.
For example, the storage management module looks at backup and recovery practices, high availability, storage growth capability, storage management performance concerning the network, and disaster preparedness.
Say a customer had a recovery time of two hours. We might rate them on a scale of 1 to 5 (strong) on how well they meet this recovery time compared with our best practice.
We’ll also provide an explanation of the best practice. Take backup. We’ll describe our best practice for doing full backups. We’ll explain how they are doing with backups relative to, for example, a full backup, which would allow them to recover in the time objective they’ve defined.
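The rating step described above can be illustrated with a short sketch. It is only an assumption about how such scores might be tabulated, not EverGreen’s actual tooling: the scores, the helper names `module_average` and `weakest_practices`, and the threshold of 3 are all hypothetical.

```python
from statistics import mean

# Hypothetical ratings for one module on the 1-to-5 scale (5 = strong).
# The scores are illustrative and not taken from a real assessment.
storage_management = {
    "backup and recovery": 2,
    "high availability": 4,
    "storage growth capability": 3,
    "disaster preparedness": 1,
}


def module_average(ratings):
    """Return the mean rating for a module, rejecting out-of-range scores."""
    for practice, score in ratings.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{practice}: score {score} is outside 1-5")
    return mean(ratings.values())


def weakest_practices(ratings, threshold=3):
    """List practices rated below the threshold, weakest first."""
    low = [(score, practice) for practice, score in ratings.items() if score < threshold]
    return [practice for score, practice in sorted(low)]


print(module_average(storage_management))
print(weakest_practices(storage_management))
```

A summary like this would only point to where the written explanation of each best practice matters most; it does not replace that explanation.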
What’s the real meat of an assessment?
It has to be the design architecture phase, which may or may not happen. Let me clarify. If a customer doesn’t agree with the findings in the current storage infrastructure part of our assessment or doesn’t have the budget for it, then we don’t want to recommend a new storage architecture.
If a customer opts for it, then we’ll dig deep into what the customer has. By looking at the enterprise storage environment, we can start to develop recovery criteria based on what they’ve defined for their environment. The criteria deal with what’s needed for storage scalability. From here, we can develop the architecture. We really get into why SANs make sense. We’ve developed evaluation criteria for all the functions they’ll require in a new storage architecture, as well as the way various classes of products stack up against each other.
Since your assessment of a customer’s storage infrastructure covers many areas, can you describe how you go about getting all of the information you need?
The trick includes putting together a hierarchy of sources. We begin by having a two- to four-hour discussion with the CIO and the top managers. We learn what they expect from the business-critical applications, such as recovery objectives.
Next, we like to speak with their internal customers, usually business unit leaders, to get an understanding of their needs. At Rogers Communications, we spoke with 20 of the largest internal customers.
Moving along, we speak with application managers, followed by individuals running the data centers. We break the data center group into platforms, such as Unix and Windows NT. We next talk to the system architects responsible for the design of the storage. Of course, we also include the operations personnel who run the help desk.
Do you find any inconsistencies among the way each group perceives the storage environment? Let me add, do CIOs really know their storage environment?
Plenty of inconsistencies! That’s why we interview these groups separately. Each group seems to have its own perceptions of the way things are. The biggest inconsistency occurs between what top management thinks is happening versus what the architects say the systems will do, and what the operations personnel have to cope with every day.
Most of the time, CIOs don’t know a lot about what works and what doesn’t work in their storage environment. These executives, however, prefer to focus on the storage cost implications to the overall organization.
You mentioned that your customers really don’t have a good handle on how their data is being used. Given this scenario, then how do you get them to give you this information?
You’re right on. The hardest thing to find out is what data they have, and how much they have. For example, “How much data do you back up weekly?” may seem like an easy question to answer. It’s not. In fact, we tend to flag this area as the most serious issue we have.
We give them a table with 13 columns. We ask for a server’s RAID levels, operating systems, backup frequency, and number of users, as well as other information. We also want to know the types of backups and when they occur.
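One row of such a table could be modeled as below. This is a minimal sketch, not EverGreen’s actual form: it covers only the columns named in the answer, the field names and the sample server `db01` are hypothetical, and the remaining columns of the 13-column table are left out because the interview does not list them.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ServerRecord:
    """One row of a hypothetical server inventory table.

    Only the columns named in the interview are modeled here.
    """
    server: str
    raid_level: Optional[str] = None
    operating_system: Optional[str] = None
    backup_frequency: Optional[str] = None
    number_of_users: Optional[int] = None
    backup_types: Optional[str] = None
    backup_times: Optional[str] = None  # when the backups occur


def missing_columns(record):
    """Return the names of columns that have not been filled in."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Illustrative row with several answers still outstanding.
row = ServerRecord(
    server="db01",
    raid_level="RAID 5",
    operating_system="Unix",
    backup_frequency="nightly",
)
print(missing_columns(row))
```

Listing the unanswered columns per server gives a simple way to see how much of the requested information is still outstanding.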
One difficulty is that not all systems are backed up the same way. Another problem is that it’s hard to get this information from the reporting part of backup software.
We can spend several weeks prying this information from the customer. Most of the time, they don’t have the information they say they do.
When it comes to storage policies and procedures, such as backup and recovery, where are your customers usually the weakest?
You said it. Backup and recovery! Most companies have adequate backup procedures, but they don’t have a good handle on how to deal with failures that occur in their backup process. In some cases, the person doing the backups doesn’t have a good methodology for how to recover the data if something goes wrong.
Another problem is that some organizations have backup schedules that don’t complement the recovery objectives. For example, most organizations do nightly backups. But if you have a recovery time of two hours and you have a failure, you can lose 23 hours’ worth of data because you’re not backing up every two hours. That’s why in the assessment we take a good look at how people are doing things.
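The arithmetic behind that example can be sketched as follows. The sketch treats the two-hour figure as the most data the business can afford to lose, as the answer does, and assumes a nightly backup that completes at 01:00; the function names and the whole-hour granularity are illustrative assumptions.

```python
def hours_lost(failure_hour, backup_hours):
    """Hours of data lost when a failure hits at failure_hour (clock hour 0-23).

    backup_hours lists the clock hours at which a backup completes each day.
    The loss is the time elapsed since the most recent backup, wrapping
    around midnight.
    """
    return min((failure_hour - b) % 24 for b in backup_hours)


def worst_case_hours_lost(backup_hours):
    """Largest loss over every whole clock hour at which a failure could occur.

    Whole-hour granularity: the true worst case is just under the gap
    between consecutive backups.
    """
    return max(hours_lost(h, backup_hours) for h in range(24))


def meets_objective(backup_hours, objective_hours):
    """True if the worst-case loss stays within the stated objective."""
    return worst_case_hours_lost(backup_hours) <= objective_hours


nightly = [1]                        # assumed: one backup completing at 01:00
two_hourly = list(range(0, 24, 2))   # a backup every two hours

print(hours_lost(0, nightly))            # failure at midnight, 23 hours after the backup
print(meets_objective(nightly, 2))
print(meets_objective(two_hourly, 2))
```

Under these assumptions, a failure at midnight with the nightly schedule loses 23 hours of data, while a two-hourly schedule keeps the loss within the two-hour objective.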