With the price per gigabyte of storage coming down rapidly, that line item is no longer the overriding consideration for most storage budgets. While that is welcome relief for storage users, it also creates a new problem: how long should you wait for storage to get faster and cheaper before you buy?
Add to that the complexity of upgrading to new technologies — 2Gbps vs. 4Gbps Fibre Channel, for example, or SAS vs. SATA, SCSI or Fibre Channel — and you’re confronted with an array of planning and budgeting issues when it comes time to upgrade or replace your storage architecture.
Budgeting for storage is not just about buying more density or the latest cool stuff; it is about determining your needs based on available technology and making sure those requirements are met. I believe the most important issues to consider when budgeting for storage are:
- How will a new technology integrate into the current environment?
- Will this technology meet user requirements for performance and reliability?
- How does this new technology affect O&M (operation and maintenance) costs?
Integration
Integration of technology into the current environment is a large problem for several reasons. Let’s take a real-world example from a large site I am working with. They have servers from one vendor and storage from another. The storage vendor can provide a new storage infrastructure that will support 4 Gb Fibre Channel RAID controllers, 4 Gb Fibre Channel switches, and other storage components. That all sounds great, but can the server side support the 4 Gb architecture?
This is a big question that should be asked of every hardware vendor. A standard PCI bus running at full rate supports 536 MB/sec, but many PCI buses do not sustain this full rate; the situation is better for a PCI-X bus running at approximately 1.1 GB/sec (twice the PCI rate), but the same caveat applies. Two-port 2 Gb HBAs can require up to 800 MB/sec (200 MB/sec for each port reading and 200 MB/sec for each port writing). Therefore, a standard PCI bus cannot support a two-port HBA running at 2 Gb, which would be the same as one port at 4 Gb.
From a failover point of view, having two ports at 2 Gb provides greater redundancy, since a single HBA port failure is more common than both ports failing. This assumes that you have an HBA failure and not a PCI bus failure. In the case of PCI-X, a two-port 4 Gb HBA far exceeds the PCI-X bus bandwidth (1.1 GB/sec for PCI-X, while two ports of a 4 Gb HBA require 1.6 GB/sec at full rate), so delivered performance ends up far closer to that of two 2 Gb ports.
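To make that arithmetic concrete, here is a minimal sketch in Python of the bus-versus-HBA math. It uses the same rough figures quoted above (536 MB/sec for standard PCI, roughly 1.1 GB/sec for PCI-X, about 100 MB/sec per Gb of link speed in each direction); the function and table names are mine, for illustration only, and real sustained rates vary by chipset and implementation.

```python
# Back-of-the-envelope check: can the bus feed the HBA at full rate?
# Figures are the rough numbers quoted in the text, not measured rates.

BUS_MB_PER_SEC = {
    "standard PCI": 536,
    "PCI-X": 1100,   # roughly twice the PCI rate
}

def hba_demand_mb_per_sec(ports: int, gb_per_port: int) -> int:
    """Peak demand: ~100 MB/sec per Gb of link speed, per port, each direction."""
    return ports * gb_per_port * 100 * 2   # reading + writing

for bus, bus_bw in BUS_MB_PER_SEC.items():
    for ports, speed in [(2, 2), (2, 4)]:
        demand = hba_demand_mb_per_sec(ports, speed)
        verdict = "can keep up" if bus_bw >= demand else "bus-limited"
        print(f"{bus:12s} vs. {ports}-port {speed} Gb HBA: "
              f"needs {demand} MB/sec, bus delivers {bus_bw} MB/sec ({verdict})")
```

Running the sketch reproduces the conclusion above: even PCI-X, at about 1.1 GB/sec, falls well short of the 1.6 GB/sec that a two-port 4 Gb HBA can demand.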
All of these performance numbers assume that the I/O being done is streaming I/O. If it isn’t, then why even consider 4 Gb HBAs and infrastructure in the first place? Yes, you can get improved IOPS performance with 4 Gb HBAs from a larger command queue, but the improvement is not that great and is often very workload-dependent. The gains I have seen range from 0% to 20%, but your mileage may vary. This improvement is surely not a justification to run out and buy a 4 Gb infrastructure.
The bottom line is that any site considering 4 Gb technology must make sure that the servers can support this new performance level. More often than not, large servers lag in bus technology, given the long lead time it takes to design complex memory interconnects for a new bus once that bus technology becomes available. You can buy PCI-Express bus technology from Dell on one-, two- and four-CPU systems, but try to find it on large multi-CPU servers (more than 16 CPUs) today.
User Requirements
User requirements should be a major driver of technology upgrades. Many organizations do not have a good handle on what their user application profiles look like, what the growth requirements are, and, worst of all, whether the system is configured and tuned for those application profiles. This lack of understanding of the environment can lead to poor decisions about what hardware and software are needed.
One site I recently reviewed did not have an emulation or characterization of its workload. This is especially important for large sites. Without this information, how could the site test patches for performance degradation (yes, it happens all too often), test new technology to measure performance improvements, or test increases in workload to see if the system can handle them?
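To illustrate what even a bare-bones characterization looks like, the sketch below (Python, with a made-up trace; in practice the input would come from iostat or sar output, filesystem traces, or vendor tools) reduces a list of I/O operations to a repeatable profile of read/write mix and request sizes:

```python
from collections import Counter
from statistics import mean, median

# Hypothetical I/O trace: (operation, request size in KB).
trace = [("read", 64), ("read", 64), ("write", 8), ("read", 1024),
         ("write", 8), ("read", 64), ("write", 256), ("read", 1024)]

def characterize(trace):
    """Reduce a raw I/O trace to a simple, repeatable workload profile."""
    ops = Counter(op for op, _ in trace)
    sizes = sorted(size for _, size in trace)
    return {
        "total_ios": len(trace),
        "read_pct": round(100 * ops["read"] / len(trace), 1),
        "mean_kb": round(mean(sizes), 1),
        "median_kb": median(sizes),
        "total_mb": round(sum(sizes) / 1024, 2),
    }

print(characterize(trace))
```

A profile like this, however crude, is what lets a site replay a representative load against a patch or a new array before committing to it.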
User applications and requirements should be a large component in any decision to upgrade technology. If you do not know what users are doing with the system, how do you know what they need today, let alone plan for the future? This situation often turns into a fire drill when the system is overloaded, and management starts throwing money at the problem instead of executing a master plan for technology infrastructure upgrades.
O&M Considerations
From what I have seen in my 25 years in the business, technology maintenance costs almost always follow the same pattern:
- The cost of O&M for new technology is high for early adopters.
- Over the next 6 to 18 months, the cost drops as the technology is more widely adopted.
- The cost continues to drop, and drops sharply when a technology replacement is released, until…
- The cost skyrockets as the vendor tries to phase out the technology. This final figure is far greater than the original maintenance cost (I have seen it go as high as five times greater), since the vendor no longer wants to bear the cost of supporting the old technology and wants you to upgrade.
This is the general lifecycle for O&M costs. It makes sense given vendor costs, and unless technology trends change, the pattern is likely to continue.
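Purely as an illustration of that lifecycle (the breakpoints and multipliers below are assumptions I chose to shape the curve, not measured data), the pattern can be sketched as a piecewise cost model:

```python
def om_cost_factor(months_since_release: int,
                   replacement_month: int = 36) -> float:
    """Illustrative O&M cost relative to the early-adopter price (1.0).

    The phases mirror the pattern described above; the specific
    breakpoints and multipliers are assumptions, not data.
    """
    m = months_since_release
    if m < 6:                       # early adopters pay full freight
        return 1.0
    if m < 18:                      # cost drops as adoption widens
        return 1.0 - 0.4 * (m - 6) / 12
    if m < replacement_month:       # keeps dropping until a replacement ships
        return 0.6 - 0.3 * (m - 18) / (replacement_month - 18)
    # vendor phases the product out; cost can climb to several times the original
    return min(5.0, 0.3 + 0.5 * (m - replacement_month))

for month in (0, 12, 24, 36, 42, 48):
    print(f"month {month:3d}: relative O&M cost {om_cost_factor(month):.2f}")
```

The jump at the end of the curve is the phase-out penalty described in the last bullet above.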
One other area that should be considered is the personnel cost to the organization of supporting old hardware and software. You’re not likely to find a new hire who knows how to work on Fibre Channel arbitrated loop HBAs, RAIDs and switches, and finding training courses for that hardware isn’t easy either. Just recall the frantic search for mainframe COBOL programmers before Y2K, a clear example of personnel costs for aging technology becoming unreasonable.
Conclusions
We have not talked much about budgeting for storage per se, but the issues we have addressed are the ones that drive the high cost of storage changes. Most sites know what their physical storage growth will be, or at least what the budget will allow for physical storage growth. The major cost items are not adding a few trays of disks with 146 GB drives or swapping out 36 GB drives for 300 GB drives; the major cost driver is the infrastructure. The real questions are how to determine what you need, how much it will cost, and how to fit it into your current environment.
One pitfall I have seen is that sites think they can just jump into new technology without fully understanding the whole data path (the path from the application to the operating system to the HBA/NIC to the storage devices). Plugging 4 Gb HBAs into current servers attached to a 2 Gb storage infrastructure does not generally improve performance unless you are aggregating the throughput of multiple RAID controllers and multiple hosts. The science of determining what users need and when they will need it (some call this an art, but it is really based on scientific analysis and study of the data path) is the process of budgeting for storage.
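As a minimal sketch of that data-path analysis (the stage names and bandwidth ceilings below are hypothetical), delivered performance is set by the narrowest stage in the path, which is why a faster HBA by itself rarely helps:

```python
# Hypothetical per-stage bandwidth ceilings (MB/sec) along one data path.
# The bottleneck, not the newest component, sets delivered performance.
data_path = {
    "application/filesystem": 1200,
    "PCI bus in the server": 536,
    "4 Gb HBA (one port, full duplex)": 800,
    "2 Gb switch/storage port": 400,
    "RAID controller": 600,
}

bottleneck = min(data_path, key=data_path.get)
print(f"Delivered ceiling: {data_path[bottleneck]} MB/sec, set by: {bottleneck}")
```

In this made-up path the 2 Gb storage port is the limit, so the 4 Gb HBA buys nothing until the rest of the path catches up.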
You need a full understanding of:
- Your current environment, including the performance level it can support today and the level it could support given technology trends;
- User requirements for performance and growth, including the current workload and the trend line for growth (performance mapped to expected new technology; a simple projection is sketched after this list); and
- Your current and future O&M costs. Don’t wait until your maintenance contract ends to find out that the cost has skyrocketed; technology maintenance costs follow the pattern described above.
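For the growth trend line mentioned in the second point, even a simple compound-growth projection (all numbers below are assumptions, not a forecast) shows roughly when a current configuration runs out of headroom:

```python
# Illustrative capacity projection: every figure here is an assumption.
current_tb = 50.0          # capacity in use today
annual_growth = 0.40       # assumed 40% yearly growth in stored data
array_ceiling_tb = 150.0   # assumed usable capacity of the current arrays

capacity = current_tb
for year in range(1, 6):
    capacity *= 1 + annual_growth
    status = ("within current arrays" if capacity <= array_ceiling_tb
              else "needs new infrastructure")
    print(f"year {year}: ~{capacity:6.1f} TB ({status})")
```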
Budgeting for storage is considered by many to be a complex problem, but from what I have seen, it is not very complex if the lines of communication between the affected groups are open and free-flowing. The key is to have the data — seeing the future does not require a crystal ball, just some understanding of what you have and what you use, mixed in with a bit of history.
Henry Newman, a regular Enterprise Storage Forum contributor, is an industry consultant with 25 years experience in high-performance computing and storage.