It seems as if every storage vendor is either bringing virtualization products to market, or talking about their plans to do so. With all the talk about this new technology, the question naturally arises: Is virtualization useful to you in your environment, and if so, how should you use it?
You have many options, from hardware to software to combinations of both. There are options for mirroring and for all types of storage virtualization, including making anywhere from two to 10,000 LUNs look like a single LUN. There is also file system virtualization, which in my opinion is really a way of virtualizing the underlying storage. We’ll start with a definition of virtualization and then discuss some of your options.
I generally like to use the SNIA definitions as a starting point. The definitions listed on the SNIA Web site represent a common agreement among industry participants. This is not to say that all vendors use these definitions in their product marketing, but you know what they say about marketing materials.
virtualization: The act of integrating one or more (back end) services or functions with additional (front end) functionality for the purpose of providing useful abstractions. Typically virtualization hides some of the back end complexity, or adds or integrates new functionality with existing back end services. Examples of virtualization are the aggregation of multiple instances of a service into one virtualized service, or to add security to an otherwise insecure service. Virtualization can be nested or applied to multiple layers of a system.
From the SNIA definition, virtualization seems to encompass hardware, software and everything in between. That raises a few questions for anyone considering virtualization:
- Should I buy virtualization hardware?
- Should I buy virtualization software?
- Should I buy a combined hardware and software product?
- Should I buy anything?
You should start with the last question, because you need to decide what your business needs are that require virtualization. Just like you have business requirements for performance and disaster recovery, you need a business reason to buy these products — or for that matter, any product at all.
Some Virtualization Examples
With the myriad of products, both hardware and software, what are the best options? You have products that virtualize storage at the server, at the switch, within the storage, and in all manner of combinations, and more products are being announced every month. Almost every article I have seen on virtualization touts the advantages and simplification of the environment. For me, the real question is: What is the ROI?
Let’s take a look at a simple mirroring example that allows you to virtualize the remote mirror. I am aware of four potential methods for doing this:
- Software mirror at the volume manager layer;
- Software mirror at the device driver target layer;
- Hardware mirror at a switch; and
- Hardware mirror at the storage device.
Most vendors promote virtualization with claims such as, “You do not have to know where your data is,” and one of my favorites, “You never have to worry about your data again.” That’s never going to be the case, but virtualization does have its place. Even though you can solve problems without virtualization, it can help in many environments by allowing you to reduce the amount of hardware you have to manage or to solve a replication problem. And that’s the chief benefit of virtualization: saving time and money.
Another potential virtualization option is a product that changes the RAID protection level from, say, RAID-1 to RAID-5 based on the usage patterns of the blocks. Because RAID-5 spends less raw capacity on redundancy than mirroring does, these products can reduce the amount of storage space you need.
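To make the space argument concrete, here is a rough back-of-the-envelope sketch. It assumes a single RAID group built from equal-sized disks, with RAID-1 laid out as mirrored pairs and RAID-5 giving up one disk's worth of capacity to parity. The function name `usable_capacity` and the disk counts are illustrative only and do not describe any particular product.

```python
def usable_capacity(raw_disks: int, disk_tb: float, level: str) -> float:
    """Return usable capacity in TB for one RAID group of equal-sized disks.

    Illustrative assumptions: RAID-1 is built from mirrored pairs, and
    RAID-5 loses exactly one disk's worth of space to parity.
    """
    if level == "raid1":
        if raw_disks < 2 or raw_disks % 2:
            raise ValueError("RAID-1 needs an even number of disks (at least 2)")
        # Every block is stored twice, so half the raw space is usable.
        return raw_disks * disk_tb / 2
    if level == "raid5":
        if raw_disks < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        # One disk's worth of capacity holds parity.
        return (raw_disks - 1) * disk_tb
    raise ValueError(f"unknown RAID level: {level}")


# Eight 1 TB disks: compare what each protection level leaves you.
mirrored = usable_capacity(8, 1.0, "raid1")
parity = usable_capacity(8, 1.0, "raid5")
print(f"RAID-1: {mirrored} TB usable, RAID-5: {parity} TB usable")
```

With eight disks, moving cold data from mirrored pairs to a single parity group frees up a noticeable share of the raw capacity. What the arithmetic does not show is the write penalty of RAID-5.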
However, automation can come at a price. If you are going to relinquish control of a function to a product, the solution that the product comes up with may not be as efficient as the one you would have built yourself. This is not to say that the performance is going to be abysmal, but it more than likely will not be optimal. This is less likely for functions done in hardware, such as mirrors and replication, but even there I have seen it happen. Consider your vendors carefully.
Take the case of a RAID vendor that virtualizes LUNs and RAID levels. Let’s say you are in an environment that uses one file system for the database, including indexes, redo logs, table space and possibly the database software itself. In this case, I have seen storage systems correctly migrate heavily used data to RAID-1 and lightly used data to RAID-5, improving performance and storage utilization for the system. I see these types of products often in small office environments.
But what if you need high-performance access all the time? Having a device that looks at block usage patterns might move something to RAID-5 that is important every other week (say payroll). If you were designing the database and needed high performance for certain files at certain times, surely it would be on the correct device. The hardware can only see the statistical usage, but you know the real usage. Remember my mantra — file systems, SCSI and block devices do not communicate well. In almost every case, I can get better performance out of a system by not using virtualization hardware and software. It may not be easier or cheaper for me to do it myself, however, and that is the big tradeoff of virtualization.
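The payroll problem can be sketched in a few lines. This is a toy model, not how any real array makes its decisions: it assumes the device counts accesses per extent over a fixed look-back window and puts anything above a threshold on RAID-1. The names `place_by_usage`, `window_days`, and `hot_threshold`, along with the workload numbers, are all hypothetical.

```python
from collections import Counter


def place_by_usage(access_log, window_days, today, hot_threshold):
    """Pick a RAID level per extent from recent access counts only.

    access_log: iterable of (day, extent_name) tuples.
    Accesses older than `window_days` are invisible to the policy,
    which is the whole point of the example.
    """
    access_log = list(access_log)
    window_start = today - window_days + 1
    recent = Counter(
        name for day, name in access_log if window_start <= day <= today
    )
    extents = sorted({name for _, name in access_log})
    return {
        name: "RAID-1" if recent[name] >= hot_threshold else "RAID-5"
        for name in extents
    }


# Hypothetical workload: "orders" is touched 50 times every day,
# while "payroll" saw 500 accesses on day 1 and nothing since.
log = [(day, "orders") for day in range(1, 14) for _ in range(50)]
log += [(1, "payroll")] * 500

# On day 13, a 7-day window no longer sees the payroll run.
placement = place_by_usage(log, window_days=7, today=13, hot_threshold=100)
print(placement)
```

In this model the payroll data ends up on RAID-5 just before the next biweekly run, even though it was the busiest extent on the day it mattered. A person who knows the schedule would not have made that choice.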
Why Virtualization?
In my opinion, virtualization is becoming a hot topic because most of us cannot juggle 20 or more LUNs, HBAs, failover, countless servers, replication, and all the other things that current storage environments require.
This problem will continue for the foreseeable future due to the dated nature of block devices and the SCSI protocol (please see A Storage Framework for the Future and Let’s Bid Adieu to Block Devices and SCSI). If it were up to me, in about two to three years, virtualization as we know it would be a thing of the past, since it is based on the data path issues we have today that start with block-based file systems.
From what I have seen, virtualization technology is generally purchased out of a desire to save money. That money could be saved by having fewer people handle the same work, or by having the same number of people manage larger, more complex environments. The question that every organization needs to ask itself is where it can get the most bang for its buck.
With each type of virtualization, you are trying to solve a problem. The key to getting the correct type of virtualization from the right vendor is to ask the question, “What problem am I trying to solve?” For example, if you are looking at virtualization hardware and software to replicate data remotely, the problem you are trying to solve is probably not remote replication itself. It is more likely that your disaster recovery plans require you to have a complete copy of the data at a remote location. Understanding what you are trying to accomplish will help you find the virtualization products that will provide the greatest ROI for your requirements. Sometimes I see people buying virtualization products to solve an architectural deficiency that should be solved differently. They add a virtualization product to the mix instead of re-architecting and solving the problem correctly.
Conclusions
I think virtualization has a place in the storage hierarchy if you understand why you are using it. Here are a couple of general issues to think about:
- Using virtualization products as a band-aid for a poor architectural design might improve the current system, but it is not the ideal solution.
- If you don’t have the staff, then products that allow you to accomplish your mission without adding headcount are a good thing. You may find yourself trading staffing costs for lower performance, but in most cases, using hardware and software to solve a problem is cheaper than hiring experts to fine-tune, maintain and upgrade a system.
Another issue is remote replication. While this has become a pretty simple high-performance function, you still have products that live in three different parts of the data path — replication from the host, from the switch and on the RAID device itself. All of these approaches have advantages and disadvantages, and the key to using replication virtualization is to understand how the products fit within your architecture and whether they meet your requirements. For example, do you require synchronous replication, or is asynchronous sufficient?
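The synchronous-versus-asynchronous question comes down to how much data you can afford to have missing at the remote site. Here is a minimal sketch of the difference, assuming in-memory dictionaries stand in for the local and remote sites. The `Replicator` class and its methods are hypothetical and ignore everything a real product has to handle, such as link latency, write ordering, and failures.

```python
from collections import deque


class Replicator:
    """Toy model of a replicated volume; dicts stand in for the two sites."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.local = {}
        self.remote = {}
        self.pending = deque()  # writes not yet shipped to the remote site

    def write(self, block: int, data: bytes) -> None:
        self.local[block] = data
        if self.synchronous:
            # The write completes only after the remote copy is updated.
            self.remote[block] = data
        else:
            # The write completes immediately; the remote copy lags behind.
            self.pending.append((block, data))

    def drain(self) -> None:
        """Ship all queued writes to the remote site, oldest first."""
        while self.pending:
            block, data = self.pending.popleft()
            self.remote[block] = data

    def blocks_at_risk(self) -> int:
        """Count blocks whose remote copy is missing or stale."""
        return sum(
            1 for block, data in self.local.items()
            if self.remote.get(block) != data
        )


sync_vol = Replicator(synchronous=True)
async_vol = Replicator(synchronous=False)
for vol in (sync_vol, async_vol):
    for block in range(3):
        vol.write(block, b"data")

print(sync_vol.blocks_at_risk(), async_vol.blocks_at_risk())
```

In this model, the asynchronous volume has unreplicated blocks until `drain()` runs, which is the exposure you accept in exchange for not waiting on the remote site for every write. Whether that exposure is acceptable is a business requirement, not a product feature.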