Most companies that embark on a virtualization strategy fail to plan for enough storage, at least initially.
But there are also a number of technologies that organizations can use to make the most of the storage capacity they have, including storage virtualization, thin provisioning and data de-duplication (see Combating Virtual Machine Sprawl).
These measures all address the supply side of the equation: they take whatever physical disk space is available and make it go further by ensuring that more of it is actually used, that it is used efficiently, and that less is “stranded,” or reserved but not actually required.
But there’s another side to the equation, and that’s the demand side: the amount of storage space that a given virtualization strategy requires in the first place. If that can be reduced, then there’s an opportunity to make substantial savings in the amount of storage capacity that needs to be made available, and therefore the amount of money that needs to be committed.
But how?
There’s a lot that is possible, but unfortunately we are going to have to wait a while for much of it.
“We are not using smart software much to reduce storage requirements yet, although that’s in the cards,” said Roy Illsley, a senior research analyst at the Butler Group. “The other thing to point out is that there is no standardization yet. So there is a VMware format, and there are Microsoft formats, for example, but they only work together to a limited extent. If you want to move a virtual machine from one platform to the other, you have to convert it. When things have moved on a bit, then disk utilization will go down because you’ll be able to get away without holding different file formats and pointers.”
In fact, Illsley believes that entire virtual machine platforms will become more space efficient as they evolve.
“Most standard operating systems these days are mature in terms of I/O, disk and memory — they handle these efficiently,” he said. “But virtualization has come in only recently and it is simply not that efficient. As interoperability and standardization comes in, and Intel and AMD do more [to provide virtualization hardware support] so the software will become thinner.”
Linked Clones
That’s all well and good, but what can organizations do now to reduce the demand side of the storage equation?
The answer probably lies in part in the software smarts that Illsley hints at: tricks that reduce the amount of virtual machine data that needs to be stored by using templates and recording deltas (changes). These techniques are very similar to data de-duplication and differential backups.
A simple example of that is the use by VMware (NYSE: VMW) of linked clones. These use the concept of a parent virtual machine from which linked clones are made. The linked clones share virtual disks with the parent, but are independent in that they can change the content of the virtual disk. Any changes that a linked clone makes are not reflected in what the parent sees, and vice versa.
From the moment a linked clone is created, it exists only as a series of deltas from the original snapshot of the parent from which it derives, making it extremely space-efficient. Given the right usage scenario, it is possible to have a large number of linked clones all working independently, but all requiring a very small amount of extra storage space. The main disadvantage of linked clones, though, is that they can only continue to operate as long as they remain linked to the parent: delete the parent and the linked clones become inaccessible.
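The delta idea can be illustrated with a minimal copy-on-write sketch. This is not VMware's implementation or disk format; the class names, the block-level granularity and the sample data are all assumptions made purely for illustration.

```python
class ParentSnapshot:
    """Read-only snapshot of a parent's virtual disk (hypothetical model)."""

    def __init__(self, blocks):
        self._blocks = tuple(blocks)  # immutable copy of the parent's blocks

    def read(self, index):
        return self._blocks[index]

    def __len__(self):
        return len(self._blocks)


class LinkedClone:
    """Stores only the blocks it has changed; the rest come from the parent."""

    def __init__(self, parent):
        self._parent = parent
        self._delta = {}  # block index -> data written by this clone

    def read(self, index):
        # Use the clone's own version of a block if it has one,
        # otherwise fall back to the shared parent snapshot.
        if index in self._delta:
            return self._delta[index]
        return self._parent.read(index)

    def write(self, index, data):
        if not 0 <= index < len(self._parent):
            raise IndexError("block index out of range")
        self._delta[index] = data  # the parent snapshot is never modified

    def extra_blocks(self):
        """Number of blocks this clone stores beyond the shared snapshot."""
        return len(self._delta)


parent = ParentSnapshot(["boot", "os", "apps", "data"])
clone_a = LinkedClone(parent)
clone_b = LinkedClone(parent)
clone_a.write(3, "data-a")  # only clone_a sees this change
```

In this sketch each clone's storage cost is just the size of its delta, and every unmodified read goes through the parent, which is why a clone cannot function once the parent is gone.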
Virtual Templates
Another example is the template technology for virtual environments (VEs) used by Virginia-based Parallels’ Virtuozzo virtualization system. This essentially involves a Virtuozzo server with a template area that holds templates of operating systems and applications. Individual VEs contain an operating system and one or more applications, but these are actually templates with links back to the real OS or application files stored in the template area. Any application patches or updates also only need to be applied (and stored) once in the master template to be reflected in all the VEs on a server.
Virtuozzo has introduced a technology called EZ templates, which can reduce the storage requirements of the templates themselves. It does this by making use of online Linux repositories. Rather than containing a full set of packages, EZ templates contain metadata pointing to repositories and the names of packages that are required. If a particular Linux distribution is required in a VE, an EZ template enables the latest packages to be downloaded (and handles any dependencies) so that an up-to-date, fully patched instance is available.
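A rough sketch of a metadata-only template follows. It does not reproduce the real EZ template format or tooling; the `MetadataTemplate` class, the repository URL, the package names and the version numbers are all hypothetical, and the in-memory index stands in for an online repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataTemplate:
    """Hypothetical metadata-only template: it holds no package payloads."""
    distribution: str
    repositories: tuple  # repository URLs (placeholder values here)
    packages: tuple      # names of the packages the template requires


# Hypothetical repository index: name -> (latest version, dependencies).
REPO_INDEX = {
    "web-server": ("2.0", ("runtime-lib", "tls-lib")),
    "runtime-lib": ("1.4", ()),
    "tls-lib": ("3.1", ("compress-lib",)),
    "compress-lib": ("1.2", ()),
}


def resolve(template, index):
    """Return {package: version} for the listed packages and their dependencies."""
    resolved = {}
    pending = list(template.packages)
    while pending:
        name = pending.pop()
        if name in resolved:
            continue  # already handled, possibly via another dependency
        version, deps = index[name]
        resolved[name] = version
        pending.extend(deps)
    return resolved


template = MetadataTemplate(
    distribution="example-linux",
    repositories=("http://repo.example.invalid/os",),
    packages=("web-server",),
)
install_set = resolve(template, REPO_INDEX)
```

The template itself stays tiny because it names one package; the full install set, including dependencies, is worked out only when a VE is actually built.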
Extending the use of templates further is a technology offered by New York-based DataSynapse. It uses templates so that organizations can reduce the space needed to store virtual machines set up with customized versions of applications. The most direct way to store such machines is to keep a static virtual machine image for each application, including the entire application stack of operating system, any middleware, and the application code itself.
DataSynapse’s technique involves “decomposing” the virtual machine into reusable templates and building blocks for the three components of the application stack. At run-time, the virtual machine can be assembled from the template components, which DataSynapse claims can result in a reduction in space needed to store virtual images by as much as 80 percent.
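To see why storing shared components once can save so much space, consider a back-of-the-envelope calculation. The component sizes and the number of stacks below are invented for illustration and are not DataSynapse figures, and the functions only model the storage arithmetic, not the vendor's assembly process.

```python
def static_image_storage(stacks, sizes):
    """Storage needed if every stack is kept as a full, self-contained image."""
    return sum(sizes[component] for stack in stacks for component in stack)


def template_storage(stacks, sizes):
    """Storage needed if each distinct component is kept once and reused."""
    distinct = {component for stack in stacks for component in stack}
    return sum(sizes[component] for component in distinct)


# Hypothetical component sizes in GB (assumed values, not vendor data).
SIZES_GB = {
    "os": 8,
    "middleware": 2,
    "app-1": 1,
    "app-2": 1,
    "app-3": 1,
    "app-4": 1,
}

# Four customized application stacks that share one OS and one middleware layer.
STACKS = [("os", "middleware", f"app-{n}") for n in range(1, 5)]

static_gb = static_image_storage(STACKS, SIZES_GB)   # 4 * (8 + 2 + 1) = 44
template_gb = template_storage(STACKS, SIZES_GB)     # 8 + 2 + 4 * 1 = 14
saving = 1 - template_gb / static_gb                 # fraction of space saved
```

With these made-up numbers the saving is roughly 68 percent. The more stacks that share the same operating system and middleware, the closer the saving gets to the share of each image that those common layers occupy.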
It is still very early days in the virtualization arena, and techniques such as the ones described above will probably be seen as very primitive as the technology set develops. But just as virtualization itself allows many different virtual servers to share a single physical host, it’s quite clear that in the future, many different virtual servers will increasingly also share the same physical data in many different ways. This may well prove to be one of the key ways of keeping the storage requirements of virtualization down to sensible levels.