Storage Planning for Virtual Infrastructures -- More Than Just Managing Latency
Storage planning for virtualized systems requires careful thinking about the issues that will affect users. One major tradeoff of virtualizing systems is that you are trading local bandwidth for network bandwidth. In a nonvirtualized environment, each user has local storage for at least booting his system and very often for his data as well. On a virtualized system, the user boots over the network and accesses applications and data over the network. Here, network performance and, more importantly, network latency are critical. We all know that on a standard enterprise SATA drive the average rotational latency is around 4.16 ms and the average seek time is between 8.5 ms (read) and 9.5 ms (write), depending on the drive manufacturer. This is not going to change, so latency must be considered.
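To make the latency arithmetic concrete, here is a minimal sketch of how the per-request access time adds up. The 4.16 ms figure above is what half a revolution takes at 7,200 RPM; that spindle speed is an assumption, since the text does not state one. The function names and the network round-trip value in the usage line are hypothetical as well.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms: the time for half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def access_time_ms(seek_ms: float, rpm: float, network_rtt_ms: float = 0.0) -> float:
    """Rough per-request access time: seek + rotational latency + network round trip.

    network_rtt_ms is an illustrative parameter; use 0 for a local disk.
    """
    return seek_ms + rotational_latency_ms(rpm) + network_rtt_ms


if __name__ == "__main__":
    # Assumed 7,200 RPM drive with the 8.5 ms read seek quoted in the text.
    local = access_time_ms(seek_ms=8.5, rpm=7200)
    # Hypothetical 0.5 ms network round trip added for the virtualized case.
    remote = access_time_ms(seek_ms=8.5, rpm=7200, network_rtt_ms=0.5)
    print(f"local: {local:.2f} ms, over network: {remote:.2f} ms")
```

The point of the sketch is only that network latency is added on top of a mechanical latency that is already fixed by the drive.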
One key issue is the latency of your network and of the storage the virtualized user will be accessing, as compared to local storage. However, that is far from the only issue. These four issues are paramount and must be considered as well:
- Storage contention
- Storage bandwidth
- File system data and metadata fragmentation
- File system free space
1. Storage Contention
If you are the only user on the system, you have full, uncontested access to the disk drive. You might be running applications in the background, but most likely you and the application you are running in the foreground are consuming most of what the disk drive can deliver. You have 8 MB, maybe 16 MB or even 64 MB, of disk cache, and the disk you are using can likely support 100 random IOPS. Not bad.
In contrast, if you are on a virtualized system, you are very likely contending with many other users. Your application does not have a dedicated disk drive, and the IOPS available to each user, once the total is shared among everyone running, might be less than the 100 you had. Of course, in most virtualized systems you are running on storage controllers, often with large caches, which can make a big difference. Contention can be mitigated, but it is difficult to fully determine the contention without a good understanding of the cache usage.
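As a rough illustration of the contention argument, the sketch below estimates each user's share of random IOPS on shared storage. It uses the 100 IOPS per drive figure from the text; the drive count, user count, and cache hit ratio are hypothetical inputs. The cache model is deliberately simple: it assumes requests served from controller cache never reach the drives, which real controllers only approximate.

```python
def per_user_iops(drives: int, iops_per_drive: float, users: int,
                  cache_hit_ratio: float = 0.0) -> float:
    """Estimate the random IOPS available to each user on shared storage.

    Simplified model (an assumption, not a vendor formula): a fraction
    cache_hit_ratio of requests is absorbed by controller cache, so the
    drives only see the remaining (1 - cache_hit_ratio) of the load.
    """
    if not 0.0 <= cache_hit_ratio < 1.0:
        raise ValueError("cache_hit_ratio must be in [0, 1)")
    backend_iops = drives * iops_per_drive
    frontend_iops = backend_iops / (1.0 - cache_hit_ratio)
    return frontend_iops / users


if __name__ == "__main__":
    # Hypothetical array: 24 drives at 100 IOPS each, shared by 200 users.
    print(per_user_iops(24, 100, 200))                       # no cache help
    print(per_user_iops(24, 100, 200, cache_hit_ratio=0.5))  # half the I/O hits cache
```

Even with a generous hit ratio, the per-user figure in this example stays well below the 100 IOPS a dedicated local drive offers, which is why understanding cache usage matters.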
2. Storage Bandwidth
The bandwidth to storage is also something to consider when planning for virtualization. The average disk performance for a Seagate Enterprise 2.5 inch 10K drive is about 130 MB/sec, and a Seagate 3 TB 3.5 inch enterprise SATA/SAS drive is about 112 MB/sec. For each local disk you are getting more than 100 MB/sec if you are streaming data. Of course, file systems do not necessarily allocate data and sequentially stream it, and very few applications that run locally must sustain more than 100 MB/sec, but I think it is important to understand what users need as well as the total required bandwidth across systems to be virtualized. Having a RAID controller with, for example, 5 GB/sec of bandwidth is not going to work if 200 users need 200 different datasets simultaneously and want 30 MB/sec (200*30=6000 MB/sec and the controller does 5120 MB/sec). Of course, this is not seriously oversubscribed, but if everyone is trying to stream HD video you are going to get a large number of complaints. It is pretty easy to oversubscribe the bandwidth of a storage system in a virtual environment.
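The oversubscription arithmetic in the example above can be checked with a few lines. This is a minimal sketch using the numbers from the text (200 users, 30 MB/sec each, a controller that delivers 5,120 MB/sec); the function and field names are made up for illustration.

```python
def bandwidth_check(users: int, mb_per_sec_per_user: float,
                    controller_mb_per_sec: float) -> dict:
    """Compare aggregate streaming demand against controller bandwidth."""
    demand = users * mb_per_sec_per_user
    return {
        "demand_mb_per_sec": demand,
        # A ratio above 1.0 means the controller is oversubscribed.
        "oversubscription_ratio": demand / controller_mb_per_sec,
        # How many users could stream at the full requested rate at once.
        "users_at_full_rate": int(controller_mb_per_sec // mb_per_sec_per_user),
    }


if __name__ == "__main__":
    result = bandwidth_check(users=200, mb_per_sec_per_user=30,
                             controller_mb_per_sec=5120)
    for name, value in result.items():
        print(f"{name}: {value}")
```

With these inputs the demand is 6,000 MB/sec against 5,120 MB/sec of supply, so only 170 of the 200 users could stream at the full 30 MB/sec at the same time.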
3. File System Data and Metadata Fragmentation
I am not a big fan of the Windows NTFS file system given the performance I have seen, and given the data and metadata allocation algorithms and the lack of high-speed streaming I/O bandwidth support. It is very likely that local disk performance is going to be nowhere near the theoretical performance of the drive, given that the data will not likely be allocated sequentially. Still, the local disk is unlikely to see significant contention from many applications accessing the same drive. That will likely not be the case in a large RAID environment where data is allocated across the storage by the volume manager, file system and the RAID controller. More than likely, the data is not sequentially allocated, and there is far more contention at the drive level. That means the number of seeks and the latency will likely be much higher than with the local drive. That does not even include the network latency, which should also be considered.
4. File System Free Space
Finding free space for new files in a virtual environment is often more CPU-intensive and takes longer than on a local file system. On a local file system, you can just allocate the space as you own the whole drive. That is not the case on a virtual system. You often must traverse many software layers over a network to get new space. Clearly, this is not a problem if the user is using the space that has been allocated to her, but it is a problem if users are regularly exceeding their allocations. This is not a major problem, but certainly something to consider.