Storage virtualization is the technology of abstracting physical data storage resources to make them appear as if they were a centralized resource. Virtualization masks the complexities of managing resources in memory, networks, servers and storage.
Storage virtualization runs on multiple storage devices, making them appear as if they were a single storage pool. Pooled storage devices can be from different vendors and networks. The storage virtualization engine identifies available storage capacity from multiple arrays and storage media, aggregates it, manages it and presents it to applications.
The virtualization software works by intercepting storage system I/O requests to servers. Instead of the CPU processing the request and returning data to storage, the engine maps physical requests to the virtual storage pool and accesses requested data from its physical location. Once the computer process is complete, the virtualization engine sends the I/O from the CPU to its physical address, and updates its virtual mapping.
The engine centralizes storage management into a browser-based console, which allows storage admins to effectively manage multi-vendor arrays as a single storage system.
The Anatomy of Storage Virtualization
Storage virtualization can occur in a variety of different scenarios.
Data Level: Block or File
Block-Based
Block-based storage virtualization is the most common type of storage virtualization. Block-based virtualization abstracts the storage system’s logical storage from its physical components. Physical components include memory blocks and storage media, while logical components include drive partitions.
The storage virtualization engine discovers all available blocks on multiple arrays and individual media, regardless of the storage system’s physical location, logical partitions, or manufacturer. The engine leaves data in its physical location and maps the address to the virtual storage pool. This enables the engine to present multi-vendor storage system capacity to servers, as if the storage were a single array.
File Level
File level virtualization works over NAS devices to pool and administrate separate NAS appliances. While managing a single NAS is not particularly difficult, managing multiple appliances is time-consuming and costly. NAS devices are physically and logically independent of each other, which requires individual management, optimization and provisioning. This increases complexity and requires that users know the physical pathname to access a file.
One of the most time-consuming operations with multiple NAS appliances is migrating data between them. As organizations outgrow legacy NAS devices, they often buy a new and larger one. This often requires migrating data from older appliances that are near their capacity thresholds. This in turn requires significant downtime to configure the new appliance, migrate data from the legacy device, and test the migrated data before going live. But downtime affects users and projects, and extended downtime for a data migration can financially impact the organization.
Virtualizing data at the file level masks the complexity of managing multiple NAS appliances, and enables administrators to pool storage resources instead of limiting them to specific applications or workgroups. Virtualizing NAS devices also makes downtime unnecessary during data migration. The virtualization engine maintains correct physical addresses and re-maps changed addresses to the virtual pool. A user can access a file from the old device and save to the new without ever knowing migration occurred.
Intelligence: Host, Network, Array
The virtualization engine may be located in different computing components. The three most common are host, network and array. Each serves a different storage virtualization use case.
Host-Based
Primary use case: Virtualizing storage for VM environments and online applications.
Some servers provide virtualization from the OS level. The OS virtualizes available storage to optimize capacity and automate tiered storage schedules.
More common host-based storage virtualization pools storage in virtual environments and presents the pool to a guest operating system. One common implementation is a dynamically expandable VM that acts as the storage pool. Since VMs expect to see hard drives, the virtualization engine presents underlying storage to the VM as a hard drive. In fact, what the “hard drive” is the logical storage pool created from disk- and array-based storage assets.
This virtualization approach is most common in cloud and hyper-converged storage. A single host or hyper-converged system pools available storage into virtualized drives, and present the drives to guest machines.
Network-Based
Primary use case: SAN storage virtualization
Network-based storage virtualization is the most common type for SAN owners, who use it to extend their investment by adding more storage. The storage virtualization intelligence runs from a server or switch, across Fibre Channel or iSCSI networks.
The network-based device abstracts storage I/O running across the storage network, and can replicate data across all connected storage devices. It also simplifies SAN management with a single management interface for all pooled storage.
Array-Based
Primary use case: Storage tiering.
Storage-based virtualization in arrays is not new. Some RAID levels are essentially virtualized as they abstract storage from multiple physical disks into a single logical array.
Today, array-based virtualization usually refers to a specialized storage controller that intercepts I/O requests from secondary storage controllers and automatically tiers data within connected storage systems. The appliance enables admins to assign media to different storage tiers, usually SSDs to high-performance tiers and HDDs into nearline or secondary tiers. Virtualization also allows admins to mix media in the same storage tier.
This virtualization approach is more limited than host- or network-based virtualization, since virtualization only occurs over connected controllers. The secondary controllers need the same amount of bandwidth as the virtualization storage controllers, which can affect performance.
However, if an enterprise has heavily invested in an advanced hybrid array, the array’s storage intelligence may outpace what storage virtualization can provide. In this case, array-based virtualization allows the enterprise to retain the array’s native capabilities and add virtualized tiering for better efficiency.
Band: In-Band, Out-of-Band
In-Band
In-band storage virtualization occurs when the virtualization engine operates between the host and storage. Both I/O requests and data pass through the virtualization layer, which allows the engine to provide advanced functionality like data caching, replication and data migration.
In-band takes up fewer host server resources, because it does not have to find and attach multiple storage devices. The server only sees the virtually pooled storage in its data path. However, the larger the pool grows, the higher the risk that it will impact data path throughput.
Out-of-Band
Out-of-band storage virtualization splits the path into control (metadata) and data paths. Only the control path runs through the virtualization appliance, which intercepts I/O requests from the host, looks up and maps metadata on physical memory locations, and issues an updated I/O request to storage. Data does not pass through the device, which makes caching impossible.
Out-of-band virtualization installs agents on individual servers to direct their storage I/O to the virtualization appliance. Although this adds somewhat to individual server loads, out-of-band virtualization does not bottleneck data like in-band can. Nevertheless, best practices are to avoid virtualization disruption by adding additional redundant out-of-band appliances.
Benefits of Storage Virtualization
- Enables dynamic storage utilization and virtual scalability of attached storage resources, both block and file.
- Avoids downtime during data migration. Virtualization operates in the background to maintain data’s logical address to preserves access.
- Centralizes a single dashboard to manage multi-vendor storage devices, which saves management overhead and money.
- Protects existing investments by expanding available storage available to a host or SAN.
- Can add storage intelligence like tiering, caching, replication and a centralized management interface in a multi-vendor environment.